The SEC should promote data standardization to protect investors from information overload.
Federal securities laws are designed to protect investors by ensuring that they get the information they need to make sound investment decisions. These laws have supported U.S. capital markets for nearly a century. Today, however, more information is available than ever before, and even sophisticated market participants struggle to sort through all of the data. The U.S. Securities and Exchange Commission (SEC) should help.
The SEC’s Office of the Investor Advocate supports data standardization for corporate reporting because it ultimately makes the reported information easier and less costly for the investing public to access. Data standardization enables straight-through processing—an automated retrieval process conducted purely through electronic transfers with no manual intervention. This automated retrieval, in turn, enables enhanced search capabilities. Although data standardization may seem technical and prosaic, it is vitally important for the future of investor protection.
The following example illustrates how standardized data works. Suppose someone wants to cross-reference one dataset with another to discover information about a company, such as the company’s liability exposure.
To extract data about the company from multiple datasets, one needs to identify the same company within each dataset. This task is challenging because different datasets use different entity identifiers. The most common entity identifier is “company name,” but there can be numerous variations, such as “Inc.” or “Incorporated,” or names of predecessors of companies that have engaged in business combinations. Although U.S. federal agencies sometimes use numerical entity identifiers when referencing companies, these identifiers tend to be used only for internal agency functions and are incompatible with one another.
When Lehman Brothers collapsed in 2008, for example, major financial institutions were slow to ascertain their aggregate exposures because their data systems could not quickly aggregate all of Lehman Brothers’ several thousand legal entities.
Without a uniform and specific identifier, one must rely on mapping tables to compare datasets. Yet mapping tables require significant maintenance and updates that are manual, duplicative, expensive, and error-prone.
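The entity-matching problem described above can be made concrete with a minimal sketch. All records below, including the identifier, are invented for illustration: two datasets describe the same company, but a join on the name column fails on an “Inc.”/“Incorporated” variation, while a join on a shared uniform identifier succeeds.

```python
# Hypothetical records: a filings dataset and an exposures dataset that
# both refer to the same (invented) company under different name variants.
filings = [
    {"lei": "529900T8BM49AURSDO55",
     "name": "Example Holdings, Inc.", "liabilities_musd": 120.0},
]
exposures = [
    {"lei": "529900T8BM49AURSDO55",
     "name": "Example Holdings Incorporated", "exposure_musd": 45.0},
]

# Joining on the name column misses the match entirely.
by_name = [(f, e) for f in filings for e in exposures
           if f["name"] == e["name"]]
print(len(by_name))  # 0 matches: "Inc." vs "Incorporated"

# Joining on a shared uniform identifier links the records directly,
# with no mapping table to build or maintain.
by_lei = [(f, e) for f in filings for e in exposures
          if f["lei"] == e["lei"]]
print(len(by_lei))  # 1 match
```

With only name columns available, reconciling the two sources would require exactly the kind of hand-maintained mapping table described above.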
This basic problem has long stymied investors’ ability to understand market links and exposures. The U.S. Congress sought to address the problem by creating the Office of Financial Research, an agency within the U.S. Department of the Treasury with a mandate to improve financial data. The office was created over a decade ago, yet much work remains to be done.
The Financial Transparency Act—a bill introduced in the U.S. House of Representatives in 2019—would advance this work.
This legislation would require the nine financial regulatory member-agencies of the U.S. Financial Stability Oversight Council to adopt and apply uniform data standards for the information they collect from entities under their jurisdiction. It would also require U.S. financial regulators to adopt a uniform legal entity identifier, such as the G-20-backed Legal Entity Identifier (LEI).
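For context, the LEI is a 20-character alphanumeric code defined by ISO 17442, with two trailing check digits computed under the ISO 7064 MOD 97-10 scheme (the same scheme used for IBANs). The following is a minimal validation sketch assuming that scheme; the example prefix is invented, not a registered LEI.

```python
def _to_number(s: str) -> int:
    # Map each character to its base-36 value (0-9 stay 0-9, A-Z become
    # 10-35) and concatenate the decimal digits, per ISO 7064 MOD 97-10.
    return int("".join(str(int(c, 36)) for c in s))

def lei_check_digits(prefix18: str) -> str:
    """Compute the two check digits for an 18-character LEI prefix."""
    return f"{98 - _to_number(prefix18 + '00') % 97:02d}"

def is_valid_lei(lei: str) -> bool:
    """A well-formed LEI is 20 uppercase alphanumerics whose
    numeric expansion is congruent to 1 modulo 97."""
    return (
        len(lei) == 20
        and lei.isalnum()
        and lei == lei.upper()
        and _to_number(lei) % 97 == 1
    )

# Invented prefix for illustration only; in a real LEI the first four
# characters identify the issuing Local Operating Unit.
prefix = "529900T8BM49AURSDO"
lei = prefix + lei_check_digits(prefix)
print(is_valid_lei(lei))  # True
```

Because the check digits make transcription errors detectable, any system receiving an LEI can reject a corrupted identifier before it pollutes downstream datasets.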
The Financial Transparency Act would resolve a collective action problem among financial regulators, which have been reluctant to invest in changing systems due to resource constraints, concerns about the imposition of upfront costs and knock-on effects on regulated entities, and uncertainty as to whether the LEI would be useful in practice. Indeed, investors can only realize the LEI’s full benefits if there is widespread adoption of the system.
Even without new legislation, however, Congress has already directed federal agencies to adopt data standardization practices more broadly.
In 2018, Congress passed the Open, Public, Electronic, and Necessary Government Data Act, which codifies and builds on federal policies and data infrastructure investments supporting information quality, access, protection, and use. This law provides a sweeping, government-wide mandate for all federal agencies to publish government information in a machine-readable language by default. The extent to which the mandate applies to various types of information, however, is subject to debate.
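The practical difference a machine-readable default makes can be sketched in a few lines. The disclosure text, field names, and figures below are invented for illustration.

```python
import json

# The same fact, published two ways. Software can consume the structured
# record directly; the prose version would require fragile text extraction.
prose = "Example Holdings, Inc. reported total liabilities of $120 million."

record = {
    "entity": "Example Holdings, Inc.",
    "item": "TotalLiabilities",
    "value": 120_000_000,
    "currency": "USD",
}

payload = json.dumps(record)          # machine-readable publication
print(json.loads(payload)["value"])   # 120000000
```

A reporting mandate framed around structured records like this one is what lets analysts aggregate thousands of filings automatically rather than reading them one at a time.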
The Office of the Investor Advocate has encouraged the SEC to incorporate machine-readability specifications in regulations and forms when it updates reporting requirements for other substantive reasons. Veterans of the LEI initiative have written about the importance of sustained, top-level support in government and industry—both to break through entrenched private interests and to maintain momentum when implementing data standardization over a period of years. Without such leadership, progress tends to be gradual and piecemeal.
One of President Joseph R. Biden’s first actions upon taking office was to sign an executive order establishing new processes and expectations to improve federal data collection and collaboration for public health initiatives. The order includes a directive to “make data open to the public in human- and machine-readable formats as rapidly as possible,” which would enable a standard digital process for reporting case statistics and other epidemiological information.
We hope that the President’s action will lead to a prioritization of data standardization across the federal government—not only in the context of public health, but also in corporate reporting broadly.
This essay is part of an 11-part series, entitled Regulation in the Era of Fintech.