How Federal Agencies Have Used Rigorous Policy Pilots to Learn

Rigorous policy pilots have tremendous potential to improve governance.

As candidates for the 2020 election continue to pitch their big ideas, perhaps the single biggest question in the minds of voters is, “Will it work?” This essay series has explored how rigorous testing and evaluation, widely used in the private sector, can help discover what works in law and policy.

Why now? Calls for rigorous evaluation of public policy are hardly new, and neither are the challenges. Implementation or contextual factors can make data-driven results difficult to replicate. History is replete with examples of politics or ideology trumping evidence. But even in relatively stable, uncontroversial areas of law, governmental actions are not always taken with an eye toward policy learning, because rigorous policy piloting is perceived as unfair, possibly illegal, difficult, and often not worthwhile.

In a recent paper, I surveyed numerous examples of policy piloting across federal agencies and in case law and reached the opposite conclusion: that rigorous policy piloting is in many cases feasible and worthwhile, and that the law does not stand in the way. Before describing some of these examples, it is worth taking stock of the reasons behind agencies' growing uptake of experimental approaches.

First, the costs of rigorous evaluation are going down. Helpful factors include the advance of open data and administrative data sharing, reductions in the cost of outcome tracking, and the growth of policy lab collaborations.

Second, the broader embrace of agile approaches, which are used to develop software iteratively, has contributed to a cultural shift in government and a greater, bipartisan willingness to test and learn.

President Barack Obama’s 2013 management agenda called for “experimentation and innovation” to produce a “smarter, more innovative, and more accountable government.” Last year, President Donald Trump’s agenda declared that “smarter use of data and evidence is needed to orient decisions and accountability around service and results.” The landmark Evidence Act, described earlier in this series, will make it easier for agencies to do so.

Against this backdrop, a number of agencies are already using pilots extensively to address open questions in law and policy.

For example, the U.S. Patent and Trademark Office’s pilot program of randomized trademark audits has helped document inaccuracies in the trademark register, build consensus among stakeholders, and inform strategies to strengthen the integrity of the register. The Office has also used pilots to drive continuous improvement in trademark and patent quality.

At the U.S. Department of Labor, internal collaboration across 12 operational sub-agencies has helped the Department identify learning goals and research questions, fund projects to investigate them, and evaluate the results under principles that prioritize independence, rigor, transparency, relevance, and ethics.

The Office of Evaluation Sciences (OES) within the U.S. General Services Administration has implemented dozens of rigorous, controlled pilots leveraging behavioral science insights to improve program implementation in areas ranging from retirement security to economic opportunity.

Controlled tests like those carried out by the OES can provide strong insights into the impacts of varying agency behavior. But rigorous policy pilots are not limited to randomized controlled trials.

For example, the Office of the Investor Advocate at the U.S. Securities and Exchange Commission has used a variety of approaches, including nationally representative surveys, behavioral choice experiments, and large qualitative studies, to advance its mission of providing evidence to support investors and Commission deliberations.

The U.S. Food and Drug Administration’s (FDA) precertification pilot program for software-based medical devices shows how much can be learned from even a handful of case studies. As digital health products become increasingly common, FDA’s pilot aims to test a streamlined regulatory framework for providing timely evaluation, precertification, and monitoring of software health products, all while ensuring safety. Although just nine companies are in the pilot, the data and feedback they provide will be used to improve and refine the regulatory framework and, later, to inform legislation.

As the FDA pilot shows, when regulating emerging technologies about which there is much speculation, pilots are a good way to generate hard data. For example, the Federal Aviation Administration’s drone pilot program invited “visionary participants” from state, local, and tribal governments to conduct a variety of advanced operational tests under controlled criteria, so that the agency could accelerate its development of policies to advance both innovation and safety.

In another emerging technology area, the Office of the Assistant Secretary for Research and Technology within the U.S. Department of Transportation is funding connected vehicle pilots in three cities to gather the data needed to develop appropriate policy. At the local level, cities across the country are implementing pilot programs to gather data for local policymaking, and states are passing laws that encourage pilots.

As the U.S. Court of Appeals for the D.C. Circuit has said, in some situations “a month of experience will be worth a year of hearings.”

These examples show that rigorous policy piloting is possible and worthwhile. Earlier pieces in this series have described ways of making it easier, more routine, and more successful. To encourage successful piloting from the outset, proposals include establishing a PolicyPilots.gov hub to connect academics and other potential evaluators with agencies running pilots, pre-registering experiments to reduce false positive results, setting aside resources for evaluation, and applying a learning mindset to rulemaking. During pilots, streamlining institutional review board processes and Paperwork Reduction Act requirements would help experiments operate more efficiently. Finally, randomizations should be carried out with transparency and integrity to enhance government credibility.

All of these proposals share a common thread: when agencies know what works in policy, everyone benefits.