
Scholar urges policymakers to reduce the role of clinicians in overseeing AI use in health care.
Consider a technology that can identify pediatric brain tumors faster than a human.
That future has arrived. In a recent American Medical Association study, two-thirds of health care professionals reported incorporating artificial intelligence (AI) into their work. But many clinicians also expressed the need for “increased oversight” of medical AI.
In a recent article, W. Nicholson Price II of the University of Michigan Law School argues that current oversight of medical AI relies too heavily on a “human in the loop,” an approach that expects individual clinicians to review each AI recommendation and apply it safely and effectively. Price contends that most clinicians lack the expertise or time to evaluate medical AI, and he calls on policymakers to develop regulatory structures that can promote safe and effective medical AI without depending on individual clinicians.
Price argues that medical AI requires oversight because of “basic” design flaws. He points to one study that found an algorithm trained to detect pneumonia from X-rays performed well at the hospital where it was developed, but failed at other hospitals. The algorithm had learned to detect features that reflected the original hospital’s physical environment rather than patients’ symptoms, Price explains.
As Price notes, bias compounds these design flaws. Fewer than five percent of AI systems approved by the Food and Drug Administration (FDA) between 2012 and 2020 disclosed the racial demographics of their training datasets, raising the risk that unrepresentative data could worsen outcomes for minority racial groups, particularly those underrepresented in data collection.
According to Price, current regulatory frameworks address these design flaws and biases through two layers of governance—central and local.
At the central level, federal agencies and medical organizations develop nationwide frameworks and rules to monitor and test medical AI, Price explains. For instance, FDA requires many AI systems to meet safety and efficacy standards before they can enter the market. But current FDA guidance emphasizes that a product’s AI recommendations should equip clinicians with “patient-specific information” so that they can “review the basis for each recommendation and apply their own judgment when making final decisions,” leaving significant governance responsibility to individual clinicians.
In local governance, oversight shifts to hospitals, which test AI systems within their own clinical environments. Price notes, however, that local governance demands substantial expertise, infrastructure, and staffing, requirements that many smaller or underfunded hospitals, lacking dedicated support staff, can meet only by leaning on their clinicians.
Price emphasizes that both central and local governance of medical AI converge at the point of care, where clinicians remain the last line of defense against algorithmic mistakes.
He explains that clinicians most often play a “corrective role”: checking errors, adjusting for “situation-specific” factors, and identifying bias. But scholars have found that clinicians often fail to identify flawed or biased models, even when given explanatory tools designed to make such problems visible.
Price attributes clinicians’ difficulty in overseeing medical AI to gaps in knowledge. Many clinicians lack familiarity with general AI principles or systematic bias. Although this problem may ease over time as medical schools add AI instruction to their curricula, Price argues that automation bias, the tendency to trust automated recommendations and discount contradictory information from non-automated sources, will continue to cloud clinicians’ ability to evaluate the accuracy of individual AI recommendations in high-pressure situations.
According to Price, workloads further constrain individual clinicians’ capacity for oversight. He warns that the “rushed” nature of the modern health care system, a problem most acute in underfunded settings, leaves little room for the careful oversight that current governance frameworks hope to achieve.
Price first offers a short-term solution: When clinicians remain in the loop, regulators and hospitals should define clear and limited roles for them, so that overseeing medical AI neither compromises clinicians’ other responsibilities nor sets them up for failure. He also proposes institutional support, such as onboarding, training, and ongoing monitoring, to help clinicians oversee the safe and effective use of AI technology.
But Price also envisions a long-term framework in which medical AI functions without constant clinician oversight. He encourages regulators and medical organizations to evaluate AI systems as independent tools capable of operating in settings with limited medical expertise. Such systems, he argues, could reduce dependence on overextended clinicians, expand access to care, and “democratize” medical expertise because they would require less funding to deploy.
To achieve this transition, Price urges FDA and medical organizations such as the Coalition for Health AI to adopt evaluation methods that test AI performance under minimal clinician oversight. He argues that FDA should require developers to evaluate the potential uses of their AI systems when seeking product approval. Once medical AI products reach the market, Price contends that experts can conduct “spot-checking,” auditing random samples of system outputs over time, to ensure that programs run as intended.
Recognizing the challenges ahead, Price concludes that medical AI will improve health outcomes only if oversight frameworks anticipate clinicians’ limitations rather than assume their perfect performance.


