Examining the Evolving Landscape of Medical AI

I. Glenn Cohen discusses the risks and rewards of using artificial intelligence in health care.

In a discussion with The Regulatory Review, I. Glenn Cohen offers his thoughts on the regulatory landscape of medical artificial intelligence (AI), the evolving ways in which patients may encounter AI in the doctor’s office, and the risks and opportunities of a rapidly evolving technological landscape.

The use of AI in the medical field poses new challenges and tremendous potential for scientific and technological advancement. Cohen highlights how AI is increasingly integrated into health care through tools such as ambient scribing and speaks to some of the ethical concerns around data bias, patient privacy, and gaps in regulatory oversight, especially for underrepresented populations and institutions lacking resources. He surveys several of the emerging approaches to liability for the use of medical AI and weighs the benefits and risks of permitting states to create their own AI regulations in the absence of federal oversight. Despite the challenges facing regulators and clinicians looking for ways to leverage these new technologies, Professor Cohen is optimistic about AI’s potential to expand access to care and improve health care quality.

A leading expert on bioethics and the law, Cohen is the James A. Attwood and Leslie Williams Professor of Law at Harvard Law School. He is an elected member of the National Academy of Medicine. He has addressed the Organisation for Economic Co-operation and Development, members of the U.S. Congress, and the National Assembly of the Republic of Korea on medical AI policy, as well as the North Atlantic Treaty Organization on biotechnology and human advancement. He has provided bioethical advising and consulting to major health care companies.

The Regulatory Review is pleased to share the following interview with I. Glenn Cohen.

The Regulatory Review: In what ways is the average patient today most likely to encounter artificial intelligence (AI) in the health care setting?

Cohen: Part of it will depend on what we mean by “AI.” In a sense, using Google Maps to get to the hospital is the most common use, but that’s probably not what you have in mind. I think one very common use we are already seeing deployed in many hospitals is ambient listening or ambient scribing. I wrote an article on that a few months ago with some colleagues. Inbox management, meaning drafting initial responses to patient queries that physicians are meant to look over, is another way that patients may encounter AI soon. Finally, in terms of more direct usage in clinical care, AI involvement in radiology is one of the more typical use cases. I do want to highlight your use of “encounter,” which is importantly ambiguous between “knowingly” and “unknowingly” encountering AI. As I noted several years ago, patients may never be told about AI’s involvement in their care. That is even more true today.

TRR: Are some patient populations more likely to encounter or benefit from AI than others?

Cohen: Yes. There are a few ethically salient ways to press this point. First, because of contextual bias, those who are closer demographically or in other ways to the training data sets are more likely to benefit from AI. I often note that, as a middle-aged Caucasian man living in Boston, I am well-represented in most training data sets in a way that, say, a Filipino-American woman living in rural Arkansas may not be. There are many other forms of bias, but this form of missing data bias is pretty straightforward as a barrier to receiving the benefits from AI.

Second, we have to follow the money. Absent charitable investment, what gets built depends on what gets paid for. That may mean, to use the locution of my friend and co-author W. Nicholson Price II, that AI may be directed primarily toward “pushing frontiers” (making excellent clinicians in the United States even better) rather than “democratizing expertise” (taking pretty mediocre physician skills and scaling access to them up via AI to improve access across the world and in parts of the United States without good access to health care).

Third, ethically and safely implementing AI requires significant evaluation, which requires expertise and imposes costs. Unless there are good clearinghouses for expertise or other interventions, this evaluation is something that leading academic medical centers can do, but many other kinds of facilities cannot.

TRR: What risks does the use of AI in the medical context pose to patient privacy? How should regulators address such challenges?

Cohen: Privacy definitely can be put at risk by AI. There are a couple of ways that come to mind. One is just the propensity to share information that AI invites. Take, for example, large language models (LLMs) such as ChatGPT. If you are a hospital system getting access for your clinicians, you are going to want to get a sandboxed instance that does not share queries back to OpenAI. Otherwise, there is a concern you may have transmitted protected information in violation of the Health Insurance Portability and Accountability Act (HIPAA), as well as your ethical obligations of confidentiality. But if the hospital system makes it too cumbersome to access the LLM, your clinicians are going to start using their phones to access it, and there go your HIPAA protections. I do not want to make it sound like this is a problem unique to medical AI. In one of my favorite studies, now a bit dated, someone rode in elevators at a hospital and recorded the number of privacy and other violations.

A different problem posed by AI in general is that it worsens a problem I sometimes call data triangulation: the ability to reidentify users by stitching together our presence in multiple data sets, even if we are not directly identified in some of the sensitive data sets. I have discussed this issue in an article, where I include a good illustrative real-life example involving Netflix.

As for solutions, although I think there is space for improving HIPAA—a topic I have discussed along with the sharing of data with hospitals—I have not written specifically about AI privacy legislation in any great depth.

TRR: What are some emerging best practices for mitigating the negative effects of bias in the development and use of medical AI?

Cohen: I think the key starting point is to be able to identify biases. Missing data bias is a pretty obvious one to spot, though it is often hard to fix if you do not have resources to try to diversify the population represented in your data set. Even if you can diversify, some communities might be understandably wary of sharing information. But there are also many harder-to-spot biases.

For example, measurement or classification bias is where practitioner bias is translated into what is in the data set. What this may look like in practice is that women are less likely to receive lipid-lowering medications and procedures in the hospital compared to men, despite being more likely to present with hypertension and heart failure. Label bias is particularly easy to overlook, and it occurs when the outcome variable is differentially ascertained or has a different meaning across groups. A paper published in Science by Ziad Obermeyer and several coauthors has justifiably become the locus classicus example.

A lot of the problem is in thinking very hard at the front end about design and what is feasible given the data and expertise you have. But that is no substitute for auditing on the back end because even very well-intentioned designs may prove to lead to biased results on the back end. I often recommend a paper by Lama H. Nazer and several coauthors, published in PLOS Digital Health, to folks as a summary of the different kinds of bias.

All that said, I often finish talks by saying, “If you have listened carefully, you have learned that medical AI often makes errors, is bad at explaining how it is reaching its conclusion and is a little bit racist. The same is true of your physician, though. The real question is what combination of the two might improve on those dimensions we care about and how to evaluate it.”

TRR: You have written about the limited scope of the U.S. Food and Drug Administration (FDA) in regulating AI in the medical context. What health-related uses of AI currently fall outside of the FDA’s regulatory authority?

Cohen: Most is the short answer. I would recommend a paper written by my former post-doc and frequent coauthor, Sara Gerke, which does a nice job of walking through it. But the punchline is: if you are expecting medical AI to have been FDA-reviewed, your expectations are almost always going to be disappointed.

TRR: What risks, if any, are associated with the current gaps in FDA oversight of AI?

Cohen: The FDA framework for drugs is aimed at showing safety and efficacy. With devices, the way that review is graded by device classes means that some devices skirt by because they can show a predicate device (in an AI context, sometimes quite unrelated) or because they are classified as general wellness products rather than devices. Then there is the stuff that FDA never sees, which is most of it. For all these products, there are open questions about safety and efficacy.

All that said, some would argue that the FDA premarket approval process is a bad fit for medical AI. These critics may defend FDA’s lack of review by comparing it to areas such as innovation in surgical techniques or medical practices, where FDA largely does not regulate the practice of medicine. Instead, we rely on licensure of physicians and tort law to do a lot of the work, as well as on in-house review processes.

My own instinct as to when to be worried, to give a lawyerly answer, is that it depends. Among other things, it depends on what non-FDA indicia of quality we have, what is understood by the relevant adopters about how the AI works, what populations it does or does not work for, what is tracked or audited, what the risk level in the worst-case scenario looks like, and who, if anyone, is doing the reviewing.

TRR: You have written in the past about medical liability for harms caused to patients by faulty AI. In the current technological and legal landscape, who should be liable for these injuries?

Cohen: Another lawyerly answer: it’s complicated, and the answer will be different for different kinds of AI. Physicians are ultimately responsible for a medical decision, and there is a school of thought that treats AI as just another tool, such as an MRI machine, and suggests that physicians are responsible even if the AI is faulty.

The reality is that few reported cases have succeeded against physicians, for a myriad of reasons detailed in a paper published last year by Michelle M. Mello and Neel Guha. W. Nicholson Price II and I have focused on two other legs of the stool in the paper you asked about: hospital systems and developers. In general, it seems to me that most policy analyses place too little emphasis on the hospital system as a potential locus of responsibility, which may be understandable given that tort liability for hospital systems is not all that common. We suggest “the application of enterprise liability to hospitals—making them broadly liable for negligent injuries occurring within the hospital system—with an important caveat: hospitals must have access to the information needed for adaptation and monitoring. If that information is unavailable, we suggest that liability should shift from hospitals to the developers keeping information secret.”

Elsewhere, I have also mused as to whether this is a good space for traditional tort law at all and whether instead we ought to have something more like the compensation schemes we see for vaccine injuries or workers’ compensation. In those schemes, we would have makers of AI pay into a fund that could pay for injuries without showing fault. Given the cost and complexity of proving negligence and causation in cases involving medical AI, this might be desirable.

TRR: The U.S. Senate rejected adding a provision to the recently passed “megalaw” that would have set a 10-year moratorium on any state enforcing a law or regulation affecting “artificial intelligence models,” “artificial intelligence systems,” or “automated decision systems.” What are some of the pros and cons of permitting states to develop their own AI regulations?

Cohen: This is something I have not written about, so I am shooting from the hip here. Please take it with an even larger grain of salt than what I have said already. The biggest con to state regulation is that it is much harder for an AI maker to develop something subject to differential standards or rules in different states. One can imagine the equivalent of impossibility-preemption type effects: state X says do this, state Y says do the opposite. But even short of that, it will be difficult to design a product to be used nationally if there are substantial variations in the standards of liability.

On the flip side, this is a feature of tort law and choice of law rules for all products, so why should AI be so different? And unlike physical goods that ship in interstate commerce, it is much easier to geolocate and either alter or disable AI running in states with different rules if you want to avoid liability.

On the pro side for state legislation, if you are skeptical that the federal government is going to be able to do anything in this space—or anything you like, at least—due to the usual pathologies of Congress, plus lobbying from AI firms, action by individual states might be attractive. States have innovated in the privacy space. The California Consumer Privacy Act is a good example. For state-based AI regulation, maybe there is a world where states fulfill the Brandeisian ideal of laboratories of experimentation that can be used to develop federal law.

Of course, a lot of this will depend on your prior beliefs about federalism. People often speak about the “Brussels Effect,” relating to the effects of the General Data Protection Regulation on non-European privacy practices. If a state the size of California were to pass legislation with very clear rules that differ from what companies do now, we might see a similar California effect, with companies conforming nationwide to these standards. This is particularly true given that much of U.S. AI development is centered in California. One’s views about whether that is good or bad depend not only on the content of those rules but also on one’s views of what American federalism should look like.

TRR: Overall, what worries you most about the use of AI in the medical context? And what excites you the most?

Cohen: There is a lot that worries me, but the incentives are number one. What gets built is a function of what gets paid for. We may be giving up on some of what has the highest ethical value, the democratization of expertise and improving access, for lack of a business model that supports it. Government may be able to step in to some extent as a funder or for reimbursement, but I am not that optimistic.

Although your questions have led me to the worry side of the house, I am actually pretty excited. Much of what is done in medicine is unanalyzed, or at least not rigorously so. Even the very best clinicians have limited experience, and even if they read the leading journals, go to conferences, and complete other standard means of continuing education for physicians, the amount of information they can synthesize is orders of magnitude smaller than that of AI. AI may also allow scaling of the delivery of some services in a way that can serve underrepresented people in places where providers are scarce.