Predicting Turbidity, Protecting Tap Water

Scholars analyze how the use of machine learning could reshape EPA drinking water standards.

New York City operates the largest unfiltered surface water supply in the United States, serving 10 million people under an Environmental Protection Agency (EPA) Filtration Avoidance Determination (FAD). The determination is a regulatory waiver that allows the city to avoid constructing a filtration plant—saving billions in infrastructure costs—provided it maintains stringent watershed protection standards and demonstrates continuous compliance with federal drinking water regulations.

Recent advances in machine learning (ML) may transform how water utilities meet these compliance obligations. Researchers have demonstrated that ML models can forecast tributary turbidity contributions with actionable accuracy, potentially enabling proactive watershed management rather than reactive responses to water quality events. This capability raises fundamental regulatory questions: Should the EPA incorporate predictive ML models into FAD compliance frameworks? What validation standards would apply? And how might algorithmic source attribution shift evidentiary burdens in enforcement proceedings?

The stakes are significant. Portland lost its FAD after detecting naturally occurring pathogens, forcing the city to pursue costly filtration infrastructure. New York City Council oversight hearings in October 2024 emphasized the critical need for New York City Department of Environmental Protection research investment to maintain the city’s FAD status. Meanwhile, the U.S. Supreme Court’s decision in Loper Bright Enterprises v. Raimondo eliminated Chevron deference, raising questions about judicial review of agency technical determinations—including those involving machine learning models.

The regulatory landscape is also evolving rapidly. In March 2024, the Office of Management and Budget (OMB) issued the first government-wide AI policy, which required agencies to adopt minimum practices for safety-impacting AI by December 2024. And in January 2024, researchers published a machine learning model that predicts whether the U.S. Army Corps of Engineers, the federal agency responsible for issuing permits for activities affecting navigable waters, would classify a particular water body as falling under federal jurisdiction. By predicting Clean Water Act (CWA) jurisdictional determinations, the model offers a precedent for integrating ML into water regulation more broadly.

These developments also carry environmental justice implications. Heavily monitored watersheds, such as the Upper Esopus, benefit from extensive sensor networks and research investment, while under-resourced communities lack comparable infrastructure for predicting and preventing water quality threats.

In this week’s Saturday Seminar, scholars examine the regulatory challenges and opportunities posed by integrating machine learning into drinking water regulation.

In an article published in the JAWRA Journal of the American Water Resources Association, John T. Kemper of the University of Vermont and colleagues demonstrate that machine learning models can forecast turbidity in NYC’s drinking water watershed with actionable accuracy. Kemper and his coauthors combine high-frequency sensor data with U.S. National Water Model output to predict tributary turbidity contributions, enabling water managers to anticipate rather than react to water quality events. They find that their models can provide useful 24-hour forecasts, suggesting ML-based monitoring could enhance real-time watershed management and potentially transform how utilities demonstrate FAD compliance.

In a study published in Science, Simon Greenhill of the University of California, Berkeley, and colleagues introduce WOTUS-ML, a machine learning model that predicts Clean Water Act jurisdictional determinations with high accuracy. The Greenhill team trains its model on more than 150,000 historical Army Corps of Engineers decisions to classify whether particular waters fall under federal protection. They argue that WOTUS-ML demonstrates the viability of using machine learning to support regulatory implementation in water law contexts. The Greenhill team estimates that under different regulatory interpretations, between one-fourth and two-thirds of U.S. wetlands and streams fall under CWA protection, providing a framework that could serve as precedent for integrating ML into other areas of environmental regulation.

In an article in the Administrative Law Review, Cary Coglianese of The University of Pennsylvania Carey Law School and Daniel E. Walters of Texas A&M Law examine the implications of Loper Bright for administrative governance. Coglianese and Walters argue that the elimination of Chevron deference creates significant uncertainty for agency technical determinations, including those that may involve machine learning models. They suggest that the decision might best be considered “something of a Rorschach test inside a crystal ball,” where different observers may see different implications. Coglianese and Walters contend that courts reviewing ML-based regulatory decisions will need new frameworks for evaluating agency expertise and interpretive authority, with particular implications for science-intensive determinations such as those underlying EPA drinking water standards.

In an article in the Minnesota Law Review, Joshua D. Blank of UC Irvine School of Law and Leigh Osofsky of UNC School of Law examine federal agencies’ use of automated tools to communicate legal guidance to the public. Blank and Osofsky argue that although chatbots, virtual assistants, and AI systems offer administrative efficiency, they may mislead members of the public about how the law applies to their circumstances. Based on a study commissioned by the Administrative Conference of the United States, Blank and Osofsky propose policy recommendations emphasizing transparency, appropriate reliance, and equity. They contend that these considerations are directly relevant to how EPA might deploy ML-based compliance tools while maintaining public trust in drinking water regulation.

In an article in Competition Policy International: TechReg Chronicle, Cary Coglianese of The University of Pennsylvania Carey Law School addresses the challenges of regulating machine learning systems that serve diverse applications across contexts. Coglianese argues that ML’s heterogeneity (the fact that the same underlying techniques may produce vastly different tools with different risk profiles) complicates traditional regulatory approaches that assume more uniform technologies. He contends that regulators must develop flexible frameworks capable of addressing context-specific concerns rather than applying one-size-fits-all rules. Coglianese suggests that agencies like EPA should build capacity in data sciences, deploy management-based regulatory strategies, and remain vigilant when considering how to validate ML models for varied environmental applications such as watershed monitoring.

In an article in Science of the Total Environment, Seigi Karasaki and Rachel Morello-Frosch of the UC Berkeley School of Public Health, and Duncan Callaway of the UC Berkeley Energy and Resources Group, examine how machine learning modeling decisions for drinking water quality prediction can produce differential outcomes across demographic groups. The Karasaki team proposes practices for reducing algorithmic bias in water quality applications. They find that model design choices—including which variables to include and how to weight training data—can affect which communities receive accurate predictions. The Karasaki team suggests that modeling choice transparency and bias-vetting practices may be important when applying machine learning to questions of environmental science and justice.

The Saturday Seminar is a weekly feature that aims to put into written form the kind of content that would be conveyed in a live seminar involving regulatory experts. Each week, The Regulatory Review publishes a brief overview of a selected regulatory topic and then distills recent research and scholarly writing on that topic.