A New Paradigm for Fueling AI for the Public Good

Data donors can empower the use of artificial intelligence for socially desirable outcomes.

Imagine receiving this email in the near future: “Thank you for sharing data with the American Data Collective on May 22, 2025. After first sharing your workout data with SprintAI, a local startup focused on designing shoes for differently abled athletes, your data donation was also sent to an artificial intelligence research cluster hosted by a regional university. Your donation is on its way to accelerate artificial intelligence innovation and support researchers and innovators addressing pressing public needs!”

That is exactly the sort of message you could expect to receive if we made donations of personal data akin to blood donations—a pro-social behavior that may not immediately serve a donor’s individual needs but may nevertheless benefit the whole of the community. This vision of a future where data flow toward the public good is not science fiction—it is a tangible possibility if we address a critical bottleneck faced by innovators today.

Creating the data equivalent of blood banks may not seem like a pressing need, or like something people should voluntarily contribute to, given widespread concerns about a few large artificial intelligence (AI) companies using data for profit-driven and, arguably, socially harmful ends. But this narrow conception of the AI ecosystem overlooks the hundreds of AI research initiatives and startups with a desperate need for high-quality data. I was fortunate to meet leaders of several such nascent AI efforts at Meta’s Open Source AI Summit in Austin, Texas. For example, I met Matt Schwartz, who leads a startup that uses AI to glean more diagnostic information from colonoscopies. I connected with Edward Chang, a professor of neurological surgery at the University of California, San Francisco Weill Institute for Neurosciences, who relies on AI tools to discover new information about how and why our brains work. And I got to know Corin Wagen, whose startup is helping companies “find better molecules faster.” This is a small sample of the people leveraging AI for objectively good outcomes. They need your help. More specifically, they need your data.

A tragic irony shapes our current data infrastructure. Most of us share mountains of data with massive and profitable private parties—smartwatch companies, diet apps, game developers, and social media companies. Yet the AI labs, academic researchers, and public interest organizations best positioned to leverage our data for the common good often face the most formidable barriers to acquiring data of the necessary quantity, quality, and diversity. Unlike OpenAI, they are not going to use bots to scrape the internet for data. Unlike Google and Meta, they cannot rely on their own social media platforms and search engines to act as perpetual data generators. And unlike Anthropic, they lack the funds to license data from media outlets. So while commercial entities amass vast datasets, frequently as a byproduct of consumer services and proprietary data acquisition strategies, mission-driven AI initiatives dedicated to public problems find themselves in a state of chronic data scarcity. This is not merely a hurdle—it is a systemic bottleneck that chokes off innovation where society needs it most, delaying or even preventing the development of AI tools that could significantly improve lives.

Individuals are, quite rightly, increasingly hesitant to share their personal information; concerns about privacy, security, and potential misuse are both rampant and frequently justified by past breaches and opaque practices. Yet, in a striking contradiction, troves of deeply personal data are continuously siphoned off by app developers, tech platforms, and an extensive network of data brokers—often with minimal transparency and without informed consent concerning the full lifecycle and downstream uses of that data. The opacity extends to how algorithms trained on this data make decisions that can affect individuals’ lives—from loan applications to job prospects—often without clear avenues for recourse or understanding, potentially perpetuating societal biases embedded in historical data.

Consider the case of OMNY Health, a Georgia-based health data platform. It has impressively compiled a dataset of around 85 million de-identified patient records and four billion clinical notes spanning 200 specialties. This rich repository is made available through a secure platform, enabling users to tap into a diverse and detailed array of clinical information. Researchers and health tech firms already rely on this data to train AI models to achieve several beneficial ends, such as improved rates of disease prediction and greater diversity and representation within clinical trials. These valuable advancements demonstrate the power of large-scale data aggregation.

The OMNY Health example also illuminates a critical limitation of the current paradigm: This powerful dataset, although de-identified, remains proprietary. Access is primarily available to paying customers. Even though this model serves some research interests, it inherently excludes the broader universe of nonprofit researchers, public health agencies, and smaller, mission-driven startups that could leverage such data for public-interest initiatives but lack the substantial financial resources required for access. The current model, therefore, primarily benefits private actors and those who can afford to participate in the data marketplace, inadvertently starving many AI applications that could yield profound societal benefits if data were more equitably accessible and shared under frameworks designed for broader societal gain.

To unlock AI’s full potential for widespread societal benefit, we urgently require not only new technological frameworks but also a fundamental shift in our cultural norms surrounding data sharing. We must cultivate a norm in which the act of sharing specific, consented-upon data for public good initiatives is reframed—from an inherent liability or a mere transaction into a powerful, pro-social contribution, akin to how society views and values blood donations. Data donation is about fostering a sense of collective ownership and responsibility for the data that can drive solutions to our shared problems.

Imagine a robust, ethically grounded system of “Data Donors.” Individuals could proactively consent to share specific, clearly defined fields of their anonymized or pseudonymized data—such as mobility patterns for urban planning, de-identified genomic sequences for medical research, or aggregated purchasing habits to help policymakers understand economic trends—with rigorously vetted startups, academic institutions, and nonprofit organizations. These entities would be vetted not just for their technical capabilities but also for their ethical frameworks, data security practices, and demonstrable commitment to public good outcomes. The critical differentiator would be the transparent commitment of these recipient entities to developing AI products and ongoing research aimed squarely at addressing public problems. Picture a scenario where a nonprofit AI lab, armed with aggregated health metrics from data donors, develops breakthrough diagnostic tools for rare diseases that might otherwise be neglected by commercial interests. Or consider an academic consortium building sophisticated models to predict and mitigate the localized impacts of climate change, fueled by anonymized energy consumption patterns and environmental sensor data voluntarily shared by citizens who are directly invested in the outcomes.

The anonymized data from your smartwatch’s fitness app, for example, could, with your explicit and ongoing permission, flow not just to the commercial developers of the app for product improvement but also to a trusted, independent, nonprofit AI research institute. Such an institute, in turn, could work on creating novel preventative cardiovascular health interventions or on understanding population-level activity trends to inform public health campaigns.

How can such a system be fostered and sustained? Beyond appealing to altruism, which is a powerful motivator for many, individuals could receive tangible benefits, such as discounted or even free access to the AI-driven public good tools they help create. More innovative models might explore granting data donors micro-stakes, data dividends, or other forms of recognition and benefit from the entities that successfully train on their data and deploy impactful solutions. This would acknowledge the value of data donors’ contributions in a more concrete way.

The viability and ethical integrity of any proposed Data Donor system hinge entirely on a bedrock of trust, transparency, and robust, independent governance. That means easily understandable ethical guidelines co-developed with public input, state-of-the-art security protocols that go beyond mere compliance, auditable data handling practices, and dynamic, user-centric mechanisms for ongoing consent management and effortless data withdrawal. Governance might involve multi-stakeholder oversight bodies composed of ethicists, legal experts specializing in data rights, citizen representatives, and technical specialists, supplemented by transparent auditing processes and clear, accessible mechanisms for redress should data be misused or ethical boundaries crossed.

The message from innovators on the front lines, including participants at events such as the Open Source AI Summit and researchers in university labs worldwide, is unequivocal: The development of public-oriented AI tools at scale is severely hampered by a lack of access to high-quality, diverse, and representative datasets. The Data Donor model offers a concrete, ethical, and consensual pathway to unlock this vital resource. It recognizes that our collective data, when shared responsibly and with clear public purpose, can be transformed from a commodity passively extracted and exploited into a powerful, renewable engine for societal progress, one capable of accelerating discoveries and deploying solutions at a previously unimaginable scale and of fostering a more equitable and sustainable future. It is time to consciously architect an alternative to a system in which the primary value of data is commercial extraction, and instead build a future where our data directly and democratically fuels solutions to humanity’s most pressing challenges. Championing this evolution in our relationship with data and AI is a critical task for our time, demanding bold vision and collaborative action.

Kevin T. Frazier

Kevin T. Frazier is the AI innovation and law fellow at Texas Law.