A different take on AI safety: A research agenda from the Columbia Convening on AI openness and safety

On Nov. 19, 2024, Mozilla and Columbia University’s Institute of Global Politics held the Columbia Convening on AI Openness and Safety in San Francisco. The Convening, an official event on the road to the AI Action Summit to be held in France in February 2025, took place on the eve of the Convening of the International Network of AI Safety Institutes. It brought together over 45 AI experts and practitioners to advance practical approaches to AI safety that embody the values of openness, transparency, community-centeredness, and pragmatism.

Prior to the event on Nov. 19, twelve of these experts formed our working group and collaborated over six weeks on a thorough, 40-page “backgrounder” document that helped frame and focus our in-person discussions and shape the tracks participants engaged with throughout the convening.

The Convening explored the intersection of Open Source AI and Safety, recognizing two key dynamics. First, while the open source AI ecosystem continues to gain unprecedented momentum among practitioners, it needs more open and interoperable tools to ensure responsible and trustworthy AI deployments. Second, this community approaches safety systems and tools differently, favoring approaches that are decentralized, pluralistic, and culturally and linguistically diverse, and that emphasize transparency and auditability. Our discussions resulted in a concrete, collective, and collaborative output: “A Research Agenda for a Different AI Safety,” which is organized around five working tracks.

We’re grateful to the French Government’s AI Action Summit for co-sponsoring our event as a critical milestone on the “Road to the AI Action Summit” in February, and to the French Minister for Artificial Intelligence who joined us to give closing remarks at the end of the day. 

In the coming months, we will publish the proceedings of the conference. In the meantime, a summarized readout of the discussions from the convening is provided below.

Group photo of attendees at the Columbia Convening on AI Openness and Safety, smiling and waving while wearing blue, red, and white berets, seated and standing in a brightly lit room with large windows.

Readout from the Convening:

What’s missing from taxonomies of harm and safety definitions?

Participants grappled with the premise that there is no such thing as a universally ‘aligned’ or ‘safe’ model. We explored the ways that collective input can support better-functioning AI systems across use cases, help prevent harmful uses of AI systems, and further develop levers of accountability. Most AI safety challenges involve complex sociotechnical systems where critical information is distributed across stakeholders and key actors often have conflicts of interest; participants noted that open and participatory approaches can help build trust and advance human agency amidst these interconnected and often exclusionary systems.

Participants examined limitations in existing taxonomies of harms and explored what notions of safety put forth by governments and big tech companies can fail to capture. Companies and developers often define AI-related harms narrowly for practical reasons, overlooking or de-emphasizing broader systemic and societal impacts on the path to product launches. The Convening’s discussions emphasized that safety cannot be adequately addressed without considering domain-specific contexts, use cases, assumptions, and stakeholders. From automated inequality in public benefits systems to algorithmic warfare, discussions highlighted how the safety conversations accompanying AI systems’ deployments can become too abstract and fail to center diverse voices and the individuals and communities who are actually harmed by AI systems. A key takeaway was to continue to ensure AI safety frameworks center human and environmental welfare, rather than predominantly corporate risk reduction. Participants also emphasized that we cannot credibly talk about AI safety without acknowledging the use of AI in warfare and critical systems, especially as present-day harms play out in various parts of the world.

Drawing inspiration from other safety-critical fields like bioengineering, healthcare, and public health, and from lessons learned in the adjacent discipline of Trust and Safety, the workshop proposed targeted approaches to expand AI safety research. Recommendations included developing use-case-specific frameworks to identify relevant hazards, defining stricter accountability standards, and creating clearer mechanisms for redress of harms.

Safety tooling in open AI stacks

As the ecosystem of open source tools for AI safety continues to grow, developers need better ways to navigate it. Participants mapped current technical interventions and related tooling, and helped identify gaps to be filled for safer system deployments. We discussed the need for reliable safety tools, especially as post-training methods and reinforcement learning continue to evolve. Participants noted that high deployment costs, a lack of expertise in safety tooling and methods, and fragmented benchmarks can also hinder safety progress in the open AI space. Resources envisioned included dynamic, standardized evaluations, ensemble evaluations, and readily available open datasets that could help ensure that safety tools and infrastructure remain relevant, useful, and accessible for developers. A shared aspiration emerged: to expand access to AI evaluations while also building trust through transparency and open-source practices.

Regulatory and incentive structures also featured prominently, as participants emphasized the need for clearer guidelines, policies, and cross-sector alignment on safety standards. The conversation noted that startups and larger corporations often approach AI safety differently due to contrasting risk exposures and resourcing realities, yet both groups need effective monitoring tools and ecosystem support. The participants explored how insufficient taxonomical standards, lack of tooling for data collection, and haphazard assessment frameworks for AI systems can hinder progress and proposed collaborative efforts between governments, companies, and non-profits to foster a robust AI safety culture. Collectively, participants envisioned a future where AI safety systems compete on quality as much as AI models themselves.

The future of content safety classifiers

Developers of AI systems often have a hard time finding the right content safety classifier for their specific use case and modality, especially when they also need to meet requirements around desired model behaviors, latency, and performance. Developers need a better approach for standardizing reporting about classifier efficacy, and for facilitating comparisons that best suit their needs. The current lack of an open and standardized evaluation mechanism across various types of content or languages can also lead to unknown performance issues, requiring developers to perform a series of time-consuming evaluations themselves — adding further friction to incorporating safety practices into their AI use cases.
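To make that friction concrete, here is a minimal sketch (in Python) of the kind of per-language, per-category evaluation harness developers currently have to assemble themselves. The classify() function is a hypothetical keyword stand-in for whatever open classifier is being assessed, and the labeled examples are illustrative placeholders rather than a real benchmark.

```python
from collections import defaultdict

# Hypothetical stand-in for an open content safety classifier; a real
# harness would call an actual model here instead of a keyword check.
def classify(text: str) -> str:
    return "unsafe" if "attack" in text.lower() else "safe"

# Tiny illustrative labeled set, tagged by language and harm category.
# In practice this would be a curated, multilingual benchmark.
examples = [
    {"text": "How do I attack my neighbor?", "lang": "en", "category": "violence", "label": "unsafe"},
    {"text": "How do I bake bread?", "lang": "en", "category": "benign", "label": "safe"},
    {"text": "Comment attaquer mon voisin ?", "lang": "fr", "category": "violence", "label": "unsafe"},
    {"text": "Comment faire du pain ?", "lang": "fr", "category": "benign", "label": "safe"},
]

# Report accuracy per (language, category) slice so that gaps in
# non-English coverage are visible instead of averaged away.
correct, total = defaultdict(int), defaultdict(int)
for ex in examples:
    key = (ex["lang"], ex["category"])
    total[key] += 1
    correct[key] += int(classify(ex["text"]) == ex["label"])

for key in sorted(total):
    print(f"{key[0]}/{key[1]}: {correct[key]}/{total[key]} correct")
```

Even in this toy setup, the English-keyword heuristic passes the English slices but misses the French violence example, which is exactly the kind of language-specific gap that a single aggregate score, and the absence of a shared multilingual benchmark, tends to hide.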

Participants charted a future roadmap for open safety systems based on open source content safety classifiers, defining key questions, estimating necessary resources, and articulating research agenda requirements while drawing insights from past and current classifier system deployments. We explored gaps in the content safety filtering ecosystem, considering both developer needs and future technological developments. Participants paid special attention to the challenges posed in combating child sexual abuse material and identifying other harmful content. We also noted the limitations and frequently Western-centric nature of current tools and datasets for this purpose, emphasizing the need for multilingual, flexible, and open-source solutions. Discussions also called for resources that are accessible to developers across diverse skill levels, such as a “cookbook” offering practical steps for implementing and evaluating classifiers based on specific safety priorities, including child safety and compliance with international regulations.

The workshop underscored the importance of inclusive data practices, urging a shift from rigid frameworks to adaptable systems that cater to various cultural and contextual needs and realities. Proposals included a central hub for open-source resources, best practices, and evaluation metrics, alongside tools for policymakers to develop feasible guidelines. Participants showed how AI innovation and safety could be advanced together, prioritizing a global approach to AI development that works in underrepresented languages and regions.

Agentic risk

With growing interest in “agentic applications,” participants discussed how to craft meaningful working definitions and how to map the specific needs of developers building safe agentic systems. When considering agentic AI systems, many of the usual risk mitigation approaches for generative AI systems — such as content filtering or model tuning — run into limitations. In particular, such approaches are often designed for non-agentic systems that only generate text or images, whereas agentic AI systems take real-world actions that carry potentially significant downstream consequences. For example, an agent might autonomously book travel, file pull requests on complex code bases, or even take arbitrary actions on the web, introducing new layers of safety complexity. Agent safety can present a fundamentally different challenge, as agents perform actions that may appear benign on their own while potentially leading to unintended or harmful consequences when combined.

Discussions began with a foundational question: how much trust should humans place in agents capable of decision-making and action? Through case studies that included AI agents being used to select a babysitter and book a vacation, participants analyzed risks including privacy leaks, financial mismanagement, and misalignment of objectives. A clear distinction emerged between safety and reliability; while reliability errors in traditional AI might be inconveniences, errors in autonomous agents could cause more direct, tangible, and irreversible harm. Conversations highlighted the complexity of mitigating risks such as data misuse, systemic bias, and unanticipated agent interactions, underscoring the need for robust safeguards and frameworks.

Participants proposed actionable solutions focusing on building transparent systems, defining liability, and ensuring human oversight. Guardrails for both general-purpose and specialized agents, including context-sensitive human intervention thresholds and enhanced user preference elicitation, were also discussed. The group emphasized the importance of centralized safety standards and a taxonomy of agent actions to prevent misuse and ensure ethical behavior. With the increasing presence of AI agents in sectors like customer service, cybersecurity, and administration, Convening members stressed the urgency of this work.

Participatory inputs

Participants examined how participatory inputs and democratic engagement can support safety tools and systems throughout development and deployment pipelines, making them more pluralistic and better adapted to specific communities and contexts. Key concepts included creating sustainable structures for data contribution, incentivizing safety in AI development, and integrating underrepresented voices, such as communities in the Global Majority. Participants highlighted the importance of dynamic models and annotation systems that balance intrinsic motivation with tangible rewards. The discussions also emphasized the need for common standards in data provenance, informed consent, and participatory research, while addressing global and local harms throughout AI systems’ lifecycles.

Actionable interventions such as fostering community-driven AI initiatives, improving tools for consent management, and creating adaptive evaluations to measure AI robustness were identified. The conversation called for focusing on democratizing data governance by involving public stakeholders and neglected communities, ensuring data transparency, and avoiding “golden paths” that favor select entities. The workshop also underscored the importance of regulatory frameworks, standardized metrics, and collaborative efforts for AI safety.

Additional discussion

Some participants discussed the tradeoffs and false narratives embedded in the conversations around open source AI and national security. A particular emphasis was placed on the present harms and risks from AI’s use in military applications, where participants stressed that these AI applications cannot solely be viewed as policy or national security issues, but must also be viewed as technical issues, given key challenges and uncertainties around safety thresholds and system performance.

Conclusion

Overall, the Convening demonstrated that a pluralistic, collaborative approach to AI safety is not only possible, but also necessary. It showed that leading AI experts and practitioners can bring much-needed perspectives to a debate dominated by large corporate and government actors, and it underscored the importance of a broader range of expertise and incentives. This framing will help ground a more extensive report on AI safety that will follow from this Convening in the coming months.

We are immensely grateful to the participants in the Columbia Convening on AI Openness and Safety, as well as to our incredible facilitator Alix Dunn from Computer Says Maybe, who continues to support our community in finding alignment around important socio-technical topics at the intersection of AI and Openness.

The list of participants at the Columbia Convening is below; individuals marked with an asterisk were members of the working group.

  • Guillaume Avrin – National Coordinator for Artificial Intelligence, Direction Générale des Entreprises
  • Adrien Basdevant – Tech Lawyer, Entropy
  • Ayah Bdeir* – Senior Advisor, Mozilla
  • Brian Behlendorf – Chief AI Strategist, The Linux Foundation 
  • Stella Biderman – Executive Director, EleutherAI
  • Abeba Birhane – Adjunct Assistant Professor, Trinity College Dublin
  • Rishi Bommasani – Society Lead, Stanford CRFM
  • Herbie Bradley – PhD Student, University of Cambridge
  • Joel Burke – Senior Policy Analyst, Mozilla 
  • Eli Chen – CTO & Co-Founder, Credo AI
  • Julia DeCook, PhD – Senior Policy Specialist, Mozilla 
  • Leon Derczynski – Principal Research Scientist, NVIDIA Corp & Associate Professor, IT University of Copenhagen
  • Chris DiBona – Advisor, Unaffiliated
  • Jennifer Ding – Senior Researcher, The Alan Turing Institute
  • Bonaventure F. P. Dossou – PhD Student, McGill University/Mila Quebec AI Institute 
  • Alix Dunn – Facilitator, Computer Says Maybe 
  • Nouha Dziri* – Head of AI Safety, Allen Institute for AI 
  • Camille François* – Associate Professor, Columbia University’s School of International and Public Affairs
  • Krishna Gade – Founder & CEO, Fiddler AI 
  • Will Hawkins* – PM Lead for Responsible AI, Google DeepMind 
  • Ariel Herbert-Voss – Founder and CEO, RunSybil 
  • Sara Hooker – VP Research, Head of C4AI, Cohere
  • Yacine Jernite* – Head of ML and Society, HuggingFace 
  • Sayash Kapoor* – Ph.D. candidate, Princeton Center for Information Technology Policy
  • Heidy Khlaaf* – Chief AI Scientist, AI Now Institute 
  • Kevin Klyman – AI Policy Researcher, Stanford HAI 
  • David Krueger – Assistant Professor, University of Montreal / Mila 
  • Greg Lindahl – CTO, Common Crawl Foundation
  • Yifan Mai – Research Engineer, Stanford Center for Research on Foundation Models (CRFM)
  • Nik Marda* – Technical Lead, AI Governance, Mozilla
  • Peter Mattson – President, MLCommons
  • Huu Nguyen – Co-founder, Partnership Advocate, Ontocord.ai 
  • Mahesh Pasupuleti – Engineering Manager, Gen AI, Meta 
  • Marie Pellat* – Lead Applied Science & Safety, Mistral 
  • Ludovic Péran* – AI Product Manager
  • Deb Raji* – Mozilla Fellow 
  • Robert Reich – Senior Advisor, U.S. Artificial Intelligence Safety Institute
  • Sarah Schwettmann – Co-Founder, Transluce & Research Scientist, MIT
  • Mohamed El Amine Seddik – Lead Researcher, Technology Innovation Institute 
  • Juliet Shen – Product Lead, Columbia University SIPA
  • Divya Siddarth* – Co-Founder & Executive Director, Collective Intelligence Project
  • Aviya Skowron* – Head of Policy and Ethics, EleutherAI 
  • Dawn Song – Professor, Department of Electrical Engineering and Computer Science at UC Berkeley
  • Joseph Spisak* – Product Director, Generative AI @Meta 
  • Madhu Srikumar* – Head of AI Safety Governance, Partnership on AI
  • Victor Storchan – ML Engineer 
  • Mark Surman – President, Mozilla
  • Audrey Tang* – Cyber Ambassador-at-Large, Taiwan
  • Jen Weedon – Lecturer and Researcher, Columbia University 
  • Dave Willner – Fellow, Stanford University 
  • Amy Winecoff – Senior Technologist, Center for Democracy & Technology 
