Red Teaming in the Public Interest
/ By Andrew Martin
The report explores red-teaming for generative AI, analyzing how adversarial testing methods can assess risks, protect public interests, and guide governance.
On February 20, 2025, the report's findings were discussed, with a focus on red-teaming's role in genAI evaluation and governance.
As generative AI (genAI) systems grow more capable and widespread, regulators, technologists, and the public are pressing for safety practices that anticipate harms. An early and promising approach, adapted from cybersecurity and military domains, is red-teaming, in which designated teams use adversarial tactics to expose weaknesses. Based on 26 semi-structured interviews and participant observation at three public red-teaming events, and conducted as a collaboration between Data & Society and the AI Risk and Vulnerability Alliance (ARVA), Red-Teaming in the Public Interest analyzes how these methods are being tailored to assess genAI.
Red-teaming genAI raises methodological questions—how and when to red-team, who participates, and how findings are used—and conceptual ones: whose interests are protected; what counts as problematic model behavior and who defines it; and whether the public is secured or used. In this report, Ranjit Singh, Borhane Blili-Hamelin, Carol Anderson, Emnet Tafesse, Briana Vecchione, Beth Duckles, and Jacob Metcalf outline a vision for red-teaming in the public interest that goes beyond system-centric testing of already built systems to include the full range of public involvement in evaluating genAI harms.