Amina Abdu & Abigail Jacobs

Significant attention has been devoted to the question of how best to govern artificial intelligence (AI). In addition to legislation, many policy proposals focus on extra-legal regulatory instruments. Notably, AI evaluations offer a particularly attractive solution, imposing seemingly neutral measurements across the diverse contexts in which AI operates. Because AI evaluations are driven by a wide range of actors, their adoption as a governance tool has the potential to shift power in AI policymaking, raising questions about whether such evaluations constitute legitimate policy interventions or merely clever marketing strategies. In particular, the companies that create AI are also often key players in designing and marketing AI evaluations. We collect and analyze a corpus of U.S. AI policy comments and proposals to explore how large technology companies and government actors conceptualize self-regulation by technology companies as a legitimate policy intervention. We note that AI evaluations are often described using the language of standards, another more established soft-law regulatory instrument. Drawing on the history of standards, we discuss how AI companies leverage the metaphor of standards to describe benchmarks and evaluations in order to legitimate corporate expertise. We then examine the implications of this metaphor, describing where it is useful in the context of AI and where it obscures important policy decisions.