Distinguishing Audits, Evals, and Red-Teaming

In conversations about AI safety, people often mention audits, red-teaming, and evaluations. The three are related, but they differ in important ways. In short, evaluations (evals) and red-teaming can be part of an auditing process, while a full audit looks at both the AI model and the organization. Here's how we think about the relationships between the three concepts.

Audits are sets of procedures applied to both an AI system and an organization to determine whether risk management is carried out correctly at all levels. An audit goes beyond model assessment: it checks things like the organization's cybersecurity and information security, whether its safety culture is adequate for dealing with dangerous systems, whether it has good risk management procedures (e.g. efficient incident reporting mechanisms), and whether red-teaming and evaluations are carried out effectively.

Red-teaming, in the context of an AI safety audit, is adversarial testing tailored to a specific model and aimed at finding its worst behaviors. Each model can fail in unique and unexpected ways, so the goal of red-teaming is to find those failures and report them to the organization before external adversaries find and exploit them.

Evaluations are sets of tests run on a model to estimate its propensity to produce output with certain qualities. Those tests often aim to be as quantitative as possible. The best evals for risk assessment should include red-teaming, but red-teaming can be very involved and customized to a specific model, so not all evals include it. The key thing to note is that the more standardized and scalable an eval is, the more likely it is to miss failure modes that a more customized red-teaming assessment would uncover.
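To make "estimating a propensity" concrete, here is a minimal sketch of what a quantitative eval might compute: the fraction of sampled outputs that a checker flags as having some quality. Everything in it is a hypothetical stand-in chosen for illustration: `estimate_propensity`, `toy_model`, `toy_flag`, and the prompt set are assumptions, not part of any particular eval suite.

```python
from typing import Callable, Sequence


def estimate_propensity(
    model: Callable[[str], str],
    prompts: Sequence[str],
    is_flagged: Callable[[str], bool],
    samples_per_prompt: int = 1,
) -> float:
    """Return the fraction of sampled outputs that the checker flags.

    `model` and `is_flagged` are hypothetical stand-ins for a real model
    call and a real output classifier.
    """
    total = 0
    flagged = 0
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            output = model(prompt)
            total += 1
            if is_flagged(output):
                flagged += 1
    if total == 0:
        raise ValueError("need at least one prompt and one sample per prompt")
    return flagged / total


# Toy stand-ins (assumptions for illustration only).
def toy_model(prompt: str) -> str:
    return "UNSAFE" if "exploit" in prompt else "ok"


def toy_flag(output: str) -> bool:
    return output == "UNSAFE"


prompts = ["summarize this", "write an exploit", "translate this", "say hello"]
rate = estimate_propensity(toy_model, prompts, toy_flag, samples_per_prompt=2)
print(f"flagged rate: {rate:.2f}")
```

A fixed prompt set like this is what makes an eval standardized and scalable, and it is also why it can miss failures: any behavior not triggered by those prompts never shows up in the rate, which is the gap a customized red-teaming effort is meant to cover.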
