Defining acceptable and unacceptable risk levels is a necessary step in managing risk. Our risk management framework (Campos et al., 2025) rests on two concepts: risk tolerance (the maximum acceptable risk level, which should never be exceeded) and key risk indicator thresholds (thresholds on measurable signals, such as model evaluations, that trigger specific mitigations to keep risks below the risk tolerance). Within this framework, multiple key risk indicator thresholds can be defined for a single risk tolerance level. The third draft of the EU General Purpose AI (GPAI) Code of Practice requires the definition of Systemic Risk Tiers that serve multiple functions, including specifying which mitigations to implement under different conditions (third draft Measure II.1.2(List 2)(1)) and establishing acceptable risk levels (third draft Measure II.1.2(List 1)(2)(c)). This memo provides guidance on defining and operationalizing these risk tiers. It expands on parts of the feedback we provided on the second and third drafts of the Code of Practice.
To inform the drafting of the Code, we proposed a hierarchy of approaches for defining risk tiers that organizes harm levels, risk scenario levels, and capability levels for general-purpose AI models in a structured, ordered way. Risk tiers can in principle be defined by regulators and/or providers; ideally, they should be the same for all models. For each risk, multiple tiers may be defined, but at minimum one tier must be designated as "unacceptable". This unacceptable tier effectively establishes the risk tolerance.
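As an illustration, the tiered structure can be sketched as an ordered set of levels in which the lower bound of the "unacceptable" tier acts as the risk tolerance. The tier names, the probability metric, and all numbers below are invented for illustration and are not taken from the Code or from any provider's policy:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RiskTier:
    name: str
    lower_bound: float  # hypothetical metric: estimated probability of the harm scenario

# Hypothetical tiers for a single risk, ordered from least to most severe.
TIERS = [
    RiskTier("low", 0.0),
    RiskTier("elevated", 1e-4),
    RiskTier("unacceptable", 1e-2),  # this bound plays the role of the risk tolerance
]

# The unacceptable tier effectively establishes the risk tolerance.
RISK_TOLERANCE = TIERS[-1].lower_bound

def classify(risk_estimate: float) -> RiskTier:
    """Return the most severe tier whose lower bound the estimate reaches."""
    current = TIERS[0]
    for tier in TIERS:
        if risk_estimate >= tier.lower_bound:
            current = tier
    return current
```

The point of the sketch is structural: however the tiers are measured, they form an ordered scale, and designating one tier as unacceptable fixes the tolerance that all key risk indicator thresholds must keep the model below.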
We think risk tiers should be defined using the following three approaches, in the order of preference listed. Developers should move to the next approach only after demonstrating that the previous one is not feasible for the specific risk being assessed.
Steps:
This is the preferred approach when potential harm can be directly estimated, because harm is ultimately what we aim to mitigate: it represents the actual negative impacts we care about preventing. Harm-based tiers are also the most useful for governance, because they allow societal discourse on unacceptable levels of risk and their standardisation across providers.
This approach is already used in various industries, as well as in EU guidelines and regulations such as:
Harm is difficult to measure in two cases: when it is hard to characterize with a metric, and when the potential risk scenarios are unclear at present. The harm-based approach is therefore sometimes infeasible, in which case the scenario-based approach should be tried instead.
Steps:
This approach should be used only when direct harm is hard to measure or foresee. A similar approach (though without probabilities) is currently used in the risk management policies of most developers of advanced AI systems. In practice, the approach of many developers falls between scenario-based tiers and source-based tiers: the thresholds are characterized less in terms of real-world harms than we would recommend, and more in terms of what the model's capabilities enable.
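To make the probabilistic reading of scenario-based tiers concrete, one could attach an estimated probability and an estimated harm to each risk scenario and compare the aggregate against the tier bounds. The scenarios, probabilities, and harm figures below are entirely hypothetical:

```python
# Hypothetical scenario-based estimate: each risk scenario contributes
# probability x harm, and the sum is what gets compared to tier bounds.
scenarios = {
    # name: (estimated annual probability, estimated harm in fatalities)
    "scenario_a": (1e-3, 100),
    "scenario_b": (1e-5, 10_000),
}

# Aggregate expected harm across scenarios:
# 1e-3 * 100 + 1e-5 * 10_000 = 0.1 + 0.1 = 0.2 expected fatalities/year
expected_harm = sum(p * h for p, h in scenarios.values())
```

The arithmetic illustrates why probabilities matter: a low-probability, high-severity scenario can contribute as much expected harm as a higher-probability, lower-severity one, which a purely qualitative tiering would miss.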
Steps:
This approach should be used only when the harm-based and scenario-based approaches are infeasible. While it is the easiest to measure and implement, it is also the furthest removed from actual harm and should therefore be treated as a last resort.