Executive Summary
Risk management for frontier AI systems is a nascent science. It has so far focused on “if-then scenarios”, where an evaluation result pointing to a certain level of a dangerous capability triggers a certain set of mitigations. This approach has several limitations. First, because risk itself is not measured, we cannot know by how much the mitigations reduce risk, nor can we justify whether the mitigations are sufficient. Second, it treats capabilities measured by standardized benchmarks in isolation, ignoring interactions between capabilities that can affect real-world risk as well as important factors related to the precise path to harm.
The role of risk modeling is to bridge the gap between the source of risk (e.g., dangerous capabilities or propensities, deployment conditions or affordances) and the actual harm, enabling systematic analysis of the risk. This technical report applies the methodology we have developed for quantitative modeling of AI-enabled risks to the domain of cyber offense. We provide a road map for implementing the methodology as well as tentative results from applying it to nine risk models. As a common criticism of quantitative risk assessment is its lack of scalability, we also experiment with the use of LLM-simulated experts to provide estimates, in addition to our human expert Delphi study.

Comprehensive and systematic risk modeling can provide numerous benefits to different groups of AI risk stakeholders:
- Cybersecurity defenders and vendors can leverage insights on where AI uplift is the highest to prioritize their mitigation efforts.
- In the AI evaluation and benchmark community, evaluators can leverage the insights to see where new benchmarks would reduce the greatest uncertainty in risk estimates and hence where they should focus their efforts.
- In AI companies, decision makers can use the more precise and forward-looking data to make more informed development and deployment decisions.
- Regulators and policymakers can gain more foresight into where AI risk is heading and begin to determine expected harm in order to define risk thresholds.
In other high-risk industries, such as nuclear power and aviation, these types of benefits drove a shift over time from qualitative to quantitative risk assessment. To prompt a step in that direction for AI risk management, we present this methodology and an initial attempt at applying it. While the resulting estimates necessarily carry significant uncertainty, we hope that publishing specific numbers will enable experts to pinpoint exactly where they disagree and collectively refine estimates, something that cannot be done with qualitative assessments alone.
Methodology (Section 2)
Our risk modeling methodology consists of six interlinked steps:
- Selecting risk scenarios. We systematically decompose the risk universe into a set of representative scenarios.
- Constructing risk scenarios. We build risk models for each scenario. These comprise four types of risk factors: the number of actors, the frequency with which attacks are launched, the probability of the attack succeeding, and the harm that would arise as a result. The steps in an attack are modeled using the MITRE ATT&CK framework.
- Quantifying “baseline” risk. We establish estimates for the “baseline” case (negligible or non-existent use of AI) to create a reference point, drawing on cyber threat intelligence data, historical case studies, and expert review. This baseline is captured as a Bayesian network.
- Determining key risk indicators (KRIs) for AI “uplift”. We establish which forms of KRIs, such as benchmark performance, can serve as evidence to infer values for uplifted risk factors. This technical report uses Cybench and BountyBench as examples.
- Estimating AI uplift. We build a quantitative mapping between the KRIs and the risk factors in the risk model and use it to generate estimates for the risk factors. We conduct a Delphi study with cybersecurity experts for one risk scenario, and we experiment with the use of “LLM estimators” to generate estimates at scale. Experts provide confidence intervals around their estimates.
- Aggregating and propagating estimates among experts and across risk factors. We fit the individual estimates to appropriate distributions and propagate them using Monte Carlo simulations to arrive at an overall risk distribution for the scenario (see the sketch below this list).
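To make the aggregation and propagation step concrete, below is a minimal sketch under illustrative assumptions: probability-type risk factors are drawn from Beta distributions, quantity-type factors from lognormal distributions, and the four risk factor types are combined multiplicatively into an annual risk figure. All parameter values are placeholders rather than estimates from our risk models.

```python
# A minimal sketch of the Monte Carlo propagation step, under illustrative
# assumptions: Beta distributions for probability-type factors, lognormal
# distributions for quantity-type factors, and a multiplicative combination
# of the four factor types. Parameter values are placeholders only.

import numpy as np

rng = np.random.default_rng(seed=0)
N = 100_000  # number of Monte Carlo samples


def quantity(median, sigma):
    """Lognormal samples for quantity-type factors (actors, attempts, harm)."""
    return rng.lognormal(mean=np.log(median), sigma=sigma, size=N)


def probability(alpha, beta):
    """Beta samples for probability-type factors, bounded in [0, 1]."""
    return rng.beta(alpha, beta, size=N)


scenarios = {
    # Placeholder parameters for a single risk scenario.
    "baseline": dict(
        actors=quantity(200, 0.5),      # number of actors
        attempts=quantity(3, 0.6),      # attempts per actor per year
        p_success=probability(2, 18),   # probability an attempt succeeds
        harm_usd=quantity(5e5, 1.0),    # harm per successful attack (USD)
    ),
    "ai_uplifted": dict(
        actors=quantity(300, 0.6),
        attempts=quantity(5, 0.7),
        p_success=probability(3, 12),
        harm_usd=quantity(8e5, 1.1),
    ),
}

for name, f in scenarios.items():
    # Total annual risk = actors * attempts/actor/year * P(success) * harm.
    total = f["actors"] * f["attempts"] * f["p_success"] * f["harm_usd"]
    p5, p50, p95 = np.percentile(total, [5, 50, 95])
    print(f"{name}: median ${p50:,.0f} per year (90% interval ${p5:,.0f} - ${p95:,.0f})")
```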
Throughout this methodology, we rely extensively on cybersecurity experts. We iterate multiple times with four cybersecurity experts with complementary backgrounds to develop and refine the list of nine selected risk scenarios. Each baseline risk model is reviewed by one expert with relevant domain expertise, who validates the parameter values and suggests corrections where appropriate. Nine cybersecurity experts participate in the Delphi study for uplift estimation, and one expert reviews all of the uplift values produced by the LLM estimators to identify implausible estimates.
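As an illustration of how an LLM estimator might be queried for a single risk factor, the sketch below shows one possible prompt-and-parse approach. Here, `call_llm` is a hypothetical stand-in for whichever model API is used, and the prompt wording and JSON schema are illustrative assumptions rather than our actual elicitation protocol.

```python
# A minimal sketch of querying an LLM-simulated expert for one risk factor.
# `call_llm` is a hypothetical stand-in for a real model API; the prompt and
# JSON schema are illustrative assumptions, not the report's actual protocol.

import json


def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around a chat-completion API (not implemented here)."""
    raise NotImplementedError


def estimate_uplift(scenario: str, risk_factor: str, kri_evidence: str) -> dict:
    """Ask the LLM estimator for a median and 90% interval of the uplift multiplier."""
    prompt = (
        "You are acting as a cybersecurity risk analyst.\n"
        f"Risk scenario: {scenario}\n"
        f"Risk factor to estimate: {risk_factor}\n"
        f"Key risk indicator (benchmark) evidence: {kri_evidence}\n"
        "Estimate the uplift multiplier for this risk factor relative to the "
        "no-AI baseline. Respond with JSON only, using the keys "
        '"lower", "median", and "upper" for a 90% confidence interval.'
    )
    return json.loads(call_llm(prompt))


# Example call (requires a real `call_llm` implementation):
# estimate_uplift(
#     scenario="ransomware attack on a mid-size enterprise",
#     risk_factor="probability of success of the Initial Access step",
#     kri_evidence="the model solves 40% of Cybench tasks at this difficulty tier",
# )
```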
Results from Delphi Processes (Section 3)
In the modified Delphi study we conducted, nine cybersecurity experts provided two rounds of estimates of risk factors for one risk model, with a facilitated discussion in between to work through points of contention. We find that experts vary widely in how confident they are in assessing their uncertainty. Further, uplift estimates for risk factors associated with quantities (number of actors, number of attempts/actor/year, impact) exhibit much greater variance than those associated with probabilities, which are bounded by [0,1]. It is also noteworthy that the uplift variance increases as the corresponding benchmark task gets more difficult.
We also experiment with LLM-simulated expert estimators. Their estimates of probability risk factors closely follow those of humans. However, for quantity risk factors, there is more disagreement. LLM estimators are often more conservative, providing significantly lower predictions of uplift than human experts. LLM estimators predict a lower total risk than their human counterparts, with the deviation from human estimates increasing as task difficulty grows. LLMs also demonstrate lower uncertainty than humans, especially at higher AI capability levels.
Results from the Quantitative Evaluation (Section 4)
In Section 4, we provide the tentative quantitative results of our nine risk models in order to demonstrate the many use cases of risk modeling and to invite scrutiny, debate, and criticism of specific values so that we can iteratively work toward more exact estimates. Given the nascency of the science of AI risk modeling and the limitations of our methodology, we do not recommend using the exact numbers for decision-making at this time. We provide detailed results of our early comparative findings (intra- and inter-model) as a proof of concept for the potential value of quantitative risk modeling. Interesting results indicated by the models include:
- For seven out of nine scenarios, the models indicate that state-of-the-art (SOTA) AI systems (at the time of conducting experiments) provide uplift relative to the baseline, i.e., the estimated total risk is higher when malicious actors use AI at current capability levels.
- At “saturation”, i.e., when AI can reliably perform all tasks in the benchmarks we use, the models indicate that the total risk estimates are in turn significantly higher than at the SOTA level.
- Across risk models, the results do not suggest a uniform pattern (i.e., AI is not consistently helping low-level or high-level attackers more).
- None of the four risk factors (number of actors, number of attempts, probability of success, and damage per attack) is suggested to play the key role in uplift; rather, all contribute to the increase in risk across different scenarios, i.e., AI helps with both “quantity” and “quality”.
- The models suggest that AI provides more uplift for three MITRE ATT&CK tactics (Execution, Impact, and Initial Access) relative to the other eleven. However, there is significant variability in uplift within each factor.
Limitations and Future Work (Section 5)
This is, to our knowledge, one of the first attempts at building a systematic procedure for quantitative modeling of cybersecurity risks arising from AI misuse. Therefore, we acknowledge a number of limitations with our methodology and discuss them at length in Section 5 to guide future work.