Anthropic's Responsible Scaling Policy

The Responsible Scaling Policy (RSP) is a voluntary AI safety framework developed by Anthropic, an American AI company, for managing catastrophic risks from advanced artificial intelligence systems. First published on September 19, 2023, the RSP was the first such framework published by a frontier AI developer. The policy uses a graduated system of AI Safety Levels (ASLs), which pair progressively higher model capabilities with correspondingly stricter safety and security requirements. Anthropic drew the concept from the biosafety level (BSL) classification system used in the United States for biological research facilities.

The RSP has been revised several times, most recently with Version 3.0, which took effect on February 24, 2026. That revision was the largest structural change to date: it replaced the original commitment to halt training or deployment if safety measures were inadequate with a more conditional approach, and introduced new transparency mechanisms including Frontier Safety Roadmaps and periodic Risk Reports. As of late 2025, twelve companies in total (including Anthropic) had published frontier AI safety frameworks.

Background

Anthropic was founded in 2021 by former OpenAI employees, including Dario Amodei and Daniela Amodei, with a stated focus on AI safety research. The RSP was developed amid growing concern about the potential for increasingly capable AI systems to cause large-scale harm, whether through deliberate misuse (such as assistance in bioweapon development) or through AI systems acting autonomously in ways their designers did not intend.

The framework drew on a broader concept of responsible scaling developed by Paul Christiano and ARC Evals (later renamed METR). Anthropic credited ARC Evals with providing substantial advice, especially on evaluations for autonomous capabilities. The RSP was formally approved by Anthropic's board of directors, and subsequent changes require board approval following consultation with the company's Long-Term Benefit Trust (LTBT), a governance body designed to ensure that the company's decisions account for long-term societal benefit rather than short-term commercial pressures.

AI Safety Levels

The core mechanism of the RSP is the AI Safety Levels (ASL) framework. Each level defines a tier of model capability alongside a corresponding set of safety, security, and operational requirements. As capabilities increase, so do the safeguards required for training and deployment.

ASL-1

The lowest tier encompasses AI systems with no significant potential for catastrophic harm, such as narrow game-playing programs or early-generation language models with limited capabilities.

ASL-2

The second tier covers systems that exhibit preliminary signs of potentially dangerous knowledge, such as some ability to discuss topics related to weapons of mass destruction, but whose outputs do not meaningfully exceed what a person could find through publicly available resources such as textbooks or search engines. At the time the RSP was first published, Anthropic placed all of its Claude models in this category.

Deployment safeguards at this level include training models to decline harmful requests and monitoring tools to flag problematic inputs and outputs. Security measures focus on defending model weights against opportunistic theft.

ASL-3

The third tier addresses systems whose capabilities meaningfully exceed what is available from non-AI sources in ways that could contribute to catastrophic outcomes, or that exhibit basic autonomous behaviors. The original RSP required that models at this level not be deployed if they showed meaningful catastrophic misuse risk under adversarial testing by expert red-teamers.

In May 2025, Anthropic provisionally activated ASL-3 protections for Claude Opus 4, the first time the company had deployed a model under ASL-3 standards. The company stated that while it had not definitively determined the model had crossed the ASL-3 capability threshold, it could no longer confidently rule out that possibility. The ASL-3 deployment measures target the narrow risk of Claude being used to assist in creating or acquiring chemical, biological, radiological, and nuclear (CBRN) weapons, using techniques such as specialized classifiers, access controls, and bug bounty programs. The ASL-3 security measures strengthen protections against model weight theft, targeting the threat of sophisticated non-state attackers.

ASL-4 and beyond

The original RSP left higher tiers deliberately unspecified, noting that they were too distant from current systems to define concretely. Version 1.0 committed to defining ASL-4 evaluations before the company's models reached ASL-3. That commitment was later revised: the October 2024 update replaced the obligation to predefine future ASL levels with a more flexible approach (see Version history).

Under RSP v3.0, Anthropic moved away from specifying fixed ASL categories for future capability levels, instead using a system of capability thresholds mapped to required mitigations. The company acknowledged that some mitigations at higher levels would require industry-wide or government coordination and could not be achieved by any single company acting alone.

Version history

Anthropic has described the RSP as a "living document" intended to evolve alongside AI capabilities and the company's understanding of risks.

Version 1.0 (September 2023)

The initial version established the ASL framework and defined the ASL-2 and ASL-3 tiers in detail. Its central commitment was that Anthropic would temporarily halt training of more capable models if its safety measures had not kept pace. It also committed to defining ASL-4 evaluations before ASL-3 models were trained. ARC Evals (now METR) was credited as a key collaborator.

Version 2.0 (October 2024)

The first major revision changed how the ASL terminology was used: "ASL" now referred to sets of safeguards rather than to the models themselves, and the policy introduced the concepts of "Capability Thresholds" and "Required Safeguards." The autonomous replication and adaptation (ARA) threshold, which had previously triggered mandatory escalation to higher safety standards, was reclassified as a "checkpoint" that would prompt additional evaluation without automatically requiring new safeguards. The commitment to predefine future ASL levels was also removed. Co-founder and Chief Science Officer Jared Kaplan was named Anthropic's Responsible Scaling Officer.

Version 2.1 (March 2025)

This update introduced a new CBRN capability threshold addressing capabilities that could meaningfully assist moderately resourced state weapons programs. It also split the existing AI R&D thresholds into two levels: one for the ability to fully automate entry-level AI research, and another for capabilities that could dramatically accelerate the pace of effective AI scaling.

Version 2.2 (May 2025)

This minor update refined certain security exclusions under the ASL-3 standard.

Version 3.0 (February 2026)

Version 3.0 was the largest structural overhaul since the RSP's inception. The update was first reported by TIME on February 24, 2026. Key changes included:

  • Removal of the unilateral pause commitment: Previous versions committed Anthropic to halt development or deployment if its safety measures could not keep pace with model capabilities. Version 3.0 replaced this with a conditional framework under which Anthropic would delay development only if both (a) the company considered itself the frontrunner in AI development and (b) it judged the risk of catastrophe to be significant.
  • Separation of company-level and industry-wide commitments: The policy adopted a structure distinguishing mitigations Anthropic plans to implement regardless of competitors' actions from more ambitious measures it recommends the entire industry adopt. Anthropic stated it could not commit to the industry-wide measures on its own.
  • Frontier Safety Roadmaps: A new requirement to publish roadmaps describing concrete plans for risk mitigations across security, alignment, safeguards, and policy. These are publicly declared goals rather than binding commitments.
  • Risk Reports: A commitment to publish detailed risk assessments every three to six months, with the option for review by independent experts who would have access to minimally redacted versions.
  • Competitor-matching provision: A commitment to meet or exceed the risk reduction posture of comparable competitors, and to delay development or deployment until it can do so.

Capability thresholds (v3.0)

Version 3.0 organizes catastrophic risk around four capability thresholds, each corresponding to a distinct category of potential harm:

  1. Non-novel chemical and biological weapons: AI systems that could meaningfully assist individuals with basic technical training in creating, obtaining, or deploying chemical or biological weapons capable of catastrophic damage.
  2. State-program-level CBRN uplift: Capabilities that could substantially enhance the weapons development capabilities of moderately resourced state programs.
  3. Automation of entry-level AI research: The ability to fully automate the work of entry-level AI researchers.
  4. Dramatic acceleration of effective scaling: Capabilities that could cause a dramatic acceleration in the pace of effective AI scaling.

Each threshold maps to both company-level mitigations and industry-wide recommendations.

Governance and oversight

The RSP includes several procedural safeguards:

  • A designated Responsible Scaling Officer (currently Jared Kaplan) who oversees implementation of the policy.
  • Board approval required for changes to the policy, following consultation with the Long-Term Benefit Trust.
  • Internal review of procedural compliance on a regular basis.
  • Whistleblower protections for employees who report safety concerns, with explicit provisions that non-disparagement clauses do not preclude raising such concerns.
  • Under v3.0, external review of Risk Reports by independent experts chosen for AI safety expertise and absence of major conflicts of interest.

Frontier Compliance Framework

In December 2025, Anthropic published a separate document called the Frontier Compliance Framework (FCF), a mandatory compliance document distinct from the voluntary RSP. The FCF serves a dual regulatory purpose. In the United States, it functions as Anthropic's "Frontier AI Framework" under California's Transparency in Frontier Artificial Intelligence Act (SB-53, also referred to as TFAIA), which was signed into law on September 29, 2025, and took effect on January 1, 2026. In the European Union, it serves as a publicly available summary of Anthropic's Safety & Security Framework under the EU AI Act, after Anthropic Ireland Limited signed the General-Purpose AI Code of Practice. Anthropic's blog post announcing the FCF discussed only its role in SB-53 compliance and did not mention the EU dimension. SB-53 was the first U.S. statute focused specifically on safety requirements for frontier AI systems, requiring large frontier developers (those with annual revenues over $500 million) to publish annual frameworks explaining how they identify, assess, and mitigate catastrophic risks.

The FCF covers four systemic risk categories: cyber offense, CBRN threats, harmful manipulation (including influence operations and election interference), and sabotage and loss of control (including autonomous behavior that evades oversight). For each category, it defines tiered risk levels with descriptions of the relevant capabilities, though the specific mitigations for each tier are not fixed in advance and are instead to be determined when the relevant tier is reached. The FCF explicitly notes that it uses a lower threshold for "catastrophic risk" than the RSP: the FCF addresses statutory definitions of catastrophic harm (such as events causing more than 50 deaths or $1 billion in damages), while the RSP reserves the term for existential threats or fundamental destabilization of global systems. The framework was updated on March 2, 2026, revising risk tiers across all four categories and introducing new tiers for harmful manipulation.

Anthropic stated that the RSP would remain its voluntary safety policy going forward, and that it may go beyond or differ from what the FCF requires for regulatory compliance. Anthropic publicly endorsed SB-53 prior to its passage, arguing that the law would formalize transparency practices that responsible frontier developers were already following voluntarily. The company has also advocated for a comparable federal transparency framework, arguing that a national standard would avoid a patchwork of inconsistent state-level requirements.

The Midas Project, a nonprofit that tracks changes to AI safety policies, characterized the FCF's publication as a "major" change, arguing that the FCF was a substantially thinner document than the RSP, lacking the RSP's binding mitigations tied to specific capability thresholds, its delegation of release authority to the Responsible Scaling Officer, and the involvement of the Long-Term Benefit Trust. The Midas Project noted that when Anthropic endorsed SB-53, it had characterized the law as a codification of practices the company already followed, yet the FCF did not incorporate much of what gave the RSP its rigor.

Influence and adoption

The RSP has been cited as an influential model for frontier AI safety governance. Following Anthropic's publication, several major AI companies published their own frameworks, and the concepts underlying the RSP have informed legislative efforts in multiple jurisdictions.

As of December 2025, twelve companies had published frontier AI safety frameworks: Anthropic, OpenAI, Google DeepMind, Magic, Naver, Meta, G42, Cohere, Microsoft, Amazon, xAI, and NVIDIA. In May 2024, sixteen companies agreed to publish such frameworks as part of the Frontier AI Safety Commitments at the AI Seoul Summit.

The RSP concept has also influenced legislation, including California's SB-53, New York's RAISE Act, and aspects of the EU AI Act and General-Purpose AI Code of Practice. METR (formerly ARC Evals), which Anthropic credited as having helped originate the RSP concept, has published comparative analyses of the various published frameworks.

General reception

The RSP has been credited with helping to establish a norm in the AI industry: within months of Anthropic's announcement, both OpenAI and Google DeepMind published broadly similar frameworks, as Anthropic noted in its v3.0 blog post. The Centre for the Governance of AI (GovAI), in a detailed analysis, noted that RSP-like frameworks had served as a foundation for frontier AI regulation in multiple jurisdictions. Anthropic itself has acknowledged that the RSP served as a useful internal forcing function, pushing the company to develop safety measures ahead of when they were needed.

Reception of v3.0 changes

The February 2026 revision received a mixed reception. GovAI stated that its initial reaction was "rather negative," particularly because the removal of the pause commitment made it more likely that Anthropic would deploy models posing unacceptable risks. However, GovAI wrote that after closer engagement its overall view became more positive, finding the new Frontier Safety Roadmaps and Risk Reports valuable for increasing transparency, while noting that these mechanisms still rely largely on self-reporting. GovAI also warned that other companies might lower their own commitments in response to Anthropic's changes without adopting comparable transparency measures.

Media coverage was more uniformly critical, with outlets characterizing the changes as Anthropic "scaling back" or "loosening" its safety commitments.

The removal of the unconditional pause commitment attracted the most scrutiny. TIME reported that the new policy would only trigger a delay if Anthropic's leadership simultaneously considered the company to be the frontrunner in AI development and judged catastrophe risk to be significant, a considerably higher bar than the previous categorical rule. CNN described the change as Anthropic "adopting a nonbinding safety framework that it says can and will change," and quoted the company as arguing that responsible developers pausing while less careful competitors advanced could make the world less safe overall.

The timing of the revision coincided with a reported standoff between Anthropic and the Pentagon over military uses of Claude, with Defense Secretary Pete Hegseth reportedly giving the company a deadline to lift restrictions on the technology's use. Both Anthropic and a source familiar with the matter told reporters the policy revision was unrelated to the Pentagon discussions.

Chief Science Officer Jared Kaplan told TIME that the company did not believe halting its own training would help if competitors continued to advance without comparable safeguards.

Broader criticisms

A recurring criticism across all versions of the RSP is that the policy is voluntary and self-assessed: Anthropic itself determines whether it is in compliance, and no external enforcement mechanism exists. The Japan Times noted that the v3.0 goals are publicly announced targets rather than binding commitments. GovAI has also observed that the strength of the original pause commitment was debated even before v3.0 was announced, given that Anthropic always retained the ability to revise the RSP.

See also

  • AI safety
  • Anthropic
  • Claude (language model)
  • Existential risk from artificial general intelligence
  • Biosafety level
  • Frontier model
  • AI alignment
  • EU AI Act
  • Transparency in Frontier Artificial Intelligence Act