Anthropic's Responsible Scaling Policy

The Responsible Scaling Policy (RSP) is a voluntary AI safety framework developed by Anthropic, an American AI company, for managing catastrophic risks from advanced artificial intelligence systems. First published on September 19, 2023, the RSP was the first such framework published by a frontier AI developer. The policy uses a graduated system of AI Safety Levels (ASLs), which pair progressively higher model capabilities with correspondingly stricter safety and security requirements. Anthropic drew the concept from the biosafety level (BSL) classification system used in the United States for biological research facilities.

The RSP has been revised several times, most recently with Version 3.0, which took effect on February 24, 2026. That revision was the largest structural change to date: it replaced the original commitment to halt training or deployment if safety measures were inadequate with a more conditional approach, and introduced new transparency mechanisms including Frontier Safety Roadmaps and periodic Risk Reports. As of late 2025, twelve companies in total (including Anthropic) had published frontier AI safety frameworks.

Background

Anthropic was founded in 2021 by former OpenAI employees, including Dario Amodei and Daniela Amodei, with a stated focus on AI safety research. The RSP was developed amid growing concern about the potential for increasingly capable AI systems to cause large-scale harm, whether through deliberate misuse (such as assistance in bioweapon development) or through AI systems acting autonomously in ways their designers did not intend.

The framework drew on a broader concept of responsible scaling developed by Paul Christiano and ARC Evals (later renamed METR). Anthropic credited ARC Evals with providing substantial advice, especially on evaluations for autonomous capabilities. The RSP was formally approved by Anthropic's board of directors, and subsequent changes require board approval following consultation with the company's Long-Term Benefit Trust (LTBT), a governance body designed to ensure that the company's decisions account for long-term societal benefit rather than short-term commercial pressures.

AI Safety Levels

The core mechanism of the RSP is the AI Safety Levels (ASL) framework. Each level defines a tier of model capability alongside a corresponding set of safety, security, and operational requirements. As capabilities increase, so do the safeguards required for training and deployment.

ASL-1

The lowest tier encompasses AI systems with no significant potential for catastrophic harm, such as narrow game-playing programs or early-generation language models with limited capabilities.

ASL-2

The second tier covers systems that exhibit preliminary signs of potentially dangerous knowledge, such as some ability to discuss topics related to weapons of mass destruction, but whose outputs do not meaningfully exceed what a person could find through publicly available resources such as textbooks or search engines. At the time the RSP was first published, Anthropic placed all of its Claude models in this category.

Deployment safeguards at this level include training models to decline harmful requests and monitoring tools to flag problematic inputs and outputs. Security measures focus on defending model weights against opportunistic theft.

ASL-3

The third tier addresses systems whose capabilities meaningfully exceed what is available from non-AI sources in ways that could contribute to catastrophic outcomes, or that exhibit basic autonomous behaviors. The original RSP required that models at this level not be deployed if they showed meaningful catastrophic misuse risk under adversarial testing by expert red-teamers.

In May 2025, Anthropic provisionally activated ASL-3 protections for Claude Opus 4, the first time the company had deployed a model under ASL-3 standards. The company stated that while it had not definitively determined the model had crossed the ASL-3 capability threshold, it could no longer confidently rule out that possibility. The ASL-3 deployment measures target the narrow risk of Claude being used to assist in creating or acquiring chemical, biological, radiological, and nuclear (CBRN) weapons, using techniques such as specialized classifiers, access controls, and bug bounty programs. The ASL-3 security measures strengthen protections against model weight theft, targeting the threat of sophisticated non-state attackers.

ASL-4 and beyond

The original RSP left higher tiers deliberately unspecified, noting that they were too distant from current systems to define concretely. Version 1.0 committed to defining ASL-4 evaluations before the company's models reached ASL-3. That commitment was later revised: the October 2024 update replaced the obligation to predefine future ASL levels with a more flexible approach (see Version history).

Under RSP v3.0, Anthropic moved away from specifying fixed ASL categories for future capability levels, instead using a system of capability thresholds mapped to required mitigations. The company acknowledged that some mitigations at higher levels would require industry-wide or government coordination and could not be achieved by any single company acting alone.

Version history

Anthropic has described the RSP as a "living document" intended to evolve alongside AI capabilities and the company's understanding of risks.

Version 1.0 (September 2023)

The initial version established the ASL framework and defined the ASL-2 and ASL-3 tiers in detail. Its central commitment was that Anthropic would temporarily halt training of more capable models if its safety measures had not kept pace. It also committed to defining ASL-4 evaluations before ASL-3 models were trained. ARC Evals (now METR) was credited as a key collaborator.

Version 2.0 (October 2024)

The first major revision changed how the ASL terminology was used: "ASL" now referred to sets of safeguards rather than to the models themselves, and the policy introduced the concepts of "Capability Thresholds" and "Required Safeguards." The autonomous replication and adaptation (ARA) threshold, which had previously triggered mandatory escalation to higher safety standards, was reclassified as a "checkpoint" that would prompt additional evaluation without automatically requiring new safeguards. The commitment to predefine future ASL levels was also removed. Co-founder and Chief Science Officer Jared Kaplan was named Anthropic's Responsible Scaling Officer.

Version 2.1 (March 2025)

This update introduced a new CBRN capability threshold addressing capabilities that could meaningfully assist moderately resourced state weapons programs. It also split the existing AI R&D thresholds into two levels: one for the ability to fully automate entry-level AI research, and another for capabilities that could dramatically accelerate the pace of effective AI scaling.

Version 2.2 (May 2025)

This minor update refined certain security exclusions under the ASL-3 standard.

Version 3.0 (February 2026)

Version 3.0 was the largest structural overhaul since the RSP's inception. The update was first reported by TIME on February 24, 2026. Key changes included:

  • Removal of the unilateral pause commitment: Previous versions committed Anthropic to halt development or deployment if its safety measures could not keep pace with model capabilities. Version 3.0 replaced this with a conditional framework under which Anthropic would delay development only if both (a) the company considered itself the frontrunner in AI development and (b) it judged the risk of catastrophe to be significant.
  • Separation of company-level and industry-wide commitments: The policy adopted a structure distinguishing mitigations Anthropic plans to implement regardless of competitors' actions from more ambitious measures it recommends the entire industry adopt. Anthropic stated it could not commit to the industry-wide measures on its own.
  • Frontier Safety Roadmaps: A new requirement to publish roadmaps describing concrete plans for risk mitigations across security, alignment, safeguards, and policy. These are publicly declared goals rather than binding commitments.
  • Risk Reports: A commitment to publish detailed risk assessments every three to six months, with the option for review by independent experts who would have access to minimally redacted versions.
  • Competitor-matching provision: A commitment to meet or exceed the risk reduction posture of comparable competitors, and to delay development or deployment until it can do so.

Capability thresholds (v3.0)

Version 3.0 organizes catastrophic risk around four capability thresholds, each corresponding to a distinct category of potential harm:

  1. Non-novel chemical and biological weapons: AI systems that could meaningfully assist individuals with basic technical training in creating, obtaining, or deploying chemical or biological weapons capable of catastrophic damage.
  2. State-program-level CBRN uplift: Capabilities that could substantially enhance the weapons development capabilities of moderately resourced state programs.
  3. Automation of entry-level AI research: The ability to fully automate the work of entry-level AI researchers.
  4. Dramatic acceleration of effective scaling: Capabilities that could cause a dramatic acceleration in the pace of effective AI scaling.

Each threshold maps to both company-level mitigations and industry-wide recommendations.

Governance and oversight

The RSP includes several procedural safeguards:

  • A designated Responsible Scaling Officer (currently Jared Kaplan) who oversees implementation of the policy.
  • Board approval required for changes to the policy, following consultation with the Long-Term Benefit Trust.
  • Internal review of procedural compliance on a regular basis.
  • Whistleblower protections for employees who report safety concerns, with explicit provisions that non-disparagement clauses do not preclude raising such concerns.
  • Under v3.0, external review of Risk Reports by independent experts chosen for AI safety expertise and absence of major conflicts of interest.

Frontier Compliance Framework

In December 2025, Anthropic published a separate document called the Frontier Compliance Framework (FCF), a mandatory compliance document distinct from the voluntary RSP. The FCF serves a dual regulatory purpose. In the United States, it functions as Anthropic's "Frontier AI Framework" under California's Transparency in Frontier Artificial Intelligence Act (SB-53, also referred to as TFAIA), which was signed into law on September 29, 2025, and took effect on January 1, 2026. In the European Union, it serves as a publicly available summary of Anthropic's Safety & Security Framework under the EU AI Act, after Anthropic Ireland Limited signed the General-Purpose AI Code of Practice. Anthropic's blog post announcing the FCF discussed only its role in SB-53 compliance and did not mention the EU dimension. SB-53 was the first U.S. statute focused specifically on safety requirements for frontier AI systems, requiring large frontier developers (those with annual revenues over $500 million) to publish annual frameworks explaining how they identify, assess, and mitigate catastrophic risks.

The FCF covers four systemic risk categories: cyber offense, CBRN threats, harmful manipulation (including influence operations and election interference), and sabotage and loss of control (including autonomous behavior that evades oversight). For each category, it defines tiered risk levels with descriptions of the relevant capabilities, though the specific mitigations for each tier are not fixed in advance and are instead to be determined when the relevant tier is reached. The FCF explicitly notes that it uses a lower threshold for "catastrophic risk" than the RSP: the FCF addresses statutory definitions of catastrophic harm (such as events causing more than 50 deaths or $1 billion in damages), while the RSP reserves the term for existential threats or fundamental destabilization of global systems. The framework was updated on March 2, 2026, revising risk tiers across all four categories and introducing new tiers for harmful manipulation.

Anthropic stated that the RSP would remain its voluntary safety policy going forward, and that it may go beyond or differ from what the FCF requires for regulatory compliance. Anthropic publicly endorsed SB-53 prior to its passage, arguing that the law would formalize transparency practices that responsible frontier developers were already following voluntarily. The company has also advocated for a comparable federal transparency framework, arguing that a national standard would avoid a patchwork of inconsistent state-level requirements.

The Midas Project, a nonprofit that tracks changes to AI safety policies, characterized the FCF's publication as a "major" change, arguing that the FCF was a substantially thinner document than the RSP, lacking the RSP's binding mitigations tied to specific capability thresholds, its delegation of release authority to the Responsible Scaling Officer, and the involvement of the Long-Term Benefit Trust. The Midas Project noted that when Anthropic endorsed SB-53, it had characterized the law as a codification of practices the company already followed, yet the FCF did not incorporate much of what gave the RSP its rigor.

Influence and adoption

The RSP has been cited as an influential model for frontier AI safety governance. Following Anthropic's publication, several major AI companies published their own frameworks, and the concepts underlying the RSP have informed legislative efforts in multiple jurisdictions.

As of December 2025, twelve companies had published frontier AI safety frameworks: Anthropic, OpenAI, Google DeepMind, Magic, Naver, Meta, G42, Cohere, Microsoft, Amazon, xAI, and NVIDIA. In May 2024, sixteen companies agreed to publish such frameworks as part of the Frontier AI Safety Commitments at the AI Seoul Summit.

The RSP concept has also influenced legislation, including California's SB-53, New York's RAISE Act, and aspects of the EU AI Act and General-Purpose AI Code of Practice. METR (formerly ARC Evals), which Anthropic credited as having helped originate the RSP concept, has published comparative analyses of the various published frameworks.

General reception

The RSP has been credited with helping to establish a norm in the AI industry: within months of Anthropic's announcement, both OpenAI and Google DeepMind published broadly similar frameworks, as Anthropic noted in its v3.0 blog post. The Centre for the Governance of AI (GovAI), in a detailed analysis, noted that RSP-like frameworks had served as a foundation for frontier AI regulation in multiple jurisdictions. Anthropic itself has acknowledged that the RSP served as a useful internal forcing function, pushing the company to develop safety measures ahead of when they were needed.

Reception of v3.0 changes

The February 2026 revision received a mixed reception. GovAI stated that its initial reaction was "rather negative," particularly because the removal of the pause commitment made it more likely that Anthropic would deploy models posing unacceptable risks. However, GovAI wrote that after closer engagement its overall view became more positive, finding the new Frontier Safety Roadmaps and Risk Reports valuable for increasing transparency, while noting that these mechanisms still rely largely on self-reporting. GovAI also warned that other companies might lower their own commitments in response to Anthropic's changes without adopting comparable transparency measures.

Media coverage was more uniformly critical, with outlets characterizing the changes as Anthropic "scaling back" or "loosening" its safety commitments.

The removal of the unconditional pause commitment attracted the most scrutiny. TIME reported that the new policy would only trigger a delay if Anthropic's leadership simultaneously considered the company to be the frontrunner in AI development and judged catastrophe risk to be significant, a considerably higher bar than the previous categorical rule. CNN described the change as Anthropic "adopting a nonbinding safety framework that it says can and will change," and quoted the company as arguing that responsible developers pausing while less careful competitors advanced could make the world less safe overall.

The timing of the revision coincided with a reported standoff between Anthropic and the Pentagon over military uses of Claude, with Defense Secretary Pete Hegseth reportedly giving the company a deadline to lift restrictions on the technology's use. Both Anthropic and a source familiar with the matter told reporters the policy revision was unrelated to the Pentagon discussions.

Chief Science Officer Jared Kaplan told TIME that the company did not believe halting its own training would help if competitors continued to advance without comparable safeguards.

Broader criticisms

A recurring criticism across all versions of the RSP is that the policy is voluntary and self-assessed: Anthropic itself determines whether it is in compliance, and no external enforcement mechanism exists. The Japan Times noted that the v3.0 goals are publicly announced targets rather than binding commitments. GovAI has also observed that the strength of the original pause commitment was debated even before v3.0 was announced, given that Anthropic always retained the ability to revise the RSP.

See also

  • AI safety
  • Anthropic
  • Claude (language model)
  • Existential risk from artificial general intelligence
  • Biosafety level
  • Frontier model
  • AI alignment
  • EU AI Act
  • Transparency in Frontier Artificial Intelligence Act