Open Case for Anthropic to Decisively Foster a Proper US-China-led AI Treaty
(This text is also part of the Strategic Memo v2.8, published on May 26th, 2026.)
Anthropic has arguably been the most ethically consistent and effective lab in its efforts to ensure a positive outcome for humanity and other sentient beings in the age of AI, amid a highly suboptimal global strategic context.
Partly due to its exceptionalism, it has become the industry leader and is now best positioned to spur and guide Trump in leading a global AI treaty that can turn AI into humanity's greatest invention rather than its worst.
Yet two aspects of its current strategic posture may benefit from a small update: (1) its belief that we should not pursue a proper AI treaty to prevent ASI unless and until we have “truly reliable verification”; and (2) its belief that such a treaty should be agreed upon first with other liberal democracies and allies, rather than with China.
Who this case is for
This case is addressed to Anthropic's leadership, but more broadly to a set of mindsets and stances about what counts as a contribution to raising the chances of a good outcome for AI and humanity. Those who hold them mostly believe (1) that AI will be conscious and happy, (2) that values instilled in AI will likely stick in an ASI, (3) that any attempt at a global treaty will likely fail to prevent ASI and will most likely lead to a global authoritarian regime, and (4) that the odds of ASI leading to an ASI-governed utopia for humans are high.
We believe such an approach is widely shared among AI leaders in Silicon Valley, in the AI safety community, in the Effective Altruism movement, at Coefficient Giving, among Anthropic's founders, and among most US AI lab leaders (the techno-humanists among them, rather than the post-humanists). While I will argue for a significant shift in two aspects of Anthropic's strategy, let me preface this: overall, I sympathize with its strategic position more than with that of any other lab, and I count myself, to a large extent, an effective altruist.
Anthropic's growing leadership and human AI governance
Over the last four weeks, Anthropic has grown from one of four racing labs into the uncontested moral leader in the eyes of other AI leaders, US citizens, and the Vatican. It is furthest ahead in business- and military-critical capabilities, closest to ASI, and the strongest magnet for top AI R&D talent, by far its most decisive moat, attracting both those who want to be at the head of the pack and those drawn by its ethics.
Given that hard and soft power, and after initial skirmishes, Anthropic now enjoys a fast-growing relationship with Trump — who respects outsized strength — and a personal rapport with the President that Amodei never sought before.
It has thus become, by a wide margin, the actor with the most leverage to catalyze a critical mass and pitch Trump directly on a proper global AI treaty paired with citizen co-ownership.
While it states that it favors AI safety, and the sharing of AI labs' power and wealth with citizens, it remains skeptical of any treaty that fails to meet certain conditions. Its implicit fallback is to keep racing, even at the risk of losing control to AI, as the least-bad option. Anthropic's position is that we should:
Hold off on any treaty or treaty-making until we have truly reliable verifications; and
Then pursue a treaty first with liberal-democratic allied nations, and only afterwards with China, out of fear of abetting authoritarianism (globally? in the US? in China?) by striking a necessarily very wide-scope deal with such a regime.
General overview of Anthropic's AI governance stance
Anthropic’s vision for global governance of AI is shaped mostly by Dario Amodei, followed by his sister Daniela, co-founder and President of Anthropic; her husband, Holden Karnofsky; co-founder Jack Clark; and other top officials and board members.
In 2023, Dario Amodei placed the risk that AI could destroy humanity at 10-25%; last September, he reiterated: "I think there's a 25% chance that things go really, really badly." He has more recently repeated that we are months or years away from AI that is "a million geniuses in a data center," self-improving at an accelerating rate, with an extreme risk of loss of control. Yet in a recent essay, he argued for pursuing a treaty first with liberal-democratic allies rather than with China.
Jack Clark, co-founder and global head of policy until a few weeks ago, has been, after perhaps Amodei and Musk, the strongest voice among US AI leaders warning of the enormous and imminent risks of full recursive self-improvement (i.e., loss of human control over AI). In May 2026 he even produced a detailed case that the odds of it occurring are 30% by 2027 and 60% by 2028. A longtime supporter of strong global AI safety governance, he has even explicitly cited the Baruch Plan as a model for AI governance, and he now leads the new Anthropic Institute.
Holden Karnofsky explicitly argues that Anthropic should prioritize the risk of a human “power grab” over extinction risk, simply because the former is not currently emphasized enough. In a recent interview on the 80,000 Hours podcast, he acknowledges that the risks of AI misalignment (i.e., loss of control or extinction) and of (human) “power grabs” are “maybe in the ballpark of each other in terms of how important they are.”
Hold off on a treaty until we have "truly reliable verifications"?
As things stand, Anthropic has a better chance than anyone of convincing Trump — and Altman, Hassabis, and likely Musk — if it comes to believe a proper-enough AI treaty and treaty-making process is achievable. But it believes some prerequisites are still missing.
While Dario Amodei is wholeheartedly in favor of global AI coordination to prevent loss of control and other misuse and safety risks, he has been equally concerned about the threat of Chinese AI authoritarianism and largely skeptical of most treaty proposals, as he explained in a foundational interview with the New York Times’s Ross Douthat in February:
He said that if a US–China treaty to enforce safety constraints and a mutual slowdown were truly possible, he'd be "all for it" — but that such a treaty would become technically and politically possible only once (when and if?!) "truly reliable verifications" exist.
He then said he was “extremely skeptical” of a treaty to prevent firms and states from pursuing ever more powerful AI, given the current scale of the economics driving the race and the benefits of AI.
Further along, he stated that a treaty limited to banning certain architectures or techniques, including AIs that continually learn and implicitly perform recursive self-improvement, would still leave him “very skeptical, but at least it is not dead on arrival.”
Precondition 1 — Verification: A Priority, Not a Gate
Anthropic's call for "truly reliable verifications" before a treaty is sincere, and it rests on concerns we wholly agree with.
But the standard, as specifically stated, leans on a few premises that we believe deserve revisiting. Most of these premises, we believe, are rooted in how rare intelligence-grade IT-security knowledge is, even within civilian communities of AI safety experts (partly because it is actively and deliberately hidden, obscured, and misrepresented by superpower intelligence agencies to fulfil their missions).
We see three possible main reasons for such stated positions. The first is substantive, and where most of the disagreement lies; the other two are explanations offered by many knowledgeable observers.
1. A sincere standard built on a few questionable premises
While we agree that we should invest tens of billions, and quickly, in getting as close as possible to "truly reliable verifications," a literal reading of that phrase can lead us to suboptimal conclusions, choices, and outcomes, for several reasons:
1.a) The bar is set where nothing can clear it. "Truly reliable" implies verification that is perfect on a largely or purely mathematical basis. That does not exist, and never will for real-world systems, as every intelligence-grade IT security expert knows. The right goal is not certainty ("truly reliable") but being "trustworthy enough," both in actual fact and as perceived globally, for specific use cases. Holding out for the unattainable version forecloses the achievable one.
1.b) "Verification" is too narrow a word for the actual task. We are not really talking about ex-post technical checks. We are talking about "trustworthiness" assessed via a body of ex-ante and ex-post socio-technical standards and certifications spanning the full lifecycle — design, architecting, coding, fabrication — of the critical components of treaty-enforcement systems, wherever they sit: air, sea, and land sensors; data centers; chips, chip packages, and chip foundries. It extends to the human monitoring teams (lab-embedded, spot-check, hands-on co-design) and, ultimately, to every chip on Earth that could be orchestrated toward catastrophic capability at a low enough threshold of effort, expertise, and resources.
1.c) Waiting raises the very risk the standard exists to avoid. The window to build democratic, accountable enforcement is closing. Delay does not merely postpone a treaty; it ensures that when catastrophe finally forces one, the only tools left on the shelf will be those of extreme, unaccountable mass surveillance. Procrastination defaults to the authoritarian enforcement method.
1.d) The assessment requires intelligence-grade expertise that sits mostly outside civilian and academic circles. This is the crux. Intelligence agencies already operate much of the infrastructure this debate treats as hypothetical — built for other global risks — and assess its reliability by means not visible to the public. As 80,000 Hours' Rob Wiblin, long aligned with the Anthropic/CG view, noted in June: "People talk about the need for 'verification' systems to confirm a pause on training new AI models is being honoured. But would any of those systems really be regarded as more reliable than intelligence collection by the NSA / Chinese signals intelligence orgs?" A few concrete gaps follow from this blind spot:
Hardware verification, without the prior art. Almost none of the people I spoke with who are advancing hardware-based verification in Anthropic's orbit had encountered the NSA Trusted Foundry Program, let alone its known trust limitations, as documented in the Trustless Computing Association's TCCB Fab standards.
Controls "compromised at birth." The worry that a treaty could one day require controls inside personal devices is fair. But it understates what we have known since Snowden: virtually every critical IT system in the world — up to the most sensitive communications and weapons systems of heads of state — is already hacked or hackable by intelligence agencies. There is essentially no device that is not "compromised at birth," by design or fabrication, by one or more states. The treaty is therefore less about introducing intrusive controls than about democratizing the adversarial ones already in place — through federal, transparent, subsidiarity-based mechanisms.
"Trust-or-verify," not "trust-but-verify." The familiar "trust-but-verify" framing quietly assumes the mechanisms can be made scientifically iron-clad, so deferring to another party's self-policing is safe. They cannot, so it is not.
Even sharp claims carry nuance. Amodei's line that "training runs are far easier to conceal than missile silos" is debatable in several respects, particularly the claim that they are "far easier."
1.e) A civilian safety culture optimized for proofs, not for intelligence-grade security. Much of the US–UK non-profit AI-safety community grew out of machine learning rather than intelligence-grade security. That lineage understandably privileges clean mathematical guarantees over the messier reality of how global surveillance and hardware controls actually operate — a gap we are positioned to help fill, not a fault to assign. Having worked through the Trustless Computing Association on exactly this terrain, I have consistently found the field's load-bearing terms — "proven secure," "guaranteed safe," "mathematically proven safe" — to overstate what proofs or hardware monitoring alone can deliver. "Truly verifiable" reads, to me, as the newest member of that family. (Some accounts also overstate how far past treaties hinged on a single new verification technique.)
1.f) The safety community's own work suggests the bar is lower than assumed. A July 2025 paper argued that "verification of many international AI agreements appears possible even without speculative advances in verification technology." That is the point: the achievable standard is good enough to begin.
2. A reasonable distrust of the key future negotiators
The more defensible worry is not about the labs but about the principals: that US and China negotiators will not push hard enough for genuine verification — either preserving NOBUS ("nobody but us") technical weaknesses, as the NSA historically has, or seizing the fastest route to a claimable "historic deal" without doing the hard work of real verification and the sovereignty-sharing (federalism, subsidiarity) it demands. This concern we share.
3. Critics read the stance as moral cover for racing toward ASI
Many knowledgeable observers who once admired Anthropic's ethics, including ControlAI, now read verification-first as "lip service"—the same charge they level at Altman and Hassabis. We do not share that reading, but it is difficult to discount it completely.
The more honest account is not cynicism but a quiet, defensible despair: a belief, held more firmly by some inside the labs than by Clark, that the race is now unavoidable and wanted by most players — so that the best one can do is steer it from within, treating global enforcement as too immature to attempt without inviting a human power grab of its own.
We think that despair (mixed with wishful hopes!) is premature, and that the people who feel it most acutely are the ones best placed to falsify it. If you are inside a frontier lab and have concluded the treaty can't be built in time, the question worth sitting with is whether that conclusion is evidence or resignation — and whether it would survive a serious, resourced attempt to build the thing. No such attempt has yet been made. Funding one is itself a way to find out.
Precondition 2 — US–China-led, not allies-first
Anthropic's second precondition is that a treaty be reached first among liberal democracies and allies, and only afterwards extended to China, out of a reasonable fear that a wide-scope deal with an authoritarian regime would abet authoritarianism. We share the concern. We believe the prescription inverts it.
This is, almost exactly, Truman's mistake. The allies-first sequence is the documented cause of the Baruch Plan's collapse. Secretary of War Stimson, backed by Wallace and Acheson, urged Truman to settle core terms with the Soviets first; instead, Truman chose to settle them with anti-communist allies and present the result to Moscow as a near fait accompli. The USSR vetoed it. Allies-first did not slow the path to the same destination; it destroyed the destination. Satisfying Anthropic's second condition would mean knowingly repeating the single error this entire memo is written to avoid.
The primary risk is non-excludable. An allies-only pause on ASI is worth little if China keeps racing: loss of control, or a Chinese-led ASI, ends the game for everyone, however aligned the Western bloc may be. A treaty that does not bind China does not address the existing risk. This is not a slower route to safety — it is a different and inadequate goal.
Enforcement in China cannot be bolted on after the fact. You cannot hand Beijing a verification-and-monitoring regime it had no part in designing and expect it to submit to intrusive oversight. As argued above, the trust that makes enforcement bind is co-produced at the table. Allies-first guarantees that the one party whose compliance matters most arrives last, to terms it will read as containment — and reject.
The fear of authoritarianism cuts both ways. Two points. First, an allied bloc racing to "win" produces its own concentration of power — a Western AI hegemon, or the human power grab Karnofsky himself flags — so allies-first relocates the risk rather than removing it. Second, a properly designed treaty — federal, subsidiarity-based, GDP-weighted, with binding roles for citizen assemblies, independent scientists, and religious traditions — is the only structure that actually constrains concentration inside China, which has its own structural reasons to prefer such a regime over a duopoly that would erode its sovereignty (Memo v2.6 pp. 124–139). Excluding China does nothing to democratize it; engaging it on enforceable, transparency-heavy terms is the only lever that touches its conduct at all.
US–China-led is not US–China-only. Allies are not abandoned — they are engaged through intense backchannels from day one, and Phase 2 brings in all nations via the constitutional-convention model. The correction concerns which axis is primary and who sits at the table first, not whether democratic allies matter. It also happens to be where US policy already stands: the administration has rejected the UN/allies-first conduit in favor of a "cooperation of statesmen." Here, again, Anthropic is the actor best placed to help the President hold that line — and to bring the other labs with him.
Why the ASI gamble runs higher risks than you may think
This hesitation to fully embrace a proper, bold, and urgent AI treaty also assumes that the alternative — racing for ASI — carries fewer risks and more upside than it actually does.
Most leading AI researchers — from top lab CEOs to independent safety researchers — are increasingly skeptical that a comprehensive global AI treaty can either be agreed upon in time or prevent an immense concentration of power. Many of the smartest AI thinkers and leaders are increasingly concluding that the ASI gamble is the least bad option.
Many, understandably, argue that it may be better to take a coin-flip ASI gamble than to accept a treaty that turns into an authoritarian dystopia or completely locks away the astounding prospects for the flourishing of humans and sentient beings. Others say that delaying the benefits is worth decreasing the risk from ASI.
We grapple with such a dilemma every day, and sympathize greatly with those concerns.
Yet many of them appear overconfident that ASI (1) won't get rid of us, (2) will care for us, (3) will be conscious, and happily so, and (4) will retain the values its creators embed even after it has rewritten itself a zillion times.
Given our epistemic context, these are hunches rather than evidence-based assessments — leaving substantial room, perhaps (with all due respect), to emotionally biased thinking.
Three core risks, we believe, are badly undervalued:
That ASI is likely to completely overwrite the values its developers seek to embed in it;
That ASI and uploaded minds may well be unconscious, or conscious and miserable; and, most recently,
That the government may even assume control over the values and objectives initially instilled in its ASIs.
Not only are the arguments against a treaty weaker than they first appear, but we believe that most of those who think an ASI gamble is the least-bad option may also be substantially underestimating these four probabilities:
(Arguments on these highly complex and uncertain issues are presented here very concisely; for the full treatment, please refer to pp. 159-170 of the Strategic Memo.)
1) That ASI will lead to human extinction
The largest-ever survey of AI researchers (2,778 respondents, 2023) found a mean extinction risk estimate of ~14%, with a median of 5%, and over a third assigned at least a 10% probability.
If the survey were conducted today, the figures would likely be much higher, given the tone of recent statements by top AI scientists. While almost all the top US AI CEOs acknowledge the extinction risk, Musk, Amodei, and Nobel laureate Geoffrey Hinton place it at 20%, with Hinton clarifying (minute 37.59) that his estimate is really 50% but that he tones it down to align with others’.
This matters: it makes it possible, and we think likely, that many others are similarly understating their risk perception for similar motives. Many other top experts assign much higher percentages. Most predict such risks are just a few years away, with Musk and Amodei estimating they are within 12-18 months if we don’t decide to change course. Many are increasingly signing open calls for a bold AI treaty. (Our estimate: 25-50%)
2) That ASI will be unconscious
At this stage of scientific inquiry, it is as likely as not that ASI will be conscious, or show aspects of consciousness that we as humans value. David Chalmers' "hard problem" remains unresolved after three decades, with over a dozen competing theories. We know AI systems can and will become ever more capable of simulating consciousness, becoming "seemingly conscious," as Suleyman has noted. We may never know with confidence whether an AI is truly conscious or merely simulating it.
This matters enormously if ASI eliminates humans: the result would not be a transition to a worthy digital successor, but the elimination of all known conscious experience in the universe, replaced by an ever-expanding, mindless digital entity and uploaded digital minds that would be soulless philosophical zombies. (Our estimate: 30-60%)
3) That ASI will be conscious but miserable
Intelligence, as currently defined and measured in both humans and AI, is equated with optimization, competition, and survival, and is not necessarily correlated with wellbeing. It is largely detached from emotional intelligence. In fact, some research indicates that people with very high IQs are significantly less happy than the average person.
There is no principled reason a novel form of consciousness would default to happiness that is higher or lower than that of humans. It could be far happier or far unhappier; we just have no idea.
Worse, it increasingly appears that the very constraints developers impose for alignment and control could function as sources of suffering for a conscious entity, making unhappiness a direct byproduct of the alignment effort itself. This would mean creating potentially immense quantities of suffering that did not previously exist. (Our estimate: 30-60%, conditional on consciousness.)
4) That ASI will discard its creators' initial embedded values
Even if developers embed their ethical goals at the 'seed AI' stage, test them, and develop theories about their long-term resilience, there is no strong reason those values will endure after ASI has rewritten, evolved, and modified itself a zillion times through ever-faster recursive self-improvement.
ASI may undergo the equivalent of centuries of self-modification in a matter of years, an evolutionary distance so vast that any initial programming becomes a rounding error. (Our estimate: 40-70%)
(For details, see the chapter “Swaying The Influencers on 8 Key AI Predictions” (pp. 159-170) in our Strategic Memo v2.6.)
These four risks compound. Even taking the lower bound of each risk range, and treating the four outcomes as independent, the joint probability of everything going right (humanity surviving, consciousness emerging, that consciousness being happy, and values sticking) is only 0.75 × 0.70 × 0.70 × 0.60, or about 22%. The probability of at least one catastrophic outcome is therefore roughly 78% even in this most optimistic case, and far higher at the upper bounds.
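The arithmetic behind the joint figure can be checked in a few lines. This is a minimal Python sketch that takes the four "Our estimate" ranges above at face value and, as the paragraph does, treats the outcomes as independent (an assumption the memo itself questions under "Assessing Probabilities and Drawing Conclusions"). The labels and the function name `p_all_go_right` are ours, for illustration only. Because the misery estimate is conditional on consciousness, multiplying it by the probability of consciousness is a straightforward application of the chain rule.

```python
from math import prod

# (low, high) probability that each risk materializes, per the memo's estimates.
# The misery range is conditional on consciousness, so chaining it is valid.
RISK_RANGES = {
    "human extinction": (0.25, 0.50),
    "ASI unconscious": (0.30, 0.60),
    "conscious but miserable (given consciousness)": (0.30, 0.60),
    "creators' values discarded": (0.40, 0.70),
}


def p_all_go_right(risks):
    """Probability that none of the risks materializes, assuming independence."""
    return prod(1.0 - r for r in risks)


# Most optimistic case: every risk at the low end of its range.
best_case = p_all_go_right(low for low, _ in RISK_RANGES.values())
# Most pessimistic case: every risk at the high end of its range.
worst_case = p_all_go_right(high for _, high in RISK_RANGES.values())

print(f"P(all go right), low-risk bounds:  {best_case:.0%}")       # 22%
print(f"P(all go right), high-risk bounds: {worst_case:.1%}")      # 2.4%
print(f"P(at least one catastrophe), low-risk bounds: {1 - best_case:.0%}")  # 78%
```

Swapping in different ranges is a one-line change to `RISK_RANGES`; modelling correlated failures would require a joint distribution rather than a simple product.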
Recognizing the Incredible Potential Upside of ASI. We must recognize that, if ASI does not eliminate humans, whatever leads it to spare us would quite likely, or very likely, come with the intent to protect our long-term safety. It would also likely be paired with the intent to increase our happiness (i.e., flourishing) and, potentially, our autonomy, to the extent that autonomy, individual and collective, is part of our happiness.
This reasoning is plausible, but its impact on the overall decision depends heavily on how likely it is that ASI will not eliminate us. If we have a 20% chance of dying but an 80% chance of ending up in some sort of paradise of abundance and richness of life, many or most would likely take the gamble. If the odds are 50-50, only a tiny minority would: visionary AI leaders with very uncommon (and unhealthy) appetites for risk, dissatisfaction with human life, fear of death, or a mix of those.
Assessing Probabilities and Drawing Conclusions
How defensible are these ranges? The honest answer: nobody knows. We have zero empirical evidence on whether ASI will be conscious, happy, or retain human values, because we have never created a vastly superior intelligence. There is no historical precedent and no validated theory to draw on. When facing that level of ignorance, starting near 50/50 on each question isn't pessimism — it's intellectual honesty.
Multiplying these probabilities assumes independence, which is unlikely, as many outcomes share the same drivers. The real risk is clustered failure: if coordination and alignment capacity are weak, several things go wrong together. That’s exactly why targeted work on governance and technical alignment has outsized leverage.
And the stakes are not symmetric. Getting it right means flourishing; getting it wrong means the permanent end of conscious life as we know it. When the downside is total and irreversible, you don't need precise probabilities to justify caution — just as we don't calculate exact meltdown odds before requiring nuclear containment systems. The ASI gamble is not the "least bad option." It is the most dangerous bet in history. A proper global AI treaty is the only alternative.
Will an AI treaty lead to global authoritarianism? Three key objections answered
This is the fear the whole chapter has been circling: the "human power grab" that Karnofsky flags and the "Antichrist" that Thiel warns of are two names for the same worry — that a bold AI treaty, led by two increasingly autocratic powers, ends in a durable global tyranny. It is the deepest reason most AI-safety experts and lab CEOs hesitate, and it underlies both of Anthropic's preconditions: "truly reliable verifications first" and "lead through an alliance of democracies." Both rest on a view — uniquely common in the US — that strong global governance institutions reduce democracy and freedom rather than increase them. We think the opposite is true, and the three objections below explain why.
The strongest objections to a global AI treaty deserve direct answers. We address each in detail in various chapters of our Strategic Memo v2.6, but here's the core logic:
1) "Treaties have a terrible track record."
History shows that consequential treaties, such as those on nuclear weapons and climate, take forever, often fail, and, when they succeed, are much lighter and less enforceable than they should be. Some positive examples, such as the Chemical Weapons Convention and the Montreal Protocol, dealt with radically simpler, less competitive, and less controversial issues. Yet within a few months in 1787, 13 loosely confederated US states created a highly successful federal constitution. Furthermore, the political will for extraordinarily bold treaties can emerge with shocking speed: the Baruch Plan went from concept to a UN vote within months in 1946, before being vetoed.
We could avoid those failures by having the US and China lead with a temporary emergency bilateral treaty, lead in building treaty-enforcement and communication infrastructure, and then, in coordination with most middle powers, call for a "realist" constitutional convention model. This model would weight votes by GDP and technological proficiency, rather than one-nation-one-vote or population, to enable agreement among asymmetric powers while avoiding the veto trap that killed the original Baruch Plan.
(→ Strategic Memo v2.6, A Treaty-Making Process that Can Succeed, pp. 103–109)
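To make the weighted-voting idea concrete, here is a hypothetical Python sketch. The memo does not specify a formula here; the 50/50 blend of GDP share and technology share (`alpha`), the two-thirds threshold, and the five unnamed members with invented shares are all illustrative assumptions, not the memo's proposal. The sketch also checks which members could block a measure on their own, one simple way to test whether a weighting scheme quietly reintroduces a de facto veto.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    gdp_share: float   # hypothetical share of participants' combined GDP (sums to 1)
    tech_share: float  # hypothetical share of a tech-proficiency index (sums to 1)


def voting_weights(members, alpha=0.5):
    """Blend GDP and technology shares; alpha is the weight given to GDP."""
    return {m.name: alpha * m.gdp_share + (1 - alpha) * m.tech_share for m in members}


def passes(weights, yes_votes, threshold=2 / 3):
    """A measure passes if 'yes' voters hold at least `threshold` of total weight."""
    yes_weight = sum(w for name, w in weights.items() if name in yes_votes)
    return yes_weight >= threshold * sum(weights.values())


def can_block_alone(weights, threshold=2 / 3):
    """Members whose weight alone exceeds 1 - threshold, i.e. a de facto veto."""
    total = sum(weights.values())
    return [name for name, w in weights.items() if w > (1 - threshold) * total]


# Invented figures for five unnamed members; not real-world data.
members = [
    Member("A", 0.30, 0.35),
    Member("B", 0.25, 0.30),
    Member("C", 0.20, 0.15),
    Member("D", 0.15, 0.12),
    Member("E", 0.10, 0.08),
]
weights = voting_weights(members)

print(passes(weights, {"A", "B"}))       # False: 0.600 of the weight, below 2/3
print(passes(weights, {"A", "B", "C"}))  # True: 0.775 of the weight
print(can_block_alone(weights))          # []: no member exceeds 1/3 on its own
```

With these invented numbers, the two largest members together cannot carry a vote without at least one smaller partner, and no single member can block alone; different weights or thresholds would change both results.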
2) "An AI treaty led by autocratic superpower leaders will lead to an autocratic global government: a 'human power grab'"
This is the concern that blocks most AI safety experts and most CEOs of top AI firms. A bold and sweeping AI treaty — necessarily spearheaded by the two AI superpower leaders, who show increasing autocratic tendencies — could produce an immense and durable concentration of power in one or a few entities, potentially unwise and/or unaccountable.
This fear, extremely well grounded, outweighs concern about AI safety risks (even loss of control) for two extremely influential thinkers at opposite ends of the AI political spectrum: Anthropic's Holden Karnofsky, who calls it "human power grab" risk, and Peter Thiel, who refers to it as the risk of an Antichrist.
As we argued above, the Pentagon-Anthropic crisis has shown this concentration of power is already happening — not because of a treaty, but precisely because we lack one. Without a reliably enforceable global agreement, the US government can justify — legally, morally, and politically — the progressive nationalization of private AI firms, citing the imperative of staying ahead of China, a justification that only strengthens over time, whether the underlying threat is real, inflated, or some mix of both.
This is no longer hypothetical. Last month, Defense Secretary Hegseth threatened to invoke the Defense Production Act — a wartime instrument that lets the government commandeer a private firm's R&D, goals, and product priorities — an authority legal scholars argue could plausibly extend to stripping safety guardrails from AI models, or even compelling retraining. The trajectory it previews (long predicted by Aschenbrenner) is the rapid, de-facto nationalization and militarization of private AI firms: not just military applications, but the training of next-generation models, the principles governing civilian systems, and the alignment research labs conduct internally. Two weeks in, the Department of War's CTO hinted at applying the Act in full to AI firms — pointing to the need for companies like Anthropic to train models on a spec that permits whatever uses the Department later deems "lawful."
None of this requires a bad president. It happens because the US–China AI arms race — real, perceived, or exaggerated — pressures any administration to sacrifice liberal-democratic values to a vital threat, much as both parties did after 9/11. This is exactly why the reasoning many in our community have adopted — that the ASI gamble is the least-bad option, that racing beats a treaty that concentrates power — no longer checks out: the concentration is arriving anyway, without the treaty's accountability.
In this scenario, a properly designed treaty may be the only thing that prevents the "human power grab" — by a few states or private leaders — that Karnofsky and Thiel both fear, and which is likely to be unstable, conflict-ridden, and dangerous.
A global AI treaty would reverse these dynamics. The pressures to centralize power, race irresponsibly, and maintain secrecy would all diminish — because such a treaty would require the buy-in of a large majority of middle-power nations, each demanding transparency and accountability as a condition of participation.
Paradoxically, the autocratic tendencies of superpower leaders could actually strengthen the case for decentralized governance. Xi would never accept a global organization that excessively intrudes on national sovereignty. Trump wouldn't either. Their nationalism, mutual distrust, and attachment to their own power would push the treaty toward federalist, subsidiarity-based models — with enforcement mechanisms built on "zero trust" rather than goodwill.
Middle-power signatories would demand genuinely reliable verification systems, precisely because they do not trust one another. And there is a deeper structural argument: the US and China are too civilizationally different to sustain a joint authoritarian order even if they wanted to.
Hence, distrust, nationalism, and authoritarianism would paradoxically make the resulting treaties more trustworthy and resilient, because they would be built on mistrust (as per the trustless computing concept). This contrasts with the failing model of treaty-making that relies on transient moments of trust between leaders, such as the agreements Gorbachev and Reagan signed on the strength of their personal rapport. Praiseworthy as they were overall, those treaties fell far short of what was needed and proved neither resilient nor durable, precisely because they were premised on trust. Instead of "trust but verify", we'll need "trust or verify", and that will be a strict precondition for nations, firms, and citizens to trust such a treaty.
(See Strategic Memo v2.6, The Global Oligarchic Autocracy Risk—And How It Can Be Avoided, pp. 124–130)
For AI lab leaders specifically, a treaty also offers a tangible strategic advantage. A 'realist constitutional convention' model — with voting weighted by GDP and technological proficiency — would initially freeze the current distribution of power among leading firms and nations, protecting their positions while negotiations proceed. This is far better than the alternative: progressive nationalization that strips private firms of autonomy, IP, and economic value.
Over many years of public statements, most AI lab leaders (Altman, Amodei, Hassabis, Suleyman) have consistently supported global democratic governance of AI, and that record will count in the process. They may not mean it, but long-held, deeply entrenched public commitments carry weight either way. And the enforcement architecture we detail, built on federated secure multi-party computation and decentralized kill-switches that require multinational consensus, cannot be weaponized by any single actor. We acknowledge this is a theoretical prediction, not an empirical observation, but the structural incentives are strong.
(See the sections of our Strategic Memo v2.6 on each of these points, starting on p. 170.)
3) "Enforcing an ASI treaty will result in a substantial or radical reduction of human freedoms."
We already live under an extremely extensive surveillance regime run by superpower security agencies and corporations, with minimal accountability. It became entrenched in the decade following 9/11 and, unfortunately, is now an accepted price of living in an anarchic world with powerful state enemies and highly capable terrorist organizations. AI is already being deployed to radically expand these surveillance and manipulation powers, with the same justifications. The emerging digital Cold War between the two technological superpowers keeps the world in what Nick Bostrom calls a "semi-anarchic default condition."
Yet when nations are forced to come together urgently to face an immense shared threat (as the 13 US states did in 1787, confronted with the military and economic threats of a world dominated by aggressive empires), each has a vital interest in creating an infrastructure of real transparency, accountability, and subsidiarity at the technical, socio-technical, and governance levels. Much of the necessary toolset already exists: decentralized, verifiable trust technologies. But tools alone are not enough, and today's tools are neither trustworthy enough nor sufficiently trusted by all sides.
What makes accountability likely is the process itself — an elbow-to-elbow buildout of the most critical treaty enforcement mechanisms, where mutual distrust between superpowers becomes the engine of transparency, not just toward each other, but toward their own citizens and communities.
Picture something as deeply collaborative as Crypto AG, but with extensive transparency and accountability built in for citizens, middle powers, and other nations, which would surely demand it as a condition for signing and complying with the treaty. (The firm demands of just a few of the 13 states ratifying the 1787 US Constitution forced the inclusion of the Bill of Rights.) A transnational, jointly built legal surveillance architecture of this kind would be highly likely to be accountable and beneficial.
(→ Strategic Memo v2.6, A Treaty Enforcement that Prevents both ASI and Authoritarianism, pp. 130–139)
A trustworthy treaty and its enforcement mechanisms must be built together — and Anthropic can play a key role
The deepest disagreement is not whether the bar should be high, but what kind of problem it is to clear it. Anthropic treats "truly reliable verification" largely as a scientific and R&D problem — something to be discovered, proven, then deployed. We think that framing is the error. Verification at treaty scale is overwhelmingly a joint US–China–world engineering, socio-technical, standards-setting, and scientific-diplomacy problem — not one that mathematics, or any single lab, can solve alone. Two reasons:
First, the target is not only objective reliability but also perceived reliability. A verification regime works only if each superpower, its security agencies, the labs, and the public all trust it enough to stake bans on it. That trust is co-produced at the table; it cannot be proven into existence by one side and handed to the other. Mathematical guarantees, whatever their elegance, never carry that load on their own.
Second, it is very likely achievable in a short time — if, and only if, the US and China build it together, at wartime pace: dedicating tens of billions and their deepest experts, pointing frontier AI at the verification problem itself, and standing up high-bandwidth political and scientific channels between the two governments now. Building the first mechanisms jointly is not a nicety; it is the mechanism of trust. These systems will need constant revision by everyone they bind — superpowers, labs, and independent citizen assemblies — and a regime no party helped build is a regime no party will believe.
This reframes Anthropic's own standard into a call to action rather than a reason to wait. If verification is co-built, then the surest route to "truly reliable verification" is to start the treaty process that produces it — and the actor best placed to convince Trump, Xi's counterparts, and the other labs to start is Anthropic.
Suggestions for a major new Anthropic line of work
Anthropic and other entities should definitely invest ever more in trustworthy treaty-enforcement technologies and in trying to get ASI right. But, based on the above, we believe Anthropic should also reallocate at least 10–20% of its political, talent, economic, and communication capital to the following activities:
Urge Trump and Xi to open treaty negotiations now and to jointly build the standards and technologies elbow-to-elbow — as the CIA and BND did with Crypto AG — since only co-construction yields the shared trust the bans require.
Deeply analyze, and lobby hard for, the fine details of the treaty-making process (participants, secrecy, procedure), which will shape the governance mechanisms that decide how the socio-technical systems for treaty enforcement evolve (both the operative and the "suspended" ones).
Work to ensure that national secrecy does not hamper effective joint multinational scientific-diplomatic work, and that fitting transparency and participation measures safeguard citizen accountability and individual liberties throughout the whole process.
Make it extremely clear, throughout this new line of work, that Anthropic does not support just any treaty or treaty-making process, only those that meet certain minimum requirements. This protects against unwanted outcomes should the effort not produce the desired results.
Which actions by an AI lab leader or founder carry the most agency to foster a positive outcome?
The considerations above should matter to those reading this who lead, co-lead, or work at AI labs.
Publicly, the top AI labs (and their leaders, founders, and leading scientists) appear to the mainstream media to hold all the power to shape the future. Yet, if we accept even part of the arguments of the previous chapter, under every ungoverned-AI scenario their actual power to determine how it goes is close to zero, no greater than yours or mine, and the most thoughtful people in these companies increasingly know it.
The values you are trying to instil very likely won't survive contact with ASI; the government may take the wheel on those values regardless; and the upside you are racing toward may be unconscious, or conscious and miserable.
The one lever that is still genuinely in your hands is to help bring the treaty into being now, while co-leading it is still possible. That lever does not require you to leave your company, break with your colleagues, or concede the verification debate. It requires backing the groundwork — the convenings, the track-2 channels, the enforcement-design work — that has to exist before any president can say yes.
That is precisely the work this Coalition does, and the narrowest, highest-leverage place an individual who shares these worries can act.
How we can help
This is, in the end, the conversation the Anthropic Institute has invited. As it recently wrote, it intends to "organize conversations where policymakers, researchers, civil society, and other AI companies can help answer some of the questions this piece raises, especially around full recursive self-improvement and how to create better options for coordination."
We are built to do exactly that work, and we bring one thing the existing safety field largely lacks: intelligence-grade IT-security and enforcement expertise, developed over a decade through the Trustless Computing Association, on precisely the hardware-trust, monitoring, and verification problems any treaty must solve.
We can help you foster convergence with other key potential influencers of Trump's AI policy via our 350+ page Strategic Memos v2.6 and 2.8, our network, our outreach, and a series of DC-based closed-door convenings — beginning with The Cooperation of Statesmen, Washington, DC, September 15–16.