After the Pentagon-Anthropic Clash, the ASI Gamble No Longer Checks Out
By Rufo Guerreschi — March 4th, 2026
The Pentagon-Anthropic clash isn't a one-off dispute — it's a preview of where the current trajectory leads: progressive nationalization and militarization of AI firms, the erosion of alignment work, and the elimination of AI lab leaders' ability to even try to steer the ASI race toward good outcomes. The feared "human power grab" isn't coming because of a global AI treaty — it's already happening because we lack one. Paradoxically, a treaty led by distrustful superpower leaders would produce more resilient, decentralized governance — precisely because mutual suspicion demands actually enforceable terms. Our Deal of the Century initiative targets the most neglected chokepoint in AI governance: generating political will among the ~12 individuals who hold disproportionate influence over Trump's AI policy.
Many AI safety experts and leaders of top AI firms have been calling for strong global coordination and an AI treaty. But a central concern — especially among those very leaders — is that the necessary boldness of such a treaty, and the fact that it would inevitably need to be spearheaded by two "strongmen" superpower leaders, would lead to an immense and durable concentration of power.
This fear of an unaccountable global authority — often described as a "human power grab" risk by Anthropic's Holden Karnofsky or as the risk of an Antichrist by Peter Thiel — frequently outweighs the perceived risks of losing control of AI, and the consequent risk of extinction, which both acknowledge.
This fear is well-grounded — and it is leading a growing number of people in Silicon Valley, as Karnofsky among many others has acknowledged, to conclude that taking the Superintelligence gamble is the least bad option. This, even though they assign high probabilities to extinction and a low probability that whatever values they embed in an ASI will persist long-term — assuming they get there before others do. Karnofsky himself, one of the originators of the RSP concept, published a detailed case for abandoning binding safety commitments on the same day Anthropic released its RSP v3.0.
The Pentagon-Anthropic crisis has changed this calculus fundamentally.
Last week, Trump ordered every federal agency to "immediately cease" using Anthropic's technology. Defense Secretary Hegseth designated the company a "Supply-Chain Risk to National Security" — a label previously reserved for foreign adversaries — because Anthropic refused to allow unrestricted military use of its AI for mass domestic surveillance and fully autonomous weapons. Hegseth had threatened to invoke the Defense Production Act, an authority that legal scholars argue could extend to compelling the retraining of AI models themselves — stripping safety guardrails not just from contracts, but from training.
Dean Ball, one of the primary authors of the White House AI Action Plan, called this "the quasi-nationalization of a frontier lab." Thomas Wright of Brookings summarized the message to Silicon Valley: if you disagree with the government, "we will either partially nationalize your company through the Defense Production Act, or we will try to blacklist and ruin your company."
A Human Power Grab is not Happening because of an AI Treaty but because of its Absence
While these developments are surely a matter of concern, I'd argue that, taken together, they show how the feared concentration of power is already occurring without a treaty, and primarily because we lack one.
This isn't just about one contract. AI lab leaders now face an unprecedented erosion of their agency on three fronts simultaneously.
First, values embedded in ASI will most likely not persist — even if developers embed ethical principles at the "seed AI" stage, there is no strong reason to expect those values to endure after ASI has rewritten itself through recursive self-improvement countless times. We estimate a 40-70% probability they'll be discarded.
Second, even perfectly aligned AI can't prevent governments from deploying it for authoritarian ends — alignment solves for the AI's behavior, not for the uses to which aligned AI is put by those in power.
Third, and most urgently, labs are increasingly unlikely to even get the chance to try proper alignment at all.
A DoD official told Axios the government could "force Anthropic to adapt its model without any safeguards." The chilling effect is already industry-wide: xAI, OpenAI, and Google all signed "any lawful use" deals within days.
This dynamic leads to the militarization of top AI firms — one of OpenAI's five board members is the former director of the NSA — and makes their eventual de facto nationalization increasingly justifiable, even inevitable, as Leopold Aschenbrenner has predicted. It could happen within months. Pressures like those the Pentagon has begun applying are bound to deeply compromise AI alignment work. As Helen Toner noted: "One thing the Pentagon is very likely underestimating: how much Anthropic cares about what future Claudes will make of this situation."
In the absence of an AI treaty, the cascading effects are already visible in the erosion of voluntary safety commitments. On February 24 — the same day as the Pentagon's DPA threat — Anthropic released RSP v3.0, dropping its core pledge to pause model training if safety measures couldn't be guaranteed in advance. Anthropic Chief Science Officer Jared Kaplan told TIME: "We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments…if competitors are blazing ahead." This is exactly the dynamic a treaty would solve: in an ever more brutal race among nations and firms, unilateral safety commitments become untenable. Only binding multilateral commitments can hold.
The ASI Gamble No Longer Checks Out
Many in the EA and AI safety communities have concluded that the ASI gamble is the least bad option — better a coin-flip than a treaty that concentrates power in an unaccountable global authority. We grapple with this dilemma daily, and sympathize deeply with these concerns. Yet, such reasoning had major issues well before the Pentagon-Anthropic clash.
This reasoning rests on four probability estimates that, we argue, are systematically underweighted: (1) ASI leads to human extinction (our estimate: 25-50%; the largest-ever survey of AI researchers found a mean of ~14%, and top figures like Hinton, Musk, and Amodei place it at 20-50%); (2) ASI will be unconscious (30-60%); (3) if conscious, ASI will be unhappy (30-60%); (4) ASI will discard its creators' embedded values after recursive self-improvement (40-70%).
These risks compound. Even taking the most optimistic bound of each — 75% survival, 60% values stick, 70% consciousness, 70% happiness — the joint probability of everything going right is only ~22%. The probability of at least one catastrophic outcome: ~78%. And the stakes are radically asymmetric: getting it wrong means the permanent end of conscious life as we know it. (See Strategic Memo v2.6, pp. 159-170)
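To make the compounding explicit, here is a minimal sketch of the arithmetic. The probabilities are this post's own illustrative estimates, and treating the factors as an independent chain mirrors the informal reasoning above rather than any formal model:

```python
# Joint probability that the ASI gamble "goes right", multiplying the
# most optimistic bound of each estimate cited above. These inputs are
# this post's illustrative figures, not established probabilities, and
# the factors are treated as an independent chain for simplicity.
p_survive   = 0.75  # 1 - 0.25, low end of the 25-50% extinction estimate
p_values    = 0.60  # 1 - 0.40, low end of the 40-70% value-discard estimate
p_conscious = 0.70  # 1 - 0.30, low end of the 30-60% unconsciousness estimate
p_happy     = 0.70  # 1 - 0.30, low end of the 30-60% unhappiness estimate

p_all_good = p_survive * p_values * p_conscious * p_happy
print(f"P(everything goes right)     ~ {p_all_good:.0%}")      # ~22%
print(f"P(>= 1 catastrophic outcome) ~ {1 - p_all_good:.0%}")  # ~78%
```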
A proper global AI treaty is the only mechanism that changes these odds — by preventing the race dynamics that make unilateral safety commitments untenable, and by creating the governance infrastructure that no single company or country can build alone.
Treaties built on mutual suspicion are more durable than those built on personal chemistry
A treaty-making process led by "strongmen" superpower leaders would, counterintuitively, be more likely to produce a resiliently decentralized governance regime. Such leaders are above all attached to their own power and national sovereignty — they would never accept a global treaty-organization that is overly empowered or intrusive in their national prerogatives.
This contrasts sharply with the historical model of treaties built on personal chemistry between leaders — like those between Gorbachev and Reagan. While praiseworthy, those agreements proved neither sufficient nor durable, precisely because they were premised on trust. Instead of "trust but verify," the AI treaty we need demands "trust or verify": a strict requirement for nations, firms, and citizens before they place confidence in any agreement.
Enforcement architectures based on zero-knowledge proofs, federated secure multi-party computation, and decentralized kill-switches requiring multi-nation consensus cannot be weaponized by any single actor.
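As a purely illustrative sketch (not a design from the Strategic Memo), the multi-nation consensus property can be modeled as a k-of-n threshold: no single actor can trigger the switch, and, where k < n, none can unilaterally block it either. A real architecture would rely on threshold cryptography and verifiable computation rather than a trusted counter:

```python
from dataclasses import dataclass, field

@dataclass
class DecentralizedKillSwitch:
    """Toy k-of-n kill-switch: activation requires approvals from at
    least `threshold` of the registered parties, so no single nation
    can weaponize it (and none can unilaterally veto it if threshold < n)."""
    parties: set        # participating nations
    threshold: int      # approvals required to trigger, e.g. 4 of 5
    approvals: set = field(default_factory=set)

    def approve(self, party: str) -> None:
        if party not in self.parties:
            raise ValueError(f"unknown party: {party}")
        self.approvals.add(party)

    @property
    def triggered(self) -> bool:
        return len(self.approvals) >= self.threshold

# Hypothetical parties and threshold, for illustration only.
switch = DecentralizedKillSwitch(
    parties={"US", "China", "EU", "India", "Brazil"}, threshold=4
)
switch.approve("US")
switch.approve("China")
print(switch.triggered)  # False: two powers alone cannot trigger it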
Furthermore, such a treaty would reduce the pressure to centralize power and maintain national or bilateral secrecy, as it would require the participation, trust and oversight of a large majority of middle-power nations. (See Strategic Memo v2.6, The Global Oligarchic Autocracy Risk—And How It Can Be Avoided, and pp. 124–130)
From Assurance Contract to Treaty: A Concrete Mechanism
To break the current deadlock, even just two CEOs — from Anthropic, Google DeepMind, or OpenAI — could negotiate and present to Trump an assurance contract, as proposed by FLI's Anthony Aguirre: a mechanism whereby top AI labs formally commit to specific enforceable safeguards — such as bans on certain recursive self-improvement methods — binding only once a critical mass of other major players do the same.
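A minimal sketch of the assurance-contract logic, assuming a hypothetical three-signatory threshold and illustrative lab names that are not terms of Aguirre's actual proposal: each commitment is escrowed and becomes binding only once a critical mass of peers has signed, so no lab takes on a unilateral competitive disadvantage.

```python
class AssuranceContract:
    """Toy model of an assurance contract: safeguards are escrowed and
    become binding only once `critical_mass` labs have signed."""

    def __init__(self, safeguards: list, critical_mass: int):
        self.safeguards = safeguards
        self.critical_mass = critical_mass  # hypothetical threshold
        self.signatories: set = set()

    def sign(self, lab: str) -> None:
        self.signatories.add(lab)

    @property
    def binding(self) -> bool:
        # Commitments stay dormant below critical mass, so early
        # signatories bear no unilateral competitive disadvantage.
        return len(self.signatories) >= self.critical_mass

contract = AssuranceContract(
    safeguards=["ban on specified recursive self-improvement methods"],
    critical_mass=3,  # illustrative, not part of the actual proposal
)
contract.sign("Anthropic")
contract.sign("Google DeepMind")
print(contract.binding)  # False until a third major lab signs
```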
But to succeed, we propose that such an assurance contract also include a specific call to Trump to pursue a US-China-led AI treaty with minimum requirements for both the treaty and the treaty-making process — to ensure the result is substantially better than just risking the ASI gamble. (See our case for AI experts for the full argument.)
A Critical Mass of Influencers is Needed to Sway Trump
To gain enough political power to do so, such CEOs should aim to join forces with other key potential influencers of Trump's AI policy who are sympathetic toward a treaty and/or deeply concerned about safety risks.
As we documented in our 356-page Strategic Memo, these include Vance, Bannon, Pope Leo XIV, Gabbard, Sutskever, Carlson, and Rogan — and could potentially extend to Sacks (who has stated he thinks about loss of control "all the time") and even Kratsios. (See Strategic Memo v2.6, Persuading Trump's AI Influencers, pp. 170–323)
In our vision, together they should pitch a bold, US-China-led global AI treaty directly to President Trump — similarly to how Oppenheimer, Acheson, and others led Truman to present the Baruch Plan — history's boldest treaty proposal, for nuclear weapons — which was unveiled on the very day Donald Trump was born.
These lab leaders and influencers would present a pitch to Trump for "The Deal of the Century" on purely pragmatic terms, as a means to: (a) reliably future-proof American leadership, (b) prevent Chinese dominance, (c) avert a potential safety catastrophe for the world and his family, and (d) avoid the trap of a centralized, unaccountable global bureaucracy and autocracy.
At a time when his ratings are at their lowest, and US voters are increasingly terrified of AI, Trump could leave a legacy "worth 100 Nobel Peace Prizes", enabling him to quietly retire in 2029 widely cherished around the globe.
A Treaty Enforcement that Prevents both ASI and Authoritarianism
For this treaty to be successful — and to reduce the overall global concentration of power and wealth — its proposers should set out bare-minimum requirements for both the treaty-making process and the final agreement. To start, even before treaty negotiations begin, the process should immediately launch the joint development, at wartime speed, of mutually trusted, beyond-state-of-the-art treaty enforcement mechanisms and diplomatic communications.
After an initial framing by the US and China — and an emergency bilateral interim treaty — the treaty-making process must involve the unique expertise of superpower security agencies, religious institutions, and independent experts, along with, at a minimum, most middle-power nations. It must also safeguard the long-term innovation capacity and economic value of leading AI firms. The treaty should adhere to a subsidiarity-based model that is federalist and decentralized, ensuring it does not become a tool for a "human power grab" or global autocracy.
Most importantly, the process must use treaty-making models that avoid the failures of past multilateral efforts: preventing capture by one or two nations, eliminating the veto, and delivering a competent, improvable treaty within a predictable timeframe.
Among the possible models, we believe the constitutional convention model may be the best. This model — inspired by the 1787 US Constitutional Convention, as suggested by Sam Altman in 2023 — is the only one that can prevent the fatal use of the veto, deliver an extremely wide-scoped and fair treaty in a short and predictable time, and ensure resilient subsidiarity terms. The model will be amended to be realistic: voting will initially be weighted by GDP to secure and future-proof US and China leadership, while still preventing a global duopoly, as in the toy sketch below. (See Strategic Memo v2.6, A Treaty Enforcement that Prevents both ASI and Authoritarianism, pp. 130–139)
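A toy illustration of capped, GDP-weighted convention voting. Every number here (the shares, the 25% cap, the two-thirds passage rule) is a hypothetical placeholder, not a term from the Strategic Memo:

```python
# Toy GDP-weighted convention vote. All figures are illustrative
# placeholders, not proposed treaty terms.
gdp_shares = {   # hypothetical shares of participating-bloc GDP
    "US": 0.30, "China": 0.25, "EU": 0.20,
    "India": 0.10, "middle_powers": 0.15,
}
WEIGHT_CAP = 0.25       # cap on any single party's voting weight
PASS_THRESHOLD = 2 / 3  # supermajority needed to adopt a provision

# Cap each share, then renormalize so the weights sum to 1.
capped = {p: min(s, WEIGHT_CAP) for p, s in gdp_shares.items()}
total = sum(capped.values())
weights = {p: w / total for p, w in capped.items()}

def passes(yes_votes: set) -> bool:
    """A provision passes when the yes-voters' combined weight
    reaches the supermajority threshold."""
    return sum(weights[p] for p in yes_votes) >= PASS_THRESHOLD

# The cap prevents a US-China duopoly from passing measures alone...
print(passes({"US", "China"}))  # False (~53% of weight)
# ...but a broader coalition that includes middle powers can.
print(passes({"US", "China", "EU", "middle_powers"}))  # True (~89%)
```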
Why This Is the Highest-Leverage Neglected Intervention
No other organization is primarily focused on generating political will among the ~12 key potential influencers of Trump's AI policy to pursue a bold, timely, and proper global AI treaty. The AI governance field almost entirely neglects this chokepoint.
In 15 months, on just $75K, we've built a 356-page Strategic Memo synthesizing 667+ sources, engaged 85+ high-value contacts including 23 AI lab officials, and established direct pathways to 2 of 10 primary target influencers — at roughly $180 per high-value meeting. With 2-3 dedicated hires ($150K-$250K), we can leverage AI tools to transform this treasure trove into systematic persuasion at scale within weeks. (See 2026 Roadmap)
We are seed-funded by Jaan Tallinn's Survival and Flourishing Fund and have pending applications with EA Funds, Coefficient Giving, and other institutional funders. Bridge funding of $50K-$150K would extend our runway through the critical June 2026 convening window.
The Path to Rome – Bridging the Gap for the "Deal of the Century"
At the Coalition for a Baruch Plan for AI, we are actively working to build exactly the political bridge that an assurance contract would need. Our Deal of the Century initiative targets the convergence of key influencers — from AI lab CEOs to advisors to JD Vance, who has shown clear deference to the Pope on AI ethics — around a joint pitch for Trump to co-lead a US-China AI treaty.
A central step is our consultations leading up to our closed-door 1st Rome Convening of The Deal of The Century on June 4-5, 2026. These meetings aim to catalyze a "humanist AI alliance" that cuts across conservative, techno-humanist, and post-humanist camps among a critical mass of key potential influencers of Trump's AI policy, their advisors, introducers, and envoys.
We are counting on the Vatican's unique aggregating role. Pope Leo XIV has positioned AI as central to his papal mission, and his main AI advisor Paolo Benanti has led a call, with top AI scientists, for a bold AI treaty. The Pope's moral authority can help counter the accelerationist consensus that dominates Silicon Valley but commands only minority support among US voters and MAGA leaders.
The political window is real but narrow. Support our work, join as an advisor, or — the single highest-value contribution — introduce us to our target influencers or their circles. Success is uncertain, but the alternative — watching the most dangerous gamble in history play out while a tractable intervention goes unfunded — is unconscionable.