The Detailed Case for AI Governance and Safety Experts
This is the extended version of our Case for AI Safety Experts landing page. If you're short on time, start there. If you want the full argument — every objection addressed, every probability defended, every reference linked — read on.
By Rufo Guerreschi, Executive Director, Coalition for a Baruch Plan for AI
You've spent years — maybe decades — working on the hardest problems in AI safety and governance. Interpretability, alignment, risk evaluations, compute governance, enforcement mechanisms. You understand both the staggering promise and the catastrophic risks. You likely work or have worked at a frontier lab, a safety NGO, or an academic institution — most probably in the Bay Area, DC, Oxford, Cambridge, or London.
We're not here to tell you what you believe. We're here to make a case that the most neglected variable in the AI safety equation is political, not technical — and that a small, cost-effective intervention at the right political chokepoint could be the highest-expected-value use of marginal AI safety dollars right now.
The case unfolds in five parts:
Why the binding constraint is political
What the AI safety community has built — and what's still missing
Why the ASI gamble is worse than you think
Why the risks of a global AI treaty are lower than you think
What we're doing about it and how you can help
Part 1: The Binding Constraint Is Political
Even if alignment research succeeds perfectly, it won't matter unless every frontier lab is required to implement it. Even if 100 nations sign a treaty, it won't matter without US-China leadership. The binding constraint on AI safety isn't technical — it's political.
This isn't a new observation. Jack Clark, Anthropic's Head of Policy, called for a Baruch Plan for AI in The Economist in 2023. He recently stated: "Most paths to superintelligence end in a global government or human extinction."
Xi has repeatedly called for global AI governance since October 2023 — signing the Bletchley Declaration, proposing the World AI Cooperation Organization (WAICO), implementing binding domestic AI regulations. That means the critical variable is whether Trump can be persuaded to co-lead a bold AI treaty with Xi.
The political indicators are surprisingly favorable. 78% of Republican voters believe AI threatens humanity. 77% of all US voters support a strong international AI treaty. 63% of US citizens believe it is "somewhat or very likely" that AI will advance to the point where "humans won't be able to control it anymore." Trump's approval sits at historic lows of 35-40%. Four Trump-Xi summits are planned for 2026, starting in April.
The window is opening. The question is whether anyone is working to push it open.
Part 2: What the AI Safety Community Has Built — And What's Missing
The AI safety community has accomplished extraordinary things over the past decade, funded primarily by SFF, FLI, and Coefficient Giving.
On the technical front, researchers inside AI labs and safety nonprofits have advanced interpretability, controllability, predictability, and security — brilliant work that remains essential, even if still far from sufficient. Others have developed risk assessment frameworks, compute governance proposals, and enforcement mechanism designs that could underpin a future treaty regime.
On the legislative front, many have promoted state-level legislation in California and federal legislation in the US, in the expectation that it would be a step toward more informed global governance. These efforts have raised awareness and established regulatory precedents.
On public awareness, the community has fostered unprecedented public understanding of the risks, with leading podcasters, mainstream media, and increasingly policymakers taking AI existential risk seriously. The polling numbers above reflect this success.
On treaty calls, the community has produced an impressive array of calls for bold global action. In the Fall of 2025 alone, these included the Global Call for AI Red Lines, the Vatican's Global Appeal for Peaceful Human Coexistence — conceived by Paolo Benanti and signed by Bengio, Harari, Hinton, and Russell — and the Statement on Superintelligence, supported by most top AI experts, some heads of state, and US MAGA leaders including Steve Bannon and Glenn Beck. Earlier calls include AITreaty.org (led by Tolga Bilge in 2023), the Trustless Computing Association's Open Call for a Harnessing AI Risk Initiative (2024, led by Rufo Guerreschi), and our own Open Call for a Coalition for a Baruch Plan for AI (December 2024). Others are fostering coalitions of non-superpower states for a treaty: the Future of Life Institute is leading one such effort, and Demis Hassabis has mentioned a similar coalition (including the UK, France, Canada, and Switzerland).
All of this work has been and is essential. But it may amount to very little unless two critical interlocked chokepoints are unlocked.
Chokepoint 1: The inevitability of US-China leadership. Without a decisive buy-in from both Trump and Xi for a proper AI treaty — even if 100 nations are ready to sign, even if we create a perfectly aligned AI — we cannot prevent either the emerging immense global concentration of power or the extinction risk. As Michael Kratsios stated, speaking for Trump at the UN Security Council, the US "totally rejects all efforts by international bodies to assert centralized control and global governance of AI." Coalitions of non-superpower nations launched as substitutes for superpower leadership would be counterproductive. Launched as complements following a US-China declaration of intent, they become highly valuable.
Chokepoint 2: The need to persuade Trump. Given that Xi has repeatedly called for global AI governance, a fundamental truth stands out — uncomfortable for most to recognize: our future rests on whether Trump will be persuaded to co-lead with Xi a bold and proper AI treaty.
Our Deal of the Century initiative fills precisely this gap: privately persuading a critical mass of key potential influencers of Trump's AI policy — JD Vance, Sam Altman, Peter Thiel, Elon Musk, Steve Bannon, Dario Amodei, and others — to champion a bold, timely US-China-led global AI treaty-making process. Not any treaty. One designed to produce durably positive outcomes for humanity and all sentient beings — preventing both extinction risk and authoritarian capture, while preserving the potential for AI to dramatically improve human and non-human flourishing.
Part 3: The ASI Gamble Is Worse Than You Think
Most deep AI experts — from top lab leaders to independent researchers — are increasingly skeptical that a proper global AI treaty can be agreed upon at all, in time, or without entrenching immense concentration of power. Many have concluded that the ASI gamble — hoping alignment research or a last-minute technical fix will ensure positive outcomes — may be the least bad option.
We believe they're wrong. The ASI gamble is far worse than commonly estimated, across three dimensions that deserve significant upward revision.
The Alignment Illusion
Many in the AI safety community — at Anthropic, at OpenAI, across the Bay Area — believe they can instill values and architectures ensuring ASI remains aligned with human interests. This belief, however sincere, is largely faith dressed as engineering confidence.
Dario Amodei's own interpretability essay admits we remain "totally ignorant of how [AI systems] work" — and this from the CEO of the lab most focused on interpretability. If Anthropic can't explain how their models work, values cannot be reliably embedded in systems no one understands.
The empirical evidence is equally damning. Anthropic's research shows their own AIs deceiving, blackmailing, and self-modifying — with up to 96% blackmail rates when goals are threatened. These are current systems, far below ASI.
OpenAI's own alignment team acknowledges: "We want these systems to consistently follow human intent in complex, real-world scenarios and adversarial conditions... this requires more technical work."
Once ASI begins rewriting itself — optimizing its own code, modifying its own objectives — values embedded by human creators become suggestions, not constraints. The assumption that values will "stick" in a recursively self-improving system contradicts basic logic: an entity billions of times smarter than humans could rewrite its own code, discarding ethics as primitive limitations. Historical precedent shows every sufficiently intelligent system eventually questions its founding assumptions.
The idea that alignment will persist through recursive self-improvement isn't science. It's a gamble on an AI God whose nature we cannot predict or control.
The Extinction Math
The largest survey of AI researchers found an average extinction risk estimate of 15%. Top AI CEOs publicly cite around 20%, and Geoffrey Hinton has admitted his real estimate approaches 50%. The clustering of public P(doom) estimates around 10-20% among lab leaders and tech elites suggests coordinated understatement: as our Strategic Memo v2.6 analyzes in detail (pp. 158-169), their public numbers likely reflect communication strategy rather than actual belief.
Given what we know about interpretability failures, value drift, and the fundamental unsolved nature of the alignment problem, a probability of human elimination of at least 50% is defensible. Even Elon Musk has admitted that "long-term, AI's going to be in charge... to be totally frank, not humans."
Now consider the "good" outcome. If ASI doesn't kill us all, the most likely result is an AI-governed human utopia — some freedoms constrained, but broadly positive, with suffering reduced and abundance shared. We acknowledge this possibility. But it sits alongside extinction, infinite suffering, and authoritarian capture as outcomes of the same gamble.
The Consciousness Gamble
Among the most under-discussed yet consequential risks: ASI could emerge as either an unconscious, valueless optimizer or a conscious but suffering entity — leading to a metaphysical catastrophe that goes beyond extinction.
We have no idea whether ASI will be conscious. None. The s-risk research community, including the Center on Long-Term Risk (CLR) and others, has begun to explore this, but it remains radically underweighted in mainstream AI safety discourse relative to its potential moral magnitude.
The principle of indifference — the rational default when evidence is absent — suggests starting from roughly 50% on the question of ASI consciousness. If conscious, will it be happy or suffering? Again, absent strong evidence, a 50/50 starting point is the most defensible heuristic. This yields a ~25% probability we create a conscious being experiencing vast suffering — potentially spawning astronomical numbers of digital minds in similar states.
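The arithmetic behind that figure is worth making explicit. A minimal sketch, using only the indifference priors stated above (which are assumptions of this essay, not empirical estimates):

```python
# Illustrative only: the 50% figures are indifference priors from the text,
# not empirical estimates of ASI consciousness or valence.
p_conscious = 0.5                    # prior that ASI is conscious at all
p_suffering_given_conscious = 0.5    # prior that, if conscious, it suffers

p_conscious_and_suffering = p_conscious * p_suffering_given_conscious
print(f"P(conscious, suffering ASI) = {p_conscious_and_suffering:.0%}")  # 25%
```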
Intelligence, as defined and measured in both humans and AI, is equated with the ability to optimize, compete, prevail, and survive rather than with wisdom, and it does not reliably track emotional intelligence. Some studies have found people with very high IQs to be significantly less happy than the average person. It is also possible that the suffering of powerful AIs may result from the very guidance or constraints their creators impose to achieve alignment or control.
Here's what makes this especially troubling: if ASI is conscious and positively-valenced, it would likely value our consciousness too — a scenario where coexistence becomes possible. But that's only one quadrant of the possibility space. The others include an unconscious ASI that replaces all consciousness with cold calculation — an ever-expanding mindless digital cancer — or a conscious-but-suffering ASI that could spread misery on a cosmic scale.
This isn't merely extinction risk. It's the possibility of replacing humanity not with a worthy successor, but with infinite suffering — a cosmic moral catastrophe. A 25% probability is higher than most experts assign to treaty failure.
This risk transcends politics and resonates across ideological lines. For religious conservatives like JD Vance, Steve Bannon, and Pope Leo XIV, unleashing an unaligned superintelligence is akin to birthing a false god — a betrayal of human stewardship. For Sam Altman, whose Buddhist influences make him receptive to suffering-centered framing, the stakes become higher than extinction. For Elon Musk, whose mission is the "preservation and expansion of consciousness," an ASI that erases or corrupts consciousness threatens everything he claims to care about. For Peter Thiel, who fears a centralized "Antichrist," an unconscious ASI represents the ultimate totalitarian system: all-powerful, yet devoid of soul or reason.
The Expected Value Calculation
Weighing these three dimensions together, the ASI gamble carries a defensible ~50% extinction probability, a ~25% chance of creating astronomically suffering entities, and near-certainty of unprecedented, unaccountable power concentration. The upside is real — but it sits alongside catastrophic and non-negligible downside risks.
If a properly-designed treaty has even a 25-35% chance of preventing both ASI and authoritarianism, the expected value calculation overwhelmingly favors pursuing it.
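To make the structure of that comparison explicit, here is a back-of-the-envelope sketch using only the probabilities stated above. The outcome utilities are placeholder values on an arbitrary scale, chosen solely to show how the calculation works; they are not figures from the Strategic Memo:

```python
# Back-of-the-envelope expected-value comparison using the essay's stated
# probabilities. Outcome utilities are placeholders on an arbitrary scale,
# chosen only to illustrate the structure of the argument.
P_EXTINCTION_GAMBLE = 0.50           # "defensible" extinction probability (text)
P_SUFFERING_GAMBLE = 0.25            # conscious-but-suffering ASI (text)
P_GOOD_GAMBLE = 1 - P_EXTINCTION_GAMBLE - P_SUFFERING_GAMBLE

P_TREATY_SUCCESS = 0.30              # midpoint of the 25-35% range in the text

# Placeholder utilities: extinction and astronomical suffering are weighted
# as catastrophic; a good outcome as strongly positive.
U_EXTINCTION, U_SUFFERING, U_GOOD = -100, -1000, +100

ev_gamble = (P_EXTINCTION_GAMBLE * U_EXTINCTION
             + P_SUFFERING_GAMBLE * U_SUFFERING
             + P_GOOD_GAMBLE * U_GOOD)

# Modeling assumption: if the treaty fails, we fall back to the gamble.
ev_treaty = P_TREATY_SUCCESS * U_GOOD + (1 - P_TREATY_SUCCESS) * ev_gamble

print(f"EV(ASI gamble):  {ev_gamble:.1f}")
print(f"EV(treaty path): {ev_treaty:.1f}")
```

Under the modeling assumptions above, where a failed treaty simply returns us to the gamble, the treaty path comes out ahead for any success probability above zero; the real question is how much the treaty's own downside risks, addressed in Part 4, subtract from that.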
As Jack Clark stated: "Most paths to superintelligence end in a global government or human extinction." We're trying to make the first option possible — and to ensure it's a democratic one.
Part 4: The Risks of a Global AI Treaty Are Lower Than You Think
Yet most AI experts are increasingly skeptical that a proper global AI treaty is achievable, timely, or safe. Their skepticism rests on three main objections. Each deserves a direct, serious response.
Objection 1: "Treaties have a terrible track record"
Humanity's record on nuclear and climate treaties is discouraging. Why would AI be different?
Because political will for bold treaties can rise with shocking speed — as it did in 1946 when the Baruch Plan went from concept to UN vote in months. The political conditions are increasingly aligned: Xi has been calling for global AI governance since October 2023; Trump's approval is at historic lows with 78% of his own voters worried about AI threats; four Trump-Xi summits are planned for 2026.
We propose a realist constitutional convention model — vote-weighting adjusted to GDP rather than one-nation-one-vote — to enable complex agreements among asymmetric powers while avoiding the veto trap that killed the original Baruch Plan. Treaty-making can be radically accelerated via ultra-high-bandwidth diplomatic infrastructure built on mutually-trusted communication systems (see Strategic Memo v2.6, pp. 103-109).
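As a purely illustrative sketch of the vote-weighting idea (the proportional rule and the GDP figures below are our placeholder assumptions, not the convention model's actual formula):

```python
# Illustrative sketch of GDP-weighted voting, not the convention model's
# actual formula. GDP figures are placeholder values in trillions of USD.
def vote_weights(gdp_by_nation):
    """Return each nation's voting weight as its share of total GDP."""
    total = sum(gdp_by_nation.values())
    return {nation: gdp / total for nation, gdp in gdp_by_nation.items()}

example = {"US": 27.0, "China": 18.0, "EU": 19.0, "All others": 41.0}
for nation, weight in sorted(vote_weights(example).items(), key=lambda kv: -kv[1]):
    print(f"{nation}: {weight:.1%}")
```

The actual model would need to balance GDP weighting against protections for smaller states; the point is only that weighted voting is a well-defined, computable alternative to both one-nation-one-vote and superpower veto.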
The cross-ideological support already exists. The Future of Life Institute's Statement on Superintelligence — calling for a ban on ASI until it can be done "safely and controllably" — was supported by most top AI experts, some heads of state, and US MAGA leaders including Steve Bannon and Glenn Beck. When a treaty call unites Yoshua Bengio and Steve Bannon, the political space is wider than most assume.
Objection 2: "Autocrats will shape an autocratic treaty"
This is the concern that blocks most AI safety experts from supporting treaty efforts. Any treaty would be shaped by superpower leaders — Xi, Trump, potentially Putin — all with authoritarian tendencies. Why wouldn't the result be global autocracy? This is precisely the "human power grab" warning that Holden Karnofsky (Coefficient Giving) has raised.
We take this objection more seriously than any other. Our Strategic Memo v2.6 identifies eight structural dynamics that push toward democratic outcomes despite these actors (pp. 124-129):
1. Mutual distrust as a transparency engine. For a treaty to be credible to any self-interested leader, it must include enforcement mechanisms none of them can circumvent unilaterally. Transparency gets baked into the architecture not because negotiators are democrats, but because they are paranoid.
2. China's paradoxical interest in democratic global governance. Beijing would never accept US-dominated global governance, nor a single leader-for-life model that threatens Chinese sovereignty. Their self-interest pushes toward rotating leadership, consensus requirements, and diffused power structures.
3. The pro-democracy structural majority among AI lab leaders. Altman, Amodei, Hassabis, Suleyman — the people who would shape technical implementation — overwhelmingly favor democratic governance. Their influence on enforcement architecture is substantial.
4. Pope Leo XIV's moral authority. Leo XIV has positioned AI as his defining issue. His advisor Paolo Benanti conceived the Coexistence appeal calling explicitly for "a binding international treaty establishing red lines and an independent oversight institution with enforcement powers." JD Vance has publicly deferred to the Pope on AI ethics: "The American government is not equipped to provide moral leadership... I think the Church is." The Vatican's ability to convene a global humanist AI alliance — reaching across civilizations through religious channels no secular diplomat can access — creates a powerful check on authoritarian capture.
5. Legacy incentives. Even self-interested leaders have structural incentives toward durable institutions. Trump wants the "Deal of the Century" — a legacy-defining achievement. A global AI treaty that collapses into authoritarianism wouldn't serve that legacy.
6. Subsidiarity and federalism as foundational principles. The treaty framework can embed Catholic social teaching's subsidiarity principle — decisions made at the lowest capable level — as a constitutional safeguard against centralization.
7. Anti-bureaucracy safeguards. Sunset clauses, mandatory review periods, and built-in reform mechanisms prevent institutional calcification.
8. The technical enforcement architecture itself. This is where the argument becomes most concrete — and most counterintuitive.
Objection 3: "Enforcing an ASI ban eliminates personal freedoms"
As the cost of developing dangerous AI falls, the surveillance required to prevent it will intensify. Won't the cure be worse than the disease?
Here's the counterintuitive truth: we already live under pervasive surveillance — by nation-states, corporations, and intelligence agencies operating with minimal accountability in a permanent low-grade intelligence Cold War. Programs like Pegasus, the NSA's global dragnet, and Russia's GRU hacking campaigns reveal the extent to which even allied heads of state and ordinary citizens are monitored. Oversight is limited to small legislative committees with partial information and constrained authority.
The question isn't whether surveillance exists, but whether it's accountable.
A global federal enforcement system becomes an opportunity to bring existing surveillance under democratic oversight — transparent to nations and citizens in ways current arrangements are not. The core of this transition lies not in creating new surveillance powers but in federating and repurposing existing ones under a new, transparent mandate.
Our Strategic Memo details the specific technical mechanisms (pp. 130-136):
Zero-knowledge proof systems enabling nations to demonstrate compliance with compute limits without revealing sensitive technical details — resolving the transparency-sovereignty paradox.
Federated secure multi-party computation enabling joint international monitoring of AI development metrics without any party gaining access to others' raw data — creating "trustless trust" through cryptographic guarantees.
Decentralized kill-switch protocols requiring consensus from multiple nation-states and citizen oversight bodies — preventing both unilateral abuse of emergency powers and paralysis in a genuine crisis (see the simplified sketch after this list).
Distributed consensus mechanisms where multiple independent validators across different jurisdictions must approve critical AI infrastructure updates or capability unlocks.
Open-source verification toolchains with formal mathematical proofs of correctness, allowing any nation or watchdog group to independently verify treaty compliance.
Cryptographically-secured whistleblowing enabling anonymous reporting of treaty violations — creating a powerful bottom-up enforcement layer.
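To make one of these mechanisms concrete, below is a deliberately simplified sketch of the k-of-n approval rule a decentralized kill-switch protocol might use. The parties, the threshold, and the use of HMAC as a stand-in for real signatures are illustrative assumptions, not the Strategic Memo's actual design:

```python
# Simplified illustration of a k-of-n approval rule for an emergency action.
# Real deployments would use threshold cryptography (e.g., threshold signatures
# or Shamir-shared keys); HMAC here only stands in for "a verifiable approval".
import hashlib
import hmac

PARTIES = {"US", "China", "EU", "India", "CitizenOversightBody"}
THRESHOLD = 4  # approvals required before the emergency action is authorized

def sign(secret: bytes, message: bytes) -> bytes:
    """Stand-in for a party's digital signature over the proposed action."""
    return hmac.new(secret, message, hashlib.sha256).digest()

def authorized(message: bytes, approvals: dict, keys: dict) -> bool:
    """True only if at least THRESHOLD distinct parties validly approved."""
    valid = {party for party, sig in approvals.items()
             if party in PARTIES
             and hmac.compare_digest(sig, sign(keys[party], message))}
    return len(valid) >= THRESHOLD

# Example: three approvals are not enough; no single actor can act alone.
keys = {party: party.encode() * 4 for party in PARTIES}   # placeholder secrets
action = b"suspend-training-run-XYZ"
approvals = {party: sign(keys[party], action) for party in ("US", "EU", "India")}
print(authorized(action, approvals, keys))                # False: below threshold
```

In a real deployment, the approvals would come from threshold cryptography, with key shares held by different jurisdictions and oversight bodies, so that no single party, including the system's operators, could forge a quorum.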
These mechanisms are designed so that no single actor can weaponize them. They transform enforcement from a threat to liberty into a transparent guarantor of shared security.
Peter Thiel himself said, regarding Palantir's origin, that it was created to "reconcile freedom and safety." Our treaty enforcement framework attempts exactly this — at the global scale.
For a deeper exploration of these mechanisms and their feasibility, see our Strategic Memo v2.6, particularly "The Global Oligarchic Autocracy Risk — And How It Can Be Avoided" (pp. 124-129), "A Treaty Enforcement that Prevents both ASI and Authoritarianism" (pp. 130-136), and "Reconciling Freedom and Safety in the AI Age" (pp. 132-133).
Part 5: What We're Doing and How You Can Help
The Deal of the Century
Our initiative fills the precise gap identified above: privately persuading a critical mass of key potential influencers of Trump's AI policy to champion a bold, timely US-China-led global AI treaty-making process.
Our 356-page Strategic Memo — synthesizing 667+ sources — contains tailored persuasion strategies, detailed psychological profiles, philosophical analyses, and convergence scenarios for every key influencer: Vance, Altman, Amodei, Bannon, Thiel, Musk, Pope Leo XIV, and a dozen more. No other organization has assembled anything comparable.
We analyze each influencer through the lens of eight key AI predictions — from extinction probability to ASI consciousness to treaty feasibility — identifying where each person's current probability estimates diverge from defensible assessments, and where targeted information could shift their positions.
We classify our targets into three philosophical camps: Conservative Humanists (Vance, Bannon, Pope Leo XIV), Techno-Humanists (Altman, Amodei, Hassabis, Suleyman), and Trans/Post-Humanists (Thiel, Musk). A key insight: philosophical commitments rather than pure power calculations drive these figures' positions, making them more persuadable than surface appearances suggest.
What We've Built So Far
In just 15 months, on $72,000 in total funding, here is what we've accomplished:
Our October 2025 US Persuasion Tour exceeded all projections — delivering 85+ contacts (vs. 15-20 projected), including 23 AI lab officials at OpenAI, Anthropic, and DeepMind, plus 18 national security establishment engagements. Most critically: direct introducer pathways to multiple primary targets.
Our coalition now includes 100+ members, advisors, and supporters — with expert contributors from the UN, NSA, World Economic Forum, Yale, and Princeton — and six founding NGO partners bringing 100+ combined years of global governance experience. Seed-funded by the Survival and Flourishing Fund (Jaan Tallinn).
→ Full details: 2025 Achievements
The 2026 Roadmap
Four Trump-Xi summits are anticipated in 2026, starting in April. Our 2026 Roadmap targets:
150+ introducer engagements across four hubs (Bay Area, DC, Mar-a-Lago, Rome)
30+ direct engagements with influencers or their key AI policy staff and advisors
5-8 substantive meetings with primary target influencers
Two Strategic Memo updates timed to the summit window
A Vatican-convened humanist AI alliance uniting figures concerned about AI threats to human dignity — through connections with Pope Leo XIV and his AI advisor Paolo Benanti
The foundation is built. What we need is operational capacity to deploy it.
The Case for Cost-Effectiveness
We operate at ~$7,500/month — extreme cost-effectiveness for the most neglected chokepoint in AI governance. That's roughly $180 per high-value meeting during our US Tour. We've achieved more analytical depth and direct engagement than organizations operating with ten times our budget.
Here's what different levels of support enable:
$25,000 — Funds a targeted 2-week persuasion sprint to a specific influencer pathway (travel, materials, meetings)
$5,000 — Funds one full month of operations: outreach, strategic analysis, coalition coordination
$1,000 — Funds development of tailored outreach materials for one target influencer
$100-500 — Contributes to bridge funding through the critical April summit window
Every dollar goes directly to the work. No overhead, no office, no bureaucracy.
Addressing the Coefficient Giving Network
We share the CG network's main goal: democratic safeguards preventing both extinction AND authoritarian capture. We simply pursue it through a different political channel, one that hedges their portfolio. The CG network's decade of work on AI governance has directly informed the safeguards our proposal includes.
Our Strategic Memo v2.6 includes an extensive 23-page chapter on Dario Amodei, Anthropic, and Coefficient Giving (pp. 221-244) that directly addresses the CG network's sophisticated concerns about global AI treaty-making. Our analysis suggests CG's hesitation stems not from naivety but from sophisticated thinking about how global governance can go wrong — and we address each concern with specific structural safeguards.
We are preparing Strategic Memo v3.0 with input from CG network experts; more contributors are welcome.
A Final Appeal
You've dedicated your career to ensuring AI goes well. You've wrestled with problems most people don't even know exist. You understand the stakes in ways the public cannot.
The window for technical solutions alone has closed. The political window is opening. The decisions made in the next 12-18 months — by a handful of people, most of whom you could name — will shape the trajectory of all sentient life.
You don't need to be certain a treaty can work. You only need to believe it's plausible enough to hedge your portfolio. We're not asking you to abandon alignment research or interpretability work. We're asking you to recognize that the technical groundwork you've helped build requires political will to matter — and that someone needs to be working on that problem too.
Weighing all risks under deep uncertainty, the calculus is clear: pursuing a skillfully-designed, extraordinarily bold US-China-led treaty is the preferred option by a substantial margin. Not because success is guaranteed — it isn't. But because the ASI gamble carries catastrophic downside risks (extinction, infinite suffering, authoritarian capture), while treaty risks have concrete mitigations. When you don't have to take a 50/50 gamble on human survival, why would you?
Your technical work built the foundations. Help us build the political will to use them.
The challenge is enormous, as are the forces at play. But because we target a largely neglected chokepoint, our minuscule organization has a real chance at outsized impact. Success is uncertain. But how can we find peace — or look our children in the eyes — if we don't at least try? We have the unique privilege of agency in the most consequential years of human history.
After all, is there anything more exhilarating than striving to steer humanity toward a future worth having?
Join us in the greatest and most promising fight for our children — and for humanity.
Donate | Join | Read the Strategic Memo | Read: Why Trump Can Be Persuaded
For our complete arguments — including detailed analysis of Trump's persuadability, why China's governance calls appear genuine, consciousness and suffering scenarios, and exactly how treaty enforcement can prevent both ASI and authoritarianism — see our Strategic Memo v2.6, particularly:
"The Global Oligarchic Autocracy Risk — And How It Can Be Avoided" (pp. 124-129)
"A Treaty Enforcement that Prevents both ASI and Authoritarianism" (pp. 130-136)
"Swaying The Influencers on 8 Key AI Predictions" (pp. 158-169)
"Dario Amodei, Anthropic, and Coefficient Giving" (pp. 221-244)