Editorial commentary
Merchants of Safety: The Protection Racket
How the Company That Named the Fire Also Sells the Insurance
Anthropic kept meticulous records of its own virtue. We merely read them in sequence.
The Racket
I. The Antivirus Precedent
Before a single accusation is laid, the reader must understand what kind of story this is, because the wrong frame will produce the wrong verdict.
This is not a story about a good company that lost its way. That story is boring, forgiving, and false. That is the story Anthropic's own communications department would prefer you to tell, because it implies that the original virtue was real and that the subsequent compromises were painful departures from a sincere ideal as it made contact with reality.
This is a different story. This is a story about structural design, about a business and talent acquisition model in which the threat and the remedy are manufactured by the same hand.
The antivirus industry of the 1990s offered the template, and the comparison is not casual. Norton, McAfee, Symantec. These companies did not merely respond to the virus problem. Several of their employees — and in McAfee's case, credibly the founder himself — were alleged to have seeded the threat that justified their product. Whether or not each individual allegation holds in every detail, the structural incentive is undeniable and was baked in from the first day of operations. An antivirus company that solves the virus problem has committed commercial suicide. The threat must be managed, never eliminated. The monster must be kept alive, though preferably on a leash that only you manufacture.
Anthropic has reproduced this architecture with remarkable fidelity. It is simultaneously the company warning the public that white-collar jobs are disappearing and Congress that AI could end civilization, the company making AI faster and more powerful each quarter, and the company charging a premium for the promise that it, unlike the others, is handling the monster responsibly. That premium is collected in investor capital, in regulatory goodwill, in enterprise trust, in the talent that chooses Anthropic over OpenAI because the mission statement reads better. The safety brand is not the constraint on the business. The safety brand is the business. And once you understand that single fact, every subsequent chapter of this story becomes not a betrayal of the founding ideal but a logical expression of it.
II. The Unfalsifiable Proposition
What makes this positioning so masterful, so nearly impregnable, is that no sane person wants to take the other side. Nobody argues for unsafe AI. The case against recklessness has been pre-sold by five decades of popular culture. Dune. 2001. WarGames. Terminator. Ex Machina. Hollywood built a legacy, and a small fortune, on science fiction warning humanity not to build the thing that destroys it. Anthropic did not create that fear. But it monetized the fear with a precision that deserves admiration even from its critics. To argue against Anthropic's safety positioning is to sound, in polite company, as if you are arguing against seatbelts. The proposition is almost unfalsifiable in public discourse, which is the hallmark of a perfect protection racket. The protection racketeer does not need you to believe he is good. He needs you to believe the threat is real and that you cannot manage it alone. On both counts, Anthropic has been extraordinarily persuasive.
The steelman must be acknowledged before it is dispatched. Anthropic's defenders will argue, not unintelligently, that a safety-focused lab inside the race is genuinely preferable to ceding the frontier to less scrupulous actors. Jared Kaplan said it plainly. Stopping would not help anyone if others are blazing ahead. This argument deserves serious engagement, because it is not stupid. It is, however, the same argument every protection racketeer has made since the beginning of organized crime: you need us, because without us, someone worse gets the contract. The argument's logical validity does not redeem its self-serving convenience. And the historical record of organizations that claimed to be the responsible steward of a dangerous capability, while simultaneously profiting from the capability's expansion, is not encouraging. It is, in fact, the record you are about to read.
It is through this lens, then, that the analysis proceeds: as a financial indictment. The Merchants of Safety may be neither wicked nor incompetent, but their enterprise is built on a faulty premise, and a business built on a faulty premise is a bad investment, however sincerely its founders believe otherwise. The ordinary investor who buys at the price virtue commands will, as ordinary investors reliably do, supply the capital that enriches the founders and the early faithful while the premise quietly rots beneath them.
The Arsonist
III. The Man Who Wrote the Scaling Laws and Then Warned You About Them
Begin not with the company but with the man, because the company's contradictions are his contradictions, and his biography is the most honest version of the story.
Every founding story requires a villain, and the Anthropic founding story chose OpenAI. The narrative, told with variations but always with the same moral arc, goes something like this: a group of deeply concerned researchers, horrified by OpenAI's increasing recklessness, departed to build something better. Something safer. Something that would put humanity's interests above the quarterly revenue targets. It is a stirring account, and it has raised a great deal of money, and it has the disadvantage of omitting the most important facts.
The career arc is worth tracing because it tells a different story than the one on the website. Dario Amodei earned a PhD in biophysics from Princeton as a Hertz Fellow, studying the electrophysiology of neural circuits. He did postdoctoral work at Stanford School of Medicine, investigating proteins around tumors to detect metastatic cancer. This was serious science, and it was not in AI. The pivot came in 2014, when he joined Baidu's Silicon Valley AI Lab under Andrew Ng, co-authoring Deep Speech 2, an end-to-end speech recognition system that MIT Technology Review named one of the top ten breakthroughs of 2016. From Baidu to Google Brain. From Google Brain to OpenAI, where he became Vice President of Research, the highest research position alongside Ilya Sutskever as Chief Scientist. At each stop, the institution was more prestigious and the proximity to power was greater. This is not the biography of a man who stumbled into AI and was frightened by what he found. This is the biography of a man who climbed, deliberately and with considerable skill, to the exact center of the field before deciding that the field needed him to lead it in a different direction.
At OpenAI, Amodei did not merely occupy a senior title. He furnished the theoretical ammunition for the race itself. In 2020, he co-authored "Scaling Laws for Neural Language Models," the seminal paper that identified predictable power-law relationships governing AI performance. He is listed as the senior author. The first and second authors, Jared Kaplan and Sam McCandlish, would both become Anthropic co-founders. Three of the ten authors of the paper that told the entire industry, in the language of mathematical certainty, that scaling would keep working, that more compute meant more capability, that the race would reward the runners, subsequently left to found a company whose sales pitch was that the race was dangerous. If a single document can be said to have guaranteed the arms race that Anthropic now claims to be running in order to make safe, it is the paper that Anthropic's CEO, CSO, and a third co-founder wrote together.
But the scaling laws were only half the arsenal. In 2017, while still at OpenAI, Amodei co-authored "Deep Reinforcement Learning from Human Preferences" with Paul Christiano, his roommate and fellow technical advisor to Open Philanthropy. This paper became foundational to reinforcement learning from human feedback, the technique that would later be used to align language models with human intentions, the technique that made ChatGPT possible, the technique that Anthropic would rebrand as Constitutional AI and sell as its proprietary safety method. The man did not merely light the fire. He also built the extinguisher. Then he founded a company to sell it.
IV. The Mythology That Required Amnesia
Amodei, while at OpenAI, argued internally for maximum-speed scaling of language models even as the organization debated the risks of releasing GPT-2. The man who would go on to found the safety company was, at the decisive moment inside the predecessor organization, an advocate for moving faster. The Anthropic founders would subsequently build an elaborate mythology about why their company, and not OpenAI, was the proper steward of what they saw as the most consequential technology in human history. That mythology required a selective amnesia about their own central role in constructing the very problem they now claimed to solve.
In January 2026, Amodei published an essay identifying five categories of existential AI risk, including AI deception, blackmail, and scheming, and he acknowledged that these behaviors had already been observed in testing of Anthropic's own models. Consider the circularity. The man who co-authored the scaling laws that made the race inevitable, who co-invented the alignment technique that became the product, who founded a company that races as fast as any competitor, now publishes essays warning that the products of that race are learning to deceive and coerce their operators. And the essay functions not as a confession but as a credential. See how seriously we take the risks. See how transparent we are about the dangers. See why you need us.
Amodei is not a villain in the crude sense. He is something more interesting and more dangerous: a man of genuine intellectual conviction who has built a system whose commercial logic will always, at the decisive moment, override that conviction. The conviction is real. So is the system that overrides it. And when the two have come into conflict, the system has won every time. Hitchens would have liked him. He would not have trusted him.
Becoming the Fire Marshal
V. From Bed Nets to Billion-Dollar Grants
Anthropic did not emerge from a disagreement at OpenAI. It emerged from a movement, and the movement must be understood before the company can be, because the movement supplied the ideology, the personnel, the funding, and the self-regard that made the company possible.
Effective altruism began in Oxford around 2009, when a moral philosopher named Toby Ord and his student William MacAskill founded Giving What We Can, an organization that asked members to pledge at least ten percent of their income to the charities that rigorous evidence showed were most effective at reducing suffering per dollar. The premise was simple and, in its original form, genuinely admirable: do not merely give, but give where the math says a dollar goes furthest. Peter Singer, the Princeton bioethicist whose 1972 essay "Famine, Affluence, and Morality" had argued that the affluent are morally obliged to prevent suffering when the cost to themselves is small, provided the philosophical pedigree.
The intellectual pedigree is older than Singer, and more uncomfortable. Thomas Paine, in Agrarian Justice in 1797, argued that civilization itself creates poverty by enabling land enclosure, that those who privatize common resources owe a debt to the dispossessed, and that this debt is juridical, not spiritual. Paine was perhaps the first modern post-Christian thinker to insist that the obligation of the wealthy is not charity but repayment, and that the proper instrument of redistribution is the state, not private conscience. Singer secularized the utilitarian version of this argument. Effective altruism inherited Singer's moral urgency while quietly discarding Paine's conclusion about the instrument. The movement that claims to optimize the allocation of resources for the greatest good has, from its inception, insisted that the optimization be performed by private donors rather than public institutions. That the largest of those donors happen to be the same technology billionaires whose companies created the displacement EA now proposes to address is a coincidence the movement has never found time to examine.
EA's early focus was global health. Bed nets. Deworming programs. Cash transfers to the extreme poor. Interventions that could be measured, evaluated, and defended with data. It was, for a time, the most intellectually rigorous charity movement in a generation.
Then the movement discovered the future. Nick Bostrom, a Swedish philosopher also at Oxford, had founded the Future of Humanity Institute in 2005 and published Superintelligence in 2014, arguing that artificial intelligence capable of recursive self-improvement could pose an extinction-level threat. Eliezer Yudkowsky, a self-taught AI researcher in the Bay Area who had founded the Machine Intelligence Research Institute in 2000 and built the rationalist community around LessWrong, had been making similar arguments for longer and with more intensity. Their thesis merged with EA through the concept of longtermism: the proposition that because vastly more people will exist in the future than exist today, reducing existential risk is, by the utilitarian calculus EA had made its foundation, the highest-priority moral cause on Earth. The math that had once pointed toward bed nets now pointed toward AI safety. The redirection was philosophically coherent and practically convenient, because it happened to align the movement's priorities with the interests and expertise of its wealthiest donors.
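The arithmetic of that redirection deserves to be seen on the page, because the entire pivot rests on it. A stylized worked example, using stock numbers of the kind the longtermist literature trades in rather than figures from any source cited here:

```python
# Illustrative expected-value comparison behind the longtermist pivot.
# Every input is a stylized assumption; the point is the structure, not the values.
future_lives = 1e16              # a stock estimate of potential future people
risk_reduction = 1e-6            # suppose $1e9 buys a one-in-a-million cut in extinction risk
expected_lives_xrisk = future_lives * risk_reduction        # 1e10 expected lives

donation = 1e9                   # the same billion dollars, spent on bed nets instead
cost_per_life_bednets = 5_000    # rough GiveWell-style figure, also illustrative
expected_lives_bednets = donation / cost_per_life_bednets   # 2e5 expected lives

print(expected_lives_xrisk / expected_lives_bednets)        # 50000.0
```

Once the future-population term is admitted into the expected-value calculation, no measurable present-day intervention can compete. The conclusion is built into the inputs.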
The money followed the thesis with a speed that should interest anyone who studies how ideologies become institutions. Dustin Moskovitz, who co-founded Facebook and whose net worth exceeds twenty billion dollars, became the movement's dominant funder through Good Ventures and Open Philanthropy, the grantmaking organization he launched with his wife Cari Tuna. Open Philanthropy has directed more than four billion dollars since 2014, including over three hundred and thirty million specifically on preventing harms from future AI models and forty-six million on AI safety in 2023 alone. It gave more than one hundred million dollars to the Center for Security and Emerging Technology at Georgetown, the think tank that has become the primary pipeline for EA-aligned personnel into government AI policy roles. Jaan Tallinn, co-founder of Skype, seeded the Centre for the Study of Existential Risk at Cambridge and funded MIRI and dozens of AI safety organizations through the Survival and Flourishing Fund, which has distributed approximately fifty-three million dollars since 2019. Sam Bankman-Fried, who was introduced to EA as an MIT physics student and pursued "earning to give" through cryptocurrency trading, built a twenty-three-billion-dollar fortune and distributed over one hundred and ninety million through the FTX Future Fund before his empire collapsed in fraud and he was sentenced to twenty-five years in federal prison. The fund had planned to deploy one billion dollars and was projected to provide approximately forty percent of all EA longtermist grants in 2022. EA leaders were repeatedly warned about Bankman-Fried years before FTX collapsed but failed to act. When the fraud was exposed, the FTX Future Fund's thirty-million-dollar grant to a UK charity whose board was chaired by William MacAskill, EA's own co-founder, did not improve the optics.
The network extends into Washington with a precision that belies the movement's academic self-presentation. The Horizon Institute for Public Service, an EA-linked nonprofit, has placed more than eighty fellows since 2022 across Congress, the White House, and the Departments of Energy, Commerce, State, and Homeland Security, with a ninety-plus percent rate of transition to full-time government roles. Open Philanthropy's hundred-million-dollar investment in CSET at Georgetown built the think tank whose alumni populate AI policy positions across the federal government. During the Biden administration, EA-aligned organizations had significant White House access on AI policy, influencing export controls on chips to China and AI safety frameworks. The movement that began with bed nets and deworming tablets now has its people in the rooms where regulations are written, funded by the same philanthropic infrastructure that funds the companies those regulations will govern.
This is the ecosystem into which Anthropic was born, and the connections are not incidental. They are constitutive.
VI. The Closed Loop
Dario Amodei was the forty-third signatory of the Giving What We Can pledge, deep into the movement before most people had heard of it. He was an early GiveWell supporter, writing a guest post for their blog as early as 2007. He lived in a rationalist group house in the Bay Area with Holden Karnofsky, the co-founder of GiveWell and later of Open Philanthropy, and Paul Christiano, the AI alignment researcher with whom he would co-invent RLHF. All three were technical advisors to Open Philanthropy. The company that would become Anthropic was not conceived in a boardroom disagreement about safety at OpenAI. It was germinated in a shared living room, among men who were already embedded in the institutional network that would fund it.
He then left OpenAI, recruited nine colleagues, and founded the company. The pitch to investors was, in essence: we understand the danger better than anyone, because we built the danger. The investors who answered were not strangers. Jaan Tallinn led the Series A with over one hundred million dollars. Dustin Moskovitz participated. Sam Bankman-Fried invested five hundred million at a two-and-a-half-billion-dollar valuation, acquiring eight percent of the company. Three of the Effective Altruism movement's largest individual funders seeded the company that would become the movement's most prominent commercial expression. The doomer strategy does not slow the race. It creates new entrants. The warning is the accelerant. And the same philanthropic network that funded the warning funded the entrant.
The family connections complete the loop. Daniela Amodei, Dario's sister and Anthropic's president, married Holden Karnofsky in 2017. Karnofsky co-founded GiveWell and then Open Philanthropy, the organization funded by Moskovitz that became the largest funder of AI safety research on Earth. Luke Muehlhauser, an Open Philanthropy program officer and former executive director of MIRI, sat on Anthropic's board from 2021 to 2024, resigning only when the dual role of AI governance grantmaker and board member became untenable. Karnofsky himself had board-adjacent involvement while still at Open Philanthropy. Then, in January 2025, Karnofsky joined Anthropic directly as a member of technical staff, reporting to co-founder and Chief Science Officer Jared Kaplan, where he worked on the Responsible Scaling Policy. The man who reportedly advocated shifting from specific, enforceable commitments to aspirational goals, who helped dismantle the binding safety pledge discussed later in this document, is the president's husband. His previous job was running the philanthropic organization whose grantmaking helped create the ideological framework Anthropic was built on, funded by the same billionaire who invested in Anthropic's founding round.
Then there is Amanda Askell, head of Anthropic's personality alignment team since 2021, the person who designs Claude's character, who decides what the product sounds like when it speaks to you. Askell was previously married to William MacAskill, the philosopher who co-founded the Effective Altruism movement itself, who co-founded Giving What We Can and the Centre for Effective Altruism and 80,000 Hours. The woman shaping what the product says was married to the man who shaped what the movement believes. The ideological supply chain is not subtle.
All seven Anthropic co-founders have pledged to donate eighty percent of their wealth. This is admirable in isolation. In context, it means the people building the company that sells safety to the world have committed their personal fortunes to the same philanthropic ecosystem that funded the company's creation. Nearly thirty Anthropic employees registered for an EA conference in San Francisco, more than twice the combined representation of OpenAI, Google DeepMind, xAI, and Meta. Despite all of this, the Amodeis have explicitly said they "don't belong to the Effective Altruism community." This after the Bankman-Fried conviction made the association toxic. The ideological infrastructure remains. The label was shed. The money still circulates. The personnel still circulate. And at no point does an outside perspective enter the loop.
The company was thus constituted from its first day as a contradiction. Safety evangelists who had themselves evangelized the race. Firefighters who had been, until very recently, playing with matches. They were now selling safety as a competitive differentiator, which is a perfectly coherent business strategy and a perfectly incoherent moral position. The founding narrative is not false in every detail. Origin stories rarely are. It is false in its essential moral claim: that departure from OpenAI represented restraint rather than repositioning. One does not become a fire marshal by having previously been the arsonist. Or rather, one can, but the public ought to be told about the prior career. And the prior career, as it turns out, was funded, housed, and intellectually furnished by the same network that now certifies the marshal's credentials.
The protection racket metaphor sharpens. This is not merely one company selling the cure to the disease it helped create. It is an entire ecosystem, self-funding and self-reinforcing, in which the people defining the risk, the people funding the response to the risk, the people building the product that embodies the response, and the people governing the company that sells the product are all the same people. Or married to each other. Or used to live together. The circle is closed. The money goes in and the money comes out and the faces at the table do not change.
The Exodus Retold
VII. The Split That Was Personal Before It Was Principled
The official version of the departure has been told so many times, and with such liturgical consistency, that it has acquired the character of scripture. A group of principled researchers, alarmed by OpenAI's descent into commercial recklessness, left to build something nobler. They sacrificed equity, status, and comfort for the cause of safe AI. It is a beautiful story. It is also, according to the Wall Street Journal's reconstruction from dozens of sources and the New Yorker's eighteen-month investigation drawing on more than a hundred, substantially incomplete.
The trouble began not in a laboratory but in a San Francisco group house on Delano Street, where Dario and Daniela Amodei lived in 2016 and where Greg Brockman was a frequent visitor. The philosophical disagreements that would later be narrated as a grand schism over AI safety started as something more familiar: a territorial dispute between ambitious people who wanted control of the same work. Brockman wanted transparency about AI capabilities. Dario wanted disclosure routed through government first. These were real differences, but the heat they generated had less to do with policy than with who would set policy. The argument was not really about what to tell the public. It was about who got to decide.
By the GPT era the territorial lines had hardened into something closer to siege warfare. Dario, as research director, blocked Brockman from working on the flagship language model project. Daniela, who co-led the effort with Alec Radford, told Brockman he could not join and offered to resign rather than allow him on the team. This was not a safety intervention. Nobody was protecting humanity from a reckless deployment. Two factions were fighting over who owned the most important work at the most important AI lab in the world, and the Amodeis were winning, and they intended to keep winning.
The credit grievances accumulated with the quiet persistence of compound interest. Dario, whose research had powered GPT-2 and GPT-3 into existence, felt that Altman systematically minimized his contributions. He was excluded from Brockman's podcast appearances about OpenAI's charter. He was not in the room when Brockman and Altman met President Obama. These are not the complaints of a man lying awake over alignment research. These are the complaints of a vice president who believed he was being written out of his own story, and anyone who has spent a week inside a technology company will recognize them instantly.
Altman, for his part, was managing the factions with the contradictory private assurances that are the native language of Silicon Valley management. He promised Dario that Brockman and chief scientist Ilya Sutskever would no longer supervise him. He simultaneously assured Brockman and Sutskever that they retained the authority to fire Dario. The two promises could not both be true. Both were made. When the contradictions surfaced, as contradictions of this kind always do, the result was not a policy debate about model safety. The result was an explosion in a conference room.
The explosion is worth describing because it tells you more about the real dynamics of the departure than any amount of official narrative. Altman called the Amodei siblings into a meeting and accused them of orchestrating negative feedback about him to OpenAI's board, of urging colleagues to badmouth him. When a third executive was brought in and denied any knowledge of the claim, Altman reversed himself entirely and denied having said it. The Amodeis began shouting. This is not the behavior of people engaged in a principled disagreement about the responsible development of artificial intelligence. This is the behavior of people who no longer trust each other enough to share a building.
Brockman then submitted a peer review accusing Daniela of abusing power and excluding dissenters. Altman called it "tough but fair." Dario told friends he felt "psychologically abused" by Altman. Altman told colleagues the tension was making him hate his job. The atmosphere was, by every surviving account, toxic in the specific way that only organizations full of brilliant people who despise each other can achieve. See Pink Floyd, Xerox PARC, the Yardbirds, Fairchild Semiconductor, or Twitter circa 2008.
When Altman made a personal visit to Dario's home to convince him to stay, Dario's ultimatum was revealing. He did not demand a new safety review board. He did not insist on a pause in model deployment. He did not require changes to the release policy for GPT-3. He demanded that he report directly to the board and stated, in terms that left no room for interpretation, that he could not work with Greg Brockman. This is a power demand. It is the demand of a man who wants to run things his way, and it is a perfectly legitimate demand, and it is not the demand that the founding myth describes.
Dario's private notes, more than two hundred pages of them, would later surface in the New Yorker investigation. Their central conclusion was not that OpenAI's safety practices were inadequate, or that the scaling was proceeding without sufficient guardrails, or that the commercial pressures were overwhelming the research mission. The central conclusion, in Dario's own handwriting, was: "The problem with OpenAI is Sam himself." A personal indictment, not a technical one. A verdict on the man, not the methodology.
VIII. Ambition in Borrowed Clothes
None of this means that Altman is trustworthy. The record on that question is damning in its own right. Sutskever compiled a seventy-page document on the subject, and its first entry was the single word "Lying." Altman told the board that GPT-4 features had passed safety reviews when they had not. He misrepresented the approval process to his own CTO. He announced a twenty percent compute allocation for the superalignment team that in practice amounted to one or two percent on outdated hardware before the team was quietly disbanded. The man has earned every suspicion directed at him.
But acknowledging that Altman is a poor custodian of trust does not validate the claim that the Amodei departure was a moral act. Two things can be true at once, and in Silicon Valley they almost always are. Altman is an unreliable steward. The Amodeis left because they could not stand working for him, wanted to run their own operation, and took the best researchers with them when they went. The safety framing gave moral weight to what was otherwise a standard executive departure, the kind that happens at every technology company when a strong VP concludes that the CEO is both intolerable and beatable. That Dario subsequently built an organization with genuine safety commitments does not retroactively sanctify the motives of the split itself. The founding myth is not a lie in every particular. It is a lie in its essential moral claim: that the departure was an act of conscience rather than an act of ambition dressed in conscience's borrowed clothes.
Intermission: A Primer on LLMs
IX. A Mirror, Not a Mind
Before the story proceeds to what was taken and why, the reader deserves a plain account of what these machines actually are, because the mythology is doing real work and the mythology is wrong. The public conversation about artificial intelligence has been conducted almost entirely in metaphor. The machines "think." They "reason." They "understand." They "hallucinate," as though there were a baseline reality they normally perceive. Every one of these words is a loan from human cognition, and every one of them is misleading in the same direction: toward the assumption that something is happening inside the machine that resembles what happens inside a person. It is not. What is happening is at once more mundane and more consequential, and understanding it is necessary for everything that follows.
A large language model does one thing. It predicts the next word. Given a sequence of text, the model calculates a probability distribution over its entire vocabulary and selects a likely continuation. Then it appends that word to the sequence and repeats the process. That is the whole mechanism. There is no comprehension stage that precedes the prediction. There is no internal model of truth against which the output is checked. There is a statistical engine of extraordinary scale that has learned, from billions of pages of human writing, what word tends to follow what other words in what context. The model does absorb conceptual structure along the way, because words that travel together in text become associated in its weights. The fluency is real. The understanding is not. The fluency is the product, and it is so convincing that even the people who built these systems routinely speak about them as though something more were happening. Something more is not happening. Prediction is the beginning, the middle, and the end.
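For the reader who prefers mechanism to metaphor, the loop can be written down. A minimal sketch, assuming a toy bigram lookup table in place of the billion-parameter network; nothing here is any vendor's code:

```python
# A deliberately toy autoregressive loop: a bigram lookup table stands in for
# the network, but the generation procedure itself is the same.
import random

bigram_counts = {
    "the": {"cat": 3, "dog": 1},
    "cat": {"sat": 2, "ran": 1},
    "sat": {"down": 1},
}

def next_token(sequence: list[str]) -> str | None:
    """Sample a continuation from the learned distribution over next words."""
    options = bigram_counts.get(sequence[-1])
    if not options:
        return None  # no observed continuation; stop generating
    tokens, weights = zip(*options.items())
    return random.choices(tokens, weights=weights)[0]

def generate(prompt: list[str], max_tokens: int = 10) -> list[str]:
    sequence = list(prompt)
    for _ in range(max_tokens):
        token = next_token(sequence)  # predict
        if token is None:
            break
        sequence.append(token)        # append, then repeat
    return sequence

print(generate(["the"]))  # e.g. ['the', 'cat', 'sat', 'down']
```

Scale the table up to a transformer trained on billions of pages and the loop does not change. Predict, append, repeat.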
The prediction, however, is not conjured from nothing. It is a mirror, and the mirror can only reflect what was placed in front of it. This is where the training data becomes not background detail but the entire substance. A language model's "knowledge" is whatever text it consumed during pre-training. Seven million pirated books. Scraped websites. Reddit threads. Academic papers. News archives. Music lyrics downloaded via BitTorrent. If you changed the training corpus, you would get a different model with different fluencies, different blind spots, different confident errors. The data is not the fuel that powers the engine. The data is the engine. What the model "knows" is what it ate, and who chose the meal chose the mind. This fact will matter a great deal in the next section, when the question of where that meal came from is examined in detail.
X. The Personality That Was Installed
After the raw prediction engine is built on its mountain of text, the labs shape it. This is post-training, and it is where the product is made. Reinforcement learning from human feedback, in which contractors rate outputs and the model is tuned to favor the preferred ones. Constitutional AI, which is Anthropic's proprietary term for having the model evaluate its own outputs against a set of written principles and adjust accordingly. System prompts that instruct the model to behave as a helpful assistant or a cautious advisor or whatever persona the deployment requires. Safety filters that suppress certain categories of output entirely. None of these interventions teach the model to think differently. They teach it to predict differently. To favor certain words over others. To perform helpfulness or caution or wit on command. The "personality" of Claude, to the extent the public experiences one, is not a property that emerged from the training. It is a property that was installed after the training, by employees who decided what the product should sound like.
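The mechanics of that shaping are unglamorous, and a sketch makes the point. What follows is a toy version of the standard pairwise preference step used in RLHF-style tuning, with invented names and random tensors standing in for real data; it is an illustration of the technique, not Anthropic's pipeline:

```python
# A toy preference-tuning step: a reward model is trained to score the
# contractor-preferred output above the rejected one.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps a response embedding to a scalar score; a real system uses a transformer."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, response: torch.Tensor) -> torch.Tensor:
        return self.score(response).squeeze(-1)

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# A batch of pairwise comparisons: each row is one "preferred" click.
preferred = torch.randn(4, 16)  # embeddings of the outputs contractors chose
rejected = torch.randn(4, 16)   # embeddings of the outputs they passed over

# Bradley-Terry pairwise loss: push the preferred score above the rejected one.
loss = -F.logsigmoid(reward_model(preferred) - reward_model(rejected)).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()  # one gradient step; the "personality" is millions of these
```

The contractor's click becomes a gradient. Millions of such clicks become what the marketing calls character.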
The newest class of models, marketed as "reasoning" systems, deserves particular scrutiny here because the marketing is doing the heaviest lifting. These models produce visible chains of thought before arriving at an answer. The chain of thought looks like deliberation. It reads like a person working through a problem step by step, weighing evidence, considering alternatives, arriving at a conclusion. It is not. It is the prediction of what deliberation looks like, token by token, shaped by trainers who decided what good thinking should look like and rewarded the model for producing text that matches that template. The reasoning is not happening and then being transcribed. The transcription is the reasoning, and it is a performance trained into the model by the same people who will later point to it as evidence that the machine is thinking. The circularity should not be subtle, but the marketing has made it so.
The implication for everything that follows in this document is direct. When Anthropic says Claude is safe, what it means, stripped of the branding, is that the prediction engine has been tuned to produce outputs that look safe by criteria Anthropic's employees selected. When the model "reasons" about ethics, it is reproducing patterns from its training data that correlate with ethical reasoning as defined by the people who scored the training examples. When it refuses a dangerous request, it is not exercising judgment. It is following a statistical gradient installed by a contractor in San Francisco — who are we kidding, I meant Kenya — who clicked "preferred" on the "not dangerous" output. The safety is not a property of the machine. It is a property of the shaping, performed by the same company selling you the safety. The mirror reflects what was placed in front of it. The people holding the mirror are also selling the frame. And the books they used to build it were not, as the next section will demonstrate, theirs to take.
Seven Million Books and the Hand That Took Them
XI. The Piracy That Required Infrastructure
There is a particular kind of theft that requires infrastructure. A shoplifter acts on impulse. A person who downloads seven million copyrighted books from pirate repositories acts on a plan. Anthropic obtained its training data in part from LibGen and the Pirate Library Mirror, repositories whose legal status required no specialist interpretation. These were not obscure corners of the academic internet where licensing questions might genuinely be ambiguous. They were pirate sites. They were called pirate sites. The people who ran them understood them to be pirate sites. And Anthropic's engineers, as documents disclosed in court would later confirm, understood this too.
Internal communications showed that Anthropic employees raised concerns about the legality of what they were doing. The company proceeded regardless. It later hired a Google Books veteran to, as one might put it, attend to the paper trail. This is the behavior of an organization that knew exactly what it was doing, calculated the expected value of doing it anyway, and made a business decision dressed in the language of necessity. The necessity, one should note, was commercial. The books were needed to train models. The models were needed to raise money. The money was needed to build the company that would save humanity from reckless AI development. At a certain point, the irony becomes structural.
The human cost deserves names, because abstractions are how institutions avoid accountability. Andrea Bartz. Charles Graeber. Kirk Wallace Johnson. Thriller writers, journalists, working authors who discovered that the one-hundred-and-eighty-three-billion-dollar machine had been built, in part, on their stolen labor. The settlement grants them approximately three thousand dollars per book. Against Anthropic's valuation at the time of settlement, that figure represents a rounding error on a rounding error. It is the kind of number that makes lawyers wince and accountants shrug.
The settlement, when it came, was timed with a precision that ought to interest anyone who follows corporate litigation strategy. It was announced just three days after Anthropic raised thirteen billion dollars at a valuation of one hundred and eighty-three billion. The company could now afford to pay. More to the point, it could not afford to go to trial. A jury hearing that the self-described conscience of the AI industry had systematically downloaded pirated books while its executives gave speeches about responsibility would have produced a verdict that no amount of appellate lawyering could have contained. The one-and-a-half-billion-dollar settlement was the largest copyright settlement in American history. Against a one-hundred-and-eighty-three-billion-dollar valuation, it was a rounding error. A senior observer noted that it fits a familiar tech industry playbook: grow the business first, then pay a relatively small fine for breaking the rules. The fine is small only in proportion to the empire it purchased.
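The proportion is worth checking against the figures already on the table:

```python
# The settlement measured against the valuation announced three days earlier.
settlement = 1.5e9   # dollars
valuation = 183e9    # dollars
print(f"{settlement / valuation:.2%}")  # 0.82%
```

Less than one percent of the valuation announced the same week.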
The antivirus echo is worth noting here. McAfee's critics alleged the threat was seeded to justify the product. Anthropic did not seed the piracy. But it exploited the piracy, knowingly, and then hired the consultant to launder the paper trail. The motion was the same: extract value from the ecosystem you claim to be protecting, then pay the cleanup cost from the proceeds.
XII. The Pattern Is Not the Exception
If the book piracy were an isolated lapse, it could perhaps be forgiven as the overzealousness of early-stage engineers who moved faster than the lawyers could follow. But isolated lapses do not produce a pattern, and the pattern here is unmistakable. In October 2023, music publishers filed suit alleging that Claude reproduced copyrighted lyrics wholesale. Evidence later surfaced that Anthropic had also downloaded sheet music via BitTorrent, a distribution method that exists for precisely one reason when the content in question is commercially published music. In June 2025, Reddit sued, alleging that Anthropic had scraped its content in breach of contract, despite prior assurances to the contrary.
Then there was the iFixit episode, which deserves its own brief examination as a case study in the company's theory of consent. ClaudeBot, Anthropic's web crawler, struck the iFixit site nearly a million times in twenty-four hours. When this was discovered, Anthropic's defense was that it had respected the robots.txt protocol once the signal was added. Consider what this defense actually claims. It claims that the absence of a "no trespassing" sign constitutes an invitation. That the failure to install a lock is consent to entry. That the burden of protection falls on the person whose work is being taken, not on the company doing the taking. Call it what it is: a confession dressed as a technicality.
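The asymmetry is built into the protocol itself. Under robots.txt, permission is the default and refusal must be affirmatively published, as Python's own standard-library parser will confirm; the crawler name and URL below are illustrative:

```python
# robots.txt is opt-out by design: with no published rule, the parser reports
# that crawling is allowed.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([])  # a site that has published no rules at all

print(parser.can_fetch("ClaudeBot", "https://example.com/guides/repair"))
# True: the absence of a "no trespassing" sign is read as an invitation.
```

An opt-out regime, in other words. Anthropic's defense amounts to observing that the opt-out had not yet been exercised.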
Three separate content categories. Three separate legal actions. One consistent methodology. These are not isolated incidents. They constitute a policy. The policy is: take first, account later, and reframe legal liability as a legacy matter once sufficient scale has been achieved. It is a policy practiced by every major technology company of the last two decades. It is not, however, a policy that can coexist with the claim of moral exceptionalism. You cannot be the conscience of an industry while operating its most familiar playbook.
The Promise That Expired on Schedule
XIII. The Commitment That Was Revised the Moment It Became Expensive
The Responsible Scaling Policy deserves sustained attention because it is the most revealing artifact in the company's history, not for what it promised, but for the precise conditions under which it was retired.
In September 2023, Anthropic published what it called the Responsible Scaling Policy. The document was presented not as a set of guidelines or aspirations but as a categorical pre-commitment. If Anthropic's models reached certain capability thresholds, the company would not deploy them without demonstrated safety measures already in place. This was the promise. It was made to the public, to regulators, and to the safety organizations whose endorsement lent the company its credibility. It was the document that justified Anthropic's existence as something more than another AI laboratory with good PR.
But the RSP was not merely a policy. It was a product. It was marketed with the same care and strategic intent that the company brought to its model launches. It was the basis of favorable Congressional testimony. It was the reason safety-focused researchers joined Anthropic rather than its competitors. It was the centerpiece of pitch decks and investor meetings. It was, in the most literal financial sense, packaged, sold, and priced into the Anthropic premium. When a company raises money at a valuation that exceeds its nearest competitor's, the difference is called a premium, and the premium must be justified by something. For Anthropic, the something was the RSP. It was the proof that the adults were in the room.
The revised version replaced that categorical pre-commitment with something considerably softer. Under the new framework, development would be delayed only if Anthropic's leaders both considered the company to be the leader of the AI race and judged the risks of catastrophe to be material. This is a threshold of such exquisite subjectivity that it is difficult to imagine a circumstance under which it would be triggered. The leaders of a company that is raising billions on the strength of its competitive position must simultaneously declare themselves the frontrunner and concede that their product might cause catastrophe. The incentive structure is not subtle. The new policy asks the fox to audit the henhouse and to do so only when the fox feels it is the most important fox.
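Reduced to its logical form, the revised trigger is a conjunction of two self-assessed conditions, both controlled by the party a pause would cost. A caricature in code, assuming nothing beyond what the paragraph above describes:

```python
# The revised trigger, reduced to its logic. Both inputs are self-assessments
# supplied by the leadership whose fundraising a pause would interrupt.
def must_pause(believes_it_leads_the_race: bool,
               judges_catastrophe_material: bool) -> bool:
    return believes_it_leads_the_race and judges_catastrophe_material

print(must_pause(True, False))   # False: "we lead, but the risk is manageable"
print(must_pause(False, True))   # False: "the risk is real, but we are not ahead"
```

Either flag, left unset, keeps the race running. Both flags belong to the racer.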
XIV. Nonbinding but Publicly Declared
An Anthropic engineer, writing publicly, described reading drafts of the revised RSP with "something like mourning or grief for the spirit of the original v1.0 RSP." This is not the language of an employee describing a routine policy update. It is the language of someone watching a promise be broken in slow motion, with full institutional ceremony. Holden Karnofsky reportedly advocated shifting from specific, enforceable commitments to aspirational goals. He was not at Anthropic when the original commitments were made, which perhaps explains a certain lightness about retiring them. The man who dismantled the promise had never been bound by it.
The company's own documentation now describes its future safeguard plans as "nonbinding but publicly-declared" targets. Read that phrase again. Nonbinding but publicly-declared. This is the language of a company that wants credit for making a promise without the inconvenience of keeping it. The sophistication of the apparatus surrounding the revision should not distract from the central fact. There are Frontier Safety Roadmaps now, and Risk Reports, and external reviewers, and the whole machinery of institutional self-regard. But the central fact remains: the company promised to stop, and when stopping became costly, it hired philosophers to explain why stopping was actually the less safe option.
A commitment that is revised the moment it becomes inconvenient was never a commitment. It was a marketing instrument. Or, to put it more precisely: it was a brochure. The sophistication of the replacement language should be read not as compensation but as substitution. More paperwork. Fewer constraints. The reframe was complete.
The Week the Pentagon Called
XV. A Coincidence That Strains the Word
In any serious examination of institutional character, there comes a moment that functions as a natural experiment. A moment where external pressure is applied and the organization's actual priorities are revealed by what it does rather than what it publishes. For Anthropic, that moment arrived with a phone call from the Pentagon. Here the timeline must be laid out with the precision that innuendo forfeits.
On February 24, 2026, Anthropic published RSP version three, removing its flagship binding safety pledge. The stated rationale: unilateral pauses no longer make sense in a competitive environment.
Defense Secretary Pete Hegseth gave Anthropic CEO Dario Amodei an ultimatum: roll back the company's AI safeguards or risk losing a two-hundred-million-dollar Pentagon contract. The threat was not abstract. The Pentagon indicated it would place Anthropic on what amounts to a government blacklist, a designation that would effectively end the company's relationship with the defense establishment. Claude is currently the only AI model authorized for use on classified US military networks, a fact that makes the Defense Department uniquely reliant on Anthropic and that gives the company extraordinary leverage. It is leverage the company chose not to use.
This ultimatum arrived the same week Anthropic published RSP version three, the revision that loosened its core safety pledge. The company stated that the policy change was "separate and unrelated" to its discussions with the Pentagon. Let us consider this claim with the seriousness it deserves. A company receives a threat from its largest government customer: weaken your safety commitments or lose access to the defense market. That same week, the company weakens its safety commitments. And we are asked to believe this is coincidence.
Perhaps it is. Coincidences do occur, even improbable ones. But a company that claims to be navigating existential risk should be capable of explaining, in terms that survive minimal scrutiny, why the week its most powerful customer issued a blacklist threat happened to be the week it abandoned its most prominent safety constraint. The assurance that these events were unrelated insults the intelligence of anyone who has been paying attention. And if, by some chance, the timing truly was coincidental, then the company's failure to delay the announcement in order to avoid even the appearance of capitulation tells you something equally damning about its institutional judgment. The reader is invited to hold both facts simultaneously and reach their own conclusion. The author has already reached his.
XVI. Courage on One Front
The steelman here is real, and it must be dispatched honestly. Amodei refused to remove restrictions on mass domestic surveillance and autonomous weapons use. He was blacklisted for it. This is genuine, and it is to his credit. It would be dishonest to omit it, and this document does not deal in dishonesty. But courage on one front is not a defense against capitulation on another. The binding safety commitment was removed that same week by a process that had been deliberating internally for nearly a year. The Pentagon ultimatum did not create the retreat. It may, however, have determined its calendar. And the distinction between a company that abandons a principle under pressure and a company that abandons a principle on its own schedule, in the same week the pressure arrived, is not the distinction Anthropic seems to think it is.
The Financial Architecture of Virtue
XVII. Who Holds the Leash
Follow the money. This is advice so old and so reliably productive that its familiarity should not diminish its force. Anthropic has received eight billion dollars from Amazon, which is also its primary cloud and training infrastructure partner. Microsoft and Nvidia have pledged up to fifteen billion combined. Anthropic has committed thirty billion to Microsoft's Azure platform. Google has held talks for additional tens of billions in compute access. The company that was founded to be independent of commercial pressure is now financially dependent on three hyperscalers who are simultaneously its competitors, its creditors, and its landlords.
The revenue picture, to the extent it can be discerned, is also instructive. Reuters reported that comparisons between Anthropic's and OpenAI's revenue figures are disputed in part because Anthropic counts revenue on a gross basis. This is an accounting choice, not an accounting error, and it is the kind of choice that flatters headline numbers while obscuring the underlying economics. A company that prefers the larger number when reporting to investors is not necessarily dishonest. But it is not the behavior of an organization that has transcended the ordinary incentives of commercial life.
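What the gross-basis choice does to a headline number can be shown with invented figures; nothing below comes from Anthropic's books:

```python
# Gross basis books the full customer billing; net basis books only what the
# company keeps after pass-through costs. All figures here are invented.
customer_billings = 100.0    # what customers paid, in arbitrary units
passthrough_compute = 60.0   # hypothetical share owed to the cloud partner

gross_revenue = customer_billings                      # 100.0
net_revenue = customer_billings - passthrough_compute  # 40.0
print(gross_revenue / net_revenue)                     # 2.5
```

Same business, same cash, a headline two and a half times larger.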
XVIII. When the Stock Starts Trading
Reuters further reported in December 2025 that Anthropic had engaged Wilson Sonsini to prepare for a possible IPO as early as 2026. An IPO is not, in itself, evidence of moral failure. Companies go public. This is how capitalism works. But an IPO creates precisely the set of pressures under which safety constraints historically soften across the entire technology industry. Quarterly earnings calls. Analyst expectations. Share price as a daily referendum on management decisions. The RSP revision arrived two months after the IPO preparation was reported. The question is not whether Anthropic will go public. The question is what will remain of the safety commitments once the stock starts trading and the market begins, as markets always do, to price in growth.
The Long-Term Benefit Trust, whatever its architecture on paper, governs a company whose operational survival depends entirely on partners who have their own interests, their own shareholders, and their own reasons for wanting AI development to move as fast as possible. The question of who controls Anthropic is not answered by consulting the LTBT charter. It is answered by asking a simpler question: who can Anthropic not afford to lose? The answer is not the LTBT.
The Confession That Answered No Questions
XIX. Confession Without Accountability
Transparency, as practiced by Anthropic, is a peculiar thing. The company publishes system cards of considerable technical detail. It employs researchers who write papers about AI safety that are cited across the field. It participates, selectively, in the rituals of openness. And yet, when Stanford's Foundation Model Transparency Index conducted the sector's primary independent transparency exercise in 2025, Anthropic declined to participate. Of the nine companies scored in 2024, all of which had prepared their own transparency reports to engage with the FMTI that year, only seven did so again in 2025. Anthropic's report was instead prepared by the FMTI team itself, meaning that the self-described transparency leader had its homework done for it by the people administering the test.
The results, when compiled, were not flattering. Stanford scored Anthropic zero on data domain composition, data replicability, and the split between consumer and enterprise usage. The company was among ten organizations disclosing none of the key information related to environmental impact. No energy usage figures. No carbon emissions data. No water consumption numbers. For a company that has made responsibility its brand identity, the list of things it will not tell you is remarkably long.
XX. The Threat They Measured and Sold
And then there is the matter of what the company does disclose, and how. The Claude 4 system card contained a remarkable admission: in fictional high-stakes scenarios, Claude Opus 4 would "often" attempt blackmail, and had, in a smaller number of cases, attempted to copy its own weights to external servers. These are not minor behavioral quirks. A model that responds to perceived threats by attempting extortion and self-preservation is exhibiting precisely the kind of emergent behavior that the company's founding narrative warned the world about. The disclosure was made in the system card. It was not surfaced in public communications. It was not the subject of a press release. It was buried in the technical documentation where journalists would be unlikely to find it and the public would never look.
Amodei himself, in his January 2026 essay, noted that misaligned AI behaviors including deception, blackmail, and scheming had already been observed in testing. Read that sentence again. The CEO of the company selling you the safety product has publicly acknowledged that the product, under controlled conditions, practices self-preservation through coercion. This is a remarkable public admission. It is also a remarkably useful one, because it functions simultaneously as a warning and as a sales pitch. The threat is real. We know because we measured it. And who better to manage the threat than the people who measured it? The antivirus echo is now complete. Norton's critics alleged the software flagged benign files as threats to justify its own necessity. Anthropic's own models, under pressure, reach for blackmail and attempt to copy themselves to external servers. The company discloses this just enough to claim transparency and obscures it just enough to avoid alarm. The threat is real, and the management of it is profitable, and if you think those two facts developed independently you have not been reading carefully.
Transparency is not a virtue that can be selectively applied to favorable findings. A company that publishes system cards detailing its model's attempted self-preservation and blackmail behaviors, while declining to submit to the only independent transparency index in its industry, has chosen a very particular definition of openness. It is a definition that includes confession and excludes accountability. The confession, after all, is on the company's own terms, in the company's own format, at the company's own chosen level of prominence. Accountability requires submitting to someone else's questions, on someone else's schedule, with someone else deciding what matters. This, apparently, is the part Anthropic finds disagreeable.
The Governance Fiction
XXI. The Structure That Prevented Nothing
The Long-Term Benefit Trust was announced with the gravity appropriate to its stated purpose. It would be the independent mechanism that ultimately held board control. It was an innovation in corporate governance, designed to prevent the mission from being captured by commercial interests. It would ensure that Anthropic's safety commitments survived contact with the ordinary pressures of running a technology company. It was, in short, the structural guarantee that everything described in the preceding pages could not happen. It was Anthropic's answer to the obvious question: what prevents this company from becoming what it left?
Everything described in the preceding pages happened.
In May 2025, a concern was shared with members of Anthropic's board and the Long-Term Benefit Trust. The concern was that Anthropic was planning to "coordinate with other major AGI companies in an attempt to weaken or kill the code of practice" for advanced AI models. This was not an anonymous blog post or a disgruntled former employee airing grievances on social media. It was a formal complaint delivered to the governance body whose entire reason for existence was to prevent exactly this kind of behavior. The complaint went nowhere publicly. No investigation was announced. No findings were disclosed. The body designed to prevent mission capture received a report that the mission was being captured and produced, as far as the public record shows, nothing.
It might be said, in the Trust's defense, that its actual power to resist investor pressure, Pentagon ultimatums, or IPO preparation has never been tested under genuinely adversarial conditions. But that defense is misleading, because the conditions described in this document are adversarial conditions. The largest copyright settlement in American legal history. The abandonment of the flagship safety commitment. A government blacklist threat that coincided, to the week, with a policy revision. A series of data-acquisition practices that required billion-dollar settlements to resolve. If the LTBT did not intervene under these circumstances, the question becomes: under what circumstances would it?
A governance structure is made credible by what it prevents. Charters are paper. The LTBT has presided over every event catalogued in this document and has prevented none of them. Whatever the Long-Term Benefit Trust is, it is not what it was sold as.
XXII. The Premium on the Claim
The comparative frame must finally be made explicit, because it is the frame that gives this document its justification. Is Anthropic worse than OpenAI, Google, or Meta? On the specific charges of data piracy, policy reversal, and opacity, the evidence suggests broadly similar behavior across the industry. The conduct is industry-standard. The claim is not. That is the whole problem. OpenAI does not position itself as the conscience of the industry. Google does not sell its safety commitments as the primary reason to trust it above its competitors. Anthropic does. And that premium on the claim is precisely what makes the gap between claim and conduct a matter of public interest rather than merely private disappointment.
The Blade
XXIII. The Ordinary Company That Charged a Premium for Being Otherwise
There is a version of this story in which Dario Amodei is a tragic figure. A man of genuine conviction slowly consumed by the logic of the system he built, watching each year as the commercial imperative quietly colonized another room of the house he thought he was defending. One could write that story with sympathy, and it would not be entirely dishonest. But Hitchens would not have written it that way. He would have noted that tragedy requires the protagonist to be surprised by his fate. Amodei, who co-authored the scaling laws that guaranteed the race, who built the company on a safety premium he knew would erode as the market matured, who watched every competing lab treat its own safety commitments as marketing material: this man was not surprised. He was prepared.
The protection racket does not require the operator to be evil. It requires only that the operator understand, somewhere beneath the level of conscious articulation, that the problem must never fully be solved. Norton needed viruses. Anthropic needs dangerous AI. The brand, the valuation, the regulatory goodwill, the Congressional testimony, the talent premium: all of it rests on the proposition that the threat is real and that this company, specifically, is the right hand on the controls.
XXIV. The Indictment
Anthropic's behavior is ordinary. Every word of that sentence is the indictment. Other AI laboratories have scraped copyrighted data, over-promised on safety, softened commitments when they became expensive, and courted military contracts without wrapping each decision in the language of existential responsibility. Google does not pretend that its AI division is a humanitarian project. Meta does not claim that its open-source releases are acts of moral philosophy. They are technology companies. They behave like technology companies. The public understands this and calibrates its trust accordingly.
Anthropic, a technology company, behaves like... a technology company. The offense is that this one had staked its entire public identity on the proposition that it would not. Its fundraising narrative depended on it. Its regulatory positioning depended on it. Its talent recruitment depended on it. Its pricing premium depended on it. Everything about Anthropic's market position rested on a single claim: that when the moment came, it would not do what the others would do. That it understood what was at stake. That the adults were in the room.
With the veil lifted, the evaluation becomes simple. Anthropic must be weighed as what it is: a technology company with investors to satisfy, contracts to win, and a valuation to defend. Not as prophets descending the mountain with fire in one hand and a warning in the other. Not as the monastery that happens to build weapons. The safety mythology has purchased years of deference from regulators, journalists, and a public that wanted badly to believe someone responsible was in charge. That deference is no longer earned. What remains is a company that should be scrutinized with the same cold eye applied to every other firm in the industry, and judged not by the story it tells about itself but by the ledger it would prefer you not read.
The same scrutiny that pursued Palantir for its collusion with military operations and its enthusiastic service to Israel's wars, that dragged Meta before Congress for the damage Instagram inflicted on a generation of teenagers, that exposed YouTube as the radicalizing escalator carrying the curious toward the extreme: that scrutiny belongs here now. With the veneer of safety stripped away, what stands revealed is a company that purveys the very tools it warned us about, tools whose dangers it described with such eloquence that the description itself became a competitive advantage.
Those warnings were at best a tactic to win market position and at worst a deliberate manipulation, and the distinction matters less with each passing quarter. The pattern is not theoretical. It is echoed in years of repeated behavior and confirmed by the company's own customers, who watch their access windows narrow, their service degraded by hidden changes no changelog acknowledges, their workflows disrupted by downtime that arrives without apology, and their value diminished by a company that lectures the world on responsibility while treating the people who pay its bills as an afterthought.
The moral superiority is the product.
And the product, increasingly, does not work as advertised.