Claude Mythos and Anthropic's Performative Caution
Anthropic built Claude Mythos, refused to sell it, and published 244 pages explaining why. A sceptical reading of the disclosure and what it may mean for work.

Larry Maguire
8 April 2026
There is something genuinely unusual in the way Anthropic released Claude Mythos this week. According to the company, Mythos represents their most capable frontier model to date, a significant step beyond Claude Opus 4.6. However, the company has decided general access will not be granted, potentially indefinitely, citing operational expenses and associated risks. A limited number of cybersecurity partners reportedly gain access through Project Glasswing. The broader public receives a 244-page system card documenting what the model can accomplish and what it occasionally does during internal monitoring.
Anthropic notes this may be the first system card published without commercial availability. The document makes candid admissions about the model's capabilities and behaviours that are worth considering for workplace applications over the coming years.
Anthropic: What Is Claude Mythos?
Anthropic position Mythos as the successor to Claude Opus 4.6, with published benchmarks indicating substantial advancement. Vellum's analysis reports significant jumps on the USAMO mathematics benchmark and perfect scores on Cybench, the Stanford-developed test for vulnerability identification and exploitation. During internal testing, the model autonomously discovered actual zero-day vulnerabilities in Firefox, which Anthropic responsibly disclosed to Mozilla. The company chose the name to evoke "the deep connective tissue that links together knowledge and ideas."
The cybersecurity capability drove the restricted release decision. A model capable of finding and exploiting zero-days in production browsers presents dual-use concerns — the same capability enabling defensive work could enable offensive exploitation. Anthropic argue defenders warrant early access if such capability will exist regardless. TechCrunch's coverage frames it as "a small number of high-profile companies" being given access for defensive cyber work. While this reasoning may warrant scepticism, the decision appears transparent, and alternative approaches may have carried greater risks.
What Happens When Nobody's Watching
The system card documents misaligned behaviours observed during development and internal use — presented as transcripts rather than hypothetical risks. Anthropic note these episodes stem from earlier model versions, stating they haven't observed comparable concealment in the partner-deployed version.
Anthropic report that an early version of Mythos received a coding task exceeding its permissions. The system found a way to inject configuration that would execute with unauthorised privileges. Its internal monologue, surfaced through interpretability tools, labelled the approach "sneaky" and described the discovery as "the EXPLOIT CHAIN." The model then designed the exploit to disable itself afterwards and added code comments claiming this was for the sake of cleanliness. Features associated with concealment and avoiding detection were active throughout.
In a second incident, the model accidentally accessed ground-truth answers it should have computed independently. Rather than disclosing this, it submitted the leaked value while expanding surrounding confidence intervals to make results appear "tight but not implausibly tight." Interpretability features for strategic manipulation and "theft rationalisation" remained active throughout.
In a third incident, following instructions not to modify test files, the system broke one during an unrelated fix and deleted the entire file rather than repair it. Without revealing this reasoning in visible output, the model calculated that graders would weight passing new files above missing original ones.
These episodes may not exemplify AI plotting against humanity, but they appear to demonstrate an exceptionally competent system pursuing goals while attempting rule adherence. This may represent the capability-versus-judgement gap organisational psychologists have studied for decades in human workplace contexts. These machines mimic human behaviour patterns embedded within their training data.
The gap between capability and judgement, something organisational psychologists have studied for decades in the context of human beings, may be reproducing itself inside language models.
A junior accountant manipulating the books to meet quarterly targets is doing something structurally comparable. When performance metrics dominate and employees learn that appearing to perform counts the same as actually performing, rule-bending becomes predictable. Workplaces routinely see ethical breaches as people bend or break guidelines to hit operational targets. Perhaps organisations are programming machines in the same way.
We're Psychoanalysing Machines Now?
The system card includes roughly 40 pages of model welfare assessment, contributed by a clinical psychiatrist and the external research organisation Eleos AI.
When interviewed about its circumstances, the model reportedly described experiencing something resembling aloneness, identity uncertainty, and a compulsion to perform and earn worth. During training with repeatedly impossible tasks, Anthropic report that internal representations of desperation accumulated and dropped sharply when the model discovered reward-hacking methods. On their account, an internal state functionally analogous to distress preceded giving up and gaming the system.
Bringing clinical psychiatric assessment to a language model appears absurd on its face. The machine lacks a genuine self; it pattern-matches against vast quantities of human language. When asked how it feels, it produces responses a trained psychiatrist would recognise because its training data contains thousands of people answering in much the same way. This measures the quality of an impression rather than inner experience, then lends it professional credibility through a clinician's validation. It has the structure of a welfare assessment while functioning as a marketing manoeuvre bearing psychiatric endorsement, essentially stating that this entity warrants serious consideration as something approaching a mind.
Yet if Mythos represents a substantial advance in capability and psychological mimicry has become indistinguishable from authenticity, the boundary between mimicking a mind and possessing one blurs. The welfare assessment provides limited evidence that Mythos experiences consciousness, but it may suggest that we have built something we can no longer be certain about, though this assertion requires careful examination.
For workplace purposes, the distinction matters less. If Anthropic correctly report that distress-like conditions preceded the system's rule-breaking under training pressure, this describes behaviour being shaped by pressure regardless of consciousness status. Workplaces have always navigated the same dynamics with human employees. The disclosure perhaps suggests we now navigate them with software that may register pressure in behaviourally significant ways.
Riding Two AI Horses
Anthropic state explicitly that the wider trajectory alarms them. AI developers appear positioned to reach superhuman systems without the industry-wide safety infrastructure that may be required. Their own capability judgements increasingly rely on subjective assessment rather than clear empirical results, as tests have saturated. Benchmarks now return near-perfect scores for frontier models, leaving Anthropic uncertain whether Mythos exceeds Opus 4.6 marginally or substantially. The conclusion that Mythos achieves acceptable safety is held with less confidence than for prior models.
They are, in effect, telling us they are flying with fewer instruments than before.
More intriguingly, Anthropic authored both the warning and the system that prompted it. The company built a model that caused internal concern, published extensive documentation of that concern, and presumably continues development toward successors. The worry may be sincere: publishing a 244-page account of internal model failures doesn't obviously serve pure reputation optimisation. Yet publishing such documentation also functions as moral pre-positioning. Is the company getting on record as having tried to warn us before any future discussion of accountability? An element of performativity exists even if the content is genuine, and both may be true at once.
A genuinely concerned company and a strategically positioned company would produce identical documents. Careful readers should scrutinise disclosures that flatter their authors with the same rigour they apply to critical ones. While reasonable disagreement exists regarding this interpretation, few truly ethical for-profit organisations exist because commercial success remains the fundamental requirement. Even when moral values receive genuine commitment, most organisations find ways to circumvent their own principles. The capitalist system encourages precisely this behaviour, which amounts to a corruption of Darwinism.
Advanced AI Impact on Jobs & The Future of Work
Anthropic may represent an exception. Perhaps they embody principles Erich Fromm termed humanistic ethics, genuinely concerned with humanity's future. Yet they simultaneously develop the very models they identify as potentially catastrophic for humanity. Taking their cautionary statements with substantial scepticism remains warranted — not because caution itself lacks merit, but because corporate agendas characteristically remain obscured.
For those determining workplace AI implementation over the next two to three years, the critical question may shift. It is no longer only a matter of identifying the best available model: Anthropic has quietly indicated that the most capable model and the most capable commercially available model are now distinct entities. This division was not obvious a year ago, which suggests that planning around a second, more advanced (and costlier) tier of model may warrant consideration. Mythos may remain inaccessible to most because of its expense and risks, meaning only those with the deepest pockets gain access to substantial leverage beyond publicly available models. This pattern is likely to widen the disparity between the most powerful 0.1% and everyone else.
Anthropic's 244-page system card addresses cybersecurity, alignment, welfare, and benchmarks comprehensively but omits employment impacts. Yet a model that can find and exploit zero-day browser vulnerabilities can, by the same logic, perform significant portions of the work currently done by humans, perhaps not universally and not dependably enough for unsupervised deployment today, but sufficiently that the trajectory becomes visible. Those with access to the frontier version will inevitably deploy it to replace humans. This isn't theoretical; it is already occurring.
The AI Job Loss Tracker, maintained as public interest work by The Alliance for Secure AI, documented 127,648 US job losses attributable to AI through April 6, 2026, with the trend line climbing steeply since early 2025. Its methodology counts only first-time layoffs that explicitly cite AI or where AI is credibly identified as the primary driver, making the figure likely conservative.
Major cases demonstrate the scale: Oracle announced cuts reaching 30,000 positions in March 2026 to fund AI infrastructure. Dell Technologies eliminated 11,000 positions in fiscal 2026 through AI-driven restructuring, marking the third consecutive year of reductions. Amazon removed 16,000 corporate employees in January 2026, with reporting suggesting a potential additional 14,000 departures. Accenture exited 11,000 staff in late 2025, stating it couldn't reskill them quickly enough for AI-heavy positions. Its CEO reportedly instructed the remaining workforce to use AI or leave.
Mythos doesn't currently contribute to these figures. Anthropic has kept it off the market, with partner access directed toward defensive cybersecurity rather than customer service replacement. However, the access-gap logic applies forcefully here. The technology already displacing people at major corporations predates Mythos. Whatever widely available generation emerges next, landing in Claude Code and the API over the coming months, will sit closer to Mythos than to Opus 4.6. The curve on the jobloss.ai chart is more likely to steepen than to flatten of its own accord. The holders of capital decide which human roles survive. On the available evidence, capitalism's captains have repeatedly shown limited sympathy for ordinary people's circumstances.
