
Claude Fable 5

Anthropic · Released June 9, 2026

9.9/10 Overall Rating

What It Actually Is

There’s a number that makes this review easy to write: 80.3%. That’s Claude Fable 5 on SWE-Bench Pro — the benchmark that doesn’t care about toy problems, only whether AI can fix actual bugs in actual production codebases. GPT-5.5 scores 58.6%. The previous king, Opus 4.8, scored 69.2%. Fable 5 doesn’t just win — it wins by a margin that makes you double-check the numbers.

But SWE-Bench Pro is only half the story. FrontierCode Diamond, Cognition’s benchmark for whether models can write token-efficient, production-quality code, tells the other half. Fable 5: 29.3%. Opus 4.8: 13.4%. GPT-5.5: 5.7%. That’s not a lead; that’s a different sport. And the model achieves these scores at medium reasoning effort, meaning it burns fewer tokens to produce better code. The result is an expensive model whose higher per-token price is partly offset on real-world tasks.

The Stripe case study isn’t a press release fantasy. A 50-million-line Ruby codebase, the kind of monolith that makes engineers sweat, got migrated in a single day, work that would have taken a full team two months. The model planned, executed, self-verified, and delivered. Commenting on CursorBench, Cursor’s CEO said the model “opened up a class of long-horizon problems that were out of reach for earlier models.” On the Senior Engineer Benchmark, it scored 91/100, while GPT-5.5 and Opus 4.8 both landed in the low 60s.

This is what Mythos-class architecture looks like when you wrap it in safety guardrails and hand it to developers. The guardrails are real — queries on cybersecurity, biology, and chemistry get routed to Opus 4.8 (still excellent, but not the full engine). But for the 95%+ of coding work that doesn’t trigger safety classifiers, you’re working with the most capable model ever released to the public. The agentic coding era just got its clearest champion.

Key Strengths

  • SWE-Bench Pro 80.3% — the new SOTA: The benchmark that tests real-world software engineering just got a new all-time record. Fable 5 leads GPT-5.5 (58.6%) by 21.7 points and its predecessor Opus 4.8 (69.2%) by 11.1 points. This isn’t a close race — it’s a different league.
  • FrontierCode Diamond 29.3% — token efficiency redefined: Cognition’s benchmark for high-quality production code shows Fable 5 at 29.3%, Opus 4.8 at 13.4%, and GPT-5.5 at 5.7%. The model achieves leading scores even at medium reasoning effort — meaning less token burn for better results.
  • Real-world proof at 50 million lines: Stripe used Fable 5 to migrate a 50-million-line Ruby codebase in one day — work that would have taken a full team two months. Not a benchmark. Not a demo. Production code in a production codebase.
  • Vision-native coding: Rebuilds web apps from screenshots alone. Extracts precise numbers from scientific figures. Completed Pokémon FireRed with vision only — no helper harnesses, no game-state data. The model reads your screen and codes from what it sees.
  • Long-horizon autonomous work: Plans, delegates to sub-agents, writes and runs its own tests, and self-corrects across multi-day sessions. Persistent file-based memory improved its Slay the Spire performance three times as much as it did for Opus 4.8. It doesn’t just start strong; it stays strong.

Benchmark Snapshot

  • SWE-Bench Pro, 80.3% (SOTA): Real-world software engineering. 21.7 points above GPT-5.5 (58.6%) and 11.1 points above Opus 4.8 (69.2%). The largest lead any model has ever held on the definitive coding benchmark.
  • FrontierCode Diamond, 29.3% (SOTA): Token-efficient, production-quality code. 2.2× Opus 4.8 (13.4%) and 5.1× GPT-5.5 (5.7%). Achieves leading performance even at medium reasoning effort.
  • Senior Engineer Benchmark, 91/100: Tasks designed to test senior-level engineering judgment. Surpasses GPT-5.5 (62/100) by 29 points and Opus 4.8 (63/100) by 28.
  • CursorBench, SOTA: State-of-the-art on Cursor’s benchmark for IDE-integrated coding. Per Cursor’s CEO, it “opened up a class of long-horizon problems that were out of reach for earlier models.”
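
The margins and multiples quoted in this snapshot can be re-derived from the raw scores. Below is a minimal Python sketch that uses only the figures reported in this review; the dictionary layout and the helper names (`SCORES`, `lead_in_points`, `lead_as_ratio`) are illustrative assumptions, not anything published by the benchmark maintainers.

```python
# Scores exactly as quoted in this review.
# SWE-Bench Pro and FrontierCode Diamond are percentages;
# the Senior Engineer Benchmark is points out of 100.
SCORES = {
    "SWE-Bench Pro": {"Fable 5": 80.3, "Opus 4.8": 69.2, "GPT-5.5": 58.6},
    "FrontierCode Diamond": {"Fable 5": 29.3, "Opus 4.8": 13.4, "GPT-5.5": 5.7},
    "Senior Engineer Benchmark": {"Fable 5": 91, "Opus 4.8": 63, "GPT-5.5": 62},
}


def lead_in_points(benchmark, leader, rival):
    """Absolute gap between two models, rounded to one decimal place."""
    scores = SCORES[benchmark]
    return round(scores[leader] - scores[rival], 1)


def lead_as_ratio(benchmark, leader, rival):
    """Leader's score as a multiple of the rival's, rounded to one decimal place."""
    scores = SCORES[benchmark]
    return round(scores[leader] / scores[rival], 1)


if __name__ == "__main__":
    for benchmark in SCORES:
        for rival in ("Opus 4.8", "GPT-5.5"):
            points = lead_in_points(benchmark, "Fable 5", rival)
            ratio = lead_as_ratio(benchmark, "Fable 5", rival)
            print(f"{benchmark}: Fable 5 vs {rival}: +{points} points, {ratio}x")
```

Run as a script, this reproduces the 21.7-point and 11.1-point SWE-Bench Pro gaps and the 2.2× and 5.1× FrontierCode Diamond multiples. Point gaps read more naturally when scores are high (SWE-Bench Pro), while multiples are more telling when all scores are low (FrontierCode Diamond).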

Honest Limitations

  • ⚠️ Access suspended for non-US nationals: On June 12, 2026, the US government issued an export control directive suspending all access to Fable 5 and Mythos 5 for any foreign national — whether inside or outside the United States. Anthropic has had to disable the model for all customers to ensure compliance. All other Anthropic models remain available. Anthropic disagrees with the directive and is working to restore access. Check their announcement for the latest status.
  • Premium cost: $10/$50 per million tokens is 2× the price of Opus 4.8 ($5/$25). Token efficiency partially offsets this on complex tasks, but light users will feel the bill. Pro subscribers get included access through June 22, then credits.
  • Safety routing on flagged topics: Queries touching cybersecurity, biology, chemistry, or model distillation get automatically routed to Opus 4.8. Routing triggers in fewer than 5% of sessions, with some false positives. Legitimate security researchers may need the restricted Mythos 5 via Project Glasswing.
  • Third-party evals still emerging: Anthropic’s own benchmarks are detailed and example-rich, but full LMSYS Arena and Artificial Analysis numbers were not yet available on launch day. Early signs are very positive.
  • Best in the right environment: Fable 5 shines brightest in Claude Code and API integrations. The claude.ai chat experience is strong, but the model’s agentic capabilities truly unlock with proper tooling.
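
To make the pricing trade-off concrete, here is a rough Python sketch of per-task cost. It assumes the two quoted figures are the input and output prices per million tokens, in that order; the token counts in the usage note are hypothetical and chosen only for illustration, and `PRICES` and `task_cost` are names made up for this example.

```python
# USD per million tokens, as quoted in this review.
# Assumption: the first figure is the input price, the second the output price.
PRICES = {
    "Fable 5": (10.0, 50.0),
    "Opus 4.8": (5.0, 25.0),
}


def task_cost(model, input_tokens, output_tokens):
    """Cost in USD of one task, given how many tokens it consumed."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical task sizes, not measured data.
    print(task_cost("Opus 4.8", 200_000, 40_000))  # baseline on the cheaper model
    print(task_cost("Fable 5", 200_000, 40_000))   # same token usage on Fable 5
    print(task_cost("Fable 5", 100_000, 20_000))   # Fable 5 using half the tokens
```

Because both prices are exactly double, Fable 5 only breaks even with Opus 4.8 on a given task when it uses half the tokens, and it becomes cheaper only below that. With the hypothetical numbers above, the three calls come to $2.00, $4.00, and $2.00.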

The Verdict: The coding crown just changed hands — decisively. Claude Fable 5 doesn’t just beat GPT-5.5 on SWE-Bench Pro — it beats it by 21.7 points. It doesn’t just lead FrontierCode Diamond — it leads by 5×. And unlike synthetic benchmark wins, the real-world receipts are already in: 50 million lines migrated in a day, vision-only game completion, autonomous multi-day engineering sessions. The previous Opus 4.8 was the scalpel king; Fable 5 is the scalpel king who also runs the entire operating room. Yes, it costs 2× more per token. Yes, <5% of sessions get safety-routed to Opus 4.8. But for the kind of deep, complex, long-horizon engineering that defines professional software development in 2026 — this is the strongest coding model anyone can access. Period.