🔍 Reviews | Jun 11, 2026 | 17 min read

By AI Tool Briefing Team

Claude Fable 5 Review: Anthropic's Best Model Yet

Claude Fable 5 shipped on June 9, 2026 — Anthropic’s first public model built on the same Mythos architecture that the company spent April calling too dangerous to release. The Glasswing-tier capabilities got distilled, safety-trained, and pushed through the public API at $10 per million input tokens and $50 per million output. The benchmarks that came with it are not subtle. SWE-Bench Pro: 80.3% vs. GPT-5.5’s 58.6%. FrontierCode: 29.3% vs. GPT-5.5’s 5.7%. GDPval-AA: 1932 vs. GPT-5.5’s 1769 and Gemini 3.1 Pro’s 1314.

Those are the headline numbers. The harder number is the price tag: five times Gemini 3.1 Pro’s input rate, and just over four times its output rate, for the same kind of work.

So the review question this time isn’t “is Fable 5 better.” It is better. The question is whether being objectively better at coding, agentic workflows, and long-context reasoning is worth a 5x premium when your competition is a Google model that does most of the same jobs at $2/$12 per million tokens.

Quick Verdict

Aspect	Rating
Overall Score	★★★★★ (4.9/5)
Best For	Coding teams, agentic workloads, enterprise migrations
API Pricing	$10/M input, $50/M output
Free Access Window	Pro/Max/Team/Enterprise plans until June 22
SWE-Bench Pro	80.3% (GPT-5.5: 58.6%, Gemini 3.1 Pro: 54.2%)
FrontierCode	29.3% (GPT-5.5: 5.7%)
GDPval-AA	1932 (GPT-5.5: 1769, Gemini 3.1 Pro: 1314)

Bottom line: The best AI model on the market right now. Also the most expensive one with a serious capability case to back the price. Use the free window through June 22 to figure out whether your workload justifies the post-window credit cost.

What Fable 5 Actually Is

Two months ago, Anthropic published Project Glasswing — the closed enterprise program built around Claude Mythos, the model the company had publicly flagged as too capable to release. Glasswing partners paid $25/$125 per million tokens and got vetted access to a model that, by Anthropic’s own assessment, surpassed expert humans at finding and exploiting software vulnerabilities.

Fable 5 is what happens when you take that architecture, run it through a much heavier safety post-training pass, distill the dangerous edges off, and turn it into something the company can actually sell to the broader market. The marketing name positions it as a successor to the Opus line. The technical reality is that Fable 5 is the first public model in what Anthropic now calls the Mythos class — the same family tree, with the public-release safety cuts the original Mythos didn’t get.

The naming break matters. Anthropic could have called this Opus 5. They didn’t. Fable 5 sits on its own product line at its own price point, and the Opus tier still exists below it. Claude Opus 4.8 — the May 28 release that cut Fast Mode pricing 3x — is now the second-tier flagship. That’s a deliberate product structure decision, not an accident. Opus is the workhorse. Fable is the headline.

The Benchmarks That Justify the Price

Three numbers do most of the work in explaining why Fable 5 commands the premium it does.

SWE-Bench Pro at 80.3%. SWE-Bench Pro is the hardened version of SWE-Bench Verified — real-world GitHub issues that require multi-file edits, working test runs, and patches that actually resolve the bug. GPT-5.5 lands at 58.6%. Gemini 3.1 Pro at 54.2%. The 22-point gap over GPT-5.5 is the biggest delta any frontier model has posted against a peer on this benchmark since SWE-Bench Pro went public. For teams running agentic coding workflows where the model has to ship a patch, not just suggest one, that gap maps to the difference between “the model finishes the work” and “the model hands back something a human still has to clean up.”

FrontierCode at 29.3%. FrontierCode is a benchmark created by Cognition that measures production-grade code quality, specifically whether AI-generated patches are mergeable, correct, and adherent to test requirements. It tests the kind of code that has to survive a real code review, not competitive programming puzzles. GPT-5.5 scores 5.7%, roughly a fifth of Fable 5’s result. That’s not a percentage-point gap. That’s a category gap. Anthropic’s interpretation, which the published methodology supports, is that Fable 5 reaches a class of production coding reliability that current frontier peers don’t reach at all. Benchmark results usually look better in isolation than in practice, but even at half the published delta, Fable 5 would sit around 17.5%, still roughly three times GPT-5.5’s score.

GDPval-AA at 1932. GDPval is an economic-value-weighted aggregate task benchmark originally created by OpenAI; the GDPval-AA variant cited here is the version Artificial Analysis runs and maintains on its leaderboard. It’s a closer proxy for “how much work does this model actually get done in production” than coding benchmarks alone. Fable 5 at 1932, GPT-5.5 at 1769, Gemini 3.1 Pro at 1314. The 618-point Gemini gap is the eye-catching one. The GPT-5.5 gap is the practical one. For mixed enterprise workloads where coding, research, analysis, and reasoning all overlap, the 163-point gap over GPT-5.5 is where the per-query value of Fable 5 lives.
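One way to read those gaps is as head-to-head preference rates. The sketch below assumes, on our part, that GDPval-AA scores behave like standard Elo ratings on the usual 400-point logistic scale; treat the output as intuition under that assumption, not as a documented property of the leaderboard.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic model.

    Assumes a 400-point scale. That GDPval-AA ratings behave this way is an
    assumption, not something the leaderboard is quoted as guaranteeing here.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# GDPval-AA scores as quoted in this review.
GDPVAL_AA = {"Claude Fable 5": 1932, "GPT-5.5": 1769, "Gemini 3.1 Pro": 1314}

for rival in ("GPT-5.5", "Gemini 3.1 Pro"):
    p = elo_expected_score(GDPVAL_AA["Claude Fable 5"], GDPVAL_AA[rival])
    print(f"Fable 5 vs {rival}: {p:.0%} expected win rate")
```

Under that assumption, the 163-point gap works out to Fable 5 being preferred in roughly 72% of head-to-head comparisons with GPT-5.5, against about 97% with Gemini 3.1 Pro. That difference is why one gap reads as practical and the other as eye-catching.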

The composite picture is consistent across all three. Fable 5 is meaningfully better than peer models on the work most enterprises care about most. The benchmarks don’t lie about that. The question is whether the magnitude of “better” justifies the magnitude of the price.

The Stripe Migration That Sold the Pitch

Anthropic’s launch material led with a specific customer story, which usually means the customer story is the part that punches hardest in sales conversations.

Stripe used Fable 5 to migrate a 50-million-line Ruby codebase in roughly one day. Their internal estimate for the same migration with a full engineering team was two months. That’s not a productivity-tool benchmark. That’s a “the work happens or it doesn’t” benchmark, and Stripe published a number Anthropic could quote.

The structural reason that result is possible at all, and the reason it would not have been possible on Opus 4.8, is the combination of Fable 5’s coding accuracy with the agentic orchestration layer it ships with. Fable 5 doesn’t just write better patches than its predecessors. It coordinates much longer chains of patches with much higher per-step accuracy. SWE-Bench Pro scores whole tasks rather than individual steps, but a model that resolves 80% of multi-file tasks end to end has to be making very few mistakes at each intermediate step, and that low per-step error rate is what keeps long agentic chains coherent without human intervention at every junction.
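Why per-step accuracy matters so much for long chains can be shown with a toy model. The sketch assumes every step must succeed, steps fail independently, and a benchmark task takes a fixed number of steps; the step counts (`STEPS_PER_TASK`, `LONG_CHAIN`) are hypothetical, since SWE-Bench Pro does not publish them.

```python
def implied_step_success(task_success: float, steps: int) -> float:
    """Per-step success rate implied by an end-to-end task success rate,
    assuming every step must succeed and steps fail independently."""
    return task_success ** (1.0 / steps)


def chain_success(step_success: float, steps: int) -> float:
    """Probability that a chain of independent steps completes without error."""
    return step_success ** steps


STEPS_PER_TASK = 10  # hypothetical: steps in one benchmark task
LONG_CHAIN = 50      # hypothetical: steps in a long agentic run

# SWE-Bench Pro task-level pass rates quoted in this review.
for model, task_rate in {"Claude Fable 5": 0.803, "GPT-5.5": 0.586}.items():
    per_step = implied_step_success(task_rate, STEPS_PER_TASK)
    print(f"{model}: ~{per_step:.1%} per step, "
          f"{chain_success(per_step, LONG_CHAIN):.0%} over {LONG_CHAIN} steps")
```

Under these assumptions the per-step difference is only about three points (roughly 97.8% vs. 94.8%), yet over a 50-step run it compounds into about a one-in-three chance of finishing cleanly versus about one in fourteen. Real agentic steps are neither independent nor equally hard, so this explains the direction of the effect rather than predicting its size.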

The same architecture that produced the Mythos cybersecurity scores produces this. The capability is genuine. The question — and Anthropic clearly anticipated it — is whether the safety post-training that turned Mythos into Fable 5 preserves enough of that capability to deliver the same results in customer hands. Stripe’s published number suggests yes.

For teams sitting on large-scale codebases where the migration math has historically been “this is too expensive to do” — legacy Ruby, legacy Python 2, legacy Java EE, legacy COBOL — Fable 5 changes the math. The cost basis on those migrations just dropped by a factor that makes the queued backlog actionable. That’s the enterprise pitch Stripe inadvertently underwrote.

The Pricing Question Nobody Wants to Answer

Now the hard part.

Fable 5 lists at $10 per million input tokens, $50 per million output. Gemini 3.1 Pro sits at $2/$12. That’s 5x the input cost and 4.17x the output cost. For a workload that’s 1M input and 1M output per task, you’re looking at $60 on Fable 5 against $14 on Gemini, a 4.3x cost multiplier for that input/output mix.

For comparison, GPT-5.5 lists at $5/$30, putting it in the middle. DeepSeek V4-Pro, after its permanent 75% price cut, sits at $0.44/$0.87, though most enterprises don’t run this kind of workload on DeepSeek anyway. The competitive landscape is genuinely spread out at this point, and Anthropic positioning Fable 5 at the top of the spread is a choice that requires the capability gap to actually pay for itself.
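To check these multiples against your own traffic, a minimal per-task cost calculator is enough. The prices are the list rates quoted in this review; the 1M-input, 1M-output task is the illustrative workload from the paragraph above, not a measured one, and the `Pricing` class is just a convenience for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List price in US dollars per million tokens, as quoted in this review."""
    input_per_m: float
    output_per_m: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Dollar cost of one task with the given token counts."""
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000


PRICES = {
    "Claude Fable 5": Pricing(10.00, 50.00),
    "GPT-5.5": Pricing(5.00, 30.00),
    "Gemini 3.1 Pro": Pricing(2.00, 12.00),
    "DeepSeek V4-Pro": Pricing(0.44, 0.87),
}

# The review's worked example: 1M input and 1M output tokens per task.
baseline = PRICES["Gemini 3.1 Pro"].cost(1_000_000, 1_000_000)
for name, price in PRICES.items():
    c = price.cost(1_000_000, 1_000_000)
    print(f"{name:16} ${c:7.2f}  ({c / baseline:.2f}x Gemini 3.1 Pro)")
```

The multiple depends on the mix: an output-heavy workload pushes the Fable 5 premium over Gemini toward 4.17x, an input-heavy one toward 5x, so plug in your real token counts before quoting a single number.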

Three questions decide whether it does.

How much of your workload needs Fable 5 specifically? Not all of it. The right architecture for most teams is a router that sends the easy stuff to a cheaper model and reserves Fable 5 for the queries where the capability gap is load-bearing. Code migrations, multi-file agentic patches, FrontierCode-class reasoning problems — Fable 5. Routine drafting, summarization, simple Q&A — Gemini 3.1 Pro or Claude Opus 4.8 standard mode at $5/$25.
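The routing rule above can be sketched as a static lookup from task category to model. Everything here is hypothetical: the `TaskKind` categories, the `route` function, and the `prefer_anthropic` switch are illustrations, and the returned strings are the display names used in this review, not real API model identifiers.

```python
from enum import Enum, auto


class TaskKind(Enum):
    """Hypothetical task categories; a real router would classify requests itself."""
    CODE_MIGRATION = auto()
    MULTI_FILE_PATCH = auto()
    HARD_REASONING = auto()
    DRAFTING = auto()
    SUMMARIZATION = auto()
    SIMPLE_QA = auto()


# Task kinds where, per this review, the capability gap is load-bearing.
PREMIUM_TASKS = {
    TaskKind.CODE_MIGRATION,
    TaskKind.MULTI_FILE_PATCH,
    TaskKind.HARD_REASONING,
}


def route(kind: TaskKind, prefer_anthropic: bool = False) -> str:
    """Return the model to use for a task (display names, not API model IDs)."""
    if kind in PREMIUM_TASKS:
        return "Claude Fable 5"
    # Routine work goes to a cheaper model; Opus 4.8 if you want to stay on Anthropic.
    return "Claude Opus 4.8" if prefer_anthropic else "Gemini 3.1 Pro"


print(route(TaskKind.CODE_MIGRATION))
print(route(TaskKind.SUMMARIZATION, prefer_anthropic=True))
```

A static table keeps the routing policy auditable and easy to adjust once the free window produces real data. The hard part in production is classifying each request into the right category, which this sketch deliberately leaves out.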

What’s the failure cost on the queries you’d send to Fable 5? A patch that doesn’t ship is not a free outcome. A migration that takes two months when it should take one day costs real engineering time. If Fable 5’s per-query accuracy advantage means you don’t have to redo the work, the engineering time it saves can easily absorb the 5x token cost. If you’re using it for jobs where the accuracy gap doesn’t matter, you’re overpaying.
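The failure-cost argument reduces to a break-even calculation. The sketch below is a toy model: one attempt per task, and a failed attempt gets finished by a human at a fixed rework cost. The 200K-input, 20K-output task and the $150 rework figure are hypothetical, and the SWE-Bench Pro pass rates are only a rough stand-in for your own per-task success rates.

```python
def expected_cost(token_cost: float, success_rate: float, rework_cost: float) -> float:
    """Expected dollars per task: tokens always, plus rework when the model fails."""
    return token_cost + (1.0 - success_rate) * rework_cost


def break_even_rework_cost(premium_tokens: float, premium_rate: float,
                           cheap_tokens: float, cheap_rate: float) -> float:
    """Rework cost above which the pricier, more accurate model is cheaper overall.

    Solves premium_tokens + (1 - premium_rate) * W == cheap_tokens + (1 - cheap_rate) * W.
    """
    return (premium_tokens - cheap_tokens) / (premium_rate - cheap_rate)


# Hypothetical task: 200K input tokens and 20K output tokens at the list prices above.
fable_tokens = (200_000 * 10 + 20_000 * 50) / 1_000_000   # $3.00
gemini_tokens = (200_000 * 2 + 20_000 * 12) / 1_000_000   # $0.64

# SWE-Bench Pro pass rates as a rough stand-in for per-task success.
FABLE_RATE, GEMINI_RATE = 0.803, 0.542

threshold = break_even_rework_cost(fable_tokens, FABLE_RATE, gemini_tokens, GEMINI_RATE)
print(f"Fable 5 is cheaper overall once a failed task costs more than ${threshold:.2f} to fix")

REWORK = 150.0  # hypothetical: cost of a human finishing a failed patch
print(f"Fable 5:        ${expected_cost(fable_tokens, FABLE_RATE, REWORK):.2f} per task")
print(f"Gemini 3.1 Pro: ${expected_cost(gemini_tokens, GEMINI_RATE, REWORK):.2f} per task")
```

With these stand-in numbers the break-even sits around $9 of rework per failed task, a few minutes of engineering time. The conclusion flips for tasks where failure is cheap or the success-rate gap is small, so swap in measured rates and real rework costs.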

Are you using the free window through June 22? The smart move for the next eleven days is to run your actual production workloads through Fable 5 under the free Pro/Max/Team/Enterprise access window. Get a real read on what the capability gap is worth for your specific work. After June 22, the model shifts to usage credits, and the answer becomes a budget decision. Before June 22, the answer is just data.

The Honest Limitations

Fable 5 is the best model. That doesn’t mean it’s the best model for every job. Three real limitations.

Latency is not Fast Mode latency. Fable 5 runs at standard Opus inference speeds — slower than Opus 4.8 Fast Mode, and noticeably slower than GPT-5.5 Instant on short-context tasks. For interactive chat workflows where response time matters more than per-response quality, the speed difference is real. Anthropic has not announced a Fable 5 Fast Mode tier, and given the model architecture, one may not be technically straightforward.

The safety training is conservative. The same post-training pass that distilled the Mythos cybersecurity risks out of the model also tuned Fable 5 to be more refusal-prone than Opus 4.8 on edge cases. For most enterprise use cases this doesn’t matter. For security research, red-teaming, and certain categories of penetration testing work, the conservative defaults will frustrate users who could legitimately use the capability. The Anthropic-Pentagon decision earlier this year hangs over this — Anthropic’s safety posture has trade-offs, and Fable 5 is the most cautious model the company has shipped.

The capability gap on routine work is smaller than the headline suggests. GDPval-AA at 1932 vs. 1769 is meaningful. It is not a category gap. For a knowledge worker writing emails, summarizing documents, and drafting routine deliverables, the Fable 5 advantage over GPT-5.5 will be present but subtle. The price gap (2x GPT-5.5’s input rate, about 1.7x its output rate, and 5x Gemini 3.1 Pro’s input rate) will be present and not subtle. The case for Fable 5 weakens the further you get from the work the benchmarks measured.

Fable 5 vs. GPT-5.5 vs. Gemini 3.1 Pro

The three-way comparison for enterprise buyers as of June 11.

Factor	Claude Fable 5	GPT-5.5	Gemini 3.1 Pro
API Input/Output (per M tokens)	$10 / $50	$5 / $30	$2 / $12
SWE-Bench Pro	80.3%	58.6%	54.2%
FrontierCode	29.3%	5.7%	(not published)
GDPval-AA	1932	1769	1314
Best for	Coding, migrations, agentic workflows	Mixed enterprise, ecosystem	Cost-sensitive scale, Workspace integration
Speed	Standard	Fast	Fast
Free access	Pro/Max/Team/Enterprise plans through June 22	None at this tier	Free tier available

Most procurement decisions won’t end with picking one model. The likelier outcome is a router architecture that uses Fable 5 where capability matters, Gemini 3.1 Pro where cost matters, and GPT-5.5 where ecosystem fit matters. The closest reference point for what changed with this release is the coding-specific comparison between Fable 5’s predecessors and GPT-5.5, and the short answer is that Fable 5 makes the coding case for Claude more compelling than any prior version has.

Who Should Use Fable 5

The buyers where Fable 5 is the right answer:

Engineering teams running agentic coding workflows. SWE-Bench Pro at 80.3% is the number that justifies the price tag, and if your team uses Claude Code routines or comparable agentic harnesses, the per-query cost is small compared to the per-query value.
Enterprises facing large-scale code migrations. The Stripe pattern should generalize. Legacy migrations that have been on the “too expensive” list for years are now in the “ship in days” range.
Research and analysis teams hitting GDPval-class workloads. Where the work mix is heavy on reasoning, multi-source synthesis, and long-context coherence, the GDPval-AA gap maps directly to output quality.
Anthropic Pro/Max/Team/Enterprise subscribers, through June 22. The free window makes the test cheap. Use it.

Who Should Look Elsewhere

The buyers where the right answer is something else:

Cost-sensitive teams with mixed workloads. Gemini 3.1 Pro at $2/$12 covers a meaningful share of frontier-class work for less than a quarter of the per-query cost. If your work doesn’t lean heavily on the capabilities where Fable 5 has the edge, the math doesn’t work.
Interactive chat-first workflows. For UX-sensitive cases where response latency matters more than the marginal quality gap, Opus 4.8 Fast Mode at $10/$50 ships faster and may be the better Anthropic option.
Security research, red-teaming, sensitive policy work. The conservative safety post-training will surface as friction on legitimate edge-case work. Consider whether the capability gap is worth the refusal overhead.

Frequently Asked Questions

How is Fable 5 different from Claude Mythos?

Fable 5 is the public-release derivative of the Mythos architecture. The original Mythos remains restricted to Project Glasswing partners and has not been released publicly. Fable 5 is what Anthropic was able to release after the safety post-training pass distilled the cybersecurity-class capabilities down to a level the company considers safe for general API access. The architecture is shared; the difference is post-training. The benchmarks are still class-leading, but the model is intentionally less capable than the Glasswing tier in the cybersecurity categories that kept Mythos restricted.

What happens after the free window ends on June 22?

Anthropic shifts Fable 5 access on Pro/Max/Team/Enterprise plans to a usage-credit model. Subscribers will get a baseline credit allocation with each plan; usage beyond that will pull from purchased credits or convert to API billing at the published $10/$50 rate. The exact credit allocations per plan are scheduled to be published before the window closes.

Is Fable 5 worth 5x the price of Gemini 3.1 Pro?

For workloads that use the capability gap — coding, agentic workflows, long-context reasoning, large-scale analysis — yes. The per-query value of the accuracy advantage easily absorbs the per-query token cost. For workloads that don’t use the capability gap — routine drafting, summarization, simple Q&A — no. The 5x premium pays for nothing you can measure on those queries.

How does Fable 5 compare to Opus 4.8?

Fable 5 is the new top tier; Opus 4.8 is now the workhorse below it. The capability gap is real but uneven. Fable 5 is meaningfully better on coding, agentic orchestration, and graduate-level reasoning. Opus 4.8 is faster, cheaper, and roughly comparable on routine knowledge work. Most production deployments will end up using both, routed by query type.

Can I use Fable 5 with Claude Code?

Yes. Fable 5 is available as a backend in Claude Code as of the June 9 release, alongside the existing Opus 4.8 backend. The pricing differential is significant for high-volume coding workflows — running Claude Code on Fable 5 will cost meaningfully more per session than running it on Opus 4.8, and for many workflows the Opus 4.8 quality is sufficient.

Does Fable 5 have the same context window as Opus 4.8?

Yes. 200K tokens of context, with the same long-context behavior characteristics. The architectural changes in Fable 5 are in capability, not context length.

What about the 5x price premium? Is that sustainable?

Anthropic has not published the underlying cost basis for Fable 5 inference, but the 3x Opus 4.8 Fast Mode price cut suggested that frontier-tier inference costs are falling faster than public pricing reflects. The same curve probably applies to Fable 5, which means the price-to-capability ratio should improve over the model’s lifecycle even if the headline rate doesn’t move immediately.

Our Take

Fable 5 is the best AI model on the market on June 11, 2026. Not “in some categories.” On the categories that matter most for the work most enterprises are paying frontier models to do. The SWE-Bench Pro number is real, the FrontierCode number is real, the GDPval-AA number is real, and the Stripe migration story is the kind of customer reference that gets repeated in board meetings for the rest of the year. Anthropic’s product positioning is justified.

The price is also justified, with caveats. $10/$50 is a premium tier, and the right buyer treats it that way — a model you route to when the capability gap matters, not a default. The mistake to avoid is paying Fable 5 rates for queries that Opus 4.8 or Gemini 3.1 Pro would handle at a fraction of the cost without measurable quality loss. A router architecture that uses Fable 5 where it earns its keep and cheaper models everywhere else is the right shape for most production deployments.

The strategic read on Anthropic’s release is that the company is no longer pretending to compete on price. Gemini won the price tier. DeepSeek won the value tier. Fable 5 is Anthropic’s bet that the capability frontier is where premium pricing actually pays for itself, and the benchmarks make the case as well as anyone has made it in this cycle. Whether the bet works depends less on the model and more on whether enterprise buyers learn to architect their workloads to use Fable 5 selectively rather than universally. The teams that figure out the routing pattern first will get the most value out of the new model. The teams that send everything to Fable 5 because it’s the best will find that “best” doesn’t always pencil out.

The free window through June 22 is the right place to start. Run your real production workloads through it. Measure what the capability gap is actually worth on the work you do, not the work the benchmarks measure. Decide the credit-tier question on data, not on the announcement post.

This is the best model Anthropic has shipped. It might also be the model that finally turns the Mythos architecture into a product line worth the engineering bet behind it. For coding teams specifically, the case is closed. For everyone else, the case opens on June 22.

Last updated: June 11, 2026. Sources: Anthropic Claude Fable 5 announcement · Anthropic Project Glasswing · SWE-Bench · GDPval benchmark · FrontierCode benchmark · Claude Opus 4.8 release notes.