OpenAI & Anthropic's AI Model Lockdown, Open Models Rise & Google's Firing — The Weekly Diff #5
GPT-5.6 "Sol" Ships, and the US Government Decides Who Gets It
OpenAI previewed GPT-5.6 Sol, which it's calling a next-generation model, and the US government will decide who gets to use it. The rollout is being staggered at the request of the Trump administration, with access vetted rather than open to all.
The naming drew groans first, since calling something a next-generation model while shipping it as a point release invites the obvious question of why it isn't just GPT-6, and the Sol/Terra/Luna tier names landed as OpenAI plainly envying Anthropic's knack for naming things. The pricing didn't help the mood either. Sol runs $5 for input and $30 for output per million tokens, which is steep at a moment when the rest of the market is racing the other way.
The vetting is the part worth sitting with, though. Putting an approval layer on who can use a commercial model at home is new, and the reaction has been almost entirely negative.
A "preview" gate is easy to wave away as temporary, but it sets a precedent. The moment access to a general-purpose tool depends on approval, the default flips from "anyone can build on this" to "ask first," and defaults are sticky.
For most working developers, the practical takeaway is simpler. If your model access can be revoked or rationed by a third party, that's now a real supply-chain risk for your product, not a hypothetical one.
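One way to take that risk seriously is to make sure no single provider sits on the critical path. Here is a minimal sketch of a fallback chain, assuming each provider is wrapped in a plain function. The names `Provider`, `ProviderUnavailable`, and the two stub models are hypothetical stand-ins for illustration, not any vendor's SDK.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


class ProviderUnavailable(Exception):
    """Hypothetical error: the provider refused, rationed, or revoked access."""


@dataclass
class Provider:
    name: str
    complete: Callable[[str], str]  # prompt -> completion text


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in priority order and return (provider_name, completion)."""
    errors = []
    for provider in providers:
        try:
            return provider.name, provider.complete(prompt)
        except ProviderUnavailable as exc:
            # Remember why this provider was skipped, then move on to the next one.
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-ins for real API calls so the sketch runs offline.
def gated_model(prompt: str) -> str:
    raise ProviderUnavailable("access not approved")


def open_weight_model(prompt: str) -> str:
    return f"[open-weight] {prompt}"


PROVIDERS = [
    Provider("gated-frontier", gated_model),
    Provider("self-hosted-open", open_weight_model),
]
```

Calling `complete_with_fallback("hi", PROVIDERS)` skips the gated stub and answers from the open-weight one. In a real product you would also want to catch timeouts and rate limits, which this sketch leaves out.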
Anthropic's Mythos Released to "Trusted" US Orgs
Anthropic's side of this is messier and more interesting. The US allowed Anthropic to release its Mythos model to "trusted" US organizations, which sounds like loosening up until you read the fine print. Mythos is probably the best model in Anthropic's lineup, and access to it has basically become a political bargaining chip, to the point where the NSA reportedly lost access amid a dispute with the company.
The rollout even confused people on the basics, like why the government would clear Mythos while keeping the supposedly safer, more guarded Fable locked down. That confusion tells you the public picture of these model tiers is still fuzzy.
The sharper argument is over how good Mythos actually is. The skeptical take, which came up a lot in the Will It Mythos? discussion, is that it's just a normal model with the safety rails removed, and that any current model would surface the same vulnerabilities if it weren't trained to refuse. The people who've actually pointed these models at real security work disagree, and the most convincing version of their case is that the edge isn't raw intelligence but persistence. Mythos doesn't just answer and stop. It keeps digging, sometimes turning up bugs in reverse-engineered binaries that a standard model walks right past.
Meanwhile, Fable 5 is reportedly on track to return soon after being pulled, and the whole episode worked out perfectly for Anthropic. They got a mountain of free press that amounts to "we're so powerful the government had to step in," and the moment competitors answered by shipping equally strong models, the case for regulating any of it fell apart. When the moat you're advertising is "too dangerous to release," and three other labs ship the same thing next week, the moat was never really there.
CAUTION
Be careful reading model marketing as a security guarantee in either direction. "Too dangerous to release" and "perfectly safe" are both positioning. If you're relying on a model for security work, evaluate it on your own code and threat model, not the press release.
The Rise of Open-Weight Models
Every one of those gating decisions points the same direction, and the open-weight model world is happy to take the handoff. A few pieces this week captured the moment well. There's a measured look at the gap between open-weight and closed-source LLMs, an argument about the unbearable cheapness of open-weight models, the news that Asian AI startups are launching Mythos-like models while the export ban drags on, and a broader case that for most of the world, open source AI is the only way forward.
The most striking figure from the cheapness discussion was a developer reporting a full month of real work for under two dollars, and noting it isn't even subsidies doing it, since hosts like DigitalOcean and Cloudflare serve the same open models at similar prices. That kind of commoditization is exactly the squeeze the premium labs are trying to escape, and it raises an uncomfortable question: if most everyday tasks run fine on open models, what are you actually paying the premium for? This is the same thread I picked up when GLM-5.2 took the open-weights crown in last week's issue, and it has only sped up since.
A couple of honest caveats are worth keeping in mind. "Open source" and "open weights" aren't the same thing, since open weights gives you the finished model without the training data or recipe to rebuild it, and the two kept getting blurred together.
The bigger worry is whether any of this lasts, since today's best open models mostly exist because of corporate generosity that can be switched off at any time. There's also a real strategic question of whether the US is throwing away its lead by export-banning frontier models while open, largely non-US labs quietly catch up to the quality everyone else can actually use.
If you want to start playing with this yourself, my older walkthrough on getting started with DeepSeek and OpenRouter and the roundup of open-source coding agents worth trying are still the fastest way in.
Pro Tip
The pragmatic workflow that keeps showing up is to use a cheap open model for generation and a frontier model for review and debugging. You get most of the quality for a fraction of the bill, and you're not fully exposed if either provider changes the rules.
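That workflow is simple enough to sketch. The version below is a rough outline under stated assumptions: models are plain `prompt -> text` functions, the reviewer signals success by starting its reply with `APPROVE`, and the loop gives up after a fixed number of rounds. The function name, the prompt wording, and the approval convention are all mine, not anything a particular tool prescribes.

```python
from typing import Callable

Model = Callable[[str], str]  # prompt -> text; wrap your real client in one of these


def generate_then_review(task: str, generator: Model, reviewer: Model, max_rounds: int = 2) -> str:
    """Draft with the cheap model, then let the frontier model approve or request fixes."""
    draft = generator(task)
    for _ in range(max_rounds):
        verdict = reviewer(f"Review this draft.\nTask: {task}\nDraft:\n{draft}")
        if verdict.strip().upper().startswith("APPROVE"):
            return draft
        # Feed the reviewer's feedback back to the cheap model for another pass.
        draft = generator(
            f"{task}\n\nRevise the draft to address this feedback:\n{verdict}\n\nDraft:\n{draft}"
        )
    return draft  # Best effort after max_rounds; the caller may want a human look.


# Offline stubs so the sketch runs without any API keys.
def cheap_stub(prompt: str) -> str:
    return "v2" if "Revise" in prompt else "v1"


def frontier_stub(prompt: str) -> str:
    return "APPROVE" if "v2" in prompt else "Fix the off-by-one error"
```

With the stubs above, `generate_then_review("write a function", cheap_stub, frontier_stub)` takes one revision round before the reviewer approves. The expensive model only ever reads and critiques, which is where the savings come from.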
Smart Model Routing Lands in Claude, Codex and Cursor
If the problem is cost, the natural next question is how to spend less without babysitting which model handles what. A new smart model router for Claude, Codex and Cursor tries to answer that by sending each request to the cheapest model that can handle it, automatically.
It's a clean idea, and the discussion on Hacker News went straight to the catch that makes or breaks these tools: prompt caching. Bouncing between models means you keep missing the cache, which only lasts about five minutes, and since cache hits are where most of the savings actually live, a careless router can cost you more than just sticking with one model. There's a reliability tax on top, since smaller models are likelier to stop early, throw errors, or get stuck in loops, so a "cheaper" call can quietly turn into three calls plus a manual cleanup. The honest version of this tool weighs caching and failure rates in its routing, not just the sticker price per token.
It pairs well with the rest of the cost-control toolkit, and I've written before about wiring your own keys and subscriptions into your editor in using your Gemini subscription with opencode and GitHub Copilot with a custom API key. Routing is the logical next layer on top of those.
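To make the caching and failure-rate point concrete, here is a back-of-the-envelope cost model. It is a sketch, not how any actual router works: it assumes a failed call is retried until it succeeds (so expected attempts are 1 / (1 - failure rate)) and that cached input tokens are billed at a discount. The frontier prices reuse the $5 and $30 per million tokens quoted for Sol earlier in this issue. Every other number, including the cached-token price, the small model's prices, and both failure rates, is made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ModelProfile:
    name: str
    input_price: float         # USD per million uncached input tokens
    cached_input_price: float  # USD per million cached input tokens (assumed discount)
    output_price: float        # USD per million output tokens
    failure_rate: float        # fraction of calls that fail and get retried (assumed)


def expected_cost(model: ModelProfile, input_tokens: int, output_tokens: int,
                  cache_hit_rate: float) -> float:
    """Expected USD cost of one successful request, counting cache hits and retries."""
    blended_input = (cache_hit_rate * model.cached_input_price
                     + (1 - cache_hit_rate) * model.input_price)
    one_attempt = (input_tokens * blended_input
                   + output_tokens * model.output_price) / 1_000_000
    expected_attempts = 1 / (1 - model.failure_rate)
    return one_attempt * expected_attempts


def route(candidates: Sequence[Tuple[ModelProfile, float]],
          input_tokens: int, output_tokens: int) -> ModelProfile:
    """Pick the model with the lowest expected cost; each candidate carries its cache hit rate."""
    return min(
        candidates,
        key=lambda pair: expected_cost(pair[0], input_tokens, output_tokens, pair[1]),
    )[0]


FRONTIER = ModelProfile("frontier", 5.0, 0.5, 30.0, failure_rate=0.02)
SMALL = ModelProfile("small", 1.0, 0.1, 5.0, failure_rate=0.30)

# Staying on the frontier model keeps its cache warm; hopping to the small one starts cold.
choice = route([(FRONTIER, 0.9), (SMALL, 0.0)], input_tokens=50_000, output_tokens=500)
```

With these invented numbers, the long cached prompt and the small model's retry rate are enough to make the "cheap" model the more expensive option, so `choice` ends up being the frontier model. Change the hit rates or failure rates and the answer flips, which is the whole point: sticker price alone doesn't tell a router much.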
Nub Brings a Bun-Style All-in-One Toolkit to Node.js
Nub is a Bun-like (pun intended) all-in-one toolkit for Node.js, bundling the runtime conveniences that have made Bun and Deno feel so pleasant into something that sits on top of Node rather than asking you to leave it.
Plenty of teams love Bun's developer experience and can't justify moving a production codebase off Node just to get it, so a toolkit that brings the speed and the batteries-included feel while keeping you on the runtime you already trust is an easy yes. The early reception was warm, with one person reporting they'd moved an entire monorepo over with zero issues and others asking the sensible questions about whether it runs on Cloudflare Workers and inside Docker. That's the right instinct, since a tool like this lives or dies by where it can actually run.
INFO
The interesting subtext here is that Node's competition has made Node better. The pressure from Bun and Deno is exactly why a project like Nub, and Node's own recent built-in features, exist at all.
Fired by Google for Building the Google Workspace CLI
Justin Poehnelt says he was fired by Google for creating the Google Workspace CLI, a genuinely useful open-source tool, as a side project. It's the kind of 20% time work Google used to be famous for celebrating.
One part of the argument is that Google has lost the plot, going from encouraging side projects to firing people for them, which fits the wider pattern of swapping a loved open tool for a worse closed one, something I touched on when Antigravity replaced the Gemini CLI. The other part is fair too, that he'd put Google's logo and brand colors on a public googleworkspace GitHub org without permission, which is a genuine trademark problem, but the obvious fix there is to strip the logos and rename it, not fire the guy. Both can be true at once. Either way, the chilling effect is the real story, since the lesson every engineer at a big company just learned is that shipping a useful side project can now cost you your job.
Microsoft's "Quantum Leap" Undone by Basic Python Errors
A researcher claims Microsoft's supposed "quantum leap" doesn't hold up because of basic Python errors, with the argument being that fixing the bug invalidates the result the research was built on.
The detail that makes it land is that Microsoft's next-generation chip was reportedly built "with the help of its own agentic AI," which sets up the obvious punchline, and the discussion didn't miss it: AI can now hallucinate quantum computing claims about as well as humans can. The substance underneath the jokes is real, though. If the critique holds, the headline result is an artifact of the bug rather than a genuine finding.
Whether or not AI wrote the offending code, the lesson is the same one I keep coming back to in the problem with AI-generated code and why vibe coding is the fast food of coding. Speed without verification just gets you to the wrong answer faster.
Even one of the most advanced labs in the world can still be tripped up by a mistake a careful code review would have caught, which is a useful reminder that capability and correctness are not the same thing, in your codebase or anyone else's.