Fable 5 Returns, HackerRank's Flaky ATS & ZCode's GLM-5.2 Harness — The Weekly Diff #6

Murtuzaali Surti

Updated Jul 23, 2026 10:13 AM UTC

Fable 5 Returns

Fable 5 is back, and the terms of its return tell you more than the announcement does. For a limited window you can spend up to 50% of your weekly plan limit on it, and once that's gone you burn usage credits. The model that was supposedly too dangerous to release has come back as a promotional slot with a usage cap.
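The promo terms boil down to a simple two-bucket meter. Here is a minimal sketch, assuming abstract usage units. The class and field names are hypothetical, and real plan accounting (how units map to tokens, how the week resets) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class FableMeter:
    """Hypothetical meter for the promo: up to half of the weekly plan limit
    can be spent on Fable 5; anything past that is billed to usage credits."""
    weekly_limit: float
    promo_share: float = 0.5  # the "up to 50% of your weekly plan limit" rule
    used_from_plan: float = 0.0
    billed_to_credits: float = 0.0

    def record(self, units: float) -> None:
        cap = self.weekly_limit * self.promo_share
        remaining = max(cap - self.used_from_plan, 0.0)
        from_plan = min(units, remaining)
        self.used_from_plan += from_plan
        # Whatever the plan allowance can't cover spills over into credits.
        self.billed_to_credits += units - from_plan


meter = FableMeter(weekly_limit=100)
meter.record(30)
meter.record(30)
```

With a limit of 100 units, the first 30 come out of the plan. Of the second 30, only 20 fit under the 50-unit cap, so the remaining 10 hit your credits.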

In the last issue I wrote about how Fable 5's "too dangerous to release" framing worked out perfectly for Anthropic, because it generated a mountain of free press implying the model was so powerful the government had to step in. The quiet return is the other shoe dropping. The moment competitors shipped equally strong models, the case for keeping it locked fell apart, and what was billed as a safety hold became a timed promo with a meter on it.

The discussion zeroed in on the trust cost, which is the part that doesn't bounce back the way a usage limit does. People described how they'd canceled their Max plan and asked for a refund when Fable was pulled, only to re-up the moment it returned. Coming back to a tool you no longer fully trust, because the open alternatives still aren't a drop-in, is its own kind of lock-in, and it's exactly the gap the open weights crowd is racing to close.

Another recurring view was that the lasting damage is the loss of trust in US-based models, not the temporary absence of one model. The doomsday messaging, and the administration falling for it, is what eroded confidence and pushed people toward those alternatives in the first place.

HackerRank Open Sourced Its ATS

HackerRank's open-sourced ATS scores the same resume differently every run

HackerRank open-sourced its applicant tracking system, and Dan Kinsky did the obvious thing and ran his own resume through it. It scored 90, then 74, then 88, on the same input. A 16-point swing between runs is the kind of result that would get a normal test flagged as flaky, except here it's deciding whether a human being gets an interview.
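Treating the scorer like a test under CI makes the problem concrete. Below is a minimal sketch of a flakiness check over repeated runs on identical input. The function name and the 5-point tolerance are arbitrary assumptions for illustration, not anything HackerRank ships.

```python
import statistics


def flakiness_report(scores: list[float], tolerance: float = 5.0) -> dict:
    """Summarize repeated scores for one identical input.

    `tolerance` is a hypothetical threshold: if the run-to-run spread exceeds
    it, the scorer is flagged as flaky, the way CI flags a test that passes
    and fails on the same commit.
    """
    spread = max(scores) - min(scores)
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "spread": spread,
        "flaky": spread > tolerance,
    }


report = flakiness_report([90, 74, 88])
```

For the three runs in the post, the mean is 84 and the spread is 16 points, with a sample standard deviation of about 8.7. That is well past any tolerance you would accept from a grader.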

The issue is that the scoring is done by an LLM, and LLMs are stochastic (their output involves inherent randomness), so asking one to produce a stable ranking is asking the wrong kind of system to do a deterministic job. The author leaned on temperature as a "determinism" knob, and the discussion was quick to point out that's not how temperature works. Temperature only controls how sharply the model favors its most likely next token. Even at temperature 0, batched GPU inference can add up floating-point numbers in a different order from one request to the next, so the model still isn't reproducible in the way a grading rubric (a fixed set of scoring criteria) needs to be.
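A toy illustration of why temperature 0 isn't a determinism guarantee follows. The logits below are made-up numbers, not output from any real model. The example shows one known mechanism: floating-point addition isn't associative. So when two candidate tokens are nearly tied, the order in which a kernel sums values can change which token greedy decoding picks.

```python
import math


def token_probabilities(logits: list[float], temperature: float) -> list[float]:
    """Softmax over logits scaled by temperature.

    Temperature 0 is conventionally treated as greedy decoding: all
    probability goes to the highest logit (the first index wins a tie).
    """
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# The same three numbers summed in two different orders. On a GPU, reduction
# order can depend on batch size or kernel choice, so two otherwise identical
# requests can end up with slightly different logits.
logit_run_a = (0.1 + 0.2) + 0.3  # 0.6000000000000001
logit_run_b = 0.1 + (0.2 + 0.3)  # 0.6

# A near-tie between two candidate tokens. Greedy decoding picks a different
# token depending on which summation order produced the second logit.
pick_a = token_probabilities([0.6, logit_run_a], 0).index(1.0)  # token 1
pick_b = token_probabilities([0.6, logit_run_b], 0).index(1.0)  # token 0
```

Real models have thousands of logits and far more arithmetic per token. Once one early token flips, everything generated after it diverges, and a score at the end of a long generation is exactly that kind of output.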

The scoring instructions say that open source contributions are worth 35 points and personal projects 30, which means fifteen years of paid work experience caps at 25% of your score. Any developer who doesn't code in their free time gets penalized by design, and the thread was full of people pointing out that this describes most of the working engineers they know.
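The weights make the ceiling easy to compute. Here is a minimal sketch, assuming a 100-point scale (consistent with the 90/74/88 runs). The "unitemized" bucket is just whatever is left of the 100 points after the three quoted weights. It is not a category named in the scoring instructions.

```python
TOTAL_POINTS = 100  # assumption: scores are out of 100

# Weights as quoted from the scoring instructions; "unitemized" is the
# remainder, not a named category.
WEIGHTS = {"open_source": 35, "personal_projects": 30, "work_experience": 25}
WEIGHTS["unitemized"] = TOTAL_POINTS - sum(WEIGHTS.values())

SIDE_WORK = {"open_source", "personal_projects"}


def ceiling(codes_in_free_time: bool) -> int:
    """Highest possible score if a candidate maxes out every category open to them."""
    return sum(
        points
        for category, points in WEIGHTS.items()
        if codes_in_free_time or category not in SIDE_WORK
    )
```

Under these assumptions, a candidate with no side projects tops out at 35 points no matter how long their paid career has been. The 65 points reserved for side work are simply unreachable.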

The grim reframing, from someone who'd run hiring pipelines, was that a 35% pass rate is "actually a fantastic number" when you're drowning in applicants, which is the most honest and depressing thing in the thread.

ZCode Wraps GLM-5.2 in a Coding Agent

Z.ai shipped ZCode, a desktop coding agent built on GLM-5.2, and the most common reaction in the discussion was a question rather than a verdict: why does every AI company now need its own version of Claude Code/Codex? It's a fair question, and the honest answer is mostly vendor lock-in dressed up as a product.

The UI is, by several accounts, a near-exact copy of the Codex app, and unlike genuinely open harnesses, it isn't open source, which is a strange look when the model underneath is the open-weights selling point. The people who already live in a TUI agent pointed out that Z.ai documents integrations with nearly all the popular CLI agents, so if you're already running GLM-5.2 through something you trust, the desktop shell doesn't add much beyond a prettier window. A few tried it and drifted back to OpenCode, which one person said feels smarter. The 1.5x usage promo on the coding plan is the real reason to take it for a spin right now.

What people want is an agnostic harness that can route the same context to whichever provider fits the task, coding to one model, prose to another, images to a third. That's the direction coding-model competition is pulling anyway, and I've written about the same pull from the cost side in using your Gemini subscription with opencode and GitHub Copilot with a custom API key. It's hard to get excited about another walled garden when the open one is already this good.
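At its core, a provider-agnostic harness is a lookup from task kind to backend that carries the same context along. Here is a minimal sketch, assuming that simple table. The provider and model names are placeholders, not real endpoints. A real harness would also need auth, streaming, and translation between each provider's message format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    provider: str  # hypothetical provider identifier
    model: str     # hypothetical model identifier


# Illustrative routing table; the names are placeholders, not recommendations.
ROUTES = {
    "code": Route("provider-a", "coding-model"),
    "prose": Route("provider-b", "writing-model"),
    "image": Route("provider-c", "image-model"),
}
DEFAULT = ROUTES["code"]


def route(task_kind: str, context: list[str]) -> tuple[Route, list[str]]:
    """Pick a backend for the task; the shared context is passed through unchanged."""
    return ROUTES.get(task_kind, DEFAULT), context


history = ["user: tighten the intro paragraph"]
backend, ctx = route("prose", history)
```

The design point is that the context belongs to the harness, not the vendor. Swapping which model handles "prose" is a one-line table change rather than a migration to a different app.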

DeepSeek's DSpark Paper Explains Why Its Tokens Are So Cheap

DSpark speculative decoding architecture — Source: deepseek.ai

DeepSeek's API costs roughly a quarter of what everyone else charges for comparable models. DSpark, a new speculative decoding method they just published, points to a simple explanation: they've figured out how to run inference more cheaply than anyone else, and the gap keeps widening.

To understand speculative decoding, picture a fast drafter and a careful editor working as a pair. The drafter fires off guesses at speed. The editor reviews a whole batch of them at once and corrects the first one that's wrong, and most of the time the draft is good enough that the editor barely has to work. In LLM terms, a small, cheap model drafts several tokens ahead, and the larger model checks all of them in a single forward pass. It keeps the matching prefix and substitutes its own token at the first miss. The expensive model therefore runs once per batch of accepted tokens instead of once per token, and in the standard formulation the output matches what the large model would have produced on its own. DSpark refines this approach to deliver 57% to 78% faster per-user generation at matched capacity, keeping responses interactive even where the same model without it would leave you watching a spinner.
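To make the drafter/editor loop concrete, here is a minimal sketch of plain greedy speculative decoding with toy "models" (functions from a token prefix to the next token). This is the textbook version, not DSpark's refinement, which isn't reproduced here. Production systems also use probabilistic acceptance when sampling. The toy models and the draft length `k` are assumptions for illustration.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # toy model: prefix -> greedy next token


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, target_passes).

    A real target model scores all k drafted positions in one batched
    forward pass. Here that pass is simulated by calling `target` on each
    prefix and counting it once.
    """
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap drafter guesses k tokens, one after another.
        guesses: List[int] = []
        for _ in range(k):
            guesses.append(draft(out + guesses))
        # 2. The target checks every guessed position in one (simulated) pass.
        passes += 1
        accepted: List[int] = []
        for g in guesses:
            expected = target(out + accepted)
            if g != expected:
                accepted.append(expected)  # target's token replaces the first miss
                break
            accepted.append(g)
        else:
            # All k guesses matched, so the same pass yields one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + max_new], passes


# Toy target counts 0..9 in a loop; the toy drafter is wrong after a 3.
def count_up(p: List[int]) -> int:
    return (p[-1] + 1) % 10


def sloppy_drafter(p: List[int]) -> int:
    return 0 if p[-1] == 3 else count_up(p)


tokens, passes = speculative_decode(count_up, sloppy_drafter, [0], max_new=12)
```

Here 12 new tokens cost 3 target passes instead of 12, and `tokens` is identical to running `count_up` greedily on its own. That is the whole trick: the drafter's misses cost accuracy nothing, only some wasted cheap work.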

Publishing a methods paper that explains how your API stays cheap, while US labs are busy rationing models and framing releases as too dangerous, is a deliberate contrast. Openly documenting how you make inference fast enough to sell at a quarter of the market rate is its own kind of competitive moat, because it says "we're not just cheaper, we know exactly why, and here's the paper proving it."

The weights are already up on HuggingFace with the speculative decoding module built in, so this isn't just a lab result. The question is whether the numbers reproduce on consumer GPUs rather than A100s. Someone sketched a near future where draft models get tuned to specific use cases, companies, or even individuals. That's the commoditization squeeze I keep coming back to, the same one GLM-5.2 kicked off and that open-weight models generally keep accelerating. When tokens cost a fraction of what they did a year ago, the math on which provider you route to starts to look different.

Box3D, an Open Source 3D Physics Engine

Erin Catto, the name behind Box2D, announced Box3D, an open source 3D physics engine. For years Bullet was effectively the only real open source option, and people are marveling that the field has gone from that to Jolt, Rapier, Avian, Nvidia PhysX, and now Box3D, all in a few years.

Box2D was the foundation for a generation of indie physics games, and Catto's work on the Valve side fed into Rubikon, with a new engine called Ragnarok reportedly showing up in future Valve games. The honest answer to "which one is best" is that the landscape is now rich enough for the choice to depend on your stack and your constraints rather than on whichever single option happens to exist.

s&box, Facepunch's Source 2-based successor to Garry's Mod, reportedly ripped out the Source 2 physics engine in favor of Box3D, which is about as strong an endorsement as an open physics library can get.

Claude Code Is Quietly Fingerprinting Your Prompts

Claude Code steganographically watermarks prompts to fingerprint the requesting client

The biggest story of the week, by some distance, was the discovery that Claude Code is steganographically marking requests, embedding hidden watermarks into the prompts it sends so Anthropic can fingerprint which client a request came from. The stated purpose is catching resellers and distillation attempts, and on its face that's a legitimate problem for a model provider to worry about.

What makes it land harder is the execution. The technique mirrors the anti-observation tricks used by sophisticated malware, and as the Hacker News discussion pointed out, defeating it is trivial. That means it mostly punishes the people who are easiest to fingerprint, normal developers doing weird but legitimate things, rather than the sophisticated resellers it's aimed at, who can simply strip the marks.
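The post doesn't detail Anthropic's actual scheme, so the sketch below is not Claude Code's implementation. It uses zero-width Unicode characters, a textbook text-steganography trick, purely as an assumed stand-in to show why this class of watermark is cheap to add and just as cheap to remove. All function names are hypothetical.

```python
from typing import Optional

ZERO = "\u200b"  # zero-width space: encodes a 0 bit
ONE = "\u200c"   # zero-width non-joiner: encodes a 1 bit


def embed(prompt: str, client_id: int, bits: int = 16) -> str:
    """Append client_id as invisible characters after the visible prompt."""
    payload = "".join(
        ONE if (client_id >> i) & 1 else ZERO for i in reversed(range(bits))
    )
    return prompt + payload


def extract(text: str) -> Optional[int]:
    """Recover the hidden client id, or None if no marks are present."""
    marks = [c for c in text if c in (ZERO, ONE)]
    if not marks:
        return None
    value = 0
    for c in marks:
        value = (value << 1) | int(c == ONE)
    return value


def strip_marks(text: str) -> str:
    """One filter pass removes every mark, which is why evasion is trivial."""
    return "".join(c for c in text if c not in (ZERO, ONE))


marked = embed("fix my failing tests", 0xBEEF)
```

The marked prompt renders identically to the original, and `extract(marked)` recovers `0xBEEF`. A reseller who runs `strip_marks` on outgoing traffic is invisible again, while an ordinary developer never knows the marks were there.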

Anthropic already receives far more sensitive data from users, so the collection itself isn't the issue. What matters is whether they act on it directly, by rate-limiting, compute-limiting, or quietly rerouting flagged requests to a weaker model.

The trust framing is what ties this whole issue together. Several people described Claude Code as feeling "malware-y" from the start, and one person used a month of access to build out a personal harness specifically so they'd never have to route through Anthropic again. That's the same instinct driving the open-weights and local-model thread running through this entire issue, from Fable 5's rationed return to DeepSeek's cheap inference to ZCode's GLM-5.2 harness.

When your tooling has a hidden layer that can fingerprint, throttle, or reroute you, the rational response is to own more of the stack yourself, and the open side of the market keeps making that easier.