The dials still spin. They’ve stopped meaning what they meant.
The four DORA metrics (deployment frequency, lead time, change-failure rate, time-to-restore) were calibrated for a world where a human wrote each line and writing was the bottleneck. Remove that bottleneck and the dials keep spinning, faster than ever. The mistake is reading the speed as health.
Two of the four now flatter you:
- Deployment frequency climbs because generating a change is nearly free. More deploys — not necessarily more value.
- Lead time falls because the keyboard part, the part these tools accelerate, was never where the risk lived.
These are your most-watched numbers, and they have quietly become the least trustworthy.
What still tells the truth — and what you’re missing.
- Time-to-restore holds. When something breaks, fixing it still demands understanding, and understanding is exactly what machine-written code erodes. If time-to-restore is creeping up while everything else looks magnificent, that’s your tell.
- Change-failure rate is your early-warning light. It’s the first place “we shipped more” turns into “we shipped more bugs.”
- Rework rate is the metric you don’t have yet, and the one that matters most now: the share of shipped code that is rewritten, reverted, or churned within days of shipping. AI makes the first draft cheap, which means it makes the wrong first draft cheap too. Velocity that becomes rework two sprints later was never velocity. It was debt with good lighting.
If your dashboard looks identical whether the team shipped real value or generated a mountain of churn, it isn’t measuring the thing you care about.
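To make that concrete, here is a minimal sketch of a rework-rate calculation in Python. Everything in it is an assumption for illustration: the `Change` record, its field names, and the 14-day window are hypothetical, and a real version would derive "reworked" from your own version-control and revert history. It compares two made-up teams with the same deploy count and very different rework.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Change:
    """One shipped change. Field names are hypothetical."""
    shipped: date
    reworked: Optional[date] = None  # date of first rewrite or revert, if any


def rework_rate(changes: List[Change], window_days: int = 14) -> float:
    """Share of shipped changes rewritten or reverted within window_days.

    The 14-day default is an assumed window, not a standard.
    """
    if not changes:
        return 0.0
    window = timedelta(days=window_days)
    reworked = sum(
        1
        for c in changes
        if c.reworked is not None
        and timedelta(0) <= c.reworked - c.shipped <= window
    )
    return reworked / len(changes)


day = date(2025, 1, 6)

# Two hypothetical teams with ten deploys each.
steady = [Change(shipped=day) for _ in range(10)]
churny = (
    [Change(shipped=day, reworked=day + timedelta(days=3)) for _ in range(4)]
    + [Change(shipped=day) for _ in range(6)]
)

print(len(steady), len(churny))                    # 10 10
print(rework_rate(steady), rework_rate(churny))    # 0.0 0.4
```

A deployment-frequency chart shows these two teams as identical. Rework rate is the only number in the sketch that tells them apart.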
The gap nobody is measuring.
The real problem isn’t any single metric. It’s the gap between felt speed and proven outcome. Your team feels faster. Your dashboard agrees. And yet what the business actually wanted (outcomes shipped, problems solved, customers moved) hasn’t improved in proportion. That gap is where trust dies: first the engineers stop believing the numbers, then the executives stop believing engineering.
What an honest dashboard looks like.
You don’t need a new framework. You need to stop reading the flattering ones at face value. An honest read:
- Demotes the two flatterers — deployment frequency and lead time — to context, not headline.
- Promotes change-failure rate and time-to-restore to the front.
- Adds rework rate as a first-class, named metric.
- Anchors all of it to outcomes rather than motion — the spirit of DX Core 4, SPACE, and DevEx, which were always about more than throughput.
It should survive a heavy-AI week, not just an average one. The point isn’t more dashboards. It’s a dashboard you can trust at 9am on a Monday when the board asks whether the AI investment paid off.
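One way to picture the reordering is a small Python sketch that computes the same numbers but reports them in two tiers. The `Deploy` record, its field names, and the `honest_read` function are all hypothetical, and the choice of medians is an assumption. Outcome measures are deliberately not modeled, since those come from the business and not from deploy logs.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class Deploy:
    """One deploy. Field names are hypothetical."""
    lead_time_hours: float
    failed: bool = False
    hours_to_restore: Optional[float] = None  # set only for failed deploys
    reworked: bool = False


def honest_read(deploys: List[Deploy], weeks: float) -> dict:
    """Group metrics into headline and context tiers."""
    n = len(deploys)
    restores = [
        d.hours_to_restore
        for d in deploys
        if d.failed and d.hours_to_restore is not None
    ]
    lead_times = [d.lead_time_hours for d in deploys]
    return {
        # Promoted: the numbers that still tell the truth.
        "headline": {
            "change_failure_rate": sum(d.failed for d in deploys) / n if n else 0.0,
            "median_hours_to_restore": median(restores) if restores else None,
            "rework_rate": sum(d.reworked for d in deploys) / n if n else 0.0,
        },
        # Demoted: still reported, but only as context.
        "context": {
            "deploys_per_week": n / weeks,
            "median_lead_time_hours": median(lead_times) if lead_times else None,
        },
    }


week = [
    Deploy(lead_time_hours=2.0),
    Deploy(lead_time_hours=4.0, failed=True, hours_to_restore=6.0),
    Deploy(lead_time_hours=3.0, reworked=True),
    Deploy(lead_time_hours=1.0),
]

report = honest_read(week, weeks=1.0)
print(report["headline"])
print(report["context"])
```

The arithmetic is trivial on purpose. The design choice is the grouping: the flattering numbers stay available, but nothing in the headline tier can improve just because generating changes got cheaper.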
This is a leadership problem wearing a productivity costume.
The fix isn’t a tool. It’s deciding, as a leader, to measure what’s true instead of what’s flattering, and to rebuild the team’s trust in its own numbers. Everything else depends on it: if you can’t measure honestly, you can’t manage the augmented team, and you can’t tell whether any change you make actually helped.