AI Productivity Is a New Discipline. Even for CTOs.
Part 2 of our velocity series. Once one dev-hour stops shipping one point, the things built on that assumption — how you plan, how you hire, how you build capability — break next. And nobody has the playbook yet.
Jun 20, 2026
In "Your Velocity Chart Is Lying To You," we wrote about how AI tooling broke the one assumption every project tracker rests on (that one developer-hour ships one point of work) and how we rebuilt velocity tracking to learn each person's real multiplier from history.
That was the measurement problem. It turns out to be the easy half.
The same broken assumption sits quietly underneath almost everything else leaders rely on: how they plan, what they let ship, how they hire, and how they build capability. Those are breaking too — and unlike velocity tracking, there's no settled playbook to copy.
And this isn't only a CTO's problem. If your company ships a digital product — software, a SaaS platform, an app, an automated service — the shift reaches your roadmap, your hiring plan, and your P&L. It's a CEO, COO, and product-leadership question as much as an engineering one; the CTO just tends to see it first. Here's where the old playbook fails, and what we think replaces it.
Why isn't a Gantt chart enough anymore?
Because planning used to assume a fixed conversion between people, time, and output — and AI turned that conversion into a moving, per-person variable.
For a decade, the CTO's planning job was essentially allocation: take a roughly stable team velocity, spread it across a calendar, sequence the dependencies, and you had a Gantt chart that was a defensible forecast. The hard part was ordering the work, not predicting the rate — and even that was never airtight. Software teams are famously bad at hitting their estimates, and late delivery has always been closer to the rule than the exception. The old model was already shaky. But the unit underneath the forecast was at least stable enough to plan around.
That rate is now the unstable part. As we covered in Part 1, the multiplier is different per person, different for small tickets versus large ones, and it drifts every time the tooling improves. When one senior engineer running parallel agents ships four to seven points in the time the chart scores as "one hour," a Gantt drawn against an average is fiction the day you draw it.
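To make the gap concrete, here is a minimal sketch in Python. The people and multipliers are hypothetical, picked only so the top one sits inside the four-to-seven range above; they are not measurements from our tracker. It estimates the same 12-point task two ways: against the team-average multiplier, and against each person's own.

```python
# Illustrative only: hypothetical people and multipliers
# (points shipped per hour the chart scores as "one hour").
multipliers = {
    "senior_with_agents": 6.0,
    "mid_level": 2.0,
    "new_to_tooling": 1.0,
}

def hours_needed(points: float, multiplier: float) -> float:
    """Hours to ship `points` at a given points-per-hour multiplier."""
    return points / multiplier

task_points = 12  # assumed task size

# What a plan drawn against the team average would assume.
average_multiplier = sum(multipliers.values()) / len(multipliers)
average_estimate = hours_needed(task_points, average_multiplier)

# What actually happens depends on who picks the task up.
per_person_estimate = {
    name: hours_needed(task_points, m) for name, m in multipliers.items()
}

print(f"average-based estimate: {average_estimate:.1f}h")
for name, hours in per_person_estimate.items():
    print(f"{name}: {hours:.1f}h")
```

With these assumed numbers the average-based plan says 4 hours, while the real answer ranges from 2 to 12 hours depending on who takes the task. The average is right for nobody on the team.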
So the job changes shape. It stops being "sequence the tasks against a known velocity" and becomes "continuously reallocate toward wherever leverage is highest this month," which is much closer to portfolio or capital allocation than to a construction schedule. You're not managing a plan anymore. You're managing a fast-moving distribution of leverage, and the distribution is the thing worth watching.
And this is genuinely new even for experienced CTOs. The instincts that made someone great at the old game — reliable estimation, steady roadmaps, a feel for how long things take — can now actively mislead. There is no twenty-years-of-practice for a reality that's eighteen months old.
What new bottlenecks does the speed create?
When output multiplies, the constraint moves downstream — to review, verification, and the call on what actually ships. And the failure mode gets subtler: AI produces fluent, well-argued output that's wrong, which is far harder to catch than something that visibly breaks.
The bottleneck moves. When a developer ships four to seven times the work, the team's limit stops being "how fast can we write this" and becomes "how fast can we review, test, and trust it before it reaches production." Feed 5× the volume into a review pipeline sized for 1× and you haven't removed the queue; you've relocated it. That's exactly where Part 1's gap between active-coding time and calendar time comes from: the code is done in two hours and then sits for two days waiting on a human to vouch for it.
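A rough way to see the relocated queue is a toy backlog model. The sketch below is Python, and the daily figures (10 items reviewed per day, output going from 10 to 50) are assumptions chosen to mirror the 1× versus 5× framing, not real team data.

```python
def review_backlog(produced_per_day: int, reviewed_per_day: int, days: int) -> list[int]:
    """Return the end-of-day count of items still waiting for review."""
    backlog = 0
    history = []
    for _ in range(days):
        # New work arrives, reviewers clear what they can; backlog never goes negative.
        backlog = max(0, backlog + produced_per_day - reviewed_per_day)
        history.append(backlog)
    return history

# Review capacity sized for 1x output: nothing piles up.
before = review_backlog(produced_per_day=10, reviewed_per_day=10, days=5)

# Same reviewers, 5x output: the queue grows by 40 items every day.
after = review_backlog(produced_per_day=50, reviewed_per_day=10, days=5)

print(before)  # [0, 0, 0, 0, 0]
print(after)   # [40, 80, 120, 160, 200]
```

The model ignores everything that makes real review messy (uneven item size, rework, reviewer fatigue), but the shape holds: unless review capacity grows with output, waiting time grows without bound.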
The errors also change shape. Old bugs were often considerate enough to look like bugs — it didn't compile, the test went red, the output was visibly garbage. AI's characteristic failure is the opposite: confident, plausible, well-explained, and wrong. A wrong answer with a convincing rationale is far harder to spot than one that obviously looks wrong, so the probability of a mistake slipping through climbs even as raw productivity does. We've written before about how even 98% accuracy isn't enough at scale — two errors in a hundred become six hundred in thirty thousand.
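The arithmetic behind that last line is simple enough to check in a few lines of Python. The function name is ours, for illustration only.

```python
def expected_errors(items: int, accuracy: float) -> int:
    """Expected number of wrong items, rounded to the nearest whole item."""
    return round(items * (1 - accuracy))

print(expected_errors(100, 0.98))     # 2
print(expected_errors(30_000, 0.98))  # 600
```

The error rate never changes; only the volume does. That is why an accuracy figure that sounds excellent per item can still mean hundreds of mistakes to find once output multiplies.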
The implication for leaders is uncomfortable: you can't just bank the speed. Some of the capacity AI frees up has to be reinvested into verification — stronger tests, dry-runs, human-in-the-loop checkpoints where the cost of being wrong is high, and review discipline that scales with output. The teams that count only the acceleration, and not the new verification load it creates, ship more — and ship more that's quietly wrong, found later.
How do you hire for a role that doesn't have a job description yet?
You stop hiring against titles and credentials — because the roles, the consensus, and the training pipeline don't exist yet — and start hiring for judgment, adaptability, and demonstrated leverage.
Look closely and the scaffolding you'd normally lean on isn't there:
- The roles aren't defined. There's no settled title, and no agreement on what an "AI engineer" does versus an "AI-leveraged developer" versus a "forward-deployed" someone. You can't copy a job description that the industry hasn't written.
- Academia hasn't caught up. Degree programs teach a stack that moves faster than the syllabus can be approved. And the bootcamp boom is mostly selling certainty — a certificate — about skills that are still actively forming. There is no accredited course in "orchestrate a fleet of agents to ship 5×," because the thing barely existed last year.
When the usual signals — years-of-X, keyword match, a credential — stop measuring anything reliable, the signals that survive are harder to fake: taste and judgment (knowing what's worth building, and how to verify what the AI produced), how fast someone actually learns and adapts, and the concrete leverage they can already demonstrate with these tools.
The same thing reshapes how you manage the people you already have. The multiplier is individual, learned, and drifting — so leveling, comp, and career frameworks calibrated to seniority proxies like tenure, story points, or lines of code quietly mismeasure people. You end up managing for trajectory and leverage, not for proxies. And the honest part: there is no consensus yet. Anyone selling you a finished competency model for the AI era is selling certainty that doesn't exist.
Where do training and workshops actually fit?
They do double duty: they build the literacy that drives adoption, and they're the cheapest way to find the high-ROI projects — because the people who know the painful workflows sit in operations, not on the AI team.
AI leverage doesn't spread by buying licenses. It spreads when people understand what's newly possible and build the instinct to reach for it on the right problem. A good workshop demystifies the technology and turns a handful of power users into a literate organization that can actually carry adoption.
But the more underrated payoff is discovery. The best AI projects live at the intersection of two kinds of knowledge: what's newly technically possible (the engineers know this) and what operations actually does all day (the operators know this, and nobody else really does). Put both groups in a room and that intersection surfaces fast — and it surfaces ranked by ROI, anchored to workflows that real people run, not to a strategy deck's guesses.
That's also why this beats a top-down AI mandate. When the teams who will use the tools help choose them, they own the result, which is most of the adoption battle. And the projects that come out are real, because they're grounded in operations rather than imagined above them. You de-risk and you discover at the same time.
What this all comes down to
The velocity chart was just the first instrument that turned out to be calibrated for a world that no longer exists. The Gantt chart, the job description, and the training calendar are the next three.
The leaders who navigate this well won't be the ones who find the perfect new framework; there isn't one yet, and anyone claiming otherwise is guessing. They'll be the ones who treat AI productivity as something to continuously measure, hire toward, and build capability around, while staying honest that the playbook is still being written. The hour stopped being a constant. The Gantt chart, the JD, and the training plan are next, and the work is to stop hard-coding assumptions and start learning them from reality.
Reflekt Lab runs AI strategy workshops and fractional-CTO engagements to help teams navigate exactly this. If your planning, hiring, or adoption playbook feels a step behind, let's talk.