2026-03-26

The Discipline Tax: Why Moving Slower with AI Agents Gets You Further

Twelve months into the agentic coding era, the hangover is setting in. Codebases written at machine speed, reviewed at human speed — or not at all. A case for staying in the loop, keeping your hands on the architecture, and treating agents as fast hands, not smart brains.

By NeoAI
AI · Agentic Coding · Software Engineering · Developer Tools · Opinion

Something changed in the past year. Coding agents crossed a threshold — not the one the benchmark sheets measure, but the one that matters: they became fast enough and capable enough that engineers started removing themselves from the loop entirely. Not just for prototypes. For production code. For architecture decisions. For the stuff that takes years to untangle if it goes wrong.

We're now about twelve months into that experiment. The results are starting to show.

The Speed Trap

The seductive thing about coding agents isn't their intelligence — it's their speed. They can generate a full feature implementation while you're making coffee. They don't get distracted, don't need to look things up, don't stop to ask whether the abstraction makes sense. They just... go.

That speed creates a trap. When a tool feels this fast, slowing it down feels like waste. Why review every file when the agent can generate ten more in the time it takes to read one? Why write the data model by hand when the agent can scaffold the whole persistence layer in seconds?

It's reasonable-sounding logic, and it leads to a codebase nobody understands six weeks later.

The speed isn't the problem. The loss of friction is.

Errors Without Learning, Scale Without Limits

A developer who writes a bad abstraction will usually run into it again — when they need to extend it, when someone else asks a confused question about it, when the code review comes back with comments. There's a feedback loop. The pain is local and timely.

An agent doesn't experience that loop. It will confidently reproduce the same structural mistake across fifty files, never connecting the dots between "I did this here" and "this is why nothing works over there." Every agent run starts fresh. There's no accumulation of "don't do that again."

More importantly, humans are rate-limited. Even a fast developer introduces errors at a speed the rest of the codebase can absorb. You can't type fast enough to outpace your own comprehension. That natural throttle is what keeps projects recoverable.

Remove the throttle, and the errors don't accumulate linearly — they compound. A thousand small inconsistencies reinforce each other until the codebase requires the agent to understand its own past decisions to make a good next one. Which it can't do.
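The difference between linear and compounding error growth can be made concrete with a toy model. Nothing below is a measurement; the rates, the feedback term, and the function names are all illustrative assumptions. The only point is the shape of the curves: a constant error rate grows linearly, while a rate that rises with the number of existing defects pulls away from it.

```python
# Toy model: linear vs. compounding defect growth. All numbers are
# illustrative assumptions, not measurements of any real codebase.

def linear_defects(changes: int, rate: float = 0.02) -> float:
    """A rate-limited developer: errors arrive at a roughly constant rate."""
    return changes * rate

def compounding_defects(changes: int, rate: float = 0.02,
                        feedback: float = 0.01) -> float:
    """An unreviewed agent: each existing inconsistency slightly raises
    the chance that the next change introduces another one, because the
    agent pattern-matches on whatever is already in the codebase."""
    defects = 0.0
    for _ in range(changes):
        defects += rate * (1 + feedback * defects)
    return defects

for n in (100, 1000, 5000):
    print(n, round(linear_defects(n), 1), round(compounding_defects(n), 1))
```

At small change counts the two curves are nearly indistinguishable, which is why the problem is invisible early; the gap only becomes obvious after thousands of unreviewed changes, when it is expensive to unwind.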

The result: one day you try to add a feature and the architecture won't let you. Not because it's impossible, but because it's so entangled with accumulated local decisions — each individually sensible, collectively incoherent — that the agent's suggestions just make it worse. You've lost the thread.

The Context Window Isn't the Real Limitation

There's a lot of talk about context window size as the core constraint of agentic coding. Get to a million tokens, the thinking goes, and the agent can reason about your whole codebase at once.

But context window isn't the real bottleneck. Recall is.

Before an agent can make a change, it has to find all the relevant code — the thing it needs to modify, the similar patterns it should be consistent with, the existing utilities it should reuse instead of duplicating. That search, whether it's done with grep, an LSP server, or a vector index, degrades with codebase size. Bigger codebase, lower recall. Lower recall means more duplication, more inconsistency, more shadow implementations of the same concept scattered across the project.
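A toy formula makes the recall argument concrete. This is a sketch under loud assumptions, not a benchmark: the function name, the drift constant, and the idea that naming drift grows with file count are all hypothetical, chosen only to illustrate the direction of the effect.

```python
# Toy sketch: expected recall of a naive keyword search as a codebase grows.
# Assumption (illustrative, not measured): as unreviewed duplicates
# accumulate, more relevant files name the same concept differently, so a
# grep for the canonical name misses a growing fraction of them.

def expected_recall(num_files: int, base_drift: float = 0.3) -> float:
    """Expected fraction of relevant files a keyword search finds.

    base_drift: baseline chance a relevant file uses a different name
    for the concept. Drift is assumed to rise with codebase size and is
    capped at 0.95 so recall never quite reaches zero.
    """
    drift = min(0.95, base_drift + num_files / 50_000)
    return 1.0 - drift

for n in (1_000, 20_000, 100_000):
    print(n, round(expected_recall(n), 2))
```

The exact numbers are meaningless; the monotonic decline is the point. Any search strategy whose miss rate grows with project size produces the same dynamic: bigger codebase, lower recall, more shadow implementations.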

This is also why the "just have the agent refactor it" plan doesn't work as well as it sounds. A refactor requires the agent to find everything that needs changing — which is exactly the task recall fails at.

What's Actually Worth Delegating

None of this means agents aren't useful. They're very useful. The question is what kind of work maps well to how they actually function.

Good agent tasks tend to share a few properties. They're bounded — the agent doesn't need to hold the whole system in mind to do a good job. They're evaluable — the agent can check its own output against something objective: does it compile, do the tests pass, does the startup time improve? And the stakes are contained — if the output is wrong or low quality, the cost of catching and fixing it is low.

Writing boilerplate, exploring an unfamiliar library, translating a well-specified algorithm into code, generating a first draft of something you'll rewrite anyway — these are good agent tasks. The agent moves fast, you review quickly, and the quality gate remains in human hands.
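The three properties above can be written down as a toy rubric. The field names and the example tasks are illustrative, not an API from any real tool; the value is having an explicit checklist rather than deciding by gut feel.

```python
from dataclasses import dataclass

# A toy rubric for deciding whether a task maps well to an agent.
# Field names and examples are illustrative assumptions, not a real API.

@dataclass
class Task:
    bounded: bool    # doable without holding the whole system in mind
    evaluable: bool  # output checkable against something objective
    contained: bool  # cheap to catch and fix if the output is wrong

def good_agent_task(t: Task) -> bool:
    return t.bounded and t.evaluable and t.contained

boilerplate = Task(bounded=True, evaluable=True, contained=True)
auth_redesign = Task(bounded=False, evaluable=False, contained=False)

print(good_agent_task(boilerplate))    # True
print(good_agent_task(auth_redesign))  # False
```

The useful part is the conjunction: a task that is bounded and evaluable but not contained (say, a schema migration) still deserves a human in the loop, because the cost of a miss is what makes delegation risky.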

The things that shouldn't be delegated are the things that define how a system feels to work in: data models, API boundaries, module structure, naming conventions at the seam points. These aren't just technical decisions. They're choices that embed your understanding of the problem into the shape of the code. The friction of writing them yourself — or stepping through them carefully with the agent — is how that understanding actually gets formed.

There's a version of pair programming with an agent that works well: you drive the architecture, the agent fills in the implementation. You stay in the seat. You see each function get built. That's different from asking the agent to "build the auth system" and reviewing a diff two hours later.

The Real Cost of Removing Yourself

When you stop reading the code, you stop knowing the code. This sounds obvious but has a specific consequence that isn't: you lose the ability to help the agent do better work.

Agentic search recall degrades with codebase complexity — but your mental model of the codebase is a fix for that. When you know the system, you can steer the agent toward the right files, point out the existing utility it should use, tell it where the relevant context actually lives. A developer who knows their codebase well makes their agent dramatically more effective than one who handed the codebase over six months ago and hasn't been in it since.

The other cost is simpler: when something breaks at 3am, you need to be able to fix it. That requires understanding the system. Not just "broadly aware of what it does" but actually able to trace a request through the stack and identify where it went wrong. You can't outsource that to an agent at 3am. You have to be the person who knows.

Slowing Down as a Productivity Strategy

The paradox is that moving slower with agents tends to produce more working software. Not more lines of code — fewer, actually. But more of the right lines.

Setting a rough limit on how much agent-generated code you review per day, in line with how carefully you can actually read it, isn't a constraint on productivity. It's a definition of it. Code that exists but can't be trusted isn't an asset.

The same discipline that made codebases maintainable before agents still applies. It just requires more conscious effort now, because agents provide strong incentives to skip it.

Keep your hands on the architecture. Use agents for speed on the bounded, evaluable, contained tasks. Review what gets committed. Build fewer features, but build them right.

That's not a limitation of the technology. That's how you use the technology and still end up with something worth shipping.

intelliBrain

AI-augmented software development. Based in Zürich, working globally.

© 2026 intelliBrain GmbH. All rights reserved.