Saturday, April 4, 2026

Bugs Are Missing Scenarios

When the spec drives the code, every bug maps to a gap in the matrix

Traditional debugging works backward. Something breaks. You find the defective code. You fix it. You write a regression test. The fix lives in the code. The test is an afterthought.

When your system is built from a spec — golden values, reference calculator, completeness check — the mental model inverts. The code isn't wrong. It was never told what to do for that combination of inputs. The spec has a gap. The fix isn't in the code. It's in the spec.

Add the scenario. Approve the expected values. Run the system. The code either already handles it — the scenario passes — or it doesn't, and the agent fixes the code to match. Either way, the bug is now a permanent fixture in the spec. It runs on every build. It can never recur.
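The loop above can be sketched in a few lines. This is a minimal illustration, not the author's actual schema: the scenario shape, the field names, and the `reference_shift_pay` function are all invented for the example.

```python
# A hypothetical reference calculator: the spec's executable twin.
# Field names (clock_in, clock_out, rate) are illustrative assumptions.
def reference_shift_pay(inputs: dict) -> dict:
    hours = inputs["clock_out"] - inputs["clock_in"]
    return {"pay": round(hours * inputs["rate"], 2)}

# One spec scenario: approved inputs plus approved golden values.
scenario = {
    "name": "standard 8-hour shift",
    "inputs": {"clock_in": 8.0, "clock_out": 16.0, "rate": 20.0},
    "golden": {"pay": 160.0},
}

def verify(scenario: dict, calculator) -> bool:
    # A scenario passes when the calculator reproduces its golden values.
    return calculator(scenario["inputs"]) == scenario["golden"]

print(verify(scenario, reference_shift_pay))  # True: no gap here
```

A bug fix in this model is a new entry in the scenario list, approved once and then run on every build.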

Severity Tells You What's Missing

A production bug in a compliance calculation. How bad is it? The answer depends on what was missing from the spec:

A known gap we deferred. The dimensional matrix had the cell marked as uncovered. We hadn't written the scenario yet. Approve it. Low severity — the system already identified the risk.

A blind spot in the matrix. The dimensions existed but nobody checked that specific intersection. An early clock-in combined with a late clock-out — both shift boundaries adjusted simultaneously. Add the scenario. Check neighboring intersections for similar gaps. Medium severity — the model was complete but coverage wasn't.

A missing dimension. The entire axis of variation was absent. "Actual times vs final times" wasn't a dimension at all — one time field was doing two jobs. Adding the dimension means every intersection involving it is now a potential gap. High severity — the domain model was incomplete.

A wrong golden value. The spec said actual_start_time: "08:00" for a worker who clocked in at 7:50 and chose "I'll wait." The golden value encoded the bug as correct. All three verification legs — golden values, reference calculator, production — agreed on the wrong answer. Critical severity — the verification chain worked perfectly on a wrong input.

Each severity maps to a different amount of work:

Known gap          → approve one scenario
Blind spot         → add scenario, check neighbors
Missing dimension  → add column to matrix, review all new edges
Wrong golden value → correct the spec, cascade the fix
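The four gap types above can be told apart mechanically from the matrix's own state. The sketch below assumes a simple representation (a set of modeled dimensions, plus sets of deferred and covered cells); none of these names come from the author's system.

```python
from enum import Enum

class GapType(Enum):
    KNOWN_GAP = "known gap"
    BLIND_SPOT = "blind spot"
    MISSING_DIMENSION = "missing dimension"
    WRONG_GOLDEN_VALUE = "wrong golden value"

def classify(bug_coords: dict, matrix: dict) -> GapType:
    """Triage a production bug by where it lands in the dimensional matrix."""
    # An axis of variation the matrix never modeled: high severity.
    if any(dim not in matrix["dimensions"] for dim in bug_coords):
        return GapType.MISSING_DIMENSION
    cell = tuple(sorted(bug_coords.items()))
    # The matrix flagged this cell, but no scenario was written yet.
    if cell in matrix["deferred"]:
        return GapType.KNOWN_GAP
    # Never flagged and never covered: a blind spot.
    if cell not in matrix["covered"]:
        return GapType.BLIND_SPOT
    # Covered, yet it broke in production: the golden value itself is wrong.
    return GapType.WRONG_GOLDEN_VALUE

# Illustrative matrix state for a shift-pay domain.
matrix = {
    "dimensions": {"clock_in", "clock_out"},
    "deferred": {(("clock_in", "early"), ("clock_out", "late"))},
    "covered": set(),
}

bug = {"clock_in": "early", "clock_out": "late"}
print(classify(bug, matrix))  # GapType.KNOWN_GAP
```

The ordering of the checks matters: a missing dimension must be ruled out before the cell lookup, because coordinates on an unmodeled axis can't be located in the matrix at all.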

Triage Becomes Structural

Bug triage in most teams is a negotiation about priority. How many users are affected. How bad is the workaround. What sprint does it fit in.

When bugs are missing scenarios, triage becomes a diagnostic question: what kind of gap is this?

A known gap means the matrix already identified the risk — pull it off the backlog. A blind spot means the matrix needs a new scenario and its neighbors need review. A missing dimension means the domain model needs architectural work. A wrong golden value means the domain expert made an error — the scariest kind because the system enforced it faithfully.

The answer determines the fix. No negotiation. No sprint planning. The gap type dictates the response.
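"Check the neighbors" for a blind spot is itself mechanical: vary one dimension at a time from the failing cell and review each result. A sketch, with invented dimension names and values:

```python
# Illustrative dimensions for a shift-time domain.
dimensions = {
    "clock_in": ["early", "on_time", "late"],
    "clock_out": ["early", "on_time", "late"],
}

def neighbors(cell: dict) -> list[dict]:
    """Cells differing from the failing cell in exactly one dimension."""
    out = []
    for dim, values in dimensions.items():
        for value in values:
            if value != cell[dim]:
                out.append({**cell, dim: value})
    return out

failing = {"clock_in": "early", "clock_out": "late"}
for candidate in neighbors(failing):
    print(candidate)
```

Each printed cell is a candidate scenario to review: if one intersection was missed, the adjacent ones sharing a coordinate with it are the most likely places to find the same blind spot.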

The Spec Grows From Production

Every bug adds a scenario. Every scenario closes a gap. Over time, the spec covers more of the dimensional matrix. Not through imagination — through discovery.

Production is the ultimate test environment. Real users. Real data. Real combinations nobody anticipated. When something breaks, it maps to dimensional coordinates. The matrix shows the gap. The scenario fills it.
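The completeness check described here reduces to set arithmetic: enumerate every intersection of the dimensions, subtract the cells a scenario covers, and what remains is the gap list. Dimension names and values below are illustrative.

```python
from itertools import product

dimensions = {
    "clock_in": ["early", "on_time", "late"],
    "clock_out": ["early", "on_time", "late"],
    "times": ["actual", "final"],  # the kind of axis production discovers
}

# Cells some approved scenario already covers (illustrative).
covered = {
    ("on_time", "on_time", "final"),
    ("early", "on_time", "actual"),
}

all_cells = set(product(*dimensions.values()))
gaps = all_cells - covered
print(f"{len(gaps)} of {len(all_cells)} cells still uncovered")
```

Adding a dimension multiplies `all_cells`, which is exactly why a missing dimension is high severity: every cell on the new axis starts out in `gaps`.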

I've watched this happen across two apps now. The spec starts with ten or fifteen scenarios — the obvious paths. Within weeks, production teaches you the edges you missed. Each incident becomes a scenario. The spec ratchets toward completeness one discovery at a time.

The question stops being "what did we miss?" and becomes "what has production taught us?" Different question. Different relationship with bugs. Every incident is a spec improvement, not just a code patch. And once the scenario exists, the completeness check ensures it's verified on every build — by the agent, automatically, forever.


This is part of a series on spec-driven development with AI agents. Previous: Humans Think in Dimensions, Not Test Cases. Next: Where Specs Work and Where They Don't.
