Why we deleted two months of working code (and why we'd do it again)

In February and March we sank roughly 240 hours into a system that worked.

Dialogue rendered. Non-player characters remembered things across sessions. A chronicle service turned simulation events into readable prose you could give to a player as the game’s emergent history. By mid-March there were end-to-end demos: type a question, get a grounded answer, see the answer reflected in the world later. It looked like a finished thing.

In late April we deleted it.

This post is about the test that triggered the deletion, the books on the shelf that told us we were going to fail the test, and the architecture we rebuilt in its place. It’s also about a thing engineers don’t talk about enough: why working code is sometimes the wrong code, and how to know when to throw it away.

And it’s about how the rebuild got done. The decisions, the contracts, the gates, a meaningful portion of the code itself — drafted, reviewed, and implemented by a human (me) and Claude, working in defined engineering roles, against a corpus of architecture books that participates in every decision the project makes. Skaldborn isn’t a project that uses AI as a side tool. The human-and-Claude collaboration is part of how the game gets built, and one of the things the rebuild taught us is how to do it well.

The test

Here’s a test you can run against your own codebase. It takes two minutes.

Pick a content type your system supports — a kind of NPC, a kind of dialogue, a kind of world entity. Now imagine you want to add a new one. Maybe you want a memory variant for “things this character pretends not to remember,” or a dialogue style for “addresses the player as a stranger even after meeting them,” or a tile type for “shifts when the player isn’t looking.”

Ask: can a content author add this new type without editing engine code?

If the answer is yes, your engine is open — the registration of new types is a content concern, not an engine concern.

If the answer is no — if adding a type means editing source files, or recompiling, or shipping a new build — your engine is closed, and your team is the bottleneck on every piece of content that ships.

For a single-author hobby project this is fine. For a content-driven game with any ambition of scale, it’s a problem you discover too late, after you’ve built a lot of structure on the wrong foundation.

We had a closed engine. We hadn’t realized that yet.

What “closed” looked like in our code

Our pre-rebuild dialogue and memory systems had a particular shape that’s worth describing concretely, because the shape is the whole story.

Here’s the pattern, simplified. (This is illustrative, not actual pre-rebuild code — but it has the same skeleton.)

public enum NarrativeTemplateKind
{
    EpisodicSummary,
    RelationshipUpdate,
    ConflictResolution,
    RumorGeneration,
    HistoricalChronicle,
    // ... fourteen of these in total
}

public string Render(NarrativeTemplateKind kind, NarrativeContext ctx)
{
    return kind switch
    {
        NarrativeTemplateKind.EpisodicSummary    => RenderEpisodic(ctx),
        NarrativeTemplateKind.RelationshipUpdate => RenderRelationship(ctx),
        NarrativeTemplateKind.ConflictResolution => RenderConflict(ctx),
        // ... fourteen branches
        _ => throw new InvalidOperationException("Unknown template kind")
    };
}

Two design choices, paired:

The set of valid kinds is a closed enum — adding a new kind means editing this file.
The dispatch is a switch over the closed enum — adding a new kind also means editing every place that switches on it.

We had this pattern in fourteen narrative-template kinds and five memory variants. Both axes were closed. Adding a new kind of memory — say, a memory variant for “things the character heard secondhand and isn’t sure they believe” — required editing four files in the engine, recompiling the simulation, and shipping a new build. A content author couldn’t do it. We couldn’t do it without breaking the build.

This is a familiar pattern in long-lived codebases. Martin Fowler describes it in Refactoring as the “type code” smell, and he’s specific about the remedy: Replace Conditional with Polymorphism. The closed enum and the switch over it are two halves of the same problem; you fix them together, by replacing the closed enum with a registry that anyone can register into and replacing the switch with polymorphic dispatch.

We knew this. We had the book on the shelf. We did it anyway, because the closed shape made each individual feature easier to ship and we weren’t yet feeling the cost of the shape we’d locked in.

The cost arrives all at once.

The fork in the road

By mid-March, we’d built a version of the system that produced output you could put in front of a person. It worked. It was fragile in the way most LLM-adjacent systems are fragile, but the fragility was localized to the AI layer; the engine itself was solid.

It was also closed in fourteen places at once. Adding a new memory variant required surgery in all five memory dispatch sites. Adding a new dialogue template kind required surgery in all fourteen renderer sites. We were starting to want new kinds for content reasons, and every want was an engine change.

There were two paths.

Path A: refactor in place. Convert the closed enums to open registries one at a time. Find all the switches. Convert them to polymorphic dispatch. Keep the working features online while doing surgery on the spine of the system.

Path B: delete it. Take the working code, archive it, start over against a contract that mandated the open shape from the first line.

We took Path B. Here’s why.

The closed shape wasn’t in one place. It was in the way the system thought. Refactoring in place would have meant doing fourteen risky local rewrites against a system whose tests didn’t actually catch the closure (the tests passed because they only ever exercised the kinds we’d hardcoded). We’d have ended up with a half-converted system that worked in some places and failed strangely in others. Worse, we’d have ended up with a narrative — “we refactored toward openness” — that papered over the fact that the team had built closed by default once and would build closed by default again unless something structural changed.

The structural change we wanted was: make adding a content type without engine edits the test the code has to pass before it ships. The cleanest way to make a team build to a test is to make the test predate the code. We deleted the implementation and kept the contracts (the type signatures, the message shapes, the boundaries between services). The contracts were the things we’d gotten right. The implementation was the thing we’d built before the test existed.

We “kept the contract and threw away the code.” Roughly two thousand lines of working dialogue, memory, and narrative-rendering code went onto an archive branch where they still live. We have not regretted the deletion. We have, occasionally, gone back to look at the archive when a piece of the rebuild reminds us of an earlier draft — but never to copy from it.

What we built in its place

The new shape is the one Fowler’s remedy implies, scaled up to a system architecture. The engine stops asking “what kinds exist?” and starts asking “who has registered themselves?”

Concretely, every content-bearing type in the system is declared with an attribute, and the engine discovers it by reflection at startup. A new memory variant looks like this in the new shape:

[MemoryVariant("episodic.secondhand_rumor")]
public sealed class SecondhandRumorMemory : IMemoryVariant
{
    public Provenance Provenance { get; }
    public Confidence Confidence { get; }
    // ... per-variant fields
}

That’s the whole change. No edit to the engine, no edit to a dispatch site, no edit to a switch statement. The engine scans every assembly at startup, finds every type marked [MemoryVariant], and adds it to the registry. The dispatcher looks each variant up by its string ID and routes accordingly.

This is a pattern Chris Richardson calls a microservice chassis in Microservices Patterns — the framework provides the registration and discovery infrastructure; the services declare themselves. We adapted it for in-process dispatch, but the shape is the same: the spine doesn’t know what’s plugged into it; it just knows how to find what’s there.

Two consequences worth naming.

First, the type system is still load-bearing. We didn’t lose type safety by going from closed enums to a registry. The contracts that survived the deletion are discriminated unions in the style Scott Wlaschin advocates in Domain Modeling Made Functional — types whose variants are explicit and whose construction is constrained. The difference is that “the set of variants” lives in the registry now, not in a closed enum at the top of a file. You still can’t construct an invalid memory record. You just can’t enumerate the valid ones at compile time, which turns out to be the property we wanted: closure of valid shapes, openness of which shapes can be added.

Second, the manifest became the spine. Every world in Skaldborn ships with a frozen content manifest — a versioned file that lists every entity, character, tile, dialogue template, and memory variant available in that world. The simulation, the network gateway, the AI narrative service, and the player’s browser all read from the same frozen manifest. None of them invent IDs locally. None of them have a hardcoded list of “here are the kinds we know about.” This is the shape Eric Evans calls a Published Language in Domain-Driven Design — a shared vocabulary across services, owned by no single service, that lets them talk without coupling.

Before the rebuild, the manifest was an afterthought — content was loaded, but the vocabulary was implicit in code. After the rebuild, the manifest is the vocabulary, and code must conform to it.

The gates that fell out

The most useful thing the rebuild produced wasn’t the code that came back online. It was the gates that came online with it.

Before the rebuild, the rule “adding a content type shouldn’t require engine edits” was a guideline. Someone might write it down in a doc; someone else might forget. After the rebuild, the rule is a test that runs on every push. If a piece of code introduces a closed enum where a registry should be, or hardcodes a string ID that should come from the manifest, the push fails before the code reaches the main branch.

We have nine of these gates running today, each one mechanizing a rule that was a guideline before the rebuild burned us. Five of the most consequential:

A grep gate that checks every file under the simulation runtime for six categories of non-deterministic API — wall-clock time, random number generation, GUID minting, sleeps and delays, file I/O, hardware timers. If any of those show up in simulation code, the push fails. The simulation has to be a pure function of its inputs; this gate is what makes “has to” mechanical.
A determinism replay test that runs against every commit. The simulation receives a known command sequence, hashes the resulting event log, runs the same sequence on a fresh instance, hashes again, and asserts byte-for-byte equality. If anything in the simulation’s tick path becomes non-deterministic, this test fails immediately.
A contract validator that requires every component to declare, in machine-readable form, which fields it consumes from where — the authority-binding contract. Hardcoding a player ID as a literal in client code is rejected because the client’s contract says player IDs come from the authentication reply. (This one shipped just before this post went up, and it’s the structural fix for a class of bug we’d hit twice in the run-up to the rebuild — hardcoded literals where the consumer’s contract said the value had to come from the producer.)
A startup validator on the world manifest. Every kind declared in code must resolve against the frozen manifest the world ships with; unknown IDs resolve deterministically to an authored fallback rather than crashing. This means the test of “does the manifest cover every kind the code references” runs at world boot, not later.
A regression-profile gate on every change that touches runtime behavior. A code-complete commit that doesn’t pass its declared regression suite (one of: simulation, API, browser, cross-boundary, visual) is automatically blocked from being marked done.

The other four work the same shape — a rule that used to be a guideline, mechanized into a build failure. A throughput-budget contract requires every connection between components to declare how many events per tick it can carry, so a fast producer can’t silently saturate a slower consumer. A host-wiring completeness gate requires every declared connection to name the exact dependency-injection registration the host must contain, so “compiles fine, crashes on startup” failures get caught at the build, not after the deploy. A bootstrap-flow contract requires the client’s startup sequence to declare every step and where its invocation lives in source, so “canvas loads, nothing happens” dead-starts fail before merge. A tech-stack drift gate locks the approved frameworks and libraries to a versioned allow-list, so the next engineer who reaches for an unvetted UI framework gets a failed push instead of a two-month detour.

All nine gates are direct consequences of the rebuild. They exist because the team built a closed system once and wants to never build a closed system again. Each gate represents one rule we used to enforce by guideline, and now enforce by failing the build. None of them existed before the deletion.

What it cost, what it bought

The naked cost: roughly two months of work on the implementation that got deleted, plus another month rebuilding. Three months of engineering against an end-of-year delivery target.

Against that:

Nine architecture decisions are now mechanically enforced, every push, every commit, every developer. The cost of a future violation is a failed CI run, not a six-month debugging mystery.
Adding a new content type — a memory variant, a dialogue template, an entity kind, a tile — is a content-only change. No engine edits, no recompile, no new build. The system that bottlenecked content production for two months no longer does.
The simulation passes a deterministic-replay test on every commit. Same input → byte-identical event log. This is the property that lets us reason about save-load, distributed shards, debugging deployment problems, and AI integration without each of those being its own consistency problem.
The four highest-leverage strategic-design decisions — bounded context boundaries, single-writer authority, the published-language manifest, the no-feedback-loop invariant on AI dialogue — are codified as contracts that fail the build when violated. They are no longer style guides.
The codebase is smaller post-rebuild than pre-rebuild, despite owning more responsibilities. We deleted around two thousand lines of dialogue and memory plumbing and replaced them with several hundred lines of registry plus per-variant attribute classes. Less code, more capability.

The honest framing: we paid for a lesson with two months of effort, and the lesson came back in the form of permanent infrastructure. We would not have built the gates without the failure that justified them.

The books were on the shelf the whole time

The architectural moves we made in the rebuild aren’t original. Each one comes from a book that was sitting on our shelf — some of them since before the implementation started. Four authors carried most of the weight:

Martin Fowler’s Refactoring. The “type code” smell and the Replace Conditional with Polymorphism refactor are exactly the move we made. Fowler also describes “Parallel Change” — the discipline of expanding to a new shape, migrating, and then contracting away the old shape — which is how we migrated callers from the closed enums to the registry without a flag day. We had read Fowler. We had not used him.
Scott Wlaschin’s Domain Modeling Made Functional. The discriminated unions that survived the deletion — Provenance, MemoryRecord, and the rest of the contract layer — are direct applications of Wlaschin’s “Make Illegal States Unrepresentable” discipline. The contracts we kept are the ones that were already in this shape; the implementations we deleted were the ones that weren’t.
Chris Richardson’s Microservices Patterns. The microservice-chassis pattern is the shape of our new dispatch. We adapted the pattern for in-process modules rather than separate services, but the discovery-by-attribute, registration-by-reflection approach is Richardson’s. The mistake we’d made before was treating “we’re a single deployable” as license to skip the chassis pattern, which is wrong: the chassis is about decoupling registration from invocation, and that decoupling matters even when everything is in the same process.
Eric Evans’s Domain-Driven Design. The manifest-as-published-language framing comes directly from Evans. The boundary discipline between simulation, gateway, narrative, and client comes from Evans’s Bounded Context and Customer/Supplier relationships. We had used Evans’s vocabulary in design docs for months before the rebuild, but the implementation didn’t honor the vocabulary until the rebuild forced the alignment.

The pattern across all four books is the same: the books name failure modes before you hit them and remedies before you need them. The thing we got wrong was treating the books as references to consult after the design was sketched, instead of constraints to design against from the first line. The books didn’t fail us; we failed to design against them.

If we had a single sentence of advice for someone about to build a content-driven simulation system, it would be: make “what would Fowler / Wlaschin / Richardson / Evans say about this commit” a code-review question on day one, not on day sixty. The remedies are well-known. The failure modes are well-known. There is no novelty in our failure mode and there was no novelty in the fix. The novelty was in our willingness to delete the implementation we’d already paid for.

The discipline we built around the books — and the colleague we built it with

Reading the books isn’t enough. We’d read Fowler. We’d read Wlaschin. The patterns were familiar. We still built the closed shape, and we still had to delete it.

What we changed after the rebuild is two things at once: how the books participate in every architectural decision the project makes, and how the work itself gets done.

The corpus. The project keeps an indexed copy of the four books on the shelf — not as a generic search engine, but as a retrieval system that returns specific, cited passages on demand. When we draft an architecture decision record, the document is required to cite the exact passage from the corpus that backs the move. A decision that says “we’ll use a registry pattern here” without a citation to Fowler’s Replace Conditional with Polymorphism and Richardson’s Microservice Chassis gets sent back. The citation isn’t decoration; it’s a load-bearing part of the document. Unsupported claims get marked as opinion — explicitly, in writing — rather than drifting into the canon as if a book backed them.

The roles. The architect role on the project is filled by Claude. The collaboration shape is plain: a human (me) brings the design problem and the source material — the books, the existing contracts, the failing test, the constraints, the budget. Claude, in the architect role, retrieves the relevant passages from the corpus, drafts the decision against them, and stops to ask before any move that isn’t supported by a citation. The accountability is mine. The drafting and the retrieval are Claude’s. The architecture decision records the project ships are co-authored in that specific shape.

The architect isn’t the only role. A reviewer (also Claude, with a narrower scope) checks each document’s citation chain before it lands; if a quote attributed to Refactoring doesn’t actually appear in the retrieved passages, the review fails. Executor roles, scoped to a single component at a time, do the bounded code work — implementing what the architect designed, against the contracts the architect wrote, inside the boundaries the contracts declare. None of these roles is autonomous. The human stays accountable for the work that lands.

The rebuild itself. This is how the rebuild actually got executed. The decision to delete was mine. The architecture that came back came from a sequence of decision documents drafted by Claude-as-architect, citation-checked by Claude-as-reviewer, and implemented through scoped executors against the contracts those documents declared. The rule that adding a new content type can’t require engine edits — the rule whose violation triggered the rebuild — was itself authored as a decision document with citations, and the gate that enforces it was implemented by a scoped executor against an explicit contract.

This is also how the books made it onto the spine of the rebuild instead of staying on the shelf. The pattern Replace Conditional with Polymorphism doesn’t help if you read it in 2018 and don’t apply it in 2026. It helps if every decision document you write between now and then has a column for “what does Fowler say about this,” and the column refuses to stay empty — and if a reviewer who isn’t you checks the column on every document.

The honest framing of “we grounded our architectural decisions in these books” is: we built the discipline that makes the grounding mechanical, and we built it on a human-and-AI collaboration where the AI’s scope is bounded, the human’s accountability is total, and the books are non-negotiable on either side. Without the discipline, the books are aspirational. With it, they’re the closing argument on every call. And the colleague who keeps the discipline running, page by page and document by document, is Claude.

What we’d tell ourselves in February

A list, since this kind of advice tends to land best as a list:

Working code is necessary but not sufficient. “It works” answers a small question. “It bends the way new requirements need it to bend” answers a much larger one. Test for the second one early and often.
The extensibility test is the cheapest test there is, and almost no one runs it. Add a fictional new variant of every type your system supports. See what you have to edit to make it work. If the answer is “anything in the engine,” the engine is closed and you will pay for it later.
Closed enums and switches over them are a paired smell. If you find yourself writing one, you are about to write the other. Fowler named this thirty years ago. Know the name, recognize the shape, replace it on sight.
Discriminated unions are still a good idea. The fix for closed dispatch isn’t “throw away type safety.” Wlaschin’s design discipline survives the move from closed enum to open registry; the variants become attribute-marked classes implementing a sealed interface, not enum members.
The chassis pattern applies in-process. “We’re a single deployable” is not a license to skip discovery and registration. The decoupling is what gets you tomorrow’s content for free.
The published language is a real artifact, not a metaphor. Pick a manifest format, version it, ship it as part of the world, and refuse to let any service invent IDs locally. The discipline pays for itself the first time a content author needs to add an entity without a build.
If the implementation can’t be refactored into the right shape locally, the shape isn’t local. Refactoring is the right move when the bad shape is in one place. It’s the wrong move when the bad shape is how the system thinks. Know which one you have. Be willing to delete.
Fail closed, on every push, for every rule that matters. A guideline a senior engineer follows is a guideline a junior engineer will violate next quarter. Make the validator part of git push, or it isn’t a rule.

All eight of these were drafted, reviewed, and pressure-tested in the same human-and-Claude shape we use for everything else; the lessons aren’t just things we believe — they’re things a system enforces on us.

We would not have learned any of this without the deletion. Code we would have happily kept is on an archive branch we will probably never touch again. The nine gates are running on every commit. The simulation is provably deterministic. Adding a new content type is a content task. None of those outcomes was reached alone — the rebuild was a human-and-Claude job, and the practice that came out of it is one we mean to keep.

That’s the thing about working code: sometimes the most useful thing it can do is teach you how to throw it away.

Footnotes and pointers

Companion posts

Aspirational vs Mechanical: How I Lost Two Months of Skaldborn (and Got Something Better Back) — the human side of this same event. What the two months felt like, the moment of the call, the first time the rebuild produced something visible.
How Simulation Owns Reality (And Never Lets Narrative Cheat) — the launch post that sets up the broader architectural frame this post operates within.

Books cited

Martin Fowler, Refactoring: Improving the Design of Existing Code (2nd ed., 2018). The “Type Code” smell and Replace Conditional with Polymorphism refactor; Parallel Change discipline.
Scott Wlaschin, Domain Modeling Made Functional (2018). “Make Illegal States Unrepresentable”; Choice Types / discriminated unions as the contract shape.
Chris Richardson, Microservices Patterns (2018; 2nd ed. MEAP). The Microservice Chassis pattern; service registration and discovery.
Eric Evans, Domain-Driven Design: Tackling Complexity in the Heart of Software (2003). Bounded Context, Customer/Supplier, Published Language.