How simulation owns reality (and never lets narrative cheat)
A player walks into a tavern in Stormfjord. The keeper, an NPC, looks up and says: “My brother died at sea last winter. The currents took him near the Veyr coast.”
The world, in fact, has no record of a death at sea last winter. There was no winter expedition. There is no brother. Nobody named him. No grieving entry was written into the keeper’s family record. The line came from a generative model that was given a system prompt about taverns and grief and produced a sentence that sounded right.
The player asks the next NPC about the keeper’s brother. That NPC has never heard of him — but the model, trying to be helpful, writes the second NPC into the same fiction. Now two NPCs share a story that the world never produced. By the time the player has talked to four NPCs, the world’s history is a tangle of contradictions the simulation never emitted.
This is the failure mode that happens when you let the AI author state. Skaldborn is built so it can’t.
The rule, in three sentences
The rule is three lines:
Simulation owns reality. Narrative interprets reality. AI assists storytelling.
Three lines. Every component contract in the codebase is required to inherit them. They aren’t a mission statement. They’re a fence around what each layer is allowed to do.
The strongest formulation:
The simulation produces the world. All other systems interpret, present, and enrich that world without controlling it.
Operationally, who’s allowed to write what?
- The simulation is the only system that may modify world state.
- Events are the canonical record of what happened. They’re append-only, immutable, and they carry an event_id, the tick they were produced at, and the manifest version they replay against. By design, every event also carries a causation reference back to the event that produced it; the implementation of that field is queued for the next phase, but the contract is canon.
- The narrative service, the AI gateway, the memory service, and every client read events and may produce derived projections: chronicles, rumors, NPC memories, UI state. None of them may write back into world state.
In words: there is one writer, and many readers. The writer is deterministic. The readers are interpreters.
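To make the one-writer, many-readers split concrete, here is a minimal TypeScript sketch. It is not the Skaldborn implementation. Only event_id, the tick, the manifest version, and the causation reference come from the text above; `EventLog`, `EventLogReader`, `buildChronicle`, and the `kind` and `payload` fields are hypothetical names chosen for illustration.

```typescript
// Hypothetical envelope: kind and payload are illustrative additions.
interface WorldEvent {
  readonly event_id: string;
  readonly tick: number;
  readonly manifest_version: string;
  readonly causation_id: string | null; // null for an event with no parent
  readonly kind: string;
  readonly payload: Readonly<Record<string, unknown>>;
}

// Readers only ever receive this interface: it has no append and no mutation.
interface EventLogReader {
  readonly length: number;
  at(index: number): WorldEvent | undefined;
  all(): readonly WorldEvent[];
}

class EventLog implements EventLogReader {
  private readonly events: WorldEvent[] = [];

  // In this sketch only the simulation holds a reference typed as EventLog.
  append(event: WorldEvent): void {
    // Shallow freeze: the envelope fields cannot be reassigned afterwards.
    this.events.push(Object.freeze({ ...event }));
  }

  get length(): number { return this.events.length; }
  at(index: number): WorldEvent | undefined { return this.events[index]; }
  all(): readonly WorldEvent[] { return this.events.slice(); }
}

// A derived projection: built purely by reading the log.
function buildChronicle(log: EventLogReader): string[] {
  return log.all().map((e) => `tick ${e.tick}: ${e.kind}`);
}

const log = new EventLog();
log.append({
  event_id: "evt-1", tick: 1, manifest_version: "m1",
  causation_id: null, kind: "entity.moved", payload: { x: 4, y: 7 },
});
log.append({
  event_id: "evt-2", tick: 2, manifest_version: "m1",
  causation_id: "evt-1", kind: "entity.arrived", payload: {},
});

const chronicle = buildChronicle(log);
```

The projection function accepts `EventLogReader`, so a reader that tried to call `append` would fail to type-check. That is the same one-way arrow, expressed as an interface boundary.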
Why “AI in games” keeps breaking without this rule
You don’t have to take the rule on faith. The shipped AI-narrative game systems of the last few years demonstrate what happens without it — and they break in remarkably consistent ways. Two patterns dominate the public record.
Pattern 1 — fact invention. The LLM authors world content that doesn’t exist.
The clearest catalogue is Reed Berkowitz’s AI Powered NPCs — Hype, or Hallucination? from December 2023. Berkowitz lays the pattern out in plain terms: a model “might suggest meeting for coffee” in a world where there is no coffee, no character can leave the bar, and there’s nothing next door to walk to. Or worse, an NPC invents “the dark swamp to the south inhabited by witchlings and a mysterious magic sword,” and the player spends a real-world hour searching for content that never existed.[1]
The barkeep example in the same piece is sharper. The model has the barkeep befriend the player and invite them home for dinner and a game of Rutanny, “which is normal in a chat situation but can ruin immersion in a game. Because after the barkeep says this he just stands there. Because the barkeep isn’t programmed to be able to leave his bar. He isn’t programmed to walk around freely. Even if he could, there is no house created for him to go to. And there is no family. And there is no kind of game called Rutanny.”[1] The dialogue layer wrote a fact the world layer couldn’t honor. The fiction breaks at the seam between them.
Pattern 2 — coherence collapse. The model drifts from earlier facts as the conversation grows.
Yuan Gao’s 2020 AI Dungeon playthrough is the canonical writeup. Gao’s protagonist checks his pockets and finds “two bits, a silver dollar, a keycard for a hotel room.” A few scenes later he has nothing in his pockets. Then he discovers “a few bits, a nice looking house key, a small notebook and your smart phone.” The central plot device, a man with red eyes, is forgotten entirely; a random blonde character takes over the story and kills the protagonist repeatedly until Gao manually retcons the death as a dream sequence. Gao’s verdict: AI Dungeon “struggles to build a consistent world with consistent lore, and cannot (yet) hold a plot line very well,” and games “play like fevered dreams, with ideas and events shifting.”[2]
The pattern persists. AI Dungeon’s own help docs describe the mechanism in production: when context exceeds the model’s window (“around 4000 tokens for free players”), “the AI loses its ability to look back and reference certain parts of the story” and “it’s just making it up as best it can.”[3]
It isn’t only text adventures. Mantella, the Skyrim AI-NPC mod, summarizes each conversation on exit so it fits in the next prompt. A fork called Pantella shipped specifically to replace that summary memory, and its README says: “No more lossy summaries that forget details because the LLM doesn’t know to include them/can’t shove everything into one paragraph.”[4] Replika users on r/Replika report the same shape of failure across long-running accounts: “characters losing names, dropping inside jokes, and asking things they should already know,” driven by a memory architecture that “compresses or trims older context for some accounts.”[5] Character.AI users hit it too; one widely quoted complaint reads simply: “The AI… tends to forget previous messages… leading to inconsistencies.”[6]
The unifying observation comes from a January 2024 paper that formalizes the argument. Xu, Jain, and Kankanhalli, in Hallucination is Inevitable, write: “we formalize the problem and show that it is impossible to eliminate hallucination in LLMs… LLMs cannot learn all the computable functions and will therefore inevitably hallucinate if used as general problem solvers.”[7]
You cannot prompt-engineer the failure away. You cannot fine-tune it away. You can only bound it — by keeping the AI on the consumer side of an authoritative event log it isn’t allowed to write to. The fix is structural, not behavioral.
The three layers
Here’s the shape, top to bottom:
graph TD
Client["Client (browser)<br/>displays state, sends commands"]
Gateway["Gateway (network edge)<br/>validates and routes commands"]
Sim["Simulation (sole writer)<br/>decides outcomes, emits events"]
EventLog[("Authoritative event log")]
Narrative["Narrative<br/>prose, chronicle, LLM"]
Memory["Memory<br/>NPC episodic, belief"]
Projections["Projections<br/>read models, query API"]
Client -->|"commands (intent)"| Gateway
Gateway -->|commands| Sim
Sim -.->|"events (authoritative)"| EventLog
EventLog -->|read-only| Narrative
EventLog -->|read-only| Memory
EventLog -->|read-only| Projections
Sim -->|state + events| Gateway
Gateway -->|broadcasts| Client
Two arrows matter:
- Down (commands and events): the client sends intent — “I want to move here” — and receives outcomes — “you moved here.” The client never sees its own intent reflected back as truth; the simulation decides whether the move happened, and what it looks like in the world.
- Out (the event fan): every downstream system — narrative, memory, projections — reads from the authoritative event log. None of them writes back. The arrows go one way.
This is a familiar pattern (event-sourced architectures, CQRS), but in a generative-AI game it’s load-bearing in a way it usually isn’t. The standard event-sourcing argument is “we want auditability and replay.” Ours is stronger: the AI is downstream of the truth, period, and we will refuse to ship code that violates that.
Two more facts matter:
- Single writer per partition. Each part of the world (we call them partitions) has exactly one simulation owner. No two simulations ever try to mutate the same entity.
- Partition-local determinism. The same input on the same partition at the same tick produces the same output, every time. This is the property that makes replay safe.
That second one — determinism — is what makes the entire architecture defensible. It’s what turns “AI cannot author world state” into a typing discipline rather than a prayer.
The receipts: three places the rule is enforced
So how does the codebase actually keep narrative from cheating? Three layers of enforcement, from runtime evidence down to the type system.
Receipt #1 — the determinism test
In our test suite there’s a small validator called Skaldborn.Simulation.Movement.Validation. It does this:
- Build a fresh simulation with a known content fixture.
- Run a 10-command sequence against it.
- Hash the resulting event log. Hash the resulting world state. Record both.
- Build a second fresh simulation with the same fixture.
- Run the same 10-command sequence.
- Assert: the two event-log hashes are byte-identical. Assert: the two final positions match.
If anything in the simulation depended on wall-clock time, system entropy, network ordering, or — and this is the load-bearing case — AI-generated content sneaking back into the simulation path, the hashes would diverge and the test would fail.
This is the simplest possible demonstration of the rule. Same commands → same events → same world. The simulation is a pure function of its input, and we have a test that says so.
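The steps above can be sketched in a few dozen lines of TypeScript. This is an illustration under stated assumptions, not the actual Skaldborn.Simulation.Movement.Validation code: `ToySimulation`, `hashLog`, and `runOnce` are hypothetical names, the world is a single entity on a grid, and a non-cryptographic FNV-1a hash stands in for whatever hash the real validator uses.

```typescript
// Hypothetical toy simulation: names and shapes are illustrative only.
type MoveCommand = { entity: string; dx: number; dy: number };
type MovedEvent = { event_id: string; tick: number; entity: string; x: number; y: number };
type Placement = { id: string; x: number; y: number };

class ToySimulation {
  private tick = 0;
  private readonly positions = new Map<string, { x: number; y: number }>();
  readonly events: MovedEvent[] = [];

  constructor(fixture: Placement[]) {
    for (const p of fixture) this.positions.set(p.id, { x: p.x, y: p.y });
  }

  apply(cmd: MoveCommand): void {
    this.tick += 1;
    const pos = this.positions.get(cmd.entity);
    if (!pos) return; // unknown entity: the command is dropped, no event
    pos.x += cmd.dx;
    pos.y += cmd.dy;
    // Identity derives from tick and log position, never from a random source.
    const event_id = `evt-${this.tick}-${this.events.length}`;
    this.events.push({ event_id, tick: this.tick, entity: cmd.entity, x: pos.x, y: pos.y });
  }

  positionOf(id: string): { x: number; y: number } | undefined {
    return this.positions.get(id);
  }
}

// FNV-1a (32-bit) over the serialized log; a stand-in for a cryptographic hash.
function hashLog(events: MovedEvent[]): string {
  const text = JSON.stringify(events);
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

function runOnce(commands: MoveCommand[]) {
  const sim = new ToySimulation([{ id: "keeper", x: 0, y: 0 }]); // fresh fixture each run
  for (const cmd of commands) sim.apply(cmd);
  return { hash: hashLog(sim.events), finalPos: sim.positionOf("keeper") };
}

// A fixed 10-command sequence.
const commands: MoveCommand[] = Array.from({ length: 10 }, (_, i) => ({
  entity: "keeper", dx: i % 2, dy: 1,
}));

const first = runOnce(commands);
const second = runOnce(commands);
const deterministic =
  first.hash === second.hash &&
  first.finalPos?.x === second.finalPos?.x &&
  first.finalPos?.y === second.finalPos?.y;
```

If `apply` consulted a clock, a random generator, or model output, the two hashes would be free to diverge, and `deterministic` would be the place that shows it.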
Receipt #2 — the determinism gate
Determinism in tests isn’t enough. Code drifts. Someone reaches for DateTime.UtcNow because they need a timestamp; six months later we have a heisenbug nobody can reproduce. So we don’t trust ourselves — we run a grep gate on every push.
The gate is two files. The first is the rule set, a JSON file:
{
"rules": [
{
"id": "SD001",
"pattern": "Guid.NewGuid",
"rationale": "Non-deterministic identity. Replay would produce different IDs.",
"approved_alternative": "IDeterministicIdGenerator or SHA256 content hash"
},
{
"id": "SD002",
"pattern": "DateTime.Now|DateTime.UtcNow",
"rationale": "Wall-clock time breaks replay determinism.",
"approved_alternative": "Inject IClock; pass tick-derived time"
}
// ... SD003 (Random), SD004 (Task.Delay/Thread.Sleep),
// SD005 (System.IO/System.Net), SD006 (Stopwatch)
]
}
The second is a shell script — about thirty lines of bash — that greps every C# file under the simulation namespace against the rule set, and exits non-zero on any match. It’s wired into the pre-push git hook.
The point isn’t the cleverness of the script. The point is: the rule isn’t just documented; it’s a machine-readable file plus a small validator that runs on every push. If a future engineer (or a future LLM-assisted refactor) tries to slip non-determinism into simulation code, the push fails before the code lands.
This is the layer that converts a written rule into a property the codebase has. It’s also boring on purpose — boring is what makes governance survive contact with deadlines.
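For readers who want to see the shape of such a gate, here is a rough TypeScript sketch of the scanning logic. The real gate is a bash script and is not reproduced here. The rule fields and the two rules mirror the JSON above; `scan`, `SourceFile`, `Violation`, and the sample file paths and contents are hypothetical, and treating `pattern` as a regular expression is an assumption about how the script interprets it.

```typescript
// Rule shape mirrors the JSON rule set; everything else is illustrative.
interface DeterminismRule {
  id: string;
  pattern: string; // assumed to be a regular expression, as with grep -E
  rationale: string;
  approved_alternative: string;
}

interface SourceFile { path: string; content: string }
interface Violation { ruleId: string; path: string; line: number; text: string }

const rules: DeterminismRule[] = [
  {
    id: "SD001",
    pattern: "Guid.NewGuid",
    rationale: "Non-deterministic identity. Replay would produce different IDs.",
    approved_alternative: "IDeterministicIdGenerator or SHA256 content hash",
  },
  {
    id: "SD002",
    pattern: "DateTime.Now|DateTime.UtcNow",
    rationale: "Wall-clock time breaks replay determinism.",
    approved_alternative: "Inject IClock; pass tick-derived time",
  },
];

// Checks every line of every file against every rule.
function scan(files: SourceFile[], ruleSet: DeterminismRule[]): Violation[] {
  const violations: Violation[] = [];
  for (const file of files) {
    file.content.split("\n").forEach((text, index) => {
      for (const rule of ruleSet) {
        if (new RegExp(rule.pattern).test(text)) {
          violations.push({ ruleId: rule.id, path: file.path, line: index + 1, text: text.trim() });
        }
      }
    });
  }
  return violations;
}

// Hypothetical simulation sources; the first one breaks SD002.
const sources: SourceFile[] = [
  { path: "Simulation/TickLoop.cs", content: "var now = DateTime.UtcNow;\nvar id = ids.Next();" },
  { path: "Simulation/Movement.cs", content: "var pos = state.PositionOf(entity);" },
];

const found = scan(sources, rules);
// A gate would exit non-zero on any match, which is what fails the push.
const exitCode = found.length === 0 ? 0 : 1;
```

A text scan like this knows nothing about scope or aliases, which is why it stays cheap enough to run on every push.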
Receipt #3 — the type system
The strongest enforcement happens before the code compiles. Here it is, from the spec for our HistoricalRecord type — what characters remember:
public abstract record Provenance;
public sealed record EventBacked(
EventId SourceEventId
) : Provenance;
public sealed record BeliefConsensus(
EventId PromotionEventId,
PropositionHash PropositionHash
) : Provenance;
public sealed record Legend(
IReadOnlyList<EventId> SourceEvidence,
LegendBasis Basis
) : Provenance;
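Read it carefully. Provenance is the field that records where this memory came from. There are three flavors. Every flavor carries event references: EventBacked and BeliefConsensus each require an EventId, and Legend requires a list of them. (The list type on its own would accept an empty list, so the “at least one” guarantee for Legend would need a constructor check or a validator.)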
You cannot construct a BeliefConsensus without supplying a PromotionEventId. The type design refuses; once the implementation lands against this contract — scheduled for the next phase — the compiler refuses. There is no shape of Provenance that doesn’t trace back to canonical events.
So when a character “remembers” something, that memory is, at the type level, anchored to a real event the simulation actually emitted. The memory can be partial, mistaken, biased, distorted by faction or fear — those are valid memory states — but it cannot be invented. The type system won’t let it.
The rule is mechanically enforceable: not by prompt engineering, code review, or guidelines, but by a contract the C# compiler enforces, because code that breaks the contract will not compile. The contract is canon today; the implementation is queued for the next phase. The compiler, unlike code review, is unimpressed by deadlines and unmoved by clever arguments.
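The same idea can be shown in TypeScript, the language of the browser client. What follows is an analogue of the C# spec, not a translation the project ships. The variant names follow the records above; the branded `EventId` type, the `eventIdFrom` lookup, the `LegendBasis` values, and the non-empty tuple used for legend evidence are assumptions made for the sketch.

```typescript
// Branded types: a plain string is not accepted where an EventId is required.
type EventId = string & { readonly __brand: "EventId" };
type PropositionHash = string & { readonly __brand: "PropositionHash" };
type LegendBasis = "oral_tradition" | "written_account"; // hypothetical values

type Provenance =
  | { kind: "event_backed"; sourceEventId: EventId }
  | { kind: "belief_consensus"; promotionEventId: EventId; propositionHash: PropositionHash }
  // A non-empty tuple encodes "at least one source event" in the type itself.
  | { kind: "legend"; sourceEvidence: readonly [EventId, ...EventId[]]; basis: LegendBasis };

// In this sketch, the only way to obtain an EventId is a lookup in the log.
function eventIdFrom(log: ReadonlySet<string>, raw: string): EventId | null {
  return log.has(raw) ? (raw as EventId) : null;
}

// Every provenance variant can answer "which events back this memory?".
function sourceEvents(p: Provenance): readonly EventId[] {
  switch (p.kind) {
    case "event_backed": return [p.sourceEventId];
    case "belief_consensus": return [p.promotionEventId];
    case "legend": return p.sourceEvidence;
  }
}

const canonicalLog: ReadonlySet<string> = new Set(["evt-17", "evt-42"]);
const real = eventIdFrom(canonicalLog, "evt-42");
const invented = eventIdFrom(canonicalLog, "evt-brother-died-at-sea");

// A memory can only be built once a real EventId is in hand.
const memory: Provenance | null =
  real !== null ? { kind: "event_backed", sourceEventId: real } : null;
```

Writing `{ kind: "event_backed", sourceEventId: "evt-brother-died-at-sea" }` directly is a type error here, because a bare string is not an `EventId`. The invented brother has no path into a memory record.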
What actually happens when an LLM tries to cheat
Walk back to the tavern. Imagine the dialogue model produces a plan claiming the keeper’s brother died at sea last winter. Here’s what happens, step by step, in our pipeline:
- The plan is structured, not free-text. The model’s output is a JSON object with intent, target_entity_ids, referenced_facts, tone, and a few other fields. The model can’t just emit prose. It has to commit to a specific intent and specific entity references.
- Every entity reference is looked up. The validator walks target_entity_ids against canonical world state. The keeper’s brother doesn’t exist as an entity. The plan is rejected.
- Every fact reference is looked up. The validator walks referenced_facts against the canonical event log. There’s no event matching “death at sea, winter, Veyr coast” anywhere in the keeper’s family causal chain. The plan is rejected.
- No narrative event is emitted. Because the plan failed validation, no narrative.dialogue.response event is written to the authoritative event store. The rendering step never runs. The line never reaches the player.
The model never gets to “say” the false thing. The hallucination doesn’t survive contact with the canonical store. There’s no compensating action, no retcon, no save-file fixup — the false fact never made it past the boundary in the first place.
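A minimal sketch of that validation step, in TypeScript, might look like the following. The field names intent, target_entity_ids, referenced_facts, and tone, and the event name narrative.dialogue.response, come from the steps above. `validatePlan` and `emitIfValid` are hypothetical, as is the simplification that each entry in referenced_facts is the ID of an event in the log.

```typescript
// Assumption: referenced_facts holds event IDs. The real schema has more fields.
interface DialoguePlan {
  intent: string;
  target_entity_ids: string[];
  referenced_facts: string[];
  tone: string;
}

type ValidationResult = { ok: true } | { ok: false; reason: string };

function validatePlan(
  plan: DialoguePlan,
  knownEntities: ReadonlySet<string>,
  knownEvents: ReadonlySet<string>,
): ValidationResult {
  // Every entity the plan mentions must exist in canonical world state.
  for (const id of plan.target_entity_ids) {
    if (!knownEntities.has(id)) return { ok: false, reason: `unknown entity: ${id}` };
  }
  // Every fact the plan leans on must exist in the canonical event log.
  for (const fact of plan.referenced_facts) {
    if (!knownEvents.has(fact)) return { ok: false, reason: `unknown event: ${fact}` };
  }
  return { ok: true };
}

// Only a validated plan yields a narrative event; a rejected plan emits nothing.
function emitIfValid(
  plan: DialoguePlan,
  knownEntities: ReadonlySet<string>,
  knownEvents: ReadonlySet<string>,
  emitted: string[],
): ValidationResult {
  const result = validatePlan(plan, knownEntities, knownEvents);
  if (result.ok) emitted.push("narrative.dialogue.response");
  return result;
}

const entities: ReadonlySet<string> = new Set(["keeper-01", "player-01"]);
const events: ReadonlySet<string> = new Set(["evt-17"]);
const emitted: string[] = [];

const hallucinated = emitIfValid(
  {
    intent: "share_grief",
    target_entity_ids: ["keeper-01", "keeper-brother"],
    referenced_facts: ["evt-death-at-sea"],
    tone: "somber",
  },
  entities, events, emitted,
);

const grounded = emitIfValid(
  { intent: "greet", target_entity_ids: ["keeper-01"], referenced_facts: ["evt-17"], tone: "warm" },
  entities, events, emitted,
);
```

The hallucinated plan is rejected at the first unknown entity, so `emitted` ends up holding a single entry, the one for the grounded greeting.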
The dialogue spec is explicit about a corollary:
The whole game keeps working without the model. NPCs don’t stop existing, the world doesn’t pause, fights still resolve, economies still run. What goes away is the prose. The translator goes offline; the world stays.
“AI assists storytelling” — that’s the specific verb. Not participates in storytelling. Not co-authors the world. Assists. When the assistance is unavailable, the thing it was assisting is unaffected. That’s the test.
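One way to picture that test is a tick loop that takes the model as an optional argument. This is a toy TypeScript sketch with hypothetical names (`SimEvent`, `Translator`, `runTicks`) and a made-up event kind, not the pipeline itself.

```typescript
// Hypothetical shapes: a translator turns an event into prose, or is absent.
type SimEvent = { tick: number; kind: string };
type Translator = (event: SimEvent) => string;

function runTicks(ticks: number, translator: Translator | null) {
  const events: SimEvent[] = [];
  const prose: string[] = [];
  for (let tick = 1; tick <= ticks; tick++) {
    // The simulation step never consults the translator.
    const event: SimEvent = { tick, kind: "economy.settled" };
    events.push(event);
    // Prose is optional decoration on an event that already exists.
    if (translator) prose.push(translator(event));
  }
  return { events, prose };
}

const withModel = runTicks(3, (e) => `On tick ${e.tick}, the markets settled.`);
const withoutModel = runTicks(3, null);

// Same world either way; only the prose differs.
const sameWorld = JSON.stringify(withModel.events) === JSON.stringify(withoutModel.events);
```

Because the translator is called only after the event exists, taking it away cannot change what the simulation produced.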
The honest status: what ships today, what doesn’t
Some of what we’ve described is running in production-shape on main right now. Some isn’t. We want to be straight about which is which, because it’s part of the story of how we got here.
- The simulation core — world state, event envelopes with deterministic identity, the tick loop, the manifest loader — runs today. Test fixtures replay deterministically. The Movement Validator hashes match.
- The gateway and the browser client — the boundary that enforces “client never writes state, server never trusts client” — run today. The reflection-based event-routing they use is a recent rewrite (more on that in a moment).
- The content pipeline — the build-time tooling that produces the world’s content manifest — runs today.
- The narrative service, the dialogue pipeline, and the NPC memory pipeline do not run on main today. The contracts are frozen, the type designs are accepted, but the implementation isn’t there.
Earlier this year we had all three of those services implemented. They worked end-to-end. Player typed, NPC replied, memory got recorded, the chronicle service rendered prose. It looked like a finished thing.
But in February we ran an extensibility test against the implementation. The test was simple: can a content author add a new kind of memory, or a new kind of narrative template, without modifying engine code? The answer was no. The implementation had hardcoded enums for fourteen narrative-template kinds and five memory variants. Every dispatch path was a switch over those enums. Adding a new kind meant editing four files in the engine and shipping a new build.
We had a rule — “content extensibility shouldn’t require engine edits” — and the implementation didn’t satisfy it. Two paths to recover:
- Soften the rule.
- Delete the implementation.
We deleted the implementation. The contracts survived; the code didn’t. Several thousand lines of working code, and we put it in an archive branch and started over.
The new implementation will use a registry pattern (attribute-driven discovery, no closed enums) so adding a memory variant or a narrative template is a content-only change. It’s queued for a later development phase. In the meantime, main ships with the simulation core, the gateway, the client, and the content pipeline — the parts that own truth. The parts that interpret truth are deliberately offline until we can rebuild them under the rules they were supposed to obey.
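The difference between a closed enum and a registry is easy to show in miniature. The TypeScript sketch below is an approximation only: the planned implementation is described as attribute-driven discovery in C#, whereas this uses explicit `register` calls, and `MemoryRegistry`, `MemoryRecord`, and the "episodic" and "rumor" kinds are hypothetical.

```typescript
// Hypothetical record and renderer shapes for illustration.
interface MemoryRecord { kind: string; sourceEventId: string; detail: string }
type MemoryRenderer = (record: MemoryRecord) => string;

class MemoryRegistry {
  private readonly renderers = new Map<string, MemoryRenderer>();

  // Kinds are open-ended strings, not members of a closed enum.
  register(kind: string, renderer: MemoryRenderer): void {
    if (this.renderers.has(kind)) throw new Error(`duplicate memory kind: ${kind}`);
    this.renderers.set(kind, renderer);
  }

  // Dispatch is a lookup, so there is no switch to edit when a kind is added.
  render(record: MemoryRecord): string {
    const renderer = this.renderers.get(record.kind);
    if (!renderer) throw new Error(`unregistered memory kind: ${record.kind}`);
    return renderer(record);
  }

  kinds(): string[] {
    return Array.from(this.renderers.keys()).sort();
  }
}

const registry = new MemoryRegistry();

// A kind the engine might ship with.
registry.register("episodic", (r) => `I saw it myself: ${r.detail}`);

// A kind added by content, without touching MemoryRegistry.
registry.register("rumor", (r) => `They say ${r.detail}`);

const line = registry.render({
  kind: "rumor",
  sourceEventId: "evt-17",
  detail: "the harbor froze early",
});
```

Adding "rumor" touched no engine code in this sketch, which is the property the February extensibility test was checking for.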
This isn’t a recovery story we’re embarrassed about. It’s the story we wanted to tell. The whole point of the architectural frame in this post — simulation owns reality — is that the rule is more important than the implementation. We caught an implementation that didn’t deserve the rule, and we fixed the implementation, not the rule.
A new layer of enforcement (just shipped)
The most recent enforcement layer landed earlier this week.
The pattern we wanted to prevent looks like this:
// In the browser client. Hypothetical. Don't do this.
const playerEntityId = "player-001"; // hardcoded literal
moveCommand({ entity: playerEntityId, x: 4, y: 7 });
That code looks innocent. It’s also wrong. The browser client doesn’t get to decide what the player’s entity ID is; the server owns identity. If the server’s notion of identity ever changes shape (hashed IDs, longer IDs, namespaced IDs), the client’s hardcoded literal is now lying to the simulation.
The fix is a contract called authority bindings, ratified earlier this week and running on every push today as a contract-plus-validator pair. Every component contract in the repo now has to declare, machine-readably, which fields it consumes from where:
{
"authority_bindings": [
{
"binding_id": "player-entity-id",
"consumed_concept": "player-entity-id",
"authority_source": {
"kind": "seam_payload",
"seam_id": "session-authenticated-reply",
"field": "character_entity_id"
},
"binding_site": {
"expected_in": "src/skaldborn-client/src/network/auth-handler.ts",
"fingerprint": "this._state.playerEntityId = msg.character_entity_id"
},
"forbidden_literals": ["\"player\"", "\"player-001\""],
"rationale": "Player entity ID is server-authoritative; the client must read it from the authentication reply, not invent it."
}
]
}
There’s a grep validator that runs on every push. It checks:
- Does the file at expected_in actually exist?
- Does it contain the fingerprint substring?
- Does the codebase contain any of the forbidden_literals outside test fixtures?
If any of those checks fails, the push fails. Static analyzers (Roslyn for C#, ESLint for TypeScript) are queued to take the same checks deeper, into expression analysis.
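Those three checks can be sketched as a pure function over file contents. This is a TypeScript illustration under stated assumptions, not the validator that runs on push. The binding fields mirror the JSON above; `checkBinding`, `SourceFile`, the in-memory file list, the move.ts path, and the rule that a path containing "/test/" or "/fixtures/" counts as a test fixture are all hypothetical.

```typescript
// Only the fields the three checks need; the real contract has more.
interface AuthorityBinding {
  binding_id: string;
  binding_site: { expected_in: string; fingerprint: string };
  forbidden_literals: string[];
}

interface SourceFile { path: string; content: string }

// Assumption: fixtures are recognized by path segment. The real rule may differ.
const isTestFixture = (path: string): boolean =>
  path.includes("/test/") || path.includes("/fixtures/");

function checkBinding(binding: AuthorityBinding, files: SourceFile[]): string[] {
  const failures: string[] = [];
  const site = files.find((f) => f.path === binding.binding_site.expected_in);
  if (!site) {
    // Check 1: the declared binding site must exist.
    failures.push(`${binding.binding_id}: missing file ${binding.binding_site.expected_in}`);
  } else if (!site.content.includes(binding.binding_site.fingerprint)) {
    // Check 2: the binding site must contain the fingerprint.
    failures.push(`${binding.binding_id}: fingerprint not found`);
  }
  // Check 3: no forbidden literal anywhere outside test fixtures.
  for (const file of files) {
    if (isTestFixture(file.path)) continue;
    for (const literal of binding.forbidden_literals) {
      if (file.content.includes(literal)) {
        failures.push(`${binding.binding_id}: forbidden literal ${literal} in ${file.path}`);
      }
    }
  }
  return failures;
}

const binding: AuthorityBinding = {
  binding_id: "player-entity-id",
  binding_site: {
    expected_in: "src/skaldborn-client/src/network/auth-handler.ts",
    fingerprint: "this._state.playerEntityId = msg.character_entity_id",
  },
  forbidden_literals: ['"player"', '"player-001"'],
};

const authHandler: SourceFile = {
  path: "src/skaldborn-client/src/network/auth-handler.ts",
  content: "this._state.playerEntityId = msg.character_entity_id;",
};

const cleanTree = checkBinding(binding, [authHandler]);
const dirtyTree = checkBinding(binding, [
  authHandler,
  { path: "src/skaldborn-client/src/input/move.ts", content: 'const playerEntityId = "player-001";' },
]);
```

A substring check is blunt, and the blog's own plan to move the same checks into Roslyn and ESLint analyzers reflects that. It is still enough to catch the hardcoded literal from the example above.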
We built this contract to make a specific class of bug impossible to ship. A bug landed a few weeks ago where the client kept holding a literal placeholder ID after the server authentication reply changed shape. The simulation kept running correctly. The client kept showing pixels. But every move command from the client was silently dropped because the entity ID didn’t match anything the simulation knew about. The authority of the simulation was preserved (that’s why nothing crashed), but the client’s lie was invisible until a player tried to walk and nothing happened.
The authority-bindings contract is the rule that says: every authoritative ID in a component has one source, declared, in writing, with a fingerprint check. You can’t hardcode it locally. You can’t keep a backup copy “just in case.” There is one root, one binding, one source-of-truth.
What’s next
Some of this is familiar shape: event log, projections, single-writer per partition, deterministic replay. We didn’t invent it. What’s unusual is how tight the rule runs — all the way through the AI — and the boring infrastructure to keep it tight under deadline pressure. The non-obvious bit is the LLM-offline-correctness invariant. A lot of “AI in games” architectures are honest event-sourced underneath but have an asterisk: except for the AI bits. No asterisk here. The simulation is the simulation, with or without the model. The model is a translator. Translators are nice to have. They are not load-bearing.
This is the first post in the launch arc. Up next, two paired companions on the early-2026 rebuild: Aspirational vs Mechanical (story) and Why we deleted two months of working code (engineering). After those, the launch arc continues with deterministic event sourcing at the partition level, then a tour of the sixteen simulation systems, with a shorter post on the manifest model in between.
If you want to follow along, subscribe via the form at the bottom of any page — one short email when the next post lands. If you want to argue, write to devlog@skaldborn.com.
Everything else is the boring engineering of making it true.
Footnotes
1. Reed Berkowitz (“Rabbit Rabbit”), AI Powered NPCs — Hype, or Hallucination? Curiouser Institute on Medium, December 9, 2023. medium.com/curiouserinstitute/ai-powered-npcs-hype-or-hallucination-11ddfc530e33
2. Yuan Gao (Meseta), I attempt to play a coherent story in AI Dungeon: Attempt 1, a noire-future-fantasy/mystery. Medium, December 6, 2020. meseta.medium.com
3. AI Dungeon Help Center, Why does the AI forget or mix things up? help.aidungeon.com
4. Pantella (Pathos14489/Pantella, GitHub), a fork of Mantella with a ChromaDB-based memory replacement for the original’s summary-based memory. README quote verbatim from the ChromaDB Memory Manager section. github.com/Pathos14489/Pantella
5. RoboRhythms, Why Replika Memory Suddenly Stops Working for So Many Users. roborhythms.com/replika-memory-broken-fix
6. Lark Birdy, Negative Feedback on LLM-Powered Storytelling & Roleplay Apps, Cuckoo Network blog, April 17, 2025. Quote verbatim from the “Technical Limitations in Storytelling Bots — Context/Memory Limits” section. cuckoo.network
7. Ziwei Xu, Sanjay Jain, Mohan Kankanhalli, Hallucination is Inevitable: An Innate Limitation of Large Language Models, arXiv:2401.11817 (submitted January 22, 2024; revised February 13, 2025). arxiv.org/abs/2401.11817