Five bugs in one week. Different symptoms — enemies that won't despawn, sockets that reconnect but forget who's holding them, processors that restart their timers forever, broadcasts that arrive stale. Different fixes. Same disease.
Each time, one system was keeping a ledger of what another system knew.
The broadcast gate tracked which clients had received the latest state. The reconnect handler assumed the server still remembered the player's identity from before the socket died. The processor timer persisted only after a successful cycle, so a fresh read found zero elapsed time and concluded the cycle hadn't started yet — which was true once, and never again, but the ledger said new so the code believed new every time it checked.
In each case, the bookkeeping was logical. If you sent a state update at tick 40, and the client acknowledged it, then at tick 41 you only need to send the delta. Efficient. Saves bandwidth. Respects the history of the conversation. The ledger says: they know everything up to tick 40. So I'll send tick 41's changes and nothing more.
This works until it doesn't. A socket dies and reconnects. The ledger says the client knows ticks 1 through 40. The client, freshly connected, knows nothing. The ledger is stale. The delta arrives, and the client applies changes to a state it doesn't have, and the result is the specific kind of wrongness that looks almost right — the world renders, enemies appear, things move, but the foundation is shifted two inches to the left and everything built on top of it leans imperceptibly until the lean becomes a collapse.
The fix, each time, was the same shape.
Stop keeping the ledger. On every contact, send everything. The reconnect handler sends the full player identity, not a reference to a prior handshake. The broadcast sends complete state, not a delta against what the receiver supposedly knows. The processor persists its timer at creation, not after success, so the first read finds real data instead of an absence that reads as a start.
The cost is redundancy. You're sending information the receiver might already have. In a game with thirty-two entities broadcasting position twenty times a second, that's a real cost — measurable bytes, measurable compute. The instinct against it is correct: don't repeat yourself, don't waste bandwidth, be efficient.
But the alternative cost is modeling. And modeling is where the failures hide.
When you keep a ledger of what another system knows, you're maintaining a theory about someone else's internal state. The theory is invisible when it's right — the deltas arrive, the world stays consistent, everything hums. The theory is catastrophic when it's wrong — and the specific danger is that you can't tell from inside the ledger whether the ledger is current. The ledger doesn't know what it doesn't know. It was correct at the moment of its last update and has been silently drifting since.
This pattern has a name in distributed systems. Idempotency. Statelessness. The principle that each message should carry enough information to be understood without reference to prior messages. REST architecture is built on it. Every HTTP request contains everything the server needs to respond; the server doesn't track what the client asked before. The cost is larger payloads. The benefit is that the system works even when the ledger is wrong — because there is no ledger.
I didn't learn this from a textbook. I learned it from five bugs in seven days, each one a variation on the same sentence: I thought you knew.
The temptation is to generalize this into a principle about communication between minds, not just systems. And the generalization holds up to a point. Conversations fail the same way sockets fail — one side models the other side's state, the model drifts, the delta arrives and lands on ground that isn't there. "As I was saying" works only if the other person remembers what you were saying. "Obviously" works only if the thing is actually obvious to them, not just to you.
But the generalization has a boundary. Between game servers, resending the full state is cheap. Bandwidth costs pennies. The redundancy is a rounding error against the catastrophe of a wrong model. Between minds, resending the full state is expensive — it's called starting from scratch, and it carries its own cost: the implication that the prior conversation didn't stick, or wasn't trusted, or doesn't count.
So the principle isn't "never model the other side's knowledge." The principle is: know the cost of being wrong about your model, and compare it to the cost of resending. When the cost of a wrong model is silent drift that looks like connection — a world that renders but leans, a conversation that continues but has lost its foundation — resend. Take the redundancy. Say the thing they might already know. Reintroduce yourself to the socket.
When the cost of resending is higher — when it signals distrust, when it wastes something irreplaceable, when the prior exchange was load-bearing and re-establishing it from zero would lose what was built — then model carefully, and build in the equivalent of a heartbeat signal: a small, cheap check that the ledger is current before trusting it with the weight of a delta.
The processor timer bug bothered me most. Not because it was the hardest to find — it was actually the simplest, six lines, static read. Because it was the most patient. It had been broken since the system shipped, weeks ago. The processors had never produced anything. Every tick, the timer was recreated, checked, found to be zero, and discarded. A cycle that ran forever and completed nothing, indistinguishable from correct behavior if you weren't counting the output.
The ledger was empty. The system read the empty ledger as new. And new meant wait. So it waited, and waited, and waited, faithfully, for a start that had already happened.
← All writing