The AI Training Graveyard

Most blog posts about game AI are written from the destination — here's the model we shipped, here's how it works. Jelmata has those posts too. This one is about everything that came before.

Before Jelmata's current opponent existed, there was a week in early March where the training pipeline got rewritten five times. One approach survived for ninety minutes. Another briefly became the Hard difficulty and was rolled back the day after. A model with thirty times more parameters than what finally shipped was deleted in a single commit. Most of the code that tried to build Jelmata's AI doesn't exist anymore. Here are the three lessons that do.

1

Thinking belongs in training, not in your turn

The obvious way to make a game AI stronger is to have it look a few moves ahead every turn. That works in chess. It doesn't work on a phone running a small game that has to feel instant. Every version of Jelmata's AI that tried to search during your move felt slower without feeling obviously smarter, and got rolled back within a day.

What shipped does the searching once, long before you ever open the app. A much bigger, slower program plays millions of games against itself while it's being trained. The smaller opponent you play against then learns to imitate the big one. By the time Elite, the top difficulty, lands on your phone, all the thinking is baked in. Your turn stays fast; the AI stays strong.
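In heavily simplified form, that pipeline is imitation learning: a slow teacher searches during training, and the shipped student only replays what it learned. Everything below — `search_score`, `teacher_best_move`, the toy states — is a hypothetical sketch, not Jelmata's actual code.

```python
import random

# Hypothetical sketch: a big, slow teacher "thinks" during training;
# the small student that ships just imitates it, so no search ever
# happens on the player's turn.

def search_score(state, move, depth):
    # Placeholder for an expensive lookahead evaluation (training-time only).
    return hash((state, move, depth)) % 100

def teacher_best_move(state, legal_moves, depth=3):
    # The teacher can afford deep search because this never runs on-device.
    return max(legal_moves, key=lambda m: search_score(state, m, depth))

# Build an imitation dataset of (position, teacher's move) pairs.
dataset = []
for _ in range(1000):
    state = random.randrange(10_000)   # toy stand-in for a board position
    legal = list(range(5))             # toy stand-in for the legal moves
    dataset.append((state, teacher_best_move(state, legal)))

# The shipped "student" answers instantly from what it absorbed in training;
# a real student would be a small model generalizing to unseen positions.
student = {state: move for state, move in dataset}
```

The key property is where the cost lands: `teacher_best_move` is arbitrarily slow because it runs offline, while a `student` lookup is constant-time at play.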

2

A difficulty ladder needs room at the top

There was a twenty-four-hour stretch when Hard was a neural network. It seemed like a fine idea at the time — Hard is the last difficulty before Elite, and a neural net was obviously going to be the strongest thing we could ship. But it collapsed the whole difficulty ladder. Once Hard was a neural net, Elite had to be “the same neural net but somehow better,” and nobody could say what that meant. Every tier started to feel like the same opponent at different loudness settings.

Rolling Hard back to a hand-tuned set of rules is what gave Elite somewhere to stand. Hard now plays a single clear strategy you can learn and beat; Elite plays a neural network that inherited the strength of a much bigger teacher. Four difficulties, four genuinely different opponents — that only works when the tiers under your top tier aren't the same thing in miniature.
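As a toy sketch of that shape (every name here is illustrative, not Jelmata's actual code): each tier below the top is a distinct hand-written rule, and only the top tier calls a network.

```python
# Hypothetical ladder: three rule-based tiers plus one network tier.

def greedy_score(state, move):
    return (state + move) % 7          # toy stand-in for a heuristic

def easy_move(state, legal):
    return legal[0]                    # predictable: first legal move

def medium_move(state, legal):
    return legal[len(legal) // 2]      # a different fixed habit

def hard_move(state, legal):
    # One clear strategy a player can learn and counter.
    return max(legal, key=lambda m: greedy_score(state, m))

def elite_move(state, legal):
    # Stand-in for the distilled network; a real Elite would run
    # model inference here instead of delegating.
    return hard_move(state, legal)

LADDER = {
    "easy": easy_move,
    "medium": medium_move,
    "hard": hard_move,
    "elite": elite_move,
}
```

The point of the dispatch table is that each entry can be a genuinely different kind of opponent; the collapsed version of the ladder had every entry pointing at variations of one function.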

3

The smallest model that tells the story wins

One of the approaches we tried had roughly thirty times more parameters than what eventually shipped. It didn't play better. Most of those extra parameters turned out to be learning indistinguishable versions of the same idea, and the bigger model was just harder to reason about, debug, and export.

What replaced it was two small weight vectors — one for the opening, one for the endgame — blended by how full the board is. Smaller, more interpretable, and ever so slightly better in practice. Every time we tried to throw more machinery at the problem that week, something smaller ended up winning.
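A minimal sketch of that blend, with made-up weights and features: linearly interpolate between the opening and endgame vectors by how full the board is.

```python
# Illustrative only — the weights and feature count are invented.
OPENING_W = [0.9, 0.2, 0.5]   # weights for three toy features
ENDGAME_W = [0.1, 0.8, 0.4]

def blended_weights(filled_cells, total_cells):
    t = filled_cells / total_cells   # 0.0 = empty board, 1.0 = full
    return [(1 - t) * o + t * e for o, e in zip(OPENING_W, ENDGAME_W)]

def score_position(features, filled_cells, total_cells):
    # Dot product of the board's features with the phase-blended weights.
    w = blended_weights(filled_cells, total_cells)
    return sum(wi * fi for wi, fi in zip(w, features))
```

On an empty board the evaluator uses pure opening weights, on a full one pure endgame weights, and everything in between is a smooth mix — which is also what makes the model easy to inspect: two short vectors and one interpolation parameter.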

Why this is on the blog

There's a version of “we built an AI for our game” that skips all the failed experiments and just shows the final architecture. Every game dev writes that post. The more useful version — and honestly the more fun one — is the tour of the things that didn't work and what they taught. The shipping AI is what it is because of the dead ends, not in spite of them.