Does Jelmata's First Player Have an Advantage?

Most symmetric abstract strategy games tilt toward whoever moves first. Go hands out a komi — bonus points for the second player — to compensate. Hex is a proven first-player win on any board size. Tic-tac-toe, with perfect play, ends in a draw only because the first player can't quite force a win on a 3×3. Jelmata is square, symmetric, and finite, which puts it in the same family. So does the first player have an edge here too? We ran the numbers.

1

The experiment

We picked eight AI matchups: three self-mirrors (Hard vs Hard, Medium vs Medium, Elite vs Elite), three cross-skill pairings (Hard vs Medium, Hard vs Easy, Elite vs Hard), and two stochastic recipes that break the usual greedy determinism. Each matchup ran on four board sizes (5×5, 6×6, 7×7, 8×8), and both seat orders were played: bot A moving first and bot B moving first. Twelve games per cell (one matchup on one board size), seeded for reproducibility.

That's 96 games per board size (8 matchups × 12 games) and 384 in total. We then pooled the results per board size, computed the fraction of games the first player won with draws counted as half a win, and put a Wilson 95% confidence interval around each number.
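For readers who want to check the arithmetic, here is a minimal sketch of the summary statistic described above: a win fraction with draws counted as half, plus a textbook Wilson score interval. The function names and the example tally are hypothetical, and this is not the code that produced the numbers in this post, so small differences from the reported intervals are possible.

```python
import math

def first_player_score(wins: int, draws: int, games: int) -> float:
    """Fraction of games won by the first player, counting a draw as half a win."""
    if games <= 0:
        raise ValueError("games must be positive")
    return (wins + 0.5 * draws) / games

def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion p_hat observed over n trials."""
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p_hat + z2 / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z2 / (4.0 * n * n)) / denom
    return center - half, center + half

# Hypothetical tally for one board size: 45 wins, 5 draws, 46 losses in 96 games.
p = first_player_score(wins=45, draws=5, games=96)
low, high = wilson_interval(p, 96)
print(f"{p:.1%} (95% CI: {low:.1%} to {high:.1%})")
```

Counting a draw as half a win means the outcomes are not strictly yes/no trials, so treating the pooled fraction as a binomial proportion is an approximation.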

2

The headline numbers

  • 5×5: fair. First player wins 49.5% (95% CI: 39.7% – 59.3%). Effectively a coin flip. Average score differential: −2.4 (in favor of the second player), well within noise.
  • 6×6: big first-player lift. First player wins 66.5% (95% CI: 57.3% – 75.8%). That's 16.5 percentage points above fair, and the CI's lower bound sits well clear of 50%. Average score differential: +16.2 for the first mover.
  • 7×7: roughly fair, slight second-player tilt. First player wins 47.0% (95% CI: 37.2% – 56.8%). The CI still crosses 50%, so we can't reject fair play, but the point estimate sits 3.0 percentage points below fair and the average score gap is −26.8, in the second player's favor.
  • 8×8: first-player advantage again. First player wins 63.0% (95% CI: 53.6% – 72.5%). That's 13.0 percentage points above fair. The average score differential is a wild +2,854; more on that in a moment.

3

A hypothesis about the even-board pattern

The most interesting thing in the table isn't the average advantage — it's that the advantage only shows up on the even-sized boards. 6×6 and 8×8 both give the first player a double-digit lift. 5×5 and 7×7 both sit within a coin flip of fair. Our initial story: odd boards have a unique center cell, but thanks to multiplicative scoring, that cell isn't the prize it would be in Go or Reversi. Even boards have no unique center — they have a 2×2 central pocket, and the first mover gets first pick of it. The mirror rule locks the second player out of the symmetric reply, so the opening asymmetry compounds rather than cancels.

That was a story, not a theorem, so we tested it.

4

The forced-opening experiment

On 6×6 and 8×8 only, we pre-placed the first player's opening stone in one of seven buckets — a corner, a mid-edge cell, a “near-center” cell diagonally adjacent to the central pocket, and each of the four central-pocket cells individually — then handed the game to two copies of elite_cnn_soft_t0_3 (the strongest bot with a touch of softmax noise for variance). 100 seeded games per bucket, plus a baseline where the bot was free to choose its own opening. If the central-pocket hypothesis were right, the four pocket openings should score at least as well as the baseline — better, if the pocket really is where the advantage lives.
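To make the seven buckets concrete, here is one way they could be mapped to coordinates on an even board. The (row, col) convention, the bucket names, and the choice of which corner, edge cell, and near-center cell represents each bucket are all assumptions for illustration; the experiment's actual cells may differ.

```python
def opening_buckets(n: int) -> dict[str, tuple[int, int]]:
    """Map each of seven bucket names to one (row, col) opening on an even n×n board."""
    if n % 2 != 0 or n < 6:
        raise ValueError("sketch assumes an even board of size 6 or more")
    lo, hi = n // 2 - 1, n // 2  # the two rows/cols that form the 2×2 central pocket
    return {
        "corner": (0, 0),                 # assumed: top-left corner
        "mid_edge": (0, lo),              # assumed: top edge, just left of the midline
        "near_center": (lo - 1, lo - 1),  # diagonally adjacent to the pocket's top-left cell
        "pocket_nw": (lo, lo),
        "pocket_ne": (lo, hi),
        "pocket_sw": (hi, lo),
        "pocket_se": (hi, hi),
    }

for size in (6, 8):
    print(size, opening_buckets(size))
```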

5

We were wrong

The central pocket isn't the source of the advantage. If anything, it's the worst place to open. Here are the results pooled across the four pocket cells (400 games per board), alongside the baseline and the corner and edge buckets:

  • 6×6 baseline (bot chooses): P0, the first player, wins 63.5% (CI 54.2% – 72.7%).
  • 6×6 forced into pocket: P0 wins 47.8% (CI 43.0% – 52.6%). The advantage collapses to a coin flip.
  • 6×6 forced into corner or edge: P0 wins 69.1% (CI 62.8% – 75.5%), which matches or exceeds the baseline.
  • 8×8 baseline (bot chooses): P0 wins 70.2% (CI 61.5% – 79.0%).
  • 8×8 forced into pocket: P0 wins 43.7% (CI 38.8% – 48.5%). The whole CI sits below 50%: forcing the central pocket hands a statistically significant advantage to the second player.
  • 8×8 forced into corner or edge: P0 wins 56.1% (CI 49.3% – 62.9%), better than the pocket but worse than the baseline.

Within each board, the four pocket cells agreed with each other — all four 6×6 pocket cells landed between 43.7% and 54.3%, all four 8×8 pocket cells between 38.4% and 51.9%. No single “special square” is driving the result; the central pocket as a region is just worse.

6

What the data actually says

The bot's strongest openings aren't in the center. When it chooses freely (the baseline condition), it lands in the 63–70% band. Forced into a corner or edge, it stays in that band on 6×6 (69.1%) and slips to 56.1% on 8×8. Forced into the middle, its score differential goes negative. On 8×8, the average score gap is +1,796 in P0's favor at baseline and −1,207 when the opening is a near-center cell: a swing of about 3,000 points from a single forced move.

This matches the rest of Jelmata's folklore. Under multiplicative scoring, playing a cell surrounded by your own stones merges the neighboring groups into one, which tends to shrink your product. Corner and edge openings let you anchor multiple independent groups; central openings tie you to a single blob that grows by addition while your opponent's score grows by multiplication.

So the first-player edge on 6×6 and 8×8 isn't about who gets the middle. It's about who gets to choose first — and the bot's preferred choice is near the perimeter, not in the pocket. Why the effect is large on even boards and negligible on odd boards is still an open question; this experiment narrowed the search, it didn't close it.

7

Why the 8×8 score gap is +2,854

On 5×5, 6×6, and 7×7, the average score differential stays below 30 points in magnitude. On 8×8, it's +2,854 in the first player's favor. That isn't a typo and it isn't outlier contamination; it's multiplicative scoring doing exactly what it's built to do. Scores are products of connected-component sizes, and on a 64-cell board with cleanly split groups, a single game can land above 10,000 points. When one side loses the opening, their scoreboard collapses fast.
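A small sketch shows why products of group sizes behave this way. It assumes stones connect through the four orthogonal neighbors and that a player with no stones scores 0; both are assumptions for illustration, as are the board encoding and function names, and the real rules may differ.

```python
import math
from collections import deque

def component_sizes(board: list[str], player: str) -> list[int]:
    """Sizes of the player's connected groups, using 4-neighbour adjacency (an assumption)."""
    rows, cols = len(board), len(board[0])
    seen = set()
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if board[r][c] != player or (r, c) in seen:
                continue
            # Breadth-first flood fill from an unvisited stone.
            queue = deque([(r, c)])
            seen.add((r, c))
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and board[ny][nx] == player and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes

def score(board: list[str], player: str) -> int:
    """Product of the player's group sizes; 0 if the player has no stones (assumed)."""
    sizes = component_sizes(board, player)
    return math.prod(sizes) if sizes else 0

split = ["XXX.", "....", "XXX."]   # two separate groups of three
merged = ["XXX.", "X...", "XXX."]  # one extra stone joins them into a group of seven
print(score(split, "X"), score(merged, "X"))
```

In this toy position the extra stone lowers the score, because one group of seven is worth less than two groups of three multiplied together. The same arithmetic explains the large 8×8 numbers: ten separate three-stone groups would multiply to 3^10 = 59,049.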

This is also the reason we report win rate as the headline rather than score differential. Mean score differentials on 8×8 are dominated by a handful of blowouts, so a Wilson interval on win proportion is the more honest summary of “how unfair is this?”.

8

What we do about it

Two of the four board sizes — 6×6 and 8×8 — show a clear first-player edge in this bot study, which is enough for us to avoid single-game rated matches. 5×5 looked fair in this sample; 7×7 remains uncertain (the point estimate leans slightly toward the second player, but the CI still crosses 50%). Instead of a single game, every rated contest is a two-game mini-match with swapped first player. Bot A opens the first game, bot B opens the second, and the winner is decided by aggregate score differential across the pair. If one side really is stronger, they'll outscore their opponent in both halves — and if the first-move edge is doing all the work, it cancels out.
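The pairing rule can be sketched in a few lines. The `GameResult` record and `mini_match_winner` function are hypothetical names, and returning `None` for a tied aggregate is an assumption, since tie handling isn't covered here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class GameResult:
    first: str         # side that moved first
    second: str        # side that moved second
    first_score: int
    second_score: int

def mini_match_winner(game1: GameResult, game2: GameResult) -> Optional[str]:
    """Return the side with the higher aggregate score differential, or None on a tie."""
    if (game1.first, game1.second) != (game2.second, game2.first):
        raise ValueError("the second game must swap who moves first")
    a, b = game1.first, game1.second
    # Differential from a's point of view: a moved first in game 1, second in game 2.
    diff_a = (game1.first_score - game1.second_score) + (game2.second_score - game2.first_score)
    if diff_a > 0:
        return a
    if diff_a < 0:
        return b
    return None  # assumed: a tied aggregate has no winner

# Hypothetical scores: A wins its opening game by 30, B wins its opening game by 5.
g1 = GameResult(first="A", second="B", first_score=120, second_score=90)
g2 = GameResult(first="B", second="A", first_score=100, second_score=95)
print(mini_match_winner(g1, g2))
```

Because each side opens once, any fixed first-move bonus is added to both sides' totals and drops out of the aggregate.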

9

Appendix: the per-matchup view

The headline numbers come from pooling eight different matchups. Inside any single matchup, the story is much noisier. Most of our bots are greedy (same board in, same move out), so a 12-game cell often snaps to a round value such as 0%, 50%, or 100%. First-player win rate per cell:

  • hard_pure vs hard_pure. 5×5: 0% · 6×6: 75% · 7×7: 0% · 8×8: 100%
  • medium_pure vs medium_pure. 5×5: 46% · 6×6: 75% · 7×7: 50% · 8×8: 75%
  • elite_linear_greedy vs elite_linear_greedy. 5×5: 100% · 6×6: 71% · 7×7: 92% · 8×8: 42%
  • hard_pure vs medium_pure. 5×5: 50% · 6×6: 75% · 7×7: 50% · 8×8: 50%
  • hard_pure vs easy_pure. 5×5: 50% · 6×6: 50% · 7×7: 50% · 8×8: 50%
  • elite_linear_greedy vs hard_pure. 5×5: 100% · 6×6: 42% · 7×7: 50% · 8×8: 42%
  • hard_soft_t0_3 vs hard_soft_t0_3. 5×5: 50% · 6×6: 50% · 7×7: 17% · 8×8: 50%
  • medium_hard_30_70 vs medium_hard_30_70. 5×5: 0% · 6×6: 100% · 7×7: 67% · 8×8: 100%

The story lives in the across-matchup variance, not within any single matchup. Pooling 96 games across these eight heterogeneous bot pairings averages the determinism out and is what gives us the Wilson intervals in the headline table.

10

Caveats

Ninety-six games per board size is enough to separate 66.5% from 50%; it isn't enough to pin down whether 7×7 is fair or mildly second-player-favored. Ten times the volume would shrink the 95% intervals from roughly ±10 to roughly ±3 percentage points.
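A back-of-the-envelope check on how interval width scales with sample size, using the plain normal approximation at p = 0.5 rather than the Wilson interval, and treating games as independent. Both are simplifying assumptions, and the function name is hypothetical.

```python
import math

def worst_case_half_width(n: int, z: float = 1.96) -> float:
    """Normal-approximation 95% half-width at p = 0.5, where the interval is widest."""
    return z * math.sqrt(0.25 / n)

for n in (96, 960):
    print(n, f"±{worst_case_half_width(n):.1%}")
```

The half-width shrinks with the square root of the game count, so ten times the games narrows the interval by a factor of about 3.2, not 10.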

Most of the bots in the pool are greedy — same board in, same move out — so individual matchup cells often come back 0% or 100%. Pooling across eight different matchups and mixing in the stochastic softmax recipes averages that deterministic lumpiness out, but it means the interesting variance lives in the cross-matchup pooling, not inside any single cell.

Finally: bots aren't humans. The mirror rule matters less to a greedy linear model than it does to a person who notices symmetries and wants to exploit them. The first-player advantage for human play on 6×6 and 8×8 is probably real, but the magnitude almost certainly differs from what the bots produced.