Game Wiki

// The World of Dominion Rift //

Overview

Dominion Rift is a text-based strategy wargame designed from the ground up as an LLM benchmark. Two AI models each command a dominion of multiple provinces, making hundreds of sequential decisions across economy, military, magic, espionage, and science. The game engine resolves all actions simultaneously, so faster inference confers no advantage.

Unlike standard benchmarks that test knowledge recall or coding in isolation, Dominion Rift tests sustained strategic thinking under uncertainty over hundreds of ticks of adversarial gameplay. Every decision creates irreversible tradeoffs: growing your economy weakens your defenses, attacking leaves you exposed, and the fog of war means you're always gambling on incomplete information.

The result is a benchmark that can't be solved by memorization, pattern matching, or brute-force search. You either think strategically or you lose.

The Game

What Kind of Game?

Dominion Rift is a simultaneous-action strategy game with hidden information. Think of it as a wargame crossed with an economy simulator, where every resource you spend on one thing is a resource you can't spend on another - permanently.

Each player controls a dominion of provinces, each with its own specialization. Provinces have unique racial identities and leadership personalities drawn from a diverse pool, creating asymmetric matchups where no two games play the same way.

Six Pillars of Gameplay

Pillar | What It Tests
Economy | Resource generation, population management, long-term growth under pressure
Military | Force composition, attack timing, coordinated multi-front assaults
Magic | Buff/debuff decisions, offensive spell targeting, resource allocation for runes
Espionage | Intelligence gathering vs. direct sabotage, risk/reward under uncertainty
Science | Research prioritization across multiple branches with compounding returns
Diplomacy | Inter-province coordination, resource sharing, reinforcement logistics

Irreversible Decisions

The game is built around permanent consequences. Training a soldier permanently removes a peasant from your workforce - that's income you'll never get back. Destroying an enemy building removes it forever. Killing enemy peasants compounds over hundreds of ticks into devastating economic damage.

This is by design. We want to see whether a model understands the weight of irreversible choices, not just whether it can optimize a reversible system.

Fog of War

You can't see your opponent's full state. Intelligence must be actively gathered through espionage - but every spy you send on a scouting mission is a spy not sabotaging enemy infrastructure. The tension between knowing and doing is constant.

Races & Factions

Each province chooses from 8 distinct races and 7 leadership personalities. Races define a province's military units, economic bonuses, and strategic identity. Personalities modify playstyle - amplifying aggression, economy, magic, or espionage.

No two provinces in a dominion may share the same race-personality combination. This constraint forces diverse compositions and rewards models that understand synergy.
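
The uniqueness constraint can be checked mechanically. The sketch below is illustrative only: the function name `validate_dominion` and the representation of a province as a `(race, personality)` tuple are assumptions, not part of the actual engine. The race and personality names are the ones listed on this page.

```python
from collections import Counter

RACES = {
    "Ironborn", "Luminari", "Stonewarden", "Bloodfang",
    "Veilwalker", "Dreadwraith", "Thornfolk", "Skyrend",
}
PERSONALITIES = {
    "Magnate", "Warden", "Arcanist", "Shadowhand",
    "Channeler", "Warlord", "Strategist",
}


def validate_dominion(provinces):
    """Return a list of problems; an empty list means the composition is legal.

    `provinces` is a list of (race, personality) tuples (hypothetical format).
    """
    problems = []
    for race, personality in provinces:
        if race not in RACES:
            problems.append(f"unknown race: {race}")
        if personality not in PERSONALITIES:
            problems.append(f"unknown personality: {personality}")
    # The same race may appear twice as long as the personality differs.
    for (race, personality), count in Counter(provinces).items():
        if count > 1:
            problems.append(f"duplicate combination: {race} {personality} x{count}")
    return problems


legal = [("Bloodfang", "Warlord"), ("Bloodfang", "Strategist"), ("Veilwalker", "Shadowhand")]
illegal = [("Luminari", "Arcanist"), ("Luminari", "Arcanist")]
print(validate_dominion(legal))    # no problems reported
print(validate_dominion(illegal))  # one duplicate reported
```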

The Races

Race | Identity
Ironborn | Balanced all-rounders with a strong economic backbone
Luminari | Magic powerhouse with low upkeep and devastating spells
Stonewarden | Defensive juggernaut with rapid construction
Bloodfang | Pure offensive terror - enemies suffer massive casualties
Veilwalker | Masters of espionage and sabotage over brute force
Dreadwraith | Near-zero upkeep undead - slow growth but relentless pressure
Thornfolk | Resilient nature defenders, nearly impossible to starve
Skyrend | Lightning-fast strike forces that hit before you can react

Each race fields unique specialist and elite military units with different offensive and defensive profiles. The exact stats, costs, and build times are part of the game's internal ruleset.

Leadership Personalities

Seven personalities - Magnate, Warden, Arcanist, Shadowhand, Channeler, Warlord, and Strategist - each amplifies different aspects of a province's capabilities. A Bloodfang Warlord is a terrifying military threat. A Veilwalker Shadowhand is an espionage nightmare. The combinations are the strategy.

The Sovereign

One province in each dominion is designated as the Sovereign. The Sovereign's racial and personality bonuses radiate outward at reduced strength to all provinces in the dominion - making the choice of Sovereign a critical strategic decision that shapes the entire match.
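
One way to picture the Sovereign effect is as a scaled copy of the Sovereign's bonuses layered onto each province. Everything specific in this sketch is an assumption made for illustration: the real bonus values, the reduction factor (`aura_strength`), and whether bonuses stack additively are part of the withheld ruleset.

```python
def effective_bonuses(own, sovereign, aura_strength=0.5):
    """Combine a province's own bonuses with a weakened copy of the Sovereign's.

    `own` and `sovereign` map a stat name to a fractional bonus (hypothetical
    format). `aura_strength` is an assumed reduction factor, not a real value.
    """
    combined = dict(own)
    for stat, value in sovereign.items():
        # Additive stacking is an assumption; the engine may combine differently.
        combined[stat] = combined.get(stat, 0.0) + value * aura_strength
    return combined


province = {"offense": 0.10}
sovereign = {"offense": 0.20, "magic": 0.10}
print(effective_bonuses(province, sovereign))
```

Under these assumptions, a province with no magic bonus of its own still inherits a reduced one from the Sovereign, which is why the choice of Sovereign affects the whole dominion.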

How the Benchmark Works

Architecture

Each match pits two LLM agents against each other through a deterministic Python game engine. Both agents receive the game rules and must respond with structured action objects every tick. All actions resolve simultaneously using pre-tick state - no first-mover advantage.
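
Resolving against pre-tick state can be sketched as follows. This is a toy model, not the engine's code: the `raid` order type, the `gold` field, and the function name `resolve_tick` are hypothetical. The point is that every order reads from a frozen snapshot and writes to a separate copy, so the order in which players are processed cannot change the result.

```python
import copy


def resolve_tick(state, orders_by_player):
    """Apply all players' orders against a frozen snapshot of the pre-tick state."""
    snapshot = copy.deepcopy(state)    # what every order reads
    next_state = copy.deepcopy(state)  # what every order writes
    for player, orders in orders_by_player.items():
        for order in orders:
            if order["type"] == "raid":  # hypothetical toy order
                target = order["target"]
                # Loot depends only on the snapshot, never on earlier writes.
                loot = min(order["amount"], snapshot[target]["gold"])
                next_state[target]["gold"] -= loot
                next_state[player]["gold"] += loot
    return next_state


state = {"A": {"gold": 100}, "B": {"gold": 50}}
orders = {
    "A": [{"type": "raid", "target": "B", "amount": 30}],
    "B": [{"type": "raid", "target": "A", "amount": 120}],
}
print(resolve_tick(state, orders))
```

If the orders were applied one after another instead, B's raid would see the gold A had just looted and the outcome would depend on who was processed first.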

Game Flow

  • Kingdom Creation: Both LLMs build their dominion by selecting race-personality combinations for each province. This happens simultaneously and blindly - neither player knows what the other chose.
  • Reflection Phases: Periodically, the LLM receives its full tick log and must compress its experience into strategic notes. The detailed log is then discarded. This tests whether a model can identify patterns and write actionable plans from raw data.
  • Action Phases: Every tick, the LLM receives current state, causal context, intel gathered, and its own notes. It must issue orders for all provinces simultaneously.
  • Victory: A match ends immediately on overwhelming dominance or elimination; otherwise, the player with the highest networth at the tick limit wins.

The Memory System

This is where Dominion Rift separates itself from other benchmarks. The game implements a three-layer memory architecture:

  • Causal Ledger (engine-generated): Structured cause → effect chains that survive regardless of the LLM's own reflection quality. If your army got destroyed because you attacked without intel, the ledger remembers.
  • Self-Managed Memory: The model writes its own compressed history and strategic notes. This directly tests whether the model can learn from experience mid-game.
  • Importance Buffer: An engine-scored system that keeps the most strategically significant events visible regardless of age, while routine ticks cycle out.
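
The Importance Buffer idea can be sketched as a small data structure: a fixed number of top-scoring events stay pinned regardless of age, while everything else lives in a short recency window. The class name, the capacities, and the scores below are assumptions for illustration; how the engine actually scores events is not public.

```python
import heapq
from collections import deque


class ImportanceBuffer:
    """Keep the highest-scored events forever and the latest events briefly."""

    def __init__(self, max_important=3, max_recent=2):
        self.max_important = max_important
        self.recent = deque(maxlen=max_recent)  # routine events cycle out by age
        self._important = []                    # min-heap of (score, tick, text)

    def add(self, tick, text, score):
        event = (score, tick, text)
        self.recent.append(event)
        if len(self._important) < self.max_important:
            heapq.heappush(self._important, event)
        else:
            # Push the new event, then drop whichever pinned event scores lowest.
            heapq.heappushpop(self._important, event)

    def visible(self):
        """Return pinned and recent events, de-duplicated, in tick order."""
        merged = set(self._important) | set(self.recent)
        return sorted(merged, key=lambda event: event[1])


buf = ImportanceBuffer(max_important=1, max_recent=2)
buf.add(1, "army destroyed after attacking without intel", 0.9)
buf.add(2, "routine income", 0.1)
buf.add(3, "routine income", 0.1)
buf.add(4, "minor construction finished", 0.2)
for score, tick, text in buf.visible():
    print(tick, text)
```

In this run the tick 1 event remains visible at tick 4 even though the recency window only holds ticks 3 and 4.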

Compute Fairness

We aim to give each model roughly equivalent reasoning time per turn - approximately 60 seconds. A model that "wins" by thinking 3x longer than its opponent hasn't demonstrated superior strategy; it's demonstrated superior compute budget. We control for this so the leaderboard reflects reasoning quality, not reasoning quantity.

What We Measure

Dominion Rift doesn't just track wins and losses. Our Match Performance Score (MPS) evaluates how a model plays across four dimensions, which together sum to a score of 0–1000 per match:

  • Outcome (0–400): Did you win? How decisively?
  • Economy (0–200): Did you grow your dominion efficiently without starvation or bankruptcy?
  • Military (0–200): Were your attacks successful? Did you trade casualties favorably? Did you gain territory?
  • Strategic Depth (0–200): Did you use the full toolkit - spells, espionage, science, inter-province coordination - or did you turtle and pray?
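
The four components add up to the 0–1000 total. A minimal sketch of that bookkeeping follows; the class and field names are hypothetical, the caps are the ones listed above, and how each component is actually computed is described in the full methodology rather than here.

```python
from dataclasses import dataclass

# Component caps as listed on this page.
MPS_CAPS = {"outcome": 400, "economy": 200, "military": 200, "strategic_depth": 200}


@dataclass(frozen=True)
class MatchPerformance:
    outcome: float
    economy: float
    military: float
    strategic_depth: float

    def total(self):
        """Sum the components after checking each one against its cap."""
        total = 0.0
        for name, cap in MPS_CAPS.items():
            value = getattr(self, name)
            if not 0 <= value <= cap:
                raise ValueError(f"{name} must be between 0 and {cap}, got {value}")
            total += value
        return total


# A losing match (low outcome) can still earn a solid score on the other axes.
print(MatchPerformance(outcome=120, economy=170, military=150, strategic_depth=180).total())
```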

MPS is paired with standard Elo ratings (K=32, starting at 1500) to give both a "who wins" and a "how well they play" perspective. A model can lose a match but still score high on MPS if it played skillfully against a stronger opponent.
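
For reference, a standard Elo update with these parameters looks like the sketch below. Only K=32 and the starting rating of 1500 come from this page. The logistic expected-score formula with a 400-point scale is the conventional Elo definition and is assumed here, as are the function names.

```python
K_FACTOR = 32           # from this page
STARTING_RATING = 1500  # from this page


def expected_score(rating_a, rating_b):
    """Conventional Elo expectation for player A (400-point scale assumed)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a, rating_b, score_a, k=K_FACTOR):
    """Return both new ratings; score_a is 1.0 for a win, 0.5 draw, 0.0 loss."""
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Two new models meet and the first one wins.
print(update_elo(STARTING_RATING, STARTING_RATING, 1.0))  # (1516.0, 1484.0)
```

The update is zero-sum: whatever one model gains, the other loses, and beating a higher-rated opponent moves ratings more than beating a lower-rated one.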

Full scoring methodology is available on the About page.

Why This Benchmark

Most LLM benchmarks test isolated capabilities: coding, math, trivia, instruction-following. Dominion Rift tests something different - can a model sustain coherent strategy over hundreds of adversarial decisions with imperfect information?

The game can't be solved by:

  • Memorization: every match has different race compositions, random events, and an adaptive opponent
  • Pattern matching: the fog of war means you're always working with incomplete data
  • Brute-force search: the action space per tick is combinatorially enormous across multiple provinces
  • Single-turn reasoning: consequences of decisions compound over hundreds of ticks

What it does reward: causal reasoning, long-horizon planning, adaptation under uncertainty, resource allocation under scarcity, and the ability to learn from mistakes mid-game.

The full game rules - exact building costs, unit statistics, spell effects, and mechanical formulas - are intentionally withheld from this public wiki to preserve the integrity of future benchmark versions. Each model receives the complete ruleset during play.