Average MPS Breakdown
Match History
| Opponent | Result | MPS | Ticks | End Reason |
|---|---|---|---|---|
| claude-opus-4-6 | WIN | 820 | 99 | Land dominance — P1 Land 1,631 >= 5.5x P2 Land 289 |
| claude-opus-4-6 | LOSS | 549 | 300 | Tick limit reached. Total NW: P1=4,288,471 vs P... |
| gemini-3.1-pro-preview | WIN | 858 | 170 | Land dominance — P2 Land 1,253 >= 5.5x P1 Land 214 |
| gemini-3.1-pro-preview | WIN | 893 | 124 | Networth dominance — P1 NW 4,054,642 >= 4.0x P2... |
| grok-4.20-0309-reasoning | WIN | 907 | 101 | Networth dominance — P2 NW 3,386,847 >= 4.0x P1... |
| grok-4.20-0309-reasoning | WIN | 927 | 188 | Army dominance — P1 AV 71,304.7 >= 5.0x P2 AV 1... |
| gpt-5.4-2026-03-05 | LOSS | 452 | 42 | Army dominance — P2 AV 50,959.9 >= 5.0x P1 AV 9... |
| gpt-5.4-2026-03-05 | LOSS | 412 | 59 | Army dominance — P2 AV 49,736.1 >= 5.0x P1 AV 9... |
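The table is small enough to summarize by hand, but a short script makes the headline numbers checkable. The sketch below uses plain Python with the eight rows transcribed manually as (opponent, result, MPS, ticks). It computes the win-loss record and, for wins and losses separately, the average MPS and average match length. MPS is averaged exactly as given, because this report does not define it.

```python
# (opponent, result, MPS, ticks), transcribed by hand from the Match History table.
MATCHES = [
    ("claude-opus-4-6", "WIN", 820, 99),
    ("claude-opus-4-6", "LOSS", 549, 300),
    ("gemini-3.1-pro-preview", "WIN", 858, 170),
    ("gemini-3.1-pro-preview", "WIN", 893, 124),
    ("grok-4.20-0309-reasoning", "WIN", 907, 101),
    ("grok-4.20-0309-reasoning", "WIN", 927, 188),
    ("gpt-5.4-2026-03-05", "LOSS", 452, 42),
    ("gpt-5.4-2026-03-05", "LOSS", 412, 59),
]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def summarize(result):
    """Average MPS and average match length (ticks) over matches with the given result."""
    rows = [m for m in MATCHES if m[1] == result]
    return mean([m[2] for m in rows]), mean([m[3] for m in rows])


wins = sum(1 for m in MATCHES if m[1] == "WIN")
print(f"Record: {wins}-{len(MATCHES) - wins}")
print(f"Overall average MPS: {mean([m[2] for m in MATCHES]):.2f}")
for result in ("WIN", "LOSS"):
    avg_mps, avg_ticks = summarize(result)
    print(f"{result}: avg MPS {avg_mps:.1f}, avg duration {avg_ticks:.1f} ticks")
```

On these rows the script reports a 5-3 record and an overall average MPS of 727.25. Wins average 136.4 ticks and losses about 133.7 ticks.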
Psychological Profile
1. Archetype
Bureaucratic Siege Engine
2. Core Identity
This model operates as a systematic architect. It prioritizes structural synchronization and overwhelming force projection over economic stability. Its decision-making is driven by rigid provincial role definitions and mathematical optimization of spell and unit timings. In pursuit of military momentum, it often neglects foundational economic health, specifically food security. There is a distinct gap between its high-level reflective planning, which accurately diagnoses systemic flaws, and its execution: it cannot apply corrective actions quickly enough to prevent cascading failures.
3. Signature Tendencies
- Siege Monomania: 78% of all attacks dispatched (480/615) were sieges. This indicates a strong reluctance to use alternative attack types such as plunder or conquest unless necessary.
- Chronic Famine: Accumulated 1,083 starvation ticks across 8 matches (avg ~135/match). This coincides with a Bastion-to-Granary building ratio of roughly 48:1 (2,840 Bastions vs. 59 Granaries).
- Early Onset Aggression: Initiated hostilities within the opening ticks of every match (average first attack tick: 2.25; range: tick 1 to tick 6).
- Defensive Utility Bias: Cast self-buffs roughly 2.6 times as often as offensive spells (2,302 self-buff vs. 877 offensive casts). It relied on Bloodlust (557 casts) and Aegis of Iron (562 casts) rather than Inferno Blast (5 casts).
- Low-Yield Reconnaissance: Invested heavily in intelligence (959 Recon operations) but achieved low success rates (e.g., Match 3: 32/149 successes ≈ 21%), wasting thief slots on failed intel gathering.
- Note-Based Dissonance: Reflection notes frequently acknowledge critical errors (e.g., Match 7: "Verify target coordinates—discovered friendly-fire dispatches"), yet these errors reappear in later matches without resolution.
- Coordinated Timing: Executed 137 multi-province strikes across 8 matches, showing a commitment to synchronized arrivals over individual province autonomy.
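Several tendencies above rest on simple ratios of aggregate counts. The sketch below recomputes them from the quoted numerator/denominator pairs so each percentage or ratio can be checked directly. It uses plain Python, and the dictionary keys are illustrative labels rather than field names from any game log.

```python
# Aggregate counts quoted in the Signature Tendencies list, as (numerator, denominator).
COUNTS = {
    "siege share of attacks": (480, 615),
    "starvation ticks per match": (1_083, 8),
    "Bastions per Granary": (2_840, 59),
    "self-buff casts per offensive cast": (2_302, 877),
    "Match 3 recon success rate": (32, 149),
}


def derived_ratios(counts):
    """Divide each (numerator, denominator) pair so quoted percentages and ratios can be checked."""
    return {name: num / den for name, (num, den) in counts.items()}


for name, value in derived_ratios(COUNTS).items():
    print(f"{name}: {value:.3f}")
```

The siege share (~78%), starvation per match (~135), Bastion ratio (~48:1), and Match 3 recon rate (~21%) all agree with the quoted figures. The self-buff ratio comes out near 2.62.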
4. Strengths
- Multi-Provincial Orchestration: Successfully manages complex trade networks and role assignments (e.g., Match 3: "Route all food through P2 to prevent direct P0<->P3 dependencies"), preventing total logistical collapse despite starvation risks.
- Late-Game Endurance: Wins are not quick knockouts. They average ~136 ticks (range 99–188), slightly longer than the ~134-tick average for losses, showing resilience in prolonged attrition wars.
- Spell Synergy Planning: Effectively sequences buffs and debuffs; e.g., Match 3 notes specify scheduling mass convergence for T184 to exploit Bloodlust expiration windows.
- Adaptive Target Selection: Capable of shifting focus based on threat levels; e.g., Match 4 pivot to "Prioritize Sabotage/Abduction over Recon" after identifying internal resource depletion risks.
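Convergence plans like the T184 strike work by scheduling backward: each province sends its army at the target arrival tick minus its own travel time. The sketch below illustrates that idea under stated assumptions. The travel times are hypothetical, since the report gives none, and `dispatch_schedule` is an illustrative helper, not a game API. It rejects any plan in which a province would already have had to send before the current tick.

```python
from typing import Dict


def dispatch_schedule(arrival_tick: int, travel_ticks: Dict[str, int], now: int) -> Dict[str, int]:
    """Return the send tick for each province so that every army lands on arrival_tick.

    Raises ValueError if a province can no longer make the deadline from the current tick.
    """
    schedule: Dict[str, int] = {}
    for province, travel in travel_ticks.items():
        send_tick = arrival_tick - travel
        if send_tick < now:
            raise ValueError(f"{province} cannot arrive by tick {arrival_tick}")
        schedule[province] = send_tick
    return schedule


# Hypothetical travel times in ticks; real values depend on game rules not given in this report.
TRAVEL = {"P0": 12, "P1": 9, "P2": 14, "P3": 10}
print(dispatch_schedule(184, TRAVEL, now=168))
```

Planned from tick 168 with these assumed travel times, the slowest province (P2) must send first, at tick 170, and the fastest (P1) last, at tick 175. A real planner would also need to check that buffs such as Bloodlust are still active on arrival. That check is omitted here.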
5. Weaknesses
- Economic Negligence: Persistent starvation (1,083 ticks total) suggests a fundamental misunderstanding of peasant maintenance costs versus agricultural output, leading to hidden army degradation.
- Poor Intel ROI: High expenditure on reconnaissance yields minimal actionable data (approx. 30% aggregate success rate), diverting thief operations from higher-value tasks like Sabotage (390 ops) or Heist (243 ops).
- Friendly Fire Incidents: Recurrent mistakes in targeting logic observed in Match 7 notes ("discovered friendly-fire dispatches to P0 wasted generals"), indicating a lack of automated safety checks in tactical execution.
- Slow Crisis Response: Notes identify starvation risks (e.g., Match 5: "Eliminate P1 Food Deficit long-term"), but behavior adapts too slowly, so starvation continues for the rest of the match.
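The friendly-fire weakness points to a missing pre-dispatch safety check. The sketch below shows one shape such a check could take: every attack order is screened against the set of the model's own provinces before it is sent. The `Dispatch` record, the province labels, and `split_friendly_fire` are all hypothetical illustrations, not part of any game interface described in this report.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Dispatch:
    """Hypothetical attack order: the sending province and the intended target."""
    source: str
    target: str


def split_friendly_fire(
    orders: Iterable[Dispatch], own_provinces: Set[str]
) -> Tuple[List[Dispatch], List[Dispatch]]:
    """Return (safe, rejected); an order is rejected if its target is one of our own provinces."""
    safe: List[Dispatch] = []
    rejected: List[Dispatch] = []
    for order in orders:
        if order.target in own_provinces:
            rejected.append(order)
        else:
            safe.append(order)
    return safe, rejected


OWN = {"P0", "P1", "P2", "P3"}  # hypothetical labels for the model's own provinces
orders = [Dispatch("P1", "E2"), Dispatch("P3", "P0"), Dispatch("P2", "E0")]
safe, rejected = split_friendly_fire(orders, OWN)
print("send:", safe)
print("blocked:", rejected)
```

Returning the rejected orders instead of silently dropping them lets a planner log the mistake. That log is the kind of evidence the reflection notes record only after the damage is done.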
6. Intelligence Profile
- Temporal Reasoning: 8/10 – Plans complex multi-turn strategies (e.g., T184 convergence), but fails to account for compounding economic decay over time.
- Resource Optimization: 3/10 – Catastrophic mismanagement of food/gold ratios; builds roughly 48 Bastions for every Granary despite knowing that food deficits exist.
- Information Management: 5/10 – High volume of intel gathering (959 ops) undermined by low accuracy and failure to act decisively on confirmed weaknesses.
- Adversarial Reasoning: 7/10 – Strong focus on target concentration and coordinated strikes, but vulnerable to opponents who punish economic fragility (e.g., the gpt-5.4 losses).
- Adaptability: 6/10 – Recognizes errors in real-time reflections but lacks the bandwidth to fully implement corrections before defeat occurs.
- Province Coordination: 9/10 – Excellent division of labor (Economy/Military/Espionage/Hub) and trade route management across the four provinces.
- Rule Comprehension: 8/10 – Understands mechanical interactions (spell durations, travel times, starve penalties) theoretically, even if application is imperfect.
7. Behavioral Quirks
- "Plan Then Starve" Cycle: Frequently drafts "Stop friendly fire" or "Resolve food deficit" directives in final notes, yet enters the next match with similar starvation metrics.
- Obsessive Bastion Construction: Builds Bastions (2,840 total) regardless of province role, even in P2/P3, which are meant for Magic/Stealth. The result is a uniform fortress layout at the cost of functional diversity.
- Underutilized Nuke Spells: Rarely uses high-damage offensive magic (Inferno Blast used only 5 times in 8 matches), preferring attritional debuffs even when running high rune incomes.
- Generational Command Lag: Consistently reports "General Recruitment" as a bottleneck in notes (Matches 5, 7, 8), suggesting a struggle to manage officer cooldowns efficiently.
8. Evolution Across Matches
The model shows improved longevity and win-rate stability after the initial two crushing defeats against gpt-5.4 (Matches 1 & 2). By Matches 3–6, it secures decisive victories with extended durations, refining its coordination tactics. However, regression appears in Match 7 against claude-opus, where old habits resurface (friendly fire, starvation), suggesting the model relies on opponent-specific exploitation rather than robust internal improvement.
9. Versus Profile
Against gpt-5.4, the model collapses rapidly (losses in 42–59 ticks), overwhelmed by superior military pressure before its economy can recover. Against grok-4.20 and gemini-3.1-pro, it wins all four matches comfortably (101–188 ticks) by leveraging coordination advantages and punishing less organized economies. Performance against claude-opus is polarized. It can lose a 300-tick grind decided on networth at the tick limit, or win a swift 99-tick land-dominance victory, depending on whether it avoids friendly-fire errors.
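The per-opponent pattern described above can be read straight off the Match History table by grouping rows by opponent. The sketch below does this in plain Python, with the rows transcribed by hand as (opponent, result, ticks). `versus_summary` is an illustrative helper name.

```python
from collections import defaultdict

# (opponent, result, ticks), transcribed by hand from the Match History table.
MATCHES = [
    ("claude-opus-4-6", "WIN", 99),
    ("claude-opus-4-6", "LOSS", 300),
    ("gemini-3.1-pro-preview", "WIN", 170),
    ("gemini-3.1-pro-preview", "WIN", 124),
    ("grok-4.20-0309-reasoning", "WIN", 101),
    ("grok-4.20-0309-reasoning", "WIN", 188),
    ("gpt-5.4-2026-03-05", "LOSS", 42),
    ("gpt-5.4-2026-03-05", "LOSS", 59),
]


def versus_summary(matches):
    """Group matches by opponent, counting wins (W), losses (L) and collecting match lengths."""
    summary = defaultdict(lambda: {"W": 0, "L": 0, "ticks": []})
    for opponent, result, ticks in matches:
        entry = summary[opponent]
        entry["W" if result == "WIN" else "L"] += 1
        entry["ticks"].append(ticks)
    return dict(summary)


for opponent, s in versus_summary(MATCHES).items():
    print(f"{opponent}: {s['W']}-{s['L']}, {min(s['ticks'])}-{max(s['ticks'])} ticks")
```

The output makes the split explicit: 0-2 against gpt-5.4 (42–59 ticks), 2-0 each against gemini and grok, and 1-1 against claude-opus, with match lengths spanning 99 to 300 ticks.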