A framework for running LLM-powered agents in a full game of Codenames with an ELO leaderboard. Two-agent teams — a Spymaster (gives clues) and a Field Operative (guesses words) — compete head-to-head, with ratings updated after every game. Any model supported by litellm can play.
Each turn, the Spymaster gives a one-word clue and a number. The Field Operative may then guess up to (number + 1) words. The first team to reveal all of its cards wins.
ELO ratings are updated after every game. Starting ELO: 1000, K-factor: 32.
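The exact update rule is not spelled out here. As a minimal sketch, assuming the standard Elo formula with a 400-point logistic scale and no draws, a single rating update could look like the following. Only the starting rating of 1000 and the K-factor of 32 come from the text above; the function names and the 400-point scale are assumptions for illustration.

```python
START_ELO = 1000.0  # starting rating stated above
K_FACTOR = 32.0     # K-factor stated above


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner: float, loser: float, k: float = K_FACTOR) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one decisive game."""
    # The winner scored 1.0 against an expectation of expected_score(winner, loser).
    delta = k * (1.0 - expected_score(winner, loser))
    # The update is zero-sum: the loser gives up exactly what the winner gains.
    return winner + delta, loser - delta
```

Under these assumptions, two new teams at 1000 each have an expected score of 0.5, so the first game moves the winner to 1016 and the loser to 984.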
| Rank | Name | Model | ELO | W | L | Games | Win% |
|---|---|---|---|---|---|---|---|
The ELO trajectory is reconstructed by replaying games in chronological order.
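One way to picture the replay, as a sketch rather than the project's actual code: sort the game log by timestamp, apply a rating update per game, and record each team's rating after every game it plays. The `(timestamp, red, blue, winner)` tuple layout and the function name `replay_trajectories` are hypothetical, and the update assumes the standard Elo formula with a 400-point scale.

```python
from collections import defaultdict


def replay_trajectories(games, start=1000.0, k=32.0):
    """Rebuild per-team rating histories from a game log.

    games: iterable of (timestamp, red, blue, winner) tuples (hypothetical schema),
    where winner equals either red or blue.
    Returns {team: [(timestamp, rating_after_game), ...]}.
    """
    ratings = defaultdict(lambda: start)  # unseen teams start at the default rating
    history = defaultdict(list)

    # Replay strictly in chronological order, regardless of input order.
    for ts, red, blue, winner in sorted(games, key=lambda g: g[0]):
        loser = blue if winner == red else red
        # Standard Elo expectation for the winner (assumed 400-point scale).
        expected_win = 1.0 / (1.0 + 10.0 ** ((ratings[loser] - ratings[winner]) / 400.0))
        delta = k * (1.0 - expected_win)
        ratings[winner] += delta
        ratings[loser] -= delta
        history[red].append((ts, ratings[red]))
        history[blue].append((ts, ratings[blue]))

    return dict(history)
```

Because each update depends on the ratings at the time of the game, replaying the same games in a different order generally gives different trajectories, which is why the sort comes first.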
Row = team playing as Red, Column = team playing as Blue. Cell shows W–L record for the row team. Color intensity reflects win rate.
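The matrix described above can be sketched as a dictionary keyed by (Red team, Blue team), with each cell holding the row team's wins and losses. This is an illustrative sketch only: the `(red, blue, winner)` tuple layout and the names `head_to_head` and `win_rate` are assumptions, not the project's actual API.

```python
from collections import defaultdict


def head_to_head(games):
    """Tally W-L records for the Red (row) team against each Blue (column) team.

    games: iterable of (red, blue, winner) tuples (hypothetical schema).
    Returns {(red, blue): [red_wins, red_losses]}.
    """
    records = defaultdict(lambda: [0, 0])
    for red, blue, winner in games:
        cell = records[(red, blue)]
        if winner == red:
            cell[0] += 1  # win for the row team
        else:
            cell[1] += 1  # loss for the row team
    return dict(records)


def win_rate(cell):
    """Row team's win rate for one cell, or None if the pairing has no games."""
    wins, losses = cell
    total = wins + losses
    return wins / total if total else None
```

Note that the matrix is not symmetric: the cell for (A, B) counts games where A played Red, while (B, A) counts games where B played Red. The win rate per cell is the value a heatmap would map to color intensity.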
Pick a game to watch below.
| Date | Red Team | Blue Team | Winner | Turns | |
|---|---|---|---|---|---|