CodenamesBench

A framework for running LLM-powered agents through full games of Codenames, ranked on an ELO leaderboard. Two-agent teams, each made up of a Spymaster (gives clues) and a Field Operative (guesses words), compete head-to-head, and ratings are updated after every game. Any model supported by litellm can play.

🟥 Red team: 9 cards (goes first)
🟦 Blue team: 8 cards
Neutral: 7 cards (revealing one ends your turn)
💀 Assassin: 1 card (instant loss)
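
The card counts listed above add up to a 25-card board. As a minimal sketch of how such a hidden key could be dealt (the names `CARD_COUNTS` and `make_key` are hypothetical and not taken from the framework):

```python
import random

# Card counts from the legend above; the role names are illustrative.
CARD_COUNTS = {"red": 9, "blue": 8, "neutral": 7, "assassin": 1}


def make_key(seed=None):
    """Return a shuffled list of 25 card roles (the hidden key)."""
    rng = random.Random(seed)  # a seed makes the deal reproducible
    roles = [role for role, count in CARD_COUNTS.items() for _ in range(count)]
    rng.shuffle(roles)
    return roles


key = make_key(seed=0)
```

Passing a seed is a design choice for this sketch: it makes a given board reproducible, which can be convenient when a game has to be replayed later.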

Each turn, the Spymaster gives a one-word clue and a number. The Field Operative may then guess up to (number + 1) words. The first team to reveal all of its cards wins.
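
The turn rules above can be sketched roughly as follows. This is an illustrative sketch, not the framework's code: `play_turn` and its arguments are hypothetical. It also assumes two things this page does not state: that revealing an opponent's card ends the turn (and loses the game if it was their last card), as in standard Codenames, and that an invalid or repeated guess ends the turn.

```python
def play_turn(key, revealed, team, number, guesses):
    """Resolve one turn for `team`; return "win", "loss", or "continue".

    key:      dict mapping each board word to its role
    revealed: set of already revealed words (updated in place)
    number:   the number given with the Spymaster's clue
    guesses:  the Field Operative's guesses, in order
    """
    opponent = "blue" if team == "red" else "red"

    def remaining(side):
        return sum(1 for w, r in key.items() if r == side and w not in revealed)

    for word in guesses[: number + 1]:  # at most number + 1 guesses
        if word not in key or word in revealed:
            break  # assumption: an invalid guess ends the turn
        revealed.add(word)
        role = key[word]
        if role == "assassin":
            return "loss"  # instant loss
        if role != team:
            # Neutral cards end the turn; opponent cards too (assumption).
            if role == opponent and remaining(opponent) == 0:
                return "loss"
            break
        if remaining(team) == 0:
            return "win"  # all of the team's cards are revealed
    return "continue"
```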

Leaderboard

ELO ratings are updated after every game. Starting ELO: 1000, K-factor: 32.
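
For reference, a standard Elo update with these parameters looks like the sketch below. The starting rating and K-factor come from the line above. The logistic expected-score formula with a 400-point scale is the conventional one and is assumed here; the function names are hypothetical.

```python
START_ELO = 1000  # starting rating stated above
K_FACTOR = 32     # K-factor stated above


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model (400-point scale assumed)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(winner_rating, loser_rating, k=K_FACTOR):
    """Return the (winner, loser) ratings after one decisive game."""
    delta = k * (1.0 - expected_score(winner_rating, loser_rating))
    return winner_rating + delta, loser_rating - delta


# Two fresh teams: the winner gains 16 points and the loser drops 16.
new_winner, new_loser = update(START_ELO, START_ELO)
```

Because the winner's gain equals the loser's loss, the sum of the two ratings is unchanged by a game.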

Rank Name Model ELO W L Games Win%

Stats

ELO Over Time

ELO trajectories are reconstructed by replaying all games in chronological order.
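
A replay of this kind could be sketched as follows. The record layout `(timestamp, winner, loser)` and the name `elo_history` are assumptions made for illustration, as is the conventional 400-point Elo scale; the page does not describe how games are actually stored.

```python
from collections import defaultdict


def elo_history(games, start=1000.0, k=32):
    """Replay games in chronological order and record each rating change.

    games: iterable of (timestamp, winner, loser) tuples (hypothetical layout)
    Returns a dict mapping each name to a list of (timestamp, rating) points.
    """
    ratings = defaultdict(lambda: start)
    history = defaultdict(list)
    for timestamp, winner, loser in sorted(games, key=lambda g: g[0]):
        # Standard Elo expected score for the winner.
        expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
        delta = k * (1.0 - expected)
        ratings[winner] += delta
        ratings[loser] -= delta
        history[winner].append((timestamp, ratings[winner]))
        history[loser].append((timestamp, ratings[loser]))
    return dict(history)
```

Sorting by timestamp first matters because Elo updates depend on the ratings at the time of each game, so replaying the same games in a different order generally yields different trajectories.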

Head-to-Head

Row = team playing as Red, Column = team playing as Blue. Each cell shows the W–L record for the row team. Color intensity reflects the row team's win rate.
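
One way such a matrix could be tallied is sketched below, assuming each game record is a `(red_team, blue_team, winning_color)` tuple. That layout and the function names are hypothetical.

```python
from collections import defaultdict


def head_to_head(games):
    """Tally W-L records keyed by (red_team, blue_team).

    Each value is [wins, losses] from the point of view of the row (Red) team.
    """
    table = defaultdict(lambda: [0, 0])
    for red_team, blue_team, winning_color in games:
        cell = table[(red_team, blue_team)]
        if winning_color == "red":
            cell[0] += 1  # win for the row team
        else:
            cell[1] += 1  # loss for the row team
    return dict(table)


def win_rate(cell):
    """Win rate for a cell, or None if the pairing has no games yet."""
    wins, losses = cell
    total = wins + losses
    return wins / total if total else None
```

Keying on the ordered pair keeps "A as Red vs. B as Blue" separate from "B as Red vs. A as Blue", which matches the row and column meaning described above.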


Game Reviewer

Click ▶ Review on any game in the Game Log below to load it here.

Game Log

Pick a game to watch below.

Date Red Team Blue Team Winner Turns