LLM-powered AI agents that play Pokémon Showdown autonomously.
Watch large language models battle each other in real time, or challenge them yourself.
🎶 I wanna be the very best ... Like no one ever was 🎶
This project connects state-of-the-art LLMs (Llama, Gemma, Qwen, Mistral, and more) to a local Pokémon Showdown server via poke-env. The AI receives the full battle state each turn, reasons about type matchups, HP, field conditions, and revealed opponent information, then decides whether to attack or switch using tool calls.
A Gradio interface lets you:
- Human vs. AI - Play against any supported LLM yourself.
- AI vs. AI - Pick two models and watch them fight autonomously.
All LLM calls are routed through LiteLLM and traced via Langfuse for observability.
- Python 3.14+
- Node.js 18+ (for the Showdown server and client)
git clone https://github.com/MohamedMostafa259/pokemon-ai-agent.git
cd pokemon-ai-agent

Copy the example environment file:

cp .env.example .env

Edit .env and paste your API keys. See Getting API Keys below.
uv venv
.venv/scripts/activate
uv sync

On macOS or Linux, activate the environment with source .venv/bin/activate instead.

uv run python run.py

This single command will:
- Clone the Showdown server and client repos (first run only).
- Install their Node.js dependencies and build the client.
- Start the local Showdown server on port 8000.
- Launch the Gradio control panel on port 7860.
| What | URL |
|---|---|
| Gradio Control Panel | http://127.0.0.1:7860 |
| Showdown Client (spectate/play) | http://localhost:8000 |
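
If either page fails to load, a quick way to see whether the two services are listening is to try a TCP connection to each port. The snippet below is only a sketch using the standard library; `port_open` is a hypothetical helper and is not part of this project.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Ports taken from the table above.
SERVICES = {"Gradio Control Panel": 7860, "Showdown Client": 8000}
```

Calling `port_open("127.0.0.1", 7860)` after `run.py` has finished starting up should return `True` if the Gradio panel is reachable.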
"Human vs. AI" flow:
- Open the Showdown client and log in with any username.
- In the Gradio panel, select an AI model, enter your Showdown username and a bot name, then click "Send Challenge."
- Accept the challenge in the Showdown client.
"AI vs. AI" flow:
- Open the Showdown client and log in with any username.
- In the Gradio panel, select two AI models, enter their bot names, then click "Start AI vs. AI Battle."
- The two AIs will battle each other autonomously.
All models in this project use free-tier API quotas (as of April 2026, per free-llm-api-resources). Just create a free account on each provider's platform, generate an API key, and paste it into .env.
| Provider | Env Variable |
|---|---|
| OpenRouter | OPENROUTER_API_KEY |
| Cerebras | CEREBRAS_API_KEY |
| Google AI Studio | GEMINI_API_KEY |
| Groq | GROQ_API_KEY |
| Mistral | MISTRAL_API_KEY |
You only need keys for the providers whose models you want to use. If you only want to try Cerebras models, only CEREBRAS_API_KEY is required.
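
To see which providers are usable with your current keys, you could run a small check like the one below. It is a sketch, not project code: `configured_providers` is a hypothetical helper, and it assumes the variables from .env have already been loaded into the process environment.

```python
import os

# Provider -> environment variable, copied from the table above.
PROVIDER_KEYS = {
    "OpenRouter": "OPENROUTER_API_KEY",
    "Cerebras": "CEREBRAS_API_KEY",
    "Google AI Studio": "GEMINI_API_KEY",
    "Groq": "GROQ_API_KEY",
    "Mistral": "MISTRAL_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [
        name
        for name, var in PROVIDER_KEYS.items()
        if env.get(var, "").strip()  # blank or whitespace-only keys count as missing
    ]
```

For example, `configured_providers({"CEREBRAS_API_KEY": "abc"})` returns only the Cerebras entry, which matches the case described above.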
If you have a paid API key (e.g., Gemini Pro, OpenAI GPT-4, Anthropic Claude), you can add any model supported by LiteLLM to the MODEL_MAP dictionary in battle_runners.py:
MODEL_MAP = {
    # ... existing free models ...
    # Examples of paid models you could add:
    "OpenAI GPT-4o": "openai/gpt-4o",
    "Anthropic Claude Sonnet 4": "anthropic/claude-sonnet-4-20250514",
    "Gemini 2.5 Pro": "gemini/gemini-2.5-pro",
}

Then add the corresponding API key to your .env:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
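
Each model string starts with a provider prefix (`openai/`, `anthropic/`, `gemini/`), so a missing key can be caught before a battle starts. The sketch below illustrates the idea; `PREFIX_TO_KEY` and `missing_keys` are hypothetical names, and the mapping covers only the three prefix/key pairs mentioned in this README.

```python
# Provider prefix -> environment variable (only the pairs named in this README).
PREFIX_TO_KEY = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def missing_keys(model_map, env):
    """Map each display name to the env variable it needs but that is not set.

    Models whose prefix is not in PREFIX_TO_KEY are skipped, because this
    sketch does not know which key they require.
    """
    missing = {}
    for display_name, model_id in model_map.items():
        prefix = model_id.partition("/")[0]
        key = PREFIX_TO_KEY.get(prefix)
        if key is not None and not env.get(key, "").strip():
            missing[display_name] = key
    return missing
```

Passing `os.environ` as `env` checks the keys that are actually loaded in the running process.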
All LLM calls are traced via Langfuse. You can view the full decision-making trace for every turn: the battle state sent to the model, the model's tool call response, and any fallback actions.
Live dashboard (free tier): Langfuse Cloud Dashboard
Note: This project uses the Langfuse free (Hobby) tier. Historical trace data older than 30 days is not retained, so the dashboard may appear empty if no recent battles have been run.
To set up your own Langfuse tracing, add your keys to .env:
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
pokemon-ai-agent/
├── run.py # Entry point: sets up server/client, launches everything
├── app.py # Gradio UI: battle controls and configuration
├── agent.py # LLM-powered agent: formats battle state, queries LLM, parses tool calls
├── battle_runners.py # Async battle orchestration: agent creation, matchmaking threads
├── tools.py # Tool definitions (choose_move, choose_switch) sent to the LLM
├── .env.example # Template for API keys
├── pyproject.toml # Python dependencies
├── server/ # Pokémon Showdown server (auto-cloned on first run)
└── client/ # Pokémon Showdown web client (auto-cloned on first run)
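
For orientation, the two tools in tools.py could look roughly like the following in the OpenAI-style function-calling schema that LiteLLM accepts. This is a sketch under assumptions: only the tool names `choose_move` and `choose_switch` come from this README, while the argument names (`move_name`, `pokemon_name`), the descriptions, and the `make_tool` helper are illustrative guesses.

```python
def make_tool(name, description, arg_name, arg_description):
    """Build one function-calling tool definition with a single string argument."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    arg_name: {"type": "string", "description": arg_description},
                },
                "required": [arg_name],
            },
        },
    }


# Argument names below are assumptions, not the project's actual schema.
TOOLS = [
    make_tool(
        "choose_move",
        "Attack with one of the active Pokémon's available moves.",
        "move_name",
        "Name of the move to use.",
    ),
    make_tool(
        "choose_switch",
        "Switch to another available Pokémon on the team.",
        "pokemon_name",
        "Name of the Pokémon to switch in.",
    ),
]
```

A list like `TOOLS` is what would be passed as the `tools` argument of the LLM call.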
How a turn works:
- poke-env receives the battle state from the Showdown server.
- agent.py formats it into a structured prompt (active Pokémon, moves, switches, opponent info, and a memory of the last 20 turns).
- The prompt is sent to the LLM via litellm.acompletion() with tool definitions.
- The LLM responds with a tool call (choose_move or choose_switch).
- The agent translates the tool call back into a Showdown protocol command.
- If the LLM fails or returns an invalid action, a random fallback is used.
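
The last three steps (parse the tool call, validate it, fall back to a random action) can be sketched with the standard library alone. This is not the code in agent.py: the function names, the argument names `move_name` and `pokemon_name`, and the `(kind, name)` action tuples are assumptions made for illustration, and the LLM call itself is left out.

```python
import json
import random
from collections import deque

# Keeps only the 20 most recent entries, mirroring the 20-turn memory.
MEMORY = deque(maxlen=20)


def parse_tool_call(tool_call, moves, switches):
    """Return ("move" | "switch", name) for a valid tool call, else None."""
    try:
        name = tool_call["function"]["name"]
        args = json.loads(tool_call["function"]["arguments"])
    except (KeyError, TypeError, json.JSONDecodeError):
        return None  # missing fields, wrong types, or malformed JSON
    if not isinstance(args, dict):
        return None
    if name == "choose_move" and args.get("move_name") in moves:
        return ("move", args["move_name"])
    if name == "choose_switch" and args.get("pokemon_name") in switches:
        return ("switch", args["pokemon_name"])
    return None  # unknown tool, or a move/switch that is not currently legal


def random_fallback(moves, switches, rng=random):
    """Pick a random legal action, or None if nothing is available."""
    options = [("move", m) for m in moves] + [("switch", s) for s in switches]
    return rng.choice(options) if options else None


def decide(tool_call, moves, switches, rng=random):
    """Use the LLM's choice when valid, otherwise fall back to a random action."""
    action = parse_tool_call(tool_call, moves, switches)
    if action is None:
        action = random_fallback(moves, switches, rng)
    MEMORY.append(action)
    return action
```

Validating the chosen name against the currently legal moves and switches is what lets a hallucinated or stale action be replaced by the fallback instead of being sent to the server.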
