Pokémon AI Agents

LLM-powered AI agents that play Pokémon Showdown autonomously.

Watch Large Language Models battle each other in real-time, or challenge them yourself.

🎶 I wanna be the very best ... Like no one ever was 🎶

▶️ Watch the full video demo with audio on YouTube

What is this?

This project connects state-of-the-art LLMs (Llama, Gemma, Qwen, Mistral, and more) to a local Pokémon Showdown server via poke-env. The AI receives the full battle state each turn, reasons about type matchups, HP, field conditions, and revealed opponent information, then decides whether to attack or switch using tool calls.

A Gradio interface lets you:

Human vs. AI - Play against any supported LLM yourself.
AI vs. AI - Pick two models and watch them fight autonomously.

All LLM calls are routed through LiteLLM and traced via Langfuse for observability.

Quickstart

Prerequisites

Python 3.14+
Node.js 18+ (for the Showdown server and client)

1. Clone and configure

git clone https://github.com/MohamedMostafa259/pokemon-ai-agent.git
cd pokemon-ai-agent

Copy the example environment file and add your API keys:

cp .env.example .env

Edit .env and paste your API keys. See Getting API Keys below.

2. Create venv and install dependencies

uv venv
.venv/scripts/activate
uv sync

The activate path above is for Windows; on macOS or Linux, run source .venv/bin/activate instead.

3. Run

uv run python run.py

This single command will:

Clone the Showdown server and client repos (first run only).
Install their Node.js dependencies and build the client.
Start the local Showdown server on port 8000.
Launch the Gradio control panel on port 7860.

4. Play

What	URL
Gradio Control Panel	http://127.0.0.1:7860
Showdown Client (spectate/play)	http://localhost:8000
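
If one of the URLs above does not load, it can help to check whether both services are actually listening. The following is a minimal sketch using only the standard library; the helper names (is_listening, service_status) are made up for illustration and are not part of this repo. The hosts and ports are the ones listed in the table.

```python
import socket

# Hosts and ports taken from the table above.
SERVICES = {
    "Gradio Control Panel": ("127.0.0.1", 7860),
    "Showdown Client": ("localhost", 8000),
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False


def service_status(services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, up in service_status().items():
        print(f"{name}: {'up' if up else 'not reachable'}")
```

A port that accepts connections only shows that something is listening there, not that it is the expected service.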

"Human vs. AI" flow:

Open the Showdown client and log in with any username.
In the Gradio panel, select an AI model, enter your Showdown username and a bot name, then click "Send Challenge."
Accept the challenge in the Showdown client.

"AI vs. AI" flow:

Open the Showdown client and log in with any username.
In the Gradio panel, select two AI models, enter their bot names, then click "Start AI vs. AI Battle."
The two AIs will battle each other autonomously.

Getting API Keys

All models in this project use free-tier API quotas (as of April 2026, per free-llm-api-resources). Just create a free account on each provider's platform, generate an API key, and paste it into .env.

Provider	Env Variable
OpenRouter	`OPENROUTER_API_KEY`
Cerebras	`CEREBRAS_API_KEY`
Google AI Studio	`GEMINI_API_KEY`
Groq	`GROQ_API_KEY`
Mistral	`MISTRAL_API_KEY`

You only need keys for the providers whose models you want to use. If you only want to try Cerebras models, only CEREBRAS_API_KEY is required.
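
To see which keys a given model selection needs, a small check like the one below may be useful. It is a sketch, not code from this repo: it assumes model IDs follow LiteLLM's provider/model form (as in the MODEL_MAP entries shown in the next section), and the prefix-to-variable mapping is an assumption derived from the table above. The function name missing_keys and the model ID in the usage line are hypothetical.

```python
import os

# Assumed mapping from LiteLLM provider prefix to the env variable in the table above.
PROVIDER_KEYS = {
    "openrouter": "OPENROUTER_API_KEY",
    "cerebras": "CEREBRAS_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    "mistral": "MISTRAL_API_KEY",
}


def missing_keys(model_ids, env=None):
    """Return the env variables that the given models need but that are unset.

    Models whose provider prefix is not in PROVIDER_KEYS are ignored.
    """
    env = os.environ if env is None else env
    missing = set()
    for model_id in model_ids:
        provider = model_id.split("/", 1)[0]
        var = PROVIDER_KEYS.get(provider)
        if var and not env.get(var):
            missing.add(var)
    return sorted(missing)


if __name__ == "__main__":
    # "cerebras/example-model" is a placeholder, not a real model ID.
    print(missing_keys(["cerebras/example-model"]))
```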

Using Paid Tiers for More Powerful Models

If you have a paid API key (e.g., Gemini Pro, OpenAI GPT-4, Anthropic Claude), you can add any model supported by LiteLLM to the MODEL_MAP dictionary in battle_runners.py:

MODEL_MAP = {
    # ... existing free models ...

    # Examples of paid models you could add:
    "OpenAI GPT-4o": "openai/gpt-4o",
    "Anthropic Claude Sonnet 4": "anthropic/claude-sonnet-4-20250514",
    "Gemini 2.5 Pro": "gemini/gemini-2.5-pro",
}

Then add the corresponding API key to your .env (Gemini models reuse the GEMINI_API_KEY from the table above):

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

Observability with Langfuse

All LLM calls are traced via Langfuse. You can view the full decision-making trace for every turn: the battle state sent to the model, the model's tool call response, and any fallback actions.

Live dashboard (free tier): Langfuse Cloud Dashboard

Note: This project uses the Langfuse free (Hobby) tier. Historical trace data older than 30 days is not retained, so the dashboard may appear empty if no recent battles have been run.

To set up your own Langfuse tracing, add your keys to .env:

LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
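
How this project wires LiteLLM to Langfuse is not shown in this README. One common approach is LiteLLM's built-in Langfuse callback, which reads the three variables above from the environment. The sketch below assumes that approach and assumes the litellm and langfuse packages are installed; the function names are hypothetical.

```python
import os

LANGFUSE_VARS = ("LANGFUSE_SECRET_KEY", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_HOST")


def missing_langfuse_vars(env=None):
    """Return the Langfuse variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in LANGFUSE_VARS if not env.get(name)]


def enable_langfuse_tracing():
    """Enable LiteLLM's Langfuse callback if all variables are set.

    Returns True when tracing was enabled, False when a variable is missing.
    """
    if missing_langfuse_vars():
        return False
    import litellm  # imported lazily so the check above works without litellm

    # Assumption: tracing goes through LiteLLM's callback lists.
    litellm.success_callback = ["langfuse"]
    litellm.failure_callback = ["langfuse"]
    return True
```

Calling enable_langfuse_tracing() once at startup, before any LLM call, is enough for this approach.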

Architecture

pokemon-ai-agent/
├── run.py              # Entry point: sets up server/client, launches everything
├── app.py              # Gradio UI: battle controls and configuration
├── agent.py            # LLM-powered agent: formats battle state, queries LLM, parses tool calls
├── battle_runners.py   # Async battle orchestration: agent creation, matchmaking threads
├── tools.py            # Tool definitions (choose_move, choose_switch) sent to the LLM
├── .env.example        # Template for API keys
├── pyproject.toml      # Python dependencies
├── server/             # Pokémon Showdown server (auto-cloned on first run)
└── client/             # Pokémon Showdown web client (auto-cloned on first run)
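
For orientation, the tool definitions in tools.py might look roughly like the following. This is a sketch in the OpenAI-style function-calling schema that LiteLLM accepts through its tools parameter. Only the tool names choose_move and choose_switch come from this README; the argument names (move_name, pokemon_name) and the descriptions are assumptions, and the real file may differ.

```python
# Hypothetical tool schemas; only the two tool names are taken from the README.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "choose_move",
            "description": "Use one of the active Pokémon's available moves this turn.",
            "parameters": {
                "type": "object",
                "properties": {
                    "move_name": {
                        "type": "string",
                        "description": "Name of an available move.",
                    },
                },
                "required": ["move_name"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "choose_switch",
            "description": "Switch the active Pokémon for one on the bench.",
            "parameters": {
                "type": "object",
                "properties": {
                    "pokemon_name": {
                        "type": "string",
                        "description": "Name of an available switch target.",
                    },
                },
                "required": ["pokemon_name"],
            },
        },
    },
]


def tool_names(tools=TOOLS):
    """Return the function names declared in a list of tool schemas."""
    return [tool["function"]["name"] for tool in tools]
```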

How a turn works:

poke-env receives the battle state from the Showdown server.
agent.py formats it into a structured prompt (active Pokémon, moves, switches, opponent info, and a memory of the last 20 turns).
The prompt is sent to the LLM via litellm.acompletion() with tool definitions.
The LLM responds with a tool call (choose_move or choose_switch).
The agent translates the tool call back into a Showdown protocol command.
If the LLM fails or returns an invalid action, a random fallback is used.
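
The memory and fallback steps above can be sketched with the standard library alone. This is an illustration under assumptions, not the code in agent.py: the function names, the argument names (move_name, pokemon_name), and the plain "move X" / "switch Y" strings standing in for Showdown commands are all hypothetical.

```python
import json
import random
from collections import deque

MEMORY_TURNS = 20  # the prompt carries a memory of the last 20 turns


def make_memory():
    """Create a turn log that silently drops entries older than 20 turns."""
    return deque(maxlen=MEMORY_TURNS)


def resolve_action(tool_name, raw_arguments, moves, switches, rng=random):
    """Turn an LLM tool call into a choice string.

    Returns (choice, used_fallback). If the call is malformed or names an
    unavailable option, a random available option is picked instead.
    Returns (None, True) when there is nothing to choose from.
    """
    try:
        args = json.loads(raw_arguments or "{}")
    except json.JSONDecodeError:
        args = {}
    if not isinstance(args, dict):
        args = {}

    if tool_name == "choose_move" and args.get("move_name") in moves:
        return f"move {args['move_name']}", False
    if tool_name == "choose_switch" and args.get("pokemon_name") in switches:
        return f"switch {args['pokemon_name']}", False

    # Fallback: the tool call was missing, malformed, or not currently valid.
    options = [f"move {m}" for m in moves] + [f"switch {s}" for s in switches]
    if not options:
        return None, True
    return rng.choice(options), True


if __name__ == "__main__":
    memory = make_memory()
    choice, used_fallback = resolve_action(
        "choose_move", '{"move_name": "thunderbolt"}', ["thunderbolt"], ["pikachu"]
    )
    memory.append(f"chose {choice} (fallback={used_fallback})")
    print(list(memory))
```

Returning the used_fallback flag alongside the choice makes it easy to record in a trace when the model's answer was not used.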
