This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
autobox-engine-ts is the TypeScript implementation of the Autobox simulation runtime engine. This is a migration from the Python/Thespian actor-based implementation to a simpler TypeScript architecture using BullMQ for message passing and async orchestration.
Key Architecture Decision: This engine eliminates the actor pattern in favor of basic async orchestration with Docker providing isolation. The entire stack is unified on TypeScript.
| Feature | Python (Thespian) | TypeScript (BullMQ) |
|---|---|---|
| Concurrency Model | Actor-based (Thespian) | Message queues (BullMQ) |
| Process Isolation | Actor system | Redis-backed queues |
| Evaluator Agent | ❌ No | ✅ Yes |
| Message Passing | Actor mailboxes | BullMQ job queues |
| Agent Lifecycle | Actor spawn/stop | Worker create/shutdown |
yarn dev # Start dev server with hot reload (uses examples/ configs)
yarn build # Compile TypeScript to dist/
yarn start # Run compiled production build
yarn start:cli # Run with CLI interface
yarn lint # Run ESLint
yarn lint:fix # Auto-fix ESLint issues
yarn format # Format code with Prettier
yarn test # Run all tests with Jest
yarn test:unit # Run unit tests only
yarn test:integration # Run integration tests only
yarn test:watch # Run tests in watch mode
yarn test:coverage # Run tests with coverage report
yarn test:ci # Run tests in CI mode
yarn docker:build # Build production image
yarn docker:build:dev # Build development image
yarn docker:run # Run production container
yarn docker:run:dev # Run dev container with hot-reload
yarn docker:run:exit # Run and exit on completion
yarn docker:clean # Clean up Docker images
yarn docker:clean:all # Remove all images including dev
# Development mode with example configs
yarn dev
# With specific simulation (looks in examples/simulations/)
yarn dev --simulation-name=gift_choice
yarn dev --simulation-name=crime_detective
yarn dev --simulation-name=nordic_team
# Production mode
yarn build
node dist/index.js --config=/path/to/config --simulation-name=summer_vacation
# Daemon mode (keeps server alive after simulation)
node dist/index.js --daemon --simulation-name=summer_vacation
Required environment variables (see .env.example):
OPENAI_API_KEY=your-key-here # Required for LLM processing
REDIS_HOST=localhost # Default: localhost
REDIS_PORT=6379 # Default: 6379
PORT=3000 # API server port (default: 3000)
NODE_ENV=development # development | production
LOG_LEVEL=info # info | debug | error
JWT_SECRET=your-secret # For API authentication
The engine uses BullMQ queues for inter-agent communication instead of actors. Each agent is a BullMQ worker processing messages from its dedicated queue.
┌─────────────────────────────────────────────────────────┐
│ Express API Server │
│ (Status, Metrics, Instructions, Info) │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Simulation Runtime │
│ │
│ ┌──────────────┐ ┌─────────────────────────────┐ │
│ │ Orchestrator │───→│ MessageBroker (Redis) │ │
│ └──────────────┘ │ │ │
│ ↓ │ ┌────────────────────────┐ │ │
│ ┌──────────────┐ │ │ Agent Queues (BullMQ)│ │ │
│ │ Planner │←──→│ │ - orchestrator-queue │ │ │
│ └──────────────┘ │ │ - planner-queue │ │ │
│ ↓ │ │ - evaluator-queue │ │ │
│ ┌──────────────┐ │ │ - reporter-queue │ │ │
│ │ Evaluator │←──→│ │ - worker-N-queue │ │ │
│ └──────────────┘ │ └────────────────────────┘ │ │
│ ↓ └─────────────────────────────┘ │
│ ┌──────────────┐ │
│ │ Reporter │←──→ (All agents use MessageBroker) │
│ └──────────────┘ │
│ ↓ │
│ ┌──────────────┐ │
│ │ Worker Agents│ (ANA, JOHN, DETECTIVE, etc.) │
│ └──────────────┘ │
└─────────────────────────────────────────────────────────┘
↓
┌──────────────┐
│ OpenAI API │
│ (gpt-4o-mini,│
│ o4-mini) │
└──────────────┘
- MessageBroker (src/messaging/messageBroker.ts)
  - Manages BullMQ queues for all agents
  - Provides send() for message passing with 200ms delay (simple throttling)
  - Tracks queue processing state via isQueueProcessing()
  - Handles graceful shutdown of all queues
- Base Agent System (src/core/agents/createBaseAgent.ts)
  - Each agent is a BullMQ worker listening to its queue
  - Generic handler pattern for message processing
  - Graceful shutdown with Promise.all coordination
  - Automatic error handling and job failure tracking
- Agent Types:
  - Orchestrator: Coordinates simulation flow, decides next agent to speak
  - Planner: Creates conversation plan and manages turn order
  - Evaluator: Evaluates simulation progress against metrics and success criteria
  - Reporter: Generates final simulation summary
  - Workers: Simulation participants (e.g., ANA, JOHN in vacation; DETECTIVE, SINGER in crime)
- AI Processing (src/core/llm/createAiProcessor.ts)
  - Wraps OpenAI API with structured output via Zod schemas
  - System prompts are versioned in src/core/llm/prompts/{agent}/v0.0.1/
  - Supports both structured (with schema) and unstructured responses
- Memory System (src/core/memory/createMemory.ts)
  - In-memory conversation history per agent pair
  - Tracks message flow for context
  - Converts internal messages to history format for LLM context
- Simulation Registry (src/core/simulation/registry.ts)
  - Singleton tracking current simulation state
  - Stores agent IDs, status, progress, errors, metrics
  - Used by API handlers to provide status updates
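To make the Memory System item above more concrete, here is a minimal sketch of per-agent-pair conversation history and its conversion to an LLM history format. The names (createMemorySketch, pairKey, toHistory, HistoryEntry) and the role-mapping rule are assumptions made for illustration; the actual createMemory.ts may be structured differently.

```typescript
// Hypothetical sketch; this is not the real createMemory.ts.
type StoredMessage = { fromAgentId: string; toAgentId: string; content: string };
type HistoryEntry = { role: "user" | "assistant"; content: string };

// Order-independent key so that A->B and B->A share one conversation.
const pairKey = (a: string, b: string): string => [a, b].sort().join("|");

function createMemorySketch() {
  const conversations = new Map<string, StoredMessage[]>();

  return {
    // Append a message to the conversation shared by sender and recipient.
    add(message: StoredMessage): void {
      const key = pairKey(message.fromAgentId, message.toAgentId);
      const existing = conversations.get(key) ?? [];
      conversations.set(key, [...existing, message]);
    },
    // Assumption: messages written by `selfId` map to "assistant", all others to "user".
    toHistory(selfId: string, otherId: string): HistoryEntry[] {
      const messages = conversations.get(pairKey(selfId, otherId)) ?? [];
      return messages.map((m): HistoryEntry => ({
        role: m.fromAgentId === selfId ? "assistant" : "user",
        content: m.content,
      }));
    },
  };
}
```

Keying by a sorted pair keeps both directions of a dialogue in one ordered list, which is what an LLM context window typically needs.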
// 1. Load configuration (simulation + metrics + server)
const config = loadConfig({ simulationName, configPath });
// 2. Create simulation with agents and message broker
const simulation = await createSimulation(config, onCompletion);
// Creates: orchestrator, planner, evaluator, reporter, workers[]
// Each gets UUID and dedicated BullMQ queue
// 3. Run simulation with timeout race
await Promise.race([
orchestratorCompletionPromise, // Resolves when orchestrator signals completion
timeoutPromise // Rejects after config.timeout_seconds
]);
// 4. Graceful shutdown
await Promise.all([
orchestrator.shutdown(),
planner.shutdown(),
reporter.shutdown(),
...workers.map(w => w.shutdown())
]);
await messageBroker.close();
// NOTE: Evaluator shutdown is not yet implemented in the actual code
// 5. Clean up (unless daemon mode)
if (!daemon) {
simulationRegistry.unregister();
} else {
// Keep simulation data in registry for API access
}
Simulations require three JSON configs (see examples/ directory):
- Simulation Config (examples/simulations/{name}.json):

      {
        "name": "Summer vacation",
        "max_steps": 150,
        "timeout_seconds": 600,
        "shutdown_grace_period_seconds": 5,
        "task": "...",
        "description": "...",
        "orchestrator": {
          "name": "ORCHESTRATOR",
          "mailbox": { "max_size": 400 },
          "llm": { "model": "gpt-5-nano" }
        },
        "planner": {
          "name": "PLANNER",
          "mailbox": { "max_size": 400 },
          "llm": { "model": "gpt-5-nano" }
        },
        "evaluator": {
          "name": "EVALUATOR",
          "mailbox": { "max_size": 400 },
          "llm": { "model": "gpt-4o-mini" }
        },
        "reporter": {
          "name": "REPORTER",
          "mailbox": { "max_size": 400 },
          "llm": { "model": "gpt-5-nano" }
        },
        "workers": [
          {
            "name": "ANA",
            "description": "this is ana agent",
            "instruction": "optional instruction",
            "context": "Role and backstory...",
            "mailbox": { "max_size": 100 },
            "llm": { "model": "gpt-4o-mini" }
          }
        ]
      }

  Key Fields:
  - shutdown_grace_period_seconds: Time to wait for graceful shutdown (default: 5)
  - mailbox: Legacy field present in example configs but NOT validated by schema (may be ignored)
  - workers[].instruction: Optional instruction field for worker agents
  - workers[].description: Optional human-readable description of worker's role
  - workers[].context: Role and backstory for the worker agent
  - evaluator: New agent for metrics evaluation
  - logging: Logging configuration (verbose, log_path, log_file)
- Metrics Config (examples/metrics/{name}.json): Defines success criteria and measurement
- Server Config (examples/server/server.json): API server settings
  - Port configuration
  - Logging settings
  - exit_on_completion flag
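The two less obvious rules for the simulation config (the default of 5 for shutdown_grace_period_seconds and the unvalidated mailbox field) can be sketched in plain TypeScript. This is only an illustration: the project validates with a Zod schema, and the names below (normalizeSimulationConfig, stripMailbox, the input types) are hypothetical. Dropping mailbox here mirrors the "may be ignored" note rather than confirmed behavior.

```typescript
// Hypothetical sketch in plain TypeScript; the project itself validates with Zod.
type AgentConfigInput = {
  name: string;
  llm: { model: string };
  mailbox?: { max_size: number }; // legacy field, dropped below
};

type SimulationConfigInput = {
  name: string;
  max_steps: number;
  timeout_seconds: number;
  shutdown_grace_period_seconds?: number;
  orchestrator: AgentConfigInput;
};

const DEFAULT_SHUTDOWN_GRACE_PERIOD_SECONDS = 5;

// Remove the legacy mailbox field from one agent entry.
function stripMailbox(agent: AgentConfigInput): Omit<AgentConfigInput, "mailbox"> {
  const { mailbox: _ignored, ...rest } = agent;
  return rest;
}

// Fill in the documented default and discard fields the schema does not validate.
function normalizeSimulationConfig(raw: SimulationConfigInput) {
  return {
    ...raw,
    shutdown_grace_period_seconds:
      raw.shutdown_grace_period_seconds ?? DEFAULT_SHUTDOWN_GRACE_PERIOD_SECONDS,
    orchestrator: stripMailbox(raw.orchestrator),
  };
}
```

Only the orchestrator entry is normalized to keep the sketch short; the same helper would apply to the other agent entries.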
Located in examples/simulations/:
- summer_vacation.json: Couple deciding vacation destination (ANA + JOHN)
- gift_choice.json: Gift selection scenario
- crime_detective.json: Murder mystery with detective and suspects (DETECTIVE, SINGER, DRIVER, BANKER)
- nordic_team.json: Software project planning with Nordic team
All communication uses the Message discriminated union:
type Message =
| TextMessage // Content-bearing agent messages
| SignalMessage // Control signals (START, STOP, ABORT, STATUS)
| InstructionMessage // External instructions to agents (with priority field)
// InstructionMessage includes priority: 'override' | 'supplement' (default: 'supplement')
// Message routing via MessageBroker
messageBroker.send({
message: { type: 'text', content: '...', fromAgentId, toAgentId },
toAgentId: targetAgentId,
jobName: 'agent-message'
});
Shared data structure passed to evaluator, planner, and reporter:
type WorkersInfo = Array<{
name: string;
description: string;
instruction?: string;
context: string;
}>;
This provides agents with metadata about all workers participating in the simulation.
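As a rough illustration of how WorkersInfo could be derived from the workers[] entries of a simulation config, consider the sketch below. The WorkerConfig shape and the toWorkersInfo helper are hypothetical, and defaulting a missing description to an empty string is an assumption, not confirmed engine behavior.

```typescript
type WorkersInfo = Array<{
  name: string;
  description: string;
  instruction?: string;
  context: string;
}>;

// Hypothetical worker config shape, mirroring the workers[] entries in the example JSON.
type WorkerConfig = {
  name: string;
  description?: string;
  instruction?: string;
  context: string;
  llm: { model: string };
};

// Keep only the metadata fields; LLM settings are not part of WorkersInfo.
function toWorkersInfo(workers: WorkerConfig[]): WorkersInfo {
  return workers.map(({ name, description, instruction, context }) => ({
    name,
    description: description ?? "", // assumption: empty string when no description is given
    ...(instruction !== undefined ? { instruction } : {}),
    context,
  }));
}
```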
Located in src/api/routes/index.ts:
Health & Info:
- GET /health - Health check
- GET /ping - Simple ping

Simulation Control:
- GET /v1/status - Simulation status from registry
- GET /v1/metrics - Simulation metrics
- GET /v1/info - Agent information (names mapped by ID)
- POST /v1/instructions/agents/:agent_id - Send instruction to specific agent
- POST /v1/abort - Abort running simulation

Documentation:
- GET / - Swagger API spec (JSON)
- GET /docs - Swagger UI
Prompt Versioning: All LLM prompts live in src/core/llm/prompts/{agent}/v{version}/ with:
- prompt.ts - System prompt template
- schema.ts - Zod schema for structured output (if applicable)
- params.ts - Parameter types for prompt generation
- index.ts - Exports

Note: The orchestrator prompt (src/core/llm/prompts/orchestrator/v0.0.1/prompt.ts) currently returns 'TODO' and is not fully implemented. It also lacks a schema.ts file.
Structured Responses: Agents that need structured output (planner, evaluator) use Zod schemas with zodResponseFormat() for guaranteed JSON structure. Orchestrator does not currently have a schema.
Error Handling: Workers automatically catch errors and log via Winston. Failed jobs are tracked by BullMQ.
Message Throttling: The 200ms delay in messageBroker.send() is a simple setTimeout() to prevent queue flooding, not sophisticated rate limiting.
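A delay-based throttle of this kind can be sketched as follows. This is an illustration under assumptions, not the code in messageBroker.ts: createThrottledSender and the enqueue callback are hypothetical names, and whether the real delay runs before or after the queue call is not stated here.

```typescript
// Hypothetical sketch of delay-based throttling; `enqueue` stands in for the real queue call.
const SEND_DELAY_MS = 200;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

function createThrottledSender<T>(
  enqueue: (payload: T) => Promise<void>,
  delayMs: number = SEND_DELAY_MS,
) {
  return async function send(payload: T): Promise<void> {
    // Fixed pause before every enqueue; no token bucket and no backoff.
    await sleep(delayMs);
    await enqueue(payload);
  };
}
```

Because each call waits a fixed interval, callers that await send() sequentially are paced at roughly one message per delay period, while concurrent callers are not limited relative to each other. That is the sense in which this is throttling rather than rate limiting.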
Shutdown Strategy:
- Signal stop to all agents
- Wait for all agents to complete via Promise.all()
- Disconnect Redis connections
- Unregister simulation (unless daemon mode)
- Grace period defined by shutdown_grace_period_seconds in config
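One way to combine the Promise.all() wait with a grace period is to race the two, as in the sketch below. How the engine actually applies shutdown_grace_period_seconds is not spelled out here, so treat this as an assumption; Stoppable and shutdownWithGracePeriod are hypothetical names.

```typescript
// Hypothetical sketch: anything with a shutdown() method counts as an agent here.
type Stoppable = { shutdown: () => Promise<void> };

async function shutdownWithGracePeriod(
  agents: Stoppable[],
  gracePeriodSeconds: number,
): Promise<"clean" | "timed_out"> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Resolves (never rejects) once the grace period has elapsed.
  const graceExpired = new Promise<"timed_out">((resolve) => {
    timer = setTimeout(() => resolve("timed_out"), gracePeriodSeconds * 1000);
  });

  // Resolves when every agent has finished shutting down.
  const allStopped = Promise.all(agents.map((a) => a.shutdown())).then(
    () => "clean" as const,
  );

  try {
    return await Promise.race([allStopped, graceExpired]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

If any shutdown() rejects before the grace period ends, the rejection propagates to the caller, which can then decide whether to force-close the remaining connections.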
Current Status: ✅ Tests are fully implemented using Jest with TypeScript support via ts-jest.
Test Organization:
tests/
├── setup.ts # Global test configuration
├── fixtures/ # Reusable test data factories
│ ├── messages.ts # Message factory functions
│ └── configs.ts # Configuration factory functions
├── unit/ # Unit tests (isolated components)
│ ├── schemas/ # Schema validation tests
│ ├── core/ # Core business logic (memory, registry)
│ ├── transformations/ # Data transformation tests
│ ├── config/ # Configuration loader tests
│ └── utils/ # Utility function tests
└── integration/ # Integration tests (multiple components)
├── api/ # API endpoint tests
└── schemas/ # End-to-end schema validation
What's Tested:
- ✅ Message schemas and type guards (tests/unit/schemas/message.test.ts)
- ✅ Memory system (tests/unit/core/memory.test.ts)
- ✅ Simulation registry (tests/unit/core/simulationRegistry.test.ts)
- ✅ Configuration loader (tests/unit/config/loader.test.ts)
- ✅ Transformations (tests/unit/transformations/memoryToHistory.test.ts)
- ✅ Zod utilities (tests/unit/utils/zodParse.test.ts)
- ✅ API handlers (tests/integration/api/handlers.test.ts)
- ✅ Schema validation (tests/integration/schemas/validation.test.ts)
Coverage Thresholds (configured in jest.config.js):
- Branches: 30%
- Functions: 20%
- Lines: 50%
- Statements: 55%
What's NOT Tested (intentionally excluded for now):
- ❌ BullMQ integration (requires Redis infrastructure)
- ❌ OpenAI API calls (external dependency, expensive)
- ❌ Agent handlers (complex orchestration, future mocks needed)
- ❌ Full simulation lifecycle (future E2E test suite)
See Also: tests/README.md for comprehensive testing documentation
Adding New Agent Types:
- Create handler in src/core/agents/handlers/create{Agent}Handler.ts
- Create agent factory in src/core/agents/create{Agent}.ts
- Add prompt in src/core/llm/prompts/{agent}/v0.0.1/
- Update createSimulation() to instantiate the new agent
- Add to simulation config schema if needed
Adding New Message Types:
- Extend MessageSchema discriminated union in src/schemas/internal/message.ts
- Add type guard function (e.g., isNewTypeMessage())
- Update agent handlers to process new type
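The three steps above can be sketched in plain TypeScript. The real union is defined through Zod in src/schemas/internal/message.ts, so the types here are simplified stand-ins: the 'text' discriminant comes from the send() example, while the 'signal' tag, the field names, and the HeartbeatMessage variant with its isHeartbeatMessage guard are hypothetical.

```typescript
// Simplified stand-ins for the real message types; fields beyond `type` are assumptions.
type TextMessage = { type: "text"; content: string; fromAgentId: string; toAgentId: string };
type SignalMessage = { type: "signal"; signal: "START" | "STOP" | "ABORT" | "STATUS" };

// Step 1: a hypothetical new variant added to the union.
type HeartbeatMessage = { type: "heartbeat"; sentAt: number };

type Message = TextMessage | SignalMessage | HeartbeatMessage;

// Step 2: type guard that narrows Message to the new variant.
function isHeartbeatMessage(message: Message): message is HeartbeatMessage {
  return message.type === "heartbeat";
}

// Step 3: a handler with an exhaustiveness check, so the compiler flags
// any variant that is added to the union but not handled here.
function describeMessage(message: Message): string {
  switch (message.type) {
    case "text":
      return `text from ${message.fromAgentId}`;
    case "signal":
      return `signal ${message.signal}`;
    case "heartbeat":
      return `heartbeat at ${message.sentAt}`;
    default: {
      const unreachable: never = message;
      return unreachable;
    }
  }
}
```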
Configuration Changes:
- Simulation config: Update SimulationConfigSchema in src/schemas/internal/simulationConfig.ts
- Always maintain backward compatibility or version configs
- Redis Required: Engine depends on Redis for BullMQ queues
- Docker Support: Full Docker support available with production and dev images (see DOCKER.md)
  - Production: Multi-stage build with optimized image size
  - Development: Hot-reload support with source mounting
  - Scripts: yarn docker:build, yarn docker:run, yarn docker:clean
- OpenAI API: All agents use OpenAI's chat completions (configurable model per agent)
  - Common models: gpt-4o-mini, gpt-5-nano, o4-mini
- No Concurrent Simulations: Current implementation supports one simulation at a time
- Daemon Mode: Use --daemon flag to keep server alive for API access after simulation completes
- Message Throttling: 200ms delay between messages via simple setTimeout() to prevent queue flooding
- Port Configuration: Default port is 3000, configurable via PORT environment variable
- Simulation Registry: Singleton pattern means only one active simulation context at a time
- ⚠️ Evaluator Shutdown Missing: The evaluator agent is created but its shutdown() method is not called in runSimulation() cleanup (see src/core/simulation/runSimulation.ts:30-34). This may cause graceful shutdown to be incomplete.
- ⚠️ Orchestrator Prompt Placeholder: The orchestrator prompt at src/core/llm/prompts/orchestrator/v0.0.1/prompt.ts only returns 'TODO' and needs implementation.
- ℹ️ Mailbox Config Ignored: The mailbox field in simulation configs is present in examples but not validated by SimulationConfigSchema, suggesting it may be a legacy field that's no longer used.