A data science agent for automated analysis, feature engineering, model building, and insight generation.
uv is the package manager. Install it first if you haven't:
curl -LsSf https://astral.sh/uv/install.sh | shClone and install the project:
git clone --recurse-submodules <repo-url> && cd statigent
uv syncIf you already cloned without --recurse-submodules, initialize them with:
git submodule update --initOptional dependency groups:
- benchmark — mlebench. Required for MLE-Bench evaluation and DSBench (data modeling evaluation)
uv sync --group benchmark
uv sync --all-groups # Install all groupsThe baseline agent executes code inside Docker containers for isolation. Install Docker Desktop:
- macOS: https://docs.docker.com/desktop/install/mac-install/
- Linux: https://docs.docker.com/engine/install/
Then build the sandbox image:
docker build -t statigent/ds-sandbox .Model access is configured via environment variables. LangChain auto-discovers keys by provider name:
| Variable | Provider | Required for |
|---|---|---|
DEEPSEEK_API_KEY |
DeepSeek | Default model profiles |
OPENAI_API_KEY |
OpenAI | GPT models |
ANTHROPIC_API_KEY |
Anthropic | Claude models |
Set them in your shell or a .env file at the project root:
export DEEPSEEK_API_KEY="your-api-key-here"MLE-Bench downloads datasets from Kaggle, which requires API credentials:
- Go to https://www.kaggle.com → Settings → API → Create New Token
- Place the downloaded
kaggle.jsonin~/.kaggle/ - Restrict permissions:
chmod 600 ~/.kaggle/kaggle.json
Run benchmarks to evaluate agent performance:
from statigent.benchmarks import run_benchmark
from statigent.baseline import ReactBaselineAgent
agent = ReactBaselineAgent(model_name="deepseek-v4-flash")
result = run_benchmark("dabench", agent)
print(f"Score: {result.score}")Available benchmarks: dabench, dsbench-da, dsbench-dm, mlebench. For detailed usage, data preparation, and per-benchmark options, see docs/how_to_eval.md.
The React baseline agent (ReactBaselineAgent) uses langchain's create_agent with a Docker sandbox. Each evaluation task runs in an isolated container:
- Tools:
bash,python,read_file,write_file,list_dir - Execution model: One Docker container per task. Data directories are bind-mounted read-only; output files are extracted after task completion.
- Network: Disabled by default. Pass
sandbox_network=Trueto enable.
from statigent.baseline import ReactBaselineAgent
agent = ReactBaselineAgent(
model_name="deepseek-v4-flash",
sandbox_image="statigent/ds-sandbox",
sandbox_network=False,
sandbox_timeout=600,
)The default model profiles are defined in src/statigent/models/defaults.toml. To use your own models, create a custom TOML file and load it:
from statigent import load_registry
registry = load_registry("path/to/my_models.toml")
model = registry.get_model("my-model")The TOML format follows the same structure as defaults.toml — each section is a profile name, and its keys are passed to langchain.init_chat_model:
[gpt-5_4]
model = "gpt-5.4"
model_provider = "openai"
[claude-sonnet-4_6]
model = "claude-sonnet-4.6"
model_provider = "anthropic"