Compass is a framework for serving Compound AI workflows under accuracy
and latency SLOs by switching between configurations at runtime. Given a
workflow W, a dataset D, an accuracy threshold τ, and a latency SLO
δ on target hardware H, Compass produces an inference serving system
that maintains both SLOs as load changes.
The framework has three components, split across an offline preparation phase and an online serving phase.
1. COMPASS-V — Task Optimization (compass/search/)
Explores the configuration space C and returns the feasible set
F = { c ∈ C : Acc(c) ≥ τ }. Combines Latin-Hypercube bootstrap,
inverse-distance-weighted pseudo-gradients, Wilson-CI early stopping, and
hill-climb / lateral expansion to discover F with far fewer evaluations
than exhaustive grid search.
Inputs: W, D, τ. Output: F.
2. Planner — Deployment Planning (compass/planner/)
Takes the feasible set and prepares it for a specific deployment in three
sub-stages:
- Profiler — measures end-to-end P50 / P95 / P99 latency for every config
in
Fon hardwareH. - Pareto extractor — keeps only configurations that are non-dominated on (accuracy, P95-latency).
- AQM — derives a queue-length-based switching threshold for each
Pareto-optimal config using an analytical queuing model with the SLO
budget
δ - P95.
Inputs: F, H, δ. Output: Pareto front + per-config switching policy.
3. Inference Serving System — Elastico (compass/serving/)
Wraps the Pareto front into a live serving stack: a request queue, a load
monitor, the Elastico controller, and a workflow executor. As load
changes, Elastico moves up the Pareto front (faster configs) when the
queue grows past the active threshold, and down (more accurate configs)
when load slack returns — with asymmetric hysteresis to prevent
oscillation.
Input: Pareto front + thresholds. Output: served requests that
respect both τ and δ.
This repository contains the implementation evaluated in:
M. Gravara, J. L. Herrera, S. Nastic. Compass: Optimizing Compound AI Workflows for Dynamic Adaptation. IEEE/ACM CCGrid 2026, Sydney, Australia.
compass/ Framework — three components mirror the paper:
├── search/ Offline phase 1: COMPASS-V task optimization
├── planner/ Offline phase 2: profiling, Pareto, AQM thresholds
└── serving/ Online phase: Elastico runtime adaptation
workflows/ Workflows being optimized (plug your own here):
├── rag/ RAG pipeline on SQuAD2.0 (LangChain + Ollama)
└── vision/ Cascaded YOLO detection on COCO val2017
experiments/ Thin runner scripts (argparse → compass APIs):
├── run_search.py COMPASS-V or Grid Search on a workflow
├── run_planner.py Profile → Pareto → AQM (--stage profile|pareto|aqm|all)
├── run_serving.py Elastico online experiment
└── run_baseline.py Static-baseline online experiment
plotting/ Generates the paper figures from results/:
├── plot_search.py --figure {convergence|efficiency|all}
└── plot_serving.py --figure {bars|scatter|cdf|timeseries|all}
scripts/
├── setup_ollama.sh Pull the 6 LLMs used by RAG
└── reproduce_paper.sh End-to-end orchestrator
docs/
├── adding_a_workflow.md Plug in a new workflow (Evaluator + ParameterSpace)
└── reproducing_paper.md Exact commands per figure
data/
└── squad_questions.json Tracked: 2.9 MB SQuAD2 questions used by RAG
(the FAISS index and COCO/YOLO weights are NOT
shipped — see "Workflow setup" below).
git clone https://github.com/polaris-slo-cloud/compass.git
cd compass
# Core install (just the COMPASS-V algorithm + planner + plotting):
pip install -e .
# To also run the bundled workflows, install the matching extras:
pip install -e ".[rag]" # adds langchain + faiss + sentence-transformers
pip install -e ".[vision]" # adds ultralytics + torch
pip install -e ".[all]" # bothQuote the extras (".[rag]") — bare [rag] is interpreted as a glob in zsh.
For bit-exact paper reproduction, pin to the lock file:
pip install -r requirements-lock.txtEach workflow needs a one-time setup before its first run.
# 1. Install Ollama from https://ollama.com, then start the daemon.
# 2. Pull the 6 LLMs used in the paper (~30 GB total):
bash scripts/setup_ollama.sh
# 3. Build the FAISS index from the shipped questions file (~5 min, ~10 MB on disk):
python -m workflows.rag.build_index# Download images + annotations (~1 GB) into data/coco/:
python -m workflows.vision.download_cocoYOLOv8 weights (yolov8n/s/m/l/x.pt) are auto-downloaded by the
ultralytics library on first use (~290 MB total) — no manual step needed.
bash scripts/reproduce_paper.sh # 12-15 h on reference hardware# 1. Offline: COMPASS-V search at one SLO threshold τ.
python experiments/run_search.py \
--workflow rag --method compass_v --slo 0.75 --n-samples 100 \
--output results/search/rag/multi_slo/slo_0.75.json
# 2. Offline: profile the feasible set, extract Pareto, derive AQM thresholds.
python experiments/run_planner.py --stage all \
--workflow rag \
--feasible results/search/rag/multi_slo/slo_0.75.json \
--slo-ms 1000 --work-based
# 3. Online: serve under a spike load with Elastico (--warmup preloads
# every Pareto-front model into Ollama before measurement begins).
python experiments/run_serving.py \
--pareto results/planner/pareto_frontier.json \
--slo 1000 --pattern spike --duration 180 --warmup \
--output results/serving/spike_slo1000/elastico.json
# 4. Plot every figure under figures/.
python plotting/plot_search.py --figure all
python plotting/plot_serving.py --figure allSee docs/reproducing_paper.md for the exact commands behind each figure.
Implement two interfaces and pass them to CompassV:
from compass.search import CompassV, Evaluator, ParameterSpace, NormType
class MyEvaluator(Evaluator):
@property
def n_samples(self) -> int: ...
def evaluate_partial(self, config, indices) -> list[float]: ...
space = ParameterSpace(
params={"model": ["A", "B", "C"], "k": [10, 20, 50]},
norm_types={"k": NormType.LOG},
constraints=[lambda c: c["k"] <= 50],
)
feasible = CompassV(space, MyEvaluator(...), slo=0.75).search()Full walkthrough: docs/adding_a_workflow.md.
@inproceedings{gravara2026compass,
title = {Compass: Optimizing Compound AI Workflows for Dynamic Adaptation},
author = {Gravara, Milos and Herrera, Juan Luis and Nastic, Stefan},
booktitle = {IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid)},
year = {2026},
address = {Sydney, Australia},
}Milos Gravara — milos.gravara@tuwien.ac.at
Distributed Systems Group, TU Wien.
