Compass

Compass is a framework for serving Compound AI workflows under accuracy and latency SLOs by switching between configurations at runtime. Given a workflow W, a dataset D, an accuracy threshold τ, and a latency SLO δ on target hardware H, Compass produces an inference serving system that maintains both SLOs as load changes.

The framework has three components, split across an offline preparation phase and an online serving phase.

Architecture

Offline phase

1. COMPASS-V — Task Optimization (compass/search/) Explores the configuration space C and returns the feasible set F = { c ∈ C : Acc(c) ≥ τ }. Combines Latin-Hypercube bootstrap, inverse-distance-weighted pseudo-gradients, Wilson-CI early stopping, and hill-climb / lateral expansion to discover F with far fewer evaluations than exhaustive grid search. Inputs: W, D, τ. Output: F.

2. Planner — Deployment Planning (compass/planner/) Takes the feasible set and prepares it for a specific deployment in three sub-stages:

Profiler — measures end-to-end P50 / P95 / P99 latency for every config in F on hardware H.
Pareto extractor — keeps only configurations that are non-dominated on (accuracy, P95-latency).
AQM — derives a queue-length-based switching threshold for each Pareto-optimal config using an analytical queuing model with the SLO budget δ - P95.

Inputs: F, H, δ. Output: Pareto front + per-config switching policy.

Online phase

3. Inference Serving System — Elastico (compass/serving/) Wraps the Pareto front into a live serving stack: a request queue, a load monitor, the Elastico controller, and a workflow executor. As load changes, Elastico moves up the Pareto front (faster configs) when the queue grows past the active threshold, and down (more accurate configs) when load slack returns — with asymmetric hysteresis to prevent oscillation. Input: Pareto front + thresholds. Output: served requests that respect both τ and δ.

This repository contains the implementation evaluated in:

M. Gravara, J. L. Herrera, S. Nastic. Compass: Optimizing Compound AI Workflows for Dynamic Adaptation. IEEE/ACM CCGrid 2026, Sydney, Australia.

Repository layout

compass/                  Framework — three components mirror the paper:
├── search/               Offline phase 1: COMPASS-V task optimization
├── planner/              Offline phase 2: profiling, Pareto, AQM thresholds
└── serving/              Online phase: Elastico runtime adaptation

workflows/                Workflows being optimized (plug your own here):
├── rag/                  RAG pipeline on SQuAD2.0 (LangChain + Ollama)
└── vision/               Cascaded YOLO detection on COCO val2017

experiments/              Thin runner scripts (argparse → compass APIs):
├── run_search.py         COMPASS-V or Grid Search on a workflow
├── run_planner.py        Profile → Pareto → AQM (--stage profile|pareto|aqm|all)
├── run_serving.py        Elastico online experiment
└── run_baseline.py       Static-baseline online experiment

plotting/                 Generates the paper figures from results/:
├── plot_search.py        --figure {convergence|efficiency|all}
└── plot_serving.py       --figure {bars|scatter|cdf|timeseries|all}

scripts/
├── setup_ollama.sh       Pull the 6 LLMs used by RAG
└── reproduce_paper.sh    End-to-end orchestrator

docs/
├── adding_a_workflow.md  Plug in a new workflow (Evaluator + ParameterSpace)
└── reproducing_paper.md  Exact commands per figure

data/
└── squad_questions.json  Tracked: 2.9 MB SQuAD2 questions used by RAG
                          (the FAISS index and COCO/YOLO weights are NOT
                          shipped — see "Workflow setup" below).

Install

git clone https://github.com/polaris-slo-cloud/compass.git
cd compass

# Core install (just the COMPASS-V algorithm + planner + plotting):
pip install -e .

# To also run the bundled workflows, install the matching extras:
pip install -e ".[rag]"        # adds langchain + faiss + sentence-transformers
pip install -e ".[vision]"     # adds ultralytics + torch
pip install -e ".[all]"        # both

Quote the extras (".[rag]") — bare [rag] is interpreted as a glob in zsh.

For bit-exact paper reproduction, pin to the lock file:

pip install -r requirements-lock.txt

Workflow setup

Each workflow needs a one-time setup before its first run.

RAG (SQuAD2.0)

# 1. Install Ollama from https://ollama.com, then start the daemon.
# 2. Pull the 6 LLMs used in the paper (~30 GB total):
bash scripts/setup_ollama.sh

# 3. Build the FAISS index from the shipped questions file (~5 min, ~10 MB on disk):
python -m workflows.rag.build_index

Vision (COCO val2017)

# Download images + annotations (~1 GB) into data/coco/:
python -m workflows.vision.download_coco

YOLOv8 weights (yolov8n/s/m/l/x.pt) are auto-downloaded by the ultralytics library on first use (~290 MB total) — no manual step needed.

Quickstart

Reproduce all paper figures end-to-end

bash scripts/reproduce_paper.sh         # 12-15 h on reference hardware

Or run each phase manually

# 1. Offline: COMPASS-V search at one SLO threshold τ.
python experiments/run_search.py \
    --workflow rag --method compass_v --slo 0.75 --n-samples 100 \
    --output results/search/rag/multi_slo/slo_0.75.json

# 2. Offline: profile the feasible set, extract Pareto, derive AQM thresholds.
python experiments/run_planner.py --stage all \
    --workflow rag \
    --feasible results/search/rag/multi_slo/slo_0.75.json \
    --slo-ms 1000 --work-based

# 3. Online: serve under a spike load with Elastico (--warmup preloads
#    every Pareto-front model into Ollama before measurement begins).
python experiments/run_serving.py \
    --pareto results/planner/pareto_frontier.json \
    --slo 1000 --pattern spike --duration 180 --warmup \
    --output results/serving/spike_slo1000/elastico.json

# 4. Plot every figure under figures/.
python plotting/plot_search.py  --figure all
python plotting/plot_serving.py --figure all

See docs/reproducing_paper.md for the exact commands behind each figure.

Add your own workflow

Implement two interfaces and pass them to CompassV:

from compass.search import CompassV, Evaluator, ParameterSpace, NormType

class MyEvaluator(Evaluator):
    @property
    def n_samples(self) -> int: ...
    def evaluate_partial(self, config, indices) -> list[float]: ...

space = ParameterSpace(
    params={"model": ["A", "B", "C"], "k": [10, 20, 50]},
    norm_types={"k": NormType.LOG},
    constraints=[lambda c: c["k"] <= 50],
)

feasible = CompassV(space, MyEvaluator(...), slo=0.75).search()

Full walkthrough: docs/adding_a_workflow.md.

Citation

@inproceedings{gravara2026compass,
  title     = {Compass: Optimizing Compound AI Workflows for Dynamic Adaptation},
  author    = {Gravara, Milos and Herrera, Juan Luis and Nastic, Stefan},
  booktitle = {IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid)},
  year      = {2026},
  address   = {Sydney, Australia},
}

Contact

Milos Gravara — milos.gravara@tuwien.ac.at Distributed Systems Group, TU Wien.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Compass

Architecture

Offline phase

Online phase

Repository layout

Install

Workflow setup

RAG (SQuAD2.0)

Vision (COCO val2017)

Quickstart

Reproduce all paper figures end-to-end

Or run each phase manually

Add your own workflow

Citation

Contact

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
compass		compass
data		data
docs		docs
experiments		experiments
plotting		plotting
scripts		scripts
workflows		workflows
.gitignore		.gitignore
CITATION.cff		CITATION.cff
README.md		README.md
pyproject.toml		pyproject.toml
requirements-lock.txt		requirements-lock.txt

Folders and files

Latest commit

History

Repository files navigation

Compass

Architecture

Offline phase

Online phase

Repository layout

Install

Workflow setup

RAG (SQuAD2.0)

Vision (COCO val2017)

Quickstart

Reproduce all paper figures end-to-end

Or run each phase manually

Add your own workflow

Citation

Contact

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages