Reproducible, containerized labs for end-to-end machine learning:

- spin up a local ML workspace,
- run a simple training/inference pipeline,
- and explore experiment tracking with MLflow.
Repo layout:

- `docker-feature-pipeline/` – feature engineering lab (Dockerfile + Compose).
- `docker-ml-pipeline/` – training/inference pipeline lab (Dockerfile + Compose).
- `docker-mlflow-lab/` – MLflow tracking server lab (Compose).
- `notebooks/` – example notebooks you can bind-mount into the labs.
Contents:

- Quick start
- Prerequisites
- What's inside
- Usage by lab
  - Feature Pipeline Lab
  - ML Pipeline Lab
  - MLflow Lab
- Volumes & data
- GPU (optional)
- Troubleshooting
- FAQ
- License
## Quick start

```bash
# 1) Clone
git clone https://github.com/Aloagbaye/docker-ml-lab.git
cd docker-ml-lab

# 2) Pick a lab and run it (examples below)
cd docker-ml-pipeline
docker compose up --build
```
Then open the printed URL(s), typically something like:

- App / API: http://localhost:8000 or http://localhost:8080
- Jupyter: http://localhost:8888
- MLflow UI: http://localhost:5000

(See each lab's section for exact endpoints/ports.)
## Prerequisites

- Docker Desktop (or Docker Engine) 24+ and Docker Compose v2
- ~6–8 GB of free RAM recommended when running multiple services
- (Optional) NVIDIA GPU drivers + `nvidia-container-toolkit` for GPU compute
## What's inside

Each lab is self-contained with its own Dockerfile and/or docker-compose.yml.

- Images: built from local Dockerfiles where provided.
- Ports: published to `localhost` for easy access.
- Dev volumes: local `notebooks/` and `data/` folders can be mounted for live editing.
- `.env`: you can add per-lab `.env` files to override ports and paths.
## Usage by lab

### Feature Pipeline Lab

Folder: `docker-feature-pipeline/`

Goal: Explore feature engineering and data preparation in an isolated container.

Run:

```bash
cd docker-feature-pipeline

# If a compose file exists:
docker compose up --build

# Otherwise, build/run directly:
docker build -t feature-pipeline .
docker run --rm -p 8888:8888 -v "$PWD/../notebooks":/workspace/notebooks feature-pipeline
```

Typical endpoints (example):

- Jupyter: http://localhost:8888 (token shown in container logs)

Tip: Mount your local `notebooks/` so edits persist.
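For a taste of what this lab is for, here is a minimal feature-engineering sketch you could drop into a mounted notebook. It assumes pandas and scikit-learn are available in the image (check the lab's requirements); the toy data is illustrative only.

```python
# Minimal feature-engineering sketch; assumes pandas + scikit-learn in the image.
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy dataset standing in for whatever you mount under /workspace/data.
df = pd.DataFrame({
    "age": [23, 45, 31, 52],
    "income": [40_000, 85_000, 62_000, 91_000],
    "city": ["lagos", "abuja", "lagos", "ibadan"],
})

# One-hot encode the categorical column.
features = pd.get_dummies(df, columns=["city"])

# Scale numeric columns to zero mean / unit variance.
scaler = StandardScaler()
features[["age", "income"]] = scaler.fit_transform(features[["age", "income"]])

print(features.head())
```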
### ML Pipeline Lab

Folder: `docker-ml-pipeline/`

Goal: Train and/or serve a simple ML model with a predictable, reproducible stack (e.g., FastAPI + joblib model, or a CLI runner); a minimal sketch of such a service appears at the end of this section.

Run:

```bash
cd docker-ml-pipeline
docker compose up --build
```

Typical endpoints (examples):

- API / web app: http://localhost:8000 (or 8080)
- Docs (if FastAPI): http://localhost:8000/docs
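Once the service is up, you can smoke-test it from the host. This is a sketch under assumptions: the `/predict` route and the JSON payload shape are illustrative, not guaranteed by the lab; check the code or the `/docs` page for the real schema.

```python
# Hypothetical smoke test: the /predict route and payload shape are assumptions.
import requests

resp = requests.post(
    "http://localhost:8000/predict",
    json={"features": [5.1, 3.5, 1.4, 0.2]},  # example feature vector
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```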
Common dev loop:

1. Edit code or notebook locally.
2. Rebuild: `docker compose build` (or use bind mounts for hot-reload).
3. Rerun: `docker compose up`.
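For orientation, a minimal FastAPI + joblib service of the kind described above could look like the sketch below. The model path, input schema, and route are illustrative assumptions, not the lab's actual code.

```python
# Illustrative FastAPI + joblib serving sketch; the model path, schema, and
# route are assumptions rather than this lab's actual implementation.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # hypothetical artifact baked into the image

class PredictRequest(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(req: PredictRequest):
    # scikit-learn estimators expect a 2D array: one row per sample.
    prediction = model.predict([req.features])
    return {"prediction": prediction.tolist()}
```

Run it with something like `uvicorn main:app --host 0.0.0.0 --port 8000` (assuming the file is named `main.py`).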
### MLflow Lab

Folder: `docker-mlflow-lab/`

Goal: Bring up an MLflow Tracking Server locally (often alongside a backend store like Postgres and an artifact store like MinIO or a mounted volume).

Run:

```bash
cd docker-mlflow-lab
docker compose up -d
```

Typical endpoints (examples):

- MLflow UI: http://localhost:5000
- MinIO Console: http://localhost:9001 (if used)
- MinIO S3 endpoint: http://localhost:9000 (if used)
- Postgres: `5432`, internal only (reach it via the service name inside the Compose network)

Point your code to MLflow:

```bash
export MLFLOW_TRACKING_URI="http://127.0.0.1:5000"
# or use the service name inside the Compose network when calling from another container
```

If the lab includes MinIO, set AWS-style creds via `.env` (e.g., MINIO_ROOT_USER, MINIO_ROOT_PASSWORD) and configure MLflow's artifact store accordingly.
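From Python, a minimal logging run against this server might look like the sketch below. The experiment name is arbitrary, and the MinIO-related environment variables (`MLFLOW_S3_ENDPOINT_URL` plus AWS-style credentials matching your `.env`) only apply if the lab's artifact store is MinIO.

```python
# Minimal MLflow logging sketch. Experiment name is arbitrary; the S3/MinIO
# env vars are only needed if the artifact store is MinIO.
import os

import mlflow

os.environ.setdefault("MLFLOW_S3_ENDPOINT_URL", "http://127.0.0.1:9000")  # MinIO, if used
os.environ.setdefault("AWS_ACCESS_KEY_ID", "minio")         # match MINIO_ROOT_USER
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "minio123")  # match MINIO_ROOT_PASSWORD

mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("docker-ml-lab-demo")  # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_param("lr", 0.01)
    mlflow.log_metric("accuracy", 0.93)
```

Open http://localhost:5000 afterwards and the run should appear under that experiment.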
## Volumes & data

- `./notebooks/`: mount into containers at `/workspace/notebooks` (or similar) for persistence.
- `./data/` (create as needed): mount to `/workspace/data` for datasets and artifacts.
- MLflow artifacts: if MinIO is included, they're stored in the configured S3 bucket; if not, they may live on a local volume.

Cleaning up:

```bash
# Stop and remove containers but keep volumes (fast restart)
docker compose down

# Stop and remove everything including volumes (fresh reset)
docker compose down -v
```
## GPU (optional)

If you want to use CUDA:

- Install NVIDIA drivers and `nvidia-container-toolkit` on the host.
- Add the following to your service in `docker-compose.yml`:

  ```yaml
  deploy:
    resources:
      reservations:
        devices:
          - capabilities: ["gpu"]
  ```

- Run with GPU access, e.g.:

  ```bash
  docker compose up --build
  ```

  (On some setups you may need `--gpus all` in a `docker run` scenario.)
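To confirm the container actually sees the GPU, run a quick check from inside it. This sketch assumes the image ships PyTorch (an assumption; substitute your framework's equivalent if it doesn't):

```python
# Quick GPU visibility check; assumes PyTorch is installed in the image.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```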
## Troubleshooting

- **Port already in use**: change the published port in the lab's `docker-compose.yml` (e.g., `5000:5000` → `5500:5000`) and restart.
- **Can't reach the UI**: confirm the container is healthy (`docker ps` → check `STATUS`), then check `docker logs <service>` for the token/URL.
- **Pip install / dependency errors**: rebuild without cache (`docker compose build --no-cache`) to pick up an updated `requirements.txt`.
- **MLflow can't write artifacts**: if using MinIO, verify credentials, bucket name, and endpoint URL; ensure the service is up in Compose.
- **Windows paths**: use absolute paths or WSL2. For bind mounts in `docker-compose.yml`, prefer relative paths from the lab folder.
## FAQ

**Q: Where do I put my own notebooks?**
A: In `./notebooks/`. They'll appear inside the container at the mounted path.

**Q: Can I point the ML pipeline lab at the MLflow tracking server?**
A: Yes. Start docker-mlflow-lab first, then set `MLFLOW_TRACKING_URI` and any artifact store env vars in the pipeline lab.

**Q: Can I run all labs at once?**
A: You can, but mind the ports. Either change the published ports per lab or start them one at a time.
## License

MIT (or your preferred license; update this section).

Maintained by Israel Igietsemhe (Aloagbaye). If you find issues or want enhancements, open an issue or PR.