This project is a complete end-to-end MLOps pipeline for an Emotion Detection system built using FastAPI, Scikit-learn, and NLTK. It leverages modern DevOps and MLOps practices to ensure reproducibility, scalability, and automation. The application detects emotions in text and provides an API endpoint for integration.
The system is designed with CI/CD, model registry, automated training, experiment tracking, and deployment pipelines, making it production-ready.
Here's the FastAPI interface of the Emotion Detection system:
- Machine Learning: Scikit-learn, NLTK
- API Framework: FastAPI (Cookiecutter template)
- Experiment Tracking & Model Registry: MLflow with DagsHub integration
- Data Versioning: DVC (backed by AWS S3)
- CI/CD: GitHub Actions
- Deployment: Docker + AWS EC2
- Container Registry: DockerHub
- Secrets Management: GitHub Secrets
- Data Versioning with DVC: Stores datasets and models in AWS S3 for reproducibility.
- Experiment Tracking: All experiments tracked with MLflow + DagsHub UI.
- Model Registry: Automatic promotion of the best model to production.
- Hyperparameter Tuning: Automated tuning logged in MLflow.
- Dockerized Application: Ensures consistent deployment across environments.
- CI/CD Pipeline: GitHub Actions pipeline automates testing, training, building, and deployment.
- Cloud Deployment: Hosted on AWS EC2, pulling latest Docker images from DockerHub.
Data & Model Management
- Datasets stored and versioned with DVC in AWS S3.
- Preprocessing and feature engineering pipelines tracked.
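The preprocessing code itself is not shown here; the following is a minimal sketch of the kind of cleaning step such a pipeline tracks, assuming NLTK's stopword list and WordNet lemmatizer (the function name and exact rules are illustrative, not the project's code):

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# One-time downloads of the NLTK resources the cleaner relies on.
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOPWORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()


def clean_text(text: str) -> str:
    """Lowercase, strip non-letters, drop stopwords, and lemmatize."""
    text = re.sub(r"[^a-zA-Z\s]", " ", text.lower())
    tokens = [LEMMATIZER.lemmatize(tok) for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(clean_text("I am feeling really HAPPY today!!!"))  # e.g. "feeling really happy today"
```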
Experimentation
- MLflow logs training runs (accuracy, precision, recall, etc.).
- Hyperparameter tuning experiments stored in MLflow.
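As an illustration of what a logged run looks like, here is a minimal sketch; the DagsHub tracking-URI placeholder, experiment name, parameters, and toy data are assumptions, not the project's actual values:

```python
import mlflow
import mlflow.sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.pipeline import make_pipeline

# Replace with the MLflow URI from the DagsHub repo settings.
mlflow.set_tracking_uri("https://dagshub.com/<user>/<repo>.mlflow")
mlflow.set_experiment("emotion-detection")

# Toy data standing in for the processed dataset tracked by DVC.
texts = ["i am so happy", "this makes me angry", "what a joyful day", "i feel furious"]
labels = ["joy", "anger", "joy", "anger"]

with mlflow.start_run(run_name="logreg-tfidf"):
    params = {"C": 1.0, "max_features": 5000}
    mlflow.log_params(params)

    model = make_pipeline(
        TfidfVectorizer(max_features=params["max_features"]),
        LogisticRegression(C=params["C"], max_iter=1000),
    )
    model.fit(texts, labels)

    preds = model.predict(texts)
    mlflow.log_metrics({
        "accuracy": accuracy_score(labels, preds),
        "precision": precision_score(labels, preds, average="weighted"),
        "recall": recall_score(labels, preds, average="weighted"),
    })
    mlflow.sklearn.log_model(model, artifact_path="model")
```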
Model Registry
- Best-performing models are automatically promoted using a promotion script.
- Registry maintained in MLflow (via DagsHub).
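The promotion script itself is not reproduced in this README; a minimal sketch of the idea using the MLflow client API, where the registered-model name and stage names are assumptions for illustration:

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()  # assumes MLFLOW_TRACKING_URI points at the DagsHub-hosted registry
MODEL_NAME = "emotion_detection_model"  # illustrative registered-model name

# Pick the latest version currently in Staging and promote it to Production.
staging_versions = client.get_latest_versions(MODEL_NAME, stages=["Staging"])
if staging_versions:
    version = staging_versions[0].version
    client.transition_model_version_stage(
        name=MODEL_NAME,
        version=version,
        stage="Production",
        archive_existing_versions=True,  # demote the previous Production model
    )
    print(f"Promoted {MODEL_NAME} v{version} to Production")
```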
CI/CD Pipeline (GitHub Actions)
- Runs unit tests for ML models and FastAPI endpoints.
- Executes `dvc repro` to rebuild the pipeline when data or code changes.
- Builds Docker image and pushes to DockerHub.
- Deploys containerized app to AWS EC2.
Deployment
- Application runs on FastAPI with Uvicorn.
- Served via Docker container on AWS EC2 (Ubuntu).
- Stores large files (datasets, models) in AWS S3.
- Ensures reproducibility across environments.
- Tracks experiments, metrics, and artifacts.
- Centralized model registry for production models.
- Automated tuning with MLflow logging.
- Compares experiments for performance improvements.
- Tests → Build → Push → Deploy fully automated with GitHub Actions.
- Uses appleboy/ssh-action for secure AWS deployment.
- Dockerized FastAPI app.
- Auto-pulled from DockerHub into AWS EC2 instance.
- Runs on port 80 on the host, mapped to FastAPI's port 8000 inside the container (a minimal app entry point is sketched below).
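A minimal sketch of what the containerized FastAPI entry point looks like; the model path, module name, and response fields are assumptions rather than the project's exact code:

```python
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Emotion Detection API")

# Illustrative artifact path; the real model comes from the MLflow registry / models/ directory.
model = joblib.load("models/model.pkl")


class TextIn(BaseModel):
    text: str


@app.post("/predict")
def predict(payload: TextIn):
    proba = model.predict_proba([payload.text])[0]
    label = model.classes_[proba.argmax()]
    return {"text": payload.text, "prediction": label, "confidence": float(proba.max())}

# Inside the container this is typically launched with:
#   uvicorn app:app --host 0.0.0.0 --port 8000
# and port 8000 is then published as port 80 on the EC2 host.
```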
- Push code → GitHub Actions triggers.
- Run tests + DVC pipeline.
- Log experiments to MLflow.
- Build and push Docker image → DockerHub.
- SSH into EC2 → Pull image & restart container.
- Application live at http://<EC2-IP>.
├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io
- Input Format Checks: Verify that the raw text data is non-empty, correctly encoded (UTF-8), and free of invalid symbols.
- Class Distribution Tests: Ensure that training/validation splits have balanced representation of emotion classes (e.g., joy, anger, sadness, fear, surprise, neutral).
- Text Cleaning Functions: Unit tests for the stopword removal, lemmatization, and tokenization functions built on NLTK.
- Prediction Shape & Type: Check that the model outputs valid emotion labels with probability scores (see the test sketch after this list).
- Confidence Thresholding: Ensure probabilities sum to 1 (softmax check) and exceed a minimum confidence level.
- Overfitting Check: Validate that the gap between training and validation accuracy stays within tolerance.
- Hyperparameter Sensitivity Tests: Re-run the model with different parameter sets (tracked in MLflow) to verify stability of results.
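A couple of these checks, sketched as pytest-style tests; the `trained_model` fixture and the 0.5 confidence floor are assumptions for illustration:

```python
import numpy as np

EMOTIONS = {"joy", "anger", "sadness", "fear", "surprise", "neutral"}


def test_input_text_is_valid():
    sample = "I am very happy today!"
    assert isinstance(sample, str) and sample.strip(), "raw text must be a non-empty string"
    sample.encode("utf-8")  # well-formed text must encode cleanly to UTF-8


def test_prediction_label_and_probabilities(trained_model):
    # `trained_model` is an assumed pytest fixture returning the fitted sklearn pipeline.
    texts = ["I am very happy today!"]
    proba = trained_model.predict_proba(texts)[0]
    label = trained_model.predict(texts)[0]

    assert label in EMOTIONS              # output is a valid emotion label
    assert np.isclose(proba.sum(), 1.0)   # probabilities sum to 1
    assert proba.max() >= 0.5             # illustrative minimum-confidence threshold
```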
Unit API Tests
- Verify `/predict` returns `200 OK`.
- Validate the correct schema in the response (a TestClient sketch follows the example below):
{
"text": "I am very happy today!",
"prediction": "joy",
"confidence": 0.94
}
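A minimal version of that check with FastAPI's TestClient; the `app` import path is an assumption about where the application object lives:

```python
from fastapi.testclient import TestClient

from app import app  # assumed module path for the FastAPI instance

client = TestClient(app)


def test_predict_returns_200_and_schema():
    response = client.post("/predict", json={"text": "I am very happy today!"})
    assert response.status_code == 200

    body = response.json()
    assert {"text", "prediction", "confidence"} <= set(body)  # expected response fields
    assert isinstance(body["prediction"], str)
```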
Run tests locally:
```bash
pytest tests/
```
Here's the DAG representation of the pipeline:
Build and run locally:
docker build -t emotion_detection:latest .
docker run -p 8000:8000 emotion_detection:latest
FastAPI app will be available at: http://<EC2-IP>:80/docs
Example request:
{
"text": "I am very happy today!"
}
Response:
{
"emotion": "joy"
}
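The same request from Python, assuming the service is reachable at the EC2 address above (replace the placeholder before running):

```python
import requests

# Port 80 is the published container port on the EC2 host; replace <EC2-IP> with the real address.
resp = requests.post(
    "http://<EC2-IP>/predict",
    json={"text": "I am very happy today!"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"emotion": "joy"}
```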
- `DAGSHUB_PAT`: Access token for MLflow + DagsHub
- `DOCKER_HUB_USERNAME`: DockerHub username
- `DOCKER_HUB_ACCESS_TOKEN`: DockerHub access token
- `EC2_HOST`: AWS EC2 public IP/domain
- `EC2_USER`: EC2 username (e.g., `ubuntu`)
- `EC2_SSH_KEY`: SSH private key
- Add monitoring with Prometheus + Grafana
- Add canary deployments for safer model rollouts
- Automate data drift detection
- Integrate with Kubernetes for scaling
Feel free to fork the repo, raise issues, and submit PRs.
This project is licensed under the MIT License.
Author: Abhay Singh
GitHub: AbhaySingh71
Email: [email protected]