Emotion Detection MLOps Project

Introduction

This project is a complete end-to-end MLOps pipeline for an Emotion Detection system built using FastAPI, Scikit-learn, and NLTK. It leverages modern DevOps and MLOps practices to ensure reproducibility, scalability, and automation. The application detects emotions in text and provides an API endpoint for integration.

The system is designed with CI/CD, model registry, automated training, experiment tracking, and deployment pipelines, making it production-ready.

📌 Sample Image

Here's the FastAPI interface of the Emotion Detection system:

[Screenshot: FastAPI interface]


⚡ Tech Stack

  • Machine Learning: Scikit-learn, NLTK
  • API Framework: FastAPI (Cookiecutter template)
  • Experiment Tracking & Model Registry: MLflow with DagsHub integration
  • Data Versioning: DVC (backed by AWS S3)
  • CI/CD: GitHub Actions
  • Deployment: Docker + AWS EC2
  • Container Registry: DockerHub
  • Secrets Management: GitHub Secrets

🛠️ Key Features

  • Data Versioning with DVC → Stores datasets and models in AWS S3 for reproducibility.
  • Experiment Tracking → All experiments tracked with MLflow + DagsHub UI.
  • Model Registry → Automatic promotion of the best model to production.
  • Hyperparameter Tuning → Automated tuning logged in MLflow.
  • Dockerized Application → Ensures consistent deployment across environments.
  • CI/CD Pipeline → GitHub Actions pipeline automates testing, training, building, and deployment.
  • Cloud Deployment → Hosted on AWS EC2, pulling the latest Docker images from DockerHub.

🔄 MLOps Workflow

  1. Data & Model Management

    • Datasets stored and versioned with DVC in AWS S3.
    • Preprocessing and feature engineering pipelines tracked.
  2. Experimentation

    • MLflow logs training runs (accuracy, precision, recall, etc.).
    • Hyperparameter tuning experiments stored in MLflow.
  3. Model Registry

    • Best-performing models are automatically promoted using a promotion script (see the sketch after this list).
    • Registry maintained in MLflow (via DagsHub).
  4. CI/CD Pipeline (GitHub Actions)

    • Runs unit tests for ML models and FastAPI endpoints.
    • Executes DVC repro to rebuild pipelines if data/code changes.
    • Builds Docker image and pushes to DockerHub.
    • Deploys containerized app to AWS EC2.
  5. Deployment

    • Application runs on FastAPI with Uvicorn.
    • Served via Docker container on AWS EC2 (Ubuntu).
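
As a rough illustration of the promotion step in item 3, the snippet below moves the best registered version to the Production stage via the MLflow client. This is a sketch only: the model name "emotion_model" and the metric key "accuracy" are assumptions, not the project's actual identifiers.

```python
# Hypothetical promotion sketch: pick the best registered version by a logged
# metric and move it to the "Production" stage. Model/metric names are assumed.
from mlflow.tracking import MlflowClient

client = MlflowClient()  # uses the configured MLflow tracking/registry URI (e.g. DagsHub)

versions = client.search_model_versions("name='emotion_model'")
best = max(versions, key=lambda v: client.get_run(v.run_id).data.metrics.get("accuracy", 0.0))

client.transition_model_version_stage(
    name="emotion_model",
    version=best.version,
    stage="Production",
    archive_existing_versions=True,  # demote whatever was previously in Production
)
print(f"Promoted version {best.version} to Production")
```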

📂 Project Highlights

Data Version Control (DVC)

  • Stores large files (datasets, models) in AWS S3.
  • Ensures reproducibility across environments.
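
As a sketch of what this enables, a DVC-tracked file can be read straight from the S3-backed remote through DVC's Python API. The dataset path below is an assumed example, not necessarily the repository's actual layout.

```python
# Minimal sketch, assuming a tracked file at data/raw/emotions.csv; any git
# commit or tag passed as `rev` pins the exact data version stored in S3.
import dvc.api

text = dvc.api.read(
    "data/raw/emotions.csv",
    repo="https://github.com/AbhaySingh71/MLops-emotion-detection",
    rev="main",
)
print(text[:200])
```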

MLflow + DagsHub

  • Tracks experiments, metrics, and artifacts.
  • Centralized model registry for production models.
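
A minimal logging sketch, assuming the usual DagsHub MLflow URI pattern, credentials supplied via environment variables, and an illustrative experiment name and toy model:

```python
# Sketch only: point MLflow at the DagsHub-hosted tracking server and log a run.
# The URI pattern and experiment name are assumptions; credentials are expected
# in MLFLOW_TRACKING_USERNAME / MLFLOW_TRACKING_PASSWORD (e.g. the DAGSHUB_PAT).
import mlflow
import mlflow.sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

mlflow.set_tracking_uri("https://dagshub.com/AbhaySingh71/MLops-emotion-detection.mlflow")
mlflow.set_experiment("emotion-detection")

texts = ["I am very happy today!", "This makes me so angry", "Best day ever", "I hate waiting"]
labels = ["joy", "anger", "joy", "anger"]  # toy data for illustration

with mlflow.start_run():
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(texts, labels)
    mlflow.log_param("vectorizer", "tfidf")
    mlflow.log_metric("train_accuracy", model.score(texts, labels))
    mlflow.sklearn.log_model(model, "model")
```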

Hyperparameter Tuning

  • Automated tuning with MLflow logging.
  • Compares experiments for performance improvements.
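
One way this pattern can look (a sketch, not the project's actual tuning script) is GridSearchCV wrapped in MLflow autologging, which records a child run per parameter candidate so experiments can be compared side by side:

```python
# Illustrative sketch: autolog a small grid search; each candidate becomes a
# child run in MLflow. The grid and toy data are assumptions for illustration.
import mlflow
import mlflow.sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

texts = ["I am very happy today!", "Best day ever", "This makes me so angry", "I hate waiting"]
labels = ["joy", "joy", "anger", "anger"]

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LogisticRegression(max_iter=1000))])
param_grid = {"clf__C": [0.1, 1.0, 10.0]}  # assumed grid, not the project's actual search space

mlflow.sklearn.autolog(max_tuning_runs=3)
with mlflow.start_run(run_name="tfidf-logreg-grid"):
    search = GridSearchCV(pipe, param_grid, cv=2, scoring="accuracy")
    search.fit(texts, labels)
    print("best params:", search.best_params_)
```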

CI/CD Pipeline

  • Tests → Build → Push → Deploy fully automated with GitHub Actions.
  • Uses appleboy/ssh-action for secure AWS deployment.

Docker & AWS

  • Dockerized FastAPI app.
  • Auto-pulled from DockerHub into AWS EC2 instance.
  • EC2 port 80 is mapped to the container's FastAPI port 8000.

🚀 Deployment Flow

  1. Push code → GitHub Actions triggers.
  2. Run tests + DVC pipeline.
  3. Log experiments to MLflow.
  4. Build and push Docker image → DockerHub.
  5. SSH into EC2 → Pull image & restart container.
  6. Application live at http://<EC2-IP>.

Project Organization

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io

🧪 Testing

🔹 1. Data Validation Tests

  • Input Format Checks – Verify that the raw text data is non-empty, correctly encoded (UTF-8), and free of invalid symbols.
  • Class Distribution Tests – Ensure that training/validation splits have balanced representation of emotion classes (e.g., joy, anger, sadness, fear, surprise, neutral).
  • Text Cleaning Functions – Unit tests for stopword removal, lemmatization, and tokenization functions using NLTK.
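
For example, a unit test for a stopword-removal helper could look like the sketch below; `remove_stopwords` is a hypothetical stand-in for the project's real cleaning function.

```python
# Hypothetical pytest sketch for an NLTK-based stopword-removal helper.
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)  # make sure the corpus is available in CI


def remove_stopwords(text: str) -> str:
    """Stand-in for the project's actual cleaning utility."""
    stop = set(stopwords.words("english"))
    return " ".join(w for w in text.split() if w.lower() not in stop)


def test_remove_stopwords_drops_common_words():
    cleaned = remove_stopwords("I am very happy today")
    assert "am" not in cleaned.split()
    assert "happy" in cleaned.split()
```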

🔹 2. Model Testing

  • Prediction Shape & Type – Check that the model outputs valid emotion labels with probability scores.
  • Confidence Thresholding – Ensure probabilities sum to 1 (softmax check) and exceed a minimum confidence level.
  • Overfitting Check – Validate that the gap between training and validation accuracy stays within tolerance.
  • Hyperparameter Sensitivity Tests – Re-run the model with different parameter sets (tracked in MLflow) to verify stability of results.
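
A hedged sketch of the first two checks against a scikit-learn pipeline (the label set and toy data are illustrative, not the project's real fixtures):

```python
# Illustrative check: predictions are known emotion labels and predict_proba
# rows sum to 1 within floating-point tolerance.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

EMOTIONS = {"joy", "anger", "sadness", "fear", "surprise", "neutral"}

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(["so happy", "great news", "this is awful", "I am furious"],
          ["joy", "joy", "anger", "anger"])  # toy data for illustration

samples = ["I am very happy today!", "This is terrible"]
preds = model.predict(samples)
probs = model.predict_proba(samples)

assert all(p in EMOTIONS for p in preds)
assert np.allclose(probs.sum(axis=1), 1.0)
# A minimum-confidence threshold could be asserted similarly, e.g. on probs.max(axis=1).
```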

🔹 3. API Testing (FastAPI)

Unit API Tests

  • Verify /predict returns 200 OK.
  • Validate correct schema in response:
{
  "text": "I am very happy today!",
  "prediction": "joy",
  "confidence": 0.94
}
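
A sketch of such a test with FastAPI's TestClient; the `app.main` import path and the `/predict` response keys are assumptions about the project layout.

```python
# Hypothetical test sketch: exercise /predict in-process with FastAPI's TestClient.
from fastapi.testclient import TestClient

from app.main import app  # adjust to the module that actually defines `app`

client = TestClient(app)


def test_predict_returns_valid_schema():
    resp = client.post("/predict", json={"text": "I am very happy today!"})
    assert resp.status_code == 200
    body = resp.json()
    assert isinstance(body["prediction"], str)
    assert 0.0 <= body["confidence"] <= 1.0
```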


Run tests locally:

```bash
pytest tests/
```

📌 DAG Workflow

Here's the DAG representation of the pipeline:

[DAG workflow diagram]

📦 Docker

Build and run locally:

docker build -t emotion_detection:latest .
docker run -p 8000:8000 emotion_detection:latest

📜 API Usage

The FastAPI app will be available at:

http://<EC2-IP>:80/docs

Example request:

{
  "text": "I am very happy today!"
}

Response:

{
  "emotion": "joy"
}
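
For example, the endpoint can be called from Python; the `/predict` path is an assumption, so adjust it to the routes shown at /docs.

```python
# Illustrative client call; replace <EC2-IP> with the instance's public IP.
import requests

resp = requests.post(
    "http://<EC2-IP>/predict",
    json={"text": "I am very happy today!"},
    timeout=10,
)
print(resp.json())
```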

🔑 Secrets (GitHub)

  • DAGSHUB_PAT → Access token for MLflow + DagsHub
  • DOCKER_HUB_USERNAME → DockerHub username
  • DOCKER_HUB_ACCESS_TOKEN → DockerHub token
  • EC2_HOST → AWS EC2 public IP/domain
  • EC2_USER → EC2 username (e.g., ubuntu)
  • EC2_SSH_KEY → SSH private key

🎯 Future Improvements

  • Add monitoring with Prometheus + Grafana
  • Add canary deployments for safer model rollouts
  • Automate data drift detection
  • Integrate with Kubernetes for scaling

🀝 Contributing

Feel free to fork the repo, raise issues, and submit PRs.


📝 License

This project is licensed under the MIT License.


📧 Contact

Author: Abhay Singh
GitHub: AbhaySingh71
Email: [email protected]
