A production-grade data and machine learning pipeline that collects, processes, and generates predictions from economic news data published on the MQL5 website.
The architecture integrates Python, PySpark, Airflow, Docker, Kubernetes, GCP, FastAPI, MLflow, MySQL, and JavaScript into a scalable and modular system for automated data workflows.
⚠️ Note:
Certain components of this pipeline are intentionally omitted from this repository for privacy and environment-specific reasons.
The provided modules represent the core production logic and structure.
This diagram outlines the end-to-end flow — from data ingestion to transformation, modeling, tracking, and real-time prediction delivery.
| Layer | Technology | Purpose |
|---|---|---|
| Data Ingestion | Python | Scrape MQL5 economic news data |
| Schema Handling | Python (pandas) | Repair and normalize broken schemas |
| Processing Engine | PySpark | Transform and structure data at scale |
| Database | MySQL | Store cleaned and transformed data |
| Orchestration | Apache Airflow | Automate and schedule pipeline tasks |
| API Layer | FastAPI | Serve models and expose inference endpoints |
| Experiment Tracking | MLflow | Track, compare, and register models |
| Visualization | JavaScript | Render the real-time prediction dashboard |
| Deployment | Docker & Kubernetes (GCP) | Deploy scalable, containerized services in production |
- Scrapes the MQL5 website for economic event and news data.
- Uses Python scripts to extract, format, and store raw data as CSV files.
- Establishes a consistent and traceable data input pipeline.
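A minimal sketch of this ingestion step is shown below. The URL, CSS selector, and field names are illustrative assumptions, not the repository's actual values:

```python
import csv

import requests
from bs4 import BeautifulSoup

CALENDAR_URL = "https://www.mql5.com/en/economic-calendar"  # assumed entry point

def scrape_events(url: str = CALENDAR_URL) -> list[dict]:
    """Fetch the calendar page and extract one record per event row."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    events = []
    for row in soup.select(".ec-table__item"):  # placeholder CSS selector
        events.append({
            "event": row.get_text(strip=True),
            # the real scraper also captures timestamp, currency, actual/forecast values
        })
    return events

def save_raw(events: list[dict], path: str = "raw_events.csv") -> None:
    """Persist raw scraped records as CSV for the downstream stages."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["event"])
        writer.writeheader()
        writer.writerows(events)
```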
- Runs schema validation and correction using a dedicated Python script.
- Fixes missing or misaligned columns, enforces consistent data types, and standardizes field naming.
- Ensures clean and structured data for distributed processing.
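A pandas-based repair pass might look like the following sketch; the expected column list and coercion rules are assumptions standing in for the repository's dedicated script:

```python
import pandas as pd

# Assumed canonical schema; the real script defines the actual columns.
EXPECTED_COLUMNS = ["event_time", "currency", "event", "actual", "forecast"]

def repair_schema(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)
    # Standardize field naming: lowercase, snake_case.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    # Add any missing columns so every file conforms to one schema.
    for col in EXPECTED_COLUMNS:
        if col not in df.columns:
            df[col] = pd.NA
    # Enforce consistent data types, coercing malformed values to NA.
    df["event_time"] = pd.to_datetime(df["event_time"], errors="coerce")
    df["actual"] = pd.to_numeric(df["actual"], errors="coerce")
    df["forecast"] = pd.to_numeric(df["forecast"], errors="coerce")
    return df[EXPECTED_COLUMNS]
```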
- Utilizes PySpark for distributed data transformation and normalization.
- Processes large datasets efficiently, preparing them for storage and downstream tasks.
- Outputs structured, uniform datasets for transformation and analysis.
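The PySpark stage could be sketched as below; paths, columns, and the derived field are placeholders built on the schema assumed in the previous sketch:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("mql5-news-transform").getOrCreate()

# Read the schema-repaired CSVs produced upstream (path is illustrative).
df = spark.read.csv("repaired_events/*.csv", header=True, inferSchema=True)

clean = (
    df.dropDuplicates(["event_time", "currency", "event"])
      .withColumn("currency", F.upper(F.trim(F.col("currency"))))
      .withColumn("surprise", F.col("actual") - F.col("forecast"))  # example derived field
      .filter(F.col("event_time").isNotNull())
)

# Write a structured, uniform dataset for storage and downstream tasks.
clean.write.mode("overwrite").parquet("structured_events/")
```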
- Conducts final data cleanup and feature selection within MySQL.
- Removes redundant fields, applies filters, and stores the refined dataset.
- Produces a high-quality feature set ready for machine learning.
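As a sketch, this cleanup can be driven from Python against MySQL; the connection string, table names, and filter condition are assumptions:

```python
from sqlalchemy import create_engine, text

# Placeholder credentials and database name.
engine = create_engine("mysql+pymysql://user:password@localhost:3306/mql5_news")

with engine.begin() as conn:
    # Keep only the fields needed for modeling and drop incomplete rows.
    conn.execute(text("""
        CREATE TABLE IF NOT EXISTS features AS
        SELECT event_time, currency, event, actual, forecast,
               actual - forecast AS surprise
        FROM structured_events
        WHERE actual IS NOT NULL AND forecast IS NOT NULL
    """))
```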
Represents the training, validation, and testing stages of the ML lifecycle.
- Training: Trains models using the prepared MySQL dataset.
- Validation: Assesses model accuracy using performance metrics (MSE, R², etc.).
- Testing: Evaluates the model on unseen data to confirm reliability.
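A compact sketch of these three stages, using a placeholder model, feature set, and target (the production choices are repository-specific):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Load the prepared feature table from MySQL (connection string is a placeholder).
df = pd.read_sql("SELECT * FROM features",
                 "mysql+pymysql://user:password@localhost:3306/mql5_news")

X = df[["forecast"]]  # placeholder feature set
y = df["actual"]      # placeholder target

# Hold out unseen data for the testing stage.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Validation/testing metrics reported by the pipeline.
preds = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, preds))
print("R² :", r2_score(y_test, preds))
```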
A FastAPI service exposes REST endpoints to trigger training, validation, and prediction.
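For illustration, a minimal version of that service might look like this; the endpoint paths, request schema, and stand-in prediction logic are assumptions:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="MQL5 News Model API")

class PredictRequest(BaseModel):
    forecast: float  # placeholder input feature

@app.post("/predict")
def predict(req: PredictRequest) -> dict:
    # In production, the registered MLflow model would be loaded and applied here.
    prediction = req.forecast  # stand-in for model.predict(...)
    return {"prediction": prediction}

@app.post("/train")
def train() -> dict:
    # Would launch the training job described above, e.g. as a background task.
    return {"status": "training started"}
```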
- Uses MLflow to record metrics, parameters, and artifacts for every experiment.
- Manages all model versions through the MLflow Model Registry.
- Enables experiment reproducibility and controlled production rollout.
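A sketch of the tracking pattern; the experiment name, parameters, and registry name are placeholders, and model registration requires a registry-enabled tracking server:

```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestRegressor

# Toy model so the snippet is self-contained; the real pipeline passes in
# the estimator fitted during training.
model = RandomForestRegressor(n_estimators=200).fit([[0.0], [1.0]], [0.0, 1.0])

mlflow.set_experiment("mql5-news")

with mlflow.start_run():
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("mse", 0.042)  # metric from the validation stage
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="mql5-news-regressor",  # Model Registry entry
    )
```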
- A JavaScript dashboard visualizes live predictions and key metrics in real time.
- Communicates with the FastAPI endpoints to stream results and monitor performance.
- Provides actionable insights for economic data and event analysis.
- Each stage of the pipeline is Dockerized for environment consistency.
- Deployed on Kubernetes (GCP) for scaling, load balancing, and reliability.
- Airflow orchestrates retraining, monitoring, and periodic updates.
- Designed for modular scaling — each component operates independently in production.
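A skeletal DAG illustrating how Airflow ties the stages together; the task callables, IDs, and daily schedule are assumptions:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Stubs standing in for the pipeline's real entry points.
def scrape(): ...
def repair_schema(): ...
def spark_transform(): ...
def retrain_model(): ...

with DAG(
    dag_id="mql5_news_pipeline",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",  # assumed cadence
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="scrape", python_callable=scrape)
    t2 = PythonOperator(task_id="repair_schema", python_callable=repair_schema)
    t3 = PythonOperator(task_id="spark_transform", python_callable=spark_transform)
    t4 = PythonOperator(task_id="retrain_model", python_callable=retrain_model)
    t1 >> t2 >> t3 >> t4
```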
The MQL5 Economic News Data Pipeline 2025 delivers a scalable, modular, and automated production pipeline for financial and economic data.
It unifies the full ML lifecycle — ingestion, schema repair, distributed processing, model training, versioning, and deployment — in a robust, cloud-native environment.
This repository serves as a reference architecture and implementation baseline for enterprise-grade ML systems focused on automation, reproducibility, and performance.
A full playlist walkthrough explaining this pipeline — including architecture, components, and workflow execution — will be uploaded to Big Data Brain (@bdb5905) on YouTube.
Subscribe to the channel to get notified when it goes live and for more content on Big Data, Machine Learning Pipelines, and Production Systems.
