Mage Pipeline Examples 🚀

A comprehensive collection of example pipelines built with Mage, the modern data orchestration platform. Use these pipelines to learn, explore, and jumpstart your own data workflows with Mage.


🌟 What is Mage?

Mage is a modern data orchestration platform that simplifies building, running, and monitoring data pipelines. Unlike traditional orchestration tools, Mage offers:

  • 🎯 Python-first approach - Build pipelines using familiar Python syntax
  • 📊 Interactive development - Develop and test pipelines in a notebook-style interface
  • 🔄 Real-time monitoring - Built-in observability and monitoring capabilities
  • 🧩 Modular architecture - Reusable blocks for data loading, transformation, and exporting
  • ☁️ Cloud-native - Easy deployment to AWS, GCP, and Azure
  • 🤖 ML-focused - Specialized features for machine learning workflows

🚀 Mage Pro Features

For enterprise teams and production environments, Mage Pro provides:

  • 💻 Enterprise Support - Dedicated support and SLA guarantees
  • 📊 Advanced Analytics - Enhanced monitoring, alerting, and performance insights
  • 🔒 Security & Compliance - Enterprise-grade security features and compliance tools
  • ⚡ High Performance - Optimized for large-scale data processing
  • 🌐 Multi-tenant Architecture - Support for multiple teams and projects
  • 🔄 Advanced Scheduling - Complex scheduling and dependency management
  • 📊 Custom Dashboards - Tailored monitoring and reporting capabilities

📚 Pipeline Examples

The pipeline examples in this repository are organized by category:

📊 Data Integration

  • API to Database - Extract data from REST APIs and load into databases (see the sketch below)
  • Multi-source Sync - Combine data from multiple APIs, databases, and files
  • Database Replication - Real-time database synchronization and replication
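
As a minimal sketch of the API-to-database pattern (the endpoint URL and CSV payload are hypothetical; swap in your own source), a Mage data loader might look like this:

import io

import pandas as pd
import requests

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader

@data_loader
def load_data_from_api(*args, **kwargs):
    # Hypothetical endpoint; replace with the API you are integrating
    response = requests.get('https://api.example.com/v1/records.csv', timeout=30)
    response.raise_for_status()
    # Parse the CSV payload into a DataFrame for downstream blocks
    return pd.read_csv(io.StringIO(response.text))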

📦 Batch ETL

  • CSV Processing - Process and transform CSV files with data validation (see the sketch below)
  • JSON ETL - Extract, transform, and load JSON data from various sources
  • Combine Python and SQL - Hybrid processing using both Python and SQL operations
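
For CSV processing with validation, a transformer along these lines cleans rows before export (the id and created_at columns are illustrative):

import pandas as pd

if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer

@transformer
def clean_csv_rows(df: pd.DataFrame, *args, **kwargs):
    # Drop exact duplicates and rows missing required fields
    df = df.drop_duplicates().dropna(subset=['id', 'created_at'])
    # Normalize the timestamp column to timezone-aware UTC
    df['created_at'] = pd.to_datetime(df['created_at'], utc=True)
    return df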

🌊 Streaming Pipelines

  • Kafka Consumer - Real-time data processing from Kafka streams (see the sketch below)
  • Real-time Analytics - Live analytics and metrics calculation
  • Event Processing - Process and route events in real-time
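
Mage configures streaming sources like Kafka declaratively in the pipeline editor; as a plain-Python illustration of what the consumer side does (broker address and topic name are hypothetical, using the kafka-python package):

import json

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    'events',                            # hypothetical topic
    bootstrap_servers='localhost:9092',  # hypothetical broker
    value_deserializer=lambda m: json.loads(m.decode('utf-8')),
)

for message in consumer:
    # Each message value is a decoded JSON event, ready for transformation
    print(message.value)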

🤖 ML Models

  • Model Training - End-to-end ML model training pipeline
  • Model Inference - Deploy and serve ML models in production
  • Guide to Accuracy, Precision, and Recall - Learn ML evaluation metrics
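
As a quick, self-contained illustration of those three evaluation metrics with scikit-learn (the labels are made up):

from sklearn.metrics import accuracy_score, precision_score, recall_score

# Toy ground-truth and predicted labels for a binary classifier
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print('accuracy :', accuracy_score(y_true, y_pred))   # correct / total
print('precision:', precision_score(y_true, y_pred))  # TP / (TP + FP)
print('recall   :', recall_score(y_true, y_pred))     # TP / (TP + FN)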

πŸ” Data Quality

  • Validation Pipeline - Automated data validation and quality checks (see the sketch below)
  • Monitoring Dashboard - Real-time data quality monitoring and alerting
  • Anomaly Detection - Detect and handle data anomalies automatically
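
A validation block can be as simple as a transformer that fails fast on bad data; a minimal sketch, assuming hypothetical order_id and amount columns:

import pandas as pd

if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer

@transformer
def validate_orders(df: pd.DataFrame, *args, **kwargs):
    # Fail the run early if the data violates basic expectations
    assert df['order_id'].notnull().all(), 'order_id contains nulls'
    assert df['order_id'].is_unique, 'order_id contains duplicates'
    assert (df['amount'] >= 0).all(), 'amount contains negative values'
    return df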

☁️ Cloud Operations

  • S3 to RDS - Transfer data from AWS S3 to RDS PostgreSQL (see the sketch below)
  • Multi-cloud Sync - Cross-cloud data movement and synchronization
  • Infrastructure Monitoring - Monitor cloud resources and costs
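
A bare-bones version of the S3-to-RDS movement, shown outside Mage's IO helpers (bucket, key, and connection string are placeholders; requires boto3, pandas, SQLAlchemy, and psycopg2):

import io

import boto3
import pandas as pd
from sqlalchemy import create_engine

# Placeholder bucket and key; in practice, read these from environment variables
obj = boto3.client('s3').get_object(Bucket='my-data-bucket', Key='exports/orders.csv')
df = pd.read_csv(io.BytesIO(obj['Body'].read()))

# Placeholder RDS PostgreSQL connection string
engine = create_engine('postgresql://user:password@my-rds-host:5432/mydb')
df.to_sql('orders', engine, if_exists='replace', index=False)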

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • Docker (recommended)
  • Git
  • Mage Pro (optional) - For enterprise features, advanced monitoring, and production-ready deployments

Installation

  1. Clone the repository:

    git clone https://github.com/your-username/mage-pipeline-examples.git
    cd mage-pipeline-examples
  2. Set up Mage using Docker (Recommended):

    # Clone Mage's quickstart template
    git clone https://github.com/mage-ai/compose-quickstart.git mage-setup
    cd mage-setup
    
    # Copy environment file
    cp dev.env .env
    
    # Start Mage
    docker compose up
  3. Access Mage UI: Open your browser and navigate to http://localhost:6789

  4. Import a Pipeline:

    Method 1: Zip Upload (Recommended)

    a. Prepare the pipeline:

    # Navigate to the pipeline directory you want to import
    cd examples/data-integration/api-to-database
    
    # Create a zip file of the pipeline
    zip -r api-to-database-pipeline.zip .

    b. Upload to Mage:

    • Open Mage UI at http://localhost:6789
    • Click on "Pipelines" in the left sidebar
    • Click the "Import" button
    • Select "Upload zip file"
    • Choose your api-to-database-pipeline.zip file
    • Click "Import"

    c. Verify import:

    • The pipeline should appear in your pipelines list
    • Click on the pipeline to view and edit it
    • Follow the setup instructions in the pipeline's README

    Method 2: Manual Copy

    a. Copy pipeline files:

    # Copy the entire pipeline directory to your Mage project
    cp -r examples/data-integration/api-to-database/* /path/to/your/mage/project/pipelines/

    b. Refresh Mage UI:

    • The pipeline should appear automatically
    • If not, restart your Mage server

    Method 3: Git Clone (For Development)

    a. Clone into Mage project:

    # Navigate to your Mage project directory
    cd /path/to/your/mage/project
    
    # Clone specific pipeline
    git clone https://github.com/your-username/mage-pipeline-examples.git temp
    cp -r temp/examples/data-integration/api-to-database/* pipelines/
    rm -rf temp

Post-Import Configuration

After importing a pipeline, you'll need to configure it for your environment:

  1. Install Dependencies:

    # Install required Python packages
    pip install -r requirements.txt
  2. Configure Environment Variables:

    # Create or update .env file in your Mage project root
    echo "API_KEY=your_api_key_here" >> .env
    echo "DATABASE_URL=your_database_url_here" >> .env
  3. Update IO Configuration:

    # Edit io_config.yaml with your database and API credentials
    nano io_config.yaml
  4. Test the Pipeline:

    • Open the pipeline in Mage UI
    • Click "Run" to test the pipeline
    • Check logs for any errors
    • Verify data output

Alternative: Local Installation

# Install Mage
pip install mage-ai

# Start Mage server
mage start your_project_name

📖 How to Use This Repository

1. Browse Examples

Each pipeline example is organized in its own directory with:

  • README.md - Detailed explanation and setup instructions
  • Pipeline files - The actual Mage pipeline code
  • requirements.txt - Python dependencies
  • Sample data (if applicable)

2. Choose Your Pipeline Category

  • Data Integration (examples/data-integration/) - Connect and sync data from various sources
  • Batch ETL (examples/batch-etl/) - Process large datasets in batches
  • Streaming Pipelines (examples/streaming-pipelines/) - Real-time data processing
  • ML Models (examples/ml-models/) - Machine learning workflows and MLOps
  • Data Quality (examples/data-quality/) - Data validation and monitoring
  • Cloud Operations (examples/cloud-ops/) - Cloud infrastructure and data movement

3. Import the Pipeline

Choose your preferred import method:

  • Zip Upload (Recommended) - Upload pipeline as zip file through Mage UI
  • Manual Copy - Copy files directly to your Mage project
  • Git Clone - Clone specific pipeline for development

4. Configure and Run

Each pipeline includes:

  • Prerequisites and dependencies
  • Configuration steps
  • Sample data setup
  • Running instructions

5. Customize for Your Use Case

  • Modify data sources and destinations
  • Adjust transformation logic
  • Add your own business logic
  • Scale for your data volume

πŸ—οΈ Pipeline Structure

Mage pipelines typically consist of three main components:

Data Loaders

Extract data from various sources:

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader

@data_loader
def load_data_from_api(*args, **kwargs):
    # Fetch data here; the return value flows to downstream blocks
    return {'example': [1, 2, 3]}

Transformers

Process and transform your data:

if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer

@transformer
def transform_data(data, *args, **kwargs):
    # `data` is the upstream block's output; return the transformed result
    return data

Data Exporters

Load data to destinations:

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter

@data_exporter
def export_data_to_database(data, *args, **kwargs):
    print(f'Exporting {len(data)} records')  # replace with your destination write
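
When blocks are connected in a pipeline, Mage passes each block's return value to its downstream block as the first positional argument (additional upstream blocks arrive as further arguments), so the three functions above chain together without any explicit glue code.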

🔧 Configuration

Environment Variables

Create a .env file in your Mage project root:

# Database Configuration
POSTGRES_DBNAME=your_database
POSTGRES_USER=your_username
POSTGRES_PASSWORD=your_password
POSTGRES_HOST=localhost
POSTGRES_PORT=5432

# API Keys
API_KEY=your_api_key
WEATHER_API_KEY=your_weather_api_key

# Cloud Configuration
AWS_ACCESS_KEY_ID=your_aws_key
AWS_SECRET_ACCESS_KEY=your_aws_secret
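
Blocks can read these values at runtime with standard library calls, for example (variable names match the file above):

import os

# Credentials are loaded from the .env file into the process environment
db_name = os.getenv('POSTGRES_DBNAME')
api_key = os.getenv('API_KEY')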

IO Configuration

Configure data connections in io_config.yaml:

dev:
  POSTGRES_CONNECT_TIMEOUT: 10
  POSTGRES_DBNAME: "{{ env_var('POSTGRES_DBNAME') }}"
  POSTGRES_USER: "{{ env_var('POSTGRES_USER') }}"
  POSTGRES_PASSWORD: "{{ env_var('POSTGRES_PASSWORD') }}"
  POSTGRES_HOST: "{{ env_var('POSTGRES_HOST') }}"
  POSTGRES_PORT: "{{ env_var('POSTGRES_PORT') }}"
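
Blocks then reference a profile from this file through Mage's IO helpers. A minimal sketch using the dev profile above (the query is illustrative):

from os import path

from mage_ai.data_preparation.repo_manager import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.postgres import Postgres

config_path = path.join(get_repo_path(), 'io_config.yaml')

# Open a Postgres connection using the 'dev' profile defined above
with Postgres.with_config(ConfigFileLoader(config_path, 'dev')) as loader:
    df = loader.load('SELECT 1 AS ok')  # illustrative query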

📊 Monitoring and Observability

Mage provides built-in monitoring capabilities:

  • Pipeline Execution History - Track all pipeline runs
  • Real-time Logs - Monitor pipeline execution in real-time
  • Data Quality Metrics - Built-in data validation and quality checks
  • Performance Metrics - Track execution time and resource usage
  • Error Handling - Automatic retry and failure notifications

🤝 Contributing

We welcome contributions! Here's how you can help:

Adding New Examples

  1. Fork the repository
  2. Create a new directory for your pipeline example
  3. Include:
    • README.md with detailed instructions
    • Pipeline code files
    • requirements.txt
    • Sample data (if applicable)
  4. Submit a pull request

Guidelines

  • Follow Python best practices (PEP 8)
  • Include comprehensive documentation
  • Test your pipelines before submitting
  • Use descriptive commit messages
  • Update this README if adding new categories

Pipeline Requirements

  • Clear, well-commented code
  • Comprehensive setup instructions
  • Error handling and validation
  • Sample data or data generation scripts
  • Documentation of data sources and destinations

📚 Learning Resources

Official Documentation

  • Mage documentation: https://docs.mage.ai
  • Mage GitHub repository: https://github.com/mage-ai/mage-ai

Tutorials and Guides

  • Step-by-step guides and tutorials in the Mage documentation

Community

  • Mage community Slack (invite link on https://www.mage.ai)
  • Issues and discussions on the Mage GitHub repository

πŸ› Troubleshooting

Common Issues

Pipeline fails to start:

  • Check your Python dependencies in requirements.txt
  • Verify environment variables are set correctly
  • Ensure data sources are accessible

Database connection errors:

  • Verify database credentials in io_config.yaml
  • Check network connectivity
  • Ensure database is running and accessible

Import errors:

  • Install missing dependencies: pip install -r requirements.txt
  • Check Python version compatibility
  • Verify import paths

Getting Help

  • Consult the Mage documentation at https://docs.mage.ai
  • Search existing GitHub issues or open a new one
  • Ask questions in the Mage community Slack

📞 Support

If you find this repository helpful, please:

  • ⭐ Star the repository
  • 🍴 Fork it for your own use
  • 🐛 Report issues
  • 💡 Suggest new examples
  • 📢 Share with your network

Happy Data Orchestrating! 🎉

Built with ❤️ using Mage
