ProjectProRepo/Data-Engineering-Projects

Data Engineering Projects

Welcome to the ProjectPro Data Engineering Projects Repository! This repository is designed for aspiring and experienced data engineers who want to master the essential skills required to build scalable data solutions. Whether you're learning to design ETL pipelines, work with big data technologies, or optimize data warehouses, this repository provides hands-on, real-world projects to enhance your expertise.

Introduction

Data engineering is a critical field that involves the design, development, and maintenance of data infrastructure. This repository gives you practical experience on cloud platforms such as AWS, Azure, and GCP, covering topics such as streaming data, ETL pipelines, and real-time analytics. With 33 projects, you'll gain industry-relevant skills to tackle data engineering challenges confidently.

Projects

Explore real-world Data Engineering projects covering cloud-based data pipelines, streaming analytics, ETL processes, and data lake management. Each project includes a structured dataset to help you practice with real-world data scenarios.

| Sl No. | Name | Category | Description | Link |
| --- | --- | --- | --- | --- |
| 1 | Azure Stream Analytics for Real-Time Cab Service Monitoring | Streaming Data Processing | Learn to process real-time cab service data with Azure Stream Analytics for live insights. | Azure Stream Analytics |
| 2 | Build Serverless Pipeline using AWS CDK and Lambda in Python | Cloud Data Engineering | Create a serverless data pipeline using AWS CDK and Lambda functions in Python. | Serverless AWS Pipeline |
| 3 | Build an ETL Pipeline on EMR using AWS CDK and Power BI | ETL & Data Warehousing | Implement an ETL pipeline using AWS EMR and CDK, and visualize insights using Power BI. | ETL Pipeline on AWS EMR |
| 4 | AWS CDK and IoT Core for Migrating IoT-Based Data to AWS | IoT Data Management | Migrate IoT data to AWS using AWS CDK and IoT Core for real-time analytics. | AWS IoT Data Migration |
| 5 | Migration of MySQL Databases to Cloud AWS using AWS DMS | Database Migration | Move on-premises MySQL databases to AWS using Database Migration Service (DMS). | MySQL to AWS Migration |
| 6 | Build an Incremental ETL Pipeline with AWS CDK | ETL Processing | Implement an incremental ETL pipeline using AWS CDK to process and store big data. | Incremental ETL Pipeline |
| 7 | Databricks Real-Time Streaming with Event Hubs and Snowflake | Streaming & Big Data | Process real-time data using Databricks, Azure Event Hubs, and Snowflake for analytics. | Databricks Real-Time Streaming |
| 8 | Build a Scalable Event-Based GCP Data Pipeline using DataFlow | Cloud Data Pipelines | Create an event-driven data pipeline on Google Cloud using Dataflow. | GCP Data Pipeline |
| 9 | SQL Project for Data Analysis using Oracle Database-Part 1 | SQL & Data Analysis | Perform in-depth data analysis using Oracle Database and advanced SQL queries. | SQL Data Analysis |
| 10 | Build an AWS ETL Data Pipeline in Python on YouTube Data | ETL & Cloud Data Engineering | Develop an AWS-based ETL pipeline to process YouTube data using Python. | AWS ETL Pipeline |
| 11 | Build a real-time Streaming Data Pipeline using Flink and Kinesis | Streaming Data Processing | Set up real-time data streaming using Apache Flink and AWS Kinesis for large-scale applications. | Streaming Pipeline with Flink |
| 12 | DevOps Project to Build and Deploy an Azure DevOps CI/CD Pipeline | DevOps & Data Engineering | Implement a CI/CD pipeline for automated deployments in Azure DevOps. | Azure DevOps Pipeline |
| 13 | Graph Database Modeling using AWS Neptune and Gremlin | Graph Databases | Learn to model and query data using AWS Neptune and Gremlin for graph-based analytics. | AWS Neptune Graph Database |
| 14 | AWS CDK Project for Building Real-Time IoT Infrastructure | IoT Data Management | Develop a real-time IoT infrastructure using AWS CDK for scalable data processing. | AWS IoT Infrastructure |
| 15 | Build an ETL Pipeline for Financial Data Analytics on GCP-IaC | ETL & Data Analytics | Implement an ETL pipeline on GCP using Infrastructure as Code (IaC) for financial data processing. | GCP ETL Pipeline |
| 16 | Azure Data Factory and Databricks End-to-End Project | Cloud Data Engineering | Build an end-to-end data engineering pipeline using Azure Data Factory and Databricks. | Azure Data Factory Project |
| 17 | Movielens Dataset Analysis on Azure | Big Data & Analytics | Perform data analysis on the Movielens dataset using Azure cloud services. | Movielens Analysis |
| 18 | Analyse Yelp Dataset with Spark & Parquet Format on Azure Databricks | Data Processing | Process and analyze the Yelp dataset using Apache Spark and Parquet on Azure Databricks. | Yelp Dataset Analysis |
| 19 | GCP Data Ingestion with SQL using Google Cloud Dataflow | Cloud Data Pipelines | Use Google Cloud Dataflow for structured SQL-based data ingestion and processing. | GCP Data Ingestion |
| 20 | Orchestrate Redshift ETL using AWS Glue and Step Functions | ETL & Cloud Data Engineering | Automate Redshift ETL workflows using AWS Glue and Step Functions for optimized data processing. | Redshift ETL Orchestration |
| 21 | Build a Real-Time Spark Streaming Pipeline on AWS using Scala | Streaming Data Processing | Create a real-time Spark streaming pipeline on AWS with Scala for high-speed data ingestion. | AWS Spark Streaming |
| 22 | Hands-On Real-Time PySpark Project for Beginners | Big Data & Spark | Learn real-time data processing using PySpark for big data analytics. | PySpark Real-Time Project |
| 23 | SQL Project for Data Analysis using Oracle Database-Part 2 | SQL & Data Analysis | Apply advanced SQL data analysis techniques using Oracle Database. | SQL Analysis Part 2 |
| 24 | Databricks Data Lineage and Replication Management | Data Governance | Implement data lineage tracking and replication in Databricks for enterprise-scale data management. | Databricks Data Lineage |
| 25 | SQL Project for Data Analysis using Oracle Database-Part 5 | SQL & Data Analysis | Final part of an in-depth SQL data analysis series using Oracle Database. | SQL Analysis Part 5 |
| 26 | PySpark Project to Learn Advanced DataFrame Concepts | Big Data & Spark | Master advanced PySpark DataFrame concepts for large-scale data processing. | PySpark Advanced Concepts |
| 27 | PySpark ETL Project for Real-Time Data Processing | Streaming Data Processing | Build an ETL pipeline using PySpark for real-time data transformation and storage. | PySpark ETL Project |
| 28 | Build an Analytical Platform for eCommerce using AWS Services | Cloud Data Engineering | Develop an analytics platform for eCommerce businesses using AWS data services. | AWS eCommerce Analytics |
| 29 | GCP Project-Build Pipeline using Dataflow Apache Beam Python | Cloud Data Pipelines | Use Apache Beam with Google Cloud Dataflow to build scalable data pipelines. | GCP Apache Beam Pipeline |
| 30 | Airline Dataset Analysis using PySpark GraphFrames in Python | Big Data & Graph Analytics | Perform airline data analysis using PySpark GraphFrames for relationship-based analytics. | Airline Data Analysis |
| 31 | Build Streaming Data Pipeline using Azure Stream Analytics | Streaming Data Processing | Implement a scalable real-time data pipeline using Azure Stream Analytics. | Azure Streaming Pipeline |
| 32 | SQL Project for Data Analysis using Oracle Database-Part 4 | SQL & Data Analysis | Explore deeper SQL data analysis techniques in Oracle Database. | SQL Analysis Part 4 |
| 33 | Azure Project to Build a Real-time ADF Pipeline with LogicApps | Cloud Data Integration | Design and implement a real-time Azure Data Factory pipeline integrated with LogicApps. | Azure ADF Project |
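
Several of the projects listed here are built around ETL and incremental ETL pipelines. As a rough illustration of that idea only, here is a minimal sketch in plain Python. It is not taken from any of the listed projects, and the `Row` schema, the function names, and the integer timestamps are all hypothetical. It shows one common way an incremental run might be structured: extract only rows newer than a stored watermark, transform them, and upsert them into the target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical source schema, used for illustration only.
    id: int
    updated_at: int      # e.g. a Unix timestamp
    amount_cents: int


def extract(source, watermark):
    """Extract: keep only rows modified after the last processed watermark."""
    return [r for r in source if r.updated_at > watermark]


def transform(rows):
    """Transform: convert cents to a decimal amount, keyed by row id."""
    return {
        r.id: {"amount": r.amount_cents / 100, "updated_at": r.updated_at}
        for r in rows
    }


def load(target, records):
    """Load: upsert by key, so re-running the same batch does not duplicate rows."""
    target.update(records)


def run_incremental(source, target, watermark):
    """Run one incremental batch and return the new watermark."""
    rows = extract(source, watermark)
    load(target, transform(rows))
    # Advance the watermark to the newest row seen; keep it if nothing was new.
    return max((r.updated_at for r in rows), default=watermark)
```

In a real pipeline the watermark would be persisted between runs (for example in a metadata table), and the source and target would be databases or object storage rather than in-memory collections.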

Learning Outcomes

By exploring these Data Engineering projects, you will:

  • Master data pipeline development across cloud platforms (AWS, Azure, GCP).
  • Gain hands-on experience with ETL processes, real-time data streaming, and big data management.
  • Work with industry-standard tools, including Spark, Kafka, Snowflake, and Databricks.
  • Apply DevOps and CI/CD best practices to deploy scalable data solutions.

Get Started

Start your project-based data engineering journey today and become a proficient data engineer! Explore the projects, gain hands-on experience, and advance your career.

Happy Learning! :)

About

A repository of solved projects in Data Engineering for beginners and professionals.
