An end-to-end Databricks data pipeline for ongoing clinical trials. It ingests data from the ClinicalTrials.gov API into Delta Lake through Bronze → Silver → Gold layers and produces analytics-ready datasets on sponsors, conditions, and locations.
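The Bronze → Silver → Gold flow can be illustrated with a minimal sketch in plain Python. This is not the project's actual code, which runs on Spark and Delta Lake. The nested field names are assumed to resemble the ClinicalTrials.gov API response, and the sample records, IDs, and function names are hypothetical.

```python
from collections import Counter

# Bronze: raw records kept as received. These samples are made up; the nested
# field names are an assumption about the API's response shape.
bronze = [
    {"protocolSection": {
        "identificationModule": {"nctId": "NCT_DEMO_1"},
        "sponsorCollaboratorsModule": {"leadSponsor": {"name": "Sponsor A"}},
        "conditionsModule": {"conditions": ["Asthma"]},
        "contactsLocationsModule": {"locations": [{"country": "Canada"},
                                                  {"country": "Canada"}]},
    }},
    {"protocolSection": {
        "identificationModule": {"nctId": "NCT_DEMO_2"},
        "sponsorCollaboratorsModule": {"leadSponsor": {"name": "Sponsor A"}},
        "conditionsModule": {"conditions": ["Asthma", "COPD"]},
        "contactsLocationsModule": {"locations": [{"country": "Canada"},
                                                  {"country": "France"}]},
    }},
    {"protocolSection": {}},  # unusable record: no trial identifier
]
bronze.append(bronze[0])  # simulate a duplicate arriving in a later load


def to_silver(raw):
    """Flatten one nested raw study into a flat row, or None if it has no ID."""
    section = raw.get("protocolSection", {})
    nct_id = section.get("identificationModule", {}).get("nctId")
    if not nct_id:
        return None
    sponsor = (section.get("sponsorCollaboratorsModule", {})
               .get("leadSponsor", {}).get("name"))
    locations = section.get("contactsLocationsModule", {}).get("locations", [])
    return {
        "nct_id": nct_id,
        "sponsor": sponsor,
        "conditions": section.get("conditionsModule", {}).get("conditions", []),
        # Distinct countries, so a trial counts once per country.
        "countries": sorted({loc["country"] for loc in locations
                             if loc.get("country")}),
    }


def build_silver(bronze_rows):
    """Silver: drop unusable rows and de-duplicate on nct_id (last one wins)."""
    by_id = {}
    for raw in bronze_rows:
        row = to_silver(raw)
        if row is not None:
            by_id[row["nct_id"]] = row
    return list(by_id.values())


def build_gold(silver_rows):
    """Gold: trial counts per sponsor, condition, and country."""
    gold = {"sponsor": Counter(), "condition": Counter(), "country": Counter()}
    for row in silver_rows:
        if row["sponsor"]:
            gold["sponsor"][row["sponsor"]] += 1
        gold["condition"].update(set(row["conditions"]))
        gold["country"].update(row["countries"])
    return gold


silver = build_silver(bronze)
gold = build_gold(silver)
```

In a Databricks setting, each of these steps would typically read from and write to a Delta table rather than work on in-memory lists.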
- Learning Databricks and Delta Lake hands-on
- Exploring real-world data pipelines and analytics projects
- Interested in mastering data governance, orchestration, and CI/CD in Databricks
- Unity Catalog – managing data access and security
- Databricks Connections – integrating REST APIs and external sources
- Service Principals & User Groups – enterprise-level user management
- Delta Live Tables (DLT) – building reliable ETL pipelines
- Databricks Jobs, Pipelines & Dashboards – orchestration and reporting
- CI/CD Deployment – using Databricks Asset Bundles (DAB)
- GitHub Actions – automating deployments to Databricks
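As a small illustration of the REST API integration topic in the list above, the sketch below follows page tokens until an API stops returning one. It assumes a response with a `studies` list and an optional `nextPageToken`, which should be checked against the current ClinicalTrials.gov API documentation. `iter_studies`, `fake_fetch`, and the sample pages are hypothetical, and the fetch function is injected so the example runs without network access.

```python
from typing import Callable, Iterator, Optional


def iter_studies(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every study, requesting pages until no next-page token is returned."""
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("studies", [])
        token = page.get("nextPageToken")
        if not token:
            break


# Offline stand-in for an HTTP call: maps a page token to a canned response.
_FAKE_PAGES = {
    None: {"studies": [{"id": "demo-1"}, {"id": "demo-2"}], "nextPageToken": "p2"},
    "p2": {"studies": [{"id": "demo-3"}]},  # last page: no token
}


def fake_fetch(token: Optional[str]) -> dict:
    return _FAKE_PAGES[token]


studies = list(iter_studies(fake_fetch))
```

In a real ingestion job, `fetch_page` would wrap an HTTP GET that passes the token as a query parameter, and the yielded records would be appended to the Bronze table unchanged.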
This project is intended for learning and demonstration purposes.