A data pipeline & dashboard for analyzing trash wheel collection data.
Data source: Trash Wheel, semi-autonomous trash interceptors in Baltimore Harbor.
| Stack | Purpose (modern & open-source) |
|---|---|
| dbt | Data transformation pipeline (models, documentation, and tests) in SQL |
| DuckDB | Analytical database engine |
| MotherDuck | Cloud deployment for DuckDB (free plan available) |
| Evidence | BI tool using SQL and Markdown |
| GitHub Actions | CI/CD (runs the pipeline, deploys docs to GitHub Pages) |
```
.
├── dbt_pipeline/
│   ├── macros/
│   ├── models/
│   │   ├── ingest/
│   │   ├── staging/
│   │   └── marts/
│   ├── seeds/
│   ├── dbt_project.yml
│   └── profiles.yml
├── evidence_bi/
│   ├── pages/
│   ├── sources/
│   ├── build/
│   ├── evidence.config.yaml
│   ├── package.json
│   └── etc.
├── .devcontainer/
├── .github/workflows
└── requirements.txt
```
dbt docs: documents the SQL models and shows the pipeline structure (lineage).
- Seeds
  - Static CSV file `trash_wheel_info`: stores wheel names and their associated Google Sheet IDs.
  - Data source: Google Sheet link (contains 4 sheets, one per trash wheel).
- Ingest Layer
  - Seed `trash_wheel_info` -> stg `config_trash_wheel` (drives the downstream extraction process).
  - Source Google Sheets via `get_csv` -> `ingest_trash_wheel` (raw, untransformed data); see the sketch below.
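For illustration only, here is a minimal sketch of what the ingest model could compile to, assuming `get_csv` ultimately wraps DuckDB's `read_csv_auto` over each sheet's CSV export URL. The inlined wheel list, sheet IDs, and URL pattern are hypothetical; the project reads this configuration from the `config_trash_wheel` seed instead.

```sql
-- models/ingest/ingest_trash_wheel.sql -- illustrative sketch, not the project's model.
-- Assumption: each Google Sheet is fetched via its CSV export URL with DuckDB's
-- read_csv_auto (requires httpfs). SHEET_ID_1 / SHEET_ID_2 are placeholders, and the
-- real URL scheme may differ (e.g. gid-based sheet addressing within one workbook).
{% set wheels = [('mister', 'SHEET_ID_1'), ('professor', 'SHEET_ID_2')] %}

{% for wheel_name, sheet_id in wheels %}
select
    '{{ wheel_name }}' as wheel_name,
    *
from read_csv_auto(
    'https://docs.google.com/spreadsheets/d/{{ sheet_id }}/export?format=csv'
)
{% if not loop.last %}union all{% endif %}
{% endfor %}
```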
- Staging Layer (transformation)
  - `ingest_trash_wheel` -> `stg_trash_format` (renames columns, formats date fields, and joins wheel info) -> `stg_trash_unique` (creates a clean, unique primary key for each dumpster by wheel) -> `stg_trash_type` (casts data types). One step is sketched below.
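As one hedged example of the staging logic (the column names `wheel_name`, `dumpster_number`, and `collection_date` are assumptions, not taken from the project), `stg_trash_unique` might derive the one-row-per-dumpster-per-wheel key like this:

```sql
-- models/staging/stg_trash_unique.sql -- illustrative sketch; column names are assumptions.
with formatted as (
    select * from {{ ref('stg_trash_format') }}
),

deduped as (
    select
        *,
        row_number() over (
            partition by wheel_name, dumpster_number
            order by collection_date desc
        ) as row_num
    from formatted
)

select
    -- composite primary key: one row per dumpster per wheel
    wheel_name || '-' || dumpster_number as dumpster_id,
    * exclude (row_num)   -- DuckDB's EXCLUDE drops the helper column
from deduped
where row_num = 1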
- Marts Layer
  - Fact table: `stg_trash_type` -> `fact_trash_collection`
  - Dimension table: `config_trash_wheel` -> `dim_trash_wheel`
  - See below for an example query against the resulting star schema.
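To make the star schema concrete, a downstream query could roll collections up per wheel and month. The join key `wheel_id` and the measure `weight_tons` are assumptions for illustration:

```sql
-- Illustrative query against the marts; key and measure column names are assumptions.
select
    d.wheel_name,
    date_trunc('month', f.collection_date) as month,
    sum(f.weight_tons) as total_weight_tons
from fact_trash_collection as f
join dim_trash_wheel as d using (wheel_id)
group by all                      -- DuckDB shorthand for grouping by all non-aggregates
order by month, d.wheel_name
```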
0. Clone the Repo
1. Create a MotherDuck account and set the environment variable `MOTHERDUCK_TOKEN`.
2. Set Up the Environment

Use a GitHub Codespace (preconfigured via `.devcontainer/`), or install the dependencies on a local machine (Python + Node.js):

| Environment | Dependencies |
|---|---|
| Python 3 | `pip install -r requirements.txt` |
| Node.js | `cd evidence_bi/ && npm install` |
3. Run the dbt Pipeline

```bash
cd dbt_pipeline/
dbt debug   # Verify the connection
dbt build   # Run the full pipeline (ingest -> staging -> marts) and tests
```

4. Build the Dashboard with Evidence
Define the MotherDuck connection in `sources/connection.yaml` and edit the dashboard in `pages/index.md`, then:

```bash
cd evidence_bi/
echo "EVIDENCE_SOURCE__trash_wheel__token=${MOTHERDUCK_TOKEN}" > .env
npm run sources   # Run the source queries against MotherDuck
npm run build     # Build the static dashboard site
```
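For orientation, an Evidence page is a Markdown file containing named SQL blocks; a chart component such as `<LineChart data={collections_by_month} x=month y=total_weight_tons/>` then references a query by its name. A hypothetical query for `pages/index.md` might look like this (the query name and measure column are assumptions, not the project's actual page; `trash_wheel` matches the source name implied by the `EVIDENCE_SOURCE__trash_wheel__token` variable above):

```sql
-- A named "collections_by_month" sql block inside pages/index.md
-- (illustrative; column names are assumptions)
select
    date_trunc('month', collection_date) as month,
    sum(weight_tons) as total_weight_tons
from trash_wheel.fact_trash_collection
group by all
order by month
```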
- Test and build the dbt pipeline
- Mart analysis tables
- Incremental models (ingest or refresh); see the sketch below
- GitHub Actions CI/CD
- Dashboard deployment
- Analysis and insight generation (TODO)
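For the planned incremental work, one possible shape (a sketch under assumed column names, not a committed design) is a dbt incremental model that only loads rows newer than the latest collection date already in the table:

```sql
-- Illustrative incremental model; the unique_key and column names are assumptions.
{{ config(materialized='incremental', unique_key='dumpster_id') }}

select *
from {{ ref('stg_trash_type') }}

{% if is_incremental() %}
-- On incremental runs, only pick up collections newer than what is already loaded.
where collection_date > (select max(collection_date) from {{ this }})
{% endif %}
```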