trash-wheel-analysis

A data pipeline & dashboard for analyzing trash wheel collection data.

Data Source: Trash Wheel: Semi-autonomous trash interceptors in Baltimore Harbor

[Architecture overview diagram]



Data Stack

| Stack | Purpose (modern & open-source) |
| --- | --- |
| dbt | Data transformation pipeline (SQL models, documentation, and tests) |
| DuckDB | Analytical database engine |
| MotherDuck | Cloud deployment for DuckDB (free plan available) |
| Evidence | BI tool using SQL and Markdown |
| GitHub Actions | CI/CD (runs the pipeline, deploys docs to GitHub Pages) |
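
For orientation: MotherDuck is reached from any DuckDB client through the md: scheme, so the warehouse can be inspected interactively. A minimal sketch, assuming a valid MotherDuck token is visible to DuckDB via its environment; the database name trash_wheel is an illustrative assumption, not necessarily the one this repo creates:

```sql
-- Attach a MotherDuck database from a local DuckDB session
-- (database name 'trash_wheel' is a placeholder).
ATTACH 'md:trash_wheel';
SHOW DATABASES;
```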

Structure

.
├── dbt_pipeline/
│   ├── macros/
│   ├── models/
│   │   ├── ingest/
│   │   ├── staging/
│   │   └── marts/
│   ├── seeds/
│   ├── dbt_project.yml
│   └── profiles.yml
│
├── evidence_BI/
│   ├── pages/
│   ├── sources/
│   ├── build/
│   ├── evidence.config.yaml
│   ├── package.json
│   └── etc.
│
├── .devcontainer/
├── .github/workflows/
└── requirements.txt

Data Model, DAG, and Documentation

dbt docs: shows SQL model definitions, documentation, and pipeline structure.

[dbt DAG diagram]
  1. Seeds

    Static CSV seed trash_wheel_info: stores wheel names and their associated Google Sheet IDs.

    Data source: Google Sheet link (contains four sheets, one per trash wheel).

  2. Ingest Layer

    seed: trash_wheel_info -> stg: config_trash_wheel (config for the downstream extraction process)

    Source Google Sheet, fetched by get_csv -> ingest_trash_wheel (raw, untransformed data); see the first sketch after this list.

  3. Staging Layer (Transformation)

    ingest_trash_wheel ->

    -> stg_trash_format (renames columns, formats date fields, and joins wheel info)

    -> stg_trash_unique (creates a clean, unique primary key for each dumpster by wheel)

    -> stg_trash_type (casts data types)

    (a condensed sketch of these transforms follows the list)

  4. Marts Layer

    Fact Table:

    stg_trash_type -> fact_trash_collection

    Dimension Table:

    config_trash_wheel -> dim_trash_wheel
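
To make the ingest step concrete, here is a minimal sketch of what pulling one sheet could look like, assuming DuckDB's httpfs extension and Google's CSV export endpoint; the repo's actual get_csv macro may differ, and <SHEET_ID> is a placeholder:

```sql
-- models/ingest/ingest_trash_wheel.sql (hypothetical sketch, not the repo's macro)
-- Reads one wheel's Google Sheet as CSV over HTTP; <SHEET_ID> is a placeholder.
select *
from read_csv_auto('https://docs.google.com/spreadsheets/d/<SHEET_ID>/export?format=csv')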
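
The three staging models might condense to something like the sketch below; every column name here is an assumption for illustration, not the repo's actual schema:

```sql
-- Hypothetical staging sketch covering the rename / unique-key / cast steps
select
    wheel_name,                                           -- joined in from config_trash_wheel
    strptime(dumpster_date, '%m/%d/%Y') as collected_at,  -- format date fields
    wheel_name || '-' || dumpster_number as dumpster_id,  -- unique key per dumpster by wheel
    cast(weight_tons as double) as weight_tons            -- data type casting
from {{ ref('ingest_trash_wheel') }}
```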
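
The mart layer is then a thin projection over the cleaned staging model; again, columns are illustrative assumptions:

```sql
-- Hypothetical sketch of fact_trash_collection: one row per dumpster haul
select
    dumpster_id,
    wheel_name,
    collected_at,
    weight_tons
from {{ ref('stg_trash_type') }}
```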



Get Started

0. Clone the Repo

1. Create a MotherDuck account and set the MOTHERDUCK_TOKEN environment variable

2. Setup Environment

Use a GitHub Codespace (preconfigured via .devcontainer), OR install Python and Node.js on a local machine.

| Environment | Dependencies |
| --- | --- |
| Python 3 | pip install -r requirements.txt |
| Node.js | cd evidence_bi/ && npm install |

3. Run the dbt Pipeline

cd dbt_pipeline/

dbt debug  # Verify connection

dbt build  # Run the full pipeline (ingest -> staging -> mart) and tests

4. Build the Dashboard with Evidence

Define the MotherDuck connection in sources/connection.yaml;

Edit the dashboard in pages/index.md;

cd evidence_bi/

echo "EVIDENCE_SOURCE__trash_wheel__token=${MOTHERDUCK_TOKEN}" > .env

npm run sources

npm run build
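
For context, a query inside pages/index.md might aggregate the fact table like this; the measure and column names are assumptions based on the mart layer above, not the repo's actual dashboard:

```sql
-- Hypothetical Evidence page query: total tonnage collected per wheel
-- (the table may need the source prefix, e.g. trash_wheel.fact_trash_collection)
select
    wheel_name,
    sum(weight_tons) as total_tons
from fact_trash_collection
group by wheel_name
order by total_tons desc
```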

TBD:

  • test and build dbt
  • mart analysis table
  • incremental (ingest or refresh)
  • GitHub Actions
  • dashboard deploy
  • analysis and insight generation

