
Lab 13

Cloud-based pipelines

You'll be working with your Project team.


Goal

Your regularly updated data is pulled in automatically.


GitHub Actions ➡️ Python ➡️ BigQuery


Create a workflow that runs your ETL on a schedule. Check out the workflows in this repository as examples you can start with; a minimal sketch follows.
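Here's a minimal sketch of a scheduled workflow. The file name, the secret name (`GCP_SERVICE_ACCOUNT_KEY`), and the script name (`etl.py`) are all assumptions you'll need to adapt to your own repository.

```yaml
# .github/workflows/etl.yml -- hypothetical file name; adapt to your repo
name: Scheduled ETL

on:
  schedule:
    - cron: "0 6 * * *"   # every day at 06:00 UTC (cron schedules run in UTC)
  workflow_dispatch:       # lets you trigger a run manually while testing

jobs:
  etl:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      # GCP_SERVICE_ACCOUNT_KEY is an assumed secret name -- see the Hints section
      - run: python etl.py
        env:
          GOOGLE_APPLICATION_CREDENTIALS_JSON: ${{ secrets.GCP_SERVICE_ACCOUNT_KEY }}
```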


Thinking ahead

  • How will you know if it worked?
  • If something went wrong, where would you look to find out why?

Hints

There's an extra step: GitHub Actions will need credentials and permissions to write to BigQuery.
How to do it
  1. Create a service account in Google Cloud.
  2. Grant it the appropriate roles (for example, BigQuery Data Editor and BigQuery Job User).
  3. Create a JSON key for it.
  4. Add the key's contents to your repository as a GitHub Actions secret (the sketch below shows one way to use it).
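Once the secret is in place, your Python script can build credentials from it. Here's a minimal sketch, assuming the key is exposed through the GOOGLE_APPLICATION_CREDENTIALS_JSON environment variable (as in the workflow above); the file and table names are placeholders.

```python
# etl.py -- a minimal sketch; file, dataset, and table names are placeholders
import json
import os

from google.cloud import bigquery
from google.oauth2 import service_account

# Parse the service-account key passed in from the GitHub Actions secret
info = json.loads(os.environ["GOOGLE_APPLICATION_CREDENTIALS_JSON"])
credentials = service_account.Credentials.from_service_account_info(info)

client = bigquery.Client(credentials=credentials, project=info["project_id"])

# Load a local CSV produced earlier in the ETL into a BigQuery table
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition="WRITE_APPEND",
)
with open("output.csv", "rb") as f:
    job = client.load_table_from_file(f, "my_dataset.my_table", job_config=job_config)
job.result()  # wait for the load to finish (raises an exception on failure)
```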

Tips

  • To test the workflow without waiting for the schedule, add a workflow_dispatch trigger so you can run it manually from the Actions tab (as in the workflow sketch above).
  • To protect yourself in case your data gets messed up:
    • Set up table snapshots (see the SQL sketch after this list).
    • Keep separate test and production datasets/tables in BigQuery.
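For the snapshot option, BigQuery supports snapshot tables via SQL. A sketch with placeholder dataset and table names:

```sql
-- Create a lightweight backup of the table; names are placeholders.
-- The optional expiration cleans up old snapshots automatically.
CREATE SNAPSHOT TABLE my_dataset.my_table_backup
CLONE my_dataset.my_table
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
);
```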

Optional


Submit via CourseWorks:

  • Links to your pull request(s)
  • A link to your GitHub Actions run history