Cloud-based pipelines
You'll be working with your Project team.
Your regularly updated data gets pulled in automatically:
GitHub Actions ➡️ Python ➡️ BigQuery
Create a workflow that runs your ETL on a schedule. Check out the workflows in this repository as examples you can start with; a minimal sketch follows.
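A scheduled workflow might look something like this sketch. The file name, cron expression, script name `etl.py`, and secret name `GCP_SA_KEY` are all placeholders to adapt, not names fixed by this assignment:

```yaml
# .github/workflows/etl.yml
name: Scheduled ETL

on:
  schedule:
    - cron: "0 6 * * *"  # every day at 06:00 UTC

jobs:
  etl:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python etl.py
        env:
          # Service account key, stored as a repository secret (see below)
          GCP_SA_KEY: ${{ secrets.GCP_SA_KEY }}
```

Note that GitHub Actions cron schedules run in UTC, and scheduled runs can start several minutes late.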
- How will you know if it worked?
- If something went wrong, where would you look to find out why?
There's an extra step
GitHub Actions will need credentials + permissions to write to BigQuery.

How to do it
- Create a service account in Google Cloud.
- Grant it an appropriate role (e.g., BigQuery Data Editor, plus BigQuery Job User so it can run load jobs).
- Create a JSON key for the service account.
- Add the key's contents to your repository as a GitHub Actions secret.
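Inside your Python script, one way to use that secret is to parse the JSON and build an authenticated client. A sketch assuming the `google-cloud-bigquery` and `pandas` packages (plus `pyarrow` for DataFrame loads) and the hypothetical secret name `GCP_SA_KEY`:

```python
import json
import os

import pandas as pd
from google.cloud import bigquery
from google.oauth2 import service_account

# The workflow exposes the repository secret as an environment variable;
# GCP_SA_KEY is a placeholder -- use whatever you named your secret.
info = json.loads(os.environ["GCP_SA_KEY"])
credentials = service_account.Credentials.from_service_account_info(info)
client = bigquery.Client(credentials=credentials, project=info["project_id"])

# Illustrative load: replace the DataFrame and table ID with your ETL output.
df = pd.DataFrame({"fetched_at": [pd.Timestamp.now(tz="UTC")]})
job = client.load_table_from_dataframe(df, "your_dataset.your_table")
job.result()  # block until the load finishes; raises if it failed
```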
- To test out the workflow, you can (see the trigger sketch after this list):
  - Make the schedule more frequent
  - Set it to run on `push`
  - Run it manually
  - Advanced: run it locally
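The first three options map directly to `on:` triggers in the workflow file; a sketch (the 15-minute cron is just an example, and remember to dial it back afterward):

```yaml
on:
  schedule:
    - cron: "*/15 * * * *"  # temporarily more frequent, for testing
  push:                     # also run on every push while debugging
  workflow_dispatch:        # adds a "Run workflow" button in the Actions tab
```

For the local option, you can export the same environment variable from your downloaded key file before running the script.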
- To protect yourself in case your data gets messed up:
  - Set up table snapshots.
  - Make separate test and production datasets/tables in BigQuery.
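Snapshots can be created with BigQuery DDL; one way, sketched through the Python client with placeholder dataset and table names:

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes credentials are configured as above

# Snapshot the production table before a risky load. Dataset and table
# names are placeholders; this snapshot expires after seven days.
ddl = """
CREATE SNAPSHOT TABLE your_dataset.your_table_snapshot
CLONE your_dataset.your_table
OPTIONS (expiration_timestamp =
         TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY))
"""
client.query(ddl).result()
```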
What to turn in
- Links to your pull request(s)
- A link to your GitHub Actions run history