Lecture 11

Data engineering, continued


We're going to revisit a number of concepts from earlier.


What can go wrong in data loading/manipulation? What errors/bugs have you hit?


What would you want to happen?


Failure modes

  • Graceful degradation
  • Examples?
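
One possible example of graceful degradation in data loading is sketched below: a bad row is logged and set aside so it does not crash the whole load. The column names (`id`, `amount`), the `load_rows` function, and the sample data are all made up for illustration. Whether skipping bad rows is the right behavior depends on the pipeline.

```python
import csv
import io
import logging

logger = logging.getLogger("loader")


def load_rows(text):
    """Parse CSV text, setting aside rows whose 'amount' is not a number."""
    good, bad = [], []
    # Line 1 is the header, so the first data row is line 2.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        try:
            good.append({"id": row["id"], "amount": float(row["amount"])})
        except (KeyError, TypeError, ValueError) as exc:
            # Degrade gracefully: record the problem and keep going.
            logger.warning("skipping line %d: %s", line_no, exc)
            bad.append((line_no, row))
    return good, bad


# Hypothetical input with one malformed value.
sample = "id,amount\na,1.5\nb,oops\nc,3\n"
good, bad = load_rows(sample)
```

Returning the rejected rows alongside the good ones keeps the failure visible, so a later step can decide whether the loss is acceptable or the run should stop.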

Process mapping


DAGs

Directed acyclic graphs

What does that mean?
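
As a rough illustration, the sketch below uses Python's standard-library `graphlib` (Python 3.9+). "Directed" means each dependency points one way, and "acyclic" means no step can end up depending on itself. The step names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a step; its value is the set of steps it depends on.
# Step names are hypothetical.
deps = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}

# A valid run order lists every step after all of its dependencies.
order = list(TopologicalSorter(deps).static_order())

# Making "extract" depend on "report" creates a cycle, so no valid
# order exists and graphlib raises CycleError.
cyclic = dict(deps, extract={"report"})
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
```

Because the graph has no cycles, a scheduler can always find an order in which to run the steps. That guarantee disappears once a cycle is introduced.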


Data processing


Pipelines

From Arshiya:

Why is a DAG different from setting up workflows in GitHub?


  • Useful for complex ETL
  • Dependencies
  • Assets
  • Data
  • Code (continuous integration/deployment)

GitHub Actions configuration


Persistence

Why store the data?
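
One possible answer, sketched under assumptions: if a source is slow, rate-limited, or may change, storing a snapshot lets later runs reuse it without fetching again. The `fetch` function below is a hypothetical stand-in for such a source, and the file name is arbitrary.

```python
import json
import tempfile
from pathlib import Path

calls = {"fetch": 0}


def fetch():
    """Stand-in for a slow or rate-limited source (hypothetical)."""
    calls["fetch"] += 1
    return [{"id": 1, "value": 10}, {"id": 2, "value": 20}]


def load_or_fetch(path):
    """Return stored data if present; otherwise fetch it and store it."""
    if path.exists():
        return json.loads(path.read_text())
    data = fetch()
    path.write_text(json.dumps(data))
    return data


with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "raw.json"
    first = load_or_fetch(snapshot)   # fetches and writes the snapshot
    second = load_or_fetch(snapshot)  # reads the snapshot, no fetch
```

The second call reads from disk, so the source is hit only once. The stored copy also serves as a record of what the pipeline actually saw.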


Data lake/warehouse

Warehouse layers




From Angel:

[using DAGs] increases data pipeline transparency but simultaneously increases reliance on developer discipline. Code flexibility might just as easily turn into production instability.


There are many alternative data integration / workflow orchestration tools.



They're heavy this week, don't wait!