LaunchCodeEducation/data-analysis-data-engineering-part-1-exercise

Data Engineering Part-1 Exercise

Exercise instructions

We want you to create a dbdiagram.io Entity Relationship Diagram (ERD) for each of datasets 1, 2, and 3. Each dataset is organized into a folder containing the .csv files for all parts of that dataset.

Before creating your ERD for a dataset, we recommend you investigate the data using the following steps:

  1. Explore the data:
    • Create a single Excel workbook to hold the data from all of a dataset's .csv files.
    • Import the data from each .csv file into its own sheet tab.
    • Review the data in this format, looking for relationships between tabs and inconsistencies within them.
    • Highlight fields with mixed types or inconsistent formats, and add comments in Excel.
  2. Profile and normalize the data:
    • Decide on appropriate data types and constraints for the columns in each .csv tab.
    • Decide which relationships to define between fields in different .csv tabs.
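
If you want to cross-check what you see in Excel, the profiling steps above can also be sketched in a few lines of Python. This is only a rough illustration, not part of the required workflow: the `customers` and `orders` data, their column names, and the helper functions are all hypothetical, and the type check is deliberately coarse (anything that is not a plain number is labelled text).

```python
import csv
import io
from collections import Counter

def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def classify(value):
    """Return a coarse type label for one raw CSV cell."""
    text = (value or "").strip()
    if text == "":
        return "empty"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "decimal"
    except ValueError:
        return "text"

def profile(rows):
    """Summarise each column: type counts, blanks, and uniqueness."""
    report = {}
    columns = list(rows[0].keys()) if rows else []
    for column in columns:
        values = [(row[column] or "").strip() for row in rows]
        kinds = Counter(classify(v) for v in values)
        filled = [v for v in values if v]
        report[column] = {
            "types": dict(kinds),               # more than one key suggests mixed types
            "has_blanks": kinds["empty"] > 0,   # blanks mean the column cannot be NOT NULL
            # every row filled and distinct: a candidate primary key
            "unique": len(filled) == len(rows) and len(set(filled)) == len(rows),
        }
    return report

def orphan_values(child_rows, child_col, parent_rows, parent_col):
    """List child values that have no matching parent value."""
    parent_keys = {(r[parent_col] or "").strip() for r in parent_rows}
    child_keys = {(r[child_col] or "").strip() for r in child_rows}
    child_keys.discard("")
    return sorted(child_keys - parent_keys)

# Hypothetical sample data standing in for two .csv files of one dataset.
customers = read_rows(
    "customer_id,name,signup_date\n"
    "1,Ada,2024-01-05\n"
    "2,Grace,01/07/2024\n"
    "3,Linus,\n"
)
orders = read_rows(
    "order_id,customer_id,total\n"
    "10,1,19.99\n"
    "11,2,n/a\n"
    "12,4,5\n"
)

customers_report = profile(customers)
orders_report = profile(orders)
orphans = orphan_values(orders, "customer_id", customers, "customer_id")

for name, info in {**customers_report, **orders_report}.items():
    print(name, info)
print("orders.customer_id values missing from customers:", orphans)
```

A column whose report shows several type labels, such as the hypothetical `total` here, is a candidate for a cleaning comment in the ERD, and any orphan values are worth noting before you declare a foreign key.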

When creating your dbdiagram.io ERD, we want you to:

  • Identify entities, primary keys, and foreign keys.
  • Draw relationships (cardinalities) and note optional vs. required links.
  • Add brief comments on columns that require cleaning or should remain TEXT/VARCHAR due to variability in the field's data.
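
dbdiagram.io diagrams are written in DBML. As a sketch of how the elements above map onto that syntax, the Python snippet below assembles DBML text for two hypothetical tables (`customers` and `orders`, not taken from the exercise datasets). The helper names are made up for illustration, and the notes are assumed to contain no single quotes. It uses `[pk]` for primary keys, `[not null]` for required columns, `note:` for cleaning comments, and `Ref: ... > ...` for a many-to-one relationship.

```python
def column_line(name, dtype, pk=False, required=False, note=None):
    """Format one DBML column, e.g. `  id integer [pk]`."""
    settings = []
    if pk:
        settings.append("pk")
    elif required:
        settings.append("not null")
    if note:
        # Assumption: the note text contains no single quotes.
        settings.append(f"note: '{note}'")
    suffix = f" [{', '.join(settings)}]" if settings else ""
    return f"  {name} {dtype}{suffix}"

def table_block(name, columns):
    """Format a DBML `Table name { ... }` block from column dicts."""
    lines = [f"Table {name} {{"]
    lines += [column_line(**column) for column in columns]
    lines.append("}")
    return "\n".join(lines)

def ref_line(child, parent):
    """Format a many-to-one reference: many child rows point at one parent row."""
    return f"Ref: {child} > {parent}"

# Hypothetical schema, for illustration only.
customers = [
    {"name": "customer_id", "dtype": "integer", "pk": True},
    {"name": "name", "dtype": "varchar", "required": True},
    {"name": "signup_date", "dtype": "varchar",
     "note": "mixed date formats; clean before casting to date"},
]
orders = [
    {"name": "order_id", "dtype": "integer", "pk": True},
    # A required link: every order must name a customer.
    {"name": "customer_id", "dtype": "integer", "required": True},
    {"name": "total", "dtype": "varchar",
     "note": "contains n/a; keep as text until cleaned"},
]

dbml = "\n\n".join([
    table_block("customers", customers),
    table_block("orders", orders),
    ref_line("orders.customer_id", "customers.customer_id"),
])
print(dbml)
```

In this sketch a required link is expressed by marking the foreign key column `not null`; leaving that setting off (or writing `null`) is one way to signal an optional link. You can just as well type the DBML directly into dbdiagram.io; generating it from Python is only a convenience.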

Deliverable:

  • When you are done, create a .txt file named dataset-[dataset number]-solution.txt for each dataset and paste your final dbdiagram.io ERD code into it.
  • Push your changes to the GitHub repo you created for this exercise, and submit a link to your repo in Canvas.
