We want you to create a dbdiagram.io Entity Relationship Diagram (ERD) for each of datasets 1, 2, and 3. Each dataset is stored in its own folder, which contains one .csv file for each part of the dataset.
Before creating your ERD for a dataset, we recommend you investigate the data using the following steps:
- Explore the data:
  - Create a single Excel file to hold all of the .csv files for the dataset.
  - Import the data from each .csv file into its own sheet tab.
  - Review the data in this format, looking for relationships between files and inconsistencies within them.
  - Highlight fields with mixed types or inconsistent formats, and add comments in Excel.
- Profile and normalize the data:
  - Decide on appropriate data types and constraints for the data in each tab.
  - Decide which relationships to define between fields in different tabs.
  - Identify entities, primary keys, and foreign keys.
  - Draw the relationships (cardinalities) and note which links are optional and which are required.
  - Add brief comments on columns that need cleaning, or that should remain TEXT/VARCHAR because their values vary too much to fit a stricter type.
- When you are done, create a .txt file for each dataset named
dataset-[dataset number]-solution.txt
(for example, dataset-1-solution.txt) and paste your final dbdiagram.io ERD code into it.
- Push your changes to the GitHub repo you created for this exercise, and submit a link to your repo in Canvas.
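The type-profiling steps above can also be roughed out in code before you finalize the ERD. What follows is a minimal, standard-library-only Python sketch. It assumes each .csv file has a header row and that dates are written as YYYY-MM-DD. The helper names (classify, profile_columns, choose_type, to_dbml) and the sample data are hypothetical and are not part of dbdiagram.io. The sketch labels every non-empty cell as integer, decimal, date, or varchar, and picks one type per column. When a column's values are mixed, it falls back to varchar and attaches a note. It then prints a draft DBML Table block. It supplements the manual review in Excel but does not replace it.

```python
import csv
import io
import re
from datetime import datetime
from typing import Optional

# Minimal sketch with hypothetical helper names (not a dbdiagram.io API).
# Assumes each CSV has a header row and dates are written as YYYY-MM-DD.

INTEGER = re.compile(r"[+-]?\d+")
DECIMAL = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")


def classify(value: str) -> str:
    """Label one raw cell as empty, integer, decimal, date, or varchar."""
    v = value.strip()
    if v == "":
        return "empty"
    if INTEGER.fullmatch(v):
        return "integer"
    if DECIMAL.fullmatch(v):
        return "decimal"
    try:
        datetime.strptime(v, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "varchar"


def profile_columns(csv_text: str) -> dict:
    """Map each header to the set of non-empty value kinds seen in that column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kinds = {name: set() for name in reader.fieldnames or []}
    for row in reader:
        for name in kinds:
            kind = classify(row.get(name) or "")
            if kind != "empty":
                kinds[name].add(kind)
    return kinds


def choose_type(kinds: set) -> str:
    """Pick one DBML type; mixed, text-only, or all-empty columns stay varchar."""
    if kinds == {"integer"}:
        return "integer"
    if kinds and kinds <= {"integer", "decimal"}:
        return "decimal"
    if kinds == {"date"}:
        return "date"
    return "varchar"


def to_dbml(table: str, csv_text: str, pk: Optional[str] = None) -> str:
    """Build a draft DBML Table block; relationships are left to you."""
    lines = [f"Table {table} {{"]
    for column, kinds in profile_columns(csv_text).items():
        # Quote column names that contain spaces or punctuation.
        name = column if re.fullmatch(r"\w+", column) else f'"{column}"'
        col_type = choose_type(kinds)
        settings = []
        if column == pk:
            settings.append("pk")
        # Flag columns that fell back to varchar because their values are mixed.
        if col_type == "varchar" and len(kinds) > 1:
            settings.append(f"note: 'mixed values: {', '.join(sorted(kinds))}'")
        suffix = f" [{', '.join(settings)}]" if settings else ""
        lines.append(f"  {name} {col_type}{suffix}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample data standing in for one .csv file.
sample = "id,name,price,added\n1,Widget,9.99,2024-01-05\n2,Gadget,n/a,2024-02-10\n"
print(to_dbml("products", sample, pk="id"))
```

For this sample, price becomes varchar with a mixed-values note because "n/a" appears alongside decimal values, which is exactly the kind of column the profiling step asks you to comment on. Foreign keys and relationships still have to be added by hand. For example, Ref: orders.customer_id > customers.id declares a many-to-one link.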