Description
The term "dataset" is becoming overloaded. It can mean any of the following:
- A BigQuery dataset, which is a collection of tables. This is the original definition that the `datasets` folder was based on.
- A collection of datasets can also be called a dataset, e.g. the Vizgen dataset, which includes the Mouse Brain Map dataset.
- The reverse applies just as well: a subset of one or more larger datasets can also be called a dataset, e.g. the Mouse Brain Map dataset, which is part of the Vizgen dataset.
In addition, we can expect future pipelines that need to onboard multiple datasets in one go. Such pipelines are difficult to fit into the current hierarchy, where every pipeline sits under exactly one dataset.
Proposed
The proposal is to switch from the `datasets/DATASET/PIPELINE` hierarchy to the `pipelines/PIPELINE_GROUP/PIPELINE` hierarchy.
    # CURRENT
    datasets/
        vizgen/ (dataset)
            mouse_brain_map (pipeline)
            some_genome_collection (pipeline)
        covid19/ (dataset)
            national_cases (pipeline)
            racial_stats (pipeline)

    # PROPOSED
    pipelines/
        vizgen/ (pipeline group)
            mouse_brain_map (pipeline)
            some_genome_collection (pipeline)
        covid19/ (pipeline group)
            national_cases (pipeline)
            racial_stats (pipeline)
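
To illustrate how mechanical the rename is, here is a minimal sketch of the path mapping implied by the two trees above. The function name `to_proposed_path` and the constants are hypothetical, and the sketch assumes the only change is the top-level folder name, with each existing dataset folder becoming a pipeline group of the same name. It is not part of any existing tooling in this repo.

```python
from pathlib import PurePosixPath

# Assumption: only the root folder is renamed; group and pipeline names stay as-is.
OLD_ROOT = "datasets"
NEW_ROOT = "pipelines"


def to_proposed_path(old_path: str) -> str:
    """Map datasets/DATASET/PIPELINE to pipelines/PIPELINE_GROUP/PIPELINE.

    Hypothetical helper for illustration only.
    """
    parts = PurePosixPath(old_path).parts
    if not parts or parts[0] != OLD_ROOT:
        raise ValueError(f"expected a path under '{OLD_ROOT}/', got: {old_path!r}")
    # Keep everything after the root unchanged; swap only the root folder.
    return str(PurePosixPath(NEW_ROOT, *parts[1:]))


if __name__ == "__main__":
    for old in (
        "datasets/vizgen/mouse_brain_map",
        "datasets/covid19/national_cases",
    ):
        print(old, "->", to_proposed_path(old))
```

Because the mapping is one-to-one, existing pipelines would keep their names and relative locations. Only code that hardcodes the `datasets/` prefix would need updating.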
Checklist
- I created this issue in accordance with the Code of Conduct.
- This issue is appropriately labeled.