Terminology change: from datasets > dataset > pipeline to pipelines > pipeline_group > pipeline #30

Open
@adlersantos

Description

The term "dataset" is becoming overloaded. It can mean any of the following:

  • A BigQuery dataset, which is a collection of tables. This is the original definition on which we based the datasets folder.
  • A collection of datasets, which can also be called a dataset, e.g. the Vizgen dataset, which includes the Mouse Brain Map dataset.
  • The reverse applies just as well: a subset of one or more larger datasets can also be called a dataset, e.g. the Mouse Brain Map dataset, which is part of the Vizgen dataset.

In addition, we can expect future pipelines that need to onboard multiple datasets in one go. Such pipelines are difficult to fit into the current hierarchy, where each pipeline sits under exactly one dataset folder.

Proposed

The proposal is to switch from the datasets/DATASET/PIPELINE hierarchy to the pipelines/PIPELINE_GROUP/PIPELINE hierarchy.

# CURRENT 
datasets/
    vizgen/                      (dataset)
        mouse_brain_map          (pipeline)
        some_genome_collection   (pipeline)
    covid19/                     (dataset)
        national_cases           (pipeline)
        racial_stats             (pipeline)        


# PROPOSED
pipelines/
    vizgen/                      (pipeline group)
        mouse_brain_map          (pipeline)
        some_genome_collection   (pipeline)
    covid19/                     (pipeline group)
        national_cases           (pipeline)
        racial_stats             (pipeline)        
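
As a rough illustration of what the rename involves on disk, the sketch below moves the top-level datasets/ folder to pipelines/ and lists the resulting (pipeline group, pipeline) pairs. The function name `migrate_layout` and its parameters are hypothetical and not part of this repository. It only handles the folder rename and assumes every pipeline is a subdirectory of its group; any code or docs that reference the old path would still need to be updated separately.

```python
from __future__ import annotations

from pathlib import Path


def migrate_layout(
    repo_root: Path, old: str = "datasets", new: str = "pipelines"
) -> list[tuple[str, str]]:
    """Rename the top-level folder and return (pipeline_group, pipeline) pairs.

    Hypothetical helper for illustration only.
    """
    src = repo_root / old
    dst = repo_root / new
    if not src.is_dir():
        raise FileNotFoundError(f"expected folder not found: {src}")
    if dst.exists():
        raise FileExistsError(f"target already exists: {dst}")

    # The group and pipeline folders keep their names; only the root changes.
    src.rename(dst)

    pairs = []
    for group in sorted(p for p in dst.iterdir() if p.is_dir()):
        for pipeline in sorted(p for p in group.iterdir() if p.is_dir()):
            pairs.append((group.name, pipeline.name))
    return pairs
```

Called as `migrate_layout(Path("."))` from the repository root, this would leave vizgen/ and covid19/ untouched below the renamed root, which matches the proposed tree above.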

Checklist

  • I created this issue in accordance with the Code of Conduct.
  • This issue is appropriately labeled.

Labels

  • cleanup: Cleanup or refactor code
  • revision: readme: Improvements or additions to the README
