Clarify BIDS dataset organization and distribution considerations #688
Hi @yarikoptic, thanks for putting these changes together as it clarifies several pieces for me.
cc @satra
We may also need to change the above statement to more accurately reflect the examples provided. Perhaps to the following:
- Add note about BIDS Raw datasets being distributable without derivatives
- Include dataset_description.json in directory structure examples to emphasize where we observe legit BIDS datasets
- Explain disadvantages of nested dataset organization for distribution
- Clarify that sourcedata can contain Raw, non-BIDS, or derivative datasets
- Add requirement for BIDSVersion key to identify BIDS datasets in subdirectories
- Re-Include example of non-nested dataset organization in my_study folder (I based this change on top of the removal proposal in bids-standard#687)
Co-authored-by: Kabilar Gunalan <[email protected]>
852dcb2 to
32e5edb
WDYT?
I don't know, I don't want to initiate a new discussion (or this discussion again...) but I kind of don't understand why we need to serve all these different cases of where to store what?
Why can't we just say
- root/sourcedata --> out of scanner data
- root/raw --> raw data in BIDS
- root/derivatives --> anything starting with preprocessing, in BIDS
- If you have only one of those three, adding a second level folder is not necessary.
Can someone please explain to me why it is not possible to do this? I'm clearly missing something and I would like to understand...sorry!
@julia-pfarr a quick answer to "Why can't we just say" is that it is not what the standard permits/expects ATM. This repository is for the website, which just provides a slightly expanded, more user-digestible restatement of the specification. If we start saying something here that is not allowed by the specification, we would be doing a disservice IMHO. Any decision to change things should be discussed against the specification first.
TODO: add generic phrasing based on what @effigies suggested

Co-authored-by: Chris Markiewicz <[email protected]>
Co-authored-by: Julia-Katharina Pfarr <[email protected]>
Ok, I see, that makes sense, thanks! So I understand from your answer that my suggestion is technically possible and not totally unreasonable, this is just not the right place for it.
Hi @yarikoptic, the conversation above was resolved, but I don't think the following update to the text has been made. Original: Suggestion:
Co-authored-by: Kabilar Gunalan <[email protected]>
 The following examples show three ways to organize, relative to each other,
 a raw BIDS dataset, a preprocessed derivative dataset, and an analysis that uses both as inputs.

 A collection of derivative datasets may be stored in the `derivatives/` subdirectory
 of a BIDS (or BIDS Derivatives) dataset:

 ```bash
 my_dataset/
   derivatives/
     preprocessed/
     analysis/
   sub-01/
   ...
+  dataset_description.json
 ```

-A BIDS Derivatives dataset may contain references to its input datasets
-in the `sourcedata/` subdirectory:
+Disadvantage is that such organization would complicate distribution of the raw BIDS dataset
+by itself as it would require explicit exclusion of datasets within its `derivatives/` folder.
+
+A BIDS Derivative dataset may contain references to its input datasets
+(could be BIDS Raw, non-BIDS or even other BIDS Derivatives) in the `sourcedata/` subdirectory:

 ```bash
 my_analysis/
   sourcedata/
     raw/
       sub-01/
       ...
+      dataset_description.json
     preprocessed/
       sub-01/
       ...
+      dataset_description.json
 ```

 Note that the `sourcedata/` and `derivatives/` subdirectories constitute dataset boundaries.
-Any contents of these directories may be validated independently,
-but their contents must not affect the interpretation of the nested or containing datasets.
+Any subfolders of these directories may be validated independently, if they are BIDS datasets
+which would be indicated by presence of `dataset_description.json` file(s) in them with a
+REQUIRED `"BIDSVersion"` key.
+It is important to note that their contents must not affect the interpretation of the nested
+or containing datasets.

-Unnested datasets are also possible. For example:
+One potential disadvantage to nesting a BIDS Derivative dataset inside a BIDS Raw dataset, or vice versa,
+is that packaging them for independent sharing or publication can become more complicated.
+It is also possible to completely avoid nesting of BIDS Raw datasets into BIDS Derivative datasets (or vice versa),
+by simply placing them in separate folders, namely `sourcedata/` and `derivatives/` at root level:

 ```bash
 my_study/
-  raw_data/
-    sub-01/
-    ...
+  sourcedata/
+    raw/
+      sub-01/
+      ...
   derivatives/
     preprocessed/
     analysis/
 ```
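
The distribution concern raised in the diff above (sharing the raw BIDS dataset by itself requires explicitly excluding its `derivatives/` folder) could look roughly like this in practice. This is only a minimal Python sketch: the helper name `copy_raw_without_derivatives` and the paths in the usage note are hypothetical and not part of BIDS or any BIDS tool. It relies solely on `shutil.copytree` from the standard library.

```python
import shutil
from pathlib import Path


def copy_raw_without_derivatives(dataset_root, destination):
    """Copy a BIDS Raw dataset, skipping its top-level derivatives/ folder.

    Hypothetical helper for illustration; `destination` must not exist yet.
    """
    dataset_root = Path(dataset_root).resolve()

    def ignore(directory, names):
        # Exclude derivatives/ only when it sits directly under the dataset root;
        # anything with the same name deeper in the tree is left alone.
        if Path(directory).resolve() == dataset_root:
            return [name for name in names if name == "derivatives"]
        return []

    return shutil.copytree(dataset_root, destination, ignore=ignore)
```

A call such as `copy_raw_without_derivatives("my_dataset", "my_dataset_raw_only")` would then produce a copy suitable for sharing on its own. With the non-nested `my_study/` layout this exclusion step should not be needed, because the raw dataset already lives in its own folder.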
The following examples show three ways to organize:

#### 1. BIDS Raw with derivatives/

A collection of derivative datasets may be stored in the `derivatives/` subdirectory

@@ -199,31 +201,46 @@

        analysis/
      sub-01/
      ...
      dataset_description.json

Disadvantage is that such organization would complicate distribution of the raw BIDS dataset
by itself as it would require explicit exclusion of datasets within its `derivatives/` folder.

#### 2. BIDS Derivative with sourcedata/raw

A BIDS Derivative dataset may contain references to its input datasets
(could be BIDS Raw, non-BIDS or even other BIDS Derivatives) in the `sourcedata/` subdirectory:

    my_analysis/
      sourcedata/
        raw/
          sub-01/
          ...
          dataset_description.json
        preprocessed/
          sub-01/
          ...
          dataset_description.json

Note that the `sourcedata/` and `derivatives/` subdirectories constitute dataset boundaries.
Any subfolders of these directories may be validated independently, if they are BIDS datasets
which would be indicated by presence of `dataset_description.json` file(s) in them with a
REQUIRED `"BIDSVersion"` key.
It is important to note that their contents must not affect the interpretation of the nested
or containing datasets.

One potential disadvantage to nesting a BIDS Derivative dataset inside a BIDS Raw dataset, or vice versa,
is that packaging them for independent sharing or publication can become more complicated.

#### 3. BIDS Study with sourcedata/raw and derivatives/

It is also possible to completely avoid nesting of BIDS Raw datasets into BIDS Derivative datasets (or vice versa),
by simply placing them in separate folders, namely `sourcedata/` and `derivatives/` at root level:

    my_study/
      sourcedata/
        raw/
          sub-01/
          ...
      derivatives/
        preprocessed/
        analysis/
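
As a rough illustration of the `"BIDSVersion"` rule in the suggested text, the following Python sketch lists which subfolders of `sourcedata/` and `derivatives/` look like BIDS datasets. The function names `is_bids_dataset` and `find_nested_bids_datasets` are hypothetical, and the check is an assumption-level approximation (presence of `dataset_description.json` with a `"BIDSVersion"` key); it is not a substitute for running the BIDS validator.

```python
import json
from pathlib import Path


def is_bids_dataset(path):
    """Return True if `path` has a dataset_description.json with a "BIDSVersion" key."""
    description = Path(path) / "dataset_description.json"
    if not description.is_file():
        return False
    try:
        content = json.loads(description.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # Unreadable or malformed JSON: treat the folder as not identifiable
        return False
    return isinstance(content, dict) and "BIDSVersion" in content


def find_nested_bids_datasets(root):
    """List subfolders of sourcedata/ and derivatives/ that look like BIDS datasets."""
    root = Path(root)
    found = []
    for boundary in ("sourcedata", "derivatives"):
        boundary_dir = root / boundary
        if not boundary_dir.is_dir():
            continue
        for child in sorted(boundary_dir.iterdir()):
            if child.is_dir() and is_bids_dataset(child):
                found.append(child.relative_to(root).as_posix())
    return found
```

For the `my_analysis/` example, this would be expected to report `sourcedata/raw` and `sourcedata/preprocessed`, while a non-BIDS folder under `sourcedata/` (one without `dataset_description.json`) would be skipped.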
@kabilar is that what you had in mind?
to emphasize where we observe legit BIDS datasets
This PR is on top of the rawdata/ at the top-level of a BIDS dataset #687 by @kabilar
Potential TODOs but may be after in a separate PR: