@yarikoptic
Contributor

@yarikoptic yarikoptic commented Aug 7, 2025

  • Add note about BIDS Raw datasets being distributable without derivatives
  • Include dataset_description.json in directory structure examples
    to emphasize where we observe legit BIDS datasets
  • Explain disadvantages of nested dataset organization for distribution
  • Clarify that sourcedata can contain Raw, non-BIDS, or derivative datasets
  • Add requirement for BIDSVersion key to identify BIDS datasets in subdirectories
  • Adjust example of non-nested dataset organization in my_study folder

This PR is on top of the

by @kabilar

Potential TODOs but may be after in a separate PR:

  • Add "study" DatasetType (should it be added in 1.11.0, or will it be 1.10.1, @effigies?)

Member

@kabilar kabilar left a comment

Hi @yarikoptic, thanks for putting these changes together; they clarify several pieces for me.

cc @satra

@kabilar
Member

kabilar commented Aug 7, 2025

The following examples show three ways to organize, relative to each other,
a raw BIDS dataset, a preprocessed derivative dataset, and an analysis that uses both as inputs.

We may also need to change the above statement to more accurately reflect the examples provided. Perhaps to the following:

The following examples show three ways to organize datasets:

  1. A BIDS Raw dataset with derivative preprocessed and analysis data.
  2. A BIDS Derivative dataset which uses both raw and preprocessed data as inputs.
  3. A BIDS dataset with raw and derivative data.

kabilar and others added 3 commits October 2, 2025 13:24
- Add note about BIDS Raw datasets being distributable without derivatives
- Include dataset_description.json in directory structure examples
  to emphasize where we observe legit BIDS datasets
- Explain disadvantages of nested dataset organization for distribution
- Clarify that sourcedata can contain Raw, non-BIDS, or derivative datasets
- Add requirement for BIDSVersion key to identify BIDS datasets in subdirectories
- Re-Include example of non-nested dataset organization in my_study folder
  (I based this change on top of the removal proposal in
  bids-standard#687)
Co-authored-by: Kabilar Gunalan <[email protected]>
@effigies effigies closed this Oct 2, 2025
@effigies effigies reopened this Oct 2, 2025
Contributor

@effigies effigies left a comment

WDYT?

Member

@julia-pfarr julia-pfarr left a comment

I don't know, I don't want to initiate a new discussion (or this discussion again...), but I don't quite understand why we need to serve all these different cases of where to store what.

Why can't we just say

  • root/sourcedata --> out of scanner data
    root/raw --> raw data in BIDS
    root/derivatives --> anything starting with preprocessing, in BIDS
  • If you have only one of those three, adding a second level folder is not necessary.

Can someone please explain to me why it is not possible to do this? I'm clearly missing something and I would like to understand...sorry!

@yarikoptic
Contributor Author

@julia-pfarr a quick answer to "Why can't we just say" is that it is not what the standard permits/expects ATM.

This repository is for the website, which just provides a somewhat "expanded for user re-digestion" version of the specification. If we start saying something here that the specification does not allow, we would be doing a disservice IMHO.

Any decisions to change things up should be discussed against the specification first.

TODO: add generic phrasing based on what @effigies suggested

Co-authored-by: Chris Markiewicz <[email protected]>
Co-authored-by: Julia-Katharina Pfarr <[email protected]>
@julia-pfarr
Member

Ok, I see, that makes sense, thanks! So I understand from your answer that my suggestion is technically possible and not totally unreasonable; this is just not the right place for it.

@kabilar
Member

kabilar commented Oct 9, 2025

Hi @yarikoptic, the conversation above was resolved, but I don't think the following update to the text has been made.

Original:

The following examples show three ways to organize, relative to each other,
a raw BIDS dataset, a preprocessed derivative dataset, and an analysis that uses both as inputs.

Suggestion:

The following examples show three ways to organize datasets:
## 1. BIDS Raw with derivatives/
## 2. BIDS Derivative with sourcedata/raw
## 3. BIDS Study with sourcedata/raw and derivatives/

Comment on lines 191 to 246
The following examples show three ways to organize, relative to each other,
a raw BIDS dataset, a preprocessed derivative dataset, and an analysis that uses both as inputs.

A collection of derivative datasets may be stored in the `derivatives/` subdirectory
of a BIDS (or BIDS Derivatives) dataset:

```bash
my_dataset/
  derivatives/
    preprocessed/
    analysis/
  sub-01/
  ...
  dataset_description.json
```

A BIDS Derivatives dataset may contain references to its input datasets
in the `sourcedata/` subdirectory:
A disadvantage is that such an organization would complicate distribution of the raw BIDS dataset
by itself, as it would require explicit exclusion of datasets within its `derivatives/` folder.
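To make the distribution concern concrete, here is a minimal Python sketch, assuming the `my_dataset/` layout from this example, that archives the Raw dataset while dropping its nested `derivatives/` folder. `pack_raw_only` is a hypothetical helper, and `tarfile` is just one packaging route; any other distribution tool would need an equivalent exclusion rule.

```python
import tarfile
from pathlib import Path


def pack_raw_only(dataset_dir: str, out_path: str) -> None:
    """Archive a BIDS Raw dataset, skipping its nested derivatives/ folder.

    Hypothetical helper for illustration; assumes `dataset_dir` is the root
    of a Raw dataset laid out like my_dataset/.
    """
    root = Path(dataset_dir)
    top = root.name  # e.g. "my_dataset"

    def skip_derivatives(info: tarfile.TarInfo):
        # info.name is the path inside the archive, e.g. "my_dataset/derivatives/..."
        parts = Path(info.name).parts
        if len(parts) >= 2 and parts[1] == "derivatives":
            return None  # returning None excludes this member (and, for a directory, its contents)
        return info

    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(str(root), arcname=top, filter=skip_derivatives)
```

The explicit filter is exactly the extra step the nested layout forces on anyone who wants to share the Raw dataset alone.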

A BIDS Derivative dataset may contain references to its input datasets
(could be BIDS Raw, non-BIDS or even other BIDS Derivatives) in the `sourcedata/` subdirectory:

```bash
my_analysis/
  sourcedata/
    raw/
      sub-01/
      ...
      dataset_description.json
    preprocessed/
      sub-01/
      ...
      dataset_description.json
```
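A minimal Python sketch, assuming the `my_analysis/` layout from this example, that writes a `dataset_description.json` into each BIDS dataset: the two inputs under `sourcedata/`, and `my_analysis/` itself, since it is a BIDS Derivative dataset. The helper names, the `"1.10.0"` version string, and the `"my-pipeline"` `GeneratedBy` entry are hypothetical placeholders; check the specification for the full set of required and recommended fields.

```python
import json
from pathlib import Path

BIDS_VERSION = "1.10.0"  # assumption: substitute the BIDS version you actually target


def write_description(dataset_dir: Path, name: str, dataset_type: str) -> Path:
    """Write a minimal dataset_description.json marking `dataset_dir` as a BIDS dataset."""
    desc = {"Name": name, "BIDSVersion": BIDS_VERSION, "DatasetType": dataset_type}
    if dataset_type == "derivative":
        # Derivative datasets are expected to state what produced them;
        # "my-pipeline" is a placeholder name.
        desc["GeneratedBy"] = [{"Name": "my-pipeline"}]
    dataset_dir.mkdir(parents=True, exist_ok=True)
    path = dataset_dir / "dataset_description.json"
    path.write_text(json.dumps(desc, indent=2))
    return path


def make_my_analysis(root: Path) -> None:
    """Create a skeleton of my_analysis/ with its two input datasets under sourcedata/."""
    write_description(root, "my_analysis", "derivative")
    write_description(root / "sourcedata" / "raw", "raw", "raw")
    write_description(root / "sourcedata" / "preprocessed", "preprocessed", "derivative")
    for ds in ("raw", "preprocessed"):
        (root / "sourcedata" / ds / "sub-01").mkdir(exist_ok=True)
```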

Note that the `sourcedata/` and `derivatives/` subdirectories constitute dataset boundaries.
Any contents of these directories may be validated independently,
but their contents must not affect the interpretation of the nested or containing datasets.
Any subfolders of these directories may be validated independently if they are BIDS datasets,
as indicated by the presence of a `dataset_description.json` file in them with the
REQUIRED `"BIDSVersion"` key.
It is important to note that their contents must not affect the interpretation of the nested
or containing datasets.
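The boundary rule can be sketched in Python as follows (hypothetical helper names; a sketch, not the BIDS validator's actual logic): a subfolder of `sourcedata/` or `derivatives/` counts as a BIDS dataset only if it holds a `dataset_description.json` with a `"BIDSVersion"` key, so non-BIDS inputs, such as a folder of raw DICOMs, are simply skipped.

```python
import json
from pathlib import Path


def is_bids_dataset(path: Path) -> bool:
    """True if `path` contains a dataset_description.json with a BIDSVersion key."""
    desc = path / "dataset_description.json"
    if not desc.is_file():
        return False
    try:
        data = json.loads(desc.read_text())
    except ValueError:  # covers both malformed JSON and undecodable bytes
        return False
    return isinstance(data, dict) and "BIDSVersion" in data


def nested_datasets(root: Path) -> list[Path]:
    """List immediate subfolders of sourcedata/ and derivatives/ that are BIDS datasets.

    Each one is a dataset boundary: it may be validated on its own, and its
    contents must not affect how `root` itself is interpreted.
    """
    found = []
    for container in ("sourcedata", "derivatives"):
        parent = root / container
        if not parent.is_dir():
            continue
        for child in sorted(parent.iterdir()):
            if child.is_dir() and is_bids_dataset(child):
                found.append(child)
    return found
```

Each returned path could then be passed to a validator separately, while folders without the marker file are left alone.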

Unnested datasets are also possible. For example:
One potential disadvantage of nesting a BIDS Derivative dataset inside a BIDS Raw dataset, or vice versa,
is that packaging them for independent sharing or publication can become more complicated.
It is also possible to completely avoid nesting BIDS Raw datasets into BIDS Derivative datasets (or vice versa)
by simply placing them in separate folders, namely `sourcedata/` and `derivatives/` at the root level:

```bash
my_study/
  raw_data/
    sub-01/
    ...
  sourcedata/
    raw/
      sub-01/
      ...
  derivatives/
    preprocessed/
    analysis/
```
Member

Suggested change
The following examples show three ways to organize:
#### 1. BIDS Raw with derivatives/
A collection of derivative datasets may be stored in the `derivatives/` subdirectory
@@ -199,31 +201,46 @@
    analysis/
  sub-01/
  ...
  dataset_description.json

A disadvantage is that such an organization would complicate distribution of the raw BIDS dataset
by itself, as it would require explicit exclusion of datasets within its `derivatives/` folder.

#### 2. BIDS Derivative with sourcedata/raw

A BIDS Derivative dataset may contain references to its input datasets
(could be BIDS Raw, non-BIDS or even other BIDS Derivatives) in the `sourcedata/` subdirectory:

```bash
my_analysis/
  sourcedata/
    raw/
      sub-01/
      ...
      dataset_description.json
    preprocessed/
      sub-01/
      ...
      dataset_description.json
```

Note that the `sourcedata/` and `derivatives/` subdirectories constitute dataset boundaries.
Any subfolders of these directories may be validated independently if they are BIDS datasets,
as indicated by the presence of a `dataset_description.json` file in them with the
REQUIRED `"BIDSVersion"` key.
It is important to note that their contents must not affect the interpretation of the nested
or containing datasets.

One potential disadvantage of nesting a BIDS Derivative dataset inside a BIDS Raw dataset, or vice versa,
is that packaging them for independent sharing or publication can become more complicated.

#### 3. BIDS Study with sourcedata/raw and derivatives/

It is also possible to completely avoid nesting BIDS Raw datasets into BIDS Derivative datasets (or vice versa)
by simply placing them in separate folders, namely `sourcedata/` and `derivatives/` at the root level:

```bash
my_study/
  sourcedata/
    raw/
      sub-01/
      ...
  derivatives/
    preprocessed/
    analysis/
```

Member

@kabilar is that what you had in mind?
