Clarify BIDS dataset organization and distribution considerations #688

yarikoptic · 2025-08-07T18:22:25Z

- Add note about BIDS Raw datasets being distributable without derivatives
- Include dataset_description.json in directory structure examples to emphasize where we observe legit BIDS datasets
- Explain disadvantages of nested dataset organization for distribution
- Clarify that sourcedata can contain Raw, non-BIDS, or derivative datasets
- Add requirement for BIDSVersion key to identify BIDS datasets in subdirectories
- Adjust example of non-nested dataset organization in my_study folder
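
As an editorial illustration of the `BIDSVersion` item above (not part of the PR), here is a minimal Python sketch of how a tool might decide which subfolders of `sourcedata/` and `derivatives/` are BIDS datasets. The function names `is_bids_dataset` and `nested_bids_datasets` are hypothetical. The check only looks for a `dataset_description.json` file with a `"BIDSVersion"` key and does not validate anything else.

```python
import json
from pathlib import Path


def is_bids_dataset(path):
    """Return True if `path` has a dataset_description.json with a "BIDSVersion" key."""
    description = Path(path) / "dataset_description.json"
    if not description.is_file():
        return False
    try:
        metadata = json.loads(description.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Unreadable or malformed JSON: treat as "not a BIDS dataset" in this sketch.
        return False
    return isinstance(metadata, dict) and "BIDSVersion" in metadata


def nested_bids_datasets(root):
    """List BIDS datasets located directly under sourcedata/ and derivatives/ of `root`."""
    found = []
    for boundary in ("sourcedata", "derivatives"):
        parent = Path(root) / boundary
        if not parent.is_dir():
            continue
        for child in sorted(parent.iterdir()):
            if child.is_dir() and is_bids_dataset(child):
                found.append(child)
    return found
```

With the `my_analysis/` layout discussed in this PR, `nested_bids_datasets("my_analysis")` would report `sourcedata/raw` only if that folder carries its own `dataset_description.json`; a folder without one is treated as non-BIDS content.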

This PR is on top of "Remove example of rawdata/ at the top-level of a BIDS dataset" (#687) by @kabilar.

Potential TODOs, possibly for a separate PR:

- Add "study" DatasetType (should be added in 1.11.0, or will it be in 1.10.1, @effigies?)

kabilar

Hi @yarikoptic, thanks for putting these changes together as it clarifies several pieces for me.

cc @satra

docs/getting_started/folders_and_files/derivatives.md

kabilar · 2025-08-07T20:16:52Z

The following examples show three ways to organize, relative to each other,
a raw BIDS dataset, a preprocessed derivative dataset, and an analysis that uses both as inputs.

We may also need to change the above statement to more accurately reflect the examples provided. Perhaps to the following:

The following examples show three ways to organize datasets:

A BIDS Raw dataset with derivative preprocessed and analysis data.

A BIDS Derivative dataset which uses both raw and preprocessed data as inputs.

A BIDS dataset with raw and derivative data.

docs/getting_started/folders_and_files/derivatives.md

- Add note about BIDS Raw datasets being distributable without derivatives
- Include dataset_description.json in directory structure examples to emphasize where we observe legit BIDS datasets
- Explain disadvantages of nested dataset organization for distribution
- Clarify that sourcedata can contain Raw, non-BIDS, or derivative datasets
- Add requirement for BIDSVersion key to identify BIDS datasets in subdirectories
- Re-include example of non-nested dataset organization in my_study folder

(I based this change on top of the removal proposal in bids-standard#687)

Co-authored-by: Kabilar Gunalan <[email protected]>

docs/getting_started/folders_and_files/derivatives.md

effigies

WDYT?

docs/getting_started/folders_and_files/derivatives.md

julia-pfarr

I don't know, I don't want to initiate a new discussion (or this discussion again...) but I kind of don't understand why we need to serve all these different cases of where to store what?

Why can't we just say

root/sourcedata --> out of scanner data
root/raw --> raw data in BIDS
root/derivatives --> anything starting with preprocessing, in BIDS
If you have only one of those three, adding a second level folder is not necessary.

Can someone please explain to me why it is not possible to do this? I'm clearly missing something and I would like to understand...sorry!

docs/getting_started/folders_and_files/derivatives.md

yarikoptic · 2025-10-02T19:54:58Z

@julia-pfarr a quick answer to "Why can't we just say" is that it is not what the standard permits/expects ATM.

This repository is for the website, which just provides a somewhat expanded re-digestion of the specification for users. If we start saying something here that is not allowed by the specification, we would be doing a disservice IMHO.

Any decisions to change things up should be discussed against the specification first.

@effigies

TODO: add generic phrasing based on what @effigies suggested

Co-authored-by: Chris Markiewicz <[email protected]>
Co-authored-by: Julia-Katharina Pfarr <[email protected]>

docs/getting_started/folders_and_files/derivatives.md

julia-pfarr · 2025-10-02T20:20:23Z

> @julia-pfarr a quick answer to "Why can't we just say" is that it is not what the standard permits/expects ATM.
>
> This repository is for the website, which just provides a somewhat expanded re-digestion of the specification for users. If we start saying something here that is not allowed by the specification, we would be doing a disservice IMHO.
>
> Any decisions to change things up should be discussed against the specification first.

Ok, I see, that makes sense, thanks! So I understand from your answer that my suggestion is technically possible and not totally unreasonable, this is just not the right place for it.

docs/getting_started/folders_and_files/derivatives.md

kabilar · 2025-10-09T19:47:29Z

Hi @yarikoptic, the conversation above was resolved, but I don't think the following update to the text has been made.

Original:

The following examples show three ways to organize, relative to each other,
a raw BIDS dataset, a preprocessed derivative dataset, and an analysis that uses both as inputs.

Suggestion:

The following examples show three ways to organize datasets:
## 1. BIDS Raw with derivatives/
## 2. BIDS Derivative with sourcedata/raw
## 3. BIDS Study with sourcedata/raw and derivatives/

Co-authored-by: Kabilar Gunalan <[email protected]>

julia-pfarr · 2025-10-15T15:40:00Z

docs/getting_started/folders_and_files/derivatives.md

 The following examples show three ways to organize, relative to each other,
 a raw BIDS dataset, a preprocessed derivative dataset, and an analysis that uses both as inputs.

 A collection of derivative datasets may be stored in the `derivatives/` subdirectory
 of a BIDS (or BIDS Derivatives) dataset:

 ```bash
 my_dataset/
  derivatives/
    preprocessed/
    analysis/
  sub-01/
  ...
+  dataset_description.json
 ```

-A BIDS Derivatives dataset may contain references to its input datasets
-in the `sourcedata/` subdirectory:
+Disadvantage is that such organization would complicate distribution of the raw BIDS dataset
+by itself as it would require explicit exclusion of datasets within its `derivatives/` folder.
+
+A BIDS Derivative dataset may contain references to its input datasets
+(could be BIDS Raw, non-BIDS or even other BIDS Derivatives) in the `sourcedata/` subdirectory:

 ```bash
 my_analysis/
  sourcedata/
    raw/
+      sub-01/
+      ...
+      dataset_description.json
    preprocessed/
  sub-01/
  ...
+  dataset_description.json
 ```

 Note that the `sourcedata/` and `derivatives/` subdirectories constitute dataset boundaries.
-Any contents of these directories may be validated independently,
-but their contents must not affect the interpretation of the nested or containing datasets.
+Any subfolders of these directories may be validated independently, if they are BIDS datasets
+which would be indicated by presence of `dataset_description.json` file(s) in them with a
+REQUIRED `"BIDSVersion"` key.
+It is important to note that their contents must not affect the interpretation of the nested
+or containing datasets.

-Unnested datasets are also possible. For example:
+One potential disadvantage to nesting a BIDS Derivative dataset inside a BIDS Raw dataset, or vice versa,
+is that packaging them for independent sharing or publication can become more complicated.
+It is also possible to completely avoid nesting of BIDS Raw datasets into BIDS Derivative datasets (or vice versa),
+by simply placing them in separate folders, namely `sourcedata/` and `derivatives/` at root level:

 ```bash
 my_study/
-  raw_data/
-    sub-01/
-    ...
+  sourcedata/
+    raw/
+      sub-01/
+      ...
  derivatives/
    preprocessed/
    analysis/


Suggested change

The following examples show three ways to organize:

#### 1. BIDS Raw with derivatives/

A collection of derivative datasets may be stored in the `derivatives/` subdirectory

@@ -199,31 +201,46 @@

        analysis/
      sub-01/
      ...
      dataset_description.json

Disadvantage is that such organization would complicate distribution of the raw BIDS dataset
by itself as it would require explicit exclusion of datasets within its `derivatives/` folder.

#### 2. BIDS Derivative with sourcedata/raw

A BIDS Derivative dataset may contain references to its input datasets
(could be BIDS Raw, non-BIDS or even other BIDS Derivatives) in the `sourcedata/` subdirectory:

    my_analysis/
      sourcedata/
        raw/
          sub-01/
          ...
          dataset_description.json
        preprocessed/
      sub-01/
      ...
      dataset_description.json

Note that the `sourcedata/` and `derivatives/` subdirectories constitute dataset boundaries.
Any subfolders of these directories may be validated independently, if they are BIDS datasets
which would be indicated by presence of `dataset_description.json` file(s) in them with a
REQUIRED `"BIDSVersion"` key.
It is important to note that their contents must not affect the interpretation of the nested
or containing datasets.

One potential disadvantage to nesting a BIDS Derivative dataset inside a BIDS Raw dataset, or vice versa,
is that packaging them for independent sharing or publication can become more complicated.

#### 3. BIDS Study with sourcedata/raw and derivatives/

It is also possible to completely avoid nesting of BIDS Raw datasets into BIDS Derivative datasets (or vice versa),
by simply placing them in separate folders, namely `sourcedata/` and `derivatives/` at root level:

    my_study/
      sourcedata/
        raw/
          sub-01/
          ...
      derivatives/
        preprocessed/
        analysis/

@kabilar is that what you had in mind?
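
To make the distribution concern in option 1 concrete, here is a minimal Python sketch (an editorial illustration, not part of the PR) of the "explicit exclusion" step needed when sharing a BIDS Raw dataset that has datasets nested in its `derivatives/` folder. The helper name `copy_without_nested` and its `excluded` parameter are hypothetical. It relies only on the standard-library `shutil.copytree` and its `ignore` callback, and it skips the named folders only at the top level of the dataset.

```python
import shutil
from pathlib import Path


def copy_without_nested(src, dst, excluded=("derivatives",)):
    """Copy the dataset at `src` to `dst`, skipping top-level folders named in `excluded`."""
    src = Path(src).resolve()

    def ignore(directory, names):
        # Only filter entries at the dataset root; deeper folders are copied as-is.
        if Path(directory).resolve() == src:
            return [name for name in names if name in excluded]
        return []

    return shutil.copytree(src, dst, ignore=ignore)
```

With the non-nested `my_study/` layout in option 3, no such filtering is needed: `sourcedata/raw/` can be copied or published as-is.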

yarikoptic mentioned this pull request Aug 7, 2025

Remove example of rawdata/ at the top-level of a BIDS dataset #687

Closed

kabilar reviewed Aug 7, 2025

kabilar reviewed Sep 8, 2025

yarikoptic requested review from effigies and julia-pfarr October 2, 2025 17:11

kabilar and others added 3 commits October 2, 2025 13:24

Remove example of rawdata/ at the top level of a BIDS datasets

775dcb5

clarify about nesting

32e5edb

Co-authored-by: Kabilar Gunalan <[email protected]>

effigies force-pushed the rawdata-example2 branch from 852dcb2 to 32e5edb Compare October 2, 2025 17:24

effigies closed this Oct 2, 2025

effigies reopened this Oct 2, 2025

effigies reviewed Oct 2, 2025

effigies reviewed Oct 2, 2025

julia-pfarr requested changes Oct 2, 2025

Tune up sentences about nesting

1be101e

TODO: add generic phrasing based on what @effigies suggested

Co-authored-by: Chris Markiewicz <[email protected]>
Co-authored-by: Julia-Katharina Pfarr <[email protected]>

yarikoptic commented Oct 2, 2025

julia-pfarr approved these changes Oct 2, 2025

julia-pfarr mentioned this pull request Oct 8, 2025

[WIP] Suggested modifications to directory layout of the bids-study DatasetType bids-standard/bids-specification#2191

Draft

kabilar reviewed Oct 9, 2025

Apply suggestions from code review

7acdf96

Co-authored-by: Kabilar Gunalan <[email protected]>

julia-pfarr reviewed Oct 15, 2025

4 participants