nf-core · ochkalova · Apr 29, 2026 · May 12, 2026
diff --git a/README.md b/README.md
@@ -22,11 +22,12 @@
 ## Introduction
 
 **nf-core/seqsubmit** is a Nextflow pipeline for submitting sequence data to [ENA](https://www.ebi.ac.uk/ena/browser/home).
-Currently, the pipeline supports three submission modes, each routed to a dedicated workflow and requiring its own input samplesheet structure:
+Currently, the pipeline supports four submission modes, each routed to a dedicated workflow and requiring its own input samplesheet structure:
 
 - `mags` for Metagenome Assembled Genomes (MAGs) submission with `GENOMESUBMIT` workflow
 - `bins` for bins submission with `GENOMESUBMIT` workflow
 - `metagenomic_assemblies` for assembly submission with `ASSEMBLYSUBMIT` workflow
+- `reads` for raw sequencing reads submission with `READSUBMIT` workflow
 
 ![seqsubmit workflow diagram](assets/seqsubmit_schema.png)
 
@@ -123,6 +124,38 @@ assembly_2,data/contigs_2.fasta.gz,,,42.7,ERR011323,MEGAHIT,1.2.9
 > [!IMPORTANT]
 > **Samplesheet column requirements**: All columns shown in the example above must be present in your samplesheet, even if some values are empty. Columns must be in exactly the same order as shown.
 
+### `reads` mode (`READSUBMIT`)
+
+The input must follow `assets/schema_input_reads.json`.
+
+Required columns:
+
+- `sample`
+- `sample_accession`
+- `fastq_1`
+- `platform`
+- `instrument`
+- `library_source`
+- `library_selection`
+- `library_strategy`
+
+Optional columns (the column must be present, but the value may be empty):
+
+- `fastq_2` (leave empty for single-end reads)
+- `insert_size`
+- `library_name`
+- `description`
+
+Example `samplesheet_reads.csv`:
+
+```csv
+sample,sample_accession,fastq_1,fastq_2,platform,instrument,library_source,library_selection,library_strategy,insert_size,library_name,description
+illumina_run_001,SAMEA1234567,data/reads_R1.fastq.gz,data/reads_R2.fastq.gz,ILLUMINA,Illumina HiSeq 2000,GENOMIC,RANDOM,WGS,500,HiSeq_library_001,Illumina sequencing of sample XYZ
+```
+
+> [!IMPORTANT]
+> **Samplesheet column requirements**: All columns shown in the example above must be present in your samplesheet, even if some values are empty. Columns must be in exactly the same order as shown.
+
 ## Usage
 
 > [!NOTE]
@@ -142,7 +175,7 @@ The `mags`/`bins` workflow requires databases for completeness/contamination est
 
 | Parameter                                  | Description                                                                                                       |
 | ------------------------------------------ | ----------------------------------------------------------------------------------------------------------------- |
-| `--mode`                                   | Type of the data to be submitted. Options: `[mags, bins, metagenomic_assemblies]`                                 |
+| `--mode`                                   | Type of the data to be submitted. Options: `[mags, bins, metagenomic_assemblies, reads]`                          |
 | `--input`                                  | Path to the samplesheet describing the data to be submitted                                                       |
 | `--outdir`                                 | Path to the output directory for pipeline results                                                                 |
 | `--submission_study` OR `--study_metadata` | ENA study accession (PRJ/ERP) to submit the data to OR metadata file in JSON/TSV/CSV format to register new study |
@@ -161,7 +194,7 @@ General command template:
 ```bash
 nextflow run nf-core/seqsubmit \
    -profile <docker/singularity/...> \
-   --mode <mags|bins|metagenomic_assemblies> \
+   --mode <mags|bins|metagenomic_assemblies|reads> \
    --input <samplesheet.csv> \
    --centre_name <your_centre> \
    --submission_study <your_study> \

diff --git a/assets/schema_input_reads.json b/assets/schema_input_reads.json
@@ -0,0 +1,127 @@
+{
+    "$schema": "https://json-schema.org/draft/2020-12/schema",
+    "$id": "https://raw.githubusercontent.com/nf-core/seqsubmit/main/assets/schema_input_reads.json",
+    "title": "nf-core/seqsubmit pipeline - params.input schema",
+    "description": "Schema for the sample sheet provided with params.input if params.mode is set to 'reads'",
+    "type": "array",
+    "items": {
+        "type": "object",
+        "properties": {
+            "sample": {
+                "type": "string",
+                "pattern": "^\\S+$",
+                "errorMessage": "Sample must be provided and cannot contain spaces",
+                "meta": ["id"],
+                "description": "Unique experiment/run name"
+            },
+            "sample_accession": {
+                "type": "string",
+                "pattern": "^\\S+$",
+                "errorMessage": "Sample accession must be provided and cannot contain spaces",
+                "description": "ENA sample accession of the sample used to generate the reads"
+            },
+            "fastq_1": {
+                "type": "string",
+                "format": "file-path",
+                "exists": true,
+                "pattern": "^\\S+\\.(fq|fastq)(\\.gz)?$",
+                "errorMessage": "FASTQ file must have extension '.fq' or '.fastq' (optionally gzipped)",
+                "description": "Forward reads FASTQ file (single-end or paired-end)"
+            },
+            "fastq_2": {
+                "anyOf": [
+                    {
+                        "type": "string",
+                        "format": "file-path",
+                        "exists": true,
+                        "pattern": "^\\S+\\.(fq|fastq)(\\.gz)?$"
+                    },
+                    {
+                        "type": "string",
+                        "maxLength": 0
+                    }
+                ],
+                "errorMessage": "FASTQ file for reverse reads must have extension '.fq' or '.fastq' (optionally gzipped)",
+                "description": "Reverse reads FASTQ file if paired-end. Leave empty for single-end reads"
+            },
+            "platform": {
+                "type": "string",
+                "pattern": "^\\S+$",
+                "errorMessage": "Platform must be provided and cannot contain spaces",
+                "description": "Sequencing platform (e.g., ILLUMINA, PACBIO_SMRT, OXFORD_NANOPORE, ION_TORRENT)"
+            },
+            "instrument": {
+                "type": "string",
+                "pattern": "^[^\\n]+$",
+                "errorMessage": "Instrument must be provided and cannot span multiple lines",
+                "description": "Sequencer model (e.g., 'Illumina HiSeq 2000', 'PacBio Sequel')"
+            },
+            "library_source": {
+                "type": "string",
+                "pattern": "^\\S+$",
+                "errorMessage": "Library source must be provided and cannot contain spaces",
+                "description": "Library source (GENOMIC, METAGENOMIC, TRANSCRIPTOMIC, etc.)"
+            },
+            "library_selection": {
+                "type": "string",
+                "pattern": "^\\S+$",
+                "errorMessage": "Library selection must be provided and cannot contain spaces",
+                "description": "Library selection (RANDOM, PCR, cDNA, etc.)"
+            },
+            "library_strategy": {
+                "type": "string",
+                "pattern": "^\\S+$",
+                "errorMessage": "Library strategy must be provided and cannot contain spaces",
+                "description": "Library strategy (WGS, RNA-Seq, AMPLICON, etc.)"
+            },
+            "insert_size": {
+                "anyOf": [
+                    {
+                        "type": "number",
+                        "minimum": 0
+                    },
+                    {
+                        "type": "string",
+                        "maxLength": 0
+                    }
+                ],
+                "errorMessage": "Insert size must be a non-negative number or empty",
+                "description": "Fragment/insert size for paired-end reads (optional)"
+            },
+            "library_name": {
+                "anyOf": [
+                    {
+                        "type": "string"
+                    },
+                    {
+                        "type": "string",
+                        "maxLength": 0
+                    }
+                ],
+                "description": "Descriptive library name (optional)"
+            },
+            "description": {
+                "anyOf": [
+                    {
+                        "type": "string"
+                    },
+                    {
+                        "type": "string",
+                        "maxLength": 0
+                    }
+                ],
+                "description": "Free-text description of the experiment (optional)"
+            }
+        },
+        "required": [
+            "sample",
+            "sample_accession",
+            "fastq_1",
+            "platform",
+            "instrument",
+            "library_source",
+            "library_selection",
+            "library_strategy"
+        ]
+    }
+}
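The constraints in `schema_input_reads.json` can be mirrored in a small pre-flight check before launching the pipeline. The following is a minimal Python sketch and is not part of this diff: `check_row` and the constant names are hypothetical, the patterns are adapted from the schema above (anchors dropped in favour of `re.fullmatch`), and the `exists: true` file check is deliberately not reproduced.

```python
import re

# Patterns adapted from assets/schema_input_reads.json (anchors replaced by fullmatch).
NO_WHITESPACE = re.compile(r"\S+")
FASTQ = re.compile(r"\S+\.(fq|fastq)(\.gz)?")
SINGLE_LINE = re.compile(r"[^\n]+")

# Columns listed under "required" in the schema, with the pattern each must satisfy.
REQUIRED = {
    "sample": NO_WHITESPACE,
    "sample_accession": NO_WHITESPACE,
    "fastq_1": FASTQ,
    "platform": NO_WHITESPACE,
    "instrument": SINGLE_LINE,
    "library_source": NO_WHITESPACE,
    "library_selection": NO_WHITESPACE,
    "library_strategy": NO_WHITESPACE,
}


def check_row(row):
    """Return a list of problems found in one samplesheet row (a dict of strings)."""
    problems = []
    for column, pattern in REQUIRED.items():
        value = row.get(column, "")
        if not pattern.fullmatch(value):
            problems.append(f"{column}: missing or invalid value {value!r}")

    # fastq_2 may be empty (single-end); if set, it must look like a FASTQ path.
    fastq_2 = row.get("fastq_2", "")
    if fastq_2 and not FASTQ.fullmatch(fastq_2):
        problems.append(f"fastq_2: invalid value {fastq_2!r}")

    # insert_size may be empty; if set, it must be a number >= 0.
    insert_size = row.get("insert_size", "")
    if insert_size:
        try:
            if float(insert_size) < 0:
                problems.append("insert_size: must be >= 0")
        except ValueError:
            problems.append(f"insert_size: not a number: {insert_size!r}")
    return problems
```

A row from `csv.DictReader` can be passed straight to `check_row`; an empty list means the row satisfies the checks reproduced here, which is weaker than full schema validation by the pipeline.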
diff --git a/conf/modules.config b/conf/modules.config
@@ -176,7 +176,7 @@ process {
         ]
     }
 
-    withName: 'REGISTERSTUDY|GENERATE_ASSEMBLY_MANIFEST' {
+    withName: 'REGISTERSTUDY|GENERATE_ASSEMBLY_MANIFEST|CREATE_READS_MANIFEST' {
         publishDir = [
             enabled: false
         ]

diff --git a/conf/test_reads_paired.config b/conf/test_reads_paired.config
@@ -0,0 +1,34 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Nextflow config file for running minimal tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Defines input files and everything required to run a fast and simple pipeline test.
+
+    Use as follows:
+        nextflow run nf-core/seqsubmit -profile test_reads,<docker/singularity> --outdir <OUTDIR>
+
+----------------------------------------------------------------------------------------
+*/
+
+process {
+    resourceLimits = [
+        cpus: 2,
+        memory: '8.GB',
+        time: '1.h'
+    ]
+}
+
+params {
+    config_profile_name        = 'Test --mode reads profile'
+    config_profile_description = 'Minimal test profile for reads submission'
+
+    // Input data
+    input  = "${projectDir}/assets/samplesheet_reads.csv"
+    outdir = 'test_output'
+
+    mode             = "reads"
+    submission_study = "PRJEB98843"
+    centre_name      = "TEST_CENTER"
+
+    test_upload      = true
+}
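The test profile above points `input` at a reads samplesheet, and the README requires every column to be present in a fixed order even when values are empty. One way to guarantee that when generating a samplesheet programmatically is to fix the header list in code. This is a hedged Python sketch, not something shipped with the pipeline: `render_samplesheet` is a hypothetical helper, and the column order is copied from the README example in this diff.

```python
import csv
import io

# Column order copied from the README example for `reads` mode.
COLUMNS = [
    "sample", "sample_accession", "fastq_1", "fastq_2",
    "platform", "instrument",
    "library_source", "library_selection", "library_strategy",
    "insert_size", "library_name", "description",
]


def render_samplesheet(rows):
    """Render rows (dicts) as CSV text with all columns in the fixed order.

    Missing keys become empty cells; unknown keys raise ValueError.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=COLUMNS,
        restval="",             # fill absent optional values with an empty cell
        extrasaction="raise",   # reject columns the schema does not know about
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Single-end example: fastq_2 and the optional metadata are simply omitted.
single_end = {
    "sample": "illumina_run_001",
    "sample_accession": "SAMEA1234567",
    "fastq_1": "data/reads_R1.fastq.gz",
    "platform": "ILLUMINA",
    "instrument": "Illumina HiSeq 2000",
    "library_source": "GENOMIC",
    "library_selection": "RANDOM",
    "library_strategy": "WGS",
}
print(render_samplesheet([single_end]))
```

Using `extrasaction="raise"` means a misspelled column name fails early instead of silently producing a samplesheet that the schema would reject later.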
diff --git a/docs/output.md b/docs/output.md
@@ -8,7 +8,7 @@ The directories listed below will be created in the results directory (set with
 
 ## Pipeline overview
 
-The pipeline is built using [Nextflow](https://www.nextflow.io/) and performs automated submission of sequence data to ENA. Exact steps and generated outputs depend on the data type and `--mode` executed (`mags`, `bins` or `metagenomic_assemblies`).
+The pipeline is built using [Nextflow](https://www.nextflow.io/) and performs automated submission of sequence data to ENA. Exact steps and generated outputs depend on the data type and `--mode` executed (`mags`, `bins`, `metagenomic_assemblies` or `reads`).
 
 ## `mags` and `bins` outputs
 
@@ -59,6 +59,20 @@ Assembly study registration, manifest generation, and Webin-CLI submission are e
 > Users should read the ENA documentation on referencing submitted data: \
 > metagenomic assemblies: https://ena-docs.readthedocs.io/en/latest/submit/assembly/metagenome/primary.html#assigned-accession-numbers
 
+## `reads` outputs
+
+When `--mode reads` is used, results are written under `reads/`.
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `reads/`
+  - `upload/reads_accessions.tsv`: run accessions assigned to submitted reads.
+
+</details>
+
+Manifest generation and Webin-CLI submission are executed by the workflow, but their intermediate outputs are not currently published into `--outdir` by the pipeline.
+
 ## Common outputs
 
 ### MultiQC