Add READSUBMIT workflow #58
Conversation
Warning: a newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.5.1. For documentation on how to update your pipeline, please see the Synchronisation documentation.
```
--input samplesheet_reads.csv \
--submission_study <your_study> \
--webincli_mode submit \
--test_upload true \
```
This can be just a flag; it doesn't need a value (the presence of `--test_upload` alone should be enough).
Yes, but here it's just for transparency
```
@@ -0,0 +1,56 @@
process CREATE_READS_MANIFEST {
```
Do we need a process for this? Can we just have some Nextflow code to generate it? I'll give it a shot.
This is complicated. Technically yes, but if you want the manifest published to results, it's better to generate it inside the module. Also, I plan to use a Python script in this module to make it more robust.
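A minimal sketch of what that Python manifest generator might look like. The field names follow the ENA Webin-CLI reads manifest format (tab-separated `FIELD<TAB>VALUE` lines); the function name and its parameters are hypothetical, not the module's actual interface:

```python
# Hypothetical sketch of the manifest-generating script mentioned above.
# Writes a Webin-CLI reads manifest; the real module may pass different inputs.

def write_reads_manifest(path, study, sample, name, instrument,
                         library_source, library_selection,
                         library_strategy, fastq_files):
    fields = [
        ("STUDY", study),
        ("SAMPLE", sample),
        ("NAME", name),
        ("INSTRUMENT", instrument),
        ("LIBRARY_SOURCE", library_source),
        ("LIBRARY_SELECTION", library_selection),
        ("LIBRARY_STRATEGY", library_strategy),
    ]
    # One FASTQ line per file (paired-end reads contribute two lines).
    fields += [("FASTQ", f) for f in fastq_files]
    with open(path, "w") as fh:
        for key, value in fields:
            fh.write(f"{key}\t{value}\n")
    return path
```

Keeping this in a script under `bin/` (rather than inline Nextflow) also makes it easy to publish the manifest alongside results and to unit-test it on its own.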
mberacochea
left a comment
I know this is not ready, but I left some notes.
Alright, another way to think about this is to have the whole
samplesheet belong to the same Webin account.
I'm thinking about MAGs, for example: with loads of them, having to
run many seqsubmit runs will make it harder -- chaining pipelines becomes a
bit trickier if the study is an argument instead of a samplesheet element.
On 12/05/2026 12:01, Ekaterina Sakharova wrote:
In workflows/readsubmit.nf
<#58 (comment)>:
> +include { paramsSummaryMultiqc } from '../subworkflows/nf-core/utils_nfcore_pipeline'
+include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline'
+include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_seqsubmit_pipeline'
+
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ RUN THE WORKFLOW
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*/
+
+workflow READSUBMIT {
+
+ take:
+ ch_samplesheet // channel: samplesheet read in from --input
+ submission_study // val: accession of the study to submit to (optional)
+ study_metadata // val: path to study metadata file for study creation (used if no submission_study provided)
I think we should not add it to the samplesheet.
It is better to limit the pipeline to one study per run.
Different studies have different owners (Webin accounts), so we would
need to ask for those in the samplesheet as well (?). It seems very
complicated to me. That is why we have one study for everything
submitted in one run. If you need to submit to another study, run
seqsubmit again with different arguments.
--
Martin Beracochea
MGnify Production Project Leader
Microbiome Informatics
European Bioinformatics Institute (EMBL-EBI)
Ideally, you would then need to check access for all provided studies (do they belong to one account?), because if they don't, the pipeline will crash on the last step. We also have a step for study registration; I would not expect people to have a study already registered. So I expect the study argument/column to be empty for the majority of submissions (if we are talking about external users).
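One cheap local guard in this direction, sketched here under assumptions: it makes no ENA lookup, so it cannot verify account ownership, only that the samplesheet does not mix studies before the last step. The `study` column name is hypothetical:

```python
import csv

def check_single_study(samplesheet_path):
    """Fail fast if the samplesheet references more than one study.

    Sketch only: assumes a hypothetical 'study' column; an empty or
    absent column is allowed (study registration happens later).
    """
    with open(samplesheet_path, newline="") as fh:
        studies = {(row.get("study") or "").strip()
                   for row in csv.DictReader(fh)}
    studies.discard("")  # ignore rows with no study given
    if len(studies) > 1:
        raise ValueError(
            f"Samplesheet references multiple studies: {sorted(studies)}. "
            "Run seqsubmit once per study."
        )
    return studies.pop() if studies else None
```

Verifying that a study actually belongs to the submitting Webin account would still require a call to ENA and is not covered by this check.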
Co-authored-by: Copilot <copilot@github.com>
I've been working on making the submit_study.py script more robust, focusing on early header validation and structured logging. My goal was to ensure that the pipeline fails early with a clear error message if the input CSV/TSV is malformed, rather than crashing downstream. I've verified these changes locally using the mag_no_coverage_paired_reads.nf.test suite, and the test run completed successfully in ~79s on my WSL2 environment. I am still early in my bioinformatics journey, so I would appreciate feedback on the Python syntax to make sure it aligns with nf-core best practices, but the current implementation is functional and passes all local tests. I couldn't push directly to the branch, but you can see the changes here: https://github.com/riceroni18/seqsubmit/blob/dev/bin/submit_study.py
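For context, early header validation along the lines described above could look like this sketch; the required column names and the CSV/TSV handling are assumptions, not necessarily what submit_study.py actually checks:

```python
import csv

# Hypothetical required columns; the real script may expect different ones.
REQUIRED_COLUMNS = {"sample", "fastq_1", "fastq_2"}

def validate_headers(path):
    """Read only the header row and fail with a clear message if malformed."""
    delimiter = "\t" if path.endswith(".tsv") else ","
    with open(path, newline="") as fh:
        header = next(csv.reader(fh, delimiter=delimiter), None)
    if header is None:
        raise SystemExit(f"ERROR: {path} is empty")
    missing = REQUIRED_COLUMNS - {h.strip() for h in header}
    if missing:
        raise SystemExit(
            f"ERROR: {path} is missing required column(s): "
            f"{', '.join(sorted(missing))}"
        )
    return header
```

Failing on the header alone means a malformed samplesheet is rejected before any rows are parsed or any submission work starts.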
Hi @riceroni18, thank you for your message! If you want feedback from us, please create a separate PR from your fork :)
Resolves #28
I was able to actually submit reads with this one, but it still requires some work:
Before merging this, EBI-Metagenomics/mgnify-pipelines-toolkit#155 (Add reads submission support to webin cli wrapper) must be merged and the toolkit updated.
PR checklist
- [ ] Make sure your code lints (`nf-core pipelines lint`).
- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
- [ ] Usage documentation in `docs/usage.md` is updated.
- [ ] Output documentation in `docs/output.md` is updated.
- [ ] `CHANGELOG.md` is updated.
- [ ] `README.md` is updated (including new tool citations and authors/contributors).