
Add seqera:// NIO filesystem for Seqera Platform datasets #6946

Draft
jorgee wants to merge 6 commits into master from 260310-seqera-dataset-fs


jorgee (Contributor) commented Mar 19, 2026

Summary

  • Implements a seqera:// NIO FileSystemProvider in nf-tower, enabling Nextflow pipelines to reference Seqera Platform datasets as standard file paths (e.g. seqera://org/workspace/datasets/name)
  • Path hierarchy: root → org → workspace → resource type → dataset file (with optional @version pinning)
  • All HTTP traffic is delegated to the existing TowerClient / HxClient, so token refresh is shared
  • Registers via META-INF/services/java.nio.file.spi.FileSystemProvider
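The path hierarchy above can be illustrated with a small parser. This is a sketch under stated assumptions, not the SeqeraPath implementation from this PR: the class `SeqeraUriSketch` and its record `Parts` are hypothetical, and the sketch assumes the last `@` in the final segment separates the dataset name from a pinned version.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: split a seqera:// URI into the hierarchy
 * root (depth 0) -> org (1) -> workspace (2) -> resource type (3) -> dataset file (4).
 * Not the actual SeqeraPath implementation.
 */
public final class SeqeraUriSketch {

    static final String PREFIX = "seqera://";

    /** Components of the path; fields beyond the current depth are null. */
    record Parts(String org, String workspace, String resourceType,
                 String dataset, String version, int depth) {}

    static Parts parse(String uri) {
        if (!uri.startsWith(PREFIX))
            throw new IllegalArgumentException("Not a seqera:// URI: " + uri);
        // Keep only non-empty segments so "seqera://" and trailing slashes parse cleanly
        List<String> segs = new ArrayList<>();
        for (String s : uri.substring(PREFIX.length()).split("/"))
            if (!s.isEmpty()) segs.add(s);
        if (segs.size() > 4)
            throw new IllegalArgumentException("Path deeper than 4 levels: " + uri);

        String dataset = null, version = null;
        if (segs.size() == 4) {
            // Assumption: an optional "@version" suffix on the leaf pins a dataset version
            String leaf = segs.get(3);
            int at = leaf.lastIndexOf('@');
            dataset = at >= 0 ? leaf.substring(0, at) : leaf;
            version = at >= 0 ? leaf.substring(at + 1) : null;
        }
        return new Parts(
                segs.size() > 0 ? segs.get(0) : null,
                segs.size() > 1 ? segs.get(1) : null,
                segs.size() > 2 ? segs.get(2) : null,
                dataset, version, segs.size());
    }

    public static void main(String[] args) {
        System.out.println(parse("seqera://org/workspace/datasets/name"));   // latest version
        System.out.println(parse("seqera://org/workspace/datasets/name@2")); // pinned version
    }
}
```

Splitting on `lastIndexOf('@')` is only one possible rule; a dataset whose name itself contains `@` would need escaping or a different convention.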

New files

| Package | Purpose |
|---|---|
| dataset/DatasetDto, DatasetVersionDto, WorkspaceOrgDto | API response DTOs |
| dataset/SeqeraDatasetClient | API calls: list orgs/workspaces/datasets/versions, download |
| fs/SeqeraPath | Path implementation with a 0–4 depth hierarchy |
| fs/SeqeraFileSystem | FileSystem with lazy org/workspace/dataset caches |
| fs/SeqeraFileSystemProvider | FileSystemProvider SPI: read, list, attributes, copy |
| fs/SeqeraFileAttributes | BasicFileAttributes backed by dataset metadata |
| fs/SeqeraPathFactory | Nextflow PathFactory integration |
| fs/ResourceTypeHandler, fs/DatasetsResourceHandler | Extensibility interface for future resource types |
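To illustrate the extensibility point, here is a hypothetical sketch of how handlers keyed by the resource-type path segment could be registered and looked up. The interface shape (`resourceType()`, `list(org, workspace)`), `DatasetsHandlerStub`, and the registry class are assumptions for illustration; the actual ResourceTypeHandler and DatasetsResourceHandler in this PR may expose different methods.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry of handlers keyed by the resource-type path segment (depth 3). */
public final class ResourceHandlerSketch {

    /** Assumed handler shape; the real ResourceTypeHandler may differ. */
    interface ResourceTypeHandler {
        String resourceType();                           // e.g. "datasets"
        List<String> list(String org, String workspace); // entries under seqera://org/workspace/<type>
    }

    /** In-memory stand-in for DatasetsResourceHandler, keyed by "org/workspace". */
    static final class DatasetsHandlerStub implements ResourceTypeHandler {
        private final Map<String, List<String>> byWorkspace;

        DatasetsHandlerStub(Map<String, List<String>> byWorkspace) {
            this.byWorkspace = byWorkspace;
        }

        public String resourceType() { return "datasets"; }

        public List<String> list(String org, String workspace) {
            return byWorkspace.getOrDefault(org + "/" + workspace, List.of());
        }
    }

    private final Map<String, ResourceTypeHandler> handlers = new ConcurrentHashMap<>();

    void register(ResourceTypeHandler handler) {
        handlers.put(handler.resourceType(), handler);
    }

    /** Looks up the handler for a depth-3 segment, rejecting unknown resource types. */
    ResourceTypeHandler handlerFor(String resourceType) {
        ResourceTypeHandler h = handlers.get(resourceType);
        if (h == null)
            throw new IllegalArgumentException("Unsupported resource type: " + resourceType);
        return h;
    }

    /** Listing a workspace directory (depth 2) yields the registered resource types. */
    List<String> resourceTypes() {
        return handlers.keySet().stream().sorted().toList();
    }

    public static void main(String[] args) {
        ResourceHandlerSketch registry = new ResourceHandlerSketch();
        registry.register(new DatasetsHandlerStub(Map.of("org/workspace", List.of("samples.csv"))));
        System.out.println(registry.resourceTypes());
        System.out.println(registry.handlerFor("datasets").list("org", "workspace"));
    }
}
```

With this shape, supporting a new resource type would mean registering one more handler rather than changing the path or filesystem classes.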

Changes to existing files

  • TowerClient: added a public sendApiRequest() method and GET support in makeRequest(); moved initHttpClient() out of the constructor into TowerFactory
  • TowerFactory: client() now also activates when accessToken is present (not only when enabled = true), so seqera:// paths work without explicitly setting tower.enabled
  • TowerPlugin: registers SeqeraPathFactory
  • FileHelper: recognises the seqera:// scheme
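The new activation rule in TowerFactory can be summarized as a single predicate. This is a hypothetical sketch (`clientActive` is not an actual method); it assumes a blank token counts as absent and does not distinguish an explicit `enabled = false` from an unset value, which the real factory may handle differently.

```java
/** Hypothetical sketch of the activation rule described for TowerFactory.client(). */
public final class TowerActivationSketch {

    /**
     * The client is created when tower.enabled is true, or when an access token is
     * available even though tower.enabled was not set (assumption: blank tokens count as absent).
     */
    static boolean clientActive(boolean enabled, String accessToken) {
        boolean hasToken = accessToken != null && !accessToken.isBlank();
        return enabled || hasToken;
    }

    public static void main(String[] args) {
        System.out.println(clientActive(true, null));     // enabled explicitly
        System.out.println(clientActive(false, "token")); // token only: seqera:// paths still work
        System.out.println(clientActive(false, null));    // neither: no client
    }
}
```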

Test plan

  • SeqeraPathTest — path parsing, URI round-trips, relativize/resolve, getFileName, asUri
  • SeqeraFileSystemTest — cache loading, workspace/dataset resolution, thread safety
  • SeqeraFileSystemProviderTest — newInputStream (latest + pinned version), readAttributes, newDirectoryStream, error propagation
  • SeqeraDatasetClientTest — API URL construction, response mapping, error handling

Run with:

./gradlew :plugins:nf-tower:test

🤖 Generated with Claude Code

jorgee added 4 commits March 19, 2026 11:42
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
netlify bot commented Mar 19, 2026

Deploy Preview for nextflow-docs-staging ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 38420e1 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/nextflow-docs-staging/deploys/69bd3ebf4debe70008802e47 |
| 😎 Deploy Preview | https://deploy-preview-6946--nextflow-docs-staging.netlify.app |

jorgee marked this pull request as draft March 19, 2026 14:09
jorgee (Contributor, Author) commented Mar 19, 2026

Some comments about the current implementation:

  • Some refactoring is needed to decouple the Tower client (API calls) from the Observer. Currently, one client is initialized when the filesystem is initialized and another in the observer. Because of token refresh, the HxClient must be shared; separate clients cause authentication problems. I would like to do this after "Add platform-related metadata in Lineage records" #6545 is merged.
  • The Dataset API does not support streaming content, so reads and writes go through temporary files.
  • Only .csv and .tsv extensions are supported; the format is inferred from the extension.
  • Given the points above, I am considering making the filesystem read-only.
  • Every change to a dataset creates a new version: seqera://org/workspace/datasets/name accesses the latest version, and seqera://org/workspace/datasets/name@version accesses a specific version.
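Reading through a temporary file, the csv/tsv extension check, and optional version pinning can be combined in a minimal sketch. Everything here is hypothetical: `Downloader` stands in for the download call in SeqeraDatasetClient, a `null` version is assumed to mean "latest", and deleting the temp file when the stream is closed is one possible cleanup policy, not necessarily what the PR does.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

/** Hypothetical sketch: read a dataset through a temporary file because the API cannot stream. */
public final class DatasetReadSketch {

    /** Stand-in for the SeqeraDatasetClient download call; version == null means latest (assumption). */
    interface Downloader {
        void downloadTo(String dataset, String version, Path target) throws IOException;
    }

    /** Only .csv and .tsv are accepted; the format is inferred from the extension. */
    static String formatOf(String datasetName) {
        String lower = datasetName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".csv")) return "csv";
        if (lower.endsWith(".tsv")) return "tsv";
        throw new IllegalArgumentException("Unsupported dataset extension: " + datasetName);
    }

    /** Downloads to a temp file, then returns a stream that deletes that file on close. */
    static InputStream open(Downloader client, String dataset, String version) throws IOException {
        String format = formatOf(dataset); // reject unsupported names before any download
        Path tmp = Files.createTempFile("seqera-dataset-", "." + format);
        try {
            client.downloadTo(dataset, version, tmp);
            return new FilterInputStream(Files.newInputStream(tmp)) {
                @Override
                public void close() throws IOException {
                    try {
                        super.close();
                    } finally {
                        Files.deleteIfExists(tmp);
                    }
                }
            };
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(tmp); // do not leak the temp file on failure
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Fake client that writes a tiny CSV tagged with the requested version
        Downloader fake = (name, version, target) ->
                Files.writeString(target, "id,version\n1," + (version == null ? "latest" : version) + "\n");
        try (InputStream in = open(fake, "samples.csv", "3")) {
            System.out.print(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

The trade-off is that the whole dataset is materialized on local disk before the first byte is read, which is the cost of the API not supporting streaming.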

jorgee added 2 commits March 20, 2026 11:39

3 participants