
feat: add WDL module specification and symbolic imports#765

Open
claymcleod wants to merge 23 commits into wdl-1.4 from feat/modules


@claymcleod
Collaborator

@claymcleod claymcleod commented Apr 23, 2026

This introduces a module system to WDL in two parts: a new peer specification at modules/SPEC.md covering the ecosystem (manifest, resolution, lockfile, hashing, signing, credentials, registry), and language-level additions in SPEC.md (the from keyword and two new import forms).

The module specification defines:

  • module.json, the manifest format, including core fields, upstream tools provenance, and a dependencies object supporting version, tag, branch, commit, and local path selectors.
  • module-lock.json, a lockfile that pins the fully resolved tree with commit SHAs and content checksums.
  • A resolution algorithm describing how symbolic paths (<dep-name>[/<sub-path>]) map to modules, including tag-to-manifest version consistency and Go-style path-prefixed tags for multi-module repositories.
  • A content hashing scheme: a deterministic SHA-256 algorithm that serves both lockfile integrity and signature verification.
  • Module signing via optional Ed25519 signatures in module.sig, with trust-on-first-use recorded in the lockfile.
  • Credential management that delegates to Git credential helpers so private repositories work without a new credential store.
  • A registry: a community index at openwdl.github.io/registry that aggregates metadata without becoming a resolution dependency.

The language changes in SPEC.md:

  • Reserve from as a keyword in WDL 1.4 documents. Earlier version declarations continue to parse from as an ordinary identifier.
  • Add two new import forms alongside the existing one, for three forms total:
    1. import <source> [as <alias>] (alias <Old> as <New>)*, the existing namespaced form.
    2. import * from <source>, which brings every task, workflow, and user-defined type from <source> into the importing document's scope, with no namespace.
    3. import { <member> [as <Name>], ... } from <source>, which brings only the listed items, with an optional per-member as <Name> rename, also with no namespace.
  • Unify the two source styles: a <source> is either a quoted URI (resolved per Import URIs) or an unquoted symbolic module path of the form <dep>[/<sub-path>] (resolved through the consuming module's module.json). Once resolved, the two styles produce identical scoping in every form.

JSON Schemas for both module.json and module-lock.json ship under modules/schemas/ and validate the in-spec examples.

Full context: RFC discussion #700.

Before submitting this PR, please make sure:

  • You have added a few sentences describing the PR here.
  • You have considered whether the README.md or other documentation needs updating to account for these changes.
  • You have updated the CHANGELOG.md describing the change and linking back to your pull request.
  • You have read and agree to the CONTRIBUTING.md document.
  • You have considered adding or updating relevant example WDL tests to the specification.
    • See the guide for more details.

For OpenWDL team members:

  • Assign the appropriate individual to this PR.
  • Triage the PR and add appropriate labels.

Closes #226.
Closes #758.

@claymcleod claymcleod changed the base branch from wdl-1.3 to wdl-1.4 April 23, 2026 21:57
github-actions Bot and others added 2 commits April 23, 2026 17:02
Introduces a peer specification at `modules/SPEC.md` covering the
`module.json` manifest, `module-lock.json`, dependency resolution from
Git tags, content hashing, Ed25519 signing, and credential management.
Adds the `from` keyword and a new symbolic import form to `SPEC.md`
with symmetric namespacing rules for tasks, workflows, structs, and
enums. Remote URL imports are soft-deprecated. Ships JSON Schemas for
both the manifest and lockfile.

Closes #758. Closes #226.
Comment thread modules/SPEC.md Outdated
Comment thread modules/SPEC.md
A `module-lock.json` file is no longer required for every module. It is
required for modules whose consumers need reproducible builds, but may
be omitted by modules intended as libraries where version resolution is
deliberately left to the consumer. The spec now also states that
lockfiles apply only to the module they sit in: upstream lockfiles are
not consulted during downstream resolution, so consumers remain in
control of their transitive version choices.

Clarifies engine responsibility for verifying cached module content
against the lockfile and for keeping the lockfile consistent with the
resolved tree. Drops the now-stale "Lockfile as a specification
requirement" rationale bullet, and the "Duplicate dependencies over
conflict resolution" bullet whose premise doesn't carry its weight.
@claymcleod claymcleod changed the title Add WDL module specification and symbolic imports feat: add WDL module specification and symbolic imports Apr 23, 2026
@claymcleod claymcleod self-assigned this Apr 23, 2026
@claymcleod claymcleod requested a review from a team April 23, 2026 22:44
@claymcleod claymcleod added Z-specification-change (Metadata) An issue or PR related to a specification change. S05-in-progress (State) A task that is in progress. K-feature (Kind) A new feature request (for issues) or implementation (for PRs).. T-lang (Topic) Issues related to the syntax and semantic of the language itself. labels Apr 23, 2026
@claymcleod claymcleod marked this pull request as ready for review April 23, 2026 22:44
@claymcleod
Collaborator Author

claymcleod commented Apr 24, 2026

@peterhuene and I have agreed to land a simpler model than the original eight-form proposal. Symbolic imports are interchangeable with quoted imports — the build system resolves a symbolic path to the equivalent of a quoted import. All three forms accept either source type with identical semantics.

The three forms:

  1. import <source> [as <alias>] (alias <Old> as <New>)* — the existing import. UDTs (structs, enums) enter the importing document's scope; tasks and workflows are accessible through a pseudo-namespace but are not exported. as <alias> renames the pseudo-namespace; alias <Old> as <New> renames a UDT. alias cannot rename tasks or workflows.
  2. import * from <source> — every task, workflow, and UDT enters scope.
  3. import { <member> [as <Name>], ... } from <source> — only the listed items enter scope.

<source> is either a quoted URI or a symbolic module path. Forms 2 and 3 don't accept a trailing as <alias> or alias clause.
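As a rough illustration of the three forms and the two source styles described above, the following Python sketch classifies a single import statement. The regular expressions are simplified assumptions made for this example (no comments, no string escapes); the real grammar is whatever SPEC.md defines, and `classify` is a hypothetical helper, not part of any engine.

```python
import re

# Simplified, hypothetical patterns. A source is either a quoted URI or an
# unquoted symbolic path of identifier components separated by "/".
SOURCE = r'(?P<source>"[^"]+"|[A-Za-z][A-Za-z0-9_]*(?:/[A-Za-z][A-Za-z0-9_]*)*)'
IDENT = r'[A-Za-z]\w*'

# Form 1: import <source> [as <alias>] (alias <Old> as <New>)*
FORM_1 = re.compile(
    r'^import\s+' + SOURCE
    + r'(?:\s+as\s+' + IDENT + r')?'
    + r'(?:\s+alias\s+' + IDENT + r'\s+as\s+' + IDENT + r')*\s*$'
)
# Form 2: import * from <source>  (no trailing `as` or `alias`)
FORM_2 = re.compile(r'^import\s+\*\s+from\s+' + SOURCE + r'\s*$')
# Form 3: import { ... } from <source>  (no trailing `as` or `alias`)
FORM_3 = re.compile(r'^import\s+\{[^}]*\}\s+from\s+' + SOURCE + r'\s*$')


def classify(statement):
    """Return (form number, source style), or None if no form matches."""
    text = statement.strip()
    for form, pattern in ((2, FORM_2), (3, FORM_3), (1, FORM_1)):
        match = pattern.match(text)
        if match:
            quoted = match.group("source").startswith('"')
            return form, "quoted" if quoted else "symbolic"
    return None


print(classify('import "csvkit.wdl" as csv'))
print(classify('import * from openwdl/csvkit'))
print(classify('import { CsvSort as MySort, CsvSortStable } from "csvkit.wdl"'))
print(classify('import * from "csvkit.wdl" as csv'))  # trailing `as` is rejected
```

The last call returns `None` because, per the rule above, forms 2 and 3 do not accept a trailing `as <alias>` or `alias` clause.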

What changed from the prior eight-form model:

  • No exported namespaces. The pseudo-namespace in form 1 still exists for accessing tasks and workflows, but it isn't part of the importing document's exported scope.
  • from works with quoted URIs too. import { foo } from "bar.wdl" parses, since symbolic and quoted sources have identical semantics.
  • Two imports both bringing baz into scope must be renamed explicitly:
    import "foo.wdl" alias baz as foo_baz
    import { baz as bar_baz } from "bar.wdl"

This model fits the existing WDL mental model (no new namespace concept to teach), is easier to implement, and parses with no lookahead beyond peek. We'll try implementing some packages in this mode and see whether it feels constraining; if it does, we'll come back and formalize exported namespaces.

@claymcleod
Collaborator Author

claymcleod commented Apr 27, 2026

After much discussion, @peterhuene and I have agreed that this is the approach we're going to take.

  • We're going to implement a simpler version of modules that is closer to what WDL is today (see the ruleset below) in Sprocket and see how it goes. This should be easier for WDL users to understand (it follows the existing rules without introducing concepts like namespaces) but may strain more complicated packages.
  • We'll try implementing some of our packages in this mode and see if it feels constraining. We'll probably also ask some other future package maintainers what they think. If necessary, we'll extend this PR to formalize the concept of namespaces (which exist today in WDL but only within a document; that is, namespaces are not exported as part of the document's scope).

If we decide later on that namespaces are needed, then the directives to encapsulate members within a namespace (as/within on forms 2 and 3) will be introduced, and namespaces will be more fully built out in the proposal.


The rules for the simpler version we're going to try first are below.

  1. import "foo.wdl" follows the behavior that exists today.
    a. User-defined types (UDTs) like structs and enums are brought into the importing document's scope.
    b. Tasks and workflows are accessible in the document via a pseudo-namespace, but that namespace is not brought into the document's scope or exported.
  2. import * from "foo.wdl" imports all tasks, workflows, and UDTs into the document's scope (no namespace) and makes them exportable.
  3. import { bar, baz } from "foo.wdl" only imports the items bar and baz into the document's scope and also makes them exportable.

Other notables:

  • You can use as in the first form to rename the pseudo-namespace that items are available under (exists today).
  • You can use alias in the first form to rename members that get imported into the top-level document scope (the UDTs). Alias cannot be used to rename tasks or workflows (exists today).
  • You can use import { bar as baz } from "foo.wdl" to rename individual items that are pulled in using the new syntax. This is semantically equivalent to the as in the first form: it renames whatever immediately precedes the keyword (even though the subject is a pseudo-namespace in form 1 and an item in form 3).
  • This means that, if two modules foo and bar would both bring an item baz into scope, a package maintainer's only choice is to rename the items, using alias in form 1 or as in form 3, e.g.
    • import "foo.wdl" alias baz as foo_baz, or
    • import { baz as bar_baz } from "bar.wdl"
  • Symbolic imports are largely going to be interchangeable with the existing quoted imports, with the symbolic part simply resolving to the equivalent of a quoted import through the build system. So all forms accept both quoted and symbolic imports.
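To make the scoping rules above concrete, here is a minimal Python sketch of how a tool might compute which names each import form brings into scope and flag collisions. Everything in it is an assumption made for illustration: the dictionary shapes, the `names_in_scope` and `find_conflicts` helpers, and the contents of `foo.wdl` and `bar.wdl`. It is not how Sprocket or any other engine implements this.

```python
from collections import defaultdict

# Hypothetical contents of two documents that both define `baz`.
EXPORTS = {
    "foo.wdl": {"udts": ["baz"], "callables": ["align"]},
    "bar.wdl": {"udts": ["baz"], "callables": ["sort"]},
}


def names_in_scope(imp, exports):
    """Names one import adds to the importing document's scope."""
    doc = exports[imp["source"]]
    if imp["form"] == 1:
        # Form 1: only UDTs enter scope; `alias` may rename them.
        renames = imp.get("alias", {})
        return [renames.get(name, name) for name in doc["udts"]]
    if imp["form"] == 2:
        # Form 2: every task, workflow, and UDT enters scope.
        return doc["udts"] + doc["callables"]
    # Form 3: only listed members, each with an optional `as` rename.
    return [new or old for old, new in imp["members"]]


def find_conflicts(imports, exports):
    """Map each colliding name to the sources that introduce it."""
    seen = defaultdict(list)
    for imp in imports:
        for name in names_in_scope(imp, exports):
            seen[name].append(imp["source"])
    return {name: srcs for name, srcs in seen.items() if len(srcs) > 1}


clashing = [
    {"form": 1, "source": "foo.wdl"},
    {"form": 3, "source": "bar.wdl", "members": [("baz", None)]},
]
renamed = [
    {"form": 1, "source": "foo.wdl", "alias": {"baz": "foo_baz"}},
    {"form": 3, "source": "bar.wdl", "members": [("baz", "bar_baz")]},
]
print(find_conflicts(clashing, EXPORTS))
print(find_conflicts(renamed, EXPORTS))
```

The `renamed` list mirrors the two example statements in the bullets above: once each side renames `baz`, no name is introduced twice.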

@claymcleod claymcleod force-pushed the feat/modules branch 2 times, most recently from 0cf7f31 to c5db66c Compare April 27, 2026 18:29
Contributor

@DavyCats DavyCats left a comment

Nice! Thanks for writing this up!

Comment thread SPEC.md
An import statement takes one of three forms. Every form accepts the same two source styles, and the source style affects only how the build system locates the imported document.

1. `import <source> [as <alias>] (alias <Old> as <New>)*`. User-defined types (structs and enums) from `<source>` are copied into the importing document's scope. Tasks and workflows from `<source>` are accessible only through the import's namespace, which defaults to the filename minus the `.wdl` extension for a quoted URI or to the last component of the path for a symbolic module path. `as <alias>` overrides the default namespace; `alias <Old> as <New>` renames a struct or enum as it is copied. `alias` cannot rename tasks or workflows. See [Fully Qualified Names & Namespaced Identifiers](#fully-qualified-names--namespaced-identifiers) for how the namespace is used.
2. `import * from <source>`. Every task, workflow, and user-defined type from `<source>` enters the importing document's scope. No namespace is introduced.
Contributor

Personally, I'm not a huge fan of these kinds of imports. They obfuscate where the imported task/workflow is actually defined. I'm wondering how others feel about this.

Copy link
Copy Markdown

I used to feel this way when I was resolving them manually / by eye. But LSPs have mostly solved this issue for me; now I just click 'jump to definition' (this also applies in, e.g., Python, where such imports are argued against for the same reason).

Collaborator Author

@claymcleod claymcleod Apr 28, 2026

Yeah and it's a little tricky too because of the design decisions that were made prior in WDL (i.e., #160). It's actually a split situation today where import "foo.wdl" means:

  • Import all of the user-defined types (structs and enums) into the current document's scope, effectively losing the tether to where they came from, and
  • Make all of the tasks and workflows available at foo.<task>/foo.<workflow> but don't export those as part of the scope.

This asymmetry poses real challenges to making any sort of module system usable under the current syntax.

Additionally, WDL, as it stands today, has no real concept of a formal namespace (there is something referred to as a namespace in the spec, but it's not allowed to be passed between documents through their scope).

Short of introducing namespaces more formally (which has a whole other set of language implications), we had to find a new syntax that made sense to actually pull the tasks and workflows into the document's scope alongside the UDTs while still feeling WDL-like. This was the best we could come up with, but I'm open to other ways to express this.

Comment thread SPEC.md

1. `import <source> [as <alias>] (alias <Old> as <New>)*`. User-defined types (structs and enums) from `<source>` are copied into the importing document's scope. Tasks and workflows from `<source>` are accessible only through the import's namespace, which defaults to the filename minus the `.wdl` extension for a quoted URI or to the last component of the path for a symbolic module path. `as <alias>` overrides the default namespace; `alias <Old> as <New>` renames a struct or enum as it is copied. `alias` cannot rename tasks or workflows. See [Fully Qualified Names & Namespaced Identifiers](#fully-qualified-names--namespaced-identifiers) for how the namespace is used.
2. `import * from <source>`. Every task, workflow, and user-defined type from `<source>` enters the importing document's scope. No namespace is introduced.
3. `import { <member> [as <Name>], ... } from <source>`. Only the listed items enter the importing document's scope. A per-member `as <Name>` renames the selected item locally. A trailing comma after the last member is permitted. No namespace is introduced.
Contributor

For consistency with how form 1 is explained, I guess ... should be *?

Suggested change
- 3. `import { <member> [as <Name>], ... } from <source>`. Only the listed items enter the importing document's scope. A per-member `as <Name>` renames the selected item locally. A trailing comma after the last member is permitted. No namespace is introduced.
+ 3. `import { (<member> [as <Name>],)* } from <source>`. Only the listed items enter the importing document's scope. A per-member `as <Name>` renames the selected item locally. A trailing comma after the last member is permitted. No namespace is introduced.

Or maybe + if at least one needs to be present.

Or is ... supposed to have a separate function, like "also import everything else but don't rename it"?

Collaborator Author

Yeah sounds good, I'll clarify this.

Comment thread SPEC.md
Comment on lines +3945 to +3953
version 1.4

import "csvkit.wdl" # tasks via `csvkit` namespace; structs/enums in scope
import "csvkit.wdl" as csv # tasks via `csv` namespace; structs/enums in scope
import * from "csvkit.wdl" # tasks, workflows, structs, enums all in scope
import { CsvSort } from "csvkit.wdl" # only `CsvSort` in scope
import { CsvSort as MySort, CsvSortStable } from "csvkit.wdl" # `MySort` and `CsvSortStable` in scope
import openwdl/csvkit # tasks via `csvkit` namespace; structs/enums in scope
import { CsvSort } from openwdl/csvkit # only `CsvSort` in scope
Contributor

This is missing an example of form 1's alias <Old> as <New> syntax.

Symbolic sub-paths now resolve directly to `<module-folder>/<sub-path>.wdl` rather than naming a nested module's `module.json`. Each dependency points to exactly one module folder; consumers reach additional modules in the same repository by declaring them as separate dependencies with distinct `path` values. The recursive scan that registered every nested manifest in a source is removed.

The manifest's `index` field is renamed to `entrypoint`. A root module import resolves to the entrypoint file; a missing entrypoint surfaces an engine-specific error at import time.

A new `exclude` field accepts gitignore-style globs identifying files unreachable via symbolic import. `exclude` governs the public import surface only and does not affect content hashing, signing, or quoted within-module imports.

Sub-path components remain WDL identifiers, so `..` is rejected at parse time. The spec now states this as a normative no-escape guarantee that engines may rely on when performing sparse Git checkouts of dependency module folders.
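
The resolution rule described in this commit message can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the spec's algorithm: it assumes the first path component names the dependency, that the engine already knows each dependency's module folder and `entrypoint`, and that a WDL identifier is a letter followed by letters, digits, or underscores. The `resolve` helper, the `deps/csvkit` folder, and the `index.wdl` entrypoint are all hypothetical.

```python
import re
from pathlib import PurePosixPath

# WDL identifier shape assumed for sub-path components.
IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def resolve(symbolic, module_folders, entrypoints):
    """Map `<dep>[/<sub-path>]` to a file path inside the module folder."""
    dep, *sub = symbolic.split("/")
    for part in sub:
        # `..`, `.`, and empty components are not identifiers, so a
        # sub-path can never escape the dependency's module folder.
        if not IDENT.match(part):
            raise ValueError(f"invalid sub-path component: {part!r}")
    folder = PurePosixPath(module_folders[dep])
    if not sub:
        # A root module import resolves to the manifest's entrypoint.
        return str(folder / entrypoints[dep])
    # A sub-path resolves directly to <module-folder>/<sub-path>.wdl.
    return str(folder.joinpath(*sub[:-1], sub[-1] + ".wdl"))


folders = {"csvkit": "deps/csvkit"}   # hypothetical checkout location
entries = {"csvkit": "index.wdl"}     # hypothetical entrypoint value

print(resolve("csvkit", folders, entries))
print(resolve("csvkit/sort", folders, entries))
```

Because the identifier check runs before any path is built, a path such as `csvkit/../secret` is rejected without touching the filesystem, which is the property a sparse checkout would rely on.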
claymcleod added a commit to stjude-rust-labs/sprocket that referenced this pull request May 1, 2026
Prepend the spec's domain-separation prefix `wdl-module-content\0v1\0`
to the SHA-256 input in `Hasher::finalize` per openwdl/wdl#765 commit
505e954, and rewrite `detects_path_content_boundary_collision` to
exercise the actual collision the length prefixes prevent (a file name
absorbing what would otherwise be its content's bytes), rather than
asserting determinism on two trivially-different file lists.
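
For readers unfamiliar with the collision this commit message refers to, the Python sketch below contrasts a naive concatenating hash with a domain-separated, length-prefixed one. Only the prefix bytes `wdl-module-content\0v1\0` and the use of SHA-256 come from the text above; the 8-byte big-endian length encoding, the sorted path order, and the function names are assumptions for illustration. The authoritative byte layout is whatever modules/SPEC.md specifies.

```python
import hashlib
import struct

# Domain-separation prefix quoted in the commit message.
DOMAIN_PREFIX = b"wdl-module-content\0v1\0"


def naive_hash(files):
    """Concatenate names and contents with no framing (collision-prone)."""
    h = hashlib.sha256(DOMAIN_PREFIX)
    for path in sorted(files):
        h.update(path.encode("utf-8"))
        h.update(files[path])
    return h.hexdigest()


def content_hash(files):
    """Length-prefix every name and content so boundaries are unambiguous."""
    h = hashlib.sha256(DOMAIN_PREFIX)
    for path in sorted(files):  # assumed ordering, for determinism
        name = path.encode("utf-8")
        data = files[path]
        h.update(struct.pack(">Q", len(name)))  # assumed 8-byte length
        h.update(name)
        h.update(struct.pack(">Q", len(data)))
        h.update(data)
    return h.hexdigest()


# A file name "absorbing" the first byte of what was another file's content.
a = {"ab": b"c"}
b = {"a": b"bc"}
print(naive_hash(a) == naive_hash(b))      # same byte stream, same digest
print(content_hash(a) == content_hash(b))  # framing keeps them distinct
```

Without framing, both inputs feed the bytes `abc` to the hash after the prefix; with length prefixes, the name/content boundary becomes part of the hashed input.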
Development

Successfully merging this pull request may close these issues.

Implement package specification and ecosystem Versioned import statements

3 participants