feat: add WDL module specification and symbolic imports#765
feat: add WDL module specification and symbolic imports#765claymcleod wants to merge 23 commits into
Conversation
Introduces a peer specification at `modules/SPEC.md` covering the `module.json` manifest, `module-lock.json`, dependency resolution from Git tags, content hashing, Ed25519 signing, and credential management. Adds the `from` keyword and a new symbolic import form to `SPEC.md` with symmetric namespacing rules for tasks, workflows, structs, and enums. Remote URL imports are soft-deprecated. Ships JSON Schemas for both the manifest and lockfile. Closes #758. Closes #226.
a29acf3 to
b4f063a
Compare
A `module-lock.json` file is no longer required for every module. It is required for modules whose consumers need reproducible builds, but may be omitted by modules intended as libraries where version resolution is deliberately left to the consumer. The spec now also states that lockfiles apply only to the module they sit in: upstream lockfiles are not consulted during downstream resolution, so consumers remain in control of their transitive version choices. Clarifies engine responsibility for verifying cached module content against the lockfile and for keeping the lockfile consistent with the resolved tree. Drops the now-stale "Lockfile as a specification requirement" rationale bullet, and the "Duplicate dependencies over conflict resolution" bullet whose premise doesn't carry its weight.
24cedf4 to
ceb0fb2
Compare
|
@peterhuene and I have agreed to land a simpler model than the original eight-form proposal. Symbolic imports are interchangeable with quoted imports — the build system resolves a symbolic path to the equivalent of a quoted import. All three forms accept either source type with identical semantics. The three forms:
What changed from the prior eight-form model:
This model fits the existing WDL mental model (no new namespace concept to teach), is easier to implement, and parses with no lookahead beyond |
|
After much discussion, @peterhuene and I have agreed that this is the approach we're going to take.
If we decide later on that namespaces are needed, then the directives to encapsulate members within a namespace ( The rules for the simpler version we're going to try first are below.
Other notables:
|
0cf7f31 to
c5db66c
Compare
c5db66c to
e5de62e
Compare
f3aa495 to
a186061
Compare
77d89ea to
3a4d9e9
Compare
998e4cb to
89d639c
Compare
DavyCats
left a comment
There was a problem hiding this comment.
Nice! Thanks for writing this up!
| An import statement takes one of three forms. Every form accepts the same two source styles, and the source style affects only how the build system locates the imported document. | ||
|
|
||
| 1. `import <source> [as <alias>] (alias <Old> as <New>)*`. User-defined types (structs and enums) from `<source>` are copied into the importing document's scope. Tasks and workflows from `<source>` are accessible only through the import's namespace, which defaults to the filename minus the `.wdl` extension for a quoted URI or to the last component of the path for a symbolic module path. `as <alias>` overrides the default namespace; `alias <Old> as <New>` renames a struct or enum as it is copied. `alias` cannot rename tasks or workflows. See [Fully Qualified Names & Namespaced Identifiers](#fully-qualified-names--namespaced-identifiers) for how the namespace is used. | ||
| 2. `import * from <source>`. Every task, workflow, and user-defined type from `<source>` enters the importing document's scope. No namespace is introduced. |
There was a problem hiding this comment.
Personally, I'm not a huge fan of these kinds of imports. They obfuscate where the imported task/workflow is actually defined. I'm wondering how others feel about this.
There was a problem hiding this comment.
I used to feel this way when I was resolving them manually / by eye. But LSPs have mostly solved this issue for me, now I just click 'jump to definition' (also in e.g. python where they're also argued against for the same reason.)
There was a problem hiding this comment.
Yeah and it's a little tricky too because of the design decisions that were made prior in WDL (i.e., #160). It's actually a split situation today where import "foo.wdl" means:
- Import all of the user-defined types (structs and enums) into the current document's scope, effectively losing the tether to where they came from, and
- Make all of the tasks and workflows available at
foo.<task>/foo.<workflow>but don't export those as part of the scope.
This asymmetry poses real challenges to making any sort of module system usable under the current syntax.
Additionally, WDL, as it stands today, has no real concept of a formal namespace (there is something referred to as a namespace in the spec, but it's not allowed to be passed between document's using their scope).
Short of introducing namespaces more formally (which has a whole other set of language implications), we had to find a new syntax that made sense to actually pull the tasks and workflows into the document's scope alongside the UDTs while still feeling WDL-like. This was the best we could come up with, but I'm open to other ways to express this.
|
|
||
| 1. `import <source> [as <alias>] (alias <Old> as <New>)*`. User-defined types (structs and enums) from `<source>` are copied into the importing document's scope. Tasks and workflows from `<source>` are accessible only through the import's namespace, which defaults to the filename minus the `.wdl` extension for a quoted URI or to the last component of the path for a symbolic module path. `as <alias>` overrides the default namespace; `alias <Old> as <New>` renames a struct or enum as it is copied. `alias` cannot rename tasks or workflows. See [Fully Qualified Names & Namespaced Identifiers](#fully-qualified-names--namespaced-identifiers) for how the namespace is used. | ||
| 2. `import * from <source>`. Every task, workflow, and user-defined type from `<source>` enters the importing document's scope. No namespace is introduced. | ||
| 3. `import { <member> [as <Name>], ... } from <source>`. Only the listed items enter the importing document's scope. A per-member `as <Name>` renames the selected item locally. A trailing comma after the last member is permitted. No namespace is introduced. |
There was a problem hiding this comment.
For consistency with how form 1 is explained, I guess ... should be *?
| 3. `import { <member> [as <Name>], ... } from <source>`. Only the listed items enter the importing document's scope. A per-member `as <Name>` renames the selected item locally. A trailing comma after the last member is permitted. No namespace is introduced. | |
| 3. `import { (<member> [as <Name>],)* } from <source>`. Only the listed items enter the importing document's scope. A per-member `as <Name>` renames the selected item locally. A trailing comma after the last member is permitted. No namespace is introduced. |
Or maybe + if at least one needs to be present.
Or is ... supposed to have a separate functionality, like "also import everything else but don't rename it".
There was a problem hiding this comment.
Yeah sounds good, I'll clarify this.
| version 1.4 | ||
|
|
||
| import "csvkit.wdl" # tasks via `csvkit` namespace; structs/enums in scope | ||
| import "csvkit.wdl" as csv # tasks via `csv` namespace; structs/enums in scope | ||
| import * from "csvkit.wdl" # tasks, workflows, structs, enums all in scope | ||
| import { CsvSort } from "csvkit.wdl" # only `CsvSort` in scope | ||
| import { CsvSort as MySort, CsvSortStable } from "csvkit.wdl" # `MySort` and `CsvSortStable` in scope | ||
| import openwdl/csvkit # tasks via `csvkit` namespace; structs/enums in scope | ||
| import { CsvSort } from openwdl/csvkit # only `CsvSort` in scope |
There was a problem hiding this comment.
This is missing as example of form 1's alias old as new syntax.
8743d5f to
26df2b6
Compare
Symbolic sub-paths now resolve directly to `<module-folder>/<sub-path>.wdl` rather than naming a nested module's `module.json`. Each dependency points to exactly one module folder; consumers reach additional modules in the same repository by declaring them as separate dependencies with distinct `path` values. The recursive scan that registered every nested manifest in a source is removed. The manifest's `index` field is renamed to `entrypoint`. A root module import resolves to the entrypoint file; a missing entrypoint surfaces an engine-specific error at import time. A new `exclude` field accepts gitignore-style globs identifying files unreachable via symbolic import. `exclude` governs the public import surface only and does not affect content hashing, signing, or quoted within-module imports. Sub-path components remain WDL identifiers, so `..` is rejected at parse time. The spec now states this as a normative no-escape guarantee that engines may rely on when performing sparse Git checkouts of dependency module folders.
26df2b6 to
fb174f5
Compare
Prepend the spec's domain-separation prefix `wdl-module-content\0v1\0` to the SHA-256 input in `Hasher::finalize` per openwdl/wdl#765 commit 505e954, and rewrite `detects_path_content_boundary_collision` to exercise the actual collision the length prefixes prevent (a file name absorbing what would otherwise be its content's bytes), rather than asserting determinism on two trivially-different file lists.
This introduces a module system to WDL in two parts: a new peer specification at
modules/SPEC.mdcovering the ecosystem (manifest, resolution, lockfile, hashing, signing, credentials, registry), and language-level additions inSPEC.md(thefromkeyword and two new import forms).The module specification defines:
module.json, the manifest format, including core fields, upstreamtoolsprovenance, and adependenciesobject supportingversion,tag,branch,commit, and localpathselectors.module-lock.json, a lockfile that pins the fully resolved tree with commit SHAs and content checksums.<dep-name>[/<sub-path>]) map to modules, including tag-to-manifest version consistency and Go-style path-prefixed tags for multi-module repositories.module.sig, with trust-on-first-use recorded in the lockfile.openwdl.github.io/registrythat aggregates metadata without becoming a resolution dependency.The language changes in
SPEC.md:fromas a keyword in WDL 1.4 documents. Earlierversiondeclarations continue to parsefromas an ordinary identifier.import <source> [as <alias>] (alias <Old> as <New>)*, the existing namespaced form.import * from <source>, which brings every task, workflow, and user-defined type from<source>into the importing document's scope, with no namespace.import { <member> [as <Name>], ... } from <source>, which brings only the listed items, with an optional per-memberas <Name>rename, also with no namespace.<source>is either a quoted URI (resolved per Import URIs) or an unquoted symbolic module path of the form<dep>[/<sub-path>](resolved through the consuming module'smodule.json). Once resolved, the two styles produce identical scoping in every form.JSON Schemas for both
module.jsonandmodule-lock.jsonship undermodules/schemas/and validate the in-spec examples.Full context: RFC discussion #700.
Before submitting this PR, please make sure:
README.mdor other documentation needs updating to account for these changes.CHANGELOG.mddescribing the change and linking back to your pull request.CONTRIBUTING.mddocument.For OpenWDL team members:
Closes #226.
Closes #758.