Skip to content

std/parseopt: fixes initOptParser fallback logic, moves NimScript logic to os.cmdline#25571

Open
ZoomRmc wants to merge 3 commits intonim-lang:develfrom
ZoomRmc:cmdlinefixes
Open

std/parseopt: fixes initOptParser fallback logic, moves NimScript logic to os.cmdline#25571
ZoomRmc wants to merge 3 commits intonim-lang:develfrom
ZoomRmc:cmdlinefixes

Conversation

@ZoomRmc
Copy link
Contributor

@ZoomRmc ZoomRmc commented Mar 2, 2026

Fixes #17018
Closes #23554
Fixes #25557

Both initOptParser overloads currently fall back to commandLineParams() when given empty input. As already pointed out in #17018, branching semantics on a valid value (len == 0) is a bad API decision.

The most common failure mode is subcommand delegation as described in #25557.

The same issue affects calls like:

commandLineParams().filterIt(...).initOptParser()

If the filter produces an empty sequence, the parser unexpectedly falls back to the full OS argv.

Separately, in NimScript, commandLineParams() returns the compiler invocation flags, not just script arguments. std/parseopt contained CT-defined logic to slice those out, but std/cmdline itself did not. This split responsibility was inconsistent.

Changes:

  1. Moved the NimScript stripping logic from std/parseopt into std/cmdline.commandLineParams() where it belongs. Closes In NimScript mode, make commandLineParams() more useful #23554.

  2. Removed the cmdline.len == 0 fallback from the core (private) parser implementation.

  3. Set the default value for initOptParser and changed for getopt:

    proc initOptParser*(cmdline: seq[string] = commandLineParams(); ...)
    iterator getopt*(cmdline: seq[string] = commandLineParams(); ...) # was `= @[]`
  4. Removed the cmdline = "" default from the string overload so the "omit" case resolves to the now-main seq-taking overload.

  5. Removed the assertion raise from initOptParser. Because the new default commandLineParams() is evaluated at the call site, calling initOptParser() in a dynamic library now should trigger the {.error.} pragma in cmdline at compile-time, rather than crashing with a raiseAssert at runtime.

The parser now strictly respects the input given. Implicit fetching OS arguments reduced to no-parameters from previous two cases (omitted/empty).

Breakage

This breaks the code that used explicit empty values to trigger OS fallback. Well, this is a silly approach and, hopefully, no one relied on it.

However:

initOptParser() (cmdline omitted) continues to fetch OS arguments via the default commandLineParams() value, so the idiomatic usage remains unaffected. The routine this call resolves to changed, though, now it's the seq-taking initOptParser which is really the first-class one anyway, as the string overload splits cmdline to seq[string] with parseCmdLine under the hood (btw, this proc itself recommends using parseopt instead which is a bit circular).

Another bug/quirk fixed/worth mentioning

Under the old implementation, a whitespace-only string such as initOptParser(" ") was first routed through parseCmdLine, which consumes whitespace as delimiters and produces @[], this would then trigger the fallback and silently re-parse the real OS command line.

In other words, a non-empty string argument could unexpectedly produce a parser over the process argv. With the fallback removed, this improves naturally and produces a parser over an empty seq (parseCmdLine(" ") == @[]).

However, this still diverges with initOptParser(@[" "]). Here the parser receives exactly one whitespace argument. No tokenization step collapses it to empty. So, although the sentinel fallback is gone, the two overloads are not fully equivalent for inputs that feel empty. The string overload has a wider "empty surface", so to speak.

Examples

# old behavour
var p1 = initOptParser()    # Fetches OS arguments
var p2 = initOptParser("")  # Fetches OS arguments - ambiguity
var p3 = initOptParser(" ") # Fetches OS arguments - buggy
var p4 = initOptParser(@[]) # Fetches OS arguments - ambiguity

# new behavour
var p1 = initOptParser()    # Fetches OS arguments. No breakage for idiomatic usage
var p2 = initOptParser("")  # Parses 0 arguments, allows sub-parsing
var p3 = initOptParser(" ") # Parses 0 arguments, resolves to call below
var p4 = initOptParser(@[]) # Parses 0 arguments, allows sub-parsing

For NimsScript:

# nim -f test.nims foo
let c = commandLineParams()

# old behaviour
assert c == @["-f", "test.nims", "foo"]
# new behaviour
assert c ==  @["foo"]

Overall, this removes the sentinel-based branching, restores predictable semantics, and fixes the NimScript layering issue at the same time. I say, wins all around.

ZoomRmc added 3 commits March 2, 2026 23:01
- `initOptParser` and `getopt` no longer fall back to
  `commandLineParams()` when given empty input (`""` or `@[]`).
  An explicit empty string or sequence now produces an empty parser.

- `std/cmdline.commandLineParams()` for NimScript now consistently
  returns only the script arguments, skipping Nim arguments.
@ZoomRmc
Copy link
Contributor Author

ZoomRmc commented Mar 2, 2026

This breaks the code that used explicit empty values to trigger OS fallback. Well, this is a silly approach and, hopefully, no one relied on it.

Turns out, the compiler and tools do...

@Araq
Copy link
Member

Araq commented Mar 3, 2026

The code breakage is not acceptable. The compiler itself relied on the behavior as you found out.

@ZoomRmc
Copy link
Contributor Author

ZoomRmc commented Mar 3, 2026

Well, it's required for the bunch of reasons stated in the issues mentioned and explained here

You can't signal two different things with a valid value without a side-channel. The sad situation is exacerbated by the fact that current default state for this overload is a direct call for the side-effect, not even "it might be one or the other, check there".

A seq, at least, has one more mode over a string (@[] / @[""] vs "").

Stringly-typed API in a language with solid type system is puzzling (one might even say "not acceptable"), hacking around this choice with additional parameters leads to even worse cluttered API down the line and required deprecation of even more code.

As you can see by the green CI, the fix is simple (and in this case might be even more so, if we fix cmdlinehelper.NimProg carried proc to require a seq instead of a string).

@Araq
Copy link
Member

Araq commented Mar 3, 2026

Well var p2 = initOptParser("") is working code, you simply say "ambiguity!" but it used to do its thing. Silently changing the meaning of it is not acceptable, no matter how much the initial design was a bad idea.

Even deprecating initOptParser altogether and adding a new createOptParser proc would be better than this solution. Maybe we can find an even better one though.

That the API gets messier for a transition period is a given but so much more preferable over silently changing the meaning of a core API.

@ZoomRmc
Copy link
Contributor Author

ZoomRmc commented Mar 3, 2026

I understand your concerns, but doesn't the CI test common packages specifically to signal the blast radius of changes? The only thing that broke there was the compiler tools where the same thing was copy-pasted multiple times, this is telling.

Deprecation initOptParser might be a solution. As suggested in the #25557, just adding a parameter is another, but it's an ugly hack (it's force-assigning meaning to object states overriding the natural). I'm all ears for other solutions.

It also would be nice to hear other people's opinions.

@Araq
Copy link
Member

Araq commented Mar 3, 2026

Yes, let's hear more opinions. In the meantime, please create a new PR that fixes what can be fixed without breaking the API, if possible.

@ZoomRmc
Copy link
Contributor Author

ZoomRmc commented Mar 3, 2026

The only thing that's more or less self-contained is the commandLineParams fix for NimScript (which is also a breaking change, by the way). Other things all depend on fixing the API, it's the whole point of the PR.

@ZoomRmc
Copy link
Contributor Author

ZoomRmc commented Mar 3, 2026

I got interested and started looking for public cases of using initOptParser with an explicit passing of cmdline.

Where people use the string oveload, the recurring themes are:

  • A. people are confused about the API.
  • B. people try to stay in the string domain needlessly which results in repeated split-join circles.
p = initOptParser(commandLineParams().join(" "))
# or:
let cmd: string = commandLineParams().join(" ")
var p = initOptParser(cmd)

If launched with zero parameters, tries to read them the second time. (case A)
Joins to string only to be split again internally (case B).


let cmdParams = commandLineParams()
var optParser =
  if cmdParams.len == 0:
    initOptParser("")
  else:
    initOptParser(cmdParams.foldr(a & " " & b))

Clearly confused about the API (A). Top branch rereads the argv, bottom concatenates (B). This is just var p = initOptParser() reimplemented.


command = pythonArgs.join(" ")
var parser = initOptParser(nimArgs.join(" "))

This is A, B and a bug. The program has multiple command sets of external provenance in context, if nimArgs.join evaluates to an empty string, the calling program's argv are going to be read, contaminating the context.


Unfortunately, there's a few (~ 20) cases of explicit var p = initOptParser(""). It would be nice if we could emit a CT warning/error there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants