std/parseopt: fixes initOptParser fallback logic, moves NimScript logic to os.cmdline#25571
std/parseopt: fixes initOptParser fallback logic, moves NimScript logic to os.cmdline#25571ZoomRmc wants to merge 3 commits intonim-lang:develfrom
std/parseopt: fixes initOptParser fallback logic, moves NimScript logic to os.cmdline#25571Conversation
- `initOptParser` and `getopt` no longer fall back to `commandLineParams()` when given empty input (`""` or `@[]`). An explicit empty string or sequence now produces an empty parser. - `std/cmdline.commandLineParams()` for NimScript now consistently returns only the script arguments, skipping Nim arguments.
Turns out, the compiler and tools do... |
|
The code breakage is not acceptable. The compiler itself relied on the behavior as you found out. |
|
Well, it's required for the bunch of reasons stated in the issues mentioned and explained here You can't signal two different things with a valid value without a side-channel. The sad situation is exacerbated by the fact that current default state for this overload is a direct call for the side-effect, not even "it might be one or the other, check there". A seq, at least, has one more mode over a string (@[] / @[""] vs ""). Stringly-typed API in a language with solid type system is puzzling (one might even say "not acceptable"), hacking around this choice with additional parameters leads to even worse cluttered API down the line and required deprecation of even more code. As you can see by the green CI, the fix is simple (and in this case might be even more so, if we fix |
|
Well Even deprecating initOptParser altogether and adding a new That the API gets messier for a transition period is a given but so much more preferable over silently changing the meaning of a core API. |
|
I understand your concerns, but doesn't the CI test common packages specifically to signal the blast radius of changes? The only thing that broke there was the compiler tools where the same thing was copy-pasted multiple times, this is telling. Deprecation initOptParser might be a solution. As suggested in the #25557, just adding a parameter is another, but it's an ugly hack (it's force-assigning meaning to object states overriding the natural). I'm all ears for other solutions. It also would be nice to hear other people's opinions. |
|
Yes, let's hear more opinions. In the meantime, please create a new PR that fixes what can be fixed without breaking the API, if possible. |
|
The only thing that's more or less self-contained is the |
|
I got interested and started looking for public cases of using Where people use the string oveload, the recurring themes are:
p = initOptParser(commandLineParams().join(" "))
# or:
let cmd: string = commandLineParams().join(" ")
var p = initOptParser(cmd)If launched with zero parameters, tries to read them the second time. (case A) let cmdParams = commandLineParams()
var optParser =
if cmdParams.len == 0:
initOptParser("")
else:
initOptParser(cmdParams.foldr(a & " " & b))Clearly confused about the API (A). Top branch rereads the argv, bottom concatenates (B). This is just command = pythonArgs.join(" ")
var parser = initOptParser(nimArgs.join(" "))This is A, B and a bug. The program has multiple command sets of external provenance in context, if Unfortunately, there's a few (~ 20) cases of explicit |
Fixes #17018
Closes #23554
Fixes #25557
Both
initOptParseroverloads currently fall back tocommandLineParams()when given empty input. As already pointed out in #17018, branching semantics on a valid value (len == 0) is a bad API decision.The most common failure mode is subcommand delegation as described in #25557.
The same issue affects calls like:
If the filter produces an empty sequence, the parser unexpectedly falls back to the full OS argv.
Separately, in NimScript,
commandLineParams()returns the compiler invocation flags, not just script arguments.std/parseoptcontained CT-defined logic to slice those out, butstd/cmdlineitself did not. This split responsibility was inconsistent.Changes:
Moved the NimScript stripping logic from
std/parseoptintostd/cmdline.commandLineParams()where it belongs. Closes In NimScript mode, makecommandLineParams()more useful #23554.Removed the
cmdline.len == 0fallback from the core (private) parser implementation.Set the default value for initOptParser and changed for getopt:
Removed the
cmdline = ""default from the string overload so the "omit" case resolves to the now-main seq-taking overload.Removed the assertion raise from initOptParser. Because the new default
commandLineParams()is evaluated at the call site, callinginitOptParser()in a dynamic library now should trigger the{.error.}pragma incmdlineat compile-time, rather than crashing with araiseAssertat runtime.The parser now strictly respects the input given. Implicit fetching OS arguments reduced to no-parameters from previous two cases (omitted/empty).
Breakage
This breaks the code that used explicit empty values to trigger OS fallback. Well, this is a silly approach and, hopefully, no one relied on it.
However:
initOptParser()(cmdlineomitted) continues to fetch OS arguments via the defaultcommandLineParams()value, so the idiomatic usage remains unaffected. The routine this call resolves to changed, though, now it's the seq-takinginitOptParserwhich is really the first-class one anyway, as the string overload splits cmdline to seq[string] withparseCmdLineunder the hood (btw, this proc itself recommends usingparseoptinstead which is a bit circular).Another bug/quirk fixed/worth mentioning
Under the old implementation, a whitespace-only string such as
initOptParser(" ")was first routed throughparseCmdLine, which consumes whitespace as delimiters and produces@[], this would then trigger the fallback and silently re-parse the real OS command line.In other words, a non-empty string argument could unexpectedly produce a parser over the process argv. With the fallback removed, this improves naturally and produces a parser over an empty seq (
parseCmdLine(" ") == @[]).However, this still diverges with
initOptParser(@[" "]). Here the parser receives exactly one whitespace argument. No tokenization step collapses it to empty. So, although the sentinel fallback is gone, the two overloads are not fully equivalent for inputs that feel empty. The string overload has a wider "empty surface", so to speak.Examples
For NimsScript:
Overall, this removes the sentinel-based branching, restores predictable semantics, and fixes the NimScript layering issue at the same time. I say, wins all around.