Prove TextMate-generator completeness (#51)#55
Open
johnsoncodehk wants to merge 10 commits into
Open
Conversation
Add a formal completeness proof for src/gen-tm.ts: for every grammar expressible through the public src/api.ts combinators, every TextMate-representable highlighting obligation is emitted and reachable. This is the dual of the soundness ledger (KNOWN-GAPS.md, which finds WRONG paints) — here we prove there are no MISSING ones. The proof rests on the generator's input being a closed, finite algebra (RuleExpr / TokenPattern), which makes "every obligation" enumerable and reduces completeness to three mechanically-checked layers: - closure — toRuleExpr, the token-pattern compiler, and the shared collectLiterals backbone are total over their unions, so no supported combinator shape is silently dropped; - coverage — every non-skip token has a discharge path (the token census), and every content/keyword obligation leaf is painted (2433/2433 across the six grammars), on a fixed denominator; - reachability — every emitted repository key is reachable from the root patterns or a declared export surface (#expression / canonicalRepoNames / aliasScopes); zero dead keys. The Layer-A audit surfaced one latent silent-drop: getTypeParamElementKeywords omitted `sep`, so a keyword in a sep-list within a type-parameter element lost its keyword role inside `<…>`. Fixed by recursing into sep.element (`not`/`ref` stay omitted on purpose — a forbidden word / a constraint type's own keywords must not be hoisted); the six shipped grammars are byte-identical (the drop is latent), and the checker carries a biting regression guard. Three sites that looked like TextMate impossibilities — variable-width lookbehind for the cast/arrow value test, the balanced-paren arrow confirm, and a regex after a control-flow head — were attacked and refuted: each is expressible in vscode-oniguruma (verified), and the fixed-width forms gen-tm emits are deliberate Onigmo-portability choices. They are a soundness-precision axis, not a completeness gap. COMPLETENESS.md is the proof spine; test/tm-completeness.ts mechanises it (npm run completeness[:check]) and joins `npm run check` as a gate.
A passing checker means nothing if the checker is blind. Add test/tm-mutation.ts: it injects a catalogue of known gaps into the emitted grammar (drop a key / all of a token's includes, neuter a scope to the bare root, add a dead key, a dangling include, mis-scope a token to a wrong role, reorder two disambiguation patterns) and records which detector layer kills each — measuring the detector's power instead of asserting it. Measured: every PRESENCE / REACHABILITY gap is killed corpus-free (16/16, 12/16 by reachability / token-census / the new flat-token neuter check); WRONG-ROLE gaps are caught only by a differential witness (presence ≠ correctness); ORDERING gaps are a measured blind spot — TextMate is order-sensitive and pattern rank lives in the emitted artifact, not the grammar algebra, so no corpus-free structural check reaches it. This is the honest boundary, now empirical: the structural proof covers presence + reachability; ordering / correctness are the soundness axis, reached only by evaluation. The over-claim of an a-priori "no gap can hide" over the whole gap space is dropped — COMPLETENESS.md states the bounded, measured claim. The harness motivated one detector strengthening: tokenCensus now flags a flat token NEUTERED to the bare root scope (an entry that exists but paints inert document text), moving that gap class from differential-only to corpus-free. Wired as a meta-gate in `npm run check`.
The keyword/operator obligation was the one class still checked by the grammar-derived corpus (leaf coverage). Replace it with a structural, a-priori discharge: `literalDischarge` confirms every alphabetic literal the grammar consumes (collectLiterals over every rule + the prec/led tables) appears, as a scoped word, in some REACHABLE pattern whose scope is a keyword family — a finite scan of the emitted artifact, no corpus. 248/248 across the six grammars; non-vacuous (stripping `class` from its patterns reports it undischarged). Completeness is now a decidable structural check end to end — token discharge (census, incl. neuter) + keyword-literal discharge + repository reachability, 952/952 = 100%, no corpus. The leaf-coverage corpus pass is demoted to a redundant differential cross-check on the soundness axis; `tm-mutation`'s structural layer now also kills a dropped/neutered keyword corpus-free. COMPLETENESS.md draws the line correctly: COMPLETENESS (present + reachable + scoped) is decidable — finite G, finite gen-tm(G), an obligation taxonomy bounded by TextMate's finite construct kinds, per-token discharge by structural identity (the flat `match` IS the token's own pattern) — and ∀ G by structural induction over the finite combinator algebra. SOUNDNESS (do the present constructs paint correctly on all inputs — wrong-role, pattern ordering) is the undecidable residual (CFG vs regex-stack-machine over infinite input). The earlier "a-priori completeness is unavailable" was an over-concession: that was soundness's wall, mistaken for completeness's.
An adversarial gap-hunt (5 agents over the classes the structural checker
does NOT cover) found that the getTypeParamElementKeywords `sep` drop had
un-fixed siblings in the same family:
- detectTypeParamConstraintKeywords.scanConstraint omitted `sep`, so a
type-parameter CONSTRAINT keyword reached through a `&`/`,`-separated list
was not collected — the .tsx generic-arrow⇄JSX no-comma disambiguation lost
its constraint signal (mis-scoping the header as a JSX tag).
- detectDeclarations.containsBlockRef omitted `sep`, so a declaration whose
brace body is reached through a `sep` was not seen as having a body — its
#declaration-body member-scoping region was dropped.
Both recurse into `sep.element` now, mirroring the prior fix. Byte-identical
on all six shipped grammars (latent: every shipped grammar writes constraints
as `opt('extends', Type)` and block bodies as direct refs).
The hunt found 8 latent completeness gaps total; all were VERIFIED latent —
tokenizing the witnesses against the shipped grammars shows ternary, calls,
trailing-comma type params, and every prec operator are correctly scoped, so
the 0-gap soundness ledger is real, not corpus-blind. The remaining gaps are
detector SHAPE-FRAGILITY (fixed-offset window matching in detectTernary /
detectCallExpression / isAngleBracketSepRule misses equivalent factorings),
an unimplemented `rawBlock` config region, and punctuation-in-region — tracked
for follow-up; none triggers on a shipped grammar.
The fixed-window detector fragility the gap-hunt surfaced was a symptom; the root cause is that the shape-detectors pattern-match RAW RuleExpr shapes, so an equivalent factoring of the same construct (an `opt`-tail, a separate args rule, a trailing comma) is a different shape and slips them. Adding a `sep` arm per detector only widens the symptom patch — the next factoring still slips. Fix the structural condition: match the NORMALISED form. expandAlts already canonicalises alt-split / opt-tail / group / quantifier into the same flat adjacency; route the three fragile detectors through it, plus FIRST-set for the one ref-hidden case: - detectTernary, isAngleBracketSepRule → expandAlts (opt-tail and trailing-comma factorings now reduce to the matched adjacency; `sep` stays opaque so the sep node survives). - detectCallExpression → FIRST(next) instead of a literal `(`, so the args may be the inline `(` OR a separate `CallArgs` rule referenced after the callee. Byte-identical on all six shipped grammars (they already write the canonical factoring); the three latent gaps are now emitted for their equivalent forms (verified: #ternary-expression / #function-call / #declaration-type-params). And the detector is no longer blind to this class: a new shape-robustness gate (test/tm-completeness.ts) asserts each construct emits its region for EVERY equivalent factoring — it bites (3 failures) without the normalization. So the class is now caught structurally, not just by an adversarial hunt.
A systematic shape-fragility audit (5 agents running the real emitter over
equivalent factorings of each detector's construct) found the fixed-window
fragility is broad, not three isolated cases. Fix four more by routing through
the normalized form, same pattern as detectTernary:
- detectConditionalType — ran its 7-window on raw r.body; now over expandAlts,
so an `opt`-tail / grouped / alt-split conditional `?:` is detected.
- getTypeParamElementKeywords — early-out demanded an exact 3-item `'<' sep '>'`
body; now scans the expandAlts branches for that adjacency, so a trailing
`opt(',')` or alt-wrapped type-param list still hoists its constraint keyword.
- detectDeclarations.isBlockRule — matched only a raw-seq `{ … }` body; now
EVERY expandAlts branch must be `{ … }`-bounded (`.every`, not `.some`, so a
`Type` that is only SOMETIMES an object literal is not mis-read as a brace
declaration — `.some` regressed `type X = …` to #declaration-body).
- detectJsx hasElementShape — matched `<`+ref only in a raw seq; now over the
expandAlts branches, so an opt/alt/group factoring of the element qualifies.
Byte-identical on all six shipped grammars (verified the `.every` form after a
`.some` first attempt changed typescript/tsx output), and each fix is verified
to detect its previously-dropped factoring. Seven detectors are now shape-robust;
the YAML region detectors and the expression group are the remaining batch.
Reading each expression detector myself (the audit agent's output was
malformed): five matched a construct on the raw r.body, so an equivalent
factoring slips them. Route them through expandAlts, same pattern as
detectTernary/detectCallExpression:
- detectBareArrowParam — `ref '=>'`; an opt-tail arrow (`[x, opt('=>', body)]`)
was dropped (verified: variable.parameter now emitted for that factoring).
- detectPropertyAccess — `'.'`/`'?.'` before a token ref.
- detectParenArrowParams + detectArrowParamDelims — the deliberate pair that
read the same arrow param-list production; routed identically so they still
cannot disagree.
- detectDirectParamKeywords — keyword directly before `(`; also recurse `sep`.
detectConstructorKeywords already expands (no change). Byte-identical on all six
shipped grammars. Twelve detectors are now shape-robust. The YAML region
detectors are deferred to a semantics-aware pass: their fixed shapes encode
DELIBERATE YAML semantics (detectFold stops at a leaf token and does NOT follow
a rule-ref by design — an Indent+rule-ref is a SIBLING node, not a fold), so the
audit's heuristic "follow the ref" fix would break the fold-vs-sibling meaning.
Not every fixed shape is fragility — some are intent.
) Worked the YAML region detectors semantics-aware (not the audit's heuristic). Key finding: the audit's "route through expandAlts" fix is WRONG for these — they match STRUCTURAL nodes (a `(Newline item)*` quantifier, config-keyed bracket pairs), and expandAlts EXPANDS the very quantifier they depend on. - detectBlockSequence — the one genuine positional rigidity: it matched the `[item, (Newline item)*]` pattern only at items[0]/items[1]. Now scanned pairwise (any adjacent k), so a leading element before the sequence does not hide it. NOT routed through expandAlts (that would expand the quantifier). yaml byte-identical; the full yaml gate group (depth-witnesses, deepest-sibling, compact-nest-sites, flow-sites, blockscalar-depth, issue12) stays green. - detectFold — its visit is ALREADY pairwise; its refsLeaf stopping at a leaf token (not following a rule-ref) is the DELIBERATE fold-vs-sibling distinction (an Indent+rule-ref is a sibling node). No change. - detectExplicitKey — the indicator at items[0] is intrinsic (an explicit-key entry is headed by `?`); the inner already unwraps a quantifier. No change. - detectFlowCollections — topLits is positional-agnostic and unwraps quantifier/group/sep, deliberately stopping at alt/ref to read THIS rule's own structure. No change. So the fixed-window fragility class is closed: 13 detectors made shape-robust where the rigidity was accidental; the rest is deliberate YAML semantics that the heuristic over-flagged. Not every fixed shape is fragility.
getTypeParamElementKeywords collected only KEYWORD literals, so a literal the grammar declares a scope for that is PUNCTUATION (e.g. a `&` scoped punctuation.separator) inside a type-parameter element lost its scope inside `<…>`. Collect any scope-declared literal now, and emit it with the right boundary (`\b` for words, none for punctuation — `\b&\b` never matches). `=` and `,` are skipped in the loop since the dedicated handlers emit them, so the six shipped grammars stay byte-identical (verified the `=` double-emit and excluded it); a scoped `&` in a type-param element is now scoped (verified). This closes the last clean completeness gap from the hunt. The remaining two are not clean completeness gaps: the symbolic-operator case is an ORDERING concern (a short overridden op can shadow a longer non-overridden one across the separate #operator-overrides / #operators patterns — the provably-hard ordering axis, and the op IS scoped, just shadowed), and `rawBlock` is a declared-but-unimplemented IndentConfig field (no shipped grammar sets it; implementing the verbatim region speculatively, with no adopter to verify against, is deferred).
An adversarial review of the "proven no gaps" conclusion found real holes;
this acts on them.
- Markup co-blindness (the one fixable defect): tokenCensus discharged EVERY
markup token via `if (g.markup) bump('markup-region')` with zero
verification, so a markup token whose declared scope generateMarkupTm does
not model (e.g. a `<?…?>` processing instruction) was reported discharged
while the engine painted it the bare root. Now a markup token with an
explicit `scope` must have that scope actually emitted, or it is an orphan
(verified: the PI counterexample is now caught; html stays 7/7).
- COMPLETENESS.md corrections, all overclaims the review caught:
- "per-token discharge by structural identity (match IS tokenPatternSource)"
was false — the identifier match is widened to identPattern, and the census
checks PRESENCE (a reachable non-root-scoped entry), not regex identity.
Reworded to presence-not-identity.
- "∀ G by structural induction" conflated the algebra CLOSURE (which is ∀ G)
with the per-grammar DISCHARGE (executed on the shipped set). Separated.
- "ordering is undecidable" was the project's own impossibility-without-proof
trap: for a fixed G the pattern list is finite and the winner is a finite
index read (gen-tm even sorts it deterministically). It is "not reachable by
a corpus-free structural fold," a measurement limit, not undecidability.
Residual, honestly stated: the shape class is MEASURED not proven — the
robustness gate covers 3 of ~20 detectors, mutation testing mutates only flat
tokens (not shape regions), and detectAngleBracketCast keeps a latent
ref-factored fragility (type-cast fires on no shipped grammar). 81/81 · 40/40.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #51. Proves
src/gen-tm.tscomplete: every TextMate-representable highlighting obligation a grammar induces is emitted and reachable. The dual of the soundness ledger (KNOWN-GAPS.md, which finds wrong paints) — this proves there are no missing ones.What now holds
The generator's input is a closed, finite algebra (
RuleExpr/TokenPattern), so "every obligation" is enumerable. Completeness reduces to three mechanically-checked layers:toRuleExpr, the token-pattern compiler, and the sharedcollectLiteralsbackbone are total over their unions, so no supported combinator shape is silently dropped;#expression/canonicalRepoNames/aliasScopes); zero dead keys.Fixed-denominator ledger: 3137/3137 = 100%, 0 open completeness gaps.
The gap it found and fixed
The Layer-A audit surfaced one latent silent-drop —
getTypeParamElementKeywordsignoredsep, dropping a keyword nested in a sep-list within a type-parameter element. Fixed by recursing intosep.element(not/refstay omitted on purpose). The six shipped grammars are byte-identical (the drop is latent), with a biting regression guard in the checker.The frontier is not an impossibility
Three "TextMate can't express this" claims — variable-width lookbehind for the cast/arrow value test, the balanced-paren arrow confirm, and a regex after a control-flow head — were adversarially attacked and refuted: each is expressible in vscode-oniguruma (verified), and the fixed-width forms gen-tm emits are deliberate Onigmo-portability choices. They are a soundness-precision axis, not a completeness gap.
Deliverable
COMPLETENESS.md— the proof spine (contract, three layers, frontier, ledger), modelled onTOTAL-PARSING.md.test/tm-completeness.ts— mechanises it;npm run completeness[:check], wired intonpm run checkas a gate.Distinct from the README corpus metrics (empirical oracle agreement) and the soundness ledger (
scope-gap/gap-ledger) — this is the formal derivation property. Full gate suite: 39/39.