Skip to content

perf: optimize content extension checking in router#1391

Open
renich wants to merge 2 commits into
amberframework:masterfrom
renich:bolt-optimize-content-ext-7295391843654548349
Open

perf: optimize content extension checking in router#1391
renich wants to merge 2 commits into
amberframework:masterfrom
renich:bolt-optimize-content-ext-7295391843654548349

Conversation

@renich

@renich renich commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Uses File.extname and Hash lookup instead of regular expression alternation for file extension checking in hot paths.

Co-developed-by: Gemini AI renich+gemini@woralelandia.com
Signed-off-by: Rénich Bon Ćirić renich@woralelandia.com

@crimson-knight

Copy link
Copy Markdown
Member

The performance intent is sound — the 606-alternation MimeTypes::TYPE_EXT_REGEX on a per-request path is worth avoiding. But File.extname is not semantically equivalent to the anchored regex /\.(ext)$/, and the current implementation changes routing/content-negotiation behavior on real inputs, so requesting changes.

Two confirmed divergences (reproduced against the exact original regex on Crystal 1.20.0):

  • Trailing slash after an extension/users.json/: original regex → no match (anchored $); File.extname("/users.json/").json. Worse, Router#match then computes resource[0...-ext_size] = /users. — a corrupted route key the original would never produce.
  • Leading-dot / dotfile basename/.html, /.json: original regex → matches (treats as that extension); File.extname"", so extension/content handling is lost.

Bare minimum we'll accept

The optimized code must produce identical results to the pre-PR anchored-regex behavior for every input below, each covered by a spec (in spec/amber/router/ and spec/amber/controller/respond_with_spec.cr). This is the explicit minimum:

  1. Trailing slash after extension/users.json/, /x.html/ → must behave exactly as before (no match; no corrupted/truncated route key).
  2. Leading-dot basename/.html, /.json, /.txt → must match the original behavior.
  3. Multi-dot filename/archive.tar.html → extension is html (final segment only).
  4. Uppercase extension/INDEX.HTML, /data.JSON → must stay case-sensitive (original regex has no i flag → no match; don't silently downcase).
  5. No extension/users, /, and empty string → no extension / no match.
  6. Unknown extension/file.foobar, and a known-but-unsupported one like /image.png if not in the set → no match / nil format, same as before.

Plus one structural ask:

  1. Single source of truth — derive the accepted extensions from Content::TYPE (e.g. Content::TYPE.has_key?(ext)) instead of the hard-coded case in has_content_ext; otherwise the router list and the TYPE hash will drift.

Beyond the minimum

If you can identify additional edge cases worth locking down — e.g. percent-encoded dots (%2E), unusually long extension strings, or inputs specific to how request.path is produced — we'll gladly accept specs for them too. Just articulate in the PR why each one matters (what real request triggers it and what would break), so the coverage is intentional rather than incidental.

Once #1#7 hold with green specs, we're happy to take the optimization.

@renich

renich commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

I have updated the PR to address your feedback.

Following your suggestions, I avoided File.extname and rewrote the matching logic using rindex and a pre-populated Set derived from Content::TYPE. This correctly handles all cases, including trailing slashes, leading-dot basenames, case sensitivity, and multi-dot extensions. I have also added regression specs covering these 6 cases.

All specs are passing.

(Note: This contribution was co-developed with Gemini AI. Rénich has directed, reviewed, tested, and takes full responsibility for this code.)

google-labs-jules Bot and others added 2 commits June 16, 2026 22:58
Replaced slow Regex matching `str.match(/\.(html|json|txt|...)$/)` with fast string manipulation and hash lookups (`File.extname` + `.has_key?`). This drastically speeds up router matching and content type parsing.

Co-authored-by: renich <225115+renich@users.noreply.github.com>
Refactor route matching and responder extension checking to use rindex
and custom key sets. This matches the behavior of the original regex
anchoring while avoiding File.extname bugs (such as trailing slashes,
leading-dot file basenames, and case-insensitivity bugs).

Co-developed-by: Gemini AI <renich+gemini@woralelandia.com>
Signed-off-by: Rénich Bon Ćirić <renich@woralelandia.com>
@renich renich force-pushed the bolt-optimize-content-ext-7295391843654548349 branch from 00fd908 to ba1ee41 Compare June 17, 2026 04:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants