⚡ perf(link): skip quote roundtrip on safe URL paths#13985

Closed
gaborbernat wants to merge 1 commit into
pypa:mainfrom
gaborbernat:pip-tools-clean-url-fast-path

Conversation

@gaborbernat gaborbernat commented May 6, 2026

The Simple-API JSON response from Warehouse hands out file URLs whose path is pure ASCII alphanumerics plus the always-safe _-.~/ set, yet _clean_url_path still pays a full urllib.parse.unquote followed by urllib.parse.quote per link to guarantee idempotency. Walking ~65000 links across an 8-pass cross-platform lock spends roughly 6% of user-CPU time reproducing the input verbatim. ⚡

Guard the round-trip with a single negative-class re.search against the always-safe alphabet. When every character of the path passes, return it unchanged; otherwise fall through to the existing logic, byte-for-byte preserved. The pre-check is ~250 ns and the work it skips averages ~900 ns on a real-world wheel link, so the fast path is 4.7x faster per call and paths that fall through pay only a small constant overhead.
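The guard described above can be sketched roughly as follows. This is not the patch itself: `_UNSAFE_PATH_CHAR`, `clean_path` and `clean_path_slow` are hypothetical names, and the slow path is reduced to a bare `unquote`/`quote` round-trip as a stand-in for the existing `_clean_url_path` logic.

```python
import re
import urllib.parse

# Hypothetical name. Matches any single character outside the alphabet
# quoted in the description: ASCII alphanumerics plus "_", ".", "/", "~", "-".
_UNSAFE_PATH_CHAR = re.compile(r"[^A-Za-z0-9_./~-]")


def clean_path_slow(path: str) -> str:
    """Simplified stand-in for the existing logic: unquote, then re-quote."""
    return urllib.parse.quote(urllib.parse.unquote(path))


def clean_path(path: str) -> str:
    """Return the path unchanged when it holds only always-safe characters."""
    if _UNSAFE_PATH_CHAR.search(path) is None:
        # Fast path: the round-trip could not change anything here.
        return path
    # Anything else falls through to the original cleaning logic.
    return clean_path_slow(path)
```

A negative character class means one scan both finds the first offending character and decides the branch, so a path that needs cleaning pays for at most one extra pass over the string.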

Function-level micro-bench against /packages/12/34/567/somepackage-1.2.3-py3-none-any.whl: _clean_url_path drops from 1144 ns/call to 244 ns/call, a 79% reduction in CPU time. End-to-end against a cross-platform lock pipeline iterating ~65000 links across 8 resolver passes (n=12 paired runs alternating between HEAD and HEAD + this patch): user-CPU mean falls 6.3% (10/12 paired runs faster) with stdev 2.5x lower under the patch.
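A function-level comparison of this kind could be reproduced along the lines below. This is only a sketch using `timeit`: `roundtrip`, `guarded` and `ns_per_call` are hypothetical helpers, the round-trip is a simplified stand-in for `_clean_url_path`, and absolute numbers will differ from the ones reported here depending on machine and Python version.

```python
import re
import timeit
import urllib.parse

_UNSAFE = re.compile(r"[^A-Za-z0-9_./~-]")  # hypothetical name
PATH = "/packages/12/34/567/somepackage-1.2.3-py3-none-any.whl"


def roundtrip(path: str) -> str:
    """Stand-in for the unguarded cleaning: unquote, then re-quote."""
    return urllib.parse.quote(urllib.parse.unquote(path))


def guarded(path: str) -> str:
    """Same result, but skips the round-trip for always-safe paths."""
    return path if _UNSAFE.search(path) is None else roundtrip(path)


def ns_per_call(func, path: str, number: int = 20_000) -> float:
    """Best-of-three wall time per call, in nanoseconds."""
    best = min(timeit.repeat(lambda: func(path), number=number, repeat=3))
    return best / number * 1e9


if __name__ == "__main__":
    print(f"round-trip: {ns_per_call(roundtrip, PATH):.0f} ns/call")
    print(f"guarded:    {ns_per_call(guarded, PATH):.0f} ns/call")
```

The reported figures are internally consistent: 1144 / 244 ≈ 4.7, and (1144 − 244) / 1144 ≈ 79%.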

No behaviour change for any URL containing a character outside [A-Za-z0-9_./~-]. Local-path inputs, %-escaped paths, and paths carrying @ or other reserved characters all flow through the original cleaning logic untouched.
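To illustrate which inputs would take which branch, the check can be applied to a few sample paths. The helper `takes_fast_path` and the sample paths are made up for illustration; only the character class comes from the description above.

```python
import re

# Character class as quoted in the description; the name is hypothetical.
_UNSAFE = re.compile(r"[^A-Za-z0-9_./~-]")


def takes_fast_path(path: str) -> bool:
    """True when no character falls outside the always-safe alphabet."""
    return _UNSAFE.search(path) is None


# Illustrative inputs only, mapped to the branch they would take.
EXAMPLES = {
    "/packages/12/34/567/somepackage-1.2.3-py3-none-any.whl": True,
    "/~user/pkg_name-1.0.tar.gz": True,
    "/simple/some%20package/": False,      # "%" forces the slow path
    "/repo/pkg@v1.0/pkg.tar.gz": False,    # "@" is reserved
    "/C:/Users/me/wheels/pkg.whl": False,  # ":" in a local path
    "/path with space/pkg.whl": False,     # space needs quoting
}

for path, expected in EXAMPLES.items():
    assert takes_fast_path(path) is expected
```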

gaborbernat added a commit to gaborbernat/pip that referenced this pull request May 6, 2026
@gaborbernat gaborbernat force-pushed the pip-tools-clean-url-fast-path branch from a460696 to da4ac5c May 6, 2026 17:30
@gaborbernat gaborbernat changed the title from "perf: short-circuit URL path cleaning for already-safe paths" to "⚡ perf(link): skip quote roundtrip on safe URL paths" May 6, 2026
@gaborbernat gaborbernat force-pushed the pip-tools-clean-url-fast-path branch from da4ac5c to cbc2351 May 6, 2026 17:37
The Simple-API JSON response from Warehouse hands out file URLs whose path
is pure ASCII alphanumerics plus `_-.~/`, every character of which sits in
`urllib.parse.quote`'s default-safe set. `_clean_url_path` still pays a
full `urllib.parse.unquote` followed by `urllib.parse.quote` per link to
guarantee idempotency, even though the round-trip cannot change anything
on these inputs. With ~65000 links walked across an 8-pass cross-platform
lock that is ~6% of user-CPU time spent reproducing the input verbatim.

Guard the round-trip with a single negative-class `re.search`. When the
path contains no character outside the always-safe alphabet, return it
unchanged; otherwise fall through to the existing logic. The pre-check is
~250 ns and the work it skips averages ~900 ns on a real-world wheel link
(4.7x faster on the fast path), so even paths that fall through pay only a
small constant overhead.

A new parametrized test, `test_clean_url_path_idempotent_for_safe_paths`,
asserts that the fast path is a true identity on the alphabet it claims to
cover. The existing `test_clean_url_path` cases all carry at least one unsafe
character and keep exercising the slow path.
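The identity property such a test checks might look like the sketch below. It is not the test from the patch: `clean_path` is a simplified stand-in for `_clean_url_path`, the sample paths are invented, and a plain loop takes the place of `pytest.mark.parametrize` so the snippet runs without pytest installed.

```python
import re
import urllib.parse

_UNSAFE = re.compile(r"[^A-Za-z0-9_./~-]")  # hypothetical name


def clean_path(path: str) -> str:
    """Simplified stand-in for the guarded _clean_url_path."""
    if _UNSAFE.search(path) is None:
        return path
    return urllib.parse.quote(urllib.parse.unquote(path))


# Invented cases; each would be one parametrize entry under pytest.
SAFE_PATHS = [
    "/packages/12/34/567/somepackage-1.2.3-py3-none-any.whl",
    "/simple/some_package/",
    "/~user/pkg.tar.gz",
    "/",
    "",
]


def check_identity(path: str) -> None:
    """The fast path must return its input and agree with the round-trip."""
    assert clean_path(path) == path
    assert urllib.parse.quote(urllib.parse.unquote(path)) == path


def test_clean_path_identity_for_safe_paths() -> None:
    for path in SAFE_PATHS:
        check_identity(path)
```

Checking the result against the unguarded round-trip as well guards against the fast path silently diverging from what the slow path would have produced.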
@gaborbernat
Author

Closing in favour of #13986, which short-circuits at the outer _ensure_quoted_url layer and subsumes this patch. When _ensure_quoted_url returns early on a clean URL, _clean_url_path is never entered, so the fast path here becomes dead code. The benchmarks confirm that the two cover the same set of links and that the wins do not stack.

@gaborbernat gaborbernat closed this May 6, 2026