harden: bound remaining catalog reads + lock in no-unbounded-read invariant#3145
Draft
PascalThuet wants to merge 3 commits into
Draft
harden: bound remaining catalog reads + lock in no-unbounded-read invariant#3145PascalThuet wants to merge 3 commits into
PascalThuet wants to merge 3 commits into
Conversation
Add a shared _download_security module (read_response_limited, is_https_or_localhost_http, size constants) and route the GitHub release and Azure DevOps token network reads through bounded reads so an oversized response can't exhaust memory. Add a strict_redirects mode to authentication.open_url: the redirect handler now rejects any redirect whose target isn't HTTPS (or HTTP to localhost), composing with the existing per-hop redirect_validator and auth-stripping. The Azure DevOps token POST is routed through that handler so a 307/308 cannot forward the client_secret body to a non-HTTPS host.
…on/preset downloads Extend _download_security with safe_extract_zip (path + symlink + per-member and total size + entry-count bounds), read_zip_member_limited, and verify_sha256, and adopt them across the extension and preset install paths: - install_from_zip extracts via safe_extract_zip (zip-slip + symlink + zip-bomb defense) instead of the previous path-only containment check. - catalog JSON fetches and package downloads use bounded reads; the inline manifest pre-read in extension update uses read_zip_member_limited. - downloaded packages are verified against their catalog sha256 when present. - CatalogStackBase and the preset catalog validator reject hostless URLs before the scheme check. The command-registration path-traversal guard the original change carried is already on main (github#3088) and is intentionally omitted.
…ad invariant Route the remaining network JSON reads through read_response_limited and pass strict_redirects=True on the workflow/integration/bundler catalog readers and the workflow add / workflow step add download paths: - workflows/catalog.py, integrations/catalog.py, bundler/services/adapters.py, commands/bundle/__init__.py: bounded catalog reads. - __init__.py workflow_add / workflow_step_add: bounded reads + strict redirects + shared is_https_or_localhost_http validation. - extensions/_commands.py: bound the `extension add --from <url>` ZIP download (the last unbounded network read). Add test_remote_downloads_do_not_use_unbounded_response_reads: an AST gate that scans the whole src/specify_cli tree and fails if any response.read() is unbounded, except the three local-file get_hash reads. With this, every remote read in the CLI is bounded.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR completes the runtime download-hardening stack by enforcing bounded HTTP response reads across remaining catalog/workflow/bundler paths, tightening redirect handling (strict HTTPS w/ loopback HTTP exception), and adding a repo-wide AST gate to prevent reintroducing unbounded response.read() calls.
Changes:
- Introduces
src/specify_cli/_download_security.py(bounded reads, URL predicate, ZIP extraction + member read bounds, optional sha256 verification) and threads it through remaining network read sites. - Enforces strict redirect validation via
open_url(..., strict_redirects=True)(and the shared redirect handler) across workflows/integrations/bundler/versioning, with tests updated to use cursor-advancing response fakes. - Adds
tests/test_download_security.pywith both helper-level tests and an AST-based invariant check blocking unbounded.read()usage undersrc/specify_cli.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_bundler_adapters.py | Makes the response fake stream-like (read(size) cursor) to exercise bounded-read loops. |
| tests/test_workflows.py | Updates workflow download/open_url fakes to accept strict_redirects and provide stream semantics. |
| tests/test_upgrade.py | Adds regression coverage to pin _fetch_latest_release_tag to read_response_limited(max_bytes=...) and adapts mocks for bounded reads. |
| tests/test_self_upgrade_verification.py | Imports shared fixture to route build_opener().open through patched urlopen in strict-redirect paths. |
| tests/test_self_upgrade_execution.py | Same fixture import for strict-redirect opener routing. |
| tests/test_self_upgrade_detection.py | Same fixture import for strict-redirect opener routing. |
| tests/test_presets.py | Updates mocks to use stream-backed .read() and accommodates strict-redirect openers in preset flows. |
| tests/test_github_http.py | Updates response mocks for bounded reads and adds redirect/auth preservation tests. |
| tests/test_extensions.py | Updates mocks to stream semantics and adjusts URL validation expectation text. |
| tests/test_download_security.py | Adds bounded-read/zip-safety tests plus AST gate forbidding unbounded .read() across src/specify_cli. |
| tests/test_authentication.py | Updates redirect/strict-opener expectations and patches opener-based code paths. |
| tests/self_upgrade_helpers.py | Re-exports the opener-routing fixture for reuse across self-upgrade tests. |
| tests/integrations/test_integration_catalog.py | Updates catalog-fetch tests to cover both urlopen and opener.open paths and adds bounded-read assertion. |
| tests/http_helpers.py | Changes JSON response mocks to stream semantics and adds an autouse fixture to route opener.open through urlopen. |
| src/specify_cli/workflows/catalog.py | Validates catalog URLs via shared predicate, enforces strict redirects, and bounds catalog JSON reads. |
| src/specify_cli/presets/_commands.py | Uses shared URL predicate + bounded reads for preset add --from, and routes GitHub release resolution via strict openers. |
| src/specify_cli/presets/init.py | Uses safe_extract_zip, bounds catalog JSON reads, and verifies optional sha256 for downloaded packs. |
| src/specify_cli/integrations/catalog.py | Enforces strict redirects and bounds integration catalog JSON reads. |
| src/specify_cli/extensions/_commands.py | Bounds extension add --from <url> ZIP downloads and bounds pre-extract manifest reads. |
| src/specify_cli/extensions/init.py | Uses safe_extract_zip, bounds catalog JSON reads, and verifies optional sha256 for extension downloads. |
| src/specify_cli/commands/bundle/init.py | Bounds bundle catalog reads. |
| src/specify_cli/catalogs.py | Tightens catalog URL validation order (host before scheme) and clarifies loopback HTTP messaging. |
| src/specify_cli/bundler/services/adapters.py | Bounds bundler catalog JSON reads and decodes via the bounded-read helper. |
| src/specify_cli/authentication/http.py | Adds strict_redirects option and enforces shared strict redirect validation inside the redirect handler. |
| src/specify_cli/authentication/azure_devops.py | Routes token POST through strict redirect handler and bounds token JSON reads. |
| src/specify_cli/_version.py | Uses bounded reads + strict redirects for GitHub latest-release metadata. |
| src/specify_cli/_github_http.py | Bounds GitHub release metadata reads when resolving asset API URLs. |
| src/specify_cli/_download_security.py | New central hardening helpers (bounded reads, strict URL predicate, safe ZIP extraction, sha256 verification). |
| src/specify_cli/init.py | Applies bounded reads + strict redirects + shared URL predicate to workflow add/step add remote fetch paths. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 29/29 changed files
- Comments generated: 2
Comment on lines
+159
to
+165
| zip_path.write_bytes( | ||
| read_response_limited( | ||
| response, | ||
| error_type=PresetError, | ||
| label=f"preset {from_url}", | ||
| ) | ||
| ) |
Comment on lines
+243
to
+253
| def _is_unbounded_read(call: ast.Call) -> bool: | ||
| if call.args: | ||
| size = _constant_int(call.args[0]) | ||
| return size is not None and size < 0 | ||
|
|
||
| for keyword in call.keywords: | ||
| if keyword.arg == "size": | ||
| size = _constant_int(keyword.value) | ||
| return size is not None and size < 0 | ||
|
|
||
| return True |
Collaborator
|
Converted to a draft as it is waiting for #3141 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Part of splitting #2442 into smaller, dedicated PRs. Re-derived against current
main. Final PR of the runtime-hardening stack.What
read_response_limited(max_bytes=MAX_JSON_CATALOG_BYTES)and passstrict_redirects=True:workflows/catalog.py,integrations/catalog.py,bundler/services/adapters.py,commands/bundle/__init__.py.__init__.pyworkflow_add/workflow_step_add: bounded reads +strict_redirects=True+ sharedis_https_or_localhost_httpURL validation.extensions/_commands.py: bound theextension add --from <url>ZIP download — the last unbounded network read (surfaced by the gate below).The invariant
Adds
tests/test_download_security.py::test_remote_downloads_do_not_use_unbounded_response_reads: an AST scan over the wholesrc/specify_clitree that fails CI if anyresponse.read()is unbounded, with a 3-entry allowlist for the local-fileget_hashreads (extensions/presets/integrations). This only goes green once #3140 + #3141 + this have converted every remote read — it's the lock that keeps the bound from silently regressing.Validation
test_download_security(incl. the AST gate),test_workflows,test_integration_catalog,test_bundler_adapters,test_upgrade,test_extensions— 792 passed.ruff checkclean.strict_redirectsthreaded through the fake openers.This completes the runtime hardening. The Bandit + detect-secrets CI gates land in a separate PR on the #2/#3 track.