[Security hardening] Add automated security audit workflow#2442
[Security hardening] Add automated security audit workflow#2442PascalThuet wants to merge 38 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new GitHub Actions workflow to introduce automated security checks for the Python codebase, aiming to catch dependency advisories and high-severity static-analysis findings in CI alongside the existing test/lint workflows.
Changes:
- Add
.github/workflows/security.ymlwith a dedicatedSecurity Auditworkflow. - Run
pip-auditon pushes tomain, pull requests, a weekly cron, and manual dispatch. - Run Bandit against
src/, temporarily skippingB602pending the shell-step hardening tracked in #2440.
Show a summary per file
| File | Description |
|---|---|
.github/workflows/security.yml |
New CI workflow that adds dependency-audit and static-analysis jobs for the Python project. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 1/1 changed files
- Comments generated: 5
mnriem
left a comment
There was a problem hiding this comment.
Please address Copilot feedback. If not applicable, please explain why
|
Addressed the Copilot feedback in the latest push:
Local validation: uv export --quiet --extra test --frozen --format requirements.txt --no-emit-project --output-file /tmp/spec-kit-audit-requirements.txt
uvx --from pip-audit==2.10.0 pip-audit -r /tmp/spec-kit-audit-requirements.txt --progress-spinner off
uvx --from bandit==1.9.4 bandit -r src -lll
git diff --checkResults: |
|
Added automated regression coverage for the security workflow in The new tests statically verify that:
Validation after this commit: uv run python -m pytest tests/test_security_workflow.py -q
uv export --quiet --extra test --frozen --format requirements.txt --no-emit-project --output-file /tmp/spec-kit-audit-requirements.txt
uvx --from pip-audit==2.10.0 pip-audit -r /tmp/spec-kit-audit-requirements.txt --progress-spinner off
uvx --from bandit==1.9.4 bandit -r src -lll
git diff --checkAll passed locally. |
There was a problem hiding this comment.
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (1)
src/specify_cli/init.py:414
- This second
# nosec B602suppresses Bandit for the non-capturing branch as well, so neither path throughrun_commandwill ever raise B602 again. If the intent is only to defer the current finding until #2440, the skip needs to stay in the workflow/configuration layer rather than permanently muting this call site in source.
# shell=True is only available to callers that opt in explicitly.
subprocess.run(cmd, check=check_return, shell=shell) # nosec B602
- Files reviewed: 5/5 changed files
- Comments generated: 5
|
Please address Copilot feedback. If not applicable, please explain why. Note the shell step should indeed be ignored |
|
Addressed the follow-up review in
Validation: uv export --quiet --extra test --format requirements.txt --no-emit-project --output-file /tmp/spec-kit-audit-requirements.txt
uvx --from pip-audit==2.10.0 pip-audit -r /tmp/spec-kit-audit-requirements.txt --progress-spinner off
uvx --from bandit==1.9.4 bandit -r src -lll --baseline .github/bandit-baseline.json
uv run python -m pytest tests/test_security_workflow.py tests/test_workflows.py -q
uvx ruff check src/
git diff --checkI also verified the |
|
One more small cleanup after re-review: pushed Reason: it still audits the runtime + Revalidated locally: uv pip compile pyproject.toml --extra test --quiet --output-file /tmp/spec-kit-audit-requirements.txt
uvx --from pip-audit==2.10.0 pip-audit -r /tmp/spec-kit-audit-requirements.txt --progress-spinner off
uvx --from bandit==1.9.4 bandit -r src -lll --baseline .github/bandit-baseline.json
uv run python -m pytest tests/test_security_workflow.py -q
git diff --checkAll passed. |
There was a problem hiding this comment.
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (1)
tests/test_security_workflow.py:91
- Bandit doesn't require the exact literal
# nosec B602to suppress this finding; plain# nosec,#nosec, and multi-ID forms also disable B602. This check leaves those suppression paths untested, so a future source-level bypass can slip in without failing CI.
)
def test_b602_is_not_suppressed_in_source(self):
source_text = "\n".join(
path.read_text(encoding="utf-8")
for path in (REPO_ROOT / "src").rglob("*.py")
)
- Files reviewed: 5/5 changed files
- Comments generated: 3
|
Please address Copilot feedback |
|
Addressed the latest Copilot review in
Validation: uv run python -m pytest tests/test_security_workflow.py -q
uv pip compile pyproject.toml --extra test --python-version 3.11 --generate-hashes --quiet --output-file /tmp/spec-kit-audit-py311.txt
uv pip compile pyproject.toml --extra test --python-version 3.12 --generate-hashes --quiet --output-file /tmp/spec-kit-audit-py312.txt
uv pip compile pyproject.toml --extra test --python-version 3.13 --generate-hashes --quiet --output-file /tmp/spec-kit-audit-py313.txt
uvx --from pip-audit==2.10.0 pip-audit --disable-pip --require-hashes -r /tmp/spec-kit-audit-py311.txt --progress-spinner off
uvx --from pip-audit==2.10.0 pip-audit --disable-pip --require-hashes -r /tmp/spec-kit-audit-py312.txt --progress-spinner off
uvx --from pip-audit==2.10.0 pip-audit --disable-pip --require-hashes -r /tmp/spec-kit-audit-py313.txt --progress-spinner off
uvx --from bandit==1.9.4 bandit -r src -lll --baseline .github/bandit-baseline.json
git diff --checkAll passed locally. |
There was a problem hiding this comment.
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 5/5 changed files
- Comments generated: 5
mnriem
left a comment
There was a problem hiding this comment.
Please address Copilot feedback
|
Addressed the latest Copilot review round in commit aa02062:
Local validation passed:
The new GitHub workflow runs are currently waiting for fork PR approval ( |
There was a problem hiding this comment.
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 6/6 changed files
- Comments generated: 2
Three micro-cleanups raised during review github#4 of my own work — no behavior change, just clarity. - Replace the __import__("specify_cli._download_security", fromlist=...) dance with a plain `import ... as _real_read_response_limited` at the top of the file. Easier to grep, no runtime difference. - Type the recorded dict explicitly and make max_bytes/label keyword- only without defaults on the spy. If a future refactor drops either argument the spy now raises TypeError immediately, instead of silently recording None and tripping the post-call assertion with a more confusing message. - Tighten the label check from fuzzy substring match ("github" in label.lower()) to exact equality ("GitHub latest release"). Both catch regressions; exact equality also catches typos.
Six items from the new Copilot pass. Three were latent bugs in the guardrails added by earlier commits, two are documentation/wording, one is parity coverage. Bugs - security.yml: the MEDIUM Bandit informational pass ran without --baseline, so the whitelisted HIGH B602 finding re-fired there on every run, turning the job summary into a permanent warning. Apply the same baseline to both passes; medium-only NEW findings now surface, as intended. - security.yml: the summary step ran with if: always() but the MEDIUM pass has the default if: success() — when the blocking HIGH step fails, the MEDIUM pass is skipped (outcome=skipped, not failure) and the summary wrote "✅ clean" anyway. Switch to a case statement that handles failure/success/skipped distinctly (⚠️ / ✅ / ⏭️). - check_bandit_baseline.py and check_secrets_baseline.py used `git show <head_ref>:` on both sides, so an unreadable/unfetched head ref returned empty results and the diff computed 0 new identities → fail-open. Read the head side from the working tree instead (CI is checked out at the PR head), fail-closed when the file is missing, and SystemExit on corrupt JSON. The base side keeps the lenient JSONDecodeError fallback because that's historical state we can't change. Wording - security.yml + CONTRIBUTING.md: both mentioned `# nosec` as a suppression mechanism, but tests/test_security_workflow.py:: test_bandit_nosec_is_not_suppressed_in_source explicitly forbids `# nosec` under src/. Replace with the actually-supported paths (bandit baseline for HIGH findings, `# noqa: S6xx` for ruff subprocess-shell rules) and flag the forbidden-comment policy. Parity coverage - tests/test_security_workflow.py: three new tests for the secret-scan job mirroring the dependency-audit / static-analysis coverage — detect-secrets-hook command, baseline path, excluded paths, growth gate env wiring (BASE only, no HEAD env), and fetch-depth: 0. - tests/test_workflows.py: regression test that WorkflowCatalog._fetch_single_catalog routes through read_response_limited with error_type=WorkflowCatalogError and label "workflow catalog". Mirrors TestBoundedRead for _fetch_latest_ release_tag and the equivalent test in test_integration_catalog.py. - tests/test_baseline_gates.py: two new fail-closed cases (head missing in working tree, head corrupt in working tree); drop the now-unused head_sha returns and the head env var from GateHandle.run. Note: Copilot also flagged "no tests on baseline gate scripts" — those tests already shipped in tests/test_baseline_gates.py (commit 2fd8071, posted before the review). Updated here with the new fail-closed cases. Tests: 3017 passed (was 3009).
- read_response_limited: read in a loop until EOF or one byte past the limit instead of a single read(max_bytes + 1). A server using chunked transfer encoding can return fewer bytes per read() than requested while streaming more than max_bytes total, defeating the single-read bound. Add regression tests for the short-read and within-limit paths. - _download_security: annotate _raise / _raise_from as NoReturn so type checkers treat call sites as unreachable. - Extract the duplicated is_https_or_localhost_http redirect-safety predicate into _download_security and import it from both _github_http and authentication/http so the rule lives in one place. - azure_devops: stop catching broad ValueError/KeyError around token acquisition; give the bounded read a dedicated _TokenResponseTooLarge type and catch only URLError, OSError, JSONDecodeError, and that type so unrelated programming errors still surface. - tests: make response mocks faithful streams (advancing cursor, b"" at EOF) so the bounded read loop terminates as it would against a real http.client.HTTPResponse. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- __init__.py preset/extension URL installs: give read_response_limited a domain error_type (PresetError / ExtensionError) and catch that instead of a blanket ValueError, so an oversized body is reported cleanly while unrelated ValueErrors surface as real errors. The extension catch now also covers install_from_zip's ValidationError (an ExtensionError). - _utils.run_command: rewrite the misleading docstring — shell=False is the only honoured mode; shell=True is rejected with ValueError, the parameter is retained only so existing keyword callers don't hit TypeError. - _download_security: document that the loopback allowance is an exact-string match (not an IP-range check), that read_response_limited's max_bytes default is the 50 MiB ceiling (callers with tighter budgets should pass an explicit value), and how _safe_zip_name handles single trailing-slash directory markers vs malformed empty segments. - authentication/http: comment the empty-hosts _StripAuthOnRedirect use as the HTTPS-downgrade guard on the unauthenticated path. - check_security_requirements: document the HEAD^ fallback failing safe (audit anyway) on shallow / single-commit checkouts. - security.yml: document the universal committed snapshot vs per-Python scheduled compile distinction. - tests: add a regression test that a symlink alongside benign members is rejected with no partial extraction to disk. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Audit follow-up to the download-hardening work, closing similar cases within the scope of this PR: - Add read_zip_member_limited() and use it for the inline extension.yml read in the extension *update* path (__init__.py). That read happened before install_from_zip()'s safe_extract_zip(), so a raw zf.open().read() bypassed the per-member size bound: a manifest declaring a huge file_size (few KB compressed, gigabytes uncompressed) would be fully loaded by yaml.safe_load. The helper rejects on declared size and reads bounded. - Route the Azure DevOps OAuth token request through a strict-redirect opener so a 307/308 redirect cannot forward the client_secret POST body to a non-HTTPS, non-loopback host. - Tests for the new helper and the updated ADO opener path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…P bytes Addresses the Copilot review on PR github#2442 and the same pattern elsewhere. - safe_extract_zip(): track the cumulative bytes actually written and fail past max_total_bytes, so the total-size bound holds even if member headers understate file_size (the declared-total check alone could be evaded). Mirrors the existing per-member written guard — defense-in-depth consistency. - Pass an explicit max_bytes to read_response_limited() at every JSON call site instead of inheriting the 50 MiB archive/payload default: * MAX_JSON_METADATA_BYTES (1 MiB): Azure AD token, GitHub release metadata, and the existing latest-release fetch (migrated off an inline literal). * MAX_JSON_CATALOG_BYTES (8 MiB): preset, extension, workflow and integration catalog fetches. Binary/archive downloads keep the 50 MiB ceiling. Both ceilings are centralized as documented constants in _download_security.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- pin actions/checkout to the repo-wide df4cb1c (v6.0.3) in lint.yml and security.yml - replace the ad-hoc ip_address loopback checks in the workflow add URL/catalog flows with the shared is_https_or_localhost_http predicate, so HTTP-on-loopback rules match the redirect handler - drop the empty member name from the zip dot-segment test: zipfile cannot write such an entry, the case crashed in the test itself
…gates - align the setup-uv pin in security.yml with test.yml (v8.2.0) - use is_https_or_localhost_http for the preset_add/extension_add URL checks and pass strict_redirects=True to the latest-release fetch and the release-asset resolver call sites - baseline gate scripts fail closed on unresolvable refs and git read errors instead of treating them as "baseline did not exist"; the security workflow re-runs on labeled/unlabeled so the ack label can turn the gate green without a push - regenerate the bandit baseline against HEAD (two entries referenced removed code, one had drifted); track baseline entries by file+test_id in tests so line drift no longer breaks them - raise ZIP size-limit errors outside the broad except in safe_extract_zip so an error_type subclassing OSError/RuntimeError cannot re-wrap them - tests: drop two redirect tests duplicated from test_authentication, move the downgrade test next to its siblings, assert the workflow catalog max_bytes, route OpenerDirector.open through urlopen in the modules that patch urlopen, add set -euo pipefail to the secret scan, misc cleanup (unused helper, redundant imports, EOF-less fake read)
is_https_or_localhost_http allows HTTP for localhost, 127.0.0.1 and ::1; the user-facing messages and the open_url docstring only said localhost.
The non-HTTPS redirect rejection in _StripAuthOnRedirect applies to every authenticated attempt regardless of strict_redirects; the flag only extends the same guard to the unauthenticated fallback. Document both guards on the class and correct the open_url docstring, which previously gated the whole scheme restriction under strict_redirects.
A URL without a hostname (e.g. https:///x) has no real target; reject it regardless of scheme. Folds main's hostless-HTTPS guard into the shared predicate so every download/redirect call site benefits.
|
Can we split this into smaller dedicated PRs. This has been going a long time and the scope of being able to review it properly is getting too large. Thanks! |
Assisted-by: Codex (model: GPT-5, autonomous)
Summary
Security context
This creates a repeatable CI baseline for dependency advisories and high-severity static-analysis issues while preserving a scheduled live-resolution signal for newly published dependency problems. It also closes similar supply-chain and archive-handling gaps found during follow-up review.
Closes #2438
Validation