Skip to content

feat: add fetch access controls#376

Draft
arabold wants to merge 3 commits intomainfrom
feat/374-fetch-access-controls
Draft

feat: add fetch access controls#376
arabold wants to merge 3 commits intomainfrom
feat/374-fetch-access-controls

Conversation

@arabold
Copy link
Copy Markdown
Owner

@arabold arabold commented Mar 29, 2026

Summary

  • add shared network and file access policy enforcement across HTTP fetches, browser rendering, and local file scraping
  • introduce secure scraper security defaults with explicit host, CIDR, and allowed-root overrides plus archive temp-file handling
  • document the trust model and add focused config, policy, and browser tests for the new access controls

Verification

  • created focused tests for config parsing, shared access policy behavior, and browser fetch enforcement
  • local command execution in this environment still cannot fully run vitest or tsc because vitest/config and vite/client resolution is broken here

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an OpenSpec change set describing a new, configurable security policy for outbound HTTP(S) fetches and local file:// access, with secure defaults and explicit opt-ins to reduce SSRF and unsafe local file traversal risks across fetch and scrape entry points.

Changes:

  • Introduces a proposal and design for shared outbound network + local file access policy enforcement (scraper.security).
  • Adds specification scenarios for blocking/allowing network targets, file-root containment, hidden paths, symlink handling, and archive workflows.
  • Provides an implementation task breakdown covering config/schema, shared policy helpers, traversal hardening, tests, and documentation updates.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
openspec/changes/add-fetch-access-controls/proposal.md Rationale and high-level scope for configurable fetch/scrape access controls.
openspec/changes/add-fetch-access-controls/design.md Design decisions and trade-offs for a shared access policy across network and file workflows.
openspec/changes/add-fetch-access-controls/tasks.md Concrete implementation plan and verification checklist.
openspec/changes/add-fetch-access-controls/specs/outbound-access-control/spec.md Requirements/scenarios for outbound network and local file access policy behavior.
openspec/changes/add-fetch-access-controls/specs/configuration/spec.md Requirements/scenarios for config/env override behavior and new scraper.security defaults.
openspec/changes/add-fetch-access-controls/.openspec.yaml Registers the change under the spec-driven OpenSpec schema.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@arabold
Copy link
Copy Markdown
Owner Author

arabold commented Mar 29, 2026

OpenSpec review for add-fetch-access-controls

Overall: the core model is sound for the main scenario of allowing public internet access while denying local/private network access by default, with selective internal exceptions via host/CIDR allowlists. That scenario is possible and is covered by the current proposal/design/spec.

Main findings

  1. Browser-driven secondary requests are not explicitly covered by the spec.
    The current normative text focuses on top-level HTTP(S) fetches, redirects, and DNS-resolved targets, but the architecture also performs network access through Playwright page loads, iframe/frame loads, and sandbox-fetched external scripts. A public page could still trigger internal fetches during rendering unless the spec explicitly requires policy enforcement on all secondary and subresource requests, not just the initial URL and redirects.

  2. The design introduces revalidateAfterDnsResolution and revalidateRedirectTargets as if they are public config keys, but the configuration spec and tasks do not define them normatively.
    Either these are real config surface area and need requirements/tests, or they should be removed from the design examples.

  3. allowedHosts semantics are under-specified.
    The spec says a host or CIDR match can permit a private target while allowPrivateNetworks remains false, but it does not define precedence between hostname allowlisting and post-DNS IP validation. It should spell out behavior for multi-record DNS, dual-stack hosts, DNS rebinding, and hosts that resolve to both public and private addresses.

  4. The proposal does not cover a true outbound allowlist model for public internet hosts.
    Today’s design allows all public internet targets by default and uses allowedHosts/allowedCidrs only as exceptions for private access. If a deployment wants “only these public hosts plus this internal subnet,” that is not covered.

  5. The default file policy may be broader than the security framing suggests.
    The change is motivated as hardening for shared or exposed deployments, but the default still grants access to the entire $DOCUMENTS tree. That may be acceptable for local workflows, but it is not an obviously safe default for remotely reachable deployments.

  6. Operational behavior is incomplete around arrays and $DOCUMENTS.
    The configuration spec proves nested env-var naming, but not how arrays like allowedHosts, allowedCidrs, and allowedRoots are represented via environment variables, nor what happens if $DOCUMENTS cannot be resolved in a container or service account context.

  7. The internal temp-archive exception needs a tighter boundary.
    The design/spec say app-managed temp files for accepted web archives are exempt from user file-root checks, but they should explicitly limit that exception to the downloaded archive artifact and its virtual members, not any arbitrary temp path encountered later.

Direct answers to the intended scenarios

  • Allow all internet websites but restrict local/private network access by default: yes, this is covered.
  • Allow restricted local/private network access through specific host/CIDR exceptions while keeping the default deny: yes, this is covered.
  • Allow all local/private network access: conceptually yes via allowPrivateNetworks: true, though the spec should make the resulting breadth more explicit.
  • Restrict all outbound access to an explicit host allowlist only: no, that is not covered by the current model.

Counterintuitive behaviors worth documenting

  • allowPrivateNetworks: true broadens access; it does not mean “only allow the listed internal targets.”
  • allowedHosts and allowedCidrs are exceptions to the private-network block, not global outbound allowlists.
  • Hidden paths are blocked even when directly requested if includeHidden: false.
  • allowedRoots: [] in allowedRoots mode means deny all user-requested file:// access.
  • Supported web archive scraping intentionally bypasses normal file-root checks after the original network URL passes policy.

Recommended changes

  1. Add a requirement that the policy applies to every network request initiated during fetch/render/processing, including Playwright subresources, frames, iframes, and sandbox-fetched scripts.
  2. Decide whether revalidateAfterDnsResolution and revalidateRedirectTargets are real config fields. If yes, add requirements/defaults/tests. If no, remove them from examples.
  3. Clarify hostname vs IP evaluation order and exact allowlist semantics after DNS resolution.
  4. Decide whether a full outbound allowlist mode is needed for hardened deployments.
  5. Reconsider whether $DOCUMENTS should remain the default file-access root for exposed deployments, or document the tradeoff very explicitly.
  6. Specify env-var encoding for array settings and behavior when $DOCUMENTS cannot be resolved.

Net: the main access-control story is good, but the spec should more clearly cover browser/render-time secondary requests and tighten a few policy semantics before implementation.

arabold added 2 commits March 29, 2026 09:06
Apply shared network and file access policy across HTTP, browser, and local scraping flows with secure defaults and explicit allowlists.

Document the new security model and add focused config, policy, and browser coverage for the fetch access controls change.
@arabold arabold changed the title feat(openspec): add fetch access controls change feat: add fetch access controls Mar 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants