
Implement proxy mount read caching #751

Open
scuffi wants to merge 4 commits into main from improve-mount-proxy-read



scuffi commented Jun 11, 2026

Contributor

Summary

Reduce redundant upstream requests during credential-proxy mounted reads by adding a short-lived HEAD metadata response cache.

s3fs issues frequent HEAD requests for metadata (getattr) on every file open, stat, and read. With credential-proxy mounts, each HEAD request was forwarded upstream through the signing proxy, adding latency to repeated reads. This PR keeps the existing aws4fetch/AwsClient signing path and focuses the optimization on safe metadata caching.

Changes

HEAD metadata cache (s3-credential-proxy-handler.ts)

  • Positive cache: Cache successful HEAD responses for 60s.
  • Negative cache: Cache 404 HEAD responses for 5s to avoid repeated existence checks for non-existent paths (s3fs probes path/, path_$folder$ variants).
  • PUT priming: After a successful PUT, synthesize and cache a HEAD-equivalent entry from request/response metadata such as content-length, content-type, etag, last-modified, and x-amz-meta-*.
  • Conservative bypasses: Do not cache ranged, conditional, checksum-mode, SSE-C, or GCS customer-encryption HEAD requests.
  • Selective invalidation: Mutating methods (PUT, POST, DELETE) invalidate cached metadata. GET requests preserve cached metadata.
  • Copy/multipart safety: Do not prime from query-string PUTs or copy operations; multipart/query mutations invalidate affected metadata.
  • Size bound: Cache is limited to 1,000 entries with TTL-aware eviction, falling back to FIFO eviction if still over limit.
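A minimal sketch of the bounded cache described above, assuming each entry stores only a status code and a header snapshot. `HeadMetadataCache`, `CachedHead`, and the injectable clock are hypothetical names for illustration, not the PR's code; the TTL and size constants mirror the values stated in this summary.

```typescript
// Hypothetical sketch: bounded TTL cache for HEAD metadata.
type CachedHead = {
  status: number; // 200 for positive entries, 404 for negative entries
  headers: Record<string, string>;
  expiresAt: number; // epoch milliseconds
};

const POSITIVE_TTL_MS = 60_000; // successful HEAD responses
const NEGATIVE_TTL_MS = 5_000; // 404 HEAD responses
const MAX_ENTRIES = 1_000;

class HeadMetadataCache {
  // Map iterates in insertion order, which gives FIFO order for free.
  private entries = new Map<string, CachedHead>();

  constructor(private now: () => number = Date.now) {}

  get(key: string): CachedHead | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      // Lazily drop expired entries on read.
      this.entries.delete(key);
      return undefined;
    }
    return entry;
  }

  set(key: string, status: number, headers: Record<string, string>): void {
    const ttl = status === 404 ? NEGATIVE_TTL_MS : POSITIVE_TTL_MS;
    // Delete first so a refreshed key moves to the newest FIFO position.
    this.entries.delete(key);
    this.entries.set(key, { status, headers, expiresAt: this.now() + ttl });
    this.enforceLimit();
  }

  delete(key: string): void {
    this.entries.delete(key);
  }

  get size(): number {
    return this.entries.size;
  }

  private enforceLimit(): void {
    if (this.entries.size <= MAX_ENTRIES) return;
    // First pass: TTL-aware eviction of anything already expired.
    const now = this.now();
    for (const [key, entry] of this.entries) {
      if (entry.expiresAt <= now) this.entries.delete(key);
    }
    // Second pass: FIFO eviction of the oldest insertions if still over.
    for (const key of this.entries.keys()) {
      if (this.entries.size <= MAX_ENTRIES) break;
      this.entries.delete(key);
    }
  }
}
```

Using `Map` insertion order for the FIFO fallback avoids maintaining a separate queue; the trade-off is that eviction is by insertion age, not by recency of use.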
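The conservative bypass rule can be expressed as a single predicate over the incoming request. This is a sketch: `isCacheableHeadRequest` is a hypothetical helper, and the header names are the standard S3 and GCS ones, which may not match the exact list the PR checks.

```typescript
// Hypothetical sketch: only plain, unconditional HEADs are cacheable.
const UNCACHEABLE_HEADERS = [
  "range", // ranged requests
  "if-match", // conditional requests
  "if-none-match",
  "if-modified-since",
  "if-unmodified-since",
  "x-amz-checksum-mode", // checksum-mode HEADs return extra checksum headers
  "x-amz-server-side-encryption-customer-algorithm", // SSE-C
  "x-amz-server-side-encryption-customer-key",
  "x-amz-server-side-encryption-customer-key-md5",
  "x-goog-encryption-algorithm", // GCS customer-supplied encryption keys
  "x-goog-encryption-key",
  "x-goog-encryption-key-sha256",
];

function isCacheableHeadRequest(method: string, headers: Headers): boolean {
  if (method.toUpperCase() !== "HEAD") return false;
  // Headers.has() is case-insensitive, so lowercase names are sufficient.
  return !UNCACHEABLE_HEADERS.some((name) => headers.has(name));
}
```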

Cache lifecycle (sandbox.ts)

  • evictHeadMetadataCacheForMount is called during unmount, mount-failure cleanup, and sandbox teardown, matching the existing SigV4 client and directory-marker cache cleanup paths.
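A sketch of per-mount cleanup, assuming cache keys keep the `${mountId}:${realPath}${url.search}` shape that appears in the review thread and that mount IDs never contain `:`. `evictEntriesForMount` is a hypothetical stand-in; the body of the PR's `evictHeadMetadataCacheForMount` is not shown here.

```typescript
// Hypothetical sketch: drop every cached entry that belongs to one mount.
// Assumes keys look like `${mountId}:${realPath}${search}` and that
// mountId itself never contains ':' (otherwise prefixes could collide).
function evictEntriesForMount(
  cache: Map<string, unknown>,
  mountId: string
): number {
  const prefix = `${mountId}:`;
  let evicted = 0;
  for (const key of cache.keys()) {
    // Deleting during Map iteration is safe in JavaScript.
    if (key.startsWith(prefix)) {
      cache.delete(key);
      evicted++;
    }
  }
  return evicted;
}
```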

Request forwarding safety

  • Strip hop-by-hop/proxy headers before forwarding credential-proxy requests upstream.
  • Preserve SigV4 request bodies that do not include content-length instead of dropping the stream.
  • Keep SigV4 signing on the existing aws4fetch AwsClient path.
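A sketch of hop-by-hop stripping using the standard `Headers` class (available in Workers and Node 18+). It drops the standard hop-by-hop set (Connection, Keep-Alive, TE, Trailer, Transfer-Encoding, Upgrade, Proxy-Authenticate, Proxy-Authorization) plus any header the client lists in `Connection`. `stripHopByHopHeaders` and the inclusion of the non-standard `proxy-connection` are assumptions, not the PR's actual list.

```typescript
// Hypothetical sketch: copy headers for upstream forwarding, minus hop-by-hop ones.
const HOP_BY_HOP = [
  "connection",
  "keep-alive",
  "proxy-authenticate",
  "proxy-authorization",
  "proxy-connection", // non-standard, but commonly sent by clients
  "te",
  "trailer",
  "transfer-encoding",
  "upgrade",
];

function stripHopByHopHeaders(incoming: Headers): Headers {
  const drop = new Set(HOP_BY_HOP);
  // Any header named in Connection is also hop-by-hop for this message.
  for (const name of (incoming.get("connection") ?? "").split(",")) {
    const trimmed = name.trim().toLowerCase();
    if (trimmed) drop.add(trimmed);
  }

  const outgoing = new Headers();
  incoming.forEach((value, name) => {
    // Headers yields lowercase names during iteration.
    if (!drop.has(name)) outgoing.set(name, value);
  });
  return outgoing;
}
```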

Benchmark results (from repro)

| Step | Direct S3 | Credential Proxy | Delta |
|---|---|---|---|
| read-small (1 KiB) | 133ms | 206ms | +73ms |
| read-large (512 KiB) | 69ms | 100ms | +31ms |
| read-large-repeat (5x) | 388ms | 285ms | -103ms |
| cached-head 20x reads | 2102ms | 1297ms | -805ms |
| list-files | 1898ms | 107ms | -1791ms |

Credential-proxy now avoids redundant upstream HEAD requests on repeated metadata reads.


changeset-bot Bot commented Jun 11, 2026


🦋 Changeset detected

Latest commit: 9b670bf

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
| Name | Type |
|---|---|
| @cloudflare/sandbox | Patch |



pkg-pr-new Bot commented Jun 11, 2026


npm i https://pkg.pr.new/cloudflare/sandbox-sdk/@cloudflare/sandbox@751

commit: 9b670bf


github-actions Bot commented Jun 11, 2026

Contributor

📦 Preview Build

Version: 0.0.0-pr-751-9b670bfd

Install the SDK preview:

npm i https://pkg.pr.new/cloudflare/sandbox-sdk/@cloudflare/sandbox@751

🐳 Docker images were not rebuilt — no container changes detected. Use the latest release images from Docker Hub.

scuffi added 3 commits June 16, 2026 15:40
Keep the established SigV4 signer for credential-proxy mounts while
retaining the metadata cache behavior. This keeps the cache change focused
on reducing redundant HEAD requests without expanding signing risk.
scuffi marked this pull request as ready for review June 18, 2026 10:25

devin-ai-integration Bot left a comment

Contributor


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.


aron-cf left a comment

Contributor


Thanks for digging into the HEAD traffic. Did you look at tweaking s3fs’s own stat cache? I don't really want to add an additional layer when the one in s3fs should be doing this already (unless I've missed something).

Right now sandbox-sdk sets these defaults for R2-backed s3fs mounts:

```ts
stat_cache_expire: '60',
enable_noobj_cache: true,
multipart_size: '5'
```

s3fs also supports max_stat_cache_size, stat_cache_expire, and negative caching (enable_negative_cache / disable_negative_cache; negative cache appears to be enabled by default in current s3fs).

Have we tried to repro with larger s3fs options first? For example:

```ts
s3fsOptions: [
  'stat_cache_expire=300',
  'max_stat_cache_size=100000',
  'enable_negative_cache'
]
```

aron-cf left a comment

Contributor


After looking at s3fs I think this is probably useful for common cases where a bucket is mounted and owned by a single container or the bucket is largely read only.

It's a bit risky for anything intended to be shared collaboratively. I'd suggest we roll this out under an opt-in flag with configurable timeouts, and document which use cases will benefit and which won't.
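One possible shape for the opt-in flag suggested here, using the PR's current constants as defaults. Every name below (`HeadMetadataCacheOptions`, `resolveHeadMetadataCacheOptions`, and the field names) is hypothetical and not part of `@cloudflare/sandbox`.

```typescript
// Hypothetical option shape for an opt-in HEAD metadata cache.
interface HeadMetadataCacheOptions {
  enabled?: boolean;
  positiveTtlMs?: number;
  negativeTtlMs?: number;
  maxEntries?: number;
}

interface ResolvedHeadMetadataCacheOptions {
  enabled: boolean;
  positiveTtlMs: number;
  negativeTtlMs: number;
  maxEntries: number;
}

function resolveHeadMetadataCacheOptions(
  opts: HeadMetadataCacheOptions = {}
): ResolvedHeadMetadataCacheOptions {
  const resolved: ResolvedHeadMetadataCacheOptions = {
    // Off unless the caller explicitly opts in.
    enabled: opts.enabled ?? false,
    positiveTtlMs: opts.positiveTtlMs ?? 60_000,
    negativeTtlMs: opts.negativeTtlMs ?? 5_000,
    maxEntries: opts.maxEntries ?? 1_000,
  };
  // Reject negative or non-finite numbers; 0 is allowed.
  for (const [name, value] of Object.entries(resolved)) {
    if (typeof value === "number" && (!Number.isFinite(value) || value < 0)) {
      throw new RangeError(`${name} must be a non-negative finite number`);
    }
  }
  return resolved;
}
```

Allowing a TTL of 0 would give callers a way to switch off negative caching alone while keeping positive caching, for buckets where objects are created by other writers.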

```ts
const DEFAULT_SLOW_REQUEST_MS = 1000;
const ERROR_RESPONSE_BODY_LIMIT = 2048;
const MAX_DIAGNOSTIC_EVENTS = 500;
const HEAD_METADATA_CACHE_TTL_MS = 60_000;
```

Contributor


This feels risky: if something updates or deletes the object, we keep serving the old HEAD result for up to a minute. Could we make this shorter or configurable? This works well when the bucket does not change often, but if the bucket is shared with another writer it will get problematic.

```ts
const ERROR_RESPONSE_BODY_LIMIT = 2048;
const MAX_DIAGNOSTIC_EVENTS = 500;
const HEAD_METADATA_CACHE_TTL_MS = 60_000;
const NEGATIVE_HEAD_METADATA_CACHE_TTL_MS = 5_000;
```

Contributor


Ditto here: an object created externally will appear to be missing for up to 5s.

```ts
}
}

function getHeadMetadataCacheKey(
```

Contributor


The cache key is scoped by mountId rather than by the underlying bucket/endpoint identity, which doesn't feel right. If the same bucket is mounted in two places, shouldn't the cache be shared?
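If the cache were keyed by upstream identity, as suggested here, the key might be derived from the endpoint origin and bucket instead of `mountId`. This is a hypothetical sketch; `getSharedHeadMetadataCacheKey` and its parameters are assumptions about what the handler has available at that point.

```typescript
// Hypothetical sketch: key HEAD cache entries by upstream identity so two
// mounts of the same bucket on the same endpoint share entries.
function getSharedHeadMetadataCacheKey(
  endpoint: string,
  bucket: string,
  objectPath: string,
  search: string
): string {
  // URL.origin lowercases the host and drops default ports.
  const origin = new URL(endpoint).origin;
  // Normalize leading slashes so "/a" and "a" map to the same object.
  return `${origin}/${bucket}/${objectPath.replace(/^\/+/, "")}${search}`;
}
```

One trade-off to check: with a shared key, unmounting one mount can no longer evict by `mountId` prefix without affecting the other, so shared entries would likely need reference counting or rely on TTL expiry alone. Mounts using different credentials would also see each other's cached metadata.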

```ts
return `${mountId}:${realPath}${url.search}`;
}

function getCachedHeadMetadataResponse(
```

Contributor


This defeats s3fs’s open-time revalidation. s3fs intentionally drops its own stat cache on reopen and sends a HEAD; with this cache, that HEAD may be answered locally instead of actually revalidating against the bucket. I think this behaviour in s3fs is probably what led to this PR. But it makes me think that this cache layer should be opt-in and extremely configurable.

```ts
});
}

function cacheHeadMetadataFromPUT(
```

Contributor


This is neat. How do we ensure that we're accurately matching what the provider actually responds with? Feels like this might need an integration test to verify our code matches the various provider implementations.
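One way to approach the verification question: keep the synthesis step small and pair it with a comparison helper that an integration test could run against each provider's real HEAD response after a PUT. Both functions are hypothetical sketches; they copy only the fields listed in the PR summary and simply omit any field the PUT exchange does not carry.

```typescript
// Hypothetical sketch: synthesize HEAD-equivalent headers from a PUT.
function synthesizeHeadFromPut(
  putRequest: Headers,
  putResponse: Headers
): Record<string, string> {
  const out: Record<string, string> = {};
  // Size and type come from what the client sent.
  for (const name of ["content-length", "content-type"]) {
    const value = putRequest.get(name);
    if (value !== null) out[name] = value;
  }
  // ETag and Last-Modified are assigned by the provider: trust only the response.
  for (const name of ["etag", "last-modified"]) {
    const value = putResponse.get(name);
    if (value !== null) out[name] = value;
  }
  // User metadata is sent on the PUT request and returned on HEAD.
  putRequest.forEach((value, name) => {
    if (name.startsWith("x-amz-meta-")) out[name] = value;
  });
  return out;
}

// Lists fields where the synthetic entry disagrees with a real HEAD, plus
// tracked fields the real HEAD has but the synthetic entry lacks.
function diffSyntheticHead(
  synthetic: Record<string, string>,
  realHead: Headers
): string[] {
  const mismatched = Object.keys(synthetic).filter(
    (name) => realHead.get(name) !== synthetic[name]
  );
  const missing = ["content-length", "content-type", "etag", "last-modified"].filter(
    (name) => realHead.has(name) && !(name in synthetic)
  );
  return mismatched.concat(missing);
}
```

An integration test would assert that `diffSyntheticHead` returns an empty list per provider; a check like this would surface, for instance, a provider whose PUT response carries no `Last-Modified` while its HEAD does.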

```ts
}
} else if (isMutatingMethod(method)) {
deleteDirectoryMarkerCacheEntry(mountId, realPath);
deleteHeadMetadataCacheEntriesForObject(mountId, realPath);
```

Contributor


For non-HEAD mutating requests, we invalidate before forwarding. There's a possible race where a HEAD issued during an in-flight DELETE/POST/copy/multipart operation caches stale upstream metadata, and then the mutation succeeds. Should we also invalidate after successful mutations?
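A sketch of one way to close this race, using a per-key generation counter rather than only invalidating before forwarding. `GuardedHeadCache` is hypothetical and omits TTLs and size bounds: a HEAD result is stored only if no mutation began or finished while it was in flight, and mutations invalidate both before forwarding and after they settle.

```typescript
// Hypothetical sketch: generation-guarded cache fills for HEAD results.
class GuardedHeadCache<V> {
  private values = new Map<string, V>();
  // Note: never pruned in this sketch; a real version would need a bound.
  private generations = new Map<string, number>();

  private generation(key: string): number {
    return this.generations.get(key) ?? 0;
  }

  private bump(key: string): void {
    this.generations.set(key, this.generation(key) + 1);
    this.values.delete(key);
  }

  get(key: string): V | undefined {
    return this.values.get(key);
  }

  async head(key: string, fetchUpstream: () => Promise<V>): Promise<V> {
    const cached = this.values.get(key);
    if (cached !== undefined) return cached;
    const startedAt = this.generation(key);
    const value = await fetchUpstream();
    // Store only if no mutation started or finished while we were waiting.
    if (this.generation(key) === startedAt) this.values.set(key, value);
    return value;
  }

  async mutate<R>(key: string, forward: () => Promise<R>): Promise<R> {
    this.bump(key); // invalidate before forwarding
    try {
      return await forward();
    } finally {
      this.bump(key); // invalidate again once the mutation settles
    }
  }
}
```

Invalidating in `finally` also covers failed mutations, which is slightly more conservative than invalidating only after success.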

```ts
);

if (response.ok) {
directoryMarkerCache.set(
```

Contributor


The directory marker cache looks like it has no TTL. External changes can leave us with stale directory metadata for the lifetime of the mount unless something local invalidates it. Can we add the same invalidation pattern?
