feat: Add S3 External Storage Driver (Claim Check pattern) by brucearctor · Pull Request #2924 · temporalio/sdk-java

brucearctor · 2026-06-21T04:11:59Z

Summary

Implements the S3 External Storage Driver for the Java SDK, as described in #2882 and the cross-SDK feature spec temporalio/features#783. This ports the Claim Check pattern already implemented in the Go and Python SDKs.

Problem

Temporal has a payload size limit (~2 MiB). Workflows dealing with large data (ML models, documents, media) hit this limit.

Solution

External Storage adds a transparent stage to the data converter pipeline that automatically offloads large payloads to S3 and replaces them with lightweight claim tokens in Event History.

Wire Format (Cross-SDK Compatible)

Claim token payloads use encoding temporal-external-storage/claim with claim data stored as prefixed metadata entries (claim.*). Compatible with Go and Python SDK implementations.

Changes

New Module: `temporal-s3-external-storage`

S3StorageDriver — StorageDriver impl with content-addressed storage (SHA-256)
S3StorageDriverOptions — Builder-based config (bucket, key prefix, driver name)
S3Client / AwsSdkV2S3Client — Abstracted S3 client for testability

Core SDK (`temporal-sdk`)

StorageDriver interface — SPI for storage backends
ExternalStorage — Config (threshold, driver selection)
CodecDataConverter — Integration as terminal pipeline stage
WorkflowClientOptions.setExternalStorage() — Auto-wires into data converter

Usage

S3StorageDriver driver = new S3StorageDriver(
    S3StorageDriverOptions.newBuilder()
        .setClient(new AwsSdkV2S3Client(awsS3Client))
        .setBucket("my-temporal-payloads")
        .build());

WorkflowClientOptions options = WorkflowClientOptions.newBuilder()
    .setExternalStorage(ExternalStorage.newBuilder().addDriver(driver).build())
    .build();
// Payloads > 256 KiB are now automatically offloaded to S3

Security

Path injection prevention: URL-encoded path segments in S3 keys
Confused deputy prevention: retrieve() ignores bucket from claim data
Metadata key collision prevention: claim data keys prefixed with claim.

Testing

21 tests total (15 core + 6 S3 driver)
Full pipeline integration test
Java 8 compatible

Closes #2882

Implements temporalio#2882 — porting the External Storage feature from the Go and Python SDKs to Java. Core interfaces (temporal-sdk): - StorageDriver: interface for external storage drivers - StorageDriverClaim: claim reference token stored in Event History - StorageDriverStoreContext/RetrieveContext: identity context for drivers - StorageDriverSelector: multi-driver selection strategy - ExternalStorage: configuration with builder pattern - StorageDriverException: driver error type Pipeline integration: - CodecDataConverter: wired external storage as last pipeline stage (after PayloadCodec). Payloads >= threshold are stored externally and replaced with claim tokens. Claims are auto-detected and resolved on deserialization. - WorkflowClientOptions: added setExternalStorage() builder method S3 Driver (temporal-s3-external-storage module): - S3Client: abstract interface for S3 operations (testability) - AwsSdkV2S3Client: AWS SDK v2 adapter - S3StorageDriver: content-addressed storage using SHA-256 hashing - S3StorageDriverOptions: builder-based configuration Tests: - 11 core unit tests (ExternalStorageTest) - 6 S3 driver unit tests (S3StorageDriverTest)

Critical fixes: - Wire ExternalStorage from WorkflowClientOptions into CodecDataConverter - Apply external storage to failure encode/decode paths (encodePayloads/decodePayloads) - Namespace claim data keys with 'claim.' prefix to prevent metadata key collisions - Remove dynamic bucket override in S3StorageDriver.retrieve() (confused deputy fix) - URL-encode path segments in S3 key construction (path injection fix) - Return null from buildStoreContext() when no context (instead of 'unknown'/'unknown') Improvements: - Add SLF4J logging to S3StorageDriver and AwsSdkV2S3Client (DEBUG/WARN) - Add serialVersionUID to StorageDriverException - Remove misleading 'throws StorageDriverException' from StorageDriver interface - Add null checks in StorageDriverStoreContext factory methods - Validate setDrivers() list elements for null in ExternalStorage.Builder - Add equals()/hashCode() to ExternalStorage - Replace unreachable fallback with IllegalStateException in selectDriver() - Remove redundant IOException catch block in AwsSdkV2S3Client - Add withExternalStorage() method to CodecDataConverter Documentation: - Add retry configuration guidance (AWS SDK v2 built-in retries) - Document SSE encryption, S3 Lifecycle Policies, memory implications - Add thread-safety notes to StorageDriver, S3Client, context classes - Document Claim Check pattern, threshold rationale, server limits - Warn about reserved claim data key names Tests: - Add full pipeline integration test (serialize -> externalize -> retrieve -> deserialize) - Add small payload inline test (below threshold) - Add claim key collision protection test - Add WorkflowClientOptions wiring test

jmaeagle99 · 2026-06-22T19:17:14Z

Thanks for making this contribution. There is an existing PR (#2907) for contributing an S3 driver in Java. I think that it is best to continue to provide feedback there and merge that PR when it is ready.

brucearctor · 2026-06-22T23:43:39Z

@jmaeagle99 -- any reason the PR wasn't linked to the issue?

It is disappointing to devote time to address open issues -- seeing nobody assigned and no PR attached and thinking that is useful time to help the project. Then to be notified, there is unlinked work in process and that was time not well spent.

Ultimately, that is discouraging and creates a higher bar for me to be contributing if i need to check around for open PRs that are not linked and when nobody is assigned?

Any thoughts on how to assuage that concern? Is there a community expectation that PRs would be linked? That if someone is working on something they would be assigned? Ex: what is the community expectation [ ex: that I need to check for PRs that are not linked to open issues ]??

Generally, I look around for issues not assigned and not having associated PRs. Otherwise, what's the reccommended process for how to go ahead and figure out where to contribute? Is it expected I wait and ask and hope for a response before working on a contribution [ I suspect that's added friction, which doesn't help many ]?

brucearctor added 2 commits June 20, 2026 20:40

brucearctor requested a review from a team as a code owner June 21, 2026 04:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add S3 External Storage Driver (Claim Check pattern)#2924

feat: Add S3 External Storage Driver (Claim Check pattern)#2924
brucearctor wants to merge 2 commits into
temporalio:masterfrom
brucearctor:s3-external-storage-driver

brucearctor commented Jun 21, 2026

Uh oh!

jmaeagle99 commented Jun 22, 2026

Uh oh!

brucearctor commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

brucearctor commented Jun 21, 2026

Summary

Problem

Solution

Wire Format (Cross-SDK Compatible)

Changes

New Module: temporal-s3-external-storage

Core SDK (temporal-sdk)

Usage

Security

Testing

Uh oh!

jmaeagle99 commented Jun 22, 2026

Uh oh!

brucearctor commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

New Module: `temporal-s3-external-storage`

Core SDK (`temporal-sdk`)