feat: Add S3 External Storage Driver (Claim Check pattern)#2924
feat: Add S3 External Storage Driver (Claim Check pattern)#2924brucearctor wants to merge 2 commits into
Conversation
Implements temporalio#2882 — porting the External Storage feature from the Go and Python SDKs to Java. Core interfaces (temporal-sdk): - StorageDriver: interface for external storage drivers - StorageDriverClaim: claim reference token stored in Event History - StorageDriverStoreContext/RetrieveContext: identity context for drivers - StorageDriverSelector: multi-driver selection strategy - ExternalStorage: configuration with builder pattern - StorageDriverException: driver error type Pipeline integration: - CodecDataConverter: wired external storage as last pipeline stage (after PayloadCodec). Payloads >= threshold are stored externally and replaced with claim tokens. Claims are auto-detected and resolved on deserialization. - WorkflowClientOptions: added setExternalStorage() builder method S3 Driver (temporal-s3-external-storage module): - S3Client: abstract interface for S3 operations (testability) - AwsSdkV2S3Client: AWS SDK v2 adapter - S3StorageDriver: content-addressed storage using SHA-256 hashing - S3StorageDriverOptions: builder-based configuration Tests: - 11 core unit tests (ExternalStorageTest) - 6 S3 driver unit tests (S3StorageDriverTest)
Critical fixes: - Wire ExternalStorage from WorkflowClientOptions into CodecDataConverter - Apply external storage to failure encode/decode paths (encodePayloads/decodePayloads) - Namespace claim data keys with 'claim.' prefix to prevent metadata key collisions - Remove dynamic bucket override in S3StorageDriver.retrieve() (confused deputy fix) - URL-encode path segments in S3 key construction (path injection fix) - Return null from buildStoreContext() when no context (instead of 'unknown'/'unknown') Improvements: - Add SLF4J logging to S3StorageDriver and AwsSdkV2S3Client (DEBUG/WARN) - Add serialVersionUID to StorageDriverException - Remove misleading 'throws StorageDriverException' from StorageDriver interface - Add null checks in StorageDriverStoreContext factory methods - Validate setDrivers() list elements for null in ExternalStorage.Builder - Add equals()/hashCode() to ExternalStorage - Replace unreachable fallback with IllegalStateException in selectDriver() - Remove redundant IOException catch block in AwsSdkV2S3Client - Add withExternalStorage() method to CodecDataConverter Documentation: - Add retry configuration guidance (AWS SDK v2 built-in retries) - Document SSE encryption, S3 Lifecycle Policies, memory implications - Add thread-safety notes to StorageDriver, S3Client, context classes - Document Claim Check pattern, threshold rationale, server limits - Warn about reserved claim data key names Tests: - Add full pipeline integration test (serialize -> externalize -> retrieve -> deserialize) - Add small payload inline test (below threshold) - Add claim key collision protection test - Add WorkflowClientOptions wiring test
|
Thanks for making this contribution. There is an existing PR (#2907) for contributing an S3 driver in Java. I think that it is best to continue to provide feedback there and merge that PR when it is ready. |
|
@jmaeagle99 -- any reason the PR wasn't linked to the issue? It is disappointing to devote time to address open issues -- seeing nobody assigned and no PR attached and thinking that is useful time to help the project. Then to be notified, there is unlinked work in process and that was time not well spent. Ultimately, that is discouraging and creates a higher bar for me to be contributing if i need to check around for open PRs that are not linked and when nobody is assigned? Any thoughts on how to assuage that concern? Is there a community expectation that PRs would be linked? That if someone is working on something they would be assigned? Ex: what is the community expectation [ ex: that I need to check for PRs that are not linked to open issues ]?? Generally, I look around for issues not assigned and not having associated PRs. Otherwise, what's the reccommended process for how to go ahead and figure out where to contribute? Is it expected I wait and ask and hope for a response before working on a contribution [ I suspect that's added friction, which doesn't help many ]? |
Summary
Implements the S3 External Storage Driver for the Java SDK, as described in #2882 and the cross-SDK feature spec temporalio/features#783. This ports the Claim Check pattern already implemented in the Go and Python SDKs.
Problem
Temporal has a payload size limit (~2 MiB). Workflows dealing with large data (ML models, documents, media) hit this limit.
Solution
External Storage adds a transparent stage to the data converter pipeline that automatically offloads large payloads to S3 and replaces them with lightweight claim tokens in Event History.
Wire Format (Cross-SDK Compatible)
Claim token payloads use encoding
temporal-external-storage/claimwith claim data stored as prefixed metadata entries (claim.*). Compatible with Go and Python SDK implementations.Changes
New Module:
temporal-s3-external-storageS3StorageDriver— StorageDriver impl with content-addressed storage (SHA-256)S3StorageDriverOptions— Builder-based config (bucket, key prefix, driver name)S3Client/AwsSdkV2S3Client— Abstracted S3 client for testabilityCore SDK (
temporal-sdk)StorageDriverinterface — SPI for storage backendsExternalStorage— Config (threshold, driver selection)CodecDataConverter— Integration as terminal pipeline stageWorkflowClientOptions.setExternalStorage()— Auto-wires into data converterUsage
Security
claim.Testing
Closes #2882