Description
Describe the feature
Allow S3AsyncClient.getObject to write downloaded objects directly to disk, rather than buffering in ByteBuffers via an AsyncResponseTransformer.
aws-crt-java recently added support for this under the hood: awslabs/aws-crt-java#825
Use Case
When dealing with large objects (10GB+) at high speeds (10Gb/s), the Java heap is quickly exhausted when downloading files via GetObject, even when the destination is on disk, e.g. via client.getObject(req, AsyncResponseTransformer.toFile(file)).
This causes gigabytes of unnecessary allocations and GCs, to the point of the AWS Java SDK not being feasible for my application that deals with large files.
My current solution is to call a standalone native binary to perform this download to disk, which adds plenty of extra complexity and loses the many benefits of using your SDK.
Another advantage was stated in the crt-java repo: awslabs/aws-crt-java#825 (comment)
It would lower latency, by removing an additional copy from C -> Java, and improve memory usage (no need to allocate a ByteBuffer to hold the additional copy).
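As a rough analogy for the extra copy described above, here is a minimal sketch in plain Java that copies a file two ways: once by staging every chunk in a heap ByteBuffer, and once with FileChannel.transferTo, which lets the JVM hand the transfer to the operating system where it can. It does not use the AWS SDK or aws-crt-java and makes no claim about how either is implemented. The class name, method names, and the 64 KiB chunk size are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration only; not part of the AWS SDK or aws-crt-java.
final class CopyPathSketch {

    // Buffered path: each chunk is staged in a freshly allocated heap ByteBuffer,
    // so total allocation grows with the size of the file. chunkSize must be > 0.
    static long copyViaHeapBuffers(Path src, Path dst, int chunkSize) {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long total = 0;
            while (true) {
                ByteBuffer chunk = ByteBuffer.allocate(chunkSize);
                if (in.read(chunk) < 0) {
                    break; // end of file
                }
                chunk.flip();
                while (chunk.hasRemaining()) {
                    total += out.write(chunk);
                }
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Direct path: no ByteBuffer is allocated by this code; the channel-to-channel
    // transfer is delegated to the JVM and, where supported, to the OS.
    static long copyDirect(Path src, Path dst) {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Copies a generated temp file both ways and checks that the results are identical.
    static boolean bothPathsMatch(int sizeBytes) {
        try {
            Path src = Files.createTempFile("src", ".bin");
            Path viaHeap = Files.createTempFile("heap", ".bin");
            Path direct = Files.createTempFile("direct", ".bin");
            try {
                byte[] data = new byte[sizeBytes];
                for (int i = 0; i < data.length; i++) {
                    data[i] = (byte) i;
                }
                Files.write(src, data);
                long n1 = copyViaHeapBuffers(src, viaHeap, 64 * 1024);
                long n2 = copyDirect(src, direct);
                return n1 == sizeBytes && n2 == sizeBytes
                        && Arrays.equals(data, Files.readAllBytes(viaHeap))
                        && Arrays.equals(data, Files.readAllBytes(direct));
            } finally {
                Files.deleteIfExists(src);
                Files.deleteIfExists(viaHeap);
                Files.deleteIfExists(direct);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both methods produce the same bytes on disk; the difference is only in how much data passes through Java-managed buffers on the way, which is the cost this request is about.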
Proposed Solution
No preference how this is implemented, either a standalone S3AsyncClient::getObjectToFile interface method, or an option on GetObjectRequest.
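To make the two options concrete, here is a rough sketch of what each call shape could look like. Everything in it is hypothetical: the interfaces, the DownloadRequest class, and its destinationFile field are stand-ins I made up, not SDK types, and the in-memory client exists only so the sketch runs without a network. It shows the shape of the call, not a direct-to-disk implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Every name in this file is a hypothetical stand-in; none of it is AWS SDK API.
final class DownloadApiShapes {

    // Shape 1: a standalone method that takes the destination path directly.
    interface StandaloneShape {
        CompletableFuture<Long> getObjectToFile(String bucket, String key, Path destination);
    }

    // Shape 2: the destination travels as an optional field on the request.
    static final class DownloadRequest {
        final String bucket;
        final String key;
        final Path destinationFile; // null means no file destination was requested

        DownloadRequest(String bucket, String key, Path destinationFile) {
            this.bucket = bucket;
            this.key = key;
            this.destinationFile = destinationFile;
        }
    }

    interface RequestOptionShape {
        CompletableFuture<Long> getObject(DownloadRequest request);
    }

    // In-memory stand-in for the service; a real client would stream natively to disk.
    static final class InMemoryClient implements StandaloneShape, RequestOptionShape {
        private final Map<String, byte[]> objects;

        InMemoryClient(Map<String, byte[]> objects) {
            this.objects = objects;
        }

        @Override
        public CompletableFuture<Long> getObjectToFile(String bucket, String key, Path destination) {
            return CompletableFuture.supplyAsync(() -> {
                byte[] body = objects.get(bucket + "/" + key);
                if (body == null) {
                    throw new IllegalArgumentException("no such object: " + bucket + "/" + key);
                }
                try {
                    Files.write(destination, body);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return (long) body.length; // number of bytes written
            });
        }

        @Override
        public CompletableFuture<Long> getObject(DownloadRequest request) {
            if (request.destinationFile == null) {
                return CompletableFuture.failedFuture(
                        new IllegalArgumentException("destinationFile is required in this sketch"));
            }
            return getObjectToFile(request.bucket, request.key, request.destinationFile);
        }
    }

    // Exercises both shapes against the stand-in client and checks the files on disk.
    static boolean bothShapesWrite(byte[] body) {
        try {
            Path first = Files.createTempFile("shape1", ".bin");
            Path second = Files.createTempFile("shape2", ".bin");
            try {
                InMemoryClient client = new InMemoryClient(Map.of("bucket/key", body));
                long n1 = client.getObjectToFile("bucket", "key", first).join();
                long n2 = client.getObject(new DownloadRequest("bucket", "key", second)).join();
                return n1 == body.length && n2 == body.length
                        && Arrays.equals(body, Files.readAllBytes(first))
                        && Arrays.equals(body, Files.readAllBytes(second));
            } finally {
                Files.deleteIfExists(first);
                Files.deleteIfExists(second);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The standalone method keeps the existing request type untouched, while the request option keeps a single getObject entry point; either would work for my use case.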
Other Information
I don't have any issue when calling PutObject for very large files from disk. The JVM heap usage stays very low.
Acknowledgements
- I may be able to implement this feature request
- This feature might incur a breaking change
AWS Java SDK version used
2.28.20
JDK version used
21
Operating System and version
Ubuntu 24.04