S3AsyncClient getObject write file directly to disk #5660

Open
@earlybard

Describe the feature

Allow S3AsyncClient.getObject to write downloaded objects directly to disk, rather than buffering in ByteBuffers via an AsyncResponseTransformer.

aws-crt-java recently added support for this under the hood: awslabs/aws-crt-java#825

Use Case

When dealing with large objects (10 GB+) at high throughput (10 Gb/s), the Java heap is quickly exhausted when downloading via GetObject, even when the destination is a file on disk, e.g. client.getObject(req, AsyncResponseTransformer.toFile(file)).

This causes gigabytes of unnecessary allocations and frequent GCs, to the point that the AWS Java SDK is not feasible for my application, which deals with large files.

My current solution is to call a standalone native binary to perform this download to disk, which adds plenty of extra complexity and loses the many benefits of using your SDK.


Another advantage was stated in the crt-java repo: awslabs/aws-crt-java#825 (comment)

It would lower latency, by removing an additional copy from C -> Java, and improve memory usage (no need to allocate a ByteBuffer to hold the additional copy).
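To make the extra copy concrete, here is a minimal sketch that uses only JDK classes and does not touch the SDK or the CRT. It contrasts a copy that stages every chunk in a freshly allocated heap ByteBuffer with one that asks the OS to move the bytes via FileChannel.transferTo. The class name CopyPaths and both method names are hypothetical, and the analogy is loose: it only shows why staging data in Java buffers makes allocation scale with object size, not how the SDK actually behaves internally.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration only; not part of the AWS SDK or aws-crt-java.
class CopyPaths {

    // Buffered path: each chunk lands in a new heap ByteBuffer before it is written out.
    // Returns the number of heap bytes allocated for chunks that carried data.
    static long copyViaHeapBuffers(Path source, Path target, int chunkSize) throws IOException {
        long allocated = 0;
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (true) {
                ByteBuffer chunk = ByteBuffer.allocate(chunkSize); // garbage after one use
                if (in.read(chunk) < 0) {
                    break; // end of input
                }
                allocated += chunkSize;
                chunk.flip();
                while (chunk.hasRemaining()) {
                    out.write(chunk);
                }
            }
        }
        return allocated;
    }

    // Direct path: no application-level buffers; the JDK may delegate the transfer to the OS.
    // Returns the number of bytes transferred.
    static long copyDirect(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("source", ".bin");
        Path buffered = Files.createTempFile("buffered", ".bin");
        Path direct = Files.createTempFile("direct", ".bin");
        try {
            Files.write(source, new byte[8 * 1024 * 1024]); // 8 MiB stand-in for a large object
            long allocated = copyViaHeapBuffers(source, buffered, 64 * 1024);
            long moved = copyDirect(source, direct);
            System.out.println("heap bytes allocated by buffered copy: " + allocated);
            System.out.println("bytes moved by direct copy: " + moved);
            System.out.println("outputs identical: "
                    + Arrays.equals(Files.readAllBytes(buffered), Files.readAllBytes(direct)));
        } finally {
            Files.deleteIfExists(source);
            Files.deleteIfExists(buffered);
            Files.deleteIfExists(direct);
        }
    }
}
```

In the buffered path the allocation total is at least the size of the data, so a 10 GB download would churn through at least 10 GB of short-lived buffers; the direct path allocates nothing per chunk at the application level.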

Proposed Solution

I have no preference on how this is implemented: either a standalone S3AsyncClient::getObjectToFile interface method or an option on GetObjectRequest would work.

Other Information

I don't have any issue when calling PutObject for very large files from disk; JVM heap usage stays very low.

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

AWS Java SDK version used

2.28.20

JDK version used

21

Operating System and version

Ubuntu 24.04

Labels

crt-client; feature-request (A feature should be added or improved.); p2 (This is a standard priority issue)
