@HJLebbink HJLebbink commented Oct 30, 2025

Summary

  • Removed unnecessary intermediate Vec<Bytes> allocation in HTTP request body preparation
  • Directly consume SegmentedBytes iterator into stream using Arc::unwrap_or_clone()
  • Implemented zero-cost BodyIterator enum to avoid heap allocation and dynamic dispatch
  • Reduces memory overhead and eliminates Vec growth reallocations during request preparation

Memory Impact Analysis

Important: Bytes is reference-counted, so cloning it never copies the actual byte data; it only increments a reference count. The 1 GB of actual data is
never duplicated in either implementation. The difference is in metadata overhead: the collection of Bytes handles.
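
To illustrate the point about reference counting, here is a minimal std-only sketch. `SharedBuf` is a hypothetical stand-in for `bytes::Bytes` (the real type is more elaborate); it only shows that a clone shares the allocation and bumps a counter.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `bytes::Bytes`: a cheaply cloneable handle
/// to an immutable, shared buffer. Not the real `Bytes` layout.
#[derive(Clone)]
pub struct SharedBuf {
    data: Arc<[u8]>,
}

impl SharedBuf {
    pub fn new(data: Vec<u8>) -> Self {
        Self { data: Arc::from(data) }
    }

    /// Number of handles currently pointing at the same allocation.
    pub fn handle_count(&self) -> usize {
        Arc::strong_count(&self.data)
    }

    /// True when both handles refer to the same underlying allocation.
    pub fn shares_data_with(&self, other: &SharedBuf) -> bool {
        Arc::ptr_eq(&self.data, &other.data)
    }

    pub fn len(&self) -> usize {
        self.data.len()
    }
}
```

Cloning a `SharedBuf` costs one atomic increment regardless of buffer size, which is why only the handle metadata matters in the analysis that follows.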

Scenario 1: Multipart Upload (5MB parts)

For a 1GB file with MIN_PART_SIZE = 5 MiB:

  • Number of parts: 1,024 MiB ÷ 5 MiB = 204.8, rounded up to 205 parts
  • Each Bytes handle: 32 bytes (on 64-bit systems)

Original Implementation:

  • Creates Vec with 205 entries
  • Memory overhead: 205 × 32 = 6,560 bytes ≈ 6.4 KB
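  • Plus Vec's spare capacity: Rust's Vec grows geometrically (the current implementation doubles), so capacity can reach up to 2x the length when the iterator does not report its size up front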
  • Total: ~10-13 KB

New Implementation:

  • BodyIterator enum: 16 bytes
  • Iterator state (indices): ~24 bytes
  • Total: ~40 bytes

Savings: ~10-13 KB

Scenario 2: Streaming Read (smaller chunks)

If the 1GB is read from a stream in smaller buffers (e.g., 64KB chunks common in I/O):

  • Number of chunks: 1,048,576 KB ÷ 64 KB = 16,384 chunks

Original Implementation:

  • Vec with 16,384 entries
  • Memory overhead: 16,384 × 32 = 524,288 bytes = 512 KB
  • With Vec growth: ~750 KB - 1 MB

New Implementation:

  • Still ~40 bytes

Savings: ~750 KB - 1 MB
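
The arithmetic in both scenarios can be reproduced with a short sketch. The 32-byte handle size is taken from the figures above; the function and constant names are made up for illustration, and the result excludes any spare Vec capacity.

```rust
pub const KIB: u64 = 1024;
pub const MIB: u64 = 1024 * KIB;
pub const GIB: u64 = 1024 * MIB;

/// Size of one `Bytes` handle on a 64-bit target, as quoted above (assumption).
pub const HANDLE_SIZE: u64 = 32;

/// Number of segments needed to cover `total` bytes, rounding up.
pub fn segment_count(total: u64, segment: u64) -> u64 {
    (total + segment - 1) / segment
}

/// Bytes of handle metadata held by a Vec with exactly `count` entries.
pub fn handle_overhead(count: u64) -> u64 {
    count * HANDLE_SIZE
}
```

For example, `segment_count(GIB, 5 * MIB)` gives 205 and `handle_overhead(205)` gives 6,560 bytes, matching Scenario 1; with 64 KiB chunks the same functions give 16,384 segments and 524,288 bytes.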

Additional Benefits

  1. Static dispatch - The enum is dispatched with a match on its discriminant, a cheap branch the compiler can inline, rather than through a vtable
  2. No heap allocation - Unlike Box<dyn Iterator>, the enum is stored inline (on the stack here)
  3. No reallocation overhead - Vec growth causes multiple allocations and copies during .collect()
  4. Better cache locality - Preserves the nested SegmentedBytes structure which may already be in cache
  5. Lower peak memory - No temporary spike when collecting into the intermediate Vec
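
As a rough illustration of benefit 3, the sketch below counts how often a Vec's capacity changes while items are pushed one at a time. `[u8; 32]` stands in for a 32-byte handle, and the helper name is hypothetical. The exact growth pattern is an implementation detail of the standard library, so the sketch only relies on the documented guarantee that a pre-sized Vec does not reallocate.

```rust
/// Pushes `n` placeholder items into `v` and counts how often the capacity
/// changed, which serves as a proxy for reallocations.
pub fn count_capacity_changes(mut v: Vec<[u8; 32]>, n: usize) -> (usize, Vec<[u8; 32]>) {
    let mut changes = 0;
    let mut cap = v.capacity();
    for _ in 0..n {
        v.push([0u8; 32]);
        if v.capacity() != cap {
            // The Vec had to move to a larger allocation.
            changes += 1;
            cap = v.capacity();
        }
    }
    (changes, v)
}
```

Starting from `Vec::new()` the count is at least one, while `Vec::with_capacity(n)` yields zero changes. Whether `.collect()` behaves like the first or the second case depends on the size hint reported by the source iterator.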

Technical Details

Arc::unwrap_or_clone() efficiently handles the Arc:

  • If sole owner: moves the value out of the Arc without cloning
  • If shared: clones only the SegmentedBytes structure (Vec of Vecs), not the underlying byte data
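
A small std-only sketch of the two cases. `Segments` is a hypothetical stand-in for SegmentedBytes (nested vectors of shared buffers) and `take_segments` is a made-up helper; only `Arc::unwrap_or_clone` itself is the real standard-library call.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `SegmentedBytes`: nested vectors of
/// reference-counted buffers. Not the crate's actual definition.
#[derive(Clone)]
pub struct Segments {
    pub parts: Vec<Vec<Arc<[u8]>>>,
}

/// Takes ownership of the inner value: a move when this is the only
/// strong reference, otherwise a clone of the outer structure.
pub fn take_segments(body: Arc<Segments>) -> Segments {
    Arc::unwrap_or_clone(body)
}
```

In the shared case the clone duplicates the nested vectors and increments each buffer's reference count, but the cloned entries still point at the same byte allocations.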

The BodyIterator enum unifies the two iterator types without heap allocation or dynamic dispatch: the active variant is selected by a match on the enum discriminant rather than a vtable call.
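
The enum's definition is not shown in this conversation, so the following is only a sketch of how such a type could look. The variant names follow the diff; the generic parameter, the `Chunk` alias (standing in for Bytes), and `body_iter` are assumptions for illustration.

```rust
use std::sync::Arc;

/// Stand-in for `Bytes` (assumption for this sketch).
pub type Chunk = Arc<[u8]>;

/// Unifies "iterate over segments" and "no body" under one concrete type,
/// so both match arms return the same type without boxing.
pub enum BodyIterator<I> {
    Segmented(I),
    Empty(std::iter::Empty<Chunk>),
}

impl<I: Iterator<Item = Chunk>> Iterator for BodyIterator<I> {
    type Item = Chunk;

    fn next(&mut self) -> Option<Chunk> {
        // Dispatch is a match on the discriminant; no vtable is involved.
        match self {
            BodyIterator::Segmented(it) => it.next(),
            BodyIterator::Empty(it) => it.next(),
        }
    }
}

/// Hypothetical helper mirroring the `match body { ... }` shape in the diff.
pub fn body_iter(body: Option<Vec<Chunk>>) -> BodyIterator<std::vec::IntoIter<Chunk>> {
    match body {
        Some(v) => BodyIterator::Segmented(v.into_iter()),
        None => BodyIterator::Empty(std::iter::empty()),
    }
}
```

The alternative would be returning `Box<dyn Iterator<Item = Chunk>>`, which costs one heap allocation and an indirect call per `next()`.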

@HJLebbink HJLebbink requested a review from Copilot October 30, 2025 13:20
@HJLebbink HJLebbink self-assigned this Oct 30, 2025
@HJLebbink HJLebbink added the cleanup-rewrite Used in release doc generation label Oct 30, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR optimizes the body stream creation for PUT and POST requests by eliminating an unnecessary intermediate vector allocation. The change removes the TODO comment about the inefficiency and directly converts the Arc<SegmentedBytes> into an iterator.

Key Changes:

  • Removed intermediate Vec<Bytes> collection that caused unnecessary memory allocation
  • Used Arc::unwrap_or_clone() to efficiently extract or clone the SegmentedBytes before converting to iterator
  • Simplified stream creation by directly mapping the iterator

Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment on lines 539 to 574
  if (*method == Method::PUT) || (*method == Method::POST) {
-     //TODO: why-oh-why first collect into a vector and then iterate to a stream?
-     let bytes_vec: Vec<Bytes> = match body {
-         Some(v) => v.iter().collect(),
-         None => Vec::new(),
+     let iter = match body {
+         Some(v) => BodyIterator::Segmented(Arc::unwrap_or_clone(v).into_iter()),
+         None => BodyIterator::Empty(std::iter::empty()),
      };
-     let stream = futures_util::stream::iter(
-         bytes_vec.into_iter().map(|b| -> Result<_, Error> { Ok(b) }),
-     );
+     let stream = futures_util::stream::iter(iter.map(|b| -> Result<_, Error> { Ok(b) }));
Contributor

I may be missing something, but we're still loading the whole body into memory within SegmentedBytes. Wouldn't it make sense to have that as a Stream already, so that e.g. uploading a file would not have to load the whole thing?

Member Author

The existing implementation is already fairly efficient, since nothing actually gets copied. The bytes are split into a vector of byte segments; it might look like data is being copied, but it is not, because Bytes is reference-counted. Still, if there are 100+ segments, all those reference counts need to be updated, and that is what this PR changes. I also didn't want to change SegmentedBytes. But yes, there may be better optimizations that get rid of the segments in the first place.

@HJLebbink HJLebbink force-pushed the minor-update branch 2 times, most recently from cb97a7e to 19d7c3a Compare October 30, 2025 13:46
…est body preparation

  - Removed unnecessary intermediate `Vec<Bytes>` allocation in HTTP request body preparation
  - Directly consume `SegmentedBytes` iterator into stream using `Arc::unwrap_or_clone()`
  - Implemented zero-cost `BodyIterator` enum to avoid heap allocation and dynamic dispatch
  - Reduces memory overhead and eliminates Vec growth reallocations during request preparation