Performance gap between OCI Python SDK and boto3 for object downloads #755

Open
dreamtalen opened this issue Apr 9, 2025 · 0 comments

Environment details

  • Python version: 3.9.18
  • pip version: 23.2.1
  • oci version: 2.111.0

Issue

We are comparing the download performance of the OCI Python SDK and boto3 (AWS SDK). For the same objects stored in an OCI bucket, we’ve observed that the OCI SDK is approximately 20% to 50% slower than boto3 when downloading to memory.

Methods Tested with OCI SDK

  1. Using response.data.content:
response = self._oci_client.get_object(
    namespace_name=self._namespace, bucket_name=bucket, object_name=key, range=bytes_range
)
return response.data.content 
  2. Using response.data.raw.stream:
    Based on an idea from this issue, this method is ~60% faster than method 1 but still ~20% slower than boto3:
response = self._oci_client.get_object(
    namespace_name=self._namespace, bucket_name=bucket, object_name=key, range=bytes_range
)
content = bytearray()
for chunk in response.data.raw.stream(1024 * 1024, decode_content=False):  # 1MB chunks
    content.extend(chunk)
return bytes(content)

Note: We tested various chunk sizes, but they did not yield further improvements.
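
One further variation that might be worth measuring, as a minimal sketch only: method 2 grows a bytearray chunk by chunk and then copies it into bytes, so reading straight into a buffer preallocated to the object size would avoid those extra copies. The helper name read_into_buffer is hypothetical, and the sketch assumes the stream exposes the standard file-like readinto() and that the object size is known up front (for example from the Content-Length response header). It has not been benchmarked against the OCI SDK, so whether it narrows the gap is an open question.

```python
import io


def read_into_buffer(stream, size, chunk_size=1024 * 1024):
    """Read up to `size` bytes from a file-like `stream` into one preallocated buffer.

    Hypothetical helper: `stream` is assumed to support readinto(), and `size`
    is assumed to be known in advance (e.g. parsed from Content-Length).
    """
    buf = bytearray(size)      # allocate once instead of growing per chunk
    view = memoryview(buf)     # slicing a memoryview does not copy
    offset = 0
    while offset < size:
        n = stream.readinto(view[offset:offset + chunk_size])
        if not n:              # stream ended earlier than expected
            break
        offset += n
    # Return the buffer itself when it was filled; copy only if the read was short.
    return buf if offset == size else buf[:offset]


# Stand-in for a network response body, used here only to exercise the helper.
payload = bytes(range(256)) * 4096   # 1 MiB of sample data
result = read_into_buffer(io.BytesIO(payload), len(payload), chunk_size=64 * 1024)
```

The helper returns a bytearray rather than bytes so that no final copy is made; callers that need an immutable object would pay for one extra copy with bytes(result).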

boto3 Baseline Implementation

response = s3_client.get_object(Bucket=bucket_name, Key=key)
return response['Body'].read()

Performance Results

With ThreadPoolExecutor(max_workers=16), I measured the following average throughput when downloading 1,000 objects of 64 MB each from the same OCI bucket to memory:

  • boto3 get_object: 9.8 Gbps
  • OCI SDK response.data.content: 4.1 Gbps
  • OCI SDK response.data.raw.stream: 6.8 Gbps

The gap remains consistent across multiple test runs, including various multithreaded and multiprocessed setups.
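
For anyone trying to reproduce these numbers, a harness along the following lines could be used. This is a sketch under stated assumptions, not the exact script behind the results above: the names to_gbps and measure_throughput_gbps are hypothetical, Gbps is taken to mean 10^9 bits per second, and download stands for any callable that takes a key and returns the object body as bytes (one of the OCI methods or the boto3 baseline).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def to_gbps(total_bytes, seconds):
    """Convert a byte count and a duration to gigabits per second (10**9 bits/s)."""
    return total_bytes * 8 / seconds / 1e9


def measure_throughput_gbps(download, keys, max_workers=16):
    """Run `download(key)` for every key on a thread pool and return aggregate Gbps.

    `download` is a placeholder for the method under test; it must return
    the downloaded body as a bytes-like object.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Only the body lengths are kept, so finished downloads can be freed.
        total_bytes = sum(len(body) for body in pool.map(download, keys))
    elapsed = time.perf_counter() - start
    return to_gbps(total_bytes, elapsed)


def fake_download(key):
    # Stand-in that returns 1 KiB without any network I/O.
    return b"x" * 1024


demo_gbps = measure_throughput_gbps(fake_download, range(100), max_workers=4)
```

Passing each download method to the same harness keeps the thread pool, the key list, and the timing logic identical, so any remaining difference comes from the client call itself.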

Questions

  1. Is this performance gap expected?
  2. Are there any recommended optimizations or best practices for improving download performance with the OCI Python SDK?
  3. Are there any internal differences in how OCI handles downloads through its S3-compatible API compared with the native API that might explain the performance gap?

Thanks!
