@afkbluey afkbluey commented Nov 1, 2025

Community Contribution License

All community contributions in this pull request are licensed to the project maintainers
under the terms of the Apache 2 license.
By creating this pull request I represent that I have the right to license the
contributions to the project maintainers under the Apache 2 license.

Description

This PR enables `mc cp` to handle files greater than or equal to 5 TiB by:

  1. defaulting to streaming copy where `mc cp` previously failed
  2. exposing `--part-size` and `--parallel` flags for configurable copies

See below

| Strategy | Conditions | Data Flow | Multipart Support | Part Size & Parallel Flags |
|---|---|---|---|---|
| Stream Copy | Different aliases<br>OR file size >= 5 TiB<br>OR `--zip` flag used<br>OR `--checksum` flag used | Download + Upload (through client) | Yes (via standard multipart upload API) | Enabled by this PR: `--part-size` and `--parallel` |
| Server-Side Copy | Same alias (source & target)<br>AND file size < 5 TiB<br>AND no `--zip` flag<br>AND no `--checksum` flag | No data through client (server-to-server) | Yes, but not configurable (via ComposeObject API with `X-Amz-Copy-Source-Range`) | Disabled; proposed in minio-go SDK PR 2175 (the API currently doesn't support parallelism) |
| PUT without Multipart | File size < 64 MiB (default)<br>OR `--disable-multipart` flag | Single PUT request | No | Not applicable |
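The thresholds above can be sketched as a small decision helper. This is a hypothetical illustration of the table, not mc's actual code; the 64 MiB and 5 TiB constants are the defaults named above:

```go
package main

import "fmt"

// Thresholds from the strategy table above (illustrative constants).
const (
	serverSideCopyLimit = 5 << 40  // 5 TiB: server-side copy ceiling
	multipartThreshold  = 64 << 20 // 64 MiB: default multipart cutoff
)

// chooseStrategy mirrors the decision table: a sketch, not the real
// routing logic inside mc.
func chooseStrategy(sameAlias bool, size int64, zip, checksum, disableMultipart bool) string {
	switch {
	case sameAlias && size < serverSideCopyLimit && !zip && !checksum:
		return "server-side-copy" // no data through the client
	case size < multipartThreshold || disableMultipart:
		return "put" // single PUT request, no multipart
	default:
		return "stream-copy" // download + upload through the client
	}
}

func main() {
	// A 12 TiB object on the same alias exceeds the 5 TiB server-side
	// copy limit, so it falls through to streaming.
	fmt.Println(chooseStrategy(true, 12<<40, false, false, false)) // prints "stream-copy"
}
```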

Previously I attempted to make `mc cat` fast, but realised the fundamental bottleneck is `io.Read`, which is sequential: #5255

Motivation and Context

I want to efficiently copy a 12TB file across buckets on the same host. Currently `mc cp` doesn't allow configuring parallelisation or part size.

UPDATE: when embarking on this change, I found undocumented environment variables, but they won't support files >= 5TB:


e.g. they can be invoked like this:

```sh
MC_UPLOAD_MULTIPART_SIZE=128MiB MC_UPLOAD_MULTIPART_THREADS=8 mc cp largefile.zip myminio/mybucket/
```
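For context on why a configurable part size matters: S3-compatible multipart uploads are capped at 10,000 parts, so a fixed 128 MiB part size tops out at roughly 1.22 TiB. A minimal sketch of the arithmetic for the 12 TB use case (hypothetical helper; the 10,000-part cap and 5 MiB minimum part size are standard S3 multipart limits):

```go
package main

import "fmt"

// Standard S3 multipart limits (protocol constraints, not set by this PR).
const (
	maxParts    = 10000
	minPartSize = 5 << 20 // 5 MiB minimum part size
)

// minimumPartSize returns the smallest part size, rounded up to a whole
// MiB, that fits objectSize into at most maxParts parts.
func minimumPartSize(objectSize int64) int64 {
	ps := (objectSize + maxParts - 1) / maxParts // ceiling division
	if ps < minPartSize {
		ps = minPartSize
	}
	const mib = 1 << 20
	return (ps + mib - 1) / mib * mib // round up to a whole MiB
}

func main() {
	twelveTB := int64(12_000_000_000_000) // 12 TB (decimal)
	// 12 TB / 10,000 parts = 1.2 GB per part, i.e. 1145 MiB rounded up.
	fmt.Printf("%d MiB\n", minimumPartSize(twelveTB)/(1<<20)) // prints "1145 MiB"
}
```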

How to test this PR?

Run the unit tests.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Optimization (provides speedup with no functional changes)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • Fixes a regression (If yes, please add commit-id or PR # here)
  • Unit tests added/updated
  • Internal documentation updated
  • Create a documentation update request here
