Feature: mc cat with parallel and buffer size args #5255
Conversation
- Test small and large data reads
- Test single and multi-threaded scenarios
- Test exact and uneven part boundaries
- Test small buffer reads and edge cases
- Test proper resource cleanup
- Add mock client for isolated testing
Co-authored-by: Klaus Post <[email protected]>
* buffer size for user clarity
* `context.CancelCauseFunc` to avoid a duplicated context
* buffer pools for re-using buffers
* worker queue depth * 2 for non-blocking parallel
* 16MiB minimum buffer size
OK, so I didn't find much benefit and was head-scratching. Here are my findings and a proposal. Sorry, I've made lots of changes, but I've tried to provide confidence with decent tests.

Original PR architecture (chan-chan request/response pattern): the chan-chan request/response pattern is good for dynamic responses, but our use case is predictable, fixed-size parts. It created a tight coupling of Read() → Worker → Download → Read() and was essentially a complex single-threaded download:
Proposed improved architecture (producer/consumer pipeline): the producer pushes part numbers onto a queue and workers continuously pull from it in parallel:
I also had to add
You can run
klauspost left a comment:
lgtm.
I sent in a PR with updates for the linter: #5256
You can either wait for that or fix the issues reported by the tests.
@klauspost I ran the CI commands in #5256:

```
go run golang.org/x/tools/gopls/internal/analysis/modernize/cmd/modernize@latest -fix -test ./...
gofumpt -w .
```

Tests and lints pass, but many changes are now included as part of this PR. Sorry if I misinterpreted your comment; in hindsight I should have merged that branch.
Yeah, maybe just revert latest commit.
Force-pushed from bdba4bd to 06ae46d
@klauspost reverted now, but it includes the gofumpt fixes to pass the current CI
@klauspost I'm still not seeing the improvements I expected, and I think it's due to a fundamental bottleneck: io.Read() feeding stdout needs to be synchronous. I'm going to close this PR and create a new one so it's easier to copy > 5TB files.
Community Contribution License
All community contributions in this pull request are licensed to the project maintainers
under the terms of the Apache 2 license.
By creating this pull request I represent that I have the right to license the
contributions to the project maintainers under the Apache 2 license.
Description
`mc pipe` already includes flags to make it fast. This PR adds `mc cat` support for `--parallel` and `--buffer-size` flags so it can be fast too.

Motivation and Context
I have a fast on-prem k8s cluster with a MinIO object store. I want to move a very large 15TB object from one bucket to another using the mc CLI. The most obvious choice, `mc cp`, is limited by the `mc put` limit of 5TB, following the same behaviour as AWS: https://docs.min.io/enterprise/aistor-object-store/reference/aistor-server/thresholds/

The next best is `mc pipe`, which works great except it is only as fast as the incoming stream. Enter `mc cat`. Other alternatives for copying across buckets on my k8s MinIO instance:
Rather than deviate from AWS behaviour, i.e. augmenting `mc cp` to handle files greater than 5TB, this PR proposes to make `mc cat` fast so as not to limit the benefits when piping to `mc pipe`.

How to test this PR?
run unit tests
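Beyond the unit tests, the new flags can be exercised end-to-end. The invocation below is a hypothetical sketch (alias and object names invented; standard Go tooling assumed for the tests), combining the flags named in this PR with `mc pipe` as the description suggests:

```
# Run the unit tests (standard Go tooling assumed):
go test ./...

# Exercise the new flags end-to-end (hypothetical aliases and paths):
mc cat --parallel 8 --buffer-size 16MiB src/bucket/huge.bin | mc pipe dst/bucket/huge.bin
```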
Types of changes
Checklist:
(commit-id or PR # here)