new feature: Add CRC32C checksum support for GCS #5635

Open
1 task done
zjregee opened this issue Feb 18, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

@zjregee
Member

zjregee commented Feb 18, 2025

Feature Description

CRC32C is the checksum algorithm recommended by the official GCS documentation. Since OpenDAL's S3 service already implements CRC32C checksums, adding CRC32C support to the GCS service seems like a natural next step.
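For reference, here is a minimal, dependency-free Rust sketch of what the GCS side would need: a bitwise CRC32C (Castagnoli polynomial, reflected form `0x82F63B78`) plus the encoding GCS uses when it reports the value. As I understand the data-validation docs, that encoding is the 4-byte checksum in big-endian order, base64-encoded (for example `x-goog-hash: crc32c=...`). The function names `crc32c_update`, `crc32c`, `base64_encode`, and `gcs_crc32c_header` are hypothetical and not OpenDAL's API; a real implementation would more likely reuse the CRC32C code the S3 service already relies on, or a hardware-accelerated crate.

```rust
/// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
/// Takes and returns a finalized CRC, so it can be chained over chunks.
/// Simple but slow compared to table-driven or SSE4.2/ARMv8 versions.
fn crc32c_update(crc: u32, data: &[u8]) -> u32 {
    let mut c = !crc; // undo the final inversion from the previous call
    for &byte in data {
        c ^= byte as u32;
        for _ in 0..8 {
            // All ones if the low bit is set, otherwise zero.
            let mask = (c & 1).wrapping_neg();
            c = (c >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !c
}

/// CRC32C of a whole buffer.
fn crc32c(data: &[u8]) -> u32 {
    crc32c_update(0, data)
}

/// Standard base64 (RFC 4648 alphabet, with '=' padding).
fn base64_encode(bytes: &[u8]) -> String {
    const ALPHABET: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Hypothetical helper: formats the checksum the way GCS reports it,
/// i.e. base64 of the big-endian CRC32C bytes, prefixed with "crc32c=".
fn gcs_crc32c_header(data: &[u8]) -> String {
    format!("crc32c={}", base64_encode(&crc32c(data).to_be_bytes()))
}

fn main() {
    let data = b"123456789";
    println!("{:08X}", crc32c(data)); // prints "E3069283" (the standard CRC32C check value)
    println!("{}", gcs_crc32c_header(data)); // prints "crc32c=4waSgw=="
}
```

Because `crc32c_update` takes and returns the finalized value, a streamed body can be hashed chunk by chunk: `crc32c_update(crc32c(a), b)` equals the CRC32C of `a` followed by `b`.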

Problem and Solution

GCS Docs: https://cloud.google.com/storage/docs/data-validation.

Additional Context

No response

Are you willing to contribute to the development of this feature?

  • Yes, I am willing to contribute to the development of this feature.
@zjregee zjregee added the enhancement New feature or request label Feb 18, 2025
@Xuanwo
Member

Xuanwo commented Feb 18, 2025

Hi, before we put effort into adding checksum support to more services, I'd like to discuss how we can achieve end-to-end checksum support in #5549.

@zjregee
Member Author

zjregee commented Feb 18, 2025

Hi @Xuanwo, thank you very much for your suggestions. I think we could first survey the common services that support checksums, recording for each one the supported algorithms, how checksums are used by the different read and write methods, the scope of data verification, and so on. I'd be happy to compile such a list.

Beyond that, I see two possible approaches. In the first, after choosing a few common checksum algorithms, services would initially rely on these shared implementations and expose them to users through configuration, as we do now; later we could refactor this part, possibly by introducing layers. In the second, we would start from the most general design, decide up front how checksums should be applied across OpenDAL, and build the implementation on that basis.

The advantage of the first approach is that we can implement checksums for the most commonly used services right away and accumulate knowledge for a future general design along the way. The advantage of the second is that a clear design from the start avoids duplicated implementation work.
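To make the "shared logic, possibly as a layer" idea a bit more concrete, here is a minimal Rust sketch. It deliberately assumes nothing about OpenDAL's actual layer or reader APIs: `ChecksumReader` and `verify` are hypothetical names, and the wrapper is built on `std::io::Read` purely for illustration. The point it shows is that the CRC32C can be updated incrementally as chunks stream through a single generic wrapper, so any service able to report an expected checksum could reuse it instead of reimplementing the verification.

```rust
use std::io::{self, Read};

/// Bitwise CRC32C (Castagnoli, reflected poly 0x82F63B78), chainable over chunks.
fn crc32c_update(crc: u32, data: &[u8]) -> u32 {
    let mut c = !crc;
    for &byte in data {
        c ^= byte as u32;
        for _ in 0..8 {
            let mask = (c & 1).wrapping_neg();
            c = (c >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !c
}

/// Hypothetical service-agnostic wrapper that hashes bytes as they are read.
struct ChecksumReader<R> {
    inner: R,
    crc: u32,
}

impl<R: Read> ChecksumReader<R> {
    fn new(inner: R) -> Self {
        Self { inner, crc: 0 }
    }

    /// Compares the running CRC32C with the value reported by the backend.
    /// Only meaningful once the reader has been fully drained.
    fn verify(&self, expected: u32) -> io::Result<()> {
        if self.crc == expected {
            Ok(())
        } else {
            Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("crc32c mismatch: expected {:08x}, got {:08x}", expected, self.crc),
            ))
        }
    }
}

impl<R: Read> Read for ChecksumReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        // Hash only the bytes actually read in this call.
        self.crc = crc32c_update(self.crc, &buf[..n]);
        Ok(n)
    }
}

fn main() -> io::Result<()> {
    // A byte slice implements Read and stands in for a response body here.
    let body: &[u8] = b"123456789";
    let mut reader = ChecksumReader::new(body);
    let mut out = Vec::new();
    reader.read_to_end(&mut out)?;
    reader.verify(0xE306_9283)?; // standard CRC32C check value for "123456789"
    println!("verified {} bytes", out.len());
    Ok(())
}
```

Where the expected value comes from (a response header, object metadata, or a user-supplied value) would still be service-specific; only the hashing and comparison are shared.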

These are just some initial ideas; any suggestions are welcome. 💗

Development

No branches or pull requests

2 participants