[C++][Python] Potential improvements around supply chain security #44688

Open
pitrou opened this issue Nov 9, 2024 · 6 comments

Comments

@pitrou
Member

pitrou commented Nov 9, 2024

Describe the enhancement requested

For now this is more of a wishlist/discussion issue, but could grow into a more precise meta-task if we want to move forward.

There have been growing concerns over the years about the fragility of software supply chains, particularly where open source software is concerned. Some standards and practices have been proposed to help prevent such attacks:

  • SLSA (apparently pronounced "salsa") is, AFAIU, a specification that helps projects evaluate and improve their build and test practices
  • Software bills of materials (SBOM) are a type of artifact that precisely describes the provenance of code shipped within a package (related link: announcement of an "SBOM for Python packages" project; also: Accelerating SBOM success with the help of SLSA)
  • OpenSSF scorecards provide a standard vocabulary to evaluate a software project's security practices
  • Reproducible builds help ensure that binary artifacts have not been compromised, by allowing independent verification of build outputs
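As a rough illustration of what "independent verification of build outputs" means in practice: if two parties build the same source revision independently and the build is reproducible, the resulting artifacts (e.g. two wheel files) should be bit-for-bit identical. The minimal sketch below simply compares SHA-256 digests of two files; the function names are hypothetical and not part of any Arrow tooling, and real verification would also need a pinned build environment.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def builds_match(artifact_a: Path, artifact_b: Path) -> bool:
    """True if two independently built artifacts are bit-for-bit identical."""
    return sha256_of(artifact_a) == sha256_of(artifact_b)
```

For example, `builds_match(Path("ours/pkg.whl"), Path("theirs/pkg.whl"))` (hypothetical paths) returning `False` would flag a non-reproducible or tampered build.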

Arrow C++ in particular has a non-trivial set of dependencies that are incorporated in the build process in various ways. For example, for Python wheels we use vcpkg on a specific changeset, potentially with home-grown patches. This of course applies to other bindings of Arrow C++ where we may produce binary packages (such as R).
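To make the SBOM idea concrete for bundled dependencies like these, here is a minimal sketch that emits a CycloneDX-style JSON document listing components and flagging locally patched ones. The component names and versions are hypothetical placeholders (real values would come from the vcpkg changeset and patches actually used), and recording patches as a custom `properties` entry is a simplifying assumption; CycloneDX also has a dedicated `pedigree` structure for this.

```python
import json

# Hypothetical component list: placeholder names/versions, not Arrow's real deps.
components = [
    {"name": "example-compression-lib", "version": "1.2.3", "patched": False},
    {"name": "example-rpc-lib", "version": "4.5.6", "patched": True},
]


def make_sbom(components):
    """Build a minimal CycloneDX-style SBOM dict from a component list."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            {
                "type": "library",
                "name": c["name"],
                "version": c["version"],
                # Assumption: expose home-grown patches via a custom property.
                "properties": [
                    {"name": "patched", "value": str(c["patched"]).lower()}
                ],
            }
            for c in components
        ],
    }


sbom_json = json.dumps(make_sbom(components), indent=2)
```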

We should evaluate whether any of these could help us improve our intrinsic quality, or would merely amount to additional bureaucracy (related link: concerns by a prominent member of the Python packaging community).

Note: if desirable, this could, and should, typically be funded by interested companies.

Component(s)

C++, Python

@pitrou
Member Author

pitrou commented Nov 9, 2024

cc @raulcd @assignUser

@assignUser
Member

+1
Reproducible builds could also improve security, but will likely be a big effort (except for Java, where AFAIK there are plugins that handle it).

@pitrou
Member Author

pitrou commented Nov 12, 2024

Good point. I think we should strive to achieve reproducible builds.

@raulcd
Member

raulcd commented Nov 12, 2024

Reproducible builds would help with releases too. They could potentially make patch and minor releases easier to achieve.

@pitrou
Member Author

pitrou commented Nov 14, 2024

Apparently someone has been producing automated OpenSSF scorecards, e.g. for PyArrow: https://deps.dev/pypi/pyarrow

@assignUser
Member

🧐
pyarrow 18.0.0 has no dependencies.
