[FEA] inspect wheel contents, remove anything unnecessary #410

@jameslamb

Is your feature request related to a problem? Please describe.

Recent libcuopt-cu13 wheels package more than 19,000 files.

And it looks like there are many things in there that I wouldn't expect to find in a wheel for a C++ shared library, like 50MB of HTML and 27MB of PNG files, .bat files, etc.

file size
  * compressed size: 0.738G
  * uncompressed size: 1.006G
  * compression space saving: 26.6%
contents
  * directories: 2465
  * files: 19408 (41 compiled)
size by extension
  * .so - 0.438G (43.5%)
  * .pack - 0.226G (22.5%)
  * .html - 50.317M (4.9%)
  * .hpp - 43.327M (4.2%)
  * .h - 39.824M (3.9%)
  * .a - 37.721M (3.7%)
  * .3 - 31.537M (3.1%)
  * .png - 27.33M (2.7%)
  * .cu - 22.121M (2.1%)
  * .cuh - 18.394M (1.8%)
  * .o - 16.885M (1.6%)
  * .idx - 9.856M (1.0%)
  * .cpp - 9.027M (0.9%)
...

(build link)

Describe the solution you'd like

Please scrutinize the contents of the wheels being produced here and try to identify some files that could be omitted.

Summaries like the one above can be found in the logs of the wheel-build-* CI jobs, or produced directly by fetching wheels with pip download and running pydistcheck --inspect on them.
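For a quick look without pydistcheck, the same kind of per-extension breakdown can be approximated with the Python standard library, since a wheel is a zip archive. This is only a rough sketch: the function names below are hypothetical, and the numbers will not necessarily match pydistcheck's report exactly.

```python
import posixpath
import zipfile
from collections import Counter


def size_by_extension(wheel_path):
    """Return {extension: total uncompressed bytes} for a wheel (a zip archive)."""
    totals = Counter()
    with zipfile.ZipFile(wheel_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Zip member names always use "/" separators, so use posixpath.
            ext = posixpath.splitext(info.filename)[1] or "(none)"
            totals[ext] += info.file_size
    return dict(totals)


def print_extension_summary(wheel_path, top=15):
    """Print the largest extensions by uncompressed size, biggest first."""
    totals = size_by_extension(wheel_path)
    grand_total = sum(totals.values()) or 1  # avoid dividing by zero on an empty archive
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    for ext, size in ranked[:top]:
        print(f"{ext:>10}  {size / 1e6:10.3f}M  ({100 * size / grand_total:.1f}%)")
```

Calling print_extension_summary() on the path of a downloaded wheel prints one line per extension, which is usually enough to spot unexpected file types such as .html or .png.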

Some places to look:

  • use of install(DIRECTORY), install(FILES), or similar in CMake code (install(TARGETS) and dependency-tracking should be preferred)
  • MANIFEST.in rules
  • package_data configuration in pyproject.toml / setup.py
  • third-party dependencies being vendored instead of re-used from wheels (e.g., probably do not need to vendor the raft/ headers, a runtime dependency on libraft-cu{12,13} might be enough)
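To find which of these sources contributes the most files, it can help to group the wheel's contents by directory rather than by extension. The sketch below is a hypothetical helper (not part of pydistcheck) that totals uncompressed size and file count per directory prefix. Directories that hold thousands of headers or documentation files often point to a broad install(DIRECTORY) rule or a vendored dependency.

```python
import zipfile
from collections import defaultdict


def size_by_directory(wheel_path, depth=3):
    """Return {directory prefix: (total uncompressed bytes, file count)}.

    `depth` is the number of leading path components kept in the prefix.
    """
    totals = defaultdict(lambda: [0, 0])
    with zipfile.ZipFile(wheel_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            parts = info.filename.split("/")[:-1]  # drop the file name itself
            prefix = "/".join(parts[:depth]) or "."  # "." for files at the archive root
            totals[prefix][0] += info.file_size
            totals[prefix][1] += 1
    return {prefix: (size, count) for prefix, (size, count) in totals.items()}


def print_directory_summary(wheel_path, depth=3, top=20):
    """Print the largest directory prefixes by uncompressed size, biggest first."""
    totals = size_by_directory(wheel_path, depth=depth)
    ranked = sorted(totals.items(), key=lambda kv: kv[1][0], reverse=True)
    for prefix, (size, count) in ranked[:top]:
        print(f"{size / 1e6:10.3f}M  {count:7d} files  {prefix}")
```

Increasing depth narrows the report down to specific subdirectories once a large top-level directory has been identified.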

After eliminating as much data as possible, look at the reported compressed sizes and reduce thresholds like this:

[tool.pydistcheck]
select = [
    "distro-too-large-compressed",
]
max_allowed_size_compressed = '900M'

Describe alternatives you've considered

N/A

Additional context

Smaller wheels, with fewer individual files, would mean:

  • faster builds
  • faster installation (in CI and for users)
  • smaller disk footprint for environments

Metadata

Labels

awaiting response, feature request
