Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cargo metadata: provide an option not to collect data about dev-dependencies #14794

Open
Shnatsel opened this issue Nov 8, 2024 · 12 comments
Open
Labels
C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` Command-metadata S-needs-team-input Status: Needs input from team on whether/how to proceed. Z-avoid-dev-deps Nightly: avoid-dev-deps

Comments

@Shnatsel
Copy link
Member

Shnatsel commented Nov 8, 2024

Problem

In certain environments that tightly control dependencies (e.g. for the purposes of supply-chain security) the dev-dependencies are not present when code is compiled. But cargo metadata always tries to collect data for the entire dependency tree, including dev-dependencies. It errors out and reports no info if dev-dependencies are missing.

In practical terms this blocks the deployment of cargo auditable (which relies on cargo metadata) within Debian (which does not package dev-dependencies).

This is similar to #10718, but this isn't just a problem with feature unification - the dev-dependency packages are not present at all.

Proposed Solution

Adding a new flag, --no-dev-deps (following the precedent of the existing --no-deps flag) so that cargo metadata wouldn't attempt to query dev-dependencies would resolve this.

Cargo generating a SBOM natively would also solve this by letting cargo auditable bypass cargo metadata completely. But that's going to take a long time to reach stable. That feature isn't even on nightly yet, since the reviews of the RFC and the implementation PR seem to have stalled.
Making it possible to query cargo metadata without requiring dev-dependencies to be present seems like a simpler and more straightforward change that shouldn't require as much design and stabilization work.

Notes

cargo auditable got a lot of uptake from Linux distributions. 5 distros, including Alpine and NixOS, are already building all their Rust packages with it. There is also interest from Debian, but this issue makes adopting cargo auditable a non-starter in their environment.

@Shnatsel Shnatsel added C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-triage Status: This issue is waiting on initial triage. labels Nov 8, 2024
@weihanglo
Copy link
Member

Correct me if I'm wrong.
This proposal, underneath the hood, is more like asking Cargo to resolve dependencies as if no dev-dependencies are present in Cargo.toml.

@epage
Copy link
Contributor

epage commented Nov 8, 2024

In certain environments that tightly control dependencies (e.g. for the purposes of supply-chain security) the dev-dependencies are not present when code is compiled. But cargo metadata always tries to collect data for the entire dependency tree, including dev-dependencies. It errors out and reports no info if dev-dependencies are missing.

In those situations, it isn't only cargo metadata that would fail but every cargo command. That seems like a highly specialized use case and I would be worried about exposing those kind of knobs.

Also, the dependency resolver can skip dev dependencies today but I'm unsure if that is hard requirement for the use cases that use it or if we could move that to a later stage which would have a lot of other benefits for Cargo. However, that would be a no-Cargo.lock mode which would be non-obvious and we've tried to avoid doing that in other situations because it has problems (#10096).

I have been toying with how we could construct an alternative set of plumbing commands that could give more control or be swapped out with custom logic in different situations. We'd need a lot of experimentation in third party code before we could have a good feel for for whether these would meet people's needs and not overly restrict Cargo.

@Shnatsel
Copy link
Member Author

Shnatsel commented Nov 8, 2024

This proposal, underneath the hood, is more like asking Cargo to resolve dependencies as if no dev-dependencies are present in Cargo.toml.

Ideally, yes. That would also take care of #10718.

But really, anything that doesn't require the contents of the .crate files for dev-dependencies to be present would do. cargo metadata --ignore-missing so that it wouldn't error out when it cannot find something would also work.

I'm not familiar with Cargo internals, so I cannot really tell what the best "under the hood" mechanism would be.

In those situations, it isn't only cargo metadata that would fail but every cargo command.

Well, cargo build works, and I'm trying to get cargo auditable build to work whenever cargo build works.

I have been toying with how we could construct an alternative set of plumbing commands that could give more control or be swapped out with custom logic in different situations.

A machine-readable output for cargo tree would be a nearly perfect match for cargo auditable. The way cargo metadata works has some fundamental limitations such as #7754 and #10718. Although I don't know if cargo tree's dependency type filtering would work when dependencies are vendored and .crate files for dev-dependencies aren't present.

For cargo auditable and similar tools like cargo cyclonedx #13709 would be the best solution, if only on nightly in the foreseeable future. All the review comments on the PR have been addressed by the author, but then the review has stalled?

@epage
Copy link
Contributor

epage commented Nov 8, 2024

Well, cargo build works, and I'm trying to get cargo auditable build to work whenever cargo build works.

Oh, right. If you are using the registry and not vendored dependencies, the Index will be used for dependency resolution so .crate files aren't needed. When we do feature resolution, we only do it for the selected targets and don't need the .crate files.

The problem comes in when we get the extended information for including in the metadata output that we need the .crate files.

However, this is Debian. I would assume a registry is not being used. Are the .crate files being referenced in a way that still has an index to make this work?

@Shnatsel
Copy link
Member Author

Shnatsel commented Nov 8, 2024

@alexanderkjall since you worked on the integration into Debian, could you explain to Cargo developers how the Debian build process for Rust packages works and how it avoids using the dev-dependencies?

@alexanderkjall
Copy link
Contributor

I tried to write something, I'm not really sure if I hit the correct abstraction level here at all, so please ask more questions if needed:

How does building binaries from rust source work in Debian:

Debian does not allow the build servers to fetch source code from the internet while building, instead all source code that generates a binary must be already packaged and part of Debian before the build of the binary can happen.

There is also a concept of source packages and binary packages in Debian, with rust this gets a tad complicated. For rust-libraries the debian-binary package contains the source code of the library (for C it would contain the .so file), but for rust-binaries the debian-binary package contains the binary.

Here is an example of the content of a Debian-binary package built from rust:
https://packages.debian.org/sid/amd64/lsd/filelist

And here is the Debian-source package for that:
https://packages.debian.org/source/sid/rust-lsd

For libraries this looks like this:
https://packages.debian.org/sid/amd64/librust-futures-channel-dev/filelist

https://packages.debian.org/source/sid/rust-futures-channel

This means that Debian mirrors the dependency resolution system in cargo in it's own package system. The build then first does the dependency resolution on the Debian side, and installs librust--dev packages, these packages only contain rust source code that gets placed into /usr/share/cargo/registry/ .

Of note is also that the dependency resolution on Debians side is recursive, we read the content of Cargo.toml and only insert those entries into the dependency list, and then the apt dependency resolution will walk the tree and pull in what is needed.

Debian do not respect/care about the Cargo.lock file.

Testing of the source code is also slightly involved, tests in Debian are run in a separate section from the normal build, called autopkgtest, and we run those per feature plus all-features and no-features.

The dev-dependencies gets added as to the Depends: row for the autopkgtest section, but not the build section. One important aspect of why this is loops in the dependency tree, it's quite common that a -derive package dev-depends on the main package in rust and if those dev-dependencies got added to the build Depends in Debian we would end up with a loop in the build system that it can't handle.

@Fabian-Gruenbichler
Copy link

adding some more bits:

there's actually two different ways we test crate sources in Debian packaging:

  • as part of the build using cargo test, if there are no dev-dependencies
  • as part of autopkgtests, also using cargo test, if there are dev-dependencies

in both cases, the dev-dependencies are not used as part of the regular build, but just for the (decoupled, running after packages are built) autopkgtests.

for regular package building, we use a cargo wrapper that (among other things) injects -Z avoid-dev-deps for "build", "rustc", "doc", "test", "bench", "install" invocations, which means that the lack of dev-dependencies at package building time is not an issue. it can be an issue for other commands, especially those related to generating the source packages that get later build. for example, debcargo, our tool that translates Cargo.toml into debian packaging metadata, has support for packaging from a local crate source tree (in addition to the default mode, which uses a downloaded .crate archive from crates.io). in order to support such local crate sources, debcargo basically does the equivalent of cargo package, which in turn requires all dependencies to exist in the registry. if "the registry" is the local dir of packaged crates, then this requires all dependencies (including dev dependencies) to be installed, even though they are not used in any fashion at that point. similarly, other tools using/extending cargo might be unhappy if there is a cargo config pointing at the packaged crate "registry", as opposed to crates.io, since the former only offers an "incomplete" view.

so in summary, it would be great if cargo could be taught to ignore referenced crates not existing for various actions - it would make our packaging life easier. it's not required to build and test packages though, as that part is already handled by the existing options in cargo.

(similarly, it would be great to be able to tell cargo to ignore the non-existence of dependencies for targets that are not currently being looked at/built for/.., as we have to patch such things out, and it causes a lot of meaningless patch churn when pulling in updates - but that is probably a separate issue ;))

@epage
Copy link
Contributor

epage commented Nov 11, 2024

So it sounds like Debian is working around this via #5133 and what they really are wanting is some staiblized form of that that would then be used by all of the third-party commands. In that case, would the appropriate thing to do is to shift the conversation over to #5133 or is there something distinct about this request?

@Shnatsel
Copy link
Member Author

From the cargo auditable standpoint, the short-term solution would be supporting -Z avoid-dev-deps in cargo metadata (this doesn't even have to be stabilized to allow use in Debian) and the long-term solution is #13709.

@epage
Copy link
Contributor

epage commented Nov 11, 2024

From the cargo auditable standpoint, the short-term solution would be supporting -Z avoid-dev-deps in cargo metadata (this doesn't even have to be stabilized to allow use in Debian) a

What is missing from it?

@Shnatsel
Copy link
Member Author

Both cargo +nightly metadata -Z avoid-dev-deps and cargo +nightly -Z avoid-dev-deps metadata still include dev-dependencies in the output as of cargo 1.84.0-nightly (031049782 2024-11-01). At least when run with unrestricted access to the internet.

I've tried it both with and without the Cargo.lock file present.

@epage epage added the Z-avoid-dev-deps Nightly: avoid-dev-deps label Nov 11, 2024
@epage
Copy link
Contributor

epage commented Nov 11, 2024

So this should be treated more as a bug report against cargo metadata?

The name just says to "avoid" not to exclude. This matches some of the long form descriptions like for the feature

When running commands such as cargo install or cargo build, Cargo currently requires dev-dependencies to be downloaded, even if they are not used. The -Z avoid-dev-deps flag allows Cargo to avoid downloading dev-dependencies if they are not needed. The Cargo.lock file will not be generated if dev-dependencies are skipped.

However, the summary is different

Prevents the resolver from including dev-dependencies during resolution.

(and then there is the -Zhelp description that uses a npm-style use of the word "install")

This came up because the flag -Zavoid-dev-deps maps to

    /// `true` if this workspace should enforce optional dependencies even when
    /// not needed; false if this workspace should only enforce dependencies
    /// needed by the current configuration (such as in cargo install). In some
    /// cases `false` also results in the non-enforcement of dev-dependencies.
    require_optional_deps: bool,

Without -Zavoid-dev-deps, we load the lockfile unconditionally with HasDevUnits::Yes. With it, we rely on the callers value for HasDevUnits which is always Yes for metadata.

With this meaning, I can see either keeping the hard coded HasDevUnits::Yes (because the metadata was requested and so no nothing to avoid much like cargo check -Zavoid-dev-deps --all-targets) or making it conditional on -Zavoid-dev-deps (the user wants to avoid them, be sloppier).

As this is an unstable feature without a clear path to stabilization and not much feedback on people want, I lean towards the latter. If we do that, we need to call it out in the tracking issue so we keep it in mind if we ever get around to stabilization.

@epage epage added S-needs-team-input Status: Needs input from team on whether/how to proceed. and removed S-triage Status: This issue is waiting on initial triage. labels Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` Command-metadata S-needs-team-input Status: Needs input from team on whether/how to proceed. Z-avoid-dev-deps Nightly: avoid-dev-deps
Projects
None yet
Development

No branches or pull requests

5 participants