Skip to content

Conversation

@ckyrouac
Copy link
Collaborator

@ckyrouac ckyrouac commented Dec 11, 2025

Most of this was written by Claude and needs to be cleaned up. It's very much a WIP, but I want to get feedback on the high level design before going too far into the cleanup. Please focus on the README changes (contained in a separate commit) that explain the design. I have tested the code at least works and you can try it out if you want.

The key design points (explained in detail in the READMEs)

  • A separate registry VM is started before the test VMs
  • The registry connection details are passed as environment variables to each test
  • The base bootc image under test is pushed to the registry
  • Individual tests can be defined as a directory or a single file
  • Additional images required for a test must be defined in an images sub directory of the test
  • The pre-built images are passed to the tests as environment variables

The primary reason for this change is to support building sealed images on the test runner without needing to copy keys into the test VM and copy-to-storage. An added benefit is our tests will run in a more realistic environment (i.e. using a registry instead of building on the system under test).

@ckyrouac ckyrouac requested a review from cgwalters December 11, 2025 17:14
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This is a major and impressive refactoring of the TMT test infrastructure. The high-level design of using a dedicated registry VM is excellent and will make the tests more realistic and powerful. The move to directory-based tests with custom image definitions is also a great improvement, making tests more self-contained and easier to manage. The extensive documentation in the new and updated READMEs is very clear and will be incredibly helpful for other developers.

The code is a WIP as you mentioned, and I've left some comments on areas for cleanup and improvement, focusing on correctness and robustness. My main points are:

  • Improving the robustness of IP address parsing from ip route output.
  • Enforcing uniqueness of test numbers to prevent non-deterministic generation of TMT files.
  • Removing some leftover debugging statements.
  • A question about the use of SELinux permissive mode in the registry VM.

Overall, this is a fantastic direction for the testing framework. Great work!

}

// Sort tests by number
tests.sort_by_key(|t| t.number);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The tests are sorted by number, but there's no check for uniqueness. If multiple tests have the same number, sort_by_key is not stable, which can lead to non-deterministic order in the generated integration.fmf file. This could cause flaky CI runs if the file is checked into git.

You should enforce that test numbers are unique. You could use a HashSet to detect duplicates during the test discovery phase before sorting.
For example:

let mut seen_numbers = std::collections::HashSet::new();
for test in &tests {
    if !seen_numbers.insert(test.number) {
        anyhow::bail!("Duplicate test number found: {}", test.number);
    }
}
tests.sort_by_key(|t| t.number);

I've noticed that test-25-download-only-upgrade and test-25-soft-reboot seem to share the number 25, which this change would catch.

Suggested change
tests.sort_by_key(|t| t.number);
// Enforce unique test numbers to prevent non-deterministic plan ordering
let mut seen_numbers = std::collections::HashSet::new();
for test in &tests {
if !seen_numbers.insert(test.number) {
anyhow::bail!("Duplicate test number found: {}", test.number);
}
}
tests.sort_by_key(|t| t.number);

set -xeuo pipefail
# Install the registry CA certificate if it exists
# This allows test VMs to trust the registry's TLS certificate
ls -la /run/src/src
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This ls command appears to be for debugging and can be removed.

Comment on lines +453 to +459
let registry_ip = gateway_output
.split_whitespace()
.skip_while(|&word| word != "default")
.skip(1) // skip "default"
.find(|&word| word != "via") // find first word after "via"
.unwrap_or("")
.trim();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This logic to parse the gateway IP is a bit fragile. It assumes a specific ordering of words in the ip route show default output. For example, it would fail on default dev eth0 via 192.168.122.1.

A more robust approach would be to find via and take the next word. Consider using something like:

let registry_ip = gateway_output
    .split_whitespace()
    .skip_while(|&word| word != "via")
    .nth(1)
    .unwrap_or("")
    .trim();

This will find via and then take the next token, which should be the IP address, regardless of other options like dev.

Even better, if the test image has ip -j, you could use ip -j route show default and parse the JSON output for the gateway field, which would be the most robust solution.

Suggested change
let registry_ip = gateway_output
.split_whitespace()
.skip_while(|&word| word != "default")
.skip(1) // skip "default"
.find(|&word| word != "via") // find first word after "via"
.unwrap_or("")
.trim();
let registry_ip = gateway_output
.split_whitespace()
.skip_while(|&word| word != "via")
.nth(1)
.unwrap_or("")
.trim();

Comment on lines +503 to +505
// Print the contents of cert-dir for debugging
println!("Using cert-dir: {}", cert_dir);
let _ = cmd!(sh, "ls -laR {cert_dir}").run();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

These debugging prints can be removed.


# Install the registry CA certificate if it exists
# This allows test VMs to trust the registry's TLS certificate
ls -la /run/context
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This ls command appears to be for debugging and can be removed.

Comment on lines +100 to +103
cat > /etc/selinux/config <<'EOF'
SELINUX=permissive
SELINUXTYPE=targeted
EOF
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The registry image is configured to run with SELinux in permissive mode. Is this strictly necessary? The volume mounts for the registry container use the :Z flag, which should handle relabeling. If possible, it would be better to run in enforcing mode for a more realistic test environment. If permissive mode is required, it would be good to add a comment explaining why.

Comment on lines +6 to +12
print "#####################################"
print "#####################################"
print "#####################################"
print "#####################################"

bootc status
hostname
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

These print statements and bootc status, hostname calls appear to be for debugging and can be removed.

Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

use crate::{RunTmtArgs, TmtProvisionArgs};

/// RAII guard to ensure VM cleanup on drop unless explicitly preserved
struct VmCleanupGuard<'a> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine as is but I filed bootc-dev/bcvk#173 related to this

let ssh_port = ssh_port as u16;
return Ok((ssh_port, ssh_key.to_string()));
// Try to get IP address from network interfaces
let ip = json
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let key_path = Utf8PathBuf::try_from(key_file.path().to_path_buf())
.context("Failed to convert key path to UTF-8")?;

std::fs::write(&key_path, key).context("Failed to write SSH key")?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

key_file.write_all(key.as_bytes())

// Build context is images/<tag>/ subdirectory
cmd!(
sh,
"podman build -t {image_ref} -f {containerfile_path} {image_dir}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I'd prefer that we do things like build container images from Justfile. Arguably just build should build all images by default?

// Remove from local storage
// We don't remove from registry since it's a test registry that gets torn down anyway
let full_tag = &image.full_tag;
let _ = cmd!(sh, "podman rmi -f {full_tag}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just image-cleanup or something? (which could call into this) also on this topic...I tried to ensure that every image we build is tagged with a label, so we should make it easy to remove all local images we built.

// Host pushes to localhost:5000
// Test VMs use hostname (configured via TMT prepare)
println!("Pushing base image to registry...");
let registry_url = "localhost:5000".to_string();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use a const instead of 5000 in multiple places

/target

# Registry TLS certificates (generated at build time)
/hack/.registry-certs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with this but note the secure-boot keys which are similar are in target which I feel is cleaner; basically it's the default directory for build-time artifacts.

./hack/setup-registry-certs.sh
# Build registry image with Quadlet configuration
# Pre pull registry container to be used as a LBI
podman pull quay.io/libpod/registry:2.8.2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker but we'll need to ensure stuff like this is tagged so that renovate can handle bumping it. Probably best as a Justfile variable at the top too.

I would also say we should not use that specific container image which I think is only for the podman team's CI. https://github.com/distribution/distribution is part of CNCF too, but obviously there's a lot of choices for registries.

Comment on lines +83 to +92
if [ -f /run/src/src/hack/.registry-certs/ca.pem ]; then
echo "Installing registry CA certificate to trust store..."
cp /run/src/src/hack/.registry-certs/ca.pem /usr/share/pki/ca-trust-source/anchors/bootc-registry-ca.crt
update-ca-trust
echo "✓ Registry CA certificate installed"
else
echo "Note: Registry CA certificate not found - registry will need --tls-verify=false"
echo "To enable secure registry access, run: hack/setup-registry-certs.sh"
exit 1
fi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all 100% fine as is but I think it'd be cleaner and clearer written like this:

if test '!' -f /run/src/src/hack/.registry-certs/ca.pem; then
  echo "hack/setup-registry-certs.sh must be invoked" 1>&2; exit 1
fi
cp /run/src/src/hack/.registry-certs/ca.pem /usr/share/pki/ca-trust-source/anchors/bootc-registry-ca.crt
update-ca-trust

The idea is we check for exceptions/errors first, and the "happy path" is the default one.

@@ -0,0 +1 @@
kargs = ["testarg=foo", "othertestkarg", "thirdkarg=bar"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this specific line but just a note: since this PR is changing all of the tests it's going to be an ongoing merge conflict fest.

One thing that would likely here is to split up any "prep" changes that can land independently first

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants