WIP: Use a registry with TMT tests #1847

ckyrouac · 2025-12-11T17:14:25Z

Most of this was written by Claude and needs to be cleaned up. It's very much a WIP, but I want to get feedback on the high-level design before going too far into the cleanup. Please focus on the README changes (contained in a separate commit) that explain the design. I have tested that the code at least works, and you can try it out if you want.

The key design points (explained in detail in the READMEs):

A separate registry VM is started before the test VMs
The registry connection details are passed as environment variables to each test
The base bootc image under test is pushed to the registry
Individual tests can be defined as a directory or a single file
Additional images required for a test must be defined in an images sub directory of the test
The pre-built images are passed to the tests as environment variables

The primary reason for this change is to support building sealed images on the test runner without needing to copy keys into the test VM and copy-to-storage. An added benefit is our tests will run in a more realistic environment (i.e. using a registry instead of building on the system under test).

gemini-code-assist

Code Review

This is a major and impressive refactoring of the TMT test infrastructure. The high-level design of using a dedicated registry VM is excellent and will make the tests more realistic and powerful. The move to directory-based tests with custom image definitions is also a great improvement, making tests more self-contained and easier to manage. The extensive documentation in the new and updated READMEs is very clear and will be incredibly helpful for other developers.

The code is a WIP as you mentioned, and I've left some comments on areas for cleanup and improvement, focusing on correctness and robustness. My main points are:

Improving the robustness of IP address parsing from ip route output.
Enforcing uniqueness of test numbers to prevent non-deterministic generation of TMT files.
Removing some leftover debugging statements.
A question about the use of SELinux permissive mode in the registry VM.

Overall, this is a fantastic direction for the testing framework. Great work!

gemini-code-assist · 2025-12-11T17:17:47Z

crates/xtask/src/tmt.rs

+    }
+
    // Sort tests by number
    tests.sort_by_key(|t| t.number);


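The tests are sorted by number, but there's no check for uniqueness. sort_by_key is a stable sort, so tests that share a number keep whatever relative order discovery produced; if that discovery order is not deterministic (for example, if it follows directory iteration order), the order in the generated integration.fmf file can vary between runs. This could cause spurious diffs or flaky CI checks if the file is checked into git.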

You should enforce that test numbers are unique. You could use a HashSet to detect duplicates during the test discovery phase before sorting.
For example:

    let mut seen_numbers = std::collections::HashSet::new();
    for test in &tests {
        if !seen_numbers.insert(test.number) {
            anyhow::bail!("Duplicate test number found: {}", test.number);
        }
    }
    tests.sort_by_key(|t| t.number);

I've noticed that test-25-download-only-upgrade and test-25-soft-reboot seem to share the number 25, which this change would catch.

Suggested change

-    tests.sort_by_key(|t| t.number);
+    // Enforce unique test numbers to prevent non-deterministic plan ordering
+    let mut seen_numbers = std::collections::HashSet::new();
+    for test in &tests {
+        if !seen_numbers.insert(test.number) {
+            anyhow::bail!("Duplicate test number found: {}", test.number);
+        }
+    }
+    tests.sort_by_key(|t| t.number);
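
As a variation on the check above, it may be more useful to report every colliding number together with the test names in one pass, rather than bailing on the first duplicate. The following is only a sketch: `TestDef`, its fields, and `duplicate_numbers` are hypothetical stand-ins, not the types used in crates/xtask/src/tmt.rs, and `u32` is an assumed number type.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the discovered-test struct (not the real xtask type).
struct TestDef {
    number: u32,
    name: String,
}

/// Return every test number used more than once, with the names that share it.
/// A BTreeMap keeps the result ordered by number, so the report is deterministic.
fn duplicate_numbers(tests: &[TestDef]) -> Vec<(u32, Vec<&str>)> {
    let mut by_number: BTreeMap<u32, Vec<&str>> = BTreeMap::new();
    for test in tests {
        by_number
            .entry(test.number)
            .or_default()
            .push(test.name.as_str());
    }
    by_number
        .into_iter()
        .filter(|(_, names)| names.len() > 1)
        .collect()
}
```

A caller could turn a non-empty result into a single error message listing all collisions before sorting.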

gemini-code-assist · 2025-12-11T17:17:48Z

Dockerfile

+set -xeuo pipefail
+# Install the registry CA certificate if it exists
+# This allows test VMs to trust the registry's TLS certificate
+ls -la /run/src/src


This ls command appears to be for debugging and can be removed.

gemini-code-assist · 2025-12-11T17:17:48Z

crates/xtask/src/tmt.rs

+    let registry_ip = gateway_output
+        .split_whitespace()
+        .skip_while(|&word| word != "default")
+        .skip(1) // skip "default"
+        .find(|&word| word != "via") // find first word after "via"
+        .unwrap_or("")
+        .trim();


This logic to parse the gateway IP is a bit fragile. It assumes a specific ordering of words in the ip route show default output. For example, it would fail on default dev eth0 via 192.168.122.1.

A more robust approach would be to find via and take the next word. Consider using something like:

    let registry_ip = gateway_output
        .split_whitespace()
        .skip_while(|&word| word != "via")
        .nth(1)
        .unwrap_or("")
        .trim();

This will find via and then take the next token, which should be the IP address, regardless of other options like dev.

Even better, if the test image has ip -j, you could use ip -j route show default and parse the JSON output for the gateway field, which would be the most robust solution.

Suggested change

-    let registry_ip = gateway_output
-        .split_whitespace()
-        .skip_while(|&word| word != "default")
-        .skip(1) // skip "default"
-        .find(|&word| word != "via") // find first word after "via"
-        .unwrap_or("")
-        .trim();
+    let registry_ip = gateway_output
+        .split_whitespace()
+        .skip_while(|&word| word != "via")
+        .nth(1)
+        .unwrap_or("")
+        .trim();
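
One possible refinement of the suggestion above is to return an `Option` instead of falling back to an empty string, and to check that the token after `via` actually parses as an IP address. This is a sketch under assumptions: `gateway_from_route` is a hypothetical helper name, and the input is assumed to be the plain-text output of `ip route show default`.

```rust
use std::net::IpAddr;

/// Hypothetical helper: extract the gateway address from the text output of
/// `ip route show default`. Returns None if there is no "via" token or if the
/// token after it is not a valid IP address.
fn gateway_from_route(output: &str) -> Option<IpAddr> {
    output
        .split_whitespace()
        .skip_while(|&word| word != "via")
        .nth(1)? // element 0 is "via" itself, element 1 is the address
        .parse()
        .ok()
}
```

Returning `None` lets the caller fail with a clear error rather than passing an empty registry address on to the tests.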

gemini-code-assist · 2025-12-11T17:17:48Z

crates/xtask/src/tmt.rs

+    // Print the contents of cert-dir for debugging
+    println!("Using cert-dir: {}", cert_dir);
+    let _ = cmd!(sh, "ls -laR {cert_dir}").run();


These debugging prints can be removed.

gemini-code-assist · 2025-12-11T17:17:48Z

hack/Containerfile

+
+# Install the registry CA certificate if it exists
+# This allows test VMs to trust the registry's TLS certificate
+ls -la /run/context


This ls command appears to be for debugging and can be removed.

gemini-code-assist · 2025-12-11T17:17:48Z

hack/Containerfile.registry

+cat > /etc/selinux/config <<'EOF'
+SELINUX=permissive
+SELINUXTYPE=targeted
+EOF


The registry image is configured to run with SELinux in permissive mode. Is this strictly necessary? The volume mounts for the registry container use the :Z flag, which should handle relabeling. If possible, it would be better to run in enforcing mode for a more realistic test environment. If permissive mode is required, it would be good to add a comment explaining why.

gemini-code-assist · 2025-12-11T17:17:48Z

tmt/tests/booted/readonly/021-test-rhsm-facts.nu

+print "#####################################"
+print "#####################################"
+print "#####################################"
+print "#####################################"
+
+bootc status
+hostname


These print statements and the bootc status and hostname calls appear to be for debugging and can be removed.

cgwalters

Thanks for working on this!

cgwalters · 2025-12-11T17:39:57Z

crates/xtask/src/tmt.rs

 use crate::{RunTmtArgs, TmtProvisionArgs};

+/// RAII guard to ensure VM cleanup on drop unless explicitly preserved
+struct VmCleanupGuard<'a> {


Fine as is but I filed bootc-dev/bcvk#173 related to this
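
For readers unfamiliar with the pattern named in the doc comment, a drop guard of this kind can be sketched roughly as follows. Only the struct name and its lifetime parameter come from the diff; the fields, the `preserve` method, and the shared log that stands in for the real VM removal command are all hypothetical.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for the real cleanup action: records which VMs
/// would have been removed.
type CleanupLog = Rc<RefCell<Vec<String>>>;

/// Cleans up the named VM when dropped, unless `preserve` was called first.
struct VmCleanupGuard<'a> {
    vm_name: &'a str,
    preserve: bool,
    log: CleanupLog,
}

impl<'a> VmCleanupGuard<'a> {
    fn new(vm_name: &'a str, log: CleanupLog) -> Self {
        Self { vm_name, preserve: false, log }
    }

    /// Opt out of cleanup, e.g. when the user asked to keep the VM for debugging.
    fn preserve(&mut self) {
        self.preserve = true;
    }
}

impl Drop for VmCleanupGuard<'_> {
    fn drop(&mut self) {
        if !self.preserve {
            // The real code would invoke the VM removal command here.
            self.log.borrow_mut().push(self.vm_name.to_string());
        }
    }
}
```

Because cleanup runs in `Drop`, it also happens on early returns and on `?` error propagation, which is the main reason to prefer a guard over an explicit cleanup call at the end of the function.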

cgwalters · 2025-12-11T17:42:04Z

crates/xtask/src/tmt.rs

                    let ssh_port = ssh_port as u16;
-                    return Ok((ssh_port, ssh_key.to_string()));
+                    // Try to get IP address from network interfaces
+                    let ip = json


I think this can use https://docs.rs/serde_json/1.0.145/serde_json/enum.Value.html#method.pointer

cgwalters · 2025-12-11T17:42:29Z

crates/xtask/src/tmt.rs

+    let key_path = Utf8PathBuf::try_from(key_file.path().to_path_buf())
+        .context("Failed to convert key path to UTF-8")?;
+
+    std::fs::write(&key_path, key).context("Failed to write SSH key")?;


key_file.write_all(key.as_bytes())

cgwalters · 2025-12-11T17:43:50Z

crates/xtask/src/tmt.rs

+        // Build context is images/<tag>/ subdirectory
+        cmd!(
+            sh,
+            "podman build -t {image_ref} -f {containerfile_path} {image_dir}"


Hmm. I'd prefer that we do things like build container images from Justfile. Arguably just build should build all images by default?

cgwalters · 2025-12-11T17:44:42Z

crates/xtask/src/tmt.rs

+        // Remove from local storage
+        // We don't remove from registry since it's a test registry that gets torn down anyway
+        let full_tag = &image.full_tag;
+        let _ = cmd!(sh, "podman rmi -f {full_tag}")


How about just image-cleanup or something (which could call into this)? Also on this topic: I tried to ensure that every image we build is tagged with a label, so we should make it easy to remove all local images we built.

cgwalters · 2025-12-11T17:51:43Z

crates/xtask/src/tmt.rs

+    // Host pushes to localhost:5000
+    // Test VMs use hostname (configured via TMT prepare)
+    println!("Pushing base image to registry...");
+    let registry_url = "localhost:5000".to_string();


Let's use a const instead of 5000 in multiple places
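
A minimal sketch of what that could look like, assuming the port stays at 5000 as in the quoted code. `REGISTRY_PORT` and `registry_url` are hypothetical names, not identifiers from the PR.

```rust
/// Port the test registry listens on (5000 in the quoted code).
/// Hypothetical constant name.
const REGISTRY_PORT: u16 = 5000;

/// Build a `host:port` registry address from the shared constant, so the host
/// side ("localhost") and the test VM side (a hostname) cannot drift apart.
fn registry_url(host: &str) -> String {
    format!("{}:{}", host, REGISTRY_PORT)
}
```

With this in place, the host-side push would use `registry_url("localhost")`, and any other place that currently spells out 5000 would reference the same constant.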

cgwalters · 2025-12-11T17:53:12Z

.gitignore

 /target
+
+# Registry TLS certificates (generated at build time)
+/hack/.registry-certs


I'm ok with this but note the secure-boot keys which are similar are in target which I feel is cleaner; basically it's the default directory for build-time artifacts.

cgwalters · 2025-12-11T17:56:51Z

Justfile

+    ./hack/setup-registry-certs.sh
+    # Build registry image with Quadlet configuration
+    # Pre pull registry container to be used as a LBI
+    podman pull quay.io/libpod/registry:2.8.2


Not a blocker but we'll need to ensure stuff like this is tagged so that renovate can handle bumping it. Probably best as a Justfile variable at the top too.

I would also say we should not use that specific container image which I think is only for the podman team's CI. https://github.com/distribution/distribution is part of CNCF too, but obviously there's a lot of choices for registries.

cgwalters · 2025-12-11T17:58:56Z

Dockerfile

+if [ -f /run/src/src/hack/.registry-certs/ca.pem ]; then
+    echo "Installing registry CA certificate to trust store..."
+    cp /run/src/src/hack/.registry-certs/ca.pem /usr/share/pki/ca-trust-source/anchors/bootc-registry-ca.crt
+    update-ca-trust
+    echo "✓ Registry CA certificate installed"
+else
+    echo "Note: Registry CA certificate not found - registry will need --tls-verify=false"
+    echo "To enable secure registry access, run: hack/setup-registry-certs.sh"
+    exit 1
+fi


This is all 100% fine as is but I think it'd be cleaner and clearer written like this:

    if test '!' -f /run/src/src/hack/.registry-certs/ca.pem; then
        echo "hack/setup-registry-certs.sh must be invoked" 1>&2; exit 1
    fi
    cp /run/src/src/hack/.registry-certs/ca.pem /usr/share/pki/ca-trust-source/anchors/bootc-registry-ca.crt
    update-ca-trust

The idea is we check for exceptions/errors first, and the "happy path" is the default one.

cgwalters · 2025-12-11T18:03:42Z

...st-20-image-pushpull-upgrade/images/01-bootc-derived/usr/lib/bootc/kargs.d/05-testkargs.toml

@@ -0,0 +1 @@
+kargs = ["testarg=foo", "othertestkarg", "thirdkarg=bar"]


Not related to this specific line but just a note: since this PR is changing all of the tests it's going to be an ongoing merge conflict fest.

One thing that would likely help here is to split out any "prep" changes that can land independently first.

ckyrouac added 2 commits December 11, 2025 11:58

Tmt test refactor READMEs

427aca3

Code changes for tmt test refactor

5009041

ckyrouac requested a review from cgwalters December 11, 2025 17:14

gemini-code-assist bot reviewed Dec 11, 2025

cgwalters mentioned this pull request Dec 11, 2025

add --lifecycle-bind-to <pid> bootc-dev/bcvk#173

Open

cgwalters reviewed Dec 11, 2025


2 participants
