-
Notifications
You must be signed in to change notification settings - Fork 156
WIP: Use a registry with TMT tests #1847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is a major and impressive refactoring of the TMT test infrastructure. The high-level design of using a dedicated registry VM is excellent and will make the tests more realistic and powerful. The move to directory-based tests with custom image definitions is also a great improvement, making tests more self-contained and easier to manage. The extensive documentation in the new and updated READMEs is very clear and will be incredibly helpful for other developers.
The code is a WIP as you mentioned, and I've left some comments on areas for cleanup and improvement, focusing on correctness and robustness. My main points are:
- Improving the robustness of IP address parsing from
ip routeoutput. - Enforcing uniqueness of test numbers to prevent non-deterministic generation of TMT files.
- Removing some leftover debugging statements.
- A question about the use of SELinux permissive mode in the registry VM.
Overall, this is a fantastic direction for the testing framework. Great work!
| } | ||
|
|
||
| // Sort tests by number | ||
| tests.sort_by_key(|t| t.number); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests are sorted by number, but there's no check for uniqueness. If multiple tests have the same number, sort_by_key is not stable, which can lead to non-deterministic order in the generated integration.fmf file. This could cause flaky CI runs if the file is checked into git.
You should enforce that test numbers are unique. You could use a HashSet to detect duplicates during the test discovery phase before sorting.
For example:
let mut seen_numbers = std::collections::HashSet::new();
for test in &tests {
if !seen_numbers.insert(test.number) {
anyhow::bail!("Duplicate test number found: {}", test.number);
}
}
tests.sort_by_key(|t| t.number);I've noticed that test-25-download-only-upgrade and test-25-soft-reboot seem to share the number 25, which this change would catch.
| tests.sort_by_key(|t| t.number); | |
| // Enforce unique test numbers to prevent non-deterministic plan ordering | |
| let mut seen_numbers = std::collections::HashSet::new(); | |
| for test in &tests { | |
| if !seen_numbers.insert(test.number) { | |
| anyhow::bail!("Duplicate test number found: {}", test.number); | |
| } | |
| } | |
| tests.sort_by_key(|t| t.number); |
| set -xeuo pipefail | ||
| # Install the registry CA certificate if it exists | ||
| # This allows test VMs to trust the registry's TLS certificate | ||
| ls -la /run/src/src |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let registry_ip = gateway_output | ||
| .split_whitespace() | ||
| .skip_while(|&word| word != "default") | ||
| .skip(1) // skip "default" | ||
| .find(|&word| word != "via") // find first word after "via" | ||
| .unwrap_or("") | ||
| .trim(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic to parse the gateway IP is a bit fragile. It assumes a specific ordering of words in the ip route show default output. For example, it would fail on default dev eth0 via 192.168.122.1.
A more robust approach would be to find via and take the next word. Consider using something like:
let registry_ip = gateway_output
.split_whitespace()
.skip_while(|&word| word != "via")
.nth(1)
.unwrap_or("")
.trim();This will find via and then take the next token, which should be the IP address, regardless of other options like dev.
Even better, if the test image has ip -j, you could use ip -j route show default and parse the JSON output for the gateway field, which would be the most robust solution.
| let registry_ip = gateway_output | |
| .split_whitespace() | |
| .skip_while(|&word| word != "default") | |
| .skip(1) // skip "default" | |
| .find(|&word| word != "via") // find first word after "via" | |
| .unwrap_or("") | |
| .trim(); | |
| let registry_ip = gateway_output | |
| .split_whitespace() | |
| .skip_while(|&word| word != "via") | |
| .nth(1) | |
| .unwrap_or("") | |
| .trim(); |
| // Print the contents of cert-dir for debugging | ||
| println!("Using cert-dir: {}", cert_dir); | ||
| let _ = cmd!(sh, "ls -laR {cert_dir}").run(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| # Install the registry CA certificate if it exists | ||
| # This allows test VMs to trust the registry's TLS certificate | ||
| ls -la /run/context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| cat > /etc/selinux/config <<'EOF' | ||
| SELINUX=permissive | ||
| SELINUXTYPE=targeted | ||
| EOF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The registry image is configured to run with SELinux in permissive mode. Is this strictly necessary? The volume mounts for the registry container use the :Z flag, which should handle relabeling. If possible, it would be better to run in enforcing mode for a more realistic test environment. If permissive mode is required, it would be good to add a comment explaining why.
| print "#####################################" | ||
| print "#####################################" | ||
| print "#####################################" | ||
| print "#####################################" | ||
|
|
||
| bootc status | ||
| hostname |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cgwalters
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
| use crate::{RunTmtArgs, TmtProvisionArgs}; | ||
|
|
||
| /// RAII guard to ensure VM cleanup on drop unless explicitly preserved | ||
| struct VmCleanupGuard<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine as is but I filed bootc-dev/bcvk#173 related to this
| let ssh_port = ssh_port as u16; | ||
| return Ok((ssh_port, ssh_key.to_string())); | ||
| // Try to get IP address from network interfaces | ||
| let ip = json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can use https://docs.rs/serde_json/1.0.145/serde_json/enum.Value.html#method.pointer
| let key_path = Utf8PathBuf::try_from(key_file.path().to_path_buf()) | ||
| .context("Failed to convert key path to UTF-8")?; | ||
|
|
||
| std::fs::write(&key_path, key).context("Failed to write SSH key")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key_file.write_all(key.as_bytes())
| // Build context is images/<tag>/ subdirectory | ||
| cmd!( | ||
| sh, | ||
| "podman build -t {image_ref} -f {containerfile_path} {image_dir}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I'd prefer that we do things like build container images from Justfile. Arguably just build should build all images by default?
| // Remove from local storage | ||
| // We don't remove from registry since it's a test registry that gets torn down anyway | ||
| let full_tag = &image.full_tag; | ||
| let _ = cmd!(sh, "podman rmi -f {full_tag}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just image-cleanup or something? (which could call into this) also on this topic...I tried to ensure that every image we build is tagged with a label, so we should make it easy to remove all local images we built.
| // Host pushes to localhost:5000 | ||
| // Test VMs use hostname (configured via TMT prepare) | ||
| println!("Pushing base image to registry..."); | ||
| let registry_url = "localhost:5000".to_string(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use a const instead of 5000 in multiple places
| /target | ||
|
|
||
| # Registry TLS certificates (generated at build time) | ||
| /hack/.registry-certs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with this but note the secure-boot keys which are similar are in target which I feel is cleaner; basically it's the default directory for build-time artifacts.
| ./hack/setup-registry-certs.sh | ||
| # Build registry image with Quadlet configuration | ||
| # Pre pull registry container to be used as a LBI | ||
| podman pull quay.io/libpod/registry:2.8.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker but we'll need to ensure stuff like this is tagged so that renovate can handle bumping it. Probably best as a Justfile variable at the top too.
I would also say we should not use that specific container image which I think is only for the podman team's CI. https://github.com/distribution/distribution is part of CNCF too, but obviously there's a lot of choices for registries.
| if [ -f /run/src/src/hack/.registry-certs/ca.pem ]; then | ||
| echo "Installing registry CA certificate to trust store..." | ||
| cp /run/src/src/hack/.registry-certs/ca.pem /usr/share/pki/ca-trust-source/anchors/bootc-registry-ca.crt | ||
| update-ca-trust | ||
| echo "✓ Registry CA certificate installed" | ||
| else | ||
| echo "Note: Registry CA certificate not found - registry will need --tls-verify=false" | ||
| echo "To enable secure registry access, run: hack/setup-registry-certs.sh" | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all 100% fine as is but I think it'd be cleaner and clearer written like this:
if test '!' -f /run/src/src/hack/.registry-certs/ca.pem; then
echo "hack/setup-registry-certs.sh must be invoked" 1>&2; exit 1
fi
cp /run/src/src/hack/.registry-certs/ca.pem /usr/share/pki/ca-trust-source/anchors/bootc-registry-ca.crt
update-ca-trust
The idea is we check for exceptions/errors first, and the "happy path" is the default one.
| @@ -0,0 +1 @@ | |||
| kargs = ["testarg=foo", "othertestkarg", "thirdkarg=bar"] | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this specific line but just a note: since this PR is changing all of the tests it's going to be an ongoing merge conflict fest.
One thing that would likely here is to split up any "prep" changes that can land independently first
Most of this was written by Claude and needs to be cleaned up. It's very much a WIP, but I want to get feedback on the high level design before going too far into the cleanup. Please focus on the README changes (contained in a separate commit) that explain the design. I have tested the code at least works and you can try it out if you want.
The key design points (explained in detail in the READMEs)
imagessub directory of the testThe primary reason for this change is to support building sealed images on the test runner without needing to copy keys into the test VM and copy-to-storage. An added benefit is our tests will run in a more realistic environment (i.e. using a registry instead of building on the system under test).