Fix must-gather output for omc compatibility #4965
Open
ozzywalsh wants to merge 11 commits into
59c4537 to 70ee1ee
pavolloffay reviewed Apr 17, 2026
# Use pipe (|) for multiline entries.
subtext: |
  Previously collected files used a per-collector directory with kind-prefixed filenames (e.g. `namespaces/<ns>/<collector-name>/deployment-<name>.yaml`),
  which omc cannot parse. Output now follows the standard omc layout (`namespaces/<ns>/<api-group>/<resource-plural>/<name>.yaml`).
Member
Could we find a way to implement a test to check the output with omc? Perhaps even as e2e test that runs on OCP where the tooling is available?
Contributor
Author
This is a good idea. I'll look into it and see if I can add something.
Contributor
Author
I have added an omc step to the e2e test script.
We need to loop over the pods (in case replicas > 1)
processPodsByInstance was writing pod YAML but not collecting container logs, so collector/TA/bridge pod logs were missing from must-gather output. Now iterates all containers and calls getPodLogs for each. Remove the stale must-gather check from the otlp-metrics-traces e2e test — it used the old directory layout and is fully covered by the dedicated must-gather test.
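As a rough illustration of the loop this commit describes (every pod, then every container in it), here is a minimal sketch. The `pod` struct and the `logFetcher` callback are hypothetical stand-ins for the Kubernetes Pod type and `getPodLogs`; the actual signatures in the PR may differ.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for the Kubernetes Pod type.
type pod struct {
	Name       string
	Containers []string
}

// logFetcher is a hypothetical stand-in for getPodLogs.
type logFetcher func(podName, container string) (string, error)

// collectLogs visits every container of every pod, so deployments with
// replicas > 1 or with multi-container pods do not lose any logs.
// A failure for one container is recorded and does not stop the loop.
func collectLogs(pods []pod, fetch logFetcher) (map[string]string, []error) {
	logs := make(map[string]string)
	var errs []error
	for _, p := range pods {
		for _, c := range p.Containers {
			out, err := fetch(p.Name, c)
			if err != nil {
				errs = append(errs, fmt.Errorf("pod %s, container %s: %w", p.Name, c, err))
				continue
			}
			logs[p.Name+"/"+c] = out
		}
	}
	return logs, errs
}

func main() {
	pods := []pod{
		{Name: "collector-0", Containers: []string{"otc-container"}},
		{Name: "collector-1", Containers: []string{"otc-container", "sidecar"}},
	}
	logs, errs := collectLogs(pods, func(p, c string) (string, error) {
		return "log output of " + c + " in " + p, nil
	})
	fmt.Println(len(logs), len(errs)) // prints "3 0"
}
```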
omc expects container logs at pods/<pod>/<container>/<container>/logs/ (container name repeated twice), not pods/<pod>/<container>/logs/. Also, omc logs discovers pods from pods/<pod>/<pod>.yaml — write pod YAML there so omc logs can find and display them.
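The two paths in this commit message can be sketched as small helpers. The function names and the `nsDir` parameter (the directory that contains `pods/`) are assumptions for illustration; the `current.log` filename comes from the PR description.

```go
package main

import (
	"fmt"
	"path"
)

// podYAMLPath returns pods/<pod>/<pod>.yaml under nsDir, the location
// from which `omc logs` discovers pods.
func podYAMLPath(nsDir, pod string) string {
	return path.Join(nsDir, "pods", pod, pod+".yaml")
}

// containerLogPath returns pods/<pod>/<container>/<container>/logs/current.log
// under nsDir. The container name is intentionally repeated twice.
func containerLogPath(nsDir, pod, container string) string {
	return path.Join(nsDir, "pods", pod, container, container, "logs", "current.log")
}

func main() {
	nsDir := "namespaces/observability" // hypothetical namespace directory
	fmt.Println(podYAMLPath(nsDir, "collector-0"))
	// namespaces/observability/pods/collector-0/collector-0.yaml
	fmt.Println(containerLogPath(nsDir, "collector-0", "otc-container"))
	// namespaces/observability/pods/collector-0/otc-container/otc-container/logs/current.log
}
```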
The previous implementation appended "es"/"s" based on whether the Kind string ended in "s". This produces wrong paths for kinds like NetworkPolicy (→ "networkpolicys" instead of "networkpolicies"), which omc cannot find. Replace it with an explicit map covering all Kinds the tool collects. log.Fatalf on unknown kinds ensures omissions are caught immediately rather than silently writing to a path omc cannot parse.
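To make the difference concrete, here is a minimal sketch comparing the old suffix heuristic with an explicit map. The map below is an illustrative subset, not the PR's actual list, and `pluralFor` returns an error instead of calling `log.Fatalf` only so the example is easy to test.

```go
package main

import (
	"fmt"
	"strings"
)

// resourcePlurals is an illustrative subset of a Kind-to-plural map.
var resourcePlurals = map[string]string{
	"Deployment":             "deployments",
	"Service":                "services",
	"NetworkPolicy":          "networkpolicies",
	"OpenTelemetryCollector": "opentelemetrycollectors",
	"ClusterServiceVersion":  "clusterserviceversions",
}

// pluralFor looks up the plural and fails loudly for unknown kinds.
func pluralFor(kind string) (string, error) {
	if p, ok := resourcePlurals[kind]; ok {
		return p, nil
	}
	return "", fmt.Errorf("no plural registered for kind %q", kind)
}

// naivePlural reproduces the old heuristic for comparison:
// append "es" if the kind ends in "s", otherwise append "s".
func naivePlural(kind string) string {
	lower := strings.ToLower(kind)
	if strings.HasSuffix(lower, "s") {
		return lower + "es"
	}
	return lower + "s"
}

func main() {
	fmt.Println(naivePlural("NetworkPolicy")) // networkpolicys
	p, _ := pluralFor("NetworkPolicy")
	fmt.Println(p) // networkpolicies
	_, err := pluralFor("Widget")
	fmt.Println(err != nil) // true
}
```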
Download omc binary and verify it can parse the gathered output: get opentelemetrycollectors, get deployments, get pods, and logs.
omc logs defaults to the 'default' project, so it couldn't find the gather-collector pod in the chainsaw-must-gather namespace.
1733c49 to b111871
Summary
omc is a CLI for browsing OpenShift must-gather output offline. Our must-gather used a non-standard directory layout that omc couldn't parse.
This PR rewrites the output to match the layout omc expects. After this change, all standard omc commands work against our must-gather output.
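The target layout named in the changelog entry (`namespaces/<ns>/<api-group>/<resource-plural>/<name>.yaml`) can be sketched as a path builder. The function names are hypothetical, and mapping the empty API group (`apiVersion: v1`) to `core` is inferred from the `core/services/` path mentioned in this PR.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// apiGroupDir derives the directory name from an apiVersion:
// "apps/v1" yields "apps", and the group-less "v1" yields "core".
func apiGroupDir(apiVersion string) string {
	group, _, found := strings.Cut(apiVersion, "/")
	if !found {
		return "core"
	}
	return group
}

// resourcePath builds namespaces/<ns>/<api-group>/<resource-plural>/<name>.yaml.
func resourcePath(namespace, apiVersion, plural, name string) string {
	return path.Join("namespaces", namespace, apiGroupDir(apiVersion), plural, name+".yaml")
}

func main() {
	fmt.Println(resourcePath("obs", "apps/v1", "deployments", "collector"))
	// namespaces/obs/apps/deployments/collector.yaml
	fmt.Println(resourcePath("obs", "v1", "services", "collector"))
	// namespaces/obs/core/services/collector.yaml
}
```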
What changed
- Resources written under per-API-group directories (`apps/deployments/`, `core/services/`, `opentelemetry.io/opentelemetrycollectors/`) instead of flat files under the collector name
- `apiVersion` and `kind` fields populated on all resources (controller-runtime strips these from List results)
- (… `operators.coreos.com/clusterserviceversions/`, etc.)
- Container logs follow the `pods/<pod>/<container>/<container>/logs/current.log` convention so `omc logs` works
- `/usr/bin/must-gather` → `/usr/bin/gather` (OpenShift must-gather convention)
- `~/must-gather` → `/must-gather` (matches the mount point `oc adm must-gather` provides)
- Removed the stale must-gather check from `otlp-metrics-traces` (now covered by the dedicated `must-gather` test); added pod log assertions
Before / after
Before (main)
After (this PR)
Test plan
- `omc get opentelemetrycollectors -A`, `omc get deployments`, `omc logs <pod>` all return data
- e2e test (`tests/e2e-openshift/must-gather/`) updated with collector pod log assertions
- Removed the stale must-gather check from the `otlp-metrics-traces` e2e (covered by dedicated test)
Known issues
Try it
Build the image from `./cmd/gather/Dockerfile`
Push it to the OpenShift internal registry or an alternative (Docker, Quay)
Deploy the otel operator, a collector, etc.
Run must-gather:
oc adm must-gather --dest-dir ./your-output-dir --image=IMAGE_URL_AND_TAG -- /usr/bin/gather --operator-namespace OPERATOR_NAMESPACE