-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NO-JIRA: Switch layered build to treefile-apply
, drain get-ocp-repo.sh
#1780
Conversation
@jlebon: This pull request explicitly references no jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Man, this |
okd-c9s
varianttreefile-apply
, drain get-ocp-repo.sh
This requires coreos/rpm-ostree#5351 and coreos/coreos-assembler#4054. |
cc @Prashanth684 since this also touches OKD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome !
Just a couple of questions :)
@@ -16,9 +16,6 @@ supported: | |||
- `rhel-9.6`: RHEL 9.6-based CoreOS; without OpenShift components. | |||
- `ocp-rhel-9.6`: RHEL 9.6-based CoreOS; including OpenShift components. | |||
- `c9s`: CentOS Stream-based CoreOS, without OKD components. | |||
- `okd-c9s`: CentOS Stream-based CoreOS, including OpenShift components. This |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still see the okd-c9s
variant used in the okd/scos build pipeline [1] run in MOC and more specifically in the latest commit okd-project/okd-coreos-pipeline@d4be53e for 4.19.
But according to openshift/release#62296 the scos imagestream (to be used as node image) is now populated by the OpenShift CI itself instead of the MOC pipeline.
So maybe we should decommission the MOC pipeline [2] before merging this patch ? What do you think @Prashanth684 ? It's not a blocker though, the MOC builds would just fail and can be deal as a follow-up.
[1] https://github.com/search?q=repo%3Aokd-project%2Fokd-coreos-pipeline%20okd-c9s&type=code
[2] https://github.com/okd-project/okd-coreos-pipeline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The okd-c9s
"variant" used in that pipeline is for the extensions build, not the base OS AFAIK.
That said, there is indeed a small cleanup possible there which is that it no longer needs to provide a VARIANT
argument to the extensions build since that's auto-detected now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So maybe we should decommission the MOC pipeline [2] before merging this patch ? What do you think @Prashanth684 ? It's not a blocker though, the MOC builds would just fail and can be deal as a follow-up.
Correct. MOC is only used for 4.18. Once we release 4.19 as stable, we will stop those also. We are working to migrate off MOC (we still do OKD release promotions from there) to an internal cluster.
It looks like the CI base image does not contain the |
Yeah, it needed coreos/coreos-assembler#4054. That's merged now but there's no point retriggering the tests until coreos/rpm-ostree#5351 percolates down too. We're working on that. |
/retest |
OK, we have new enough rpm-ostree and cosa now. Let's try this out! |
@Prashanth684 Can you take of this bit once this lands:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly LGTM - some comments
extensions/Dockerfile
Outdated
# buildah doesn't seem to support heredoc output | ||
# redirection like buildkit so do it manually here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a bug for this we can link to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find one, but I could file one I guess.
Ahhh OK, just saw your other comment below related to this. The feature I'm talking about here isn't just bash redirection, but a RUN
feature so you can write e.g.
RUN <<EOF > /out
echo foobar
and it'll go to /out
. Which would've been perfect for our use case here.
extensions/Dockerfile
Outdated
MANIFEST="manifest-rhel-9.6.yaml" | ||
EXTENSIONS="extensions-ocp-rhel-9.6.yaml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this imply these paths will need to be updated each time we say bump to the next version of RHEL? (i.e. RHEL 9.7 or 9.8 are here?)
I would kind of prefer that we didn't have to remember to update these variables when that happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 9.7, but 9.8 yeah. It would be part of all the tree updates we do after branching for 4.22 I guess? When bumping the base variant from rhel-9.6
to rhel-9.8
. Though by then we should be on top of rhel-bootc
which likely means we can rework this too since it's then driven by the base RHEL version of the rhel-bootc image.
Hmm, the builder is choking on the heredocs. I think possibly the Dockerfile parser there (which happens before it's handed off to buildah) is getting tripped up on something. :( I'll dig into this a bit, but if it can't be easily worked around, I think I'll just add a commit for now that moves the heredocs to shell scripts for now. c9s builds are failing on
which I have no idea why. It's using new enough cosa with coreos/coreos-assembler#4054. And using that new cosa locally I can't reproduce this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my inline comment, sorry I missed it in my previous review.
Otherwise LGTM 👍
It's because the latest run CI job pull cosa commit b45a4066b16a2332517f659111d4f474372f77d5 [1]
and not the latest one with the changes we want coreos/coreos-assembler@ae0e86a Maybe some cache issue or race condition issue ? I'm not yet familiar with the CI workflow to be sure, maybe a simple /retest should work |
Hmm forget about it, wrong assumption, the codebase up to b45a4 contains the changes we want from https://github.com/coreos/coreos-assembler/pull/4054/commits |
OK, coreos/coreos-assembler#4059 should fix the GPG issue! |
/retest |
Instead of having to explicitly pass in the `VARIANT`, we can autodetect it based on the node image we're building `FROM`.
`apply-manifest` was essentially folded back into rpm-ostree in: coreos/rpm-ostree#5274 The only thing we need to keep is the workaround for cri-o's `/var/opt`, which... we should just try to get fixed.
Previously, when building the layered node image, we were relying on the default repo enablement settings. This though is at the source of a lot of complexity because then we need to make sure that we only inject just the repos that we need with the right enablement. See e.g. the complex logic in `get-ocp-repo.sh`. Let's instead match the semantics already in use by the base compose and extensions builds, both of which explicitly list the repos to enable. This means that we can be a lot less careful in what repo definitions we inject into the build environment, knowing only the necessary ones will be enabled. This is pretty easy to do now that (1) rpm-ostree suppports inlined treefiles, and (2) `treefile-apply` supports a `--var` option to define variables at invocation time.
100d9f8
to
ebd3ff7
Compare
Now that (1) we've reworked the layered node image build to only enable the repos it needs, and (2) we've simplified the CentOS Stream GPG keys, we can delete all of the complex logic in this repo. It basically just boils down to curl'ing down all the repo files we may need to build the various artifacts that use this script.
We only want certain packages to come from the 4.19 plashet. And we can't just rely on NVRs because the plashet may sometimes win. Long-term we should sever that dependence on ART packages, but for now, let's add a hack to essentially generate a repo on the fly from the 4.19 repo with the filters we need. The advantage of doing it this way instead of e.g. in the `get-ocp-repo.sh` script is that this applies both in CI and locally.
The OCP builder API path isn't parsing the heredoc correctly for some reason: error: build error: EOF: unterminated heredoc This will be fixed by openshift/builder#469. Anyway, just work around this for now by moving all the logic to scripts. It does make the Containerfiles cleaner at least now that it has gotten so larger and we get syntax highlighting, ShellCheck, etc... so probably for the best.
Before we inherited this from the ocp-rhel-9.6 manifest. But now that we're inheriting from the rhel-9.6 manifest, that repo isn't enabled by default there since it's not strictly needed (because we don't ship openvswitch in the base). So we need to enable it here ourselves.
OK, the OKD node image build is failing on
which I know why. Got a fix for that, but let's see if CI for the other tests pass. If they do, then let's get this in, and I'll add my fix to #1498 instead when I rebase it. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dustymabe, jbtrystram, jlebon The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This will known fail. It'll be fixed in #1498. /override ci/prow/okd-scos-images OK, as for the RHCOS failure, it seems to have just... timed out?
Nothing in particular in the journal logs. Last operation was
So possibly a genuine timeout because of e.g. slow I/O. Anyway, don't think it needs to block this. It'll rerun in #1498. |
@jlebon: Overrode contexts on behalf of jlebon: ci/prow/okd-scos-images In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@jlebon: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
We now need to support both EL9 and EL10. Using the conditionnal includes for treefiles added in [1], update `osversion` to contain the variant (centos/rhel) and the major version. This allows the layered build to source `/etc/release` and include the correct repos. Update denylist entries to matcht that. [1] openshift#1780
We now need to support both EL9 and EL10. Using the conditionnal includes for treefiles added in [1], update `osversion` to contain the variant (centos/rhel) and the major version. This allows the layered build to source `/etc/release` and include the correct repos. Update denylist entries to matcht that. [1] openshift#1780
We now need to support both EL9 and EL10. Using the conditionnal includes for treefiles added in [1], update `osversion` to contain the variant (centos/rhel) and the major version. This allows the layered build to source `/etc/release` and include the correct repos. Update denylist entries to matcht that. [1] openshift#1780
We now need to support both EL9 and EL10. Using the conditionnal includes for treefiles added in [1], update `osversion` to contain the variant (centos/rhel) and the major version. This allows the layered build to source `/etc/release` and include the correct repos. Update denylist entries to matcht that. [1] openshift#1780
We now need to support both EL9 and EL10. Using the conditionnal includes for treefiles added in [1], update `osversion` to contain the variant (centos/rhel) and the major version. This allows the layered build to source `/etc/release` and include the correct repos. Update denylist entries to matcht that. [1] openshift#1780
We now need to support both EL9 and EL10. Using the conditionnal includes for treefiles added in [1], update `osversion` to contain the variant (centos/rhel) and the major version. This allows the layered build to source `/etc/release` and include the correct repos. Update denylist entries to matcht that. [1] openshift#1780
We now need to support both EL9 and EL10. Using the conditionnal includes for treefiles added in [1], update `osversion` to contain the variant (centos/rhel) and the major version. This allows the layered build to source `/etc/release` and include the correct repos. Update denylist entries to matcht that. [1] openshift#1780
We now need to support both EL9 and EL10. Using the conditionnal includes for treefiles added in [1], update `osversion` to contain the variant (centos/rhel) and the major version. This allows the layered build to source `/etc/release` and include the correct repos. Update denylist entries to matcht that. [1] openshift#1780
See individual commit messages.