Set podTemplate on Tasks to enable multi-arch builds with Matrix #8599
base: main
Conversation
/kind feature
/assign
@@ -119,6 +120,9 @@ type TaskSpec struct {
	// Results are values that this Task can output
	// +listType=atomic
	Results []TaskResult `json:"results,omitempty"`

	// PodTemplate holds pod specific configuration
	PodTemplate *pod.PodTemplate `json:"podTemplate,omitempty"`
Can you add an example to the examples directory? They act as e2e tests as well and help validate if the given code is working properly.
Certainly!
@@ -687,6 +687,13 @@ func ApplyReplacements(spec *v1.TaskSpec, stringReplacements map[string]string,
		container.ApplyStepTemplateReplacements(spec.StepTemplate, stringReplacements, arrayReplacements)
	}

	// Apply variable expansion to podTemplate fields.
	if spec.PodTemplate != nil {
		for key, value := range spec.PodTemplate.NodeSelector {
I don't know the code very well there, but that only applies to the NodeSelector in the pod template; would there be more podTemplate specs that would need to be applied here?
@chmouel Thanks for taking a look - yes I think it would make sense to support variable expansion in all of the podTemplate fields. Currently working on that and will push it up (hopefully) soon with some other updates.
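For illustration, a rough sketch of what that expansion could look like for a string-map field such as nodeSelector; the helper name applyPodTemplateMapReplacements and its placement are hypothetical, not the code in this PR:

import "strings"

// applyPodTemplateMapReplacements expands $(params.*)-style references in the
// values of a podTemplate string map such as nodeSelector. stringReplacements
// maps variable names like "params.arch" to their resolved values.
func applyPodTemplateMapReplacements(m map[string]string, stringReplacements map[string]string) {
	for key, value := range m {
		for variable, replacement := range stringReplacements {
			value = strings.ReplaceAll(value, "$("+variable+")", replacement)
		}
		m[key] = value
	}
}

With stringReplacements["params.arch"] = "arm64", a nodeSelector value of $(params.arch) for kubernetes.io/arch would become arm64.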
pkg/pod/pod.go
Outdated
if taskRun.Spec.PodTemplate != nil {
	podTemplate = *taskRun.Spec.PodTemplate
} else if taskSpec.PodTemplate != nil {
A note here, I think this will ideally instead be a merge function that overrides only those values that the TaskRun defines instead of the whole podTemplate being replaced, but wanted to get others thoughts on this.
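For context, a minimal sketch of such a merge, assuming the pod.PodTemplate fields used below (NodeSelector, Tolerations, Affinity, DNSConfig) exist as named; mergePodTemplates is a hypothetical helper and only a subset of fields is shown:

import "github.com/tektoncd/pipeline/pkg/apis/pipeline/pod"

// mergePodTemplates keeps the Task's podTemplate and overrides only the
// fields the TaskRun's podTemplate actually sets, instead of swapping the
// whole template.
func mergePodTemplates(taskRunPT, taskPT *pod.PodTemplate) *pod.PodTemplate {
	if taskRunPT == nil {
		return taskPT
	}
	if taskPT == nil {
		return taskRunPT
	}
	merged := taskPT.DeepCopy()
	if len(taskRunPT.NodeSelector) > 0 {
		merged.NodeSelector = taskRunPT.NodeSelector
	}
	if len(taskRunPT.Tolerations) > 0 {
		merged.Tolerations = taskRunPT.Tolerations
	}
	if taskRunPT.Affinity != nil {
		merged.Affinity = taskRunPT.Affinity
	}
	if taskRunPT.DNSConfig != nil {
		merged.DNSConfig = taskRunPT.DNSConfig
	}
	return merged
}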
Ok, I pushed up some updates to this for tests as well as param substitution on all podTemplate fields (at least where it made sense; bool and int fields were left out). Not sure if I covered all the needed areas for testing, let me know. @chmouel @waveywaves Would you be able to take another look at this?
test/e2e-common.sh
Outdated
@@ -26,7 +26,7 @@ function install_pipeline_crd() {
	cat "${ko_target}" | sed -e 's%"level": "info"%"level": "debug"%' \
		| sed -e 's%loglevel.controller: "info"%loglevel.controller: "debug"%' \
		| sed -e 's%loglevel.webhook: "info"%loglevel.webhook: "debug"%' \
-		| kubectl apply -R -f - || fail_test "Build pipeline installation failed"
+		| kubectl apply -R -f --server-side - || fail_test "Build pipeline installation failed"
Added --server-side to (hopefully) fix the metadata.annotations: Too long: must have at most 262144 bytes error that was occurring for all e2e tests. I was also seeing this locally. --force-conflicts may be needed as well, or possibly using kubectl create instead of apply?
@@ -687,6 +698,312 @@ func ApplyReplacements(spec *v1.TaskSpec, stringReplacements map[string]string,
		container.ApplyStepTemplateReplacements(spec.StepTemplate, stringReplacements, arrayReplacements)
	}

	// Apply variable expansion to podTemplate fields.
	if spec.PodTemplate != nil {
		if len(spec.PodTemplate.NodeSelector) > 0 {
I have a question about the changes below. Wouldn't it be enough for this issue to restrict the variable expansion to the nodeSelector field?
I might be missing something, but going from the provided sample this might be enough to do the matrixed pipelinerun.
Thank you for your work! 😸
Hey! Yeah in the first draft of this PR, it was only including nodeSelector as a minimum working example and it does work with just that. But I did feel that it makes sense to include variable expansion on all of the podTemplate fields even in this PR, since there is more to scheduling pods on multi-arch than just nodeSelector. Also thought it would be strange for the user to have only that field supported. See the comment above: #8599 (comment)
But I did feel that it makes sense to include variable expansion on all of the podTemplate fields even in this PR, since there is more to scheduling pods on multi-arch than just nodeSelector.
Thanks for the explanation. I understand your reasoning.
I still believe that including all fields might not be necessary if users are not asking for it. What would you say are the most required fields besides nodeSelector? Maybe it would be enough to add just the most relevant ones and hold off on the others until they are requested.
@afrittoli @chmouel @waveywaves do you have an opinion on that?
IMHO, having both nodeSelector/nodeAffinity and tolerations is the minimum viable option, although I expect users can get value from interPod(anti)Affinity too (beyond multi-arch-specific issues).
The Konflux-CI project is looking to use this feature when it becomes available and I can confirm that we'll need variable expansion for nodeSelector and tolerations. Thanks for working on this!
Also speaking for Konflux - we have future needs to include RuntimeClass name as well. We are experimenting with Kata containers to run tasks on remote peer pods.
@twoGiants I agree; runtime information like RuntimeClass also needs to be deliberated on for a bit to ensure that we don't start adding too much runtime information and that it doesn't leak into Task definitions. How about we tackle supporting node selector just for this PR and then tackle the other podTemplate configs in subsequent PRs? wdyt cc @vdemeester @afrittoli
		PriorityClassName:     priorityClassName,
		ActiveDeadlineSeconds: &defaultActiveDeadlineSeconds,
	},
},
Can we add a test checking that, if a Task and a TaskRun both have pod templates, the TaskRun pod template takes precedence in case of competing configurations for the same parameter? The test added here looks like both the Task and the TaskRun have the same parameter. I can't see the rest of the code so am a bit confused.
This test does this via the DNSConfig Nameservers string. The TaskRunSpec value has 1.1.1.1 while the TaskSpec value has 8.8.8.8.
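For reference, a stripped-down sketch of that precedence check on the DNSConfig field alone; pickDNSConfig is a hypothetical stand-in for the real pod-building code, not the actual test in this PR:

import (
	"testing"

	corev1 "k8s.io/api/core/v1"
)

// pickDNSConfig mimics the "TaskRun podTemplate wins" rule for one field.
func pickDNSConfig(taskRunDNS, taskDNS *corev1.PodDNSConfig) *corev1.PodDNSConfig {
	if taskRunDNS != nil {
		return taskRunDNS
	}
	return taskDNS
}

func TestTaskRunPodTemplateTakesPrecedence(t *testing.T) {
	taskRunDNS := &corev1.PodDNSConfig{Nameservers: []string{"1.1.1.1"}} // from the TaskRunSpec podTemplate
	taskDNS := &corev1.PodDNSConfig{Nameservers: []string{"8.8.8.8"}}    // from the TaskSpec podTemplate

	got := pickDNSConfig(taskRunDNS, taskDNS)
	if len(got.Nameservers) != 1 || got.Nameservers[0] != "1.1.1.1" {
		t.Errorf("expected the TaskRun nameserver to win, got %v", got.Nameservers)
	}
}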
@@ -687,6 +698,312 @@ func ApplyReplacements(spec *v1.TaskSpec, stringReplacements map[string]string,
		container.ApplyStepTemplateReplacements(spec.StepTemplate, stringReplacements, arrayReplacements)
	}

	// Apply variable expansion to podTemplate fields.
	if spec.PodTemplate != nil {
		if len(spec.PodTemplate.NodeSelector) > 0 {
First, sorry for the late review 🙇♂️.
So, as proposed here, this would add runtime information (pod affinity, dns config, …) to a definition type, which is something we have avoided since the beginning of the project. I would love to see or discuss an alternative where we can have those, somehow, on PipelineRun.
Thinking out loud, I wonder if we could just make taskRunSpecs a bit more dynamic. Like the following:
kind: PipelineRun
metadata:
  generateName: matrixed-pipelinerun-
spec:
  taskRunSpecs:
    - pipelineTaskName: build-and-push-manifest-* # This is "the key"
      podTemplate:
        nodeSelector:
          kubernetes.io/arch: $(params.arch)
  pipelineSpec:
    tasks:
      - name: build-and-push-manifest
        matrix:
          params:
            - name: arch
              value:
                - "amd64"
                - "arm64"
        taskSpec:
          results:
            - name: manifest
              type: string
          params:
            - name: arch
          podTemplate:
            nodeSelector:
              kubernetes.io/arch: $(params.arch)
          steps:
            - name: build-and-push
              image: ubuntu
              script: |
                echo "building on $(params.arch)"
                echo "testmanifest-$(params.arch)" | tee $(results.manifest.path)
      - name: create-manifest-list
        params:
          - name: manifest
            value: $(tasks.build-and-push-manifest.results.manifest[*])
        taskSpec:
          steps:
            - name: echo-manifests
              image: ubuntu
              args: ["$(params.manifest[*])"]
              script: echo "$@"
It doesn't necessarily need to be -* (especially as I think we can influence the name of the task from a matrix) though…
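To make the wildcard idea concrete, a purely hypothetical sketch of how such a key could be matched against the matrix-expanded PipelineTask names (exact match, or prefix match when the key ends in *); the helper and the example names are illustrative only:

import "strings"

// matchesPipelineTaskName reports whether a taskRunSpecs key selects a given
// PipelineTask name. A trailing "*" is treated as a wildcard prefix, so
// "build-and-push-manifest-*" would select "build-and-push-manifest-0",
// "build-and-push-manifest-1", and so on.
func matchesPipelineTaskName(key, pipelineTaskName string) bool {
	if prefix, ok := strings.CutSuffix(key, "*"); ok {
		return strings.HasPrefix(pipelineTaskName, prefix)
	}
	return key == pipelineTaskName
}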
@vdemeester thanks for taking a look! Just so I understand correctly, is the issue here with …
I agree that, generally speaking, runtime information does not belong with the Task definition. When Tekton runs on Kubernetes, the information about the architecture required by a Task … I see two options:

I think option (1), using existing annotations, could be relatively easy to implement, although we would have to deal with some complexity in case: …
That's true. It could be inherent to the Task.

Indeed, there could be cases where the same exact task needs to run for multiple architectures (hence the task itself would not be the "holder" of the architecture information). I even think it's going to be the main use case. @afrittoli Maybe it is the … I "like" the annotation approach (put an annotation on a TaskRun, and it will be running on a given arch), but today, there is no way to pass an annotation for a specific TaskRun in a Pipeline (PipelineTask). If we had that, we could use it with matrix to create TaskRuns from a Pipeline that would have that annotation. The controller could even "warn" (or be configured to be strict and fail) if the arch annotation is not listed in the supported architectures of the task.
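As a purely hypothetical illustration of that annotation approach (the annotation key tekton.dev/arch is invented here and does not exist today), the controller could translate an architecture annotation on a TaskRun into a node selector:

// nodeSelectorFromArchAnnotation derives a node selector from a hypothetical
// architecture annotation on the TaskRun, so the resulting Pod is scheduled
// onto nodes of that architecture.
func nodeSelectorFromArchAnnotation(annotations map[string]string) map[string]string {
	if arch, ok := annotations["tekton.dev/arch"]; ok && arch != "" {
		return map[string]string{"kubernetes.io/arch": arch}
	}
	return nil
}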
Thanks for the input everyone - after some discussion I think the way we want to take this forward is to see if we can use the current functionality of this PR on TaskRunSpecs instead, as mentioned here #8599 (review)
I drafted a related proposal in Shipwright on creating an API that is specific for running multi-arch builds - SHIP-0043. Linking here mainly to help us on the Shipwright side "keep track" of these capabilities on the Tekton side. I see the matrix support here (whether at the Pipeline or TaskRunSpec level) complementing, rather than competing with, the work that is described for Shipwright.
Changes
Hello! @jeffdyoung and I are working towards implementing the feature requested in #6742: enabling Tekton to do builds on clusters with nodes of multiple architectures by allowing podTemplate to be set on Tasks.
This is currently an extremely basic approach, without any of the supporting tests, validation, and other related code changes, in order to show a proof of concept and get feedback on the general approach before creating a TEP. It will of course later be cleaned up to follow https://github.com/tektoncd/pipeline/blob/main/CONTRIBUTING.md.
These minimal changes do currently work to accomplish the goal with an example matrixed Pipeline that sets podTemplate.nodeSelector on the taskSpec from a matrix parameter (similar to the PipelineRun quoted in the review discussion above).
Feedback appreciated, thanks!
Fixes #6742
Submitter Checklist
As the author of this PR, please check off the items in this checklist:
/kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep

Release Notes