
Set podTemplate on Tasks to enable multi-arch builds with Matrix #8599


Open: wants to merge 3 commits into base: main


dorzel

@dorzel dorzel commented Feb 26, 2025

Changes

Hello! @jeffdyoung and I are working towards implementing the feature requested in #6742: letting Tekton run builds on clusters whose nodes have multiple architectures, by allowing podTemplate to be set on Tasks.

This is currently a minimal proof of concept, without the supporting tests, validation, and other related code changes, meant to get feedback on the general approach before creating a TEP. It will of course be cleaned up later to follow https://github.com/tektoncd/pipeline/blob/main/CONTRIBUTING.md.

These minimal changes already accomplish the goal, with an example PipelineRun like:

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: matrixed-pipelinerun-
spec:
  pipelineSpec:
    tasks:
      - name: build-and-push-manifest
        matrix:
          params:
          - name: arch
            value:
              - "amd64"
              - "arm64"
        taskSpec:
          results:
            - name: manifest
              type: string
          params:
            - name: arch
          podTemplate:
            nodeSelector:
              kubernetes.io/arch: $(params.arch)
          steps:
            - name: build-and-push
              image: ubuntu
              script: |
                echo "building on $(params.arch)"
                echo "testmanifest-$(params.arch)" | tee $(results.manifest.path)
      - name: create-manifest-list
        params:
          - name: manifest
            value: $(tasks.build-and-push-manifest.results.manifest[*])
        taskSpec:
          steps:
            - name: echo-manifests
              image: ubuntu
              args: ["$(params.manifest[*])"]
              script: echo "$@"
```
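The fan-out the example relies on can be sketched in Go as follows. This is a minimal illustration, not Tekton's implementation: `substitute` and `expandNodeSelector` are hypothetical helpers, and the sketch handles only a single string matrix param. It shows that each matrix combination yields a TaskRun whose `nodeSelector` has `$(params.arch)` resolved to one concrete architecture.

```go
package main

import (
	"fmt"
	"strings"
)

// substitute replaces every $(params.<name>) reference in s with its value.
// Hypothetical helper: Tekton's real substitution code handles more cases.
func substitute(s string, params map[string]string) string {
	for name, value := range params {
		s = strings.ReplaceAll(s, "$(params."+name+")", value)
	}
	return s
}

// expandNodeSelector returns one concrete nodeSelector per matrix value,
// mimicking the fan-out a matrix over a single string param produces.
func expandNodeSelector(template map[string]string, param string, values []string) []map[string]string {
	out := make([]map[string]string, 0, len(values))
	for _, v := range values {
		params := map[string]string{param: v}
		sel := make(map[string]string, len(template))
		for key, val := range template {
			sel[substitute(key, params)] = substitute(val, params)
		}
		out = append(out, sel)
	}
	return out
}

func main() {
	template := map[string]string{"kubernetes.io/arch": "$(params.arch)"}
	for i, sel := range expandNodeSelector(template, "arch", []string{"amd64", "arm64"}) {
		// prints "TaskRun 0 nodeSelector: kubernetes.io/arch=amd64", then the arm64 line
		fmt.Printf("TaskRun %d nodeSelector: kubernetes.io/arch=%s\n", i, sel["kubernetes.io/arch"])
	}
}
```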

Feedback appreciated, thanks!

Fixes #6742

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • Has Tests included if any functionality added or changed
  • pre-commit Passed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

Tasks now support specifying `podTemplate` fields. `podTemplate` fields on Tasks support variable substitution and fan-out via Matrix.

@tekton-robot tekton-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Feb 26, 2025

linux-foundation-easycla bot commented Feb 26, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@tekton-robot tekton-robot requested a review from abayer February 26, 2025 16:56
@tekton-robot tekton-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 26, 2025
@afrittoli
Member

/kind feature

@tekton-robot tekton-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 26, 2025
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
|---|---|---|---|
| pkg/pod/pod.go | 93.3% | 92.9% | -0.3 |
| pkg/reconciler/taskrun/resources/apply.go | 99.4% | 98.7% | -0.6 |

@jeffdyoung

@waveywaves
Member

/assign

@waveywaves
Member

@dorzel thank you for opening the draft PR. I'm not sure a TEP is required, since issue #6742 doesn't mention one, but I will defer to the rest of the team on this.

@@ -119,6 +120,9 @@ type TaskSpec struct {

```go
	// Results are values that this Task can output
	// +listType=atomic
	Results []TaskResult `json:"results,omitempty"`

	// PodTemplate holds pod specific configuration
	PodTemplate *pod.PodTemplate `json:"podTemplate,omitempty"`
```
Member

Can you add an example to the examples directory? Examples also act as e2e tests and help validate that the code works as intended.

Author

Certainly!

@@ -687,6 +687,13 @@ func ApplyReplacements(spec *v1.TaskSpec, stringReplacements map[string]string,

```go
		container.ApplyStepTemplateReplacements(spec.StepTemplate, stringReplacements, arrayReplacements)
	}

	// Apply variable expansion to podTemplate fields.
	if spec.PodTemplate != nil {
		for key, value := range spec.PodTemplate.NodeSelector {
```
Member

I don't know this part of the code very well, but this only applies to the NodeSelector in the pod template; are there other podTemplate fields that would need to be applied here as well?

Author

@chmouel Thanks for taking a look - yes I think it would make sense to support variable expansion in all of the podTemplate fields. Currently working on that and will push it up (hopefully) soon with some other updates.

pkg/pod/pod.go (outdated)

```go
	if taskRun.Spec.PodTemplate != nil {
		podTemplate = *taskRun.Spec.PodTemplate
	} else if taskSpec.PodTemplate != nil {
```
Author

A note here: ideally this would be a merge function that overrides only the values the TaskRun defines, instead of replacing the whole podTemplate. I wanted to get others' thoughts on this.
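A field-by-field merge along these lines might look like the following Go sketch. `PodTemplate` here is a hypothetical, trimmed-down stand-in (tolerations reduced to strings), and `mergePodTemplate` is an illustrative helper, not an existing Tekton function. Map fields are merged key by key so a TaskRun can override one selector label without dropping the Task's others; list and pointer fields are replaced when the TaskRun sets them.

```go
package main

import "fmt"

// PodTemplate is a hypothetical, trimmed-down stand-in for Tekton's pod
// template type, keeping only a few fields to illustrate merge semantics.
type PodTemplate struct {
	NodeSelector      map[string]string
	Tolerations       []string // simplified: real tolerations are structs
	PriorityClassName *string
}

// mergePodTemplate starts from the Task's template and lets every field the
// TaskRun sets win. NodeSelector is merged key by key rather than replaced.
func mergePodTemplate(task, run *PodTemplate) *PodTemplate {
	if task == nil {
		return run
	}
	if run == nil {
		return task
	}
	merged := &PodTemplate{
		NodeSelector:      map[string]string{},
		Tolerations:       task.Tolerations,
		PriorityClassName: task.PriorityClassName,
	}
	for k, v := range task.NodeSelector {
		merged.NodeSelector[k] = v
	}
	for k, v := range run.NodeSelector { // TaskRun keys override Task keys
		merged.NodeSelector[k] = v
	}
	// List and pointer fields: a TaskRun value replaces the Task's.
	if len(run.Tolerations) > 0 {
		merged.Tolerations = run.Tolerations
	}
	if run.PriorityClassName != nil {
		merged.PriorityClassName = run.PriorityClassName
	}
	return merged
}

func main() {
	high := "high"
	task := &PodTemplate{
		NodeSelector: map[string]string{"kubernetes.io/arch": "arm64", "disktype": "ssd"},
		Tolerations:  []string{"arch=arm64:NoSchedule"},
	}
	run := &PodTemplate{
		NodeSelector:      map[string]string{"disktype": "hdd"},
		PriorityClassName: &high,
	}
	m := mergePodTemplate(task, run)
	// prints: arm64 hdd [arch=arm64:NoSchedule] high
	fmt.Println(m.NodeSelector["kubernetes.io/arch"], m.NodeSelector["disktype"], m.Tolerations, *m.PriorityClassName)
}
```

Whether list fields such as tolerations should be replaced or appended is a design choice this sketch does not settle; appending would let a Task keep its own tolerations while a TaskRun adds more.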

@tekton-robot tekton-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 16, 2025
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
|---|---|---|---|
| pkg/reconciler/taskrun/resources/apply.go | 99.4% | 64.7% | -34.7 |

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
|---|---|---|---|
| pkg/reconciler/taskrun/resources/apply.go | 99.4% | 99.6% | 0.3 |

@dorzel
Author

dorzel commented Apr 28, 2025

OK, I pushed some updates that add tests as well as param substitution on all podTemplate fields, at least where it made sense (I left out bool and int fields). I'm not sure whether I covered all the areas that need testing, so let me know.
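Substitution across the string-valued podTemplate fields could be sketched as below. The types are hypothetical simplifications of the Kubernetes/Tekton ones, `applyReplacements` assumes the `$(key)` syntax with keys such as `params.arch`, and only a few representative fields (nodeSelector keys and values, toleration fields, runtime class name) are covered; a bool field is included only to show that it is left alone.

```go
package main

import (
	"fmt"
	"strings"
)

// Toleration and PodTemplate are hypothetical, simplified stand-ins for the
// Kubernetes/Tekton types.
type Toleration struct {
	Key, Operator, Value, Effect string
}

type PodTemplate struct {
	NodeSelector     map[string]string
	Tolerations      []Toleration
	RuntimeClassName *string
	HostNetwork      bool // non-string fields are not substituted
}

// applyReplacements replaces each "$(key)" occurrence in s, where keys
// look like "params.arch".
func applyReplacements(s string, repl map[string]string) string {
	pairs := make([]string, 0, 2*len(repl))
	for k, v := range repl {
		pairs = append(pairs, "$("+k+")", v)
	}
	return strings.NewReplacer(pairs...).Replace(s)
}

// applyPodTemplateReplacements substitutes variables in the string fields
// of tpl, mutating it in place.
func applyPodTemplateReplacements(tpl *PodTemplate, repl map[string]string) {
	if tpl == nil {
		return
	}
	if len(tpl.NodeSelector) > 0 {
		expanded := make(map[string]string, len(tpl.NodeSelector))
		for k, v := range tpl.NodeSelector { // both keys and values may hold variables
			expanded[applyReplacements(k, repl)] = applyReplacements(v, repl)
		}
		tpl.NodeSelector = expanded
	}
	for i := range tpl.Tolerations {
		t := &tpl.Tolerations[i]
		t.Key = applyReplacements(t.Key, repl)
		t.Operator = applyReplacements(t.Operator, repl)
		t.Value = applyReplacements(t.Value, repl)
		t.Effect = applyReplacements(t.Effect, repl)
	}
	if tpl.RuntimeClassName != nil {
		rc := applyReplacements(*tpl.RuntimeClassName, repl)
		tpl.RuntimeClassName = &rc
	}
}

func main() {
	rc := "kata-$(params.arch)" // hypothetical runtime class name
	tpl := &PodTemplate{
		NodeSelector:     map[string]string{"kubernetes.io/arch": "$(params.arch)"},
		Tolerations:      []Toleration{{Key: "arch", Operator: "Equal", Value: "$(params.arch)", Effect: "NoSchedule"}},
		RuntimeClassName: &rc,
	}
	applyPodTemplateReplacements(tpl, map[string]string{"params.arch": "arm64"})
	// prints: arm64 arm64 kata-arm64
	fmt.Println(tpl.NodeSelector["kubernetes.io/arch"], tpl.Tolerations[0].Value, *tpl.RuntimeClassName)
}
```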

@chmouel @waveywaves Would you be able to take another look at this?

@dorzel dorzel requested a review from waveywaves May 5, 2025 17:18
@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from waveywaves after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

```diff
@@ -26,7 +26,7 @@ function install_pipeline_crd() {
   cat "${ko_target}" | sed -e 's%"level": "info"%"level": "debug"%' \
     | sed -e 's%loglevel.controller: "info"%loglevel.controller: "debug"%' \
     | sed -e 's%loglevel.webhook: "info"%loglevel.webhook: "debug"%' \
-    | kubectl apply -R -f - || fail_test "Build pipeline installation failed"
+    | kubectl apply --server-side -R -f - || fail_test "Build pipeline installation failed"
```
Author

Added `--server-side` to (hopefully) fix the `metadata.annotations: Too long: must have at most 262144 bytes` error that was occurring in all e2e tests; I was also seeing it locally. `--force-conflicts` may be needed as well, or possibly `kubectl create` instead of `apply`?

@@ -687,6 +698,312 @@ func ApplyReplacements(spec *v1.TaskSpec, stringReplacements map[string]string,

```go
		container.ApplyStepTemplateReplacements(spec.StepTemplate, stringReplacements, arrayReplacements)
	}

	// Apply variable expansion to podTemplate fields.
	if spec.PodTemplate != nil {
		if len(spec.PodTemplate.NodeSelector) > 0 {
```
Member

I have a question about the changes below. Wouldn't it be enough for this issue to restrict variable expansion to the nodeSelector field?

I might be missing something, but based on the provided sample, that alone seems sufficient for the matrixed PipelineRun.

Thank you for your work! 😸

Author

Hey! Yes, the first draft of this PR only included nodeSelector as a minimum working example, and it does work with just that. But I felt it makes sense to include variable expansion on all of the podTemplate fields even in this PR, since scheduling pods on multi-arch clusters involves more than just nodeSelector. I also thought it would be strange for users to have only that one field supported. See the comment above: #8599 (comment)

Member

But I did feel that it makes sense to include variable expansion on all of the podTemplate fields even in this PR, since there is more to scheduling pods on multi-arch than just nodeSelector.

Thanks for the explanation. I understand your reasoning.

I still believe that including all fields might not be necessary if users are not asking for them. What would you say are the most needed fields besides nodeSelector? Maybe it would be enough to add just the most relevant ones and hold off on the others until they are requested.

@afrittoli @chmouel @waveywaves do you have an opinion on that?

@aleskandro aleskandro May 13, 2025

IMHO, supporting both nodeSelector/nodeAffinity and tolerations is the minimum viable option, although I expect users could get value from inter-pod (anti-)affinity too (beyond multi-arch-specific issues).
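Why nodeSelector alone may not be enough can be illustrated with a simplified scheduling check. This Go sketch is a hypothetical model of the Kubernetes rules, not the scheduler's code: a node is eligible only if it has every nodeSelector label and every `NoSchedule` taint on it is tolerated. On a cluster that taints its Arm nodes (the taint below is an assumed example), a pod selecting `arm64` still cannot land there without a matching toleration.

```go
package main

import "fmt"

// Hypothetical, heavily simplified model of Kubernetes scheduling inputs.
type Taint struct{ Key, Value, Effect string }

type Toleration struct {
	Key, Operator, Value, Effect string // Operator: "Equal" (default) or "Exists"
}

type Node struct {
	Labels map[string]string
	Taints []Taint
}

// tolerates reports whether tol matches taint t.
func tolerates(tol Toleration, t Taint) bool {
	// An empty Effect on the toleration matches every taint effect.
	if tol.Effect != "" && tol.Effect != t.Effect {
		return false
	}
	if tol.Operator == "Exists" {
		// An empty key with Exists tolerates every taint.
		return tol.Key == "" || tol.Key == t.Key
	}
	return tol.Key == t.Key && tol.Value == t.Value
}

// eligible checks nodeSelector labels, then NoSchedule taints.
func eligible(n Node, nodeSelector map[string]string, tols []Toleration) bool {
	for k, v := range nodeSelector {
		if n.Labels[k] != v {
			return false
		}
	}
	for _, t := range n.Taints {
		if t.Effect != "NoSchedule" {
			continue // this sketch only models NoSchedule taints
		}
		ok := false
		for _, tol := range tols {
			if tolerates(tol, t) {
				ok = true
				break
			}
		}
		if !ok {
			return false
		}
	}
	return true
}

func main() {
	armNode := Node{
		Labels: map[string]string{"kubernetes.io/arch": "arm64"},
		Taints: []Taint{{Key: "kubernetes.io/arch", Value: "arm64", Effect: "NoSchedule"}},
	}
	sel := map[string]string{"kubernetes.io/arch": "arm64"}
	tol := []Toleration{{Key: "kubernetes.io/arch", Operator: "Equal", Value: "arm64", Effect: "NoSchedule"}}
	fmt.Println(eligible(armNode, sel, nil)) // false: taint not tolerated
	fmt.Println(eligible(armNode, sel, tol)) // true
}
```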

The Konflux-CI project is looking to use this feature when it becomes available and I can confirm that we'll need variable expansion for nodeSelector and tolerations. Thanks for working on this!

Also speaking for Konflux - we have future needs to include RuntimeClass name as well. We are experimenting with Kata containers to run tasks on remote peer pods.

Member

@twoGiants I agree. Runtime information like RuntimeClass also needs to be deliberated on for a bit, to ensure that we don't start adding too much runtime information and that it doesn't leak into Task definitions. How about we support only nodeSelector in this PR and tackle the other podTemplate configs in subsequent PRs? WDYT? cc @vdemeester @afrittoli

@dorzel dorzel marked this pull request as ready for review May 12, 2025 16:43
@tekton-robot tekton-robot requested review from dibyom and jerop May 12, 2025 16:43
@dorzel dorzel changed the title WIP Set podTemplate on Tasks to enable multi-arch builds with Matrix Set podTemplate on Tasks to enable multi-arch builds with Matrix May 12, 2025
@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. labels May 12, 2025

```go
			PriorityClassName:     priorityClassName,
			ActiveDeadlineSeconds: &defaultActiveDeadlineSeconds,
		},
	},
```
Member

Can we add a test checking that, when both a Task and a TaskRun specify a pod template, the TaskRun pod template takes precedence for competing values of the same field? The test added here looks like the Task and the TaskRun have the same value. I can't see the rest of the code, so I'm a bit confused.

Author

This test covers that via the DNSConfig Nameservers string: the TaskRunSpec value is 1.1.1.1, while the TaskSpec value is 8.8.8.8.
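The precedence rule from the pod.go excerpt above can be checked with a small Go sketch. `PodTemplate` is a hypothetical stand-in holding only nameservers, and `selectPodTemplate` mirrors the whole-template replacement shown in that excerpt, using the same 1.1.1.1 and 8.8.8.8 values as the test.

```go
package main

import "fmt"

// PodTemplate is a hypothetical stand-in holding just DNS nameservers.
type PodTemplate struct {
	Nameservers []string
}

// selectPodTemplate mirrors the pod.go excerpt: a TaskRun template, if
// present, replaces the Task's template entirely; otherwise the Task's is used.
func selectPodTemplate(taskRunTpl, taskTpl *PodTemplate) PodTemplate {
	if taskRunTpl != nil {
		return *taskRunTpl
	}
	if taskTpl != nil {
		return *taskTpl
	}
	return PodTemplate{}
}

func main() {
	run := &PodTemplate{Nameservers: []string{"1.1.1.1"}}
	task := &PodTemplate{Nameservers: []string{"8.8.8.8"}}
	fmt.Println(selectPodTemplate(run, task).Nameservers) // [1.1.1.1]
	fmt.Println(selectPodTemplate(nil, task).Nameservers) // [8.8.8.8]
}
```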

@vdemeester vdemeester requested review from vdemeester and removed request for chmouel May 21, 2025 08:26
Member

@vdemeester vdemeester left a comment

First, sorry for the late review 🙇‍♂️.

So, as proposed here, this would add runtime information (pod affinity, DNS config, …) to a definition type, which is something we have avoided since the beginning of the project. I would love to see or discuss an alternative where we can have those, somehow, on the PipelineRun.

Thinking out loud, I wonder if we could just make taskRunSpecs a bit more dynamic. Like the following.

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: matrixed-pipelinerun-
spec:
  taskRunSpecs:
  - pipelineTaskName: build-and-push-manifest-* # This is "the key"
    podTemplate:
      nodeSelector:
        kubernetes.io/arch: $(params.arch)
  pipelineSpec:
    tasks:
      - name: build-and-push-manifest
        matrix:
          params:
          - name: arch
            value:
              - "amd64"
              - "arm64"
        taskSpec:
          results:
            - name: manifest
              type: string
          params:
            - name: arch
          podTemplate:
            nodeSelector:
              kubernetes.io/arch: $(params.arch)
          steps:
            - name: build-and-push
              image: ubuntu
              script: |
                echo "building on $(params.arch)"
                echo "testmanifest-$(params.arch)" | tee $(results.manifest.path)
      - name: create-manifest-list
        params:
          - name: manifest
            value: $(tasks.build-and-push-manifest.results.manifest[*])
        taskSpec:
          steps:
            - name: echo-manifests
              image: ubuntu
              args: ["$(params.manifest[*])"]
              script: echo "$@"
```

It doesn't necessarily need to be `-*`, though (especially as I think we can influence the name of the task from a matrix)…
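Glob-style matching of `pipelineTaskName` against matrixed TaskRuns might work roughly as follows. This Go sketch assumes shell-style globs via the standard library's `path.Match` and hypothetical child names of the form `build-and-push-manifest-<index>`; `TaskRunSpecOverride` and `matchOverride` are illustrative names. How Tekton actually names matrixed TaskRuns, and how `$(params.arch)` would then be resolved per combination, are left open here.

```go
package main

import (
	"fmt"
	"path"
)

// TaskRunSpecOverride is a hypothetical stand-in for a taskRunSpecs entry
// whose pipelineTaskName may contain a glob such as "build-and-push-manifest-*".
type TaskRunSpecOverride struct {
	PipelineTaskName string
	NodeSelector     map[string]string
}

// matchOverride returns the first override whose pattern matches name.
// path.Match is used only for its shell-style glob semantics.
func matchOverride(overrides []TaskRunSpecOverride, name string) (TaskRunSpecOverride, bool) {
	for _, o := range overrides {
		if ok, err := path.Match(o.PipelineTaskName, name); err == nil && ok {
			return o, true
		}
	}
	return TaskRunSpecOverride{}, false
}

func main() {
	overrides := []TaskRunSpecOverride{{
		PipelineTaskName: "build-and-push-manifest-*",
		NodeSelector:     map[string]string{"kubernetes.io/arch": "$(params.arch)"},
	}}
	// Hypothetical names: two matrixed TaskRuns plus an unrelated task.
	for _, name := range []string{"build-and-push-manifest-0", "build-and-push-manifest-1", "create-manifest-list"} {
		_, ok := matchOverride(overrides, name)
		fmt.Println(name, ok)
	}
}
```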

@dorzel
Author

dorzel commented May 28, 2025

@vdemeester thanks for taking a look! Just so I understand correctly: is the issue here with podTemplate existing on Tasks at all, or just with certain parts of it (though presumably one could specify any field of the podTemplate even if param substitution wasn't supported for it)? Is the preference to keep these on TaskRun/PipelineRun because those hold the runtime information?

@afrittoli
Member

@vdemeester thanks for taking a look! Just so I understand correctly, is the issue here with podTemplate existing at all on Tasks, or just certain parts of it (though presumably, one could specify any field of the podTemplate even if param substitution wasn't supported for it)? There is a preference for keeping these on TaskRun/PipelineRun due to those holding the runtime information?

I agree that, generally speaking, runtime information does not belong with Tasks. However, certain bits of information, like the required architecture, can be inherent to the Task itself, and we should not require users (those who execute the Task) to set it on the runtime every time.

When Tekton runs on Kubernetes, the information about the architecture required by a Task is implemented via the PodTemplate, a Kubernetes-specific, runtime-specific piece of information; hence, a small dilemma arises.

I see two options:

  1. Define a Tekton-specific API to capture a Task target architecture. Right now, this is done through annotations, maybe it's enough? With that, implement a change in the controller that automatically adds the required affinity to Pods.
  2. Allow a subset of the podTemplate (namely podTemplate/nodeSelector:kubernetes.io/arch) to be set on Tasks

I think option (1), using existing annotations, could be relatively easy to implement, although we would have to deal with some complexity in case:

  • the task annotations include multiple architectures
  • the podTemplate from the taskrun includes the arch based node selector as well
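Option (1) could be sketched roughly as below. This assumes the Tekton catalog convention of a `tekton.dev/platforms` annotation with comma-separated `os/arch` entries (e.g. `linux/amd64,linux/arm64`); `archRequirement` and `ArchRequirement` are hypothetical names. It touches both complications listed above: several architectures cannot be expressed with a single nodeSelector value, so the controller would need an `In`-style node affinity term, and when the TaskRun's podTemplate already pins `kubernetes.io/arch`, this sketch lets the TaskRun win (one possible policy, not a decided one).

```go
package main

import (
	"fmt"
	"strings"
)

// platformsAnnotation follows the Tekton catalog convention; using it to
// drive scheduling is the hypothetical part.
const platformsAnnotation = "tekton.dev/platforms"

const archLabel = "kubernetes.io/arch"

// ArchRequirement stands in for a node affinity term
// "kubernetes.io/arch In (Values...)".
type ArchRequirement struct {
	Key    string
	Values []string
}

// archesFromAnnotation extracts the arch part of each "os/arch[/variant]"
// entry, dropping duplicates while keeping order.
func archesFromAnnotation(annotations map[string]string) []string {
	raw, ok := annotations[platformsAnnotation]
	if !ok {
		return nil
	}
	var arches []string
	seen := map[string]bool{}
	for _, p := range strings.Split(raw, ",") {
		parts := strings.Split(strings.TrimSpace(p), "/")
		if len(parts) < 2 || parts[1] == "" || seen[parts[1]] {
			continue
		}
		seen[parts[1]] = true
		arches = append(arches, parts[1])
	}
	return arches
}

// archRequirement returns the affinity term the controller could add, or
// false when the TaskRun already pins an arch or no platforms are declared.
func archRequirement(taskAnnotations, runNodeSelector map[string]string) (ArchRequirement, bool) {
	if _, pinned := runNodeSelector[archLabel]; pinned {
		return ArchRequirement{}, false
	}
	arches := archesFromAnnotation(taskAnnotations)
	if len(arches) == 0 {
		return ArchRequirement{}, false
	}
	return ArchRequirement{Key: archLabel, Values: arches}, true
}

func main() {
	ann := map[string]string{platformsAnnotation: "linux/amd64, linux/arm64,linux/arm/v7"}
	req, ok := archRequirement(ann, nil)
	fmt.Println(ok, req.Values) // true [amd64 arm64 arm]
	_, ok = archRequirement(ann, map[string]string{archLabel: "arm64"})
	fmt.Println(ok) // false: the TaskRun already pins the arch
}
```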

@vdemeester
Member

I agree that, generally speaking, runtime information does not belong with Tasks. However, certain bits of information, like the required architecture, can be inherent to the Task itself, and we should not require users (those who execute the Task) to set it on the runtime every time.

When Tekton runs on Kubernetes, the information about the architecture required by a Task is implemented via the PodTemplate, a Kubernetes-specific, runtime-specific piece of information; hence, a small dilemma arises.

That's true. It could be inherent to the Task itself, or via the Pipeline I think.

I see two options:

1. Define a Tekton-specific API to capture a Task target architecture. Right now, this is done through annotations, maybe it's enough? With that, implement a change in the controller that automatically adds the required affinity to Pods.

2. Allow a subset of the `podTemplate` (namely `podTemplate/nodeSelector:kubernetes.io/arch`) to be set on `Tasks`

I think option (1), using existing annotations, could be relatively easy to implement, although we would have to deal with some complexity in case:

* the task annotations include multiple architectures

* the podTemplate from the taskrun includes the arch based node selector as well

Indeed, there could be cases where the exact same task needs to run on multiple architectures (hence the task itself would not be the "holder" of the architecture information). I even think it's going to be the main use case. @afrittoli

Maybe it is the PodTemplate notion that bothers me, but I agree, we need a Tekton-specific API to capture the Task/PipelineTask target architecture (especially in conjunction with the matrix feature).

I "like" the annotation approach (put an annotation on a TaskRun, and it runs on the given arch), but today there is no way to pass an annotation for a specific TaskRun in a Pipeline (PipelineTask). If we had that, we could use it with matrix to create TaskRuns from a Pipeline that carry that annotation. The controller could even "warn" (or be configured to be strict and fail) if the arch annotation is not listed in the supported architectures of the task.
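The warn-or-fail check could be sketched as follows. Both annotation keys are assumptions: `example.tekton.dev/arch` is a made-up placeholder for the per-TaskRun arch annotation, and `tekton.dev/platforms` follows the catalog convention for a Task's supported platforms. `checkArch` is a hypothetical helper, with `strict` standing in for whatever controller configuration would select failing over warning.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const (
	runArchAnnotation       = "example.tekton.dev/arch" // hypothetical, set on the TaskRun
	supportedArchAnnotation = "tekton.dev/platforms"    // declared on the Task
)

// checkArch compares the TaskRun's requested arch with the Task's declared
// platforms. In strict mode a mismatch is an error; otherwise it is returned
// as a warning and the TaskRun proceeds.
func checkArch(taskAnn, runAnn map[string]string, strict bool) (warning string, err error) {
	want, ok := runAnn[runArchAnnotation]
	if !ok {
		return "", nil // nothing requested, nothing to check
	}
	declared, ok := taskAnn[supportedArchAnnotation]
	if !ok {
		return "", nil // Task declares no platforms; accept any arch
	}
	for _, p := range strings.Split(declared, ",") {
		parts := strings.Split(strings.TrimSpace(p), "/")
		if len(parts) >= 2 && parts[1] == want {
			return "", nil
		}
	}
	msg := fmt.Sprintf("arch %q not in task platforms %q", want, declared)
	if strict {
		return "", errors.New(msg)
	}
	return msg, nil
}

func main() {
	task := map[string]string{supportedArchAnnotation: "linux/amd64,linux/arm64"}
	run := map[string]string{runArchAnnotation: "s390x"}
	w, err := checkArch(task, run, false)
	fmt.Println("lenient:", w, err)
	_, err = checkArch(task, run, true)
	fmt.Println("strict:", err)
}
```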

@dorzel
Author

dorzel commented Jun 24, 2025

Thanks for the input, everyone. After some discussion, I think the way forward is to see whether we can apply the current functionality of this PR to taskRunSpecs instead, as mentioned here: #8599 (review)

@vdemeester vdemeester added this to the v1.3.0 (LTS) milestone Jun 25, 2025
@adambkaplan

I drafted a related Shipwright proposal, SHIP-0043, for an API specific to running multi-arch builds. I'm linking it here mainly to help us on the Shipwright side "keep track" of these capabilities on the Tekton side. I see the matrix support here (whether at the Pipeline or TaskRunSpec level) as complementing, rather than competing with, the work described for Shipwright.

Labels
kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

Pipeline: Support set pod template for tasks