GitHub Actions job with Kaniko container image set is using the wrong credentials #3963

OpsBurger opened this issue Mar 10, 2025 · 1 comment

Controller Version

0.9.3

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Create a service account annotated to assume the AWS IAM role (IRSA):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ $.Values.environment }}-runners-arc
  namespace: {{ .Values.runners.namespace }}
  annotations:
    eks.amazonaws.com/role-arn: {{ $.Values.runners.roleArn }}

2. Deploy the ARC runner scale set with Helm (via an Argo CD Application) in Kubernetes container mode:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: arc-scale-set
  namespace: argocd
spec:
  project: arc
  destination:
    namespace: {{ .Values.runners.namespace }}
    server: https://kubernetes.default.svc
  source:
    repoURL: ghcr.io/actions/actions-runner-controller-charts
    chart: gha-runner-scale-set
    targetRevision: 0.9.3
    helm:
      values: |
        runnerScaleSetName: {{ .Values.environment }}-runners
        githubConfigUrl: https://github.com/elementor
        githubConfigSecret: github-runners-github-auth
        controllerServiceAccount:
          namespace: {{ .Values.runners.namespace }}
          name: arc-scale-set-controller-gha-rs-controller

        template:
          spec:
            serviceAccountName: {{ .Values.environment }}-runners-arc
            initContainers:
              - name: kube-init
                image: ghcr.io/actions/actions-runner:latest
                command: ["sudo", "chown", "-R", "runner:runner", "/home/runner/_work"]
                volumeMounts:
                  - name: work
                    mountPath: /home/runner/_work
            containers:
              - name: runner
                image: ghcr.io/actions/actions-runner:latest
                command: ["/home/runner/run.sh"]
                env:
                  - name: AWS_SDK_LOAD_CONFIG  # for kaniko to use IRSA
                    value: "true"
                  - name: AWS_EC2_METADATA_DISABLED
                    value: "true"
                  - name: ACTIONS_RUNNER_CONTAINER_HOOKS
                    value: /home/runner/k8s/index.js
                  - name: ACTIONS_RUNNER_POD_NAME
                    valueFrom:
                      fieldRef:
                        fieldPath: metadata.name
                  - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
                    value: "false"  # allows jobs to run without the need to specify the container value
                volumeMounts:
                  - name: work
                    mountPath: /home/runner/_work

            volumes:
              - name: work
                ephemeral:
                  volumeClaimTemplate:
                    spec:
                      accessModes: [ "ReadWriteOnce" ]
                      storageClassName: {{ .Values.runners.storageClassName }}
                      resources:
                        requests:
                          storage: 2Gi

        containerMode:
          type: "kubernetes"
          kubernetesModeWorkVolumeClaim:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 4Gi

        maxRunners: 30
        minRunners: 0
  syncPolicy:
    automated:
      prune: true
      selfHeal: true


3. Run the following job on the runner scale set you created:

  build:
    needs: pre-build
    runs-on: dev-runners
    container:
      image: gcr.io/kaniko-project/executor:v1.20.0-debug
    permissions:
      contents: read
    steps:
      - name: Run Kaniko
        env:
          GIT_TOKEN: ""
          TAGS: ${{ needs.pre-build.outputs.TAGS }}
        run: |
          # Debug output
          echo "Received TAGS: $TAGS"
          
          # Create the base command with proper quoting
          kaniko_cmd="/kaniko/executor"
          kaniko_cmd="$kaniko_cmd --dockerfile=Dockerfile"
          kaniko_cmd="$kaniko_cmd --context='git://github.com/${{ github.repository }}#refs/heads/${{ github.ref_name }}#${{ github.sha }}'"
          kaniko_cmd="$kaniko_cmd --cache=true"
          kaniko_cmd="$kaniko_cmd --push-retry=5"
          kaniko_cmd="$kaniko_cmd --build-arg=BUILDKIT_INLINE_CACHE=1"
          
          # Process tags using sh-compatible syntax
          destinations=""
          old_IFS=$IFS
          IFS=","
          for tag in $TAGS; do
            echo "Processing tag: $tag"
            if [ -n "$tag" ]; then
              echo "Adding destination: $tag"
              destinations="$destinations --destination='$tag'"
            fi
          done
          IFS=$old_IFS
          
          kaniko_cmd="$kaniko_cmd $destinations"
          
          # Execute the command
          echo "Final Kaniko command: $kaniko_cmd"
          echo "Current directory: $(pwd)"
          echo "Directory contents: $(ls)"
          eval "$kaniko_cmd"

Describe the bug

The issue occurs specifically in the Kaniko build job.

What I expect to happen:

I expect the job pod to use the mounted service account token to authenticate to AWS services.
I can see that the right Kubernetes service account is attached and its credentials token is mounted - meaning the pod does not use the default SA, so there is no reason for it to fall back to the node's permissions.
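
For reference, when a pod runs under the IRSA-annotated service account from step 1, I would expect the EKS pod identity webhook to inject roughly the following into it (a sketch using the standard IRSA defaults; the actual role ARN comes from the eks.amazonaws.com/role-arn annotation):

env:
  - name: AWS_ROLE_ARN
    value: <value of the eks.amazonaws.com/role-arn annotation>
  - name: AWS_WEB_IDENTITY_TOKEN_FILE
    value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
volumeMounts:
  - name: aws-iam-token
    mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
    readOnly: true
volumes:
  - name: aws-iam-token
    projected:
      sources:
        - serviceAccountToken:
            audience: sts.amazonaws.com
            path: token
            expirationSeconds: 86400

If these are missing from the pod that actually runs Kaniko, the AWS SDK credential chain falls back to the instance metadata, i.e. the node role.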

Describe the expected behavior

What actually happens:

The pod connects to AWS services with the node's permissions (the instance role) and then fails due to missing permissions.
This happens even though the right service account is mounted.
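
For context: in Kubernetes container mode the container image from the job runs in a separate "-workflow" pod created by runner-container-hooks, not inside the runner pod, so credentials visible in the runner pod are not necessarily present in the pod that runs Kaniko. If that hook-created pod is not picking up the IRSA service account, one workaround I could try is a container hook template that pins it explicitly. A minimal sketch, assuming a hypothetical ConfigMap name and mount path, and assuming the hook honors serviceAccountName from the template (worth verifying against the runner-container-hooks version in use):

apiVersion: v1
kind: ConfigMap
metadata:
  name: runner-pod-template              # hypothetical name
  namespace: {{ .Values.runners.namespace }}
data:
  default.yaml: |
    apiVersion: v1
    kind: PodTemplate
    metadata:
      name: runner-pod-template
    spec:
      # force the hook-created workflow pod onto the IRSA service account
      serviceAccountName: {{ $.Values.environment }}-runners-arc

# referenced from the runner template in the scale set values, e.g.:
#   containers:
#     - name: runner
#       env:
#         - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
#           value: /home/runner/pod-templates/default.yaml
#       volumeMounts:
#         - name: pod-templates
#           mountPath: /home/runner/pod-templates
#           readOnly: true
#   volumes:
#     - name: pod-templates
#       configMap:
#         name: runner-pod-template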

Additional Context

The controller and the runners run in different namespaces, which GitHub suggests as the best practice.

Controller Logs

https://gist.github.com/OpsBurger/94164223f9aa3286d5d4945e22dde583

Runner Pod Logs

runner pod (specific Kaniko build job): https://gist.github.com/OpsBurger/1a6c00a4038903e79351b690342d1169

runner describe command: https://gist.github.com/OpsBurger/ab7cab4a9e43dd65e60bef360dbaf8fb
OpsBurger added the bug, gha-runner-scale-set, and needs triage labels on Mar 10, 2025

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.
