Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: resolve Git Anonymous Resolver excessive memory usage #8677

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

aThorp96
Copy link
Contributor

@aThorp96 aThorp96 commented Mar 28, 2025

Switch git resolver from go-git library to use git binary.

The go-git library does not resolve deltas efficiently, and as a result is not recommended to be used to clone repositories with full-depth. In one example RemoteResolutionRequest targeting a repository which summed 145Mb, configuring the resolution timeout to 10 minutes and giving the resolver to have 25Gb of memory, the resolver pod was OOM killed after ~6 minutes. Additionally, since go-git's delta resolution does not accept any contexts, the time required and memory used during resolving a large repository will not be cutoff when the context is canceled, impacting the resolver's performance for other concurrent remote resolutions.

Since the git resolver can be provided a commit sha which is not tracked at any ref/head, and because go-git only supports fetching fully-qualified refs, go-git does not support fetching arbitrary revisions. Therefore, in order to guarantee the requested revision is fetched, if we continue to use the go-git library, we must fetch all revisions.

Switching to the git binary significantly improves the time and memory performance of the git resolver (by multiple orders of magnitude, in my testing). Using the git binary also enables the resolver to take advantage of shallow fetching/cloning due to the git-fetch's support for fetching arbitrary revisions, improving both the time and memory performance by another order of magnitude. Note that if the revision is not at any ref head or tag, fetching the revision does depend on the git server enabling uploadpack.allowReachableSHA1InWant. This feature is enabled and supported in Github and Gitlab but not Gitea: go-gitea/gitea#11958
See also: https://git-scm.com/docs/protocol-capabilities#_allow_reachable_sha1_in_want

Resolves #8652

Changes

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • Has Tests included if any functionality added or changed
  • pre-commit Passed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

TODO

Switch git resolver from go-git library to use git binary.

The go-git library does not resolve deltas efficiently, and as a result
is not recommended to be used to clone repositories with full-depth.
In one example RemoteResolutionRequest targeting a repository which
summed 145Mb, configuring the resolution timeout to 10 minutes and
giving the resolver to have 25Gb of memory, the resolver pod was OOM
killed after ~6 minutes. Additionally, since go-git's delta resolution
does not accept any contexts, the time required and memory used during
resolving a large repository will not be cutoff when the context is
canceled, impacting the resolver's performance for other concurrent
remote resolutions.

Since the git resolver can be provided a revision which is not tracked
at any ref in the repository, and because go-git only supports fetching
fully-qualified refs, go-git does not support fetching arbitrary
revisions. Therefore, in order to guarantee the requested revision is
fetched, if we continue to use the go-git library we must fetch all
revisions.

Switching to the git binary enables the git resolver to take advantage
of the git-fetch's support for fetching arbitrary revisions. Note that
if the revision is not at any ref head, fetching the revision does
depend on the git server enabling uploadpack.allowReachableSHA1InWant.

Resolves tektoncd#8652

See also: https://git-scm.com/docs/protocol-capabilities#_allow_reachable_sha1_in_want

NOTE: This feature is enabled and supported in Github and Gitlab but
not Gitea: go-gitea/gitea#11958
@tekton-robot tekton-robot added the release-note-none Denotes a PR that doesnt merit a release note. label Mar 28, 2025
@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign jeromeju after the PR has been reviewed.
You can assign the PR to them by writing /assign @jeromeju in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 28, 2025
@tekton-robot tekton-robot requested review from afrittoli and jerop March 28, 2025 20:12
Comment on lines +52 to +73
_, err = repo.execGit(ctx, "clone", repo.url, tmpDir, "--depth=1", "--no-checkout")
if err != nil {
return nil, cleanupFunc, err
}

_, err = repo.execGit(ctx, "fetch", "origin", repo.revision, "--depth=1")
if err != nil {
return nil, cleanupFunc, err
}

_, err = repo.execGit(ctx, "checkout", "FETCH_HEAD")
if err != nil {
return nil, cleanupFunc, err
}

revisionSha, err := repo.execGit(ctx, "rev-list", "-n1", "HEAD")
if err != nil {
return nil, cleanupFunc, err
}
repo.revision = strings.TrimSpace(string(revisionSha))

return &repo, cleanupFunc, nil
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially simply did a full git clone and then simply did git checkout. However doing a shallow clone with --no-checkout, fetching the revision, and then checking the revision out improved the time and space performance significantly. Using the same repository mentioned in the original issue as a benchmark, using a shallow clone reduced the clone time from ~110 seconds by a factor of ten. Using --no-checkout similarly reduced the disk space and time slightly (before the necessary revision was checked-out)

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/resolution/resolver/git/git.go Do not exist 86.0%
pkg/resolution/resolver/git/resolver.go 83.9% 84.8% 0.9

@@ -29,7 +29,7 @@ spec:
description: Username to be used to login to the container registry
default: "_json_key"
- name: releaseAsLatest
description: Whether to tag and publish this release as Pipelines' latest
description: Whether to tag and publish this release as Pipelines latest
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not a grammatical error, I just noticed it threw off the syntax highlighting of my editor and both versions seemed equally correct. I am happy to revert if needed.

@@ -55,7 +55,7 @@ function ko_resolve() {
github.com/tektoncd/pipeline/cmd/nop: ghcr.io/tektoncd/pipeline/github.com/tektoncd/pipeline/combined-base-image:latest
github.com/tektoncd/pipeline/cmd/workingdirinit: ghcr.io/tektoncd/pipeline/github.com/tektoncd/pipeline/combined-base-image:latest

github.com/tektoncd/pipeline/cmd/git-init: cgr.dev/chainguard/git
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The git-init package no longer exists.

@aThorp96
Copy link
Contributor Author

/kind feat

@tekton-robot
Copy link
Collaborator

@aThorp96: The label(s) kind/feat cannot be applied, because the repository doesn't have them.

In response to this:

/kind feat

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@aThorp96
Copy link
Contributor Author

/kind feature

@tekton-robot tekton-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 28, 2025
@aThorp96
Copy link
Contributor Author

Cc @vdemeester

@vdemeester vdemeester self-assigned this Mar 28, 2025
@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesnt merit a release note. labels Mar 29, 2025
@waveywaves
Copy link
Member

waveywaves commented Apr 1, 2025

 --- FAIL: TestGitResolver_Clone_Failure (0.06s)
    --- FAIL: TestGitResolver_Clone_Failure/commit_does_not_exist (1.08s)
    --- PASS: TestGitResolver_Clone_Failure/path_does_not_exist (2.06s)
    --- FAIL: TestGitResolver_Clone_Failure/repo_does_not_exist (2.08s)

beta e2e tests are failing
cc @aThorp96

@aThorp96
Copy link
Contributor Author

aThorp96 commented Apr 1, 2025

@waveywaves yeah, hoping to get to addressing the e2e test failures today or tomorrow.

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 1, 2025
@tekton-robot
Copy link
Collaborator

@aThorp96: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

bug: Excessive memory consumption during git Resolver remote resolution
4 participants