Skip to content

Conversation

@upodroid
Copy link
Member

We need to run the access tokens and hugepage services perf tests for scale jobs to achieve parity with the current GCE 5k test.

https://github.com/kubernetes/test-infra/blob/7c5dc4ca148503409199df1945856845730b94bb/config/jobs/kubernetes/sig-scalability/sig-scalability-release-blocking-jobs.yaml#L162

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 14, 2025
@k8s-ci-robot k8s-ci-robot requested review from dims and hakman October 14, 2025 11:38
@hakman
Copy link
Member

hakman commented Oct 14, 2025

Scenario is shared by both AWS and GCP.
/hold for feedback from @kubernetes/sig-scalability @dims

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 14, 2025
@upodroid
Copy link
Member Author

I can scope the change to just GCE, but let's benchmark the presubmits.

/test presubmit-kops-aws-scale-amazonvpc-using-cl2
/test presubmit-kops-gce-scale-ipalias-using-cl2

@hakman
Copy link
Member

hakman commented Oct 14, 2025

I can scope the change to just GCE, but let's benchmark the presubmits.

I don't have anything against enabling those, just would be nice to be all on the same page.
If these diverge too much, we might as well split the script.

@hakman
Copy link
Member

hakman commented Oct 14, 2025

/retest

@hakman
Copy link
Member

hakman commented Oct 14, 2025

/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@upodroid
Copy link
Member Author

Something isn't right, the test suite shrunk

@hakman
Copy link
Member

hakman commented Oct 15, 2025

/test presubmit-kops-aws-scale-amazonvpc-using-cl2

1 similar comment
@hakman
Copy link
Member

hakman commented Oct 15, 2025

/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 15, 2025
@upodroid
Copy link
Member Author

/test presubmit-kops-gce-small-scale-ipalias-using-cl2

@upodroid
Copy link
Member Author

/test presubmit-kops-gce-small-scale-ipalias-using-cl2

@upodroid
Copy link
Member Author

/test presubmit-kops-gce-small-scale-ipalias-using-cl2

@hakman
Copy link
Member

hakman commented Oct 15, 2025

/test presubmit-kops-aws-small-scale-using-cl2

@kubernetes kubernetes deleted a comment from k8s-ci-robot Oct 15, 2025
@hakman
Copy link
Member

hakman commented Oct 15, 2025

/test presubmit-kops-aws-small-scale-amazonvpc-using-cl2

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 20, 2025
@upodroid upodroid force-pushed the scale-tweaks-one branch 2 times, most recently from 3756f25 to f5468c8 Compare October 20, 2025 10:35
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 20, 2025
- args:
- --v=2
- --conf=/etc/kubernetes/kops-controller/config/config.yaml
- --log_file=/var/log/kops-controller.log
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found this

kubernetes/klog#60

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A blast from the past :-)

@upodroid
Copy link
Member Author

/test presubmit-kops-gce-scale-ipalias-using-cl2
/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@alaypatel07
Copy link

@upodroid, looks like the scale jobs are failing due to latency. Should we also test DRA the config in this pre-submit job? https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kops/17671/presubmit-kops-gce-scale-ipalias-using-cl2/1980346550060060672

@upodroid
Copy link
Member Author

We don't have a DRA presubmit against this repo. Also, I need this merged first, then we can soak it and then address the other performance issues.

d.GCPProject = resource.Name
klog.V(1).Infof("Got project %s from boskos", d.GCPProject)

if os.Getenv("SCALE_SCENARIO") == "performance" {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required after kubernetes/perf-tests#3653

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think this is a good idea though.... kubetest2 assumes that we will build everything into kubetest2, and I think there's a lot of power in more modular tests for performance and specific scenarios. In #17636 I am trying out exporting the variable so that the test can source it: 7d339a6

@k8s-ci-robot k8s-ci-robot added the area/provider/gcp Issues or PRs related to gcp provider label Oct 24, 2025
@upodroid
Copy link
Member Author

/test presubmit-kops-gce-scale-ipalias-using-cl2
/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@upodroid
Copy link
Member Author

/test presubmit-kops-gce-scale-ipalias-using-cl2
/test presubmit-kops-aws-scale-amazonvpc-using-cl2

}

b.addLogRotate(c, "docker", "/var/log/docker.log", logRotateOptions{})
b.addLogRotate(c, "kops-controller", "/var/log/kops-controller.log", logRotateOptions{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think kops-controller is "just a pod" and doesn't write logs here. But it is relatively important, so I think we could make the case for doing so...

d.GCPProject = resource.Name
klog.V(1).Infof("Got project %s from boskos", d.GCPProject)

if os.Getenv("SCALE_SCENARIO") == "performance" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think this is a good idea though.... kubetest2 assumes that we will build everything into kubetest2, and I think there's a lot of power in more modular tests for performance and specific scenarios. In #17636 I am trying out exporting the variable so that the test can source it: 7d339a6

if version > "1.29" {
// Requires https://github.com/kubernetes/kops/pull/16128
args = append(args, "--set", `spec.containerd.configAdditions=plugins."io.containerd.grpc.v1.cri".containerd.runtimes.test-handler.runtime_type=io.containerd.runc.v2`)
"--set", `spec.containerd.configAdditions=plugins."io.containerd.grpc.v1.cri".containerd.runtimes.test-handler.runtime_type=io.containerd.runc.v2`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a cleanup for handling of old kube versions that we no longer test? (I.e. could be split into its own PR, though that would be a very short PR!)


exec.InheritOutput(cmd)
err = cmd.Run()
err := cmd.Run()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

I like if err := cmd.Run(); err != nil { because then you don't influence the next single-error return. But this is an improvement so 👍

- args:
- --v=2
- --conf=/etc/kubernetes/kops-controller/config/config.yaml
- --log_file=/var/log/kops-controller.log
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A blast from the past :-)

requests:
cpu: 50m
memory: 50Mi
securityContext:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a case that kops-controller is important so should have a log in /var/log, but the components that are there today are the ones that we can't get with kubectl logs (when they fail). Is there e.g. a perf issue with getting kops-controller logs from kubectl logs (or some other reason)? Because now we will have to deal with the scanners flagging this one as running as root...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a performance issue and assuming the control plane is bootstrapped, we can see the logs from kops toolbox dump.

Happy to remove it if it's too tricky

@@ -0,0 +1 @@
defaultBaseImage: gcr.io/distroless/static-debian12
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to introducing this file so we can more easily configure some of these ko options, for example I realized we aren't passing the Version properly to ko

if err != nil {
return err
}
b.AddFirewallRulesTasks(c, "ssh-external-to-master", &gcetasks.FirewallRule{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this hits the "main code path" of kOps. Can we SSH through the bastion? Or if not, can we somehow make this not change the configuration for "everyone else" - e.g. with a feature flag or by adding something in the cluster or instancegroup? (The feature flag is normally easiest)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This specific change won't be in this PR, just need the patch to get the scale tests passing.

It is being worked in a different PR

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 24, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 24, 2025
@upodroid
Copy link
Member Author

/test presubmit-kops-gce-scale-ipalias-using-cl2
/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Oct 24, 2025

@upodroid: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
presubmit-kops-gce-scale-ipalias-using-cl2 11f922b link true /test presubmit-kops-gce-scale-ipalias-using-cl2
presubmit-kops-aws-scale-amazonvpc-using-cl2 11f922b link false /test presubmit-kops-aws-scale-amazonvpc-using-cl2

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@upodroid
Copy link
Member Author

/test presubmit-kops-gce-scale-ipalias-using-cl2
/test presubmit-kops-aws-scale-amazonvpc-using-cl2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/addons area/nodeup area/provider/gcp Issues or PRs related to gcp provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants