Description
Brief summary
The starter and stopper jobs lack a BackoffLimit setting, so they fall back to the Kubernetes default of 6 retries on failure, while the initializer and runner jobs are configured with BackoffLimit: &zero32 and fail immediately. This inconsistency leads to excessive pod creation, wasted resources, and delayed error detection when the starter/stopper curl connections fail.
k6-operator version or image
latest/main branch
Helm chart version (if applicable)
No response
TestRun / PrivateLoadZone YAML
apiVersion: k6.io/v1alpha1
kind: TestRun
metadata:
  name: k6-test
spec:
  parallelism: 1
  script:
    configMap:
      name: test
      file: test.js
Other environment details (if applicable)
No response
Steps to reproduce the problem
- Create a TestRun that will cause the starter/stopper curl to fail (e.g., by deleting the k6 runner pod immediately after creation for the starter, or by causing network issues during test execution for the stopper)
- Observe that the starter/stopper jobs create multiple pods (up to 6) that all fail
- Compare with the initializer and runner jobs, which have BackoffLimit: &zero32
Example commands
- kubectl apply -f testrun.yaml
- kubectl delete pod k6-test-1-xxx # Delete runner pod to cause starter failure
- kubectl get pods -l job-name=k6-test-starter # Observe multiple failed pods
- kubectl get pods -l job-name=k6-test-stopper # Observe multiple failed pods for stopper
Expected behaviour
Starter and stopper jobs should fail immediately (after curl's own 3 retries) and not create multiple pods, consistent with the initializer and runner jobs, which both use BackoffLimit: &zero32.
Actual behaviour
Starter and stopper jobs create up to 6 pods (the Kubernetes default BackoffLimit) when the curl connection fails, causing:
- Resource waste: Multiple unnecessary pods created
- Inconsistent behavior: initializer/runner fail immediately, but starter/stopper retry 6 times
- Excessive logging: Same failure logged 6 times instead of once
- Delayed failure detection: Takes longer to identify actual issues
Each pod runs curl with --retry 3, i.e. one initial attempt plus up to 3 retries, so a single failure can produce up to 6 pods × 4 attempts = 24 connection attempts.
The current code in pkg/resources/jobs/starter.go and pkg/resources/jobs/stopper.go lacks a BackoffLimit setting, while initializer.go and runner.go both set BackoffLimit: &zero32; a sketch of the proposed change is shown below.
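For illustration, here is a minimal sketch of how the starter job could set the field, mirroring the BackoffLimit pattern used by initializer.go and runner.go. The function name, metadata, image, and command below are assumptions made for the example, not the operator's actual code; only the BackoffLimit field is the point.

package jobs

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// NewStarterJob is a hypothetical constructor used only to illustrate the
// proposed change; the real starter job builder lives in starter.go.
func NewStarterJob(name, namespace, targetURL string) *batchv1.Job {
	// Same pattern as initializer.go / runner.go: fail the Job after the
	// first pod failure instead of the Kubernetes default of 6 retries.
	zero32 := int32(0)

	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name + "-starter",
			Namespace: namespace,
		},
		Spec: batchv1.JobSpec{
			BackoffLimit: &zero32,
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:  "k6-starter",
						Image: "curlimages/curl", // placeholder image, not necessarily the operator's
						Command: []string{
							"sh", "-c",
							// curl still performs its own 3 retries before the pod fails.
							"curl --retry 3 " + targetURL,
						},
					}},
				},
			},
		},
	}
}

With BackoffLimit set to zero, a failed curl (after its own 3 retries) marks the Job as failed immediately, matching the initializer/runner behaviour described under Expected behaviour.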