Add e2e test make sure resource quota error is surfaced #3087

Open. han-steve wants to merge 1 commit into base: master

han-steve
Contributor

@han-steve han-steve commented Feb 21, 2025

Thanks for adding the new conditions to the RayCluster!
This small PR adds a quick integration test to make sure that the resource quota error is surfaced in the RayCluster conditions, for better observability.

go test -timeout 30m -v ./test/e2e/raycluster_test.go ./test/e2e/support.go -run TestRayClusterWithResourceQuota
=== RUN   TestRayClusterWithResourceQuota
Modified Ray Image to: rayproject/ray:2.41.0-aarch64 for ARM chips
Modified Ray Image to: rayproject/ray:2.41.0-aarch64 for ARM chips
    raycluster_test.go:140: [2025-02-21T12:30:35-08:00] Created RayCluster test-ns-x22l7/raycluster-resource-quota successfully
    raycluster_test.go:142: [2025-02-21T12:30:35-08:00] Waiting for RayCluster test-ns-x22l7/raycluster-resource-quota to have ReplicaFailure condition
    test.go:100: [2025-02-21T12:30:36-08:00] Creating ephemeral output directory as KUBERAY_TEST_OUTPUT_DIR env variable is unset
    test.go:103: [2025-02-21T12:30:36-08:00] Output directory has been created at: /var/folders/96/5hxqn3kx5dngwwjwnd8v1pv00000gp/T/TestRayClusterWithResourceQuota2456324343/001
--- PASS: TestRayClusterWithResourceQuota (1.05s)
PASS
ok      command-line-arguments  1.628s

@han-steve changed the title from "add e2e test make sure resource quota error is surfaced" to "Add e2e test make sure resource quota error is surfaced" on Feb 21, 2025
@han-steve
Contributor Author

han-steve commented Feb 21, 2025

I also have a question about running integration tests. When I run

 make test-e2e 

I always get

--- FAIL: TestRayClusterManagedBy (0.03s)
    --- FAIL: TestRayClusterManagedBy/Failed_creation_of_cluster,_managed_by_external_non_supported_controller (0.01s)
panic: test executed panic(nil) or runtime.Goexit

goroutine 1895 [running]:
testing.tRunner.func1.2({0x102119220, 0x10318ce40})
        /Users/stevehan/go/pkg/mod/golang.org/[email protected]/src/testing/testing.go:1631 +0x1c4
testing.tRunner.func1()
        /Users/stevehan/go/pkg/mod/golang.org/[email protected]/src/testing/testing.go:1634 +0x33c
runtime.Goexit()
        /Users/stevehan/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:626 +0x60
testing.(*common).FailNow(0x1400026dba0)
        /Users/stevehan/go/pkg/mod/golang.org/[email protected]/src/testing/testing.go:1005 +0x48
testing.(*common).Fatalf(0x1400026dba0, {0x101ccf8fa?, 0x10?}, {0x14000935960?, 0x1400026db01?, 0x1033d0f18?})
        /Users/stevehan/go/pkg/mod/golang.org/[email protected]/src/testing/testing.go:1089 +0x64
github.com/onsi/gomega.NewWithT.(*Gomega).ConfigureWithT.func1({0x140006e2240, 0x38}, {0x0?, 0x140006e2240?, 0x101968f40?})
        /Users/stevehan/go/pkg/mod/github.com/onsi/[email protected]/internal/gomega.go:37 +0xac
github.com/onsi/gomega/internal.(*Assertion).match(0x14000990080, {0x1023be898, 0x10323d9a0}, 0x1, {0x0, 0x0, 0x0})
        /Users/stevehan/go/pkg/mod/github.com/onsi/[email protected]/internal/assertion.go:106 +0x174
github.com/onsi/gomega/internal.(*Assertion).To(0x14000990080, {0x1023be898, 0x10323d9a0}, {0x0, 0x0, 0x0})
        /Users/stevehan/go/pkg/mod/github.com/onsi/[email protected]/internal/assertion.go:62 +0xa8
github.com/ray-project/kuberay/ray-operator/test/e2e.TestRayClusterManagedBy.func3(0x140009596c0?)
        /Users/stevehan/p/rbx/kuberay/ray-operator/test/e2e/raycluster_test.go:75 +0x41c
testing.tRunner(0x140009596c0, 0x14000691380)
        /Users/stevehan/go/pkg/mod/golang.org/[email protected]/src/testing/testing.go:1689 +0xec
created by testing.(*T).Run in goroutine 1891
        /Users/stevehan/go/pkg/mod/golang.org/[email protected]/src/testing/testing.go:1742 +0x318
FAIL    github.com/ray-project/kuberay/ray-operator/test/e2e    332.033s
FAIL
make: *** [test-e2e] Error 1

Is this a known issue?
This is the issue: #2970

@han-steve
Contributor Author

Another question: since each integration test creates its own namespace, is there a way to run the integration tests in parallel? I see we call t.Parallel() within each subtest, but not in the top-level tests.

@kevin85421
Member

@rueian would you mind reviewing this PR?

Comment on lines +143 to +154
g.Eventually(func() bool {
	rc, err := RayCluster(test, namespace.Name, rayCluster.Name)()
	if err != nil {
		return false
	}
	for _, condition := range rc.Status.Conditions {
		if condition.Type == "ReplicaFailure" && strings.Contains(condition.Message, "forbidden: exceeded quota") {
			return true
		}
	}
	return false
}, TestTimeoutShort).Should(BeTrue(), "Expected ReplicaFailure condition with message containing 'forbidden: exceeded quota'")
Contributor

Hi @han-steve,

Thank you for the new test!

Suggested change
g.Eventually(RayCluster(test, namespace.Name, rayCluster.Name), TestTimeoutMedium).
	Should(WithTransform(StatusCondition(rayv1.RayClusterReplicaFailure), MatchCondition(metav1.ConditionTrue, ...)))

Could you also help us extend the existing MatchCondition, or add a new matcher, so that it can match the condition message?
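One possible shape for the message-matching logic is sketched below using only the standard library. The `Condition` struct is a simplified stand-in for `metav1.Condition`, and `findCondition` and `matchConditionMessage` are hypothetical names, not existing KubeRay helpers. The sample conditions in `main` are made-up illustrative data. In the real test support package this predicate would presumably be wrapped in a Gomega matcher rather than called directly.

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is a simplified, hypothetical stand-in for metav1.Condition,
// keeping only the fields this sketch needs.
type Condition struct {
	Type    string
	Status  string
	Message string
}

// findCondition returns a pointer to the first condition with the given type,
// or nil if no such condition exists.
func findCondition(conds []Condition, condType string) *Condition {
	for i := range conds {
		if conds[i].Type == condType {
			return &conds[i]
		}
	}
	return nil
}

// matchConditionMessage (hypothetical name) reports whether a condition of the
// given type exists, has the expected status, and has a message containing substr.
func matchConditionMessage(conds []Condition, condType, status, substr string) bool {
	c := findCondition(conds, condType)
	return c != nil && c.Status == status && strings.Contains(c.Message, substr)
}

func main() {
	// Illustrative data only; real condition values come from the RayCluster status.
	conds := []Condition{
		{Type: "Ready", Status: "False", Message: ""},
		{Type: "ReplicaFailure", Status: "True", Message: `pods "example" is forbidden: exceeded quota: example-quota`},
	}
	fmt.Println(matchConditionMessage(conds, "ReplicaFailure", "True", "forbidden: exceeded quota"))
	fmt.Println(matchConditionMessage(conds, "ReplicaFailure", "True", "timed out"))
}
```

Checking type, status, and message in one predicate keeps the assertion in the test to a single matcher call, which is the readability gain the suggested change is after.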

@rueian
Contributor

rueian commented Feb 22, 2025

Another question: since each integration test creates its own namespace, is there a way to run the integration tests in parallel? I see we call t.Parallel() within each subtest, but not in the top-level tests.

I think it is worth trying, but we should be careful about resource control so that the tests themselves stay stable. Log isolation for easier debugging is another issue to consider.

3 participants