Enable persistent Ray cluster state via external Redis GCS fault tolerance #821
base: main
Signed-off-by: kramaranya <[email protected]>
4ebb3dd to 8b1fbbd
great work! lgtm
Signed-off-by: kramaranya <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #821      +/-   ##
==========================================
+ Coverage   92.40%   92.45%   +0.04%
==========================================
  Files          23       23
  Lines        1396     1418      +22
==========================================
+ Hits         1290     1311      +21
- Misses        106      107       +1
thanks for adding the tests!
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: kryanbeane
Issue link
RHOAIENG-11115
What changes have been made
Provided Ray cluster head pod persistence through GCS fault tolerance.
Added new config options: enable_gcs_ft, redis_address, redis_password_secret and external_storage_namespace
Added unit tests to cover GCS fault tolerance
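As a rough illustration of how the four options listed above might fit together, here is a minimal, self-contained sketch. It does not use the SDK itself: `GcsFtOptions` and `build_gcs_ft_spec` are hypothetical names, the `{"name": ..., "key": ...}` shape of `redis_password_secret` is an assumption, and the output keys (`redisAddress`, `redisPassword`, `externalStorageNamespace`) are assumed to mirror KubeRay's GCS fault tolerance options rather than taken from this PR's diff.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GcsFtOptions:
    """Hypothetical stand-in for the four new config options (not the SDK class)."""

    enable_gcs_ft: bool = False
    redis_address: Optional[str] = None
    # Assumed shape: {"name": <Secret name>, "key": <key inside the Secret>}
    redis_password_secret: Optional[Dict[str, str]] = None
    external_storage_namespace: Optional[str] = None

    def validate(self) -> None:
        # Nothing to check when fault tolerance is switched off.
        if not self.enable_gcs_ft:
            return
        if not self.redis_address:
            raise ValueError("redis_address is required when enable_gcs_ft is True")
        secret = self.redis_password_secret
        if secret is not None and not {"name", "key"} <= set(secret):
            raise ValueError("redis_password_secret needs 'name' and 'key' entries")


def build_gcs_ft_spec(opts: GcsFtOptions) -> Optional[dict]:
    """Return a dict of assumed KubeRay-style GCS FT options, or None if disabled."""
    opts.validate()
    if not opts.enable_gcs_ft:
        return None
    spec: dict = {"redisAddress": opts.redis_address}
    if opts.redis_password_secret is not None:
        # Reference the password through a Kubernetes Secret instead of inlining it.
        spec["redisPassword"] = {
            "valueFrom": {
                "secretKeyRef": {
                    "name": opts.redis_password_secret["name"],
                    "key": opts.redis_password_secret["key"],
                }
            }
        }
    if opts.external_storage_namespace:
        spec["externalStorageNamespace"] = opts.external_storage_namespace
    return spec
```

In this sketch the Redis address is the only value required once fault tolerance is enabled; the password secret and storage namespace are treated as optional, which may differ from the validation the PR actually implements.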
Verification steps
Run ray list actors from the RayCluster head pod
Run ray list actors - you should see the previously created actor
Checks