[Core][REP] GPU Memory awareness scheduling #47
Conversation
Signed-off-by: Jonathan Nitisastro <[email protected]>
jjyao left a comment:
will continue
Signed-off-by: Jiajun Yao <[email protected]>
```python
# Request a fractional GPU with the specified gpu_memory in bytes.
# Mutually exclusive with num_gpus.
@ray.remote(gpu_memory=1024 * 1024 * 1024)  # 1 GiB request
```
Can we support string-based syntactic sugar? Feels more Pythonic that way (e.g., gpu_memory="3gb").
For now we just follow how memory is defined. I think the Pythonic string support can be done separately, covering both the gpu_memory and memory APIs.
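The string sugar discussed above could be a thin parsing layer in front of the byte-count API. A minimal sketch, assuming binary units; the helper name and accepted suffixes are illustrative, not part of the REP:

```python
# Hypothetical helper for the string-based sugar (not Ray's actual API):
# converts strings like "3gb" or "512mb" to bytes; ints pass through.
def parse_memory_string(value):
    if isinstance(value, int):
        return value
    units = {"kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}
    s = value.strip().lower()
    for suffix, multiplier in units.items():
        if s.endswith(suffix):
            return int(float(s[: -len(suffix)]) * multiplier)
    raise ValueError(f"Unrecognized memory string: {value!r}")
```

Such a layer would let both gpu_memory and the existing memory parameter accept either form without changing the scheduler-facing representation.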
```python
pg = placement_group([{"gpu_memory": 1024 * 1024, "CPU": 1}, {"GPU": 1}])
```
I think we need an observability section here, since this feature complicates the observability semantics:
- How is it displayed in ray status?
- For ray status, should it display something like gpu_memory: 4 gpus (A10) * 3gb?
- In ray status, if a task is scheduled with gpu_memory, are both the GPU and GPU memory values subtracted?
- How is it displayed in resource_requirement in ray list tasks? Is it translated into num_gpus, does it only include gpu_memory, or both?
In ray list nodes, it will be GPU (resources left) * gpu_memory_per_gpu, which is the constant stored in the node label. ray status, ray list tasks, and ray.available_resources currently don't show GPU memory, but if we add it, it will be the same as ray list nodes.
And yes, both the gpu and gpu_memory values are subtracted to show the remaining resources.
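The dual subtraction described in this exchange can be sketched as simple arithmetic. The names below are assumptions for illustration, not Ray internals; the per-GPU memory constant stands in for the node label mentioned above:

```python
# Illustrative accounting only (names are assumptions, not Ray's code):
# a gpu_memory request is converted to a GPU fraction via the node's
# per-GPU memory constant, and BOTH values are deducted from the node.
GPU_MEMORY_PER_GPU = 24 * 1024**3  # e.g. an A10 with 24GB, from the node label

def remaining_after(gpu_memory_request, gpus_left, gpu_memory_left):
    gpu_fraction = gpu_memory_request / GPU_MEMORY_PER_GPU
    return gpus_left - gpu_fraction, gpu_memory_left - gpu_memory_request

# A 6GB request on a fresh 24GB GPU consumes 0.25 GPU:
gpus, mem = remaining_after(
    6 * 1024**3, gpus_left=1.0, gpu_memory_left=GPU_MEMORY_PER_GPU
)
# 0.75 GPU and 18GB remain
```

This is why a view like ray list nodes can derive the remaining GPU memory as GPU (resources left) * gpu_memory_per_gpu rather than tracking it independently.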
```python
# Requesting 30GB of GPU memory from an A10 GPU with 24GB of memory.
# The task won't be able to be scheduled.
@ray.remote(gpu_memory=30 * 1024 * 1024 * 1024, accelerator_type="NVIDIA_TESLA_A10G")
```
If you have a 40GB GPU and schedule one task with gpu_memory=20GB, then schedule another with num_gpus=1, would the second fail to schedule?
Yes, the second one will fail, since the GPU remaining after scheduling the 20GB task will be 0.5.
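The feasibility check behind this answer reduces to the same fraction conversion. A hedged sketch of the scenario, assuming the conversion described in this thread (not Ray's actual scheduler code):

```python
# Sketch of the failure case above: on a 40GB GPU, a 20GB gpu_memory
# request consumes 0.5 GPU, so a later num_gpus=1 request cannot fit.
total_gpu_memory = 40 * 1024**3
request = 20 * 1024**3

gpu_left = 1.0 - request / total_gpu_memory  # 0.5 GPU remaining
can_schedule_full_gpu = gpu_left >= 1.0      # False: the num_gpus=1 task waits
```

The second task is not rejected with an error; it simply remains unschedulable on that node until the fractional GPU is released.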
```python
# Requesting a fractional GPU with both num_gpus and gpu_memory is not allowed.
@ray.remote(gpu_memory=1024 * 1024 * 1024, num_gpus=0.5)  # raises ValueError
```
is it possible to express 2 GPUs using gpu_memory? Or is it not allowed?
Can you specify this in the REP?
It's not allowed, since only one of num_gpus or gpu_memory (one GPU per request) can be specified in a request.
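The mutual exclusion being discussed can be sketched as a small validation step. This is a minimal sketch of the check described above; the function name is hypothetical and ray.remote's real validation may differ:

```python
# Hedged sketch of the mutual-exclusion rule (not Ray's actual code):
# a request may carry num_gpus OR gpu_memory, never both.
def validate_gpu_request(num_gpus=None, gpu_memory=None):
    if num_gpus is not None and gpu_memory is not None:
        raise ValueError(
            "Only one of num_gpus or gpu_memory may be specified per request."
        )
```

Under this rule, gpu_memory always describes a slice of a single GPU, which is why multi-GPU requests still go through num_gpus.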
Could they both be allowed? If both num_gpus and gpu_memory are specified, then it would require that much memory on that many GPUs. num_gpus would default to 1, so not specifying it would get the behavior described above. It could be an error condition to specify a fractional value for num_gpus if also specifying gpu_memory. Thoughts?
The GPU memory scheduling prototype:
ray-project/ray#41147