[Core][REP] GPU Memory awareness scheduling #47
Conversation
Signed-off-by: Jonathan Nitisastro <[email protected]>
jjyao left a comment:
will continue
Signed-off-by: Jiajun Yao <[email protected]>
```python
# Request a fractional GPU with the specified gpu_memory in bytes.
# Mutually exclusive with num_gpus.
@ray.remote(gpu_memory=1024 * 1024 * 1024)  # 1 GiB request
```
Can we support string-based syntactic sugar? Feels more Pythonic that way (e.g., gpu_memory="3gb").
For now we just follow how memory is defined. I think the Pythonic string support can be done separately, covering both the gpu_memory and memory APIs.
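The string sugar discussed above could be a thin parsing layer in front of the byte-count API. A minimal sketch, assuming binary units; the helper name and accepted suffixes are illustrative, not part of the REP:

```python
# Hypothetical helper for the string-based sugar (not Ray's actual API):
# converts strings like "3gb" or "512mb" to bytes; ints pass through.
def parse_memory_string(value):
    if isinstance(value, int):
        return value
    units = {"kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}
    s = value.strip().lower()
    for suffix, multiplier in units.items():
        if s.endswith(suffix):
            return int(float(s[: -len(suffix)]) * multiplier)
    raise ValueError(f"Unrecognized memory string: {value!r}")
```

Such a layer would let both gpu_memory and the existing memory parameter accept either form without changing the scheduler-facing representation.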
```python
pg = placement_group([{"gpu_memory": 1024 * 1024, "CPU": 1}, {"GPU": 1}])
```
I think we need an observability section here, since this feature complicates the observability semantics:
- How is it displayed in ray status?
- For ray status, should it display something like gpu_memory: 4 gpus (A10) * 3gb?
- In ray status, if a task is scheduled with gpu_memory, are both the GPU and GPU memory values subtracted?
- How is it displayed in resource_requirement in ray list tasks? Is it translated into num_gpus, does it only include gpu_memory, or both?
In ray list nodes, it will be GPU (resources left) * gpu_memory_per_gpu, which is the constant stored in the node label. ray status, ray list tasks, and ray.available_resources currently don't show GPU memory, but if we add it, it will be the same as ray list nodes.
And yes, both the gpu and gpu_memory values are subtracted to show the remaining resources.
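The dual subtraction described in this exchange can be sketched as simple arithmetic. The names below are assumptions for illustration, not Ray internals; the per-GPU memory constant stands in for the node label mentioned above:

```python
# Illustrative accounting only (names are assumptions, not Ray's code):
# a gpu_memory request is converted to a GPU fraction via the node's
# per-GPU memory constant, and BOTH values are deducted from the node.
GPU_MEMORY_PER_GPU = 24 * 1024**3  # e.g. an A10 with 24GB, from the node label

def remaining_after(gpu_memory_request, gpus_left, gpu_memory_left):
    gpu_fraction = gpu_memory_request / GPU_MEMORY_PER_GPU
    return gpus_left - gpu_fraction, gpu_memory_left - gpu_memory_request

# A 6GB request on a fresh 24GB GPU consumes 0.25 GPU:
gpus, mem = remaining_after(
    6 * 1024**3, gpus_left=1.0, gpu_memory_left=GPU_MEMORY_PER_GPU
)
# 0.75 GPU and 18GB remain
```

This is why a view like ray list nodes can derive the remaining GPU memory as GPU (resources left) * gpu_memory_per_gpu rather than tracking it independently.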
```python
# Requesting 30GB of GPU memory from an A10 GPU with 24GB of memory.
# The task won't be able to be scheduled.
@ray.remote(gpu_memory=30 * 1024 * 1024 * 1024, accelerator_type="NVIDIA_TESLA_A10G")
```
If you have a 40GB GPU and schedule one task with gpu_memory=20GB, then schedule another with num_gpus=1, would the second fail to schedule?
Yes, the second one will fail, since the GPU remaining after scheduling the 20GB task will be 0.5.
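The feasibility check behind this answer reduces to the same fraction conversion. A hedged sketch of the scenario, assuming the conversion described in this thread (not Ray's actual scheduler code):

```python
# Sketch of the failure case above: on a 40GB GPU, a 20GB gpu_memory
# request consumes 0.5 GPU, so a later num_gpus=1 request cannot fit.
total_gpu_memory = 40 * 1024**3
request = 20 * 1024**3

gpu_left = 1.0 - request / total_gpu_memory  # 0.5 GPU remaining
can_schedule_full_gpu = gpu_left >= 1.0      # False: the num_gpus=1 task waits
```

The second task is not rejected with an error; it simply remains unschedulable on that node until the fractional GPU is released.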
```python
# Requesting a fractional GPU with both num_gpus and gpu_memory is not allowed.
@ray.remote(gpu_memory=1024 * 1024 * 1024, num_gpus=0.5)  # raises ValueError
```
is it possible to express 2 GPUs using gpu_memory? Or is it not allowed?
Can you specify this in the REP?
It's not allowed, since only one of num_gpus or gpu_memory (one GPU per request) can be specified in a request.
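The mutual exclusion being discussed can be sketched as a small validation step. This is a minimal sketch of the check described above; the function name is hypothetical and ray.remote's real validation may differ:

```python
# Hedged sketch of the mutual-exclusion rule (not Ray's actual code):
# a request may carry num_gpus OR gpu_memory, never both.
def validate_gpu_request(num_gpus=None, gpu_memory=None):
    if num_gpus is not None and gpu_memory is not None:
        raise ValueError(
            "Only one of num_gpus or gpu_memory may be specified per request."
        )
```

Under this rule, gpu_memory always describes a slice of a single GPU, which is why multi-GPU requests still go through num_gpus.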
Could they both be allowed? If both num_gpus and gpu_memory are specified, then it would require that much memory on that many GPUs. num_gpus would default to 1, so not specifying it would get the behavior described above. It could be an error condition to specify a fractional value for num_gpus if also specifying gpu_memory. Thoughts?
The GPU memory scheduling prototype:
ray-project/ray#41147