Introduce configuration to utilize nvidia GPU on dockerIM #494
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -41,9 +41,12 @@ const DockerIMType IMType = "docker" | |
|
|
||
| type DockerIMConfig struct { | ||
| DockerImageName string | ||
| GpuManufacturer string | ||
|
Member
Avoid terms like
Collaborator
Author
My suggestion is equivalent to
So I need to define a new name to convey whether DockerIM will use a GPU or not. Retrieving such information by parsing
Member
Let's solve #494 (comment) first.
Member
The ability to create docker instances using GPU should be part of the CO public API, not hidden as a CO configuration. Please explain why you want to hide this ability from end users. For reference, the ability to add accelerators is part of the public API for GCE hosts. See #319. Also the
Collaborator
Author
I cannot say it's under the same principle because of the way how On the other hand, I think it's a bit complicated to reach agreement from here... I'll propose a design around GPU utilization when I have time, perhaps with a GPU allocation mechanism too.
Member
Sounds good. |
||
| HostOrchestratorPort int | ||
| } | ||
|
|
||
| const gpuManufacturerNvidia = "nvidia" | ||
|
|
||
| const ( | ||
| dockerLabelCreatedBy = "created_by" | ||
| dockerLabelKeyManagedBy = "managed_by" | ||
|
|
@@ -66,11 +69,14 @@ const ( | |
| DeleteHostOPType OPType = "deletehost" | ||
| ) | ||
|
|
||
| -func NewDockerInstanceManager(cfg Config, cli *client.Client) *DockerInstanceManager { | ||
| +func NewDockerInstanceManager(cfg Config, cli *client.Client) (*DockerInstanceManager, error) { | ||
| + if cfg.Docker.GpuManufacturer != "" && cfg.Docker.GpuManufacturer != gpuManufacturerNvidia { | ||
| + return nil, fmt.Errorf("unsupported GPU manufacturer: %q", cfg.Docker.GpuManufacturer) | ||
| + } | ||
| return &DockerInstanceManager{ | ||
| Config: cfg, | ||
| Client: cli, | ||
| -} | ||
| +}, nil | ||
| } | ||
|
|
||
| func (m *DockerInstanceManager) ListZones() (*apiv1.ListZonesResponse, error) { | ||
|
|
@@ -371,6 +377,9 @@ func (m *DockerInstanceManager) createDockerContainer(ctx context.Context, user | |
| Tty: true, | ||
| Labels: dockerLabelsDict(user), | ||
| } | ||
| if m.Config.Docker.GpuManufacturer == gpuManufacturerNvidia { | ||
| config.Env = []string{"NVIDIA_DRIVER_CAPABILITIES=all"} | ||
| } | ||
| hostConfig := &container.HostConfig{ | ||
| Mounts: []mount.Mount{ | ||
| { | ||
|
|
@@ -381,6 +390,17 @@ func (m *DockerInstanceManager) createDockerContainer(ctx context.Context, user | |
| }, | ||
| Privileged: true, | ||
| } | ||
| if m.Config.Docker.GpuManufacturer == gpuManufacturerNvidia { | ||
| hostConfig.Resources = container.Resources{ | ||
| DeviceRequests: []container.DeviceRequest{ | ||
| { | ||
| Count: -1, | ||
| Capabilities: [][]string{{"gpu"}}, | ||
| }, | ||
| }, | ||
| } | ||
| hostConfig.Runtime = "nvidia" | ||
| } | ||
0405ysj marked this conversation as resolved.
|
||
| createRes, err := m.Client.ContainerCreate(ctx, config, hostConfig, nil, nil, "") | ||
| if err != nil { | ||
| return "", fmt.Errorf("failed to create docker container: %w", err) | ||
|
|
||
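The GPU-related settings that the diff above applies to the container can be gathered in one place. The following is only a sketch: `deviceRequest`, `gpuSettings`, and `gpuSettingsFor` are hypothetical names, and `deviceRequest` is a local stand-in that mirrors the two fields of the Docker SDK's `container.DeviceRequest` used in the diff, so the example runs without the SDK. A `Count` of `-1` asks Docker for all available devices.

```go
package main

import "fmt"

// gpuManufacturerNvidia matches the constant introduced in the diff.
const gpuManufacturerNvidia = "nvidia"

// deviceRequest is a local stand-in for the Docker SDK's
// container.DeviceRequest; it is NOT the SDK type.
type deviceRequest struct {
	Count        int        // -1 requests all available devices
	Capabilities [][]string // e.g. {{"gpu"}}
}

// gpuSettings (hypothetical) groups everything the diff sets for GPU use.
type gpuSettings struct {
	Env            []string
	Runtime        string
	DeviceRequests []deviceRequest
}

// gpuSettingsFor (hypothetical) returns the settings for a manufacturer.
// An empty or unknown manufacturer yields the zero value, i.e. no GPU.
func gpuSettingsFor(manufacturer string) gpuSettings {
	if manufacturer != gpuManufacturerNvidia {
		return gpuSettings{}
	}
	return gpuSettings{
		Env:     []string{"NVIDIA_DRIVER_CAPABILITIES=all"},
		Runtime: "nvidia",
		DeviceRequests: []deviceRequest{
			{Count: -1, Capabilities: [][]string{{"gpu"}}},
		},
	}
}

func main() {
	fmt.Printf("%+v\n", gpuSettingsFor(gpuManufacturerNvidia))
	fmt.Printf("%+v\n", gpuSettingsFor(""))
}
```

Keeping the three settings together would let `createDockerContainer` check the manufacturer once instead of twice, though whether that fits the actual code base is an open question.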
This function should return `(instances.Manager, error)` if it can fail like this.

I used `log.Fatal` to keep consistency, as other places in this file do. Would you like me to refactor here?