K3S compatibility #1

@dbkegley

First of all, thank you for sharing your work on this plugin. It has already saved me a lot of time.

I have started working on k3s compatibility for this device plugin, and I have gotten to the point where the plugin discovers available TPUs and registers them with the kubelet.

$ kubectl describe nodes
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource              Requests   Limits
  --------              --------   ------
  cpu                   100m (2%)  0 (0%)
  memory                70Mi (7%)  170Mi (18%)
  ephemeral-storage     0 (0%)     0 (0%)
  kkohtaka.org/edgetpu  1          1

I0327 17:02:58.014104       1 plugin.go:98] Started gRPC service on plugin socket
I0327 17:02:58.014183       1 plugin.go:101] Started monitoring devices
I0327 17:02:58.014211       1 plugin.go:49] gRPC server started.
I0327 17:02:58.015416       1 plugin.go:118] Opened connection to kubelet socket
I0327 17:02:58.015486       1 plugin.go:121] Registering dpServer: &{[] 0xd58100 0xd58140}
I0327 17:02:58.017211       1 plugin.go:134] Registered device plugin
I0327 17:02:58.019309       1 server.go:56] Start watching devices
I0327 17:02:58.019419       1 server.go:66] Update a device list
I0327 17:02:58.019455       1 server.go:126] Starting Edge TPU device monitor
I0327 17:03:03.184672       1 server.go:155] Edge TPU became active.
I0327 17:03:03.185096       1 server.go:66] Update a device list
I0327 17:04:03.390627       1 server.go:79] Container TPU request: &ContainerAllocateRequest{DevicesIDs:[42],}
I0327 17:04:03.390765       1 server.go:80] Allocating devices... Device IDs: [42]

Side note: it looks like the device ID is hardcoded to 42, so only one TPU is currently allowed per node. Do you plan to support multiple devices?

I am able to schedule a pod that requests a TPU, but the container fails to start with the following error:

Events:
  Type     Reason     Age               From                  Message
  ----     ------     ----              ----                  -------
  Normal   Scheduled  86s               default-scheduler     Successfully assigned default/edgetpu-demo-54f5l to raspberrypi
  Normal   Pulling    2s (x2 over 84s)  kubelet, raspberrypi  pulling image "quay.io/kkohtaka/edgetpu-demo:arm32"
  Warning  Failed     2s                kubelet, raspberrypi  Error: failed to generate container "41e1245b846a2a815f54ca40741d10fc071f91b34eadf22309c52f949ea1d4ce" spec: failed to set devices mapping [&Device{ContainerPath:/dev/bus/usb,HostPath:/dev/bus/usb,Permissions:rw,}]: not a device node
  Normal   Pulled     1s (x2 over 2s)   kubelet, raspberrypi  Successfully pulled image "quay.io/kkohtaka/edgetpu-demo:arm32"
  Warning  Failed     1s                kubelet, raspberrypi  Error: failed to generate container "d72f48a0b04a560b1c7e81ec20d686b6ca3710f0a02fd38cecb1b937b52e8d05" spec: failed to set devices mapping [&Device{ContainerPath:/dev/bus/usb,HostPath:/dev/bus/usb,Permissions:rw,}]: not a device node

It looks like I need to specify the path to the device node itself rather than the /dev/bus/usb directory, but I'm not sure what that would be. Could you point me in the right direction? I'm using a Raspberry Pi B+ with a Coral USB Accelerator for testing.

If you're open to it, I'd be happy to submit a PR for k3s support once I get this working.
