Skip to content

Improve advanced KubeCluster customization API #899

Open
@jacobtomlinson

Description

@jacobtomlinson

Background

When creating a new KubeCluster the __init__()/_start() process happens in a couple of steps.

  1. Generate a dictionary of the Kubernetes spec from provided kwargs using make_cluster_spec()
  2. Create a DaskCluster class (which was created by kr8s.asyncio.objects.make_class()) from that dictionary
  3. Call DaskCluster.create()

If users want to customize their cluster spec before creating it they can use make_cluster_spec() directly which takes the same kwargs as KubeCluster and returns the dictionary that is generated in step 1, then they can modify the dict before passing it directly to step 2 via the kwarg KubeCluster(custom_cluster_spec=...) .

Problem statement

When it comes to adding new kwargs to make_cluster_spec() we are very conservative because it is easy for that method to become extremely overwhelming. However as highlighted in #898 there may be customizations we want to do regularly like setting the memory limit on the dask worker CLI args which is not very ergonomic to do by modifying the dict directly. It would be nice of the object returned from make_cluster_spec() was easier to customize than a simple dictionary.

For example

from dask_kubernetes.operator import KubeCluster, make_cluster_spec

cluster_spec = make_cluster_spec("foo", ...)

# Currently we do this
cluster_spec["spec"]["worker"]["spec"]["containers"][0]["args"] += ["--memory-limit", "4GB"]
# It would be nicer if we could do something like this instead
# cluster_spec.set_worker_memory_limit("4GB")

cluster = KubeCluster(custom_cluster_spec=cluster_spec)

Option 1

Instead of returning a dict from make_cluster_spec() we could return the DaskCluster object from step 2. This object is dict-like so any existing code that modifies the dict should work as expected (although this will need to be verified).

Pros

The existing DaskCluster class is a convenient place to add methods to simplify some of the common customizations. It will be created anyway, we would just be changing the step at which the user can intervene from 1 to 2.

Cons

The DaskCluster object that would be returned by make_cluster_spec() would be an async kr8s object and potentially KubeCluster is being used in a sync context, and we would then be extending it with sync methods to modify things.

Returning the DaskCluster object leaks the implementation detail that we are using kr8s to interact with Kubernetes.

It also means in theory a user could call await cluster_spec.create() which would create the Dask cluster, but it wouldn't handle any of the port forwarding, log retrieval, etc and couldn't be passed to dask.distributed.Client.

Having KubeCluster and DaskCluster, which serve different purposes but have similar names, may cause confusion with users.

Option 2

We could introduce some new intermediate object with a name like KubeClusterSpec which would effectively be a subclass of dict but with whatever convenience methods we want. We would return this from make_cluster_spec().

We would need to ensure that DaskCluster can be passed a KubeClusterSpec instead of a dict. There is already support in kr8s for passing a few different things, so we should be able to just conform to one of those.

Pros

Adding a new class would avoid any user confusion as users would continue to never interact with kr8s resources directly.

The purpose of the new class is well scoped, it is a dictionary of the spec with some utility methods to simplify modifying the spec.

Cons

Introducing a new class into the chain increases complexity.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions