Adaptivity should be managed on the scheduler

This is a follow-on to a point mentioned in #4263:

> Perhaps as a longer term goal adaptivity should be handled entirely by the scheduler via a scheduler plugin.

Right now, _adaptivity_ runs from wherever a `Cluster` object exists.
Normally the only place a `Cluster` object exists is the laptop (or similar machine) that first connected to the cluster.
What it does is run a background task that periodically talks to the scheduler and checks whether it would like more (or fewer) workers.
It might want more because some workers have died, for example due to OOM, or have been taken offline.
This happens, for example, with Fargate Spot using dask-cloudprovider: the scheduler will never be taken offline, but the workers might.
And without _adaptivity_, no new workers ever come online to replace them.
It might also be that there is a lot of work, so it would like more workers. That isn't as problematic, but it is still a key feature of adaptivity.
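To make the mechanism above concrete, here is a minimal toy model of that periodic loop. It assumes nothing from dask: `ToyScheduler`, `ToyCluster`, `desired_workers` and `adapt_loop` are hypothetical names, and this is not the real `Adaptive` class in distributed, only a sketch of the same shape (ask for a target, clamp it to `minimum`/`maximum`, scale). It shows the Spot case: workers disappear and the next tick scales back up.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical stand-in for the scheduler: it only tracks the task backlog."""

    queued_tasks: int = 0
    tasks_per_worker: int = 10

    def desired_workers(self) -> int:
        # Enough workers to cover the backlog (ceiling division).
        return -(-self.queued_tasks // self.tasks_per_worker)


@dataclass
class ToyCluster:
    """Hypothetical stand-in for a Cluster object that can start/stop workers."""

    workers: int = 0
    history: list = field(default_factory=list)

    def scale(self, n: int) -> None:
        self.workers = n
        self.history.append(n)


async def adapt_loop(scheduler, cluster, minimum, maximum, interval, iterations):
    """Periodically reconcile the cluster size with what the scheduler asks for."""
    for _ in range(iterations):
        target = max(minimum, min(maximum, scheduler.desired_workers()))
        if target != cluster.workers:
            cluster.scale(target)
        await asyncio.sleep(interval)


async def main() -> ToyCluster:
    scheduler = ToyScheduler(queued_tasks=35)
    cluster = ToyCluster()
    await adapt_loop(scheduler, cluster, minimum=1, maximum=10, interval=0, iterations=1)
    # Simulate a Spot interruption taking two workers offline.
    cluster.workers -= 2
    # The next tick notices the shortfall and scales back up.
    await adapt_loop(scheduler, cluster, minimum=1, maximum=10, interval=0, iterations=1)
    return cluster


cluster = asyncio.run(main())
print(cluster.history)  # [4, 4]
```

The important part for this issue is where `adapt_loop` runs: if it lives in the laptop's process and that process exits, the second tick never happens and the lost workers are never replaced.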

So having _adaptivity_ run from wherever the cluster was started is kind of annoying, because it is pretty important.
If multiple people separately connect to the cluster and all run _adaptivity_, I think that can do weird things, like provisioning workers that are not needed.
And if no one is running it, you have problems.
Most likely, in a shared cluster there would be one person who set up the cluster and actually knows the correct range of settings for _adaptivity_, and most users shouldn't touch it.
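A toy simulation of the "multiple people running adaptivity" problem, under the simplifying assumption that two independent controllers take turns (in reality they would run concurrently and interleave less predictably). The `minimum`/`maximum` pairs mirror the keyword arguments of `Cluster.adapt`; `reconcile` and `simulate` are hypothetical helpers, not dask code. Alice set the cluster up with sensible bounds, Bob connects with a much higher minimum, and the cluster size flip-flops, provisioning workers nobody needs.

```python
def reconcile(desired: int, minimum: int, maximum: int) -> int:
    """One tick of a hypothetical adaptive controller: clamp the desired size."""
    return max(minimum, min(maximum, desired))


def simulate(controllers, desired: int, ticks: int) -> list:
    """Independent controllers, each with its own bounds, take turns resizing."""
    sizes = []
    for tick in range(ticks):
        minimum, maximum = controllers[tick % len(controllers)]
        sizes.append(reconcile(desired, minimum, maximum))
    return sizes


# Alice set up the cluster with adapt(minimum=1, maximum=4); Bob later
# connects from his own laptop and runs adapt(minimum=8, maximum=10).
alice = (1, 4)
bob = (8, 10)
sizes = simulate([alice, bob], desired=2, ticks=4)
print(sizes)  # [2, 8, 2, 8]
```

With only one controller the size stays at 2; the extra 6 workers on every other tick come purely from the second, uncoordinated controller.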

In my particular (ab)use case, the originating laptop will often be shut down without waiting for results. We use `fire_and_forget`, because we know that, as a side effect of the last job that gets scheduled, the result we are looking for gets written out to a database. But right now the laptop has to keep running, and has to keep being able to connect to the cluster, just to run the _adaptivity_.

My current planned workaround is to start a separate process on the same machine that runs the scheduler, have it connect to the cluster (which for it is `localhost`), and then have it just run the _adaptivity_.
An alternative I have considered is the same thing (connect to the existing cluster, then run _adaptivity_), but actually running it in the scheduler's process via [`client.run_on_scheduler`](https://docs.dask.org/en/stable/futures.html#distributed.Client.run_on_scheduler).

But I feel like the ideal solution is for this to always live on the scheduler.
It would probably be configured by a `SchedulerPlugin`, which, if needed, the user can always replace with a new instance of the plugin with different settings.
Alternatively, it is not a plugin, but something the user can connect to the scheduler to change the settings of (so calling `cluster.adapt` would connect to the scheduler and change its target).
Either way, whoever sets it last should win, rather than several copies running at the same time, managed from different people's laptops.
And adaptivity should keep running even after the laptop is gone.
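A minimal toy model of the proposed semantics, assuming the settings live in a single slot on the scheduler (in a real implementation presumably inside a `SchedulerPlugin`, with the periodic loop also running there). `AdaptiveSettings`, `ToySchedulerState` and `set_adaptive` are hypothetical names for illustration only, not part of distributed. Because there is one slot, the last writer wins, and the settings outlive the client that wrote them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptiveSettings:
    minimum: int
    maximum: int


class ToySchedulerState:
    """Hypothetical scheduler-side store for adaptive settings.

    There is exactly one slot, so whoever writes last wins, and the settings
    live as long as the scheduler does, regardless of which client wrote them.
    """

    def __init__(self) -> None:
        self.settings = None
        self.owner = None

    def set_adaptive(self, settings: AdaptiveSettings, owner: str) -> None:
        # Replace, rather than add to, any previous settings.
        self.settings = settings
        self.owner = owner

    def target(self, desired: int):
        """Return the clamped worker target, or None if adaptivity is off."""
        if self.settings is None:
            return None
        return max(self.settings.minimum, min(self.settings.maximum, desired))


state = ToySchedulerState()
state.set_adaptive(AdaptiveSettings(minimum=1, maximum=4), owner="alice")
state.set_adaptive(AdaptiveSettings(minimum=2, maximum=10), owner="bob")
# Alice's laptop can now shut down; the scheduler still has Bob's settings.
print(state.owner, state.target(desired=20))  # bob 10
```

Compared with the client-side loop, there is never more than one controller to disagree with, and disconnecting a laptop does not stop scaling.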

Adaptivity should be managed on the scheduler #9307