This is a follow-on to a point mentioned in #4263.
Perhaps, as a longer-term goal, adaptivity should be handled entirely by the scheduler, via a scheduler plugin.
Right now, adaptivity can run from anywhere a `Cluster` object exists.
Normally the only place a `Cluster` object exists is the laptop (etc.) that first connected to the cluster.
What it does is run a background task that periodically talks to the scheduler and checks whether it would like more workers (or fewer).
It might want more because some workers have died due to OOM, or perhaps been taken offline.
This happens, for example, with Fargate Spot using dask-cloudprovider: the scheduler will never be taken offline, but the workers might.
And without adaptivity, no new ones ever come back to replace them.
It might also be that there is a lot of work and so it would like more workers; that case isn't so problematic, but it is still a key feature of adaptivity.
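As a rough illustration of the check described above, a single pass of that background task could be sketched like this. The helper `workers_to_request` and its parameters are hypothetical names made up for this example, not part of distributed; it only shows the clamping idea, under the assumption that the scheduler reports how many workers it would like.

```python
def workers_to_request(alive: int, desired: int, minimum: int, maximum: int) -> int:
    """Return how many workers to add (positive) or retire (negative).

    Hypothetical helper for illustration only; not a distributed API.
    """
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    # Clamp what the scheduler would like to the range adaptivity may use.
    target = max(minimum, min(desired, maximum))
    return target - alive


# Two of four Spot workers were reclaimed while the workload still wants four.
replace_lost = workers_to_request(alive=2, desired=4, minimum=1, maximum=10)
# A burst of work wants 25 workers, but the range is capped at 10.
scale_up = workers_to_request(alive=4, desired=25, minimum=1, maximum=10)
# The work has drained, so shrink back to the minimum.
scale_down = workers_to_request(alive=10, desired=0, minimum=1, maximum=10)
```

If nothing runs this check, the first case is the painful one: the lost workers are simply never replaced.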
So having adaptivity run from wherever the cluster was first connected to is kind of annoying, because it is pretty important.
If you have multiple people separately connecting to the cluster, all running adaptivity, that can do weird things I think, like provisioning workers that are not needed.
And if no one is running it, you have problems.
Most likely, in a shared cluster you would have one person who set up the cluster and actually knows the correct settings adaptivity can range over, and most users shouldn't touch it.
In my particular (ab)use case, the originating laptop will often be shut down without waiting for results: we use `fire_and_forget`, because we know that, as a side effect of the last job that gets scheduled, the result we are looking for gets written out to a database. But right now the laptop has to keep running, and has to stay able to connect to the cluster, just to run the adaptivity.
My current planned workaround is to start a separate process on the same machine that is running the scheduler, have it connect to the cluster (which for it is `localhost`), and then have it just run the adaptivity.
An alternative I have considered is the same thing (connect to the existing cluster, then run _adaptivity_), but actually running it in the scheduler's process via `client.run_on_scheduler`.
But I feel like the ideal solution is for this to just always run on the scheduler.
It would probably be configured by a `SchedulerPlugin`, which the user can always replace, if needed, with a new instance of the plugin with different settings.
Alternatively, it is not a plugin, but something whose settings the user can change by connecting to the scheduler (so calling `cluster.adaptivity` would connect to the scheduler and change its target).
Either way, whoever sets it last should win, rather than both running at the same time, managed from different people's laptops etc.
And adaptivity should continue running even if the laptop is no longer running.
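To make the "whoever sets it last wins" idea concrete, here is a minimal sketch of a single scheduler-side owner of the settings. Everything in it (`AdaptiveSettings`, `SchedulerSideAdaptive`, `configure`, `target`) is a hypothetical name invented for illustration and not an existing distributed API; it assumes one control loop lives next to the scheduler and clients only ever replace its settings.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptiveSettings:
    minimum: int
    maximum: int
    set_by: str  # who last changed the settings, kept for debugging


class SchedulerSideAdaptive:
    """Hypothetical single owner of the adaptive settings (not a distributed API)."""

    def __init__(self, settings: AdaptiveSettings):
        self._lock = threading.Lock()
        self._settings = settings

    def configure(self, minimum: int, maximum: int, set_by: str) -> None:
        # Replace the settings wholesale: the most recent caller wins.
        new = AdaptiveSettings(minimum, maximum, set_by)
        with self._lock:
            self._settings = new

    def target(self, desired: int) -> int:
        # The single control loop reads whatever settings are current.
        with self._lock:
            s = self._settings
        return max(s.minimum, min(desired, s.maximum))


adaptive = SchedulerSideAdaptive(AdaptiveSettings(1, 10, set_by="cluster-admin"))
adaptive.configure(minimum=0, maximum=50, set_by="alice")
adaptive.configure(minimum=2, maximum=20, set_by="bob")  # set last, so this wins
capped = adaptive.target(desired=100)
floor = adaptive.target(desired=0)
```

Because there is only one loop and one copy of the settings, two users never drive scaling at the same time, and nothing depends on either laptop staying connected.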