strategy refactoring by Arthur-Prince · Pull Request #4082 · Parsl/parsl

Arthur-Prince · 2026-02-24T05:57:51Z

Description

This PR refactors _general_strategy to reduce technical debt and improve readability.

Each scaling case (Case 1, 2, 4a, 4b) was extracted into its own helper function, and the initialization logic was consolidated into a dedicated init_strategy helper to remove duplication.

This is a structural refactor only. The goal is to make the strategy easier to understand. I believe this will also simplify future work if the strategy becomes a parsl module.

Changed Behaviour

No behavior changes. Scaling decisions and thresholds remain exactly the same.

Fixes

N/A

Type of change

Code maintenance/cleanup

benclifford · 2026-02-25T11:30:03Z

I feel like this makes the code less understandable to me. Look at all these new functions which take some of their state from self in an object oriented style but some of their state from long parameter lists in a more functional style.

Why can't all these new methods be plain functions with no access to self? Or, alternatively, why can't all the state live in self so that nobody needs parameters?

Arthur-Prince · 2026-02-25T19:38:48Z

I agree that it looks a bit awkward for the functions to have so many parameters.

My initial idea for structuring it this way was thinking ahead to a future refactor where strategy becomes a Parsl plugin. I thought it would be easier to move the functions around if they were more decoupled from object state.

For the remaining parameters, the main reason I avoided putting them in self is that their state changes on every strategy execution. One alternative could be to introduce something like an ExecutorState object that encapsulates this evolving state, with an update_state() function returning the refreshed snapshot. I think that would be better.
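For illustration, the ExecutorState idea could look roughly like the sketch below. This is a minimal sketch, not code from this PR: the field names, the update_state signature, and the simplified slot calculation are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorState:
    """Hypothetical per-iteration snapshot; field names are illustrative."""
    active_tasks: int
    active_slots: int
    running_blocks: int
    pending_blocks: int


def update_state(active_tasks: int, tasks_per_block: int,
                 running_blocks: int, pending_blocks: int) -> ExecutorState:
    """Build a fresh snapshot instead of mutating attributes on self.

    Assumption: slots are simply (running + pending blocks) * tasks_per_block.
    """
    active_slots = (running_blocks + pending_blocks) * tasks_per_block
    return ExecutorState(active_tasks, active_slots, running_blocks, pending_blocks)


# Each strategy run would take a new immutable snapshot and pass it around.
state = update_state(active_tasks=10, tasks_per_block=4,
                     running_blocks=2, pending_blocks=1)
print(state.active_slots)  # 12
```

Because the snapshot is frozen, a helper that receives it cannot accidentally change the state seen by the next helper.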

My main motivation overall was that, while writing the unit tests, it was quite difficult to reason about the possible strategy cases, so I tried to make it more explicit.

I’ll make these changes and then you can let me know what you think.

WardLT · 2026-02-26T13:17:40Z

Thanks for taking a look at this, @Arthur-Prince! I'll aim to get some time to review it this weekend.

Arthur-Prince · 2026-03-04T02:52:09Z

Did you get a chance to look? What do you think?

WardLT · 2026-03-04T13:23:30Z

Thanks for following up. Sorry, I haven't gotten to it yet.

WardLT

I think it's good to go, but I do have another request, if you have bandwidth, since you just reviewed this code's internals:

Could you document what the options are and how they differ in the elasticity documentation? A brief section after Parallelism would be excellent.

Does that work for you? If not, I'll just merge as-is.

WardLT · 2026-03-06T00:05:37Z

parsl/jobs/strategy.py

+        logger.debug('%s Executor %d active tasks, %d active slots, and %d/%d running/pending blocks',
+                     prefix, self.executors[label].active_tasks, self.executors[label].active_slots, running, pending)
+
+    def _case_1_no_tasks(self, executor: BlockProviderExecutor, prefix: str,) -> None:


Would you mind adding docstrings to these functions? Something brief is fine.

WardLT · 2026-03-06T00:11:00Z

Also, I do agree that the class doesn't make much sense, but that might be a large change and the plugins will need some more thought. Adjusting the functions and - ideally - documenting expected behavior moves us towards that direction.

Arthur-Prince · 2026-03-07T06:43:23Z

> Could you document what the options are and how they differ in the elasticity documentation? A brief section after Parallelism would be excellent.

I can do that, but it might take me about a week.

> Would you mind adding docstrings to these functions? Something brief is fine.

Sure, I'll add them.

> Also, I do agree that the class doesn't make much sense, but that might be a large change and the plugins will need some more thought. Adjusting the functions and - ideally - documenting expected behavior moves us towards that direction.

Yes, that was also the direction we were planning on PR #4075, but in smaller and easier-to-review steps.

Arthur-Prince · 2026-03-08T05:26:58Z

I forgot to mention one behavior change related to case 1.

Previously, scale-in would only happen when the executor idle duration
was strictly greater than max_idletime. I changed this to >=
because when max_idletime == 0 the scale-in would sometimes not
trigger.

In practice this edge case was previously masked by the
assert idle_since is not None, which made the idle duration rarely
evaluate exactly to zero. After removing that assert during the
refactoring in PR #4075, some tests started failing because scale-in
was not triggered as expected.
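The difference between the two comparisons can be shown in isolation. This is a minimal sketch: the function names are hypothetical and only the `>` versus `>=` comparison is taken from the description of the change.

```python
def should_scale_in_strict(idle_duration: float, max_idletime: float) -> bool:
    """Previous comparison: idle time must exceed the limit."""
    return idle_duration > max_idletime


def should_scale_in_inclusive(idle_duration: float, max_idletime: float) -> bool:
    """Comparison after the change: reaching the limit is enough."""
    return idle_duration >= max_idletime


# With max_idletime == 0, an idle duration measured as exactly 0.0
# only triggers scale-in under the inclusive comparison.
print(should_scale_in_strict(0.0, 0.0))     # False
print(should_scale_in_inclusive(0.0, 0.0))  # True
```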

WardLT

I'd like to propose a different direction for refactoring: getting rid of the logic where we implicitly define the strategy_type passed between functions by choosing one of the wrapper functions (e.g., _strategy_simple), and instead making strategy_type a class attribute.

Setting strategy_type as an attribute means it will be controlled in the same way as any of the other settings (like max_idletime).

That would also make strategize a formal class method that can be reflected in the API docs.
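One way to picture this proposal is the sketch below. It is an illustration under stated assumptions, not the Parsl implementation: the class name StrategySketch and the string return values are placeholders, and only the strategy names (None, "none", "simple", "htex_auto_scale") come from the existing dispatch table.

```python
from typing import Optional

# Strategy names taken from the existing dispatch table.
VALID_STRATEGIES = (None, "none", "simple", "htex_auto_scale")


class StrategySketch:
    """Hypothetical stand-in for Strategy with strategy_type as an attribute."""

    def __init__(self, strategy: Optional[str], max_idletime: float) -> None:
        if strategy not in VALID_STRATEGIES:
            raise ValueError(f"Unknown strategy: {strategy!r}")
        # Stored next to the other settings instead of being implied
        # by which wrapper function was selected.
        self.strategy_type = strategy
        self.max_idletime = max_idletime

    def strategize(self) -> str:
        """Single public entry point; returns a placeholder description."""
        if self.strategy_type in (None, "none"):
            return "init only"
        if self.strategy_type == "simple":
            return "general strategy"
        return "general strategy with HTEX-specific scale-in"


print(StrategySketch("simple", max_idletime=120.0).strategize())
```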

WardLT · 2026-03-11T23:45:45Z

parsl/jobs/strategy.py

+        capacity and requests scaling in by that amount, while respecting the
+        executor's minimum block limit.
+        """
+        executor_state = self.executors[executor.label]


I'm with @benclifford on making executor state an argument. The first line of each of these functions always looks up the state, so accessing the state should be pulled out of these functions and done elsewhere.

WardLT · 2026-03-11T23:58:33Z

parsl/jobs/strategy.py

+        else:
+            logger.debug("%s Not requesting any blocks, because at maxblocks already", prefix)
+
+    def _case_4b_more_slots_than_tasks(self, executor: BlockProviderExecutor, prefix: str, strategy_type: str) -> None:


If this PR is a step towards #4075, let's make it very clear what the arguments to these functions are. Each is slightly different: some take strategy_type and some do not.

WardLT · 2026-03-12T00:06:28Z

parsl/jobs/strategy.py

 class Strategy:
    """Scaling strategy.

    As a workflow dag is processed by Parsl, new tasks are added and completed


Regarding refactoring the docs, all of this seems unrelated to what a "Strategy" class does and how to use it.

Arthur-Prince · 2026-03-12T02:30:37Z

I'll try to clarify the direction I had in mind.

My current understanding of the long-term goal is that Strategy should eventually become a plugin-like component. In that model I was imagining something like:

- a Strategy interface
- InitOnlyStrategy implementing Strategy
- SimpleStrategy extending InitOnlyStrategy
- HtexStrategy extending SimpleStrategy

One complication is that today strategy, max_idletime, and strategy_period are parameters of the DFK. In practice, not every strategy needs them. For example:

- InitOnlyStrategy does not use any of those parameters.
- SimpleStrategy might not need max_idletime, since it only performs scale-in when there are no tasks running on the executor.

Because of that, my idea was that those parameters would eventually be passed to the constructors of each concrete strategy class instead of being handled centrally.
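The hierarchy and per-class constructor parameters described here could be sketched as follows. This is a toy illustration only: the strategize signature, the block-delta return value, and the scaling rules are assumptions, and real scaling would also need to respect block limits.

```python
from abc import ABC, abstractmethod


class Strategy(ABC):
    """Hypothetical interface; the method signature is an assumption."""

    @abstractmethod
    def strategize(self, active_tasks: int, active_blocks: int) -> int:
        """Return the change in block count (negative means scale in)."""


class InitOnlyStrategy(Strategy):
    """Needs no constructor parameters and never rescales."""

    def strategize(self, active_tasks: int, active_blocks: int) -> int:
        return 0


class SimpleStrategy(InitOnlyStrategy):
    """Toy rule: scale in only when the executor has no tasks."""

    def strategize(self, active_tasks: int, active_blocks: int) -> int:
        return -active_blocks if active_tasks == 0 else 0


class HtexStrategy(SimpleStrategy):
    """Only this class receives max_idletime, via its own constructor."""

    def __init__(self, max_idletime: float) -> None:
        self.max_idletime = max_idletime


print(SimpleStrategy().strategize(active_tasks=0, active_blocks=3))  # -3
```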

PR #4075 proposes migrating the strategy from the DFK to the BlockProviderExecutor. That migration already helps move toward that design. Another possibility might be migrating the strategy into the provider itself, but that would likely require reimplementing things such as scale_in, scale_out, poll_facade, and handle_errors in the provider, and I'm not sure if that is feasible.

So for now, the goal of this PR is only to clean up the existing code without changing behavior. The intention is to make it easier to later introduce the plugin-style strategy classes (InitOnlyStrategy, SimpleStrategy, HtexStrategy).

@benclifford mentioned that smaller PRs are easier to review, which is why some changes that I think should eventually happen are not included yet. For example:

- the logging messages could be improved
- function names could be clearer
- the following dispatch structure should eventually disappear:

self.strategies = {
    None: self._strategy_init_only,
    "none": self._strategy_init_only,
    "simple": self._strategy_simple,
    "htex_auto_scale": self._strategy_htex_auto_scale,
}

since those functions would become separate strategy classes.

There are also a few issues that I plan to address later:

- Case 4a should probably be merged with case 2.
- The prefix parameter likely should not be passed around everywhere (it could become a class attribute, especially if we end up with one strategy object per executor).

The next PR I was planning is to remove the logic where Strategy owns the list of executors, and instead have JobStatusPoller manage that list. This change would require further refactoring of the strategy code.

After that is done, it should become clearer how the Strategy class should actually look. After that step, the strategy will likely still need to be refactored into three classes and one interface.

> I'm with @benclifford on making executor state an argument. The first line of each of these functions always looks up the state, so accessing the state should be pulled out of these functions and done elsewhere.

I agree this makes sense. My initial plan was to do that when implementing the PR that removes the executor list from Strategy. At that stage I was also planning to remove executor: BlockProviderExecutor from several function parameters.

> If this PR is a step towards #4075, let's make it very clear what the arguments to these functions are. Each is slightly different: some take strategy_type and some do not.

Currently strategy_type is only used in case 4b, because that branch performs an additional check to see whether the executor is an instance of HighThroughputExecutor.

The reason is that only HTEX supports a "smart" scale-in, where the block that has been idle the longest (based on max_idletime) can be removed.
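The check in case 4b can be sketched in isolation as below. The stub classes stand in for Parsl's BlockProviderExecutor and HighThroughputExecutor so the example runs without Parsl installed, and the helper name is hypothetical.

```python
class BlockProviderExecutorStub:
    """Stand-in for parsl's BlockProviderExecutor (assumption)."""


class HighThroughputExecutorStub(BlockProviderExecutorStub):
    """Stand-in for parsl's HighThroughputExecutor (assumption)."""


def supports_idle_block_scale_in(executor: BlockProviderExecutorStub,
                                 strategy_type: str) -> bool:
    """True only for the HTEX strategy running on an HTEX-like executor."""
    return (strategy_type == "htex_auto_scale"
            and isinstance(executor, HighThroughputExecutorStub))


print(supports_idle_block_scale_in(HighThroughputExecutorStub(), "htex_auto_scale"))  # True
print(supports_idle_block_scale_in(BlockProviderExecutorStub(), "htex_auto_scale"))   # False
print(supports_idle_block_scale_in(HighThroughputExecutorStub(), "simple"))           # False
```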

> Regarding refactoring the docs, all of this seems unrelated to what a "Strategy" class does and how to use it.

Those comments were mostly useful for helping me understand how the class works internally. I agree they are not particularly strong as API documentation.

The only actual mistake in the example is that it is missing max_workers_per_node=2, but I think we can change that when we create the module.

It might be clearer if I first submit a PR that moves toward having one executor per strategy object, and then perform the deeper refactoring of the class afterwards.

If you prefer, I can follow that order instead.

For this PR specifically, I think the improvements I can make are:

- rename some functions
- improve the logging messages
- merge case 2 with case 4a

All while keeping behavior unchanged.

strategy refactoring

e4f9730

refactoring update_state

4a8901a

WardLT self-assigned this Feb 26, 2026

WardLT reviewed Mar 6, 2026

Add docstrings for strategy case functions

d0f7696

WardLT requested changes Mar 12, 2026
