
strategy refactoring #4082

Open
Arthur-Prince wants to merge 3 commits into Parsl:master from Arthur-Prince:strategy_refactoring

@Arthur-Prince
Contributor

Description

This PR refactors _general_strategy to reduce technical debt and improve readability.

Each scaling case (Cases 1, 2, 4a, and 4b) was extracted into its own helper function, and the initialization logic was consolidated into a dedicated init_strategy helper to remove duplication.

This is a structural refactor only. The goal is to make the strategy easier to understand. I believe this will also simplify future work if the strategy becomes a Parsl module.

Changed Behaviour

No behavior changes. Scaling decisions and thresholds remain exactly the same.

Fixes

N/A

Type of change

  • Code maintenance/cleanup

@benclifford
Collaborator

I feel like this makes the code less understandable to me. Look at all these new functions, which take some of their state from self in an object-oriented style but some of it from long parameter lists in a more functional style.

Why can't all these new methods be plain functions with no access to self at all? Or, going the other way, why can't all the state live in self so that none of them take parameters?

@Arthur-Prince
Contributor Author

I agree that it looks a bit awkward for the functions to have so many parameters.

My initial reason for structuring it this way was to prepare for a future refactor in which the strategy becomes a Parsl plugin. I thought the functions would be easier to move around if they were more decoupled from object state.

For the remaining parameters, the main reason I avoided putting them in self is that their values change on every strategy execution. One alternative could be to introduce something like an ExecutorState object that encapsulates this evolving state, with an update_state() function that returns the refreshed snapshot. I think that would be better.
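A minimal sketch of the ExecutorState / update_state() idea, in case it helps the discussion. Only those two names come from the comment; the fields, the pure-function signature, and the simplified idle-timer rule (here the timer starts whenever there are no tasks, ignoring min_blocks) are assumptions for illustration, not Parsl code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExecutorState:
    """Hypothetical snapshot of the per-executor values that change on every strategy run."""

    active_tasks: int
    running_blocks: int
    pending_blocks: int
    idle_since: Optional[float] = None  # when the executor became idle, if it is idle

    @property
    def active_blocks(self) -> int:
        return self.running_blocks + self.pending_blocks


def update_state(previous: ExecutorState, active_tasks: int, running_blocks: int,
                 pending_blocks: int, now: float) -> ExecutorState:
    """Return a refreshed snapshot rather than mutating the previous one."""
    if active_tasks > 0:
        idle_since = None  # any outstanding work resets the idle timer
    elif previous.idle_since is None:
        idle_since = now  # the executor has just become idle
    else:
        idle_since = previous.idle_since  # still idle: keep the original start time
    return ExecutorState(active_tasks, running_blocks, pending_blocks, idle_since)


# Each strategy pass builds a new snapshot from the previous one.
s0 = ExecutorState(active_tasks=0, running_blocks=2, pending_blocks=0)
s1 = update_state(s0, active_tasks=0, running_blocks=2, pending_blocks=0, now=100.0)
s2 = update_state(s1, active_tasks=0, running_blocks=2, pending_blocks=0, now=130.0)
print(s2.idle_since)  # 100.0: the timer started on the first idle pass
```

Because the snapshot is frozen, the case helpers could take it as a plain argument without being able to mutate shared state behind the strategy's back.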

My main motivation overall was that, while writing the unit tests, I found it quite difficult to reason about the possible strategy cases, so I tried to make them more explicit.

I’ll make these changes and then you can let me know what you think.

@WardLT WardLT self-assigned this Feb 26, 2026
@WardLT
Contributor

WardLT commented Feb 26, 2026

Thanks for taking a look at this, @Arthur-Prince! I'll aim to find some time to review it this weekend.

@Arthur-Prince
Contributor Author

Have you had a chance to look? What do you think?

@WardLT
Contributor

WardLT commented Mar 4, 2026

Thanks for following up. Sorry, I haven't gotten to it yet.

Contributor

@WardLT WardLT left a comment


I think it's good to go, but I do have another request, if you have the bandwidth, since you just reviewed this code's internals:

Could you document what the options are and how they differ in the elasticity documentation? A brief section after Parallelism would be excellent.

Does that work for you? If not, I'll just merge as-is.

```python
        logger.debug('%s Executor %d active tasks, %d active slots, and %d/%d running/pending blocks',
                     prefix, self.executors[label].active_tasks, self.executors[label].active_slots, running, pending)

    def _case_1_no_tasks(self, executor: BlockProviderExecutor, prefix: str,) -> None:
```
Contributor


Would you mind adding docstring to these functions? Something brief is fine.

@WardLT
Contributor

WardLT commented Mar 6, 2026

Also, I do agree that the class doesn't make much sense, but that might be a large change and the plugins will need some more thought. Adjusting the functions and - ideally - documenting expected behavior moves us towards that direction.

@Arthur-Prince
Contributor Author

> Could you document what the options are and how they differ in the elasticity documentation? A brief section after Parallelism would be excellent.

I can do that, but it might take me about a week.

> Would you mind adding docstring to these functions? Something brief is fine.

Sure, I'll add them.

> Also, I do agree that the class doesn't make much sense, but that might be a large change and the plugins will need some more thought. Adjusting the functions and - ideally - documenting expected behavior moves us towards that direction.

Yes, that was also the direction we were planning in PR #4075, but in smaller, easier-to-review steps.

@Arthur-Prince
Contributor Author

I forgot to mention one behavior change related to case 1.

Previously, scale-in would only happen when the executor's idle duration was strictly greater than max_idletime. I changed this to >= because, when max_idletime == 0, scale-in would sometimes not trigger.

In practice, this edge case was previously masked by the assert idle_since is not None, which meant the idle duration rarely evaluated to exactly zero. After that assert was removed during the refactoring in PR #4075, some tests started failing because scale-in was not triggered as expected.
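To make the edge case concrete, here is a small sketch comparing the two comparisons. The function names are hypothetical; only the > versus >= change and the max_idletime == 0 scenario come from the comment above.

```python
def should_scale_in_old(idle_duration: float, max_idletime: float) -> bool:
    """Comparison before this PR: the idle duration must strictly exceed max_idletime."""
    return idle_duration > max_idletime


def should_scale_in_new(idle_duration: float, max_idletime: float) -> bool:
    """Comparison after this PR: reaching max_idletime is enough."""
    return idle_duration >= max_idletime


# If the idle timer is started and read within the same strategy pass, two
# consecutive time.time() calls can return the same value, giving an idle
# duration of exactly 0.0.
idle_duration = 0.0
print(should_scale_in_old(idle_duration, 0))  # False: scale-in is skipped
print(should_scale_in_new(idle_duration, 0))  # True: scale-in triggers
```

For any positive max_idletime the two versions differ only when the duration lands exactly on the threshold, so the change is mainly visible at max_idletime == 0.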

Copy link
Contributor

@WardLT WardLT left a comment


I'd like to propose a different direction for refactoring: get rid of the logic where the strategy_type passed between functions is defined implicitly by which wrapper function we choose (e.g., _strategy_simple), and instead make strategy_type a class attribute.

Making strategy_type an attribute means it is controlled in the same way as the other settings (like max_idletime).

That would also make strategize a formal method of the class that can be reflected in the API docs.
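A minimal sketch of the class-attribute approach, assuming the accepted values mirror the keys of the existing self.strategies dispatch dictionary. Returning string labels instead of actually scaling is purely illustrative, and this is not Parsl's real Strategy class.

```python
from typing import Dict, List, Optional

KNOWN_STRATEGY_TYPES = (None, "none", "simple", "htex_auto_scale")


class Strategy:
    """Hypothetical sketch: strategy_type is an ordinary setting, like max_idletime."""

    def __init__(self, strategy_type: Optional[str], max_idletime: float) -> None:
        if strategy_type not in KNOWN_STRATEGY_TYPES:
            raise ValueError(f"unknown strategy_type: {strategy_type!r}")
        self.strategy_type = strategy_type
        self.max_idletime = max_idletime

    def strategize(self, executor_labels: List[str]) -> Dict[str, str]:
        """Single public entry point, so it can be documented in the API docs."""
        decisions: Dict[str, str] = {}
        for label in executor_labels:
            if self.strategy_type in (None, "none"):
                decisions[label] = "init_only"
            else:
                decisions[label] = self._general_strategy()
        return decisions

    def _general_strategy(self) -> str:
        # Case 4b can read self.strategy_type directly instead of receiving it
        # from whichever wrapper method happened to be selected.
        if self.strategy_type == "htex_auto_scale":
            return "general+htex_scale_in"
        return "general"


print(Strategy("htex_auto_scale", max_idletime=60.0).strategize(["htex"]))
# {'htex': 'general+htex_scale_in'}
```

Validating strategy_type in the constructor also surfaces a typo at configuration time rather than on the first strategy pass.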

```python
        capacity and requests scaling in by that amount, while respecting the
        executor's minimum block limit.
        """
        executor_state = self.executors[executor.label]
```
Contributor


I'm with @benclifford on making executor state an argument. The first line of each of these functions is always to lookup the state. So, accessing the state should be pulled out of these classes and used elsewhere.
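A minimal sketch of what pulling the lookup out could look like: the state is fetched once per executor in the loop and handed to the case helper as an argument. The trimmed-down ExecutorState, the helper names, and the omission of the idle-time check and the other cases are all assumptions made for brevity.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ExecutorState:
    """Hypothetical, trimmed-down per-executor state."""

    active_tasks: int
    active_blocks: int
    min_blocks: int


def case_1_no_tasks(state: ExecutorState) -> int:
    """Blocks to scale in when there are no tasks (idle-time check omitted for brevity)."""
    return max(state.active_blocks - state.min_blocks, 0)


def strategize(states: Dict[str, ExecutorState]) -> Dict[str, int]:
    """Look up each executor's state once, then pass it to the case helper."""
    scale_in: Dict[str, int] = {}
    for label, state in states.items():  # the only place state is looked up
        if state.active_tasks == 0:
            scale_in[label] = case_1_no_tasks(state)
        else:
            scale_in[label] = 0  # the remaining cases are left out of this sketch
    return scale_in


print(strategize({"idle": ExecutorState(0, 3, 1), "busy": ExecutorState(5, 3, 1)}))
# {'idle': 2, 'busy': 0}
```

With this shape the case helpers no longer need self or the executor object, which also makes them straightforward to unit test.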

```python
        else:
            logger.debug("%s Not requesting any blocks, because at maxblocks already", prefix)

    def _case_4b_more_slots_than_tasks(self, executor: BlockProviderExecutor, prefix: str, strategy_type: str) -> None:
```
Contributor


If this PR's a step towards #4075, let's make it very clear what the arguments to these functions are. Each are slightly different. Some w/ and w/o the strategy_type.

```python
class Strategy:
    """Scaling strategy.

    As a workflow dag is processed by Parsl, new tasks are added and completed
```
Contributor


Regarding refactoring the docs, all of this seems unrelated to what a "Strategy" class does and how to use it.

@Arthur-Prince
Contributor Author

Arthur-Prince commented Mar 12, 2026

I'll try to clarify the direction I had in mind.

My current understanding of the long-term goal is that Strategy should eventually become a plugin-like component. In that model I was imagining something like:

  • a Strategy interface
  • InitOnlyStrategy implementing Strategy
  • SimpleStrategy extending InitOnlyStrategy
  • HtexStrategy extending SimpleStrategy

One complication is that strategy, max_idletime, and strategy_period are currently parameters of the DFK. In practice, not every strategy needs them. For example:

  • InitOnlyStrategy does not use any of those parameters.
  • SimpleStrategy might not need max_idletime, since it only performs scale-in when there are no tasks running on the executor.

Because of that, my idea was that those parameters would eventually be passed to the constructors of each concrete strategy class instead of being handled centrally.
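A rough sketch of the proposed hierarchy, with each concrete class taking only the constructor parameters it needs. The class names come from the list above; the strategize() signature, the block-delta return convention, and the decision rules are hypothetical, and scale-out plus HTEX's idle-aware scale-in are deliberately left out.

```python
from abc import ABC, abstractmethod


class Strategy(ABC):
    """Hypothetical interface: every concrete strategy decides a block delta."""

    @abstractmethod
    def strategize(self, active_tasks: int, active_blocks: int, min_blocks: int) -> int:
        """Return >0 to scale out, <0 to scale in, or 0 for no change."""


class InitOnlyStrategy(Strategy):
    """No constructor parameters: after the initial blocks, nothing changes."""

    def strategize(self, active_tasks: int, active_blocks: int, min_blocks: int) -> int:
        return 0


class SimpleStrategy(InitOnlyStrategy):
    """Scales in only when the executor has no tasks, so this sketch takes no max_idletime."""

    def strategize(self, active_tasks: int, active_blocks: int, min_blocks: int) -> int:
        if active_tasks == 0 and active_blocks > min_blocks:
            return min_blocks - active_blocks
        return 0  # scale-out logic omitted from this sketch


class HtexStrategy(SimpleStrategy):
    """The only class here whose constructor takes max_idletime."""

    def __init__(self, max_idletime: float) -> None:
        # Would be forwarded to HTEX's idle-aware scale-in, which is not modelled here.
        self.max_idletime = max_idletime


strategies = [InitOnlyStrategy(), SimpleStrategy(), HtexStrategy(max_idletime=30.0)]
print([s.strategize(active_tasks=0, active_blocks=3, min_blocks=1) for s in strategies])
# [0, -2, -2]
```

Passing max_idletime only to HtexStrategy is what would let the DFK stop carrying parameters that most strategies ignore.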

PR #4075 proposes migrating the strategy from the DFK to the BlockProviderExecutor. That migration already helps move toward that design. Another possibility might be migrating the strategy into the provider itself, but that would likely require reimplementing things such as scale_in, scale_out, poll_facade, and handle_errors in the provider, and I'm not sure if that is feasible.

So for now, the goal of this PR is only to clean up the existing code without changing behavior. The intention is to make it easier to later introduce the plugin-style strategy classes (InitOnlyStrategy, SimpleStrategy, HtexStrategy).

@benclifford mentioned that smaller PRs are easier to review, which is why some changes that I think should eventually happen are not included yet. For example:

  • the logging messages could be improved
  • function names could be clearer
  • the following dispatch structure should eventually disappear, since those functions would become separate strategy classes:

```python
self.strategies = {
    None: self._strategy_init_only,
    "none": self._strategy_init_only,
    "simple": self._strategy_simple,
    "htex_auto_scale": self._strategy_htex_auto_scale,
}
```

There are also a few issues that I plan to address later:

  • Case 4a should probably be merged with case 2
  • The prefix parameter likely should not be passed around everywhere (it could become a class attribute, especially if we end up with one strategy object per executor).

The next PR I was planning would remove the logic in which Strategy owns the list of executors and instead have JobStatusPoller manage that list. This change would require further refactoring of the strategy code.

Once that is done, it should become clearer what the Strategy class should actually look like. Even then, the strategy will likely still need to be refactored into three classes and one interface.
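A minimal sketch of the "one strategy object per executor" layout, with the poller owning the executor list and the prefix stored once as an attribute. PollerSketch and PerExecutorStrategy are hypothetical stand-ins, not Parsl's JobStatusPoller or Strategy, and the prefix format is invented.

```python
from typing import Dict, List


class PerExecutorStrategy:
    """Hypothetical: one strategy object per executor, so prefix is built once."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.prefix = f"[{label}]"  # illustrative format, not Parsl's

    def strategize(self, active_tasks: int) -> str:
        # Helpers can use self.prefix instead of taking prefix as a parameter.
        return f"{self.prefix} {active_tasks} active tasks"


class PollerSketch:
    """Hypothetical stand-in for a JobStatusPoller that owns the executor list."""

    def __init__(self) -> None:
        self.strategies: Dict[str, PerExecutorStrategy] = {}

    def add_executor(self, label: str) -> None:
        self.strategies[label] = PerExecutorStrategy(label)

    def poll(self, active_tasks: Dict[str, int]) -> List[str]:
        # The poller iterates over executors; each strategy sees only its own.
        return [s.strategize(active_tasks[label]) for label, s in self.strategies.items()]


poller = PollerSketch()
poller.add_executor("htex")
poller.add_executor("threads")
print(poller.poll({"htex": 3, "threads": 0}))
# ['[htex] 3 active tasks', '[threads] 0 active tasks']
```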


> I'm with @benclifford on making executor state an argument. The first line of each of these functions is always to lookup the state. So, accessing the state should be pulled out of these classes and used elsewhere.

I agree this makes sense. My initial plan was to do that when implementing the PR that removes the executor list from Strategy. At that stage, I was also planning to remove the executor: BlockProviderExecutor parameter from several functions.

> If this PR's a step towards #4075, let's make it very clear what the arguments to these functions are. Each are slightly different. Some w/ and w/o the strategy_type.

Currently strategy_type is only used in case 4b, because that branch performs an additional check to see whether the executor is an instance of HighThroughputExecutor.

The reason is that only HTEX supports a "smart" scale-in, where the block that has been idle the longest (based on max_idletime) can be removed.
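A sketch of the Case 4b guard described above, written as a plain function. The executor classes are empty stand-ins rather than Parsl imports, "htex" is an assumed marker value for the htex_auto_scale strategy, slots_per_block is a hypothetical parameter, and the parallelism factor is left out; only the strategy_type check, the isinstance(HighThroughputExecutor) check, and the minimum-block bound come from the discussion.

```python
class BlockProviderExecutor:
    """Stand-in for Parsl's base class (not the real import)."""


class HighThroughputExecutor(BlockProviderExecutor):
    """Stand-in for Parsl's HTEX (not the real import)."""


def case_4b_more_slots_than_tasks(executor: BlockProviderExecutor, strategy_type: str,
                                  active_slots: int, active_tasks: int, slots_per_block: int,
                                  active_blocks: int, min_blocks: int) -> int:
    """Return how many blocks to scale in; 0 means no action on this path."""
    if strategy_type != "htex":  # assumed marker value for the htex_auto_scale strategy
        return 0  # other strategies do not scale in here
    if not isinstance(executor, HighThroughputExecutor):
        return 0  # only HTEX can choose which idle blocks to remove
    excess_blocks = (active_slots - active_tasks) // slots_per_block
    # Never go below the executor's minimum block count.
    return max(min(excess_blocks, active_blocks - min_blocks), 0)


print(case_4b_more_slots_than_tasks(HighThroughputExecutor(), "htex", active_slots=8,
                                    active_tasks=2, slots_per_block=2,
                                    active_blocks=4, min_blocks=1))  # 3
```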

> Regarding refactoring the docs, all of this seems unrelated to what a "Strategy" class does and how to use it.

Those comments were mostly useful for helping me understand how the class works internally. I agree they are not particularly strong as API documentation.

The only actual mistake in the example is that it is missing max_workers_per_node=2, but I think we can fix that when we create the module.


It might be clearer if I first submit a PR that moves toward having one executor per strategy object, and then perform the deeper refactoring of the class.

If you prefer, I can follow that order instead.

For this PR specifically, I think the improvements I can make are:

  • rename some functions
  • improve the logging messages
  • merge case 2 with case 4a

All while keeping behavior unchanged.



3 participants