coll: CSEL redesign#7547

Draft
hzhou wants to merge 57 commits into pmodels:main from hzhou:2508_csel

hzhou commented Aug 25, 2025

Pull Request Description

  • coll_algorithms.txt catalogs all collective algorithms and their conditions
  • coll_selection.json specifies the decision tree
  • Named JSON subtrees allow composition and local customization
  • MPIR_CVAR_DUMP_COLL_ALGO_COUNTERS prints a debug summary of algorithm usage

Example output with MPIR_CVAR_DUMP_COLL_ALGO_COUNTERS enabled:

[0] ==== Dump collective algorithm counters ====
[0]          4  MPIR_Bcast_intra_scatter_ring_allgather
[0]         16  MPIDI_POSIX_mpi_bcast_release_gather
[0]          1  MPIR_Reduce_intra_binomial
[0] ==== END collective algorithm counters ====

[skip warnings]

Discussion

Reference: #7544
Also see comments in #7598 and #7666


Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

hzhou force-pushed the 2508_csel branch 3 times, most recently from 23d90b4 to 98abcd7, September 2, 2025 19:00
hzhou force-pushed the 2508_csel branch 17 times, most recently from bc86294 to e3acd5a, September 5, 2025 20:06
hzhou mentioned this pull request Apr 8, 2026
hzhou added 6 commits April 8, 2026 21:09
We will use a single-level JSON for algorithm selection including
device-specific algorithms. Remove the collective ADI for now. We'll add
the mechanism of selecting device-level algorithms later.

gen_coll.py is updated to skip calling MPID_ collectives.

Device collective CVARs are removed.
We will add the mechanism of selecting device-layer algorithms later.
Temporarily comment out the composition code that calls netmod/shm
collectives since we will remove these APIs next.

Some NULL composition functions are removed.
We will replace the device-algorithm selection later at the MPIR layer.
The auto selection should take care of restrictions. Raise an error
rather than fall back.

If the user uses a CVAR to select a specific algorithm, we should check
its restrictions before jumping to the algorithm. We will design a common
fallback handling there.
hzhou and others added 29 commits April 8, 2026 21:09
In addition to prototypes for the various algorithm functions, generate
enums and structs in coll_algos.h as well.

Assume add_prototype won't be called redundantly and keep it simple.
Generate constants for enum MPIR_Csel_coll_type.

Note that we split the entries between intra and inter, e.g.
    MPIR_CSEL_COLL_TYPE__INTRA_BCAST,
    MPIR_CSEL_COLL_TYPE__INTRA_IBCAST,
    MPIR_CSEL_COLL_TYPE__INTER_BCAST,
    MPIR_CSEL_COLL_TYPE__INTER_IBCAST
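To illustrate the split, here is a minimal sketch using the four constants listed above. The helper bcast_coll_type is hypothetical and not part of the generated code; it only shows that each (communicator kind, blocking mode) pair maps to its own entry.

```c
#include <stdbool.h>

/* The four constants named above; the generated enum has many more. */
enum MPIR_Csel_coll_type {
    MPIR_CSEL_COLL_TYPE__INTRA_BCAST,
    MPIR_CSEL_COLL_TYPE__INTRA_IBCAST,
    MPIR_CSEL_COLL_TYPE__INTER_BCAST,
    MPIR_CSEL_COLL_TYPE__INTER_IBCAST
};

/* Hypothetical helper: pick the entry from whether the communicator is
 * an intercommunicator and whether the call is nonblocking. */
static enum MPIR_Csel_coll_type bcast_coll_type(bool is_intercomm, bool is_nonblocking)
{
    if (is_intercomm)
        return is_nonblocking ? MPIR_CSEL_COLL_TYPE__INTER_IBCAST
                              : MPIR_CSEL_COLL_TYPE__INTER_BCAST;
    return is_nonblocking ? MPIR_CSEL_COLL_TYPE__INTRA_IBCAST
                          : MPIR_CSEL_COLL_TYPE__INTRA_BCAST;
}
```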

Temporary patch code to keep the old code buildable.
Define MPIR_Csel_node_s and generate enum MPII_Csel_container_type,
which defines the list of algorithm id constants.

MPIR_Csel_node_s will replace the csel_node_s.
Update load_coll_algos to load coll_algorithms.txt with a conditions
section. Every condition maps to a condition function.

Also generate G.algo_list as a flat list so we can dump the table of
algorithms and generate sequential algorithm IDs.
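A condition-name-to-function mapping can be pictured as a generated lookup table. The sketch below is hypothetical: the reduced coll_sig_t, the two sample conditions, and lookup_condition are illustrative stand-ins, not the names or signatures emitted by the generator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced call signature seen by conditions. */
typedef struct {
    size_t msg_size;
    int comm_size;
} coll_sig_t;

typedef bool (*condition_fn)(const coll_sig_t *sig);

/* Sample conditions (thresholds are illustrative only). */
static bool cond_is_small_msg(const coll_sig_t *sig) { return sig->msg_size < 8192; }
static bool cond_is_pof2(const coll_sig_t *sig)
{
    return sig->comm_size > 0 && (sig->comm_size & (sig->comm_size - 1)) == 0;
}

/* Table a generator could emit from the conditions section. */
static const struct {
    const char *name;
    condition_fn fn;
} condition_table[] = {
    {"is_small_msg", cond_is_small_msg},
    {"is_pof2", cond_is_pof2},
};

/* Resolve a condition name (as written in the JSON) to its function;
 * returns NULL for unknown names so the parser can report an error. */
static condition_fn lookup_condition(const char *name)
{
    for (size_t i = 0; i < sizeof(condition_table) / sizeof(condition_table[0]); i++)
        if (strcmp(condition_table[i].name, name) == 0)
            return condition_table[i].fn;
    return NULL;
}
```

Resolving names once at load time means the selection path only calls function pointers and never compares strings.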
Remove the optional validate_tree and print_tree to facilitate
transitioning to auto-generated parsing routines.

We will add back the debug print routine later.
Add back the debug print routine.
Replace hard-coded parsing routines with an autogenerated lookup table and
subroutines, including MPIR_Coll_algo_names,
MPII_Csel_parse_container_params, MPII_Csel_parse_operator, and
MPII_Csel_run_condition.

Simplify MPIR_Csel_node_s and MPIR_Csel_node_type_e.

The auto-generation from coll_algorithms.txt is in the later commits.
These routines are replaced by condition functions (see previous
commits).
Dump a wrapper function for each algorithm that takes (cont, coll_sig).

Separately declare algorithm prototypes.

Separately declare sched_auto prototypes.
Generate collective implementation functions that assemble coll_sig and
call MPIR_Coll_auto.

Remove or replace the old MPIR_Xxx_impl and MPIR_Xxx_allcomm_auto
interfaces. Their original roles, CVAR selection and JSON selection,
are now handled by MPIR_Coll_auto.
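The flow described above (an implementation function packs coll_sig, the selection layer picks an algorithm, and a per-algorithm wrapper with a uniform (cont, coll_sig) signature unpacks it) can be sketched as follows. All type and function names here are hypothetical simplifications, and coll_auto always picks algorithm ID 0 instead of walking a decision tree.

```c
/* Hypothetical, reduced stand-ins for coll_sig and the container. */
typedef struct {
    void *buffer;
    int count;
    int root;
} coll_sig_bcast_t;

typedef struct {
    int algo_id;   /* algorithm chosen by the selection */
} csel_container_t;

/* An algorithm with its natural argument list; returns 0 on success. */
static int bcast_intra_binomial(void *buffer, int count, int root)
{
    (void) buffer;
    (void) root;
    return count < 0 ? -1 : 0;
}

/* Wrapper with the uniform (cont, coll_sig) signature. */
static int bcast_intra_binomial_wrapper(const csel_container_t *cont,
                                        const coll_sig_bcast_t *sig)
{
    (void) cont;
    return bcast_intra_binomial(sig->buffer, sig->count, sig->root);
}

/* Uniform signatures allow dispatch through a table indexed by ID. */
typedef int (*algo_wrapper_fn)(const csel_container_t *, const coll_sig_bcast_t *);
static const algo_wrapper_fn algo_table[] = { bcast_intra_binomial_wrapper };

/* Stand-in for the selection layer: pretend the tree chose ID 0. */
static int coll_auto(const coll_sig_bcast_t *sig)
{
    csel_container_t cont = { .algo_id = 0 };
    return algo_table[cont.algo_id](&cont, sig);
}

/* Implementation function: assemble coll_sig, then call selection. */
static int bcast_impl(void *buffer, int count, int root)
{
    coll_sig_bcast_t sig = { .buffer = buffer, .count = count, .root = root };
    return coll_auto(&sig);
}
```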
Current compositional algorithms call MPIR collectives. We will refactor
them later, but for now, generate wrapper MPIR functions that call the
_impl functions.
Add MPIR_init_coll_sig and MPID_init_coll_sig so we can add arbitrary
attr bits or additional fields without hacking maint/gen_coll.py.
Generate those IDs, table entries, and JSON parsing from
coll_algorithms.txt.
They are replaced by MPIR_Coll_nb.
In coll_algorithms.txt, add an "inline" attribute to skip adding a
prototype for the corresponding algorithm function, since it is inlined
in the headers.

Add "func_name" to directly specify the algorithm function name.

Add "macro_guard" to specify a preprocessor condition for the algorithm
function. For example, the ch4 posix algorithm function needs to be
protected by "#if defined(MPIDI_CH4_SHM_POSIX)" (to be defined).
Add conditional conditions: a condition function that can only be called
inside a preprocessor macro guard.

We need to generate another header file, coll_autogen.h, that is loaded
after mpidpost.h. "coll_algos.h" goes into mpir_coll.h, which is included
between mpidpre.h and mpidpost.h.
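A guarded condition might expand to code along these lines. This is a sketch under assumptions: MPIDI_CH4_SHM_POSIX is the guard named above, but cond_posix_intranode, run_condition, and the choice to report a guarded-out condition as false are hypothetical, not taken from the generated sources.

```c
#include <stdbool.h>

/* Always-available condition (hypothetical). */
static bool cond_always(void) { return true; }

#if defined(MPIDI_CH4_SHM_POSIX)
/* Conditional condition: only compiled when the POSIX shm layer exists.
 * The real check would inspect the communicator. */
static bool cond_posix_intranode(void) { return true; }
#endif

/* Dispatcher by condition ID. Assumption: a condition whose macro guard
 * is not satisfied evaluates to false, so the tree stays usable in
 * every build configuration. */
static bool run_condition(int id)
{
    switch (id) {
    case 0:
        return cond_always();
    case 1:
#if defined(MPIDI_CH4_SHM_POSIX)
        return cond_posix_intranode();
#else
        return false;
#endif
    default:
        return false;
    }
}
```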

Refactor a bit so all the condition-parsing logic is wrapped in
functions such as get_conditon_name, get_condition_func, etc., and they
are defined together.
Sometimes we may want to behave differently in the restriction check
than in the condition check. For example, an algorithm like
release_gather normally gets selected only after the user calls the
collective a certain number of times. But if the user selects the
algorithm by CVAR, it doesn't make sense to repeat this call-count check
in the restriction check.
Rather than adding individual boolean flags, use a bit mask "flags"
instead. That way it is easier to make sure we zero-initialize all the
flags.
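The restriction-check versus condition-check distinction can be sketched with a flags bit mask. The flag names, csel_state_t, the call-count threshold, and release_gather_ok are all hypothetical; the point is that a single `= {0}` clears every bit, and the restriction check skips the call-count requirement.

```c
#include <stdbool.h>

/* Hypothetical flag bits packed into one unsigned field. */
enum {
    CSEL_FLAG_RESTRICTION_CHECK = 1u << 0, /* algorithm forced via CVAR */
    CSEL_FLAG_NONBLOCKING       = 1u << 1  /* another example bit */
};

typedef struct {
    unsigned flags;   /* zero-initializing the struct clears all bits */
    int num_calls;    /* times this collective ran on the communicator */
} csel_state_t;

#define RELEASE_GATHER_MIN_CALLS 5   /* hypothetical threshold */

/* Normally require enough prior calls; a restriction check for a
 * CVAR-selected algorithm skips that requirement. */
static bool release_gather_ok(const csel_state_t *st)
{
    if (st->flags & CSEL_FLAG_RESTRICTION_CHECK)
        return true;
    return st->num_calls >= RELEASE_GATHER_MIN_CALLS;
}
```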
Provide a simple mechanism for a rank to dump collective algorithm
counters.

Set MPIR_CVAR_DUMP_COLL_ALGO_COUNTERS to the global rank of the process
that should dump the counters. It is undesirable for every process to
dump, yet it does not always make sense for rank 0 to dump, especially
since we don't always use MPI_COMM_WORLD.

Counting happens in the CSEL framework, so internal collectives are not
counted when we internally call _fallback algorithms.
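A minimal sketch of the counter mechanism, assuming a fixed algorithm table. The algorithm names are the ones shown in the example output above; the enum IDs, count_algo, and dump_algo_counters are hypothetical. Only the rank equal to dump_rank (the CVAR value) prints, and zero counters are skipped.

```c
#include <stdio.h>

/* Hypothetical IDs; the real table is generated from coll_algorithms.txt. */
enum { ALGO_BCAST_SCATTER_RING, ALGO_BCAST_RELEASE_GATHER, ALGO_REDUCE_BINOMIAL, ALGO_COUNT };

static const char *algo_names[ALGO_COUNT] = {
    "MPIR_Bcast_intra_scatter_ring_allgather",
    "MPIDI_POSIX_mpi_bcast_release_gather",
    "MPIR_Reduce_intra_binomial",
};

static long algo_counters[ALGO_COUNT];

/* Called only by the selection dispatcher, so direct internal calls to
 * fallback algorithms bypass the counters. */
static void count_algo(int id) { algo_counters[id]++; }

/* Print nonzero counters on the designated rank only; returns the
 * number of algorithm lines printed. */
static int dump_algo_counters(FILE *out, int my_world_rank, int dump_rank)
{
    if (my_world_rank != dump_rank)
        return 0;
    int lines = 0;
    fprintf(out, "[%d] ==== Dump collective algorithm counters ====\n", my_world_rank);
    for (int i = 0; i < ALGO_COUNT; i++) {
        if (algo_counters[i] == 0)
            continue;
        fprintf(out, "[%d] %10ld  %s\n", my_world_rank, algo_counters[i], algo_names[i]);
        lines++;
    }
    fprintf(out, "[%d] ==== END collective algorithm counters ====\n", my_world_rank);
    return lines;
}
```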
Enable CVARs and JSONs to select ch4-posix layer release_gather
algorithms.

Select MPIDI_POSIX_mpi_bcast_release_gather if it passes the
MPIDI_CH4_release_gather condition check, which only passes if comm is
a POSIX intranode comm.
Extend the previous commit to activate the release_gather algorithm for
reduce, allreduce, and barrier.
Remove MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE. It is now replaced
with MPIR_CVAR_COLL_SELECTION_JSON_FILE.

Although we could reuse the same CVAR name, we altered the JSON syntax,
so using a different name prevents potential confusion.
Parse the JSON as a list of named subtrees such as:
{
  "name=main": {...},
  "name=bcast-intra-auto": {...},
  ...
}

Inside a subtree, we can refer to another named subtree using "call=name".

If the JSON does not contain named subtrees, treat it as a single tree
with the name "main".
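Resolving "name=..." keys and "call=name" references can be sketched as a prefix test plus a name lookup. Everything here is hypothetical (csel_subtree_t, the helpers, and detecting the single-tree case from the first top-level key); it is not the parser in this PR.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal stand-in for a parsed subtree. */
typedef struct {
    const char *name;   /* e.g. "main" or "bcast-intra-auto" */
    int root_node;      /* placeholder for the real tree */
} csel_subtree_t;

/* Return the subtree name for a key of the form "name=<subtree>",
 * or NULL if the key is an ordinary decision key. */
static const char *subtree_name_from_key(const char *key)
{
    static const char prefix[] = "name=";
    if (strncmp(key, prefix, sizeof(prefix) - 1) == 0)
        return key + sizeof(prefix) - 1;
    return NULL;
}

/* Assumption: if the first top-level key is not "name=...", the whole
 * document is a single tree called "main". */
static const char *top_level_name(const char *first_key)
{
    const char *name = subtree_name_from_key(first_key);
    return name ? name : "main";
}

/* Resolve "call=<name>" by searching the loaded subtrees. */
static const csel_subtree_t *find_subtree(const csel_subtree_t *trees, size_t n,
                                          const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(trees[i].name, name) == 0)
            return &trees[i];
    return NULL;
}
```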
Load src/mpi/coll/coll_selection.json as named subtrees.

Add MPIR_Coll_run_tree which runs the selection on a subtree.

Replace MPIR_Coll_auto with MPIR_Coll_json, and add
MPIR_Coll_run_tree(csel_tree_auto, coll_sig) to allow recursive
algorithms such as compositional algorithms.

csel_tree_auto will fall back to csel_tree_main if it is not defined in
the JSON file. Similarly, we can easily introduce more predefined
subtrees later, e.g. bcast-intra-auto.

In CVAR selection, "auto" should be the default with value 0, so it
automatically falls through and runs on csel_tree_main.
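A sketch of the two fallbacks described above: a CVAR value of 0 ("auto") falls through to csel_tree_main, and csel_tree_auto resolves to csel_tree_main when the JSON does not define it. The names csel_tree_main and csel_tree_auto come from the commit messages; the enum values, csel_tree_t, and the helper functions are hypothetical.

```c
#include <stddef.h>

/* Hypothetical CVAR values: "auto" is 0, so an unset or zero-initialized
 * CVAR selects via the decision tree. */
enum bcast_algo_cvar { BCAST_ALGO_AUTO = 0, BCAST_ALGO_BINOMIAL, BCAST_ALGO_SCATTER_RING };

typedef struct { const char *name; } csel_tree_t;

static csel_tree_t tree_main = { "main" };
static csel_tree_t *csel_tree_main = &tree_main;
static csel_tree_t *csel_tree_auto = NULL;  /* not defined in the JSON file */

/* Used for recursive (compositional) selection: prefer the "auto"
 * subtree, otherwise fall back to the main tree. */
static csel_tree_t *get_auto_tree(void)
{
    return csel_tree_auto ? csel_tree_auto : csel_tree_main;
}

/* Returns what would run: a CVAR-forced algorithm, or the tree name. */
static const char *select_bcast(enum bcast_algo_cvar cvar)
{
    switch (cvar) {
    case BCAST_ALGO_BINOMIAL:
        return "binomial";
    case BCAST_ALGO_SCATTER_RING:
        return "scatter_ring_allgather";
    case BCAST_ALGO_AUTO:
    default:
        return csel_tree_main->name;   /* fall through to the JSON tree */
    }
}
```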