Skip to content

OSHMEM/MCA/SPML/UCX: added support for team management functions#13177

Merged
janjust merged 1 commit intoopen-mpi:mainfrom
roiedanino:shmem/1.5-support-ucx
Apr 24, 2025
Merged

OSHMEM/MCA/SPML/UCX: added support for team management functions#13177
janjust merged 1 commit intoopen-mpi:mainfrom
roiedanino:shmem/1.5-support-ucx

Conversation

@roiedanino
Copy link
Copy Markdown
Contributor

@roiedanino roiedanino commented Apr 6, 2025

Implemented team-management functions according to OpenSHMEM 1.5 spec

Tests PR: https://github.com/openshmem-org/tests-mellanox/pull/59/files

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 6, 2025

Hello! The Git Commit Checker CI bot found a few problems with this PR:

26401af: OSHMEM/MCA/SPML/UCX: removed unnecessary team_type...

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@roiedanino roiedanino force-pushed the shmem/1.5-support-ucx branch 2 times, most recently from 41068b3 to cff7c93 Compare April 7, 2025 10:43
@github-actions
Copy link
Copy Markdown

Hello! The Git Commit Checker CI bot found a few problems with this PR:

450086a: OSHMEM/MCA/SPML/UCX: WIP

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@roiedanino roiedanino force-pushed the shmem/1.5-support-ucx branch from 450086a to ba0d28e Compare April 21, 2025 14:49
@roiedanino
Copy link
Copy Markdown
Contributor Author

Hi @MamziB can you please take a look?

@roiedanino roiedanino force-pushed the shmem/1.5-support-ucx branch 2 times, most recently from a024387 to a6babaf Compare April 21, 2025 15:14
janjust
janjust previously approved these changes Apr 22, 2025
Comment thread oshmem/mca/spml/ucx/spml_ucx.c Outdated
Comment on lines +1864 to +1870
ucx_new_team->config = calloc(1, sizeof(mca_spml_ucx_team_config_t));

if (config != NULL) {
memcpy(&ucx_new_team->config->super, config, sizeof(shmem_team_config_t));
}

ucx_new_team->config = (mca_spml_ucx_team_config_t*)config;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ucx_new_team->config is getting leaked here.

Comment thread oshmem/mca/spml/ucx/spml_ucx.c Outdated
/* In order to simplify pe translations start and stride are calculated with respect to
* world_team */
ucx_new_team = (mca_spml_ucx_team_t *)malloc(sizeof(mca_spml_ucx_team_t));
ucx_new_team->start = parent_start + start;
Copy link
Copy Markdown
Contributor

@MamziB MamziB Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is incorrect, we should consider the parent's stride as well when we want to calculate the new start based on the comm world:
ucx_new_team->start = parent_start + (start * parent_stride)

Comment thread oshmem/mca/spml/ucx/spml_ucx.c Outdated
int parent_stride;
int my_pe;

SPML_UCX_ASSERT(((start + size * stride) <= oshmem_num_procs()) && (start < size) && (stride > 0) && (size > 0));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size can be any positive value , and start is PE index in the parent team, so start < size is not required to be true.

@MamziB
Copy link
Copy Markdown
Contributor

MamziB commented Apr 23, 2025

looks good now

Signed-off-by: Roie Danino <rdanino@nvidia.com>
@roiedanino roiedanino force-pushed the shmem/1.5-support-ucx branch from c8cce70 to d0a4e07 Compare April 24, 2025 07:31
@roiedanino
Copy link
Copy Markdown
Contributor Author

@jsquyres @janjust, can you please merge this?

@janjust janjust merged commit 1d301f7 into open-mpi:main Apr 24, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants