
Fix ReorderShardedAxis and MakeReshardingContiguous for DID loop split. #3900

Open
wujingyue opened this issue Feb 15, 2025 · 1 comment
@wujingyue (Collaborator)

These passes currently assume that DID parallelization lives only in the logical domain, and they will break when we switch to loop split.

The solution I have in mind, and am open to discussing, is:

  1. Change ReorderShardedAxis to move the DID axis to the front of the loop domain. We may even want to do this universally, because many schedulers already make that assumption, but that can come as a separate change.
  2. Change MakeReshardingContiguous to set the allocation domain around each resharding expression (which has been decomposed and can therefore be lowered to communication) to match the loop domain and be contiguous.
  3. Change the existing Communication IRs to respect the allocation order. I don't know exactly what needs to be fixed, but I think
    output_tensors[0].push_back(output_tensor.slice(0, j, j + 1));
    should be fixed to not assume the scattered axis is always 0 (see the sketch after this list).
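
To make item 3 concrete, here is a minimal, self-contained sketch in plain ATen (outside of any nvFuser pass) of slicing along the scattered axis instead of hardcoding axis 0. The tensor shape, the scattered_axis value, and num_devices are made up for illustration; in the real lowering, the scattered axis would be derived from the output's allocation domain rather than hardcoded.

    #include <ATen/ATen.h>
    #include <vector>

    int main() {
      const int64_t num_devices = 4;
      // In the real lowering this would come from the output's allocation
      // domain; it is hardcoded here only for illustration.
      const int64_t scattered_axis = 1;

      // Pretend output: the scattered axis (axis 1) has one shard per device.
      at::Tensor output_tensor = at::empty({8, num_devices, 16});

      std::vector<std::vector<at::Tensor>> output_tensors(1);
      for (int64_t j = 0; j < num_devices; ++j) {
        // Slice along the scattered axis rather than always along axis 0.
        output_tensors[0].push_back(
            output_tensor.slice(scattered_axis, j, j + 1));
      }
      return 0;
    }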

That said, even without DID loop split, I think this is a good cleanup for ReorderShardedAxis. For example, it would no longer need to insert Set.Permute, making the fusion IR easier to follow.

For #2563

@wujingyue (Collaborator, Author)

cc @samnordmann

This is one of the less urgent and better-defined tasks that could use some help. When you have cycles, feel free to pick it up!
