Fix ReorderShardedAxis and MakeReshardingContiguous for DID loop split. #3900

@wujingyue

Description

They currently assume that DID parallelization appears only in the logical domain, and they will break when we switch to DID loop split.

The solution I have in mind, and am open to discussing, is:

  1. Change ReorderShardedAxis to move DID-parallel dimensions to the front of the loop domain (see the first sketch after this list). We may even want to do this universally, because many schedulers already make that assumption, but that can come in a separate change.
  2. Change MakeReshardingContiguous to set the allocation domain around each resharding expression (which has been decomposed and can therefore be lowered to a communication) to be the same as the loop domain and contiguous (second sketch below).
  3. Change the existing Communication IRs to respect the allocation order. I don't know exactly what needs to be fixed, but I suspect `output_tensors[0].push_back(output_tensor.slice(0, j, j + 1));` should be changed so it no longer assumes the scattered axis is always 0 (third sketch below).
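
For step 1, a minimal sketch of moving DID-parallel dimensions to the front, assuming nvFuser's `TensorView::reorder` accepts a partial old-to-new position map; `reorderDidToFront` is a hypothetical helper, not existing code:

```cpp
// Sketch, not tested: reorder every DID-parallel IterDomain in tv's loop
// domain to the front, preserving their relative order. Positions absent
// from the map are assumed to fill the remaining slots in order.
void reorderDidToFront(TensorView* tv) {
  std::unordered_map<int64_t, int64_t> old2new;
  int64_t front = 0;
  const std::vector<IterDomain*>& loop = tv->getLoopDomain();
  for (int64_t i = 0; i < static_cast<int64_t>(loop.size()); ++i) {
    if (isParallelTypeDeviceDim(loop[i]->getParallelType())) {
      old2new[i] = front++;
    }
  }
  tv->reorder(old2new);
}
```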
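
For step 2, a sketch assuming `TensorView::setAllocationDomain`; `input_tv` and `output_tv` stand in for the producer and consumer of a decomposed resharding expression:

```cpp
// Sketch: make the allocation domain of each tensor around a decomposed
// resharding expression identical to its loop domain and fully contiguous,
// so the expression can be lowered to a communication.
for (TensorView* tv : {input_tv, output_tv}) {
  tv->setAllocationDomain(tv->getLoopDomain(), /*new_contiguity=*/true);
}
```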
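
For step 3, a sketch of the slice fix, assuming ATen tensors as in the current communication lowering; `findScatteredAxis`, `num_devices`, and `output_tv` are illustrative, not the actual code:

```cpp
// Sketch: derive the scattered axis from the output's allocation domain
// instead of hard-coding 0, then slice along that axis per device.
const int64_t scattered_axis = findScatteredAxis(output_tv); // hypothetical
for (int64_t j = 0; j < num_devices; ++j) {
  output_tensors[0].push_back(
      output_tensor.slice(scattered_axis, j, j + 1));
}
```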

That said, even without DID loop split, I think this is a good cleanup for ReorderShardedAxis. For example, it would no longer need to insert Set.Permute, making the fusion IR easier to follow.

For #2563
