Add generate_identity_sequences helper and replace lambdas with named functors #4283

Closed
assistant-librarian[bot] wants to merge 5 commits into develop from import/develop/ROCm_composable_kernel/pr-3628

Conversation

@assistant-librarian
Contributor

@assistant-librarian assistant-librarian bot commented Feb 3, 2026

Summary

  • Add generate_identity_sequences<N>() helper that returns Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>
  • Replace lambdas with named functors in transform_tensor_descriptor
  • Add unpack_and_merge_sequences helper functor
  • Reduces transform_tensor_descriptor instantiations from 388 to 32 (92% reduction)

Motivation

Multiple call sites use the generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{}) pattern. Replacing it with a named helper reduces the number of lambda instantiations.
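A minimal sketch of what such a helper could look like, for readers unfamiliar with the pattern. The `Sequence` and `Tuple` types below are simplified stand-ins written for this example, not the library's real definitions, and `detail::make_identity_sequences` is a hypothetical name; the actual implementation in the PR may differ.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Simplified stand-ins for the library's Sequence and Tuple (assumptions for illustration).
template <int... Is>
struct Sequence
{
};

template <typename... Ts>
struct Tuple
{
};

namespace detail {
// Hypothetical helper: expands indices 0..N-1 into Tuple<Sequence<0>, ..., Sequence<N-1>>.
template <std::size_t... Is>
constexpr auto make_identity_sequences(std::index_sequence<Is...>)
{
    return Tuple<Sequence<static_cast<int>(Is)>...>{};
}
} // namespace detail

// One named function template shared by every call site, so no per-call-site closure type.
template <int N>
constexpr auto generate_identity_sequences()
{
    return detail::make_identity_sequences(
        std::make_index_sequence<static_cast<std::size_t>(N)>{});
}

// Compile-time checks of the resulting type.
static_assert(std::is_same_v<decltype(generate_identity_sequences<3>()),
                             Tuple<Sequence<0>, Sequence<1>, Sequence<2>>>);
static_assert(std::is_same_v<decltype(generate_identity_sequences<0>()), Tuple<>>);
```

Because the result depends only on `N`, two call sites that request the same `N` refer to the same function template specialization.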

Additionally, each lambda in transform_tensor_descriptor creates a unique closure type, causing the function to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation.
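The closure-type behavior described above can be checked in isolation. In this sketch, `apply_to_one` is a hypothetical function template standing in for `transform_tensor_descriptor`, and `add_one` is an illustrative functor; neither name comes from the PR.

```cpp
#include <type_traits>

// Hypothetical stand-in for a function template parameterized on a callable:
// each distinct F produces a distinct instantiation.
template <typename F>
constexpr int apply_to_one(F f)
{
    return f(1);
}

// Named functor: a single type, no matter how many call sites use it.
struct add_one
{
    constexpr int operator()(int x) const { return x + 1; }
};

// Two textually identical lambda expressions still have different closure types.
inline constexpr auto lambda_a = [](int x) { return x + 1; };
inline constexpr auto lambda_b = [](int x) { return x + 1; };

// Different closure types, so apply_to_one is instantiated once per lambda.
static_assert(!std::is_same_v<decltype(lambda_a), decltype(lambda_b)>);

// Both lambdas and the functor compute the same value.
static_assert(apply_to_one(lambda_a) == 2);
static_assert(apply_to_one(lambda_b) == 2);
static_assert(apply_to_one(add_one{}) == 2);
```

Every call that passes `add_one{}` reuses the single `apply_to_one<add_one>` instantiation, whereas each lambda expression forces its own.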

Changes

Part 1: generate_identity_sequences helper

  • Replaces common lambda pattern for generating identity sequences
  • Each lambda expression creates a unique closure type, causing separate template instantiations at every call site
  • Named helper shares a single type across all uses

Part 2: Named functors in transform_tensor_descriptor

  • Add unpack_and_merge_sequences helper to replace lambda in GetNumOfHiddenDimension
  • Use generate_identity_sequences in matrix_padder.hpp
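As a rough illustration of the idea behind `unpack_and_merge_sequences`, the sketch below unpacks a tuple of sequences and concatenates them. The `Sequence`, `Tuple`, `merge_two`, and `merge_sequences` definitions are simplified assumptions made for this example; the library's own types and the functor's exact interface may differ.

```cpp
#include <type_traits>

// Simplified stand-ins for the library's Sequence and Tuple (assumptions for illustration).
template <int... Is>
struct Sequence
{
};

template <typename... Ts>
struct Tuple
{
};

// Hypothetical helper: concatenates two sequences.
template <int... As, int... Bs>
constexpr auto merge_two(Sequence<As...>, Sequence<Bs...>)
{
    return Sequence<As..., Bs...>{};
}

// Merging zero sequences yields an empty sequence.
constexpr auto merge_sequences() { return Sequence<>{}; }

// Merging a single sequence returns it unchanged.
template <typename S>
constexpr auto merge_sequences(S s)
{
    return s;
}

// Merging two or more sequences folds them left to right.
template <typename S0, typename S1, typename... Rest>
constexpr auto merge_sequences(S0 s0, S1 s1, Rest... rest)
{
    return merge_sequences(merge_two(s0, s1), rest...);
}

// Named functor replacing a lambda of the form
// [](auto... seqs) { return merge_sequences(seqs...); } applied to an unpacked tuple.
struct unpack_and_merge_sequences
{
    template <typename... Seqs>
    constexpr auto operator()(Tuple<Seqs...>) const
    {
        return merge_sequences(Seqs{}...);
    }
};

static_assert(std::is_same_v<decltype(unpack_and_merge_sequences{}(
                                 Tuple<Sequence<0, 1>, Sequence<2>, Sequence<3, 4>>{})),
                             Sequence<0, 1, 2, 3, 4>>);
```

Since `unpack_and_merge_sequences` is an ordinary struct, any template that takes it as a parameter sees the same type at every call site.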

Test Plan

  • Added 7 unit tests:
    • 4 tests for generate_identity_sequences
    • 3 tests for unpack_and_merge_sequences
  • Waiting for full CI

Related PRs

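This PR merges the functionality from:

  • ROCm/composable_kernel#3588 (generate_identity_sequences helper)
  • ROCm/composable_kernel#3589 (Named functors in transform_tensor_descriptor)
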
Part of PR stack for issue #4229 (Reduce CK/CKTile Build Times)

Note: This PR supersedes ROCm/composable_kernel#3588 and ROCm/composable_kernel#3589, which can be closed once this is merged.


🔁 Imported from ROCm/composable_kernel#3628
🧑‍💻 Originally authored by @tenpercent

tenpercent and others added 5 commits January 29, 2026 14:26
Replace inline lambdas with named functor structs in transform_tensor_descriptor
to reduce template instantiation overhead and improve compile times.

Changes:
- Add three named functors in tensor_descriptor.hpp:
  - convert_visible_to_hidden_id: maps visible dimension ID to hidden ID
  - convert_visible_ids_to_hidden_ids: maps sequence of visible IDs to hidden IDs
  - generate_arithmetic_sequence_from_scan: generates consecutive hidden dim ID ranges

- Add utility functions in sequence_helper.hpp and tuple_helper.hpp:
  - unpack_and_merge_sequences(): unpacks tuple of sequences and merges them
  - generate_identity_sequences(): creates Tuple<Sequence<0>, Sequence<1>, ...>

- Update 14 call sites across threadwise transfer, wrapper, and device files
  to use generate_identity_sequences() instead of generate_tuple with lambdas

- Add comprehensive unit tests:
  - unit_sequence_helper.cpp: tests for new utility functions
  - unit_tensor_descriptor_functors.cpp: tests for new functors

Co-Authored-By: Claude <noreply@anthropic.com>
@illsilin
Contributor

Hi @tenpercent, please resolve the merge conflict and run CI.

@tenpercent
Contributor

Closing in favor of PR #4828 which has a cleaner commit history.

@tenpercent tenpercent closed this Feb 23, 2026
shumway pushed a commit that referenced this pull request Feb 28, 2026
… functors (#4828)

## Summary

- Add `generate_identity_sequences<N>()` helper that returns
`Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>`
- Replace lambdas with named functors in `transform_tensor_descriptor`
- Add `unpack_and_merge_sequences` helper functor
- Reduces `transform_tensor_descriptor` instantiations from 388 to 32
(92% reduction)

## Motivation

Multiple call sites use `generate_tuple([](auto i) { return
Sequence<i>{}; }, Number<N>{})` pattern. A named helper reduces lambda
instantiations.

Additionally, each lambda in `transform_tensor_descriptor` creates a
unique closure type, causing the function to be instantiated separately
for every call site. Named functors share a single type, so the compiler
reuses the same instantiation.

## Changes

### Part 1: generate_identity_sequences helper
- Replaces common lambda pattern for generating identity sequences
- Each lambda expression creates a unique closure type, causing separate
template instantiations at every call site
- Named helper shares a single type across all uses

### Part 2: Named functors in transform_tensor_descriptor
- Add `unpack_and_merge_sequences` helper to replace lambda in
`GetNumOfHiddenDimension`
- Use `generate_identity_sequences` in `matrix_padder.hpp`

## Test Plan

- [x] Added 7 unit tests:
  - 4 tests for `generate_identity_sequences`
  - 3 tests for `unpack_and_merge_sequences`
- [ ] Waiting for full CI

## Related PRs

This PR merges the functionality from:
- ROCm/composable_kernel#3588 (generate_identity_sequences helper)
- ROCm/composable_kernel#3589 (Named functors in
transform_tensor_descriptor)

Part of PR stack for issue #4229 (Reduce CK/CKTile Build Times)

**Note:** This PR supersedes #4283, ROCm/composable_kernel#3588 and
ROCm/composable_kernel#3589, which can be closed once this is merged.

---
🔁 Imported from
[ROCm/composable_kernel#3628](ROCm/composable_kernel#3628)
🧑‍💻 Originally authored by @tenpercent

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
kokolchin pushed a commit to kokolchin/rocm-libraries that referenced this pull request Mar 4, 2026
… functors (ROCm#4828)

NaveenElumalaiAMD pushed a commit that referenced this pull request Mar 6, 2026
… functors (#4828)

jovanau pushed a commit to jovanau/rocm-libraries that referenced this pull request Mar 19, 2026
… functors (ROCm#4828)

johannes-graner pushed a commit that referenced this pull request Mar 20, 2026
… functors (#4828)
