Add generate_identity_sequences helper and replace lambdas with named functors #4828

Merged
shumway merged 2 commits into develop from users/tenpercent/ck/reduce-template-instantiations
Feb 28, 2026

@tenpercent
Contributor

Summary

  • Add generate_identity_sequences<N>() helper that returns Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>
  • Replace lambdas with named functors in transform_tensor_descriptor
  • Add unpack_and_merge_sequences helper functor
  • Reduces transform_tensor_descriptor instantiations from 388 to 32 (92% reduction)

Motivation

Multiple call sites use the generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{}) pattern. Replacing it with a named helper removes one lambda, and therefore one closure type, per call site.

Additionally, each lambda in transform_tensor_descriptor creates a unique closure type, causing the function to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation.
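
The closure-type effect can be reproduced in isolation. The following is a standalone C++17 sketch, not CK code: `apply_to_zero`, `add_one`, `lambda_a`, and `lambda_b` are hypothetical names chosen for illustration, with `apply_to_zero` standing in for any function template that takes a functor.

```cpp
#include <type_traits>

// Hypothetical function template standing in for transform_tensor_descriptor:
// it is instantiated once per distinct functor type F.
template <typename F>
constexpr int apply_to_zero(F f)
{
    return f(0);
}

// Named functor: a single type, no matter how many call sites use it.
struct add_one
{
    constexpr int operator()(int x) const { return x + 1; }
};

// Two textually identical lambdas still have two distinct closure types.
inline constexpr auto lambda_a = [](int x) { return x + 1; };
inline constexpr auto lambda_b = [](int x) { return x + 1; };

static_assert(!std::is_same_v<decltype(lambda_a), decltype(lambda_b)>,
              "each lambda expression yields its own closure type");
static_assert(std::is_same_v<decltype(add_one{}), decltype(add_one{})>,
              "a named functor is the same type everywhere");

// apply_to_zero<decltype(lambda_a)> and apply_to_zero<decltype(lambda_b)> are
// separate instantiations; every call with add_one reuses apply_to_zero<add_one>.
static_assert(apply_to_zero(lambda_a) == 1 && apply_to_zero(lambda_b) == 1);
static_assert(apply_to_zero(add_one{}) == 1);
```

Both lambdas compute the same value, yet the compiler must generate and check two copies of `apply_to_zero` for them, whereas all uses of `add_one` share one.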

Changes

Part 1: generate_identity_sequences helper

  • Replaces common lambda pattern for generating identity sequences
  • Each lambda expression creates a unique closure type, causing separate template instantiations at every call site
  • Named helper shares a single type across all uses
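
A minimal sketch of what such a helper could look like is shown below. It is an illustration under stated assumptions, not the code from this PR: `Sequence` here is an empty stand-in for CK's type, `std::tuple` stands in for CK's `Tuple`, and `generate_identity_sequences_impl` is a hypothetical name.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Stand-in for CK's Sequence<...>; the real type has a much richer interface.
template <int... Is>
struct Sequence
{
};

// Expand the indices 0..N-1 into a tuple of single-element sequences.
template <std::size_t... Is>
constexpr auto generate_identity_sequences_impl(std::index_sequence<Is...>)
{
    return std::tuple<Sequence<static_cast<int>(Is)>...>{};
}

// Sketch of the helper: returns tuple<Sequence<0>, ..., Sequence<N-1>>.
// As an ordinary function template it introduces no closure type at the call site.
template <std::size_t N>
constexpr auto generate_identity_sequences()
{
    return generate_identity_sequences_impl(std::make_index_sequence<N>{});
}

static_assert(std::is_same_v<decltype(generate_identity_sequences<3>()),
                             std::tuple<Sequence<0>, Sequence<1>, Sequence<2>>>);
static_assert(std::is_same_v<decltype(generate_identity_sequences<0>()), std::tuple<>>);
```

Every call with the same `N` names the same specialization, so the result type and any templates parameterized on it are shared across call sites.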

Part 2: Named functors in transform_tensor_descriptor

  • Add unpack_and_merge_sequences helper to replace lambda in GetNumOfHiddenDimension
  • Use generate_identity_sequences in matrix_padder.hpp
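
The PR text does not show the functor's signature, so the following is only a sketch that assumes, from its name, that `unpack_and_merge_sequences` takes a tuple of sequences and concatenates them into one. `Sequence`, `merge_two`, and `merge_sequences` are simplified stand-ins written for this example, and `std::tuple` replaces CK's `Tuple`.

```cpp
#include <tuple>
#include <type_traits>

// Stand-in for CK's Sequence<...>, reduced to what this example needs.
template <int... Is>
struct Sequence
{
    static constexpr int size() { return sizeof...(Is); }
};

// Concatenate two sequences.
template <int... As, int... Bs>
constexpr auto merge_two(Sequence<As...>, Sequence<Bs...>)
{
    return Sequence<As..., Bs...>{};
}

// Concatenate any number of sequences; the empty call yields Sequence<>.
constexpr auto merge_sequences() { return Sequence<>{}; }

template <typename Seq, typename... Rest>
constexpr auto merge_sequences(Seq, Rest... rest)
{
    return merge_two(Seq{}, merge_sequences(rest...));
}

// Named functor (assumed behavior): unpack the tuple and merge its elements.
struct unpack_and_merge_sequences
{
    template <typename... Seqs>
    constexpr auto operator()(std::tuple<Seqs...>) const
    {
        return merge_sequences(Seqs{}...);
    }
};

using Ids = std::tuple<Sequence<0>, Sequence<1, 2>, Sequence<3>>;
static_assert(std::is_same_v<decltype(unpack_and_merge_sequences{}(Ids{})),
                             Sequence<0, 1, 2, 3>>);
static_assert(decltype(unpack_and_merge_sequences{}(Ids{}))::size() == 4);
```

Because the functor is a named struct, every caller passes the same type, which is the property the PR relies on to avoid per-call-site instantiations.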

Test Plan

  • Added 7 unit tests:
    • 4 tests for generate_identity_sequences
    • 3 tests for unpack_and_merge_sequences
  • Waiting for full CI

Related PRs

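This PR merges the functionality from:

  • ROCm/composable_kernel#3588 (generate_identity_sequences helper)
  • ROCm/composable_kernel#3589 (Named functors in transform_tensor_descriptor)
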
Part of PR stack for issue #4229 (Reduce CK/CKTile Build Times)

Note: This PR supersedes #4283, ROCm/composable_kernel#3588 and ROCm/composable_kernel#3589, which can be closed once this is merged.


🔁 Imported from ROCm/composable_kernel#3628
🧑‍💻 Originally authored by @tenpercent

@tenpercent tenpercent requested a review from a team as a code owner February 23, 2026 23:26
@tenpercent tenpercent requested review from a team as code owners February 23, 2026 23:38
@github-actions github-actions bot added External CI github actions project: none Does not target any component labels Feb 23, 2026
@tenpercent tenpercent force-pushed the users/tenpercent/ck/reduce-template-instantiations branch from a058c59 to 11d1d3b Compare February 23, 2026 23:42
… functors

## Test Plan

- Added 7 unit tests:
  - 4 tests for `generate_identity_sequences` in unit_sequence_helper.cpp
  - 3 tests for `unpack_and_merge_sequences` in unit_tensor_descriptor_functors.cpp
- Added unit_ford test from develop to test/util/CMakeLists.txt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tenpercent tenpercent force-pushed the users/tenpercent/ck/reduce-template-instantiations branch from 11d1d3b to 776da28 Compare February 23, 2026 23:44
@tenpercent tenpercent removed request for a team February 23, 2026 23:47
@tenpercent tenpercent removed the request for review from a team February 23, 2026 23:47
Contributor

@cgmillette cgmillette left a comment

LGTM

@shumway shumway merged commit 7de19bb into develop Feb 28, 2026
32 checks passed
@shumway shumway deleted the users/tenpercent/ck/reduce-template-instantiations branch February 28, 2026 20:10
assistant-librarian bot pushed a commit to ROCm/composable_kernel that referenced this pull request Feb 28, 2026
Add generate_identity_sequences helper and replace lambdas
 with named functors (#4828)
## Related PRs

This PR merges the functionality from:
- #3588 (generate_identity_sequences helper)
- #3589 (Named functors in
transform_tensor_descriptor)

Part of PR stack for issue #4229 (Reduce CK/CKTile Build Times)

**Note:** This PR supersedes #4283, #3588 and
#3589, which can be closed once this is merged.
cgmillette added a commit that referenced this pull request Mar 2, 2026
…nsfer

Resolve conflicts in 7 threadwise transfer headers by keeping our
refactored versions (code was extracted into the shared util helper).
Propagate generate_identity_sequences optimization from develop (#4828)
into threadwise_tensor_slice_transfer_util.hpp.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
kokolchin pushed a commit to kokolchin/rocm-libraries that referenced this pull request Mar 4, 2026
… functors (ROCm#4828)

NaveenElumalaiAMD pushed a commit that referenced this pull request Mar 6, 2026
… functors (#4828)

jovanau pushed a commit to jovanau/rocm-libraries that referenced this pull request Mar 19, 2026
… functors (ROCm#4828)

johannes-graner pushed a commit that referenced this pull request Mar 20, 2026
… functors (#4828)

3 participants