Conversation
a058c59 to
11d1d3b
Compare
… functors
## Summary
- Add `generate_identity_sequences<N>()` helper that returns `Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>`
- Replace lambdas with named functors in `transform_tensor_descriptor`
- Add `unpack_and_merge_sequences` helper functor
- Reduces `transform_tensor_descriptor` instantiations from 388 to 32 (92% reduction)
## Motivation
Multiple call sites use `generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{})` pattern. A named helper reduces lambda instantiations.
Additionally, each lambda in `transform_tensor_descriptor` creates a unique closure type, causing the function to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation.
## Changes
### Part 1: generate_identity_sequences helper
- Replaces common lambda pattern for generating identity sequences
- Each lambda expression creates a unique closure type, causing separate template instantiations at every call site
- Named helper shares a single type across all uses
### Part 2: Named functors in transform_tensor_descriptor
- Add `unpack_and_merge_sequences` helper to replace lambda in `GetNumOfHiddenDimension`
- Use `generate_identity_sequences` in `matrix_padder.hpp`
## Test Plan
- Added 7 unit tests:
- 4 tests for `generate_identity_sequences` in unit_sequence_helper.cpp
- 3 tests for `unpack_and_merge_sequences` in unit_tensor_descriptor_functors.cpp
- Added unit_ford test from develop to test/util/CMakeLists.txt
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
11d1d3b to
776da28
Compare
2 tasks
assistant-librarian bot
pushed a commit
to ROCm/composable_kernel
that referenced
this pull request
Feb 28, 2026
Add generate_identity_sequences helper and replace lambdas
with named functors (#4828)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
## Summary
- Add `generate_identity_sequences<N>()` helper that returns
`Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>`
- Replace lambdas with named functors in `transform_tensor_descriptor`
- Add `unpack_and_merge_sequences` helper functor
- Reduces `transform_tensor_descriptor` instantiations from 388 to 32
(92% reduction)
## Motivation
Multiple call sites use `generate_tuple([](auto i) { return
Sequence<i>{}; }, Number<N>{})` pattern. A named helper reduces lambda
instantiations.
Additionally, each lambda in `transform_tensor_descriptor` creates a
unique closure type, causing the function to be instantiated separately
for every call site. Named functors share a single type, so the compiler
reuses the same instantiation.
## Changes
### Part 1: generate_identity_sequences helper
- Replaces common lambda pattern for generating identity sequences
- Each lambda expression creates a unique closure type, causing separate
template instantiations at every call site
- Named helper shares a single type across all uses
### Part 2: Named functors in transform_tensor_descriptor
- Add `unpack_and_merge_sequences` helper to replace lambda in
`GetNumOfHiddenDimension`
- Use `generate_identity_sequences` in `matrix_padder.hpp`
## Test Plan
- [x] Added 7 unit tests:
- 4 tests for `generate_identity_sequences`
- 3 tests for `unpack_and_merge_sequences`
- [ ] Waiting for full CI
## Related PRs
This PR merges the functionality from:
- #3588 (generate_identity_sequences helper)
- #3589 (Named functors in
transform_tensor_descriptor)
Part of PR stack for issue #4229 (Reduce CK/CKTile Build Times)
**Note:** This PR supersedes #4283, #3588 and
#3589, which can be closed once this is merged.
cgmillette
added a commit
that referenced
this pull request
Mar 2, 2026
…nsfer Resolve conflicts in 7 threadwise transfer headers by keeping our refactored versions (code was extracted into the shared util helper). Propagate generate_identity_sequences optimization from develop (#4828) into threadwise_tensor_slice_transfer_util.hpp. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
kokolchin
pushed a commit
to kokolchin/rocm-libraries
that referenced
this pull request
Mar 4, 2026
… functors (ROCm#4828) ## Summary - Add `generate_identity_sequences<N>()` helper that returns `Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>` - Replace lambdas with named functors in `transform_tensor_descriptor` - Add `unpack_and_merge_sequences` helper functor - Reduces `transform_tensor_descriptor` instantiations from 388 to 32 (92% reduction) ## Motivation Multiple call sites use `generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{})` pattern. A named helper reduces lambda instantiations. Additionally, each lambda in `transform_tensor_descriptor` creates a unique closure type, causing the function to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation. ## Changes ### Part 1: generate_identity_sequences helper - Replaces common lambda pattern for generating identity sequences - Each lambda expression creates a unique closure type, causing separate template instantiations at every call site - Named helper shares a single type across all uses ### Part 2: Named functors in transform_tensor_descriptor - Add `unpack_and_merge_sequences` helper to replace lambda in `GetNumOfHiddenDimension` - Use `generate_identity_sequences` in `matrix_padder.hpp` ## Test Plan - [x] Added 7 unit tests: - 4 tests for `generate_identity_sequences` - 3 tests for `unpack_and_merge_sequences` - [ ] Waiting for full CI ## Related PRs This PR merges the functionality from: - ROCm/composable_kernel#3588 (generate_identity_sequences helper) - ROCm/composable_kernel#3589 (Named functors in transform_tensor_descriptor) Part of PR stack for issue ROCm#4229 (Reduce CK/CKTile Build Times) **Note:** This PR supersedes ROCm#4283, ROCm/composable_kernel#3588 and ROCm/composable_kernel#3589, which can be closed once this is merged. --- 🔁 Imported from [ROCm/composable_kernel#3628](ROCm/composable_kernel#3628) 🧑💻 Originally authored by @tenpercent Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
NaveenElumalaiAMD
pushed a commit
that referenced
this pull request
Mar 6, 2026
… functors (#4828) ## Summary - Add `generate_identity_sequences<N>()` helper that returns `Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>` - Replace lambdas with named functors in `transform_tensor_descriptor` - Add `unpack_and_merge_sequences` helper functor - Reduces `transform_tensor_descriptor` instantiations from 388 to 32 (92% reduction) ## Motivation Multiple call sites use `generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{})` pattern. A named helper reduces lambda instantiations. Additionally, each lambda in `transform_tensor_descriptor` creates a unique closure type, causing the function to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation. ## Changes ### Part 1: generate_identity_sequences helper - Replaces common lambda pattern for generating identity sequences - Each lambda expression creates a unique closure type, causing separate template instantiations at every call site - Named helper shares a single type across all uses ### Part 2: Named functors in transform_tensor_descriptor - Add `unpack_and_merge_sequences` helper to replace lambda in `GetNumOfHiddenDimension` - Use `generate_identity_sequences` in `matrix_padder.hpp` ## Test Plan - [x] Added 7 unit tests: - 4 tests for `generate_identity_sequences` - 3 tests for `unpack_and_merge_sequences` - [ ] Waiting for full CI ## Related PRs This PR merges the functionality from: - ROCm/composable_kernel#3588 (generate_identity_sequences helper) - ROCm/composable_kernel#3589 (Named functors in transform_tensor_descriptor) Part of PR stack for issue #4229 (Reduce CK/CKTile Build Times) **Note:** This PR supersedes #4283, ROCm/composable_kernel#3588 and ROCm/composable_kernel#3589, which can be closed once this is merged. --- 🔁 Imported from [ROCm/composable_kernel#3628](ROCm/composable_kernel#3628) 🧑💻 Originally authored by @tenpercent Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
jovanau
pushed a commit
to jovanau/rocm-libraries
that referenced
this pull request
Mar 19, 2026
… functors (ROCm#4828) ## Summary - Add `generate_identity_sequences<N>()` helper that returns `Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>` - Replace lambdas with named functors in `transform_tensor_descriptor` - Add `unpack_and_merge_sequences` helper functor - Reduces `transform_tensor_descriptor` instantiations from 388 to 32 (92% reduction) ## Motivation Multiple call sites use `generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{})` pattern. A named helper reduces lambda instantiations. Additionally, each lambda in `transform_tensor_descriptor` creates a unique closure type, causing the function to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation. ## Changes ### Part 1: generate_identity_sequences helper - Replaces common lambda pattern for generating identity sequences - Each lambda expression creates a unique closure type, causing separate template instantiations at every call site - Named helper shares a single type across all uses ### Part 2: Named functors in transform_tensor_descriptor - Add `unpack_and_merge_sequences` helper to replace lambda in `GetNumOfHiddenDimension` - Use `generate_identity_sequences` in `matrix_padder.hpp` ## Test Plan - [x] Added 7 unit tests: - 4 tests for `generate_identity_sequences` - 3 tests for `unpack_and_merge_sequences` - [ ] Waiting for full CI ## Related PRs This PR merges the functionality from: - ROCm/composable_kernel#3588 (generate_identity_sequences helper) - ROCm/composable_kernel#3589 (Named functors in transform_tensor_descriptor) Part of PR stack for issue ROCm#4229 (Reduce CK/CKTile Build Times) **Note:** This PR supersedes ROCm#4283, ROCm/composable_kernel#3588 and ROCm/composable_kernel#3589, which can be closed once this is merged. --- 🔁 Imported from [ROCm/composable_kernel#3628](ROCm/composable_kernel#3628) 🧑💻 Originally authored by @tenpercent Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
johannes-graner
pushed a commit
that referenced
this pull request
Mar 20, 2026
… functors (#4828) ## Summary - Add `generate_identity_sequences<N>()` helper that returns `Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>` - Replace lambdas with named functors in `transform_tensor_descriptor` - Add `unpack_and_merge_sequences` helper functor - Reduces `transform_tensor_descriptor` instantiations from 388 to 32 (92% reduction) ## Motivation Multiple call sites use `generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{})` pattern. A named helper reduces lambda instantiations. Additionally, each lambda in `transform_tensor_descriptor` creates a unique closure type, causing the function to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation. ## Changes ### Part 1: generate_identity_sequences helper - Replaces common lambda pattern for generating identity sequences - Each lambda expression creates a unique closure type, causing separate template instantiations at every call site - Named helper shares a single type across all uses ### Part 2: Named functors in transform_tensor_descriptor - Add `unpack_and_merge_sequences` helper to replace lambda in `GetNumOfHiddenDimension` - Use `generate_identity_sequences` in `matrix_padder.hpp` ## Test Plan - [x] Added 7 unit tests: - 4 tests for `generate_identity_sequences` - 3 tests for `unpack_and_merge_sequences` - [ ] Waiting for full CI ## Related PRs This PR merges the functionality from: - ROCm/composable_kernel#3588 (generate_identity_sequences helper) - ROCm/composable_kernel#3589 (Named functors in transform_tensor_descriptor) Part of PR stack for issue #4229 (Reduce CK/CKTile Build Times) **Note:** This PR supersedes #4283, ROCm/composable_kernel#3588 and ROCm/composable_kernel#3589, which can be closed once this is merged. --- 🔁 Imported from [ROCm/composable_kernel#3628](ROCm/composable_kernel#3628) 🧑💻 Originally authored by @tenpercent Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
generate_identity_sequences<N>()helper that returnsTuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>transform_tensor_descriptorunpack_and_merge_sequenceshelper functortransform_tensor_descriptorinstantiations from 388 to 32 (92% reduction)Motivation
Multiple call sites use
generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{})pattern. A named helper reduces lambda instantiations.Additionally, each lambda in
transform_tensor_descriptorcreates a unique closure type, causing the function to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation.Changes
Part 1: generate_identity_sequences helper
Part 2: Named functors in transform_tensor_descriptor
unpack_and_merge_sequenceshelper to replace lambda inGetNumOfHiddenDimensiongenerate_identity_sequencesinmatrix_padder.hppTest Plan
generate_identity_sequencesunpack_and_merge_sequencesRelated PRs
This PR merges the functionality from:
Part of PR stack for issue #4229 (Reduce CK/CKTile Build Times)
Note: This PR supersedes #4283, ROCm/composable_kernel#3588 and ROCm/composable_kernel#3589, which can be closed once this is merged.
🔁 Imported from ROCm/composable_kernel#3628
🧑💻 Originally authored by @tenpercent