-
Notifications
You must be signed in to change notification settings - Fork 5
[Optimization]: Reduce branching when possible in casting.hpp #117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
[Optimization]: Reduce branching when possible in casting.hpp #117
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR updates the core casting helpers to reduce branching (especially for GPU code paths) and adjusts saturation behavior for some float→integer conversions, alongside adding a small test and extending supported type traits.
Changes:
- Refactors
ScalarSaturateCast/ScalarRangeCastlogic incasting.hppto use more branchless/min-max based clamping and special-case small integer widths. - Extends type traits support to include
long/ulongvectorized types. - Adds a new C++ test covering basic
SaturateCastbehavior and a few limit/vector cases. - Adjusts the GPU block dimensions for the Composite operator kernel launch.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
include/core/detail/casting.hpp |
Refactors saturate/range cast implementations to reduce branching and adjust clamping/rounding logic. |
include/core/detail/type_traits.hpp |
Adds long / ulong to the type-traits macro set. |
tests/roccv/cpp/src/tests/core/detail/test_saturate_cast.cpp |
Introduces a basic unit test for SaturateCast, including a couple of vectorized casts. |
src/op_composite.cpp |
Changes GPU kernel launch block dimensions for the composite operator. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if constexpr (sizeof(T) <= 2) { | ||
| // 8/16 bit integer cases. These can be represented exactly in floating point. | ||
| #ifdef __HIP_DEVICE_COMPILE__ | ||
| return static_cast<T>(rintf(fminf(fmaxf(v, minVal), maxVal))); | ||
| #else | ||
| return static_cast<T>(std::round(std::clamp(v, minVal, maxVal))); | ||
| #endif | ||
| } else { | ||
| // 32/64 bit integer cases. | ||
| #ifdef __HIP_DEVICE_COMPILE__ | ||
| U rounded = rintf(v); | ||
| #else | ||
| U rounded = std::round(v); | ||
| #endif |
Copilot
AI
Feb 6, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The device-side float->integer paths use float-specific intrinsics (rintf, fminf/fmaxf, __saturatef, __float2int_rn) even though U is any floating-point type. If U is double, this will downcast to float and can change rounding/saturation behavior. Consider either constraining these branches to U == float (static_assert / if constexpr) or adding double-correct implementations (rint, fmin/fmax, __double2int_rn, etc.).
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #117 +/- ##
===========================================
- Coverage 73.88% 73.86% -0.02%
===========================================
Files 74 74
Lines 2864 2877 +13
Branches 615 610 -5
===========================================
+ Hits 2116 2125 +9
- Misses 330 331 +1
- Partials 418 421 +3
🚀 New features to boost your workflow:
|
Details
casting.hpp. Aims to reduce divergence on GPU kernel implementations.