New morton class with arithmetic and comparison operators #860

Fletterio · 2025-03-27T21:47:27Z

Description

Adds a new class for 2-, 3- and 4-dimensional Morton codes, with arithmetic and comparison operators.

Testing

TODO

TODO list:

Need to make sure all operators work properly before merging


include/nbl/builtin/hlsl/math/morton.hlsl

… bunch of operators and functional structs for vectorial types

include/nbl/builtin/hlsl/cpp_compat/basic.h

devshgraphicsprogramming · 2025-04-16T08:57:24Z

include/nbl/builtin/hlsl/cpp_compat/intrinsics.hlsl

+template<typename Condition, typename ResultType>
+NBL_CONSTEXPR_INLINE_FUNC ResultType select(Condition condition, ResultType object1, ResultType object2)
+{
+	return cpp_compat_intrinsics_impl::select_helper<Condition, ResultType>::__call(condition, object1, object2);
+}
+


but we already have mix in the #811 branch, shall we just make select(C,T,F) {return mix_helper<ResultType,Condition>::__call(F,T,C)} in the future

I can drop this and just use `mix`, yeah.

OK, the thing is that `select` can either:

- take a `bool` and return just one of the objects entirely, or
- take a `vector<bool, N>` and return a componentwise mix of the two objects (provided the objects are vectors as well).

So the latter does exactly the same as `mix`, but `select` can also act as the usual ternary `?:`.
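A minimal C++ sketch of the two behaviours listed above, assuming `std::array` as a stand-in for HLSL's `vector<T, N>`. `select_sketch` is a hypothetical name, not the `cpp_compat` `select`/`select_helper` from the diff. The scalar overload works on any copyable type (structs included), while the vector overload is the componentwise case that overlaps with `mix`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar condition: behaves like the ternary operator and returns one object whole.
// Works for any copyable type, including structs.
template<typename T>
T select_sketch(bool condition, const T& onTrue, const T& onFalse)
{
    return condition ? onTrue : onFalse;
}

// Vector condition: picks each component independently, which is the case
// that coincides with mix(onFalse, onTrue, condition) on boolean masks.
template<typename T, std::size_t N>
std::array<T, N> select_sketch(const std::array<bool, N>& condition,
                               const std::array<T, N>& onTrue,
                               const std::array<T, N>& onFalse)
{
    std::array<T, N> result{};
    for (std::size_t i = 0; i < N; ++i)
        result[i] = condition[i] ? onTrue[i] : onFalse[i];
    return result;
}
```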

Aaah, OK, `mix` doesn't work on structs. Coordinate with @Przemog1 and @keptsecret later; then `mix` would need to be done in terms of `select_helper`.


devshgraphicsprogramming · 2025-04-16T09:41:52Z

include/nbl/builtin/hlsl/emulated/vector_t.hlsl

+    #define NBL_EMULATED_VECTOR_OPERATOR(OP, ENABLE_CONDITION) NBL_CONSTEXPR_INLINE_FUNC enable_if_t< ENABLE_CONDITION , this_t> operator##OP (component_t val)\
+    {\
+        this_t output;\
+        [[unroll]]\
+        for (uint32_t i = 0u; i < CRTP::Dimension; ++i)\
+            output.setComponent(i, CRTP::getComponent(i) OP val);\
+        return output;\
+    }\
+    NBL_CONSTEXPR_INLINE_FUNC enable_if_t< ENABLE_CONDITION , this_t> operator##OP (this_t other)\


`enable_if_t` does not work if there's no "extra" unresolved/deducible template parameter:
https://godbolt.org/z/h3rjcbdxd

And since templated operators are busted in DXC, you basically need to add a `bool IsComponentTypeIntegral` parameter in the style of `bool IsComponentTypeFundamental` and make 4 partial specializations (on/off for each bool).
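To illustrate the suggestion, here is a small C++ sketch of gating operators through partial specialization rather than `enable_if_t` on non-template members (in C++ such an `enable_if_t` in a plain member's return type is a hard error as soon as the class template is instantiated, since nothing is left to deduce). `vec2_sketch` and `has_bitand` are hypothetical names, and only one flag is used to keep it short; a second flag such as `IsComponentTypeFundamental` would give the 4 specializations mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical 2-component vector. The capability flag is computed once from T and
// selects a partial specialization, instead of SFINAE-ing non-template operators.
template<typename T, bool IsComponentTypeIntegral = std::is_integral<T>::value>
struct vec2_sketch;

// Non-integral components (e.g. float): arithmetic only
template<typename T>
struct vec2_sketch<T, false>
{
    T x, y;
    vec2_sketch operator+(vec2_sketch o) const { return {T(x + o.x), T(y + o.y)}; }
};

// Integral components: arithmetic plus bitwise operators
template<typename T>
struct vec2_sketch<T, true>
{
    T x, y;
    vec2_sketch operator+(vec2_sketch o) const { return {T(x + o.x), T(y + o.y)}; }
    vec2_sketch operator&(vec2_sketch o) const { return {T(x & o.x), T(y & o.y)}; }
};

// Detection trait, only used to show that vec2_sketch<float> has no operator&
template<typename V, typename = void>
struct has_bitand : std::false_type {};

template<typename V>
struct has_bitand<V, decltype(void(std::declval<V>() & std::declval<V>()))> : std::true_type {};
```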

devshgraphicsprogramming · 2025-04-16T09:46:57Z

include/nbl/builtin/hlsl/emulated/vector_t.hlsl

@@ -428,7 +478,7 @@ namespace impl
 template<typename To, typename From>
 struct static_cast_helper<emulated_vector_t2<To>, vector<From, 2>, void>
 {
-    static inline emulated_vector_t2<To> cast(vector<From, 2> vec)
+    NBL_CONSTEXPR_STATIC_INLINE emulated_vector_t2<To> cast(vector<From, 2> vec)


This shouldn't be `NBL_CONSTEXPR_STATIC_INLINE`, but `NBL_CONSTEXPR_STATIC_FUNC` or `NBL_CONSTEXPR_STATIC_METHOD`.

devshgraphicsprogramming · 2025-04-16T09:55:49Z

include/nbl/builtin/hlsl/emulated/int64_t.hlsl

@@ -132,15 +130,19 @@ struct emulated_int64_base
    {
        // Either the topmost bits, when interpreted with correct sign, are less than those of `rhs`, or they're equal and the lower bits are less
        // (lower bits are always positive in both unsigned and 2's complement so comparison can happen as-is)
+        const bool MSBEqual = __getMSB() == rhs.__getMSB();
        const bool MSB = Signed ? (_static_cast<int32_t>(__getMSB()) < _static_cast<int32_t>(rhs.__getMSB())) : (__getMSB() < rhs.__getMSB());


You probably want `bit_cast` instead of `_static_cast` here.
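A sketch of the distinction, assuming a hypothetical `emu_int64_sketch` split into most/least significant 32-bit words (mirroring `__getMSB()` in the diff): the top word has to be reinterpreted as signed with its bits unchanged, which is a bit cast rather than a value conversion. The helper uses `memcpy`, the portable pre-C++20 equivalent of `std::bit_cast`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a uint32_t as an int32_t, i.e. what std::bit_cast<int32_t>
// does in C++20; memcpy is the portable pre-C++20 spelling.
inline int32_t bit_cast_i32(uint32_t bits)
{
    int32_t out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}

// Hypothetical emulated 64-bit integer: most and least significant 32-bit words
struct emu_int64_sketch
{
    uint32_t msb;
    uint32_t lsb;
};

template<bool Signed>
bool less_than_sketch(emu_int64_sketch lhs, emu_int64_sketch rhs)
{
    const bool msbEqual = lhs.msb == rhs.msb;
    // Only the top word carries the sign, so only it is reinterpreted as signed
    const bool msbLess = Signed ? (bit_cast_i32(lhs.msb) < bit_cast_i32(rhs.msb))
                                : (lhs.msb < rhs.msb);
    // Lower words are compared as unsigned in both cases
    return msbLess || (msbEqual && lhs.lsb < rhs.lsb);
}
```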


devshgraphicsprogramming · 2025-04-16T10:38:01Z

include/nbl/builtin/hlsl/morton.hlsl

+    */
+    NBL_CONSTEXPR_STATIC_INLINE_FUNC portable_vector_t<encode_t, Dim> interleaveShift(NBL_CONST_REF_ARG(decode_t) decodedValue)
+    {
+        NBL_CONSTEXPR_STATIC encode_t EncodeMasks[CodingStages + 1] = { _static_cast<encode_t>(coding_mask_v<Dim, Bits, 0>), _static_cast<encode_t>(coding_mask_v<Dim, Bits, 1>), _static_cast<encode_t>(coding_mask_v<Dim, Bits, 2>) , _static_cast<encode_t>(coding_mask_v<Dim, Bits, 3>) , _static_cast<encode_t>(coding_mask_v<Dim, Bits, 4>) , _static_cast<encode_t>(coding_mask_v<Dim, Bits, 5>) };


Was that really less effort to type out than putting your loop body in a macro and "hand-unrolling" it?

Also, you're getting screwed over by DXC, see https://github.com/Devsh-Graphics-Programming/Nabla/pull/860/files#r2046653961
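A rough C++ sketch of the macro-plus-hand-unroll style, for the 2D case only, assuming 32-bit coordinates and a 64-bit code. The masks are the `Dim = 2` rows of the coding-mask table in this PR, applied from the widest group down, with stage k's mask following a left shift by 2^k. `MORTON2_STAGE`, `spread2_sketch` and `encode2_sketch` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Spread the low 32 bits of x so that bit i lands at bit 2*i (Dim = 2).
inline uint64_t spread2_sketch(uint64_t x)
{
    // One coding round: shift left, OR, then keep only the bits of the current grouping
#define MORTON2_STAGE(SHIFT, MASK) x = (x | (x << (SHIFT))) & (MASK)
    x &= 0x00000000FFFFFFFFull;
    MORTON2_STAGE(16, 0x0000FFFF0000FFFFull); // 16 on, 16 off
    MORTON2_STAGE(8,  0x00FF00FF00FF00FFull); // 8 on, 8 off
    MORTON2_STAGE(4,  0x0F0F0F0F0F0F0F0Full); // 4 on, 4 off
    MORTON2_STAGE(2,  0x3333333333333333ull); // 2 on, 2 off
    MORTON2_STAGE(1,  0x5555555555555555ull); // 1 on, 1 off
#undef MORTON2_STAGE
    return x;
}

// Interleave: x occupies the even bits, y the odd bits
inline uint64_t encode2_sketch(uint32_t x, uint32_t y)
{
    return spread2_sketch(x) | (spread2_sketch(y) << 1);
}
```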

devshgraphicsprogramming · 2025-04-16T10:38:56Z

include/nbl/builtin/hlsl/morton.hlsl

+    {
+        NBL_CONSTEXPR_STATIC encode_t EncodeMasks[CodingStages + 1] = { _static_cast<encode_t>(coding_mask_v<Dim, Bits, 0>), _static_cast<encode_t>(coding_mask_v<Dim, Bits, 1>), _static_cast<encode_t>(coding_mask_v<Dim, Bits, 2>) , _static_cast<encode_t>(coding_mask_v<Dim, Bits, 3>) , _static_cast<encode_t>(coding_mask_v<Dim, Bits, 4>) , _static_cast<encode_t>(coding_mask_v<Dim, Bits, 5>) };
+        left_shift_operator<portable_vector_t<encode_t, Dim> > leftShift;
+        portable_vector_t<encode_t, Dim> interleaved = _static_cast<portable_vector_t<encode_t, Dim> >(decodedValue)& EncodeMasks[CodingStages];


AFAIK we don't use `static_cast` to widen or truncate our scalars and vectors; use (and specialize) `promote`/`truncate` instead.

Making a `truncate` then.
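A minimal sketch of what an explicit component truncation could look like in C++, with `std::array` standing in for HLSL vectors. `truncate_sketch` is a hypothetical name and not the `truncate` added in this PR; taking the first `Dim` components of a 4-component vector is the kind of narrowing that `_static_cast<vector<uint32_t, Dim> >(vector<uint32_t, 4>(0, 1, 2, 3))` in the decoder diff performs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical explicit truncation: keep the first N components of an M-component
// vector, and refuse (at compile time) to be used for widening.
template<std::size_t N, typename T, std::size_t M>
std::array<T, N> truncate_sketch(const std::array<T, M>& v)
{
    static_assert(N <= M, "truncate only narrows; widening should go through a promote");
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = v[i];
    return out;
}
```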

devshgraphicsprogramming · 2025-04-16T10:42:06Z

include/nbl/builtin/hlsl/morton.hlsl

+        NBL_CONSTEXPR_STATIC uint16_t Stages = mpl::log2_ceil_v<Bits>;
+        [[unroll]]
+        for (uint16_t i = Stages; i > 0; i--)


This loop will never unroll: a `static const` plain variable in a function will never be constexpr. @keptsecret got screwed over by this in the HLSL Path Tracer!

Better to unroll by hand, or use `mpl::log2_ceil_v<Bits>` directly as the initializer of `uint16_t i` instead of going through the `static const`.

devshgraphicsprogramming · 2025-04-16T10:47:53Z

include/nbl/builtin/hlsl/morton.hlsl

+struct MortonEncoder
+{
+    template<typename decode_t = conditional_t<(Bits > 16), vector<uint32_t, Dim>, vector<uint16_t, Dim> >
+    NBL_FUNC_REQUIRES(concepts::IntVector<decode_t> && 8 * sizeof(typename vector_traits<decode_t>::scalar_type) >= Bits)


It's actually `> Bits+Dim`, not `>= Bits`, because you will be left-shifting the components, and the last one will have its MSB at bit `Bits+Dim-1`.

But this is for `decode_t`, which immediately gets converted to a vector of `encode_t`, and that does have enough bits to hold the interleaved and shifted coordinates.

Idk what I was thinking when I wrote this tbh; maybe the check should be the other way around?

`8 * sizeof(typename vector_traits<decode_t>::scalar_type) <= max(Bits, 16)`

to ensure you don't get an implicit truncation,

or just drop that altogether, idk.

devshgraphicsprogramming · 2025-04-16T10:50:44Z

include/nbl/builtin/hlsl/morton.hlsl

+        encode_t encoded = _static_cast<encode_t>(uint64_t(0));
+        array_get<portable_vector_t<encode_t, Dim>, encode_t> getter;
+        [[unroll]]
+        for (uint16_t i = 0; i < Dim; i++)
+            encoded = encoded | getter(interleaveShifted, i);


I wouldn't count on the compiler noticing that `| 0` is the identity and can be optimized out for `emulated_uint64_t`, so do:

encode_t encoded = getter(interleaveShifted, 0);
[[unroll]]
for (uint32_t i = 1; i < Dim; i++)
    encoded = encoded | getter(interleaveShifted, i);
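The same seeding pattern in plain C++, as a sketch: `emu_u64_sketch` is a hypothetical stand-in for `emulated_uint64_t` that only provides `operator|`, so the reduction never has to build a zero to OR into.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an emulated 64-bit type: two 32-bit words and only an operator|
struct emu_u64_sketch
{
    uint32_t lo;
    uint32_t hi;
    emu_u64_sketch operator|(emu_u64_sketch o) const { return {lo | o.lo, hi | o.hi}; }
};

// OR-reduce the per-axis interleaved values, seeding with component 0 so no
// `x | 0` is ever emitted and T never needs to be constructed from a literal zero.
template<typename T, std::size_t Dim>
T or_reduce_sketch(const std::array<T, Dim>& interleaveShifted)
{
    static_assert(Dim >= 1, "need at least one component");
    T encoded = interleaveShifted[0];
    for (std::size_t i = 1; i < Dim; ++i)
        encoded = encoded | interleaveShifted[i];
    return encoded;
}
```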

devshgraphicsprogramming · 2025-04-16T10:52:33Z

include/nbl/builtin/hlsl/morton.hlsl

+// ----------------------------------------------------------------- MORTON ENCODER ---------------------------------------------------
+
+template<uint16_t Dim, uint16_t Bits, typename encode_t NBL_PRIMARY_REQUIRES(Dimension<Dim> && Dim * Bits <= 64 && 8 * sizeof(encode_t) == mpl::round_up_to_pot_v<Dim * Bits>)
+struct MortonEncoder


`morton::impl::MortonEncoder`, too many mortons.

devshgraphicsprogramming · 2025-04-16T10:53:48Z

include/nbl/builtin/hlsl/morton.hlsl

+};
+
+// ----------------------------------------------------------------- MORTON DECODER ---------------------------------------------------
+
+template<uint16_t Dim, uint16_t Bits, typename encode_t NBL_PRIMARY_REQUIRES(Dimension<Dim> && Dim * Bits <= 64 && 8 * sizeof(encode_t) == mpl::round_up_to_pot_v<Dim * Bits>)
+struct MortonDecoder
+{


Why not merge Decoder and Encoder into a single Transcoder?

devshgraphicsprogramming · 2025-04-16T10:54:19Z

include/nbl/builtin/hlsl/morton.hlsl

+struct MortonDecoder
+{
+    template<typename decode_t = conditional_t<(Bits > 16), vector<uint32_t, Dim>, vector<uint16_t, Dim> >
+    NBL_FUNC_REQUIRES(concepts::IntVector<decode_t> && 8 * sizeof(typename vector_traits<decode_t>::scalar_type) >= Bits)


Same thing with `>= Bits` needing to be `> Bits+Dim`.

Actually, same comments as for the `interleaveShift` function.

devshgraphicsprogramming · 2025-04-16T10:57:54Z

include/nbl/builtin/hlsl/morton.hlsl

+            setter(decoded, i, encodedValue);
+        decoded = rightShift(decoded, _static_cast<vector<uint32_t, Dim> >(vector<uint32_t, 4>(0, 1, 2, 3)));


Could just write `setter(decoded, i, encodedValue >> i);` and drop the separate `rightShift`.
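A sketch of the shift-by-component idea in C++, assuming at most 64 bits in total (`Dim * bits <= 64`). A naive bit-by-bit gather stands in for the mask-based coding rounds, and `gather_sketch`/`decode_sketch` are hypothetical names. Because component `i`'s bits start at bit `i`, shifting the code right by `i` lets every component reuse the same gather.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Naive reference gather: collect bits 0, Dim, 2*Dim, ... of v into the low `bits` bits.
template<uint32_t Dim>
uint32_t gather_sketch(uint64_t v, uint32_t bits)
{
    uint32_t out = 0;
    for (uint32_t b = 0; b < bits; ++b)
        out |= static_cast<uint32_t>((v >> (b * Dim)) & 1ull) << b;
    return out;
}

// Component i's bits start at bit i of the code, so shifting right by i
// before gathering is all each component needs (the `encodedValue >> i` idea).
template<uint32_t Dim>
std::array<uint32_t, Dim> decode_sketch(uint64_t encoded, uint32_t bits)
{
    std::array<uint32_t, Dim> decoded{};
    for (uint32_t i = 0; i < Dim; ++i)
        decoded[i] = gather_sketch<Dim>(encoded >> i, bits);
    return decoded;
}
```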

devshgraphicsprogramming · 2025-04-16T11:10:35Z

include/nbl/builtin/hlsl/morton.hlsl

+NBL_HLSL_MORTON_SPECIALIZE_FIRST_CODING_MASK(2, 0x5555555555555555)        // Groups bits by 1  on, 1  off
+NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(2, 1, uint64_t(0x3333333333333333)) // Groups bits by 2  on, 2  off
+NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(2, 2, uint64_t(0x0F0F0F0F0F0F0F0F)) // Groups bits by 4  on, 4  off
+NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(2, 3, uint64_t(0x00FF00FF00FF00FF)) // Groups bits by 8  on, 8  off
+NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(2, 4, uint64_t(0x0000FFFF0000FFFF)) // Groups bits by 16 on, 16 off
+
+NBL_HLSL_MORTON_SPECIALIZE_FIRST_CODING_MASK(3, 0x9249249249249249)        // Groups bits by 1  on, 2  off
+NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(3, 1, uint64_t(0x30C30C30C30C30C3)) // Groups bits by 2  on, 4  off
+NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(3, 2, uint64_t(0xF00F00F00F00F00F)) // Groups bits by 4  on, 8  off
+NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(3, 3, uint64_t(0x00FF0000FF0000FF)) // Groups bits by 8  on, 16 off
+NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(3, 4, uint64_t(0xFFFF00000000FFFF)) // Groups bits by 16 on, 32 off
+
+NBL_HLSL_MORTON_SPECIALIZE_FIRST_CODING_MASK(4, 0x1111111111111111)        // Groups bits by 1  on, 3  off
+NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(4, 1, uint64_t(0x0303030303030303)) // Groups bits by 2  on, 6  off
+NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(4, 2, uint64_t(0x000F000F000F000F)) // Groups bits by 4  on, 12 off
+NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(4, 3, uint64_t(0x000000FF000000FF)) // Groups bits by 8  on, 24 off
+NBL_HLSL_MORTON_SPECIALIZE_CODING_MASK(4, 4, uint64_t(0x000000000000FFFF)) // Groups bits by 16 on, 48 off (unused but here for completion + likely keeps compiler from complaining)


`ull` suffixes on the mask literals, please.
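One way to double-check the table (and its literals once they carry `ull`) is to regenerate the masks from the rule stated in its comments: stage `k` groups 2^k bits on and (Dim - 1) * 2^k bits off. `coding_mask_sketch` is a hypothetical C++14 helper, not `coding_mask_v`; a few entries are checked at compile time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical generator for the coding masks: stage k keeps runs of 2^k set bits
// followed by (dim - 1) * 2^k clear bits, repeated over all 64 bits.
constexpr uint64_t coding_mask_sketch(uint32_t dim, uint32_t k)
{
    uint64_t mask = 0;
    const uint32_t on = 1u << k;
    const uint32_t period = on * dim;
    for (uint32_t bit = 0; bit < 64; ++bit)
        if (bit % period < on)
            mask |= 1ull << bit;
    return mask;
}

// Compile-time checks against a few of the ull literals from the table
static_assert(coding_mask_sketch(2, 0) == 0x5555555555555555ull, "Dim 2, stage 0");
static_assert(coding_mask_sketch(3, 0) == 0x9249249249249249ull, "Dim 3, stage 0");
static_assert(coding_mask_sketch(3, 4) == 0xFFFF00000000FFFFull, "Dim 3, stage 4");
static_assert(coding_mask_sketch(4, 2) == 0x000F000F000F000Full, "Dim 4, stage 2");
```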

devshgraphicsprogramming · 2025-04-16T11:15:21Z

include/nbl/builtin/hlsl/morton.hlsl

+        // If `Bits` is greater than half the bitwidth of the decode type, then we can avoid `&`ing against the last mask since duplicated MSB get truncated
+        NBL_IF_CONSTEXPR(Bits > 4 * sizeof(typename vector_traits<decode_t>::scalar_type))


I think that `>` should be a `>=`, because if you have a 16-bit Morton (e.g. Dim=2 stored in a uint32_t) getting decoded into a vector of uint16_t, you'll have a shift by 8 in the final coding round.

But the comparison is against half the bitwidth. For example, if decoding to a vector of uint16_t, this decision is made based on whether we have more than 8 bits.

For example, if you have exactly 8 bits, the last shift is by 4. Ignore hex; let's just say the encoded number is ABCDEFGH, where each letter represents one binary digit. Then in the last round you'll have decoded = 0000ABCD0000EFGH and decoded >> 4 = 00000000ABCD0000 (16 bits are needed to hold two 8-bit Mortons), and the | between these looks like 0000ABCDABCDEFGH. Here, to get the correct value, I do need to mask off the highest 8 bits.

Now say you have more than half the bitwidth of the decode type, for example a 9-bit Morton ABCDEFGHI being decoded to a uint16_t. Since we have more than 8 bits, the last round is a shift by 8, so the spacing between bits is also 8 and decoded will look like decoded = 000000000000000A00000000BCDEFGHI (now 32 bits are needed to hold two 9-bit Mortons), and decoded >> 8 = 00000000000000000000000A00000000, so the | between them returns 000000000000000A0000000ABCDEFGHI. Now there's no need to mask, since taking only the lowest 16 bits correctly yields 0000000ABCDEFGHI (the same holds for any value from 10 to 16 bits).
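The two worked examples can be reproduced with a small C++ sketch, assuming a 2D code held in a `uint32_t` and decoded into a `uint16_t`; `decode_x_sketch` and `log2_ceil_sketch` are hypothetical names. With 9 bits the last round shifts by 8 and truncation alone gives the right answer, while with 8 bits the last round shifts by 4 and skipping the mask leaves `ABCD` duplicated in bits 8 to 11.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t log2_ceil_sketch(uint32_t v) { return v <= 1 ? 0 : 1 + log2_ceil_sketch((v + 1) / 2); }

// Hypothetical 2D decode of the x coordinate into a uint16_t, with the final mask optional.
// Stage i shifts right by 2^(i-1) and then (normally) masks with Masks[i].
template<uint32_t Bits, bool ApplyLastMask>
uint16_t decode_x_sketch(uint32_t code)
{
    static_assert(Bits >= 1 && Bits <= 16, "two coordinates must fit in 32 bits");
    constexpr uint32_t Masks[5] = {0x55555555u, 0x33333333u, 0x0F0F0F0Fu, 0x00FF00FFu, 0x0000FFFFu};
    constexpr uint32_t Stages = log2_ceil_sketch(Bits);
    uint32_t x = code & Masks[0];
    for (uint32_t i = 1; i <= Stages; ++i)
    {
        x = x | (x >> (1u << (i - 1)));
        if (i < Stages || ApplyLastMask)
            x &= Masks[i];
    }
    // Truncation to the 16-bit decode type drops everything above bit 15
    return static_cast<uint16_t>(x);
}
```

Following the rule discussed here, a caller decoding into `uint16_t` would pass `ApplyLastMask = !(Bits > 8)`.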

devshgraphicsprogramming · 2025-04-16T11:18:47Z

include/nbl/builtin/hlsl/morton.hlsl

+    template<typename I NBL_FUNC_REQUIRES(Comparable<Signed, Bits, storage_t, true, I>)
+    NBL_CONSTEXPR_STATIC_INLINE_FUNC vector<bool, D> __call(NBL_CONST_REF_ARG(storage_t) value, NBL_CONST_REF_ARG(portable_vector_t<I, D>) rhs)
+    {
+        NBL_CONSTEXPR portable_vector_t<storage_t, D> zeros = _static_cast<portable_vector_t<storage_t, D> >(_static_cast<vector<uint64_t, D> >(vector<uint64_t, 4>(0,0,0,0)));


Again, this will create a hidden variable with an initializer.

You literally have to use a temporary to compare against,

or declare the variable as a plain `const`, not a `static const`.

Wait, what does `static const` do vs. using just `const`?

Read the Discord thread.

devshgraphicsprogramming · 2025-04-16T11:20:47Z

include/nbl/builtin/hlsl/morton.hlsl

+        NBL_CONSTEXPR_STATIC portable_vector_t<storage_t, D> InterleaveMasks = _static_cast<portable_vector_t<storage_t, D> >(_static_cast<vector<uint64_t, D> >(vector<uint64_t, 4>(coding_mask_v<D, Bits, 0>, coding_mask_v<D, Bits, 0> << 1, coding_mask_v<D, Bits, 0> << 2, coding_mask_v<D, Bits, 0> << 3)));
+        NBL_CONSTEXPR_STATIC portable_vector_t<storage_t, D> SignMasks = _static_cast<portable_vector_t<storage_t, D> >(_static_cast<vector<uint64_t, D> >(vector<uint64_t, 4>(SignMask<Bits, D>, SignMask<Bits, D> << 1, SignMask<Bits, D> << 2, SignMask<Bits, D> << 3)));


Again, a plain `const` is okay, `static const` is not.

Also, is there a prettier way to write this? Or at least format it better (every component on a new line?)
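As one possible answer to the formatting question, a C++ sketch that builds the per-axis masks from the stage-0 coding mask with a loop instead of spelling out each component. `interleave_masks_sketch` is hypothetical, and on the HLSL side the `static const` caveats above would still apply. The `SignMasks` vector is left out because the definition of `SignMask<Bits, D>` is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical: build the D per-axis interleave masks from the stage-0 coding mask.
// Axis i occupies bits i, i + D, i + 2D, ..., so its mask is the stage-0 mask shifted by i.
template<uint32_t D>
std::array<uint64_t, D> interleave_masks_sketch(uint64_t firstCodingMask)
{
    std::array<uint64_t, D> masks{};
    for (uint32_t i = 0; i < D; ++i)
        masks[i] = firstCodingMask << i;
    return masks;
}
```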

devshgraphicsprogramming · 2025-04-16T11:25:57Z

include/nbl/builtin/hlsl/morton.hlsl

+        // Obtain a vector of deinterleaved coordinates and flip their sign bits
+        const portable_vector_t<storage_t, D> thisCoord = (InterleaveMasks & value) ^ SignMasks;
+        // rhs already deinterleaved, just have to cast type and flip sign
+        const portable_vector_t<storage_t, D> rhsCoord = _static_cast<portable_vector_t<storage_t, D> >(rhs) ^ SignMasks;


Why are you always flipping the sign bits, regardless of `Signed`?
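A sketch of why the flip should depend on `Signed`, assuming a hypothetical `less_bits_sketch` that compares `Bits`-wide coordinates stored in the low bits of a `uint64_t`: XOR-ing the sign bit turns two's complement ordering into unsigned ordering, but applied to unsigned values it reorders them incorrectly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical comparison of two Bits-wide coordinates held in the low bits of a uint64_t.
// XOR-ing the sign bit maps two's complement order onto unsigned order, so it is only
// correct (and only needed) when the coordinates are signed.
template<bool Signed, uint32_t Bits>
bool less_bits_sketch(uint64_t a, uint64_t b)
{
    static_assert(Bits >= 1 && Bits <= 64, "Bits must fit in 64 bits");
    const uint64_t valueMask = ~0ull >> (64 - Bits);
    const uint64_t flip = Signed ? (1ull << (Bits - 1)) : 0ull;
    return ((a & valueMask) ^ flip) < ((b & valueMask) ^ flip);
}
```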

devshgraphicsprogramming · 2025-04-16T11:55:03Z

include/nbl/builtin/hlsl/morton.hlsl

+    NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> equals(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC
+    {
+        return impl::Equals<Signed, Bits, D, storage_t, BitsAlreadySpread>::__call(value, rhs);
+    }  
+
+    NBL_CONSTEXPR_INLINE_FUNC bool operator!=(NBL_CONST_REF_ARG(this_t) rhs) NBL_CONST_MEMBER_FUNC
+    {
+        return value != rhs.value;
+    }
+
+    template<bool BitsAlreadySpread, typename I
+    NBL_FUNC_REQUIRES(impl::Comparable<Signed, Bits, storage_t, BitsAlreadySpread, I>)
+    NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> notEquals(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC
+    {
+        return !equals<BitsAlreadySpread, I>(rhs);
+    }
+
+    template<bool BitsAlreadySpread, typename I
+    NBL_FUNC_REQUIRES(impl::Comparable<Signed, Bits, storage_t, BitsAlreadySpread, I>)
+    NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> less(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC
+    {
+        return impl::LessThan<Signed, Bits, D, storage_t, BitsAlreadySpread>::__call(value, rhs);
+    }
+
+    template<bool BitsAlreadySpread, typename I
+    NBL_FUNC_REQUIRES(impl::Comparable<Signed, Bits, storage_t, BitsAlreadySpread, I>)
+    NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> lessEquals(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC
+    {
+        return impl::LessEquals<Signed, Bits, D, storage_t, BitsAlreadySpread>::__call(value, rhs);
+    }
+
+    template<bool BitsAlreadySpread, typename I
+    NBL_FUNC_REQUIRES(impl::Comparable<Signed, Bits, storage_t, BitsAlreadySpread, I>)
+    NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> greater(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC
+    {
+        return impl::GreaterThan<Signed, Bits, D, storage_t, BitsAlreadySpread>::__call(value, rhs);
+    }
+
+    template<bool BitsAlreadySpread, typename I
+    NBL_FUNC_REQUIRES(impl::Comparable<Signed, Bits, storage_t, BitsAlreadySpread, I>)
+    NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> greaterEquals(NBL_CONST_REF_ARG(vector<I, D>) rhs) NBL_CONST_MEMBER_FUNC


Spelling nitpick: those functions are usually called `equal`, without an s at the end (likewise `notEqual`, `lessThanEqual` and `greaterThanEqual` in GLSL):
https://registry.khronos.org/OpenGL-Refpages/gl4/html/equal.xhtml

- `NBL_CONSTEXPR_FUNC`
- Adds `OpUndef` to spirv `intrinsics.hlsl` and `cpp_compat.hlsl`
- Adds an explicit `truncate` function for vectors and emulated vectors
- Adds a bunch of specializations for vectorial types in `functional.hlsl`
- Bugfixes and changes to Morton codes, very close to them working properly with emulated ints

Fletterio added 15 commits March 21, 2025 15:48

Initial commit

cdcc9ad

Merge branch 'concepts_fix' into mortons

8e84558

Merge branch 'concepts_fix' into mortons

d33fab5

CHeckpoint before master merge

5fe6c08

Checkpoint before merging new type_traits change

f18b2fa

Merge branch 'master' into mortons

7d86cba

Works, but throws DXC warning

4ebc555

Added concept for valid morton dimensions

55a2ef6

Creation from vector working as intended

f516256

Added some extra macro specifiers, vector truncation with no warnings on HLSL side by specializing , a bunch of morton operators

534d81b

Add safe copile-time vector truncation and some function specifiers for both cpp and hlsl

6256390

Morton class done!

246cefc

Remove some leftover commented code

1c7f791

Remove leaking macro

5088799

Bugfixes with arithmetic

e25a35c

devshgraphicsprogramming reviewed Mar 29, 2025

include/nbl/builtin/hlsl/math/morton.hlsl (outdated)