
New morton class with arithmetic and comparison operators #860

Open
wants to merge 25 commits into base: master

Conversation

Fletterio
Contributor

Description

Adds a new class for 2-, 3-, and 4-dimensional Morton codes, with arithmetic and comparison operators

Testing

TODO

TODO list:

Need to make sure all operators work properly before merging

Comment on lines 101 to 108
/**
* @brief Decodes this Morton code back to a set of cartesian coordinates
*/
explicit operator vector<I, D>() const noexcept
{
// Definition below, we override `impl::static_cast_helper` to have this conversion in both CPP/HLSL
return _static_cast<vector<I, D>, this_t>(*this);
}


note that the specialization of impl::_static_cast_helper appears AFTER this operator, which will anger the compiler.

you need to forward declare it here and define it out of line after the specialization of impl::_static_cast_helper

Comment on lines 252 to 263
NBL_CONSTEXPR_INLINE_FUNC bool coordEquals(NBL_CONST_REF_ARG(this_t) rhs, uint16_t coord) NBL_CONST_MEMBER_FUNC
{
NBL_CONSTEXPR_STATIC vector<U, D> Masks = NBL_HLSL_MORTON_MASKS(U, D);
return (value & Masks[coord]) == (rhs.value & Masks[coord]);
}

NBL_CONSTEXPR_INLINE_FUNC vector<bool, D> operator==(NBL_CONST_REF_ARG(this_t) rhs) NBL_CONST_MEMBER_FUNC
{
vector<bool, D> retVal;
[[unroll]]
for (uint16_t coord = 0; coord < D; coord++)
retVal[coord] = coordEquals(rhs, coord);


it's faster done with a single XOR, then masking the result (1 XOR, 3 ANDs, 3 casts to bool)

Comment on lines +186 to +193
template<uint16_t Dim, uint16_t Bits, typename encode_t>
struct MortonDecoder;

template<uint16_t Bits, typename encode_t>
struct MortonDecoder<2, Bits, encode_t>
{
template<typename decode_t>
NBL_CONSTEXPR_STATIC_INLINE_FUNC decode_t decode(NBL_CONST_REF_ARG(encode_t) encodedValue)


again, you may:

  1. check that encode_t is no larger than it needs to be (sizeof(encode_t)*8 is the smallest PoT larger than or equal to Dim*Bits)
  2. check that decode_t is large enough to hold Bits of output, and even deduce its vector type
  3. you're not pre-shifting the encodedValue by i in each channel: encodedValue[i] = rightShift(encodedValue[i], i)
  4. there's probably no need for a partial specialization for every dimension (doable in the unspecialized/generic version)

}
NBL_IF_CONSTEXPR(Bits > 16)
{
decoded = (decoded | rightShift(decoded, 16)) & morton_mask_2_5_v<encode_t>;


if you ensure that sizeof(decode_t)*8 <= Bits you don't need to perform the last bitwise-and, because the decoded type will never be more than half the original encoded type and the MSB will just truncate away during a cast


this should be true for all dimensions in the PoT method

Comment on lines +283 to +284
template<bool Signed, uint16_t Bits, uint16_t D, typename storage_t, bool BitsAlreadySpread>
struct Equals;


why not deduce storage_t from D*Bits?

Comment on lines 190 to 228
const vector<uint32_t, 2> retValData = { (operand.data.x << _bits) | (operand.data.y >> shift), operand.data.y << _bits };
return type_t::create(retValData);
}
};

template<>
struct arithmetic_right_shift_operator<emulated_uint64_t>
{
using type_t = emulated_uint64_t;
NBL_CONSTEXPR_STATIC uint32_t ComponentBitWidth = uint32_t(8 * sizeof(uint32_t));

NBL_CONSTEXPR_INLINE_FUNC type_t operator()(NBL_CONST_REF_ARG(type_t) operand, uint16_t bits)
{
if (!bits)
return operand;
const uint32_t _bits = uint32_t(bits);
const uint32_t shift = ComponentBitWidth - _bits;
// We need the `y` component of the vector (which represents the lower bits of the emulated uint64) to get the `bits` lower bits of the `x` component
const vector<uint32_t, 2> retValData = { operand.data.x >> _bits, (operand.data.x << shift) | (operand.data.y >> _bits) };
return emulated_uint64_t::create(retValData);
}
};

template<>
struct arithmetic_right_shift_operator<emulated_int64_t>
{
using type_t = emulated_int64_t;
NBL_CONSTEXPR_STATIC uint32_t ComponentBitWidth = uint32_t(8 * sizeof(uint32_t));

NBL_CONSTEXPR_INLINE_FUNC type_t operator()(NBL_CONST_REF_ARG(type_t) operand, uint16_t bits)
{
if (!bits)
return operand;
const uint32_t _bits = uint32_t(bits);
const uint32_t shift = ComponentBitWidth - _bits;
// We need the `y` component of the vector (which represents the lower bits of the emulated uint64) to get the `bits` lower bits of the `x` component
// Also the right shift *only* in the top bits happens as a signed arithmetic right shift
const vector<uint32_t, 2> retValData = { _static_cast<uint32_t>(_static_cast<int32_t>(operand.data.x) >> _bits), (operand.data.x << shift) | (operand.data.y >> _bits) };
return emulated_int64_t::create(retValData);


also, can you use OpSelect between the operand and the return value to avoid a branch?


mix() works both in C++ and HLSL and uses OpSelect in HLSL

Comment on lines +422 to +426
storage_t encodedCartesian = _static_cast<storage_t>(uint64_t(0));
[[unroll]]
for (uint16_t i = 0; i < D; i++)
{
encodedCartesian = encodedCartesian | leftShift(impl::MortonEncoder<D, Bits, storage_t>::encode(_static_cast<U>(cartesian[i])), i);


the impl::Encoder and Decoder can work in parallel on a vector (gives the compiler more chance to overlap ALU ops to eliminate dependencies; the channels have none until the end)

template<typename I>
constexpr inline explicit operator vector<I, D>() const noexcept
{
return _static_cast<vector<I, D>, morton::code<is_signed_v<I>, Bits, D>>(*this);


IIRC you can't define a function that calls _static_cast (which causes _static_cast_helper to get instantiated) before you specialize the _static_cast_helper #860 (comment)

Contributor Author

MSVC supports it; is it a problem on gcc/clang? I couldn't get it to compile when trying to define it out of line after the specialization of the helper


you should be able to define operator vector<I,D>() const out of line; maybe you placed constexpr and inline twice

CC: @AnastaZIuk link your triple compiler godbolt for testing such things

Comment on lines +505 to +508
allOnes.value = leftShift(_static_cast<storage_t>(1), D) - _static_cast<storage_t>(1);
// Using 2's complement property that arithmetic negation can be obtained by bitwise negation then adding 1
this_signed_t retVal;
retVal.value = (operator~() + allOnes).value;


can we not do

this_t zero;
zero.value = _static_cast<storage_t>(0);
#ifndef __HLSL_VERSION
return zero-*this;
#else
return zero-this;
#endif

Comment on lines +558 to +596
enable_if_t<(is_signed_v<I> == Signed) || (is_same_v<I, storage_t> && BitsAlreadySpread), vector<bool, D> > operator==(NBL_CONST_REF_ARG(vector<I, D>) rhs)
{
impl::Equals<Signed, Bits, D, storage_t, BitsAlreadySpread> equals;
return equals(value, rhs);
}

NBL_CONSTEXPR_INLINE_FUNC bool operator!=(NBL_CONST_REF_ARG(this_t) rhs) NBL_CONST_MEMBER_FUNC
{
return value != rhs.value;
}

template<bool BitsAlreadySpread, typename I>
enable_if_t<(is_signed_v<I> == Signed) || (is_same_v<I, storage_t> && BitsAlreadySpread), vector<bool, D> > operator!=(NBL_CONST_REF_ARG(vector<I, D>) rhs)
{
return !operator== <BitsAlreadySpread, I>(rhs);
}

template<bool BitsAlreadySpread, typename I>
enable_if_t<(is_signed_v<I> == Signed) || (is_same_v<I, storage_t> && BitsAlreadySpread), vector<bool, D> > operator<(NBL_CONST_REF_ARG(vector<I, D>) rhs)
{
impl::LessThan<Signed, Bits, D, storage_t, BitsAlreadySpread> lessThan;
return lessThan(value, rhs);
}

template<bool BitsAlreadySpread, typename I>
enable_if_t<(is_signed_v<I> == Signed) || (is_same_v<I, storage_t> && BitsAlreadySpread), vector<bool, D> > operator<=(NBL_CONST_REF_ARG(vector<I, D>) rhs)
{
impl::LessEquals<Signed, Bits, D, storage_t, BitsAlreadySpread> lessEquals;
return lessEquals(value, rhs);
}

template<bool BitsAlreadySpread, typename I>
enable_if_t<(is_signed_v<I> == Signed) || (is_same_v<I, storage_t> && BitsAlreadySpread), vector<bool, D> > operator>(NBL_CONST_REF_ARG(vector<I, D>) rhs)
{
return !operator<= <BitsAlreadySpread, I>(rhs);
}

template<bool BitsAlreadySpread, typename I>
enable_if_t<(is_signed_v<I> == Signed) || (is_same_v<I, storage_t> && BitsAlreadySpread), vector<bool, D> > operator>=(NBL_CONST_REF_ARG(vector<I, D>) rhs)


I think (is_signed_v<I> == Signed) || (is_same_v<I, storage_t> && BitsAlreadySpread) is wrong, let's analyze the cases:

  1. if the bits are already spread, we want I to be the same size as storage_t, or the same as a Signed variant of the storage
  2. if the bits are not spread, we want a vector which matches signedness AND where I is the smallest type satisfying sizeof(I)*8 >= Bits

Hence your condition must be

is_signed_v<I> == Signed && (Case 1 || Case 2)

Note that in case (1) you'll want is_signed_v<I>, but the storage is ALWAYS unsigned

Comment on lines +674 to +676
// I must be of same signedness as the morton code, and be wide enough to hold each component
template<typename I, uint16_t Bits, uint16_t D, typename _uint64_t> NBL_PARTIAL_REQ_TOP(concepts::IntegralScalar<I> && (8 * sizeof(I) >= Bits))
struct static_cast_helper<vector<I, D>, morton::code<is_signed_v<I>, Bits, D, _uint64_t> NBL_PARTIAL_REQ_BOT(concepts::IntegralScalar<I> && (8 * sizeof(I) >= Bits)) >


static_casts are allowed to do whatever, so no component-width matching is needed

Comment on lines +293 to +298
vector<bool, D> retVal;
[[unroll]]
for (uint16_t i = 0; i < D; i++)
{
retVal[i] = (value & leftShift(Mask, i)) == leftShift(rhs[i], i);
}


XOR the values, then mask the results #860 (comment)

Comment on lines +311 to +316
vector<storage_t, D> interleaved;
[[unroll]]
for (uint16_t i = 0; i < D; i++)
{
interleaved[i] = impl::MortonEncoder<D, Bits, storage_t>::encode(_static_cast<U>(rhs[i]));
}


you're repeating the job of the constructor, and the only reason you're doing that is that you haven't defined the operator== out of line (outside the class), so that you could declare the specialization of impl::Equals between the class (which has the ctor and the forward declaration of operator==) and the definition of the operator== that uses impl::Equals


Comment on lines 108 to 115
const spirv::AddCarryOutput<uint32_t> lowerAddResult = addCarry(__getLSB(), rhs.__getLSB());
const this_t retVal = create(lowerAddResult.result, __getMSB() + rhs.__getMSB() + lowerAddResult.carry);
return retVal;
}

NBL_CONSTEXPR_INLINE_FUNC this_t operator-(NBL_CONST_REF_ARG(this_t) rhs) NBL_CONST_MEMBER_FUNC
{
const spirv::SubBorrowOutput<uint32_t> lowerSubResult = subBorrow(data.y, rhs.data.y);
const storage_t subResult = { data.x - rhs.data.x - lowerSubResult.borrow, lowerSubResult.result };
const this_t retVal = create(subResult);
const spirv::SubBorrowOutput<uint32_t> lowerSubResult = subBorrow(__getLSB(), rhs.__getLSB());


can you use the spirv namespace output structs in C++?

@Przemog1 how does our modf and stuff work in C++ without touching spirv namespace?

Contributor Author
@Fletterio Fletterio Apr 9, 2025

yeah, because those output structs are defined by us under hlsl/spirv_intrinsics/output_structs.hlsl and exposed via cpp_compat.hlsl to the C++ side. I added tests to 22_CppCompat (not pushed yet because I'm still working on other stuff in the same branch) and they run fine

// Either the topmost bits, when interpreted with correct sign, are less than those of `rhs`, or they're equal and the lower bits are less
// (lower bits are always positive in both unsigned and 2's complement so comparison can happen as-is)
const bool MSB = Signed ? (_static_cast<int32_t>(__getMSB()) < _static_cast<int32_t>(rhs.__getMSB())) : (__getMSB() < rhs.__getMSB());
return any(vector<bool, 2>(MSB, (__getMSB() == rhs.__getMSB()) && (__getLSB() < rhs.__getLSB())));


maybe do hlsl::mix(MSB, LSB, msbEqual) to avoid the HLSL-mandated short circuit on logic ops

Contributor Author

Added a ternary_operator that calls ? in CPP and OpSelect in SPIR-V

Contributor Author

Although for bools ? is the same as OpSelect anyway

Comment on lines 262 to 265
const uint32_t shift = bits >= ComponentBitWidth ? bits - ComponentBitWidth : ComponentBitWidth - bits;
const type_t shifted = type_t::create(bits >= ComponentBitWidth ? vector<uint32_t, 2>(0, operand.__getLSB() << shift)
: vector<uint32_t, 2>(operand.__getLSB() << bits, (operand.__getMSB() << bits) | (operand.__getLSB() >> shift)));
return bits ? shifted : operand;


return mix(operand, shifted, bool(bits))

avoid `?`, use mix(T,T,bool) instead


you can also stop repeating yourself with bits >= ComponentBitWidth and declare a const bool


for (uint16_t i = 0; i < D; i++)
{
storage_t thisCoord = value & leftShift(Mask, i);
storage_t rhsCoord = leftShift(rhs[i], i);


I would also take BitsAlreadySpread to mean that the individual coordinates have been pre-shifted as well.

Basically you took a morton code, promoted it to a vector<storage_t,D>, and then masked out the bits which are not relevant to a channel

Comment on lines +373 to +378
vector<storage_t, D> interleaved;
[[unroll]]
for (uint16_t i = 0; i < D; i++)
{
interleaved[i] = impl::MortonEncoder<D, Bits, storage_t>::encode(_static_cast<U>(rhs[i]));
}


make the morton Encoder and Decoder take the decoded value as a vector, because if you do the OR, AND and SHIFT on vectors, the compiler will generate code with fewer ALU stalls (between every two operations which need their own results, you'll get Dim-1 others).

the encoded_t could then be a semi_encoded_t which has the elements already pre-shifted by their ordinal (one step before the final OR)

P.S. I know I already suggested making the encoder do the OR-ing itself; maybe not the greatest idea.

Comment on lines +349 to +350
[[unroll]]
for (uint16_t i = 0; i < D; i++)


do the whole thing vectorized

Comment on lines +327 to +338
// Aux method for extracting highest bit, used by the comparison below
template<uint16_t Bits, uint16_t D, typename storage_t>
NBL_CONSTEXPR_INLINE_FUNC storage_t extractHighestBit(storage_t value, uint16_t coord)
{
// Like above, if the number encoded in `coord` gets `bits(coord) = ceil((BitWidth - coord)/D)` bits for representation, then the highest index of these
// bits is `bits(coord) - 1`
const uint16_t coordHighestBitIdx = Bits / D - ((coord < Bits % D) ? uint16_t(0) : uint16_t(1));
// This is the index of that bit as an index in the encoded value
const uint16_t shift = coordHighestBitIdx * D + coord;
left_shift_operator<storage_t> leftShift;
return value & leftShift(_static_cast<storage_t>(uint16_t(1)), shift);
}


turn into a sign_mask_v<Bits,D>

Comment on lines +355 to +358
if (extractHighestBit<Bits, D, storage_t>(thisCoord) != _static_cast<storage_t>(uint64_t(0)))
thisCoord = thisCoord | ~leftShift(Mask, i);
if (extractHighestBit<Bits, D, storage_t>(rhsCoord) != _static_cast<storage_t>(uint64_t(0)))
rhsCoord = rhsCoord | ~leftShift(Mask, i);


this can be done branchlessly with

mix(coord, ~impl::mask_v<Bits,D>, vector<bool,D>(sign_mask_v<Bits,D> & coord));

also the problem is that you're still performing the comparison on storage_t (which is always uint64_t) and not int64_t

and you're adding 1-bits everywhere whenever the number is negative, which makes it larger.

you should have caught broken negative-number comparisons in your example test suite.


We actually have some code for our legacy deprecated SSE3 vectors which fashioned an unsigned comparison from a signed one (the opposite of your problem):

if constexpr(!std::is_signed<T>::value) \

the sign mask got XORed into the operand, turning numbers greater than or equal to 2^(N-1) negative,

meaning 2^(N-1) got remapped to 0 and 2^N-1 got remapped to 2^(N-1)-1, while 0 got remapped to -2^(N-1) and 2^(N-1)-1 to -1

note that it's enough to apply the same trick to your spread mortons:

  1. XOR the sign bit (so position 1 << ((Bits-1)*D + i))
  2. still perform the comparison as storage_t (so an unsigned comparison)

What (1) does is remap signed morton codes to unsigned in the following way:

MinSignedMortonPattern -> 0
-1 -> MaxUnsignedMortonPattern >> 1
0 -> 1 << ((Bits-1)*D)
MaxSignedMortonPattern -> MaxUnsignedMortonPattern
