Calling value_ptr() and memcpy to correctly build buffers of vectors is a pain; build some maths helpers to manipulate vectors, implemented as raw blocks of memory instead. This will allow GLM to be dropped as a dependency.
Return a reference to the operand modified, if applicable
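A minimal sketch of what such a raw-memory helper could look like (the `add` name and signature are assumptions, not the project's actual API), including the return-a-reference convention noted above:

```cpp
#include <cstddef>

// Sketch of a raw-memory vector helper: a vector is a plain float array,
// so callers never need value_ptr() or memcpy to build buffers.
template <std::size_t Size>
auto& add(float (&augend)[Size], const float (&addend)[Size])
{
    for (std::size_t i = 0; i < Size; ++i)
        augend[i] += addend[i];
    return augend;  // Reference to the operand that was modified
}
```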
C++26 port:
Wait for libstdc++ to support <simd>
Swap <experimental/simd> for <simd>, drop linter override and experimental regex
Convert the existing functions to the new syntax
Mark functions as constexpr where possible
Make use of partial loads / stores, permute and gather / scatter to allow vectorisation
C++26 optimisations:
Make use of new C++26 SIMD features:
Partial loads and stores
Gather / scatter loads and stores
Permute
<cmath> overloads
sum_to and multiply_to
Promote vectors, matrices and quaternions with bit_ceil
This isn't necessary if compilers can handle this automatically
This includes templated sizes and fixed-size vectors
Determine whether a helper for automatic full / partial stores is required
Speed up set(vector, vector, scalar)
Load the second vector into the width of the first with a partial load and default value of the scalar
Store to the first vector
Handle equivalent sizes or cases where the destination is smaller
Consider reordering arguments for destination argument position consistency
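A scalar model of the set(vector, vector, scalar) semantics described above, including the equal-size and smaller-destination cases; a SIMD version would replace the loops with one partial load (using the scalar as the default value) and one store:

```cpp
#include <algorithm>
#include <cstddef>

// Scalar sketch: copy the source into the destination and fill any
// remaining destination elements with the scalar. Handles equal sizes
// and destinations smaller than the source.
template <std::size_t DestSize, std::size_t SrcSize>
void set(float (&dest)[DestSize], const float (&src)[SrcSize], float fill)
{
    constexpr std::size_t copied = std::min(DestSize, SrcSize);
    for (std::size_t i = 0; i < copied; ++i)
        dest[i] = src[i];            // Elements covered by the source
    for (std::size_t i = copied; i < DestSize; ++i)
        dest[i] = fill;              // Remainder defaults to the scalar
}
```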
Speed up set([vector / matrix], scalar)
Broadcast the scalar to a width-rounded vector, then use a partial store
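A scalar model of the broadcast-then-store pattern, with a fixed width of 4 standing in for the native register width (an assumption for illustration):

```cpp
#include <cstddef>

// Sketch: broadcast the scalar into one register-sized chunk, issue
// full-width stores for whole chunks, then a partial store for the tail.
template <std::size_t Size>
void set(float (&dest)[Size], float value)
{
    constexpr std::size_t width = 4;       // Assumed native SIMD width
    float chunk[width];
    for (std::size_t i = 0; i < width; ++i)
        chunk[i] = value;                  // Broadcast once

    std::size_t i = 0;
    for (; i + width <= Size; i += width)  // Full-width stores
        for (std::size_t j = 0; j < width; ++j)
            dest[i + j] = chunk[j];
    for (std::size_t j = 0; i + j < Size; ++j)
        dest[i + j] = chunk[j];            // Partial store for the tail
}
```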
Matrix transpose using (load -> permute -> store) or ([gather / load] -> [store / scatter])
(load -> permute -> store) might be easier with width rounding
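The load -> permute -> store idea can be modelled in scalar code: treat the 4x4 matrix as 16 contiguous floats and write them back through a fixed permutation, which is exactly what an in-register SIMD permute would do (row-major storage is an assumption here):

```cpp
#include <cstddef>

// Load -> permute -> store sketch for a 4x4 transpose over a flat,
// row-major block of 16 floats.
void transpose4x4(const float (&in)[16], float (&out)[16])
{
    for (std::size_t row = 0; row < 4; ++row)
        for (std::size_t col = 0; col < 4; ++col)
            out[col * 4 + row] = in[row * 4 + col];  // Permuted store
}
```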
Vectorise equals by comparing floating point vectors
Started on simp-cmp; it seems to have an issue with the experimental SIMD library
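A scalar model of the vectorised equals: a SIMD version would compare all lanes in one instruction and reduce the resulting mask with all_of, which the loop below mimics lane by lane:

```cpp
#include <cstddef>

// Scalar sketch of a vectorised equals: build the comparison "mask" one
// lane at a time, then reduce it (the all_of step).
template <std::size_t Size>
bool equal(const float (&a)[Size], const float (&b)[Size])
{
    bool all = true;
    for (std::size_t i = 0; i < Size; ++i)
        all &= (a[i] == b[i]);  // One lane of the comparison mask
    return all;                 // all_of over the mask
}
```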
Speed up cross
Load the inputs, then permute a copy of each
Apply the arithmetic
Reorder and store the output
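The permute pattern above follows the standard cross-product identity cross(a, b) = a.yzx * b.zxy - a.zxy * b.yzx; a scalar sketch, with the permuted copies made explicit:

```cpp
#include <cstddef>

// Cross product via permuted copies of each input; a SIMD version would
// keep everything in registers between the permutes and the final store.
void cross(const float (&a)[3], const float (&b)[3], float (&out)[3])
{
    const float aYzx[3]{a[1], a[2], a[0]};  // Permuted copy of a
    const float bZxy[3]{b[2], b[0], b[1]};  // Permuted copy of b
    const float aZxy[3]{a[2], a[0], a[1]};
    const float bYzx[3]{b[1], b[2], b[0]};
    for (std::size_t i = 0; i < 3; ++i)
        out[i] = aYzx[i] * bZxy[i] - aZxy[i] * bYzx[i];
}
```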
Speed up normalise
Square the vector elements
Rotate by one element to a copy, sum the vectors
Repeat with a two element rotation for lengths 3 and 4
Investigate guarding this behind checks for AVX and sufficient native vector size
Fall back to a reduce by addition otherwise
std::sqrt on the vector, then divide the original by it and store
Look for related functions
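A scalar walk-through of the normalise steps above for length 4: the rotate-by-one and rotate-by-two additions model the log2(n) SIMD reduction, after which every lane holds the full sum of squares:

```cpp
#include <cmath>
#include <cstddef>

// Scalar sketch of normalise: square, sum via rotated additions, square
// root, then divide the original vector by the length.
void normalise(float (&v)[4])
{
    float sq[4];
    for (std::size_t i = 0; i < 4; ++i)
        sq[i] = v[i] * v[i];                 // Square the elements

    float rot1[4];                           // Rotate by one element, add
    for (std::size_t i = 0; i < 4; ++i)
        rot1[i] = sq[(i + 1) % 4] + sq[i];
    float rot2[4];                           // Rotate by two elements, add
    for (std::size_t i = 0; i < 4; ++i)
        rot2[i] = rot1[(i + 2) % 4] + rot1[i];

    const float length = std::sqrt(rot2[0]); // Every lane holds the sum
    for (std::size_t i = 0; i < 4; ++i)
        v[i] /= length;
}
```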
List out all functions to vectorise / optimise and work through them
Look into assembly optimisations
_mm_dp_ps(), _mm_dp_pd() and _mm256_dp_ps() for vector and quaternion dot products
These might be faster than the current vector and quaternion length calculation: take the dot product of the vector with itself, then call _mm_sqrt_ps(), _mm_sqrt_pd(), _mm256_sqrt_ps() or _mm256_sqrt_pd()
Verify performance and inspect assembly for optimisations
Target znver4, znver3 and ARM SVE(2)
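The length-via-dot-product idea in portable scalar form; on x86 the two steps would map onto _mm_dp_ps() plus _mm_sqrt_ps(), but as noted above, whether that beats the existing length code needs benchmarking:

```cpp
#include <cmath>
#include <cstddef>

// length(v) = sqrt(dot(v, v)) — the structure the dot-product intrinsics
// would accelerate. Names and signatures here are illustrative only.
template <std::size_t Size>
float dot(const float (&a)[Size], const float (&b)[Size])
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < Size; ++i)
        sum += a[i] * b[i];
    return sum;
}

template <std::size_t Size>
float length(const float (&v)[Size])
{
    return std::sqrt(dot(v, v));  // Dot product with itself, then sqrt
}
```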
Rewrite GLM-dependent functions:
Rewrite matrix operations to be GLM-independent:
transpose, multiply
Use scatter / gather or load, permute, store
determinant, inverse
Hard-coding an equation for each element, specialised for each size might be faster than using a general algorithm
rotate, scale, translate
These can be derived, or have well known algorithms
lookAt, perspective
These both have well-known algorithms
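Hard-coded determinants of the kind suggested above, specialised per size (row-major, flat storage is an assumption); a 4x4 version would expand along a row using the 3x3 form as cofactors:

```cpp
// Determinants hard-coded per matrix size instead of a general algorithm.
float determinant2x2(const float (&m)[4])
{
    return m[0] * m[3] - m[1] * m[2];
}

float determinant3x3(const float (&m)[9])
{
    // Cofactor expansion along the first row
    return m[0] * (m[4] * m[8] - m[5] * m[7])
         - m[1] * (m[3] * m[8] - m[5] * m[6])
         + m[2] * (m[3] * m[7] - m[4] * m[6]);
}
```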
Rewrite quaternion operations to be GLM-independent:
fromEuler, toEuler
fromPitchYawRoll, toPitchYawRoll
dot, conjugate, length, normalise, inverse
multiply (quaternion-quaternion), multiply (quaternion-vector)
toMatrix
Consider <linalg> instead of manual implementations
Dependencies:
enable-arch-control with ARCH=x86-64 for debugging until it's fixed properly, tracked in Compiler and tooling workarounds / fixes #33
Introduction:
Vectorise with C++26's upcoming <simd> header
Existing helpers (formatVector(), random vector fill) could then be simplified
Matrices from 2x2 up to 4x4, floating-point types only
Matrix functions: data, copy, copyCast, equal, set, diagonal, identity, add, sub, transpose, multiply, determinant, inverse, rotate, scale, translate, lookAt, perspective
Quaternion functions: data, copy, copyCast, fromEuler, toEuler, fromPitchYawRoll, toPitchYawRoll, dot, conjugate, length, normalise, inverse, multiply (quaternion-quaternion), multiply (quaternion-vector), toMatrix
GLM removal:
Remove GLM on the remove-glm branch
Clean up:
Re-evaluate -fno-math-errno and -march=native