Skip to content

Conversation

@Victorin-Brunel
Copy link
Contributor

@Victorin-Brunel Victorin-Brunel commented Jul 10, 2024

Here are the additions regarding BMat16 :

  • Creation of BMat16
  • Various constructors (with a vector register of 256 bits, 4 64 bits uint, or a two-dimensional vector)
  • Comparison operators (==, !=, <, >)
  • Acces operator to a particular bit at a position (i, j)
  • Method to set a bit at a position (i, j)
  • A convertor to a two-dimensional array
  • Bitwise or operator between two matrices
  • Naive transposition
  • Optimized transposition with vector instructions
  • Optimized matrix multplication
  • Optimized matrix multplication using BMat8 matrix multiplication
  • Two naive multiplications (one with the acces operator, the other with the array conversion)
  • Number of non-zero rows
  • Vector of rows
  • Identity matrix of size 0 to 16
  • Random matrix, and the possibility to specify a size from 1 to 16
  • Swap of two matrices
  • Display operator

@james-d-mitchell
Copy link
Member

Closing and reopening to trigger the ci again

Copy link
Member

@james-d-mitchell james-d-mitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried BMat16's with libsemigroups (basically dropping it in everywhere we use HPCombi::BMat8 in the tests) and found no issues whatsoever. So I'm happy to merge this if you are @hivert ?

@hivert
Copy link
Collaborator

hivert commented Jan 21, 2025

I've tried BMat16's with libsemigroups (basically dropping it in everywhere we use HPCombi::BMat8 in the tests) and found no issues whatsoever. So I'm happy to merge this if you are @hivert ?

I'd like to look a little at the generated assembly code to check that some think are properly optimized by the (a) compiler (notably some product by power of 2 which could be written as shift). I try to do it either during the week-end or next week. Other than that, I'm Ok to merge.

@hivert
Copy link
Collaborator

hivert commented Mar 18, 2025

The code looks good. I checked:

  • product in the BMat16::BMat16, BMat16::operator(), BMat16::set and BMat16::to_array are properly optimized with shift.
  • the loop in BMat16::mult_transpose is properly unrolled.

This should be fixed easily

  • The doc of operator< doesn't specify the order;
  • the is a bitwise | on matrices but not & of bitwise complement.

There are finally some possible improvements which needs to be fixed:

  • This should be fixed as soon as simde provides the implementation
inline size_t BMat16::nr_rows() const noexcept {
  [...] 
  //// Vectorized version which doesn't work due to the absence of popcnt in
   /// simde
   // xpu16 tmp = _data, zero = simde_mm256_setzero_si256();
   // xpu16 x = (tmp != zero);
   // return simde_mm256_popcnt_epi16(x);
}
  • This one is low priority (in random which is now that used and probably a little time consuming):
    // TO DO : Instead of nulling all the cols/rows one by one, one could do
    // that at once with the proper mask

@james-d-mitchell
Copy link
Member

Great! Thanks @hivert, should we then: merge this PR, and add issues for the TODOs that you mention, or would you rather that the TODOs are addressed in this PR first?

@james-d-mitchell
Copy link
Member

I just rebased onto the most recent versions on main

@james-d-mitchell james-d-mitchell merged commit 4aa10ca into libsemigroups:main Mar 20, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants