[otbn,simd] Add RTL of SIMD instructions implemented in BN ALU #29344

Open
etterli wants to merge 3 commits into lowRISC:master from etterli:otbn-simd-rtl-bnalu

@etterli etterli commented Feb 20, 2026

This PR adds the first part of the SIMD instructions' RTL implementation. It adds the RTL for all instructions implemented in the Bignum ALU. See #29231 for the instruction definition / description.

Note that many regression tests still fail because not all new instructions are implemented in RTL yet.

@andrea-caforio andrea-caforio left a comment

This is cool @etterli. I focused on the first commit because that's where
the math is. ;-)

* X0 = X[31:0], X1 = X[63:32], ..., X7 = X[255:224], same for Y
* Di = Decision by carry bits CXi and CYi
*
* D7 D7 D6 D7 D0

Contributor:

Correct me if I'm wrong but is the first stage (decision) of this diagram
part of this module because the carry bits are generated externally?

Contributor:

Also is D7 D0 correct as the inputs to the first decider stage?

Contributor Author:

The diagram shows only the selection stage. The decision stage is not depicted. And yes, this module is closely related to the actual adders in the bignum ALU. I factored it out to hide some complexity (it is not much, to be honest). The decision bits are based on the actual carry bits from the two adders, so yes, externally.

D7 and D0 are the correct inputs to the selection MUX for the lowest 32 bits. Depending on ELEN (either 32 bits or 256 bits), this chunk must use the decision for chunk 0 (D0) when ELEN = 32, or D7, which is the decision when operating on 256 bits. In the 256-bit case, the MSB carry decides which result all chunks take.
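
The selection for the two element lengths mentioned above can be sketched as a small behavioural model in Python. This is only an illustration, not the RTL: the function name `select_result` and the list-based `decisions` argument are hypothetical, the 8 x 32-bit chunking is taken from the diagram header, and only ELEN = 32 and ELEN = 256 are modelled because those are the only cases spelled out here.

```python
CHUNK_BITS = 32  # chunk width from the diagram header (X0 = X[31:0])
NUM_CHUNKS = 8   # X0 .. X7 cover the 256-bit vector


def select_result(x, y, decisions, elen):
    """Pick adder X or adder Y for every 32-bit chunk.

    decisions[i] models Di (0 = take result X, 1 = take result Y).
    Hypothetical helper; only ELEN = 32 and ELEN = 256 are modelled.
    """
    mask = (1 << CHUNK_BITS) - 1
    result = 0
    for i in range(NUM_CHUNKS):
        if elen == 32:
            d = decisions[i]               # each chunk uses its own decision
        elif elen == 256:
            d = decisions[NUM_CHUNKS - 1]  # D7 (MSB chunk) decides for all chunks
        else:
            raise ValueError("only ELEN = 32 and ELEN = 256 are modelled")
        src = y if d else x
        result |= ((src >> (i * CHUNK_BITS)) & mask) << (i * CHUNK_BITS)
    return result
```

With `decisions = [1, 0, 0, 0, 0, 0, 0, 0]`, chunk 0 comes from Y when `elen` is 32, but from X when `elen` is 256, because D7 is 0.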

* The otbn_alu_bignum calculates pseudo modulo addition and subtraction by using two adders and
* evaluating their carry bits. Depending on the carry bits adder X or Y is selected as result.
*
* For addition, subtract mod if a + b >= mod:

Contributor:

Isn't this module in a way independent of the modulus? Because it simply
multiplexes some vector elements. So I'm not sure why the modulus
is mentioned here?

Contributor Author:

I see your point. However, the whole selection logic only makes sense in the context of the two adders and what they compute. I don't think this module makes sense in any other standalone use.

Would it help if the header introduced this context?

* - Adder X calculates X = a + b
* - Adder Y calculates Y = X - mod
*
* - If X generates a carry:

Contributor:

I know what is meant here but it is still slightly confusing to use the term "carry" here.
It is a decision bit that indicates whether a value is in the interval [0, mod-1] or
[mod, 2*mod-1].

Contributor:

This ties in with my comment on mentioning the modulus here even though
the module is independent of it.

Contributor Author:

In the current naming, the decision bit is the bit that carries the result of this evaluation (0 to take result X, 1 to take result Y). The signal referred to here is the actual carry bit of adder X (which is computed externally).
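
To make the difference between the adder carry bits and the decision bit concrete, here is a minimal Python sketch of the pseudo-modular addition from the header comment (X = a + b, Y = X - mod, take Y if a + b >= mod). The quoted comment is cut off after "If X generates a carry:", so combining the two carries with an OR is an assumption about how the decision is derived, and `pseudo_mod_add` is a hypothetical name. It models one 256-bit element only.

```python
WIDTH = 256
MASK = (1 << WIDTH) - 1


def pseudo_mod_add(a, b, mod):
    """Model of two chained adders plus a decision bit (hypothetical helper).

    Returns (result, decision) with decision 0 = take X, 1 = take Y.
    """
    # Adder X: X = a + b, keeping the carry out of bit WIDTH-1.
    x_full = a + b
    x, carry_x = x_full & MASK, x_full >> WIDTH
    # Adder Y: Y = X - mod, computed as X + ~mod + 1.
    y_full = x + (~mod & MASK) + 1
    y, carry_y = y_full & MASK, y_full >> WIDTH
    # Assumption: a + b >= mod if either adder produces a carry.
    decision = carry_x | carry_y
    return (y if decision else x), decision
```

For example, `pseudo_mod_add(5, 9, 11)` takes the Y result because 14 >= 11, while `pseudo_mod_add(2, 3, 11)` keeps the X result.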

*
* For subtraction, this stage generates an additional signal whether any vector element uses the
* result of adder Y. This signal is used for MOD integrity checks and blanking assertions. For
* addition this signal is always set as the carries of Y are used for the decisions.

Contributor:

I don't understand this. So this additional signal is only used in the subtraction case for
some security checks? Why not unconditionally set it to 1 like for addition?

Contributor Author:

I followed the behaviour of the current OTBN. I do not know the design rationale for making this check dependent on the result. Let's discuss this offline.


// Vector element length type for bignum vec ISA implemented in BN ALU for
// bn.addv(m), bn.subv(m) and bn.shv.
// The ISA foresees only 4 types (16 to 128 bits). However, only a subset is implemented.

Contributor:

With "4 types (16 to 128 bits)" you mean 16, 32, 64 and 128?

Contributor Author:

Yes. But this line was updated in the last force push.

) (
input logic [LVLEN-1:0] operand_a_i,
input logic [LVLEN-1:0] operand_b_i,
input logic operand_b_invert_i,

Contributor:

This is the indicator bit for performing a subtraction, why not call it like that?

Contributor Author:

Because executing a subtraction also requires setting the carry-in accordingly (to 1, so that the two's complement is computed correctly). This signal only controls whether operand B is inverted. It could be useful if we wanted to use a one's complement (but I don't think so).

*
* This carry chaining allows to compute additions over multiples of LVChunkLEN wide elements
* including the full vector width (i.e., a non vectorized addition). To perform subtraction the
* input B can be inverted and all carries must be set to 1 as: a - b = a + ~b + 1.

Contributor:

This sounds like the caller has to invert B in the subtraction case but it is handled in this
module?

Contributor Author:

Would something like this be more clear:

A subtraction can be performed by setting the operand_b_invert_i signal and the input carries to one because: a - b = a + ~b + 1.

Contributor Author:

Rephrased it
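
For reference, the carry chaining and the a - b = a + ~b + 1 identity discussed in this thread can be modelled in a few lines of Python. This is a rough behavioural sketch under assumptions: the chunk width of 16 bits stands in for LVChunkLEN (its actual value is not given here), and `vec_add`, `invert_b` and `carry_in` are hypothetical names that only loosely mirror `operand_b_invert_i` and the input carries.

```python
def vec_add(a, b, elen, invert_b=False, carry_in=0, vlen=256, chunk=16):
    """Chunked adder with carry chaining (hypothetical behavioural model).

    Carries propagate between chunks inside an element. At every element
    boundary the external carry_in is injected instead.
    """
    mask = (1 << chunk) - 1
    if invert_b:
        b = ~b & ((1 << vlen) - 1)  # models inverting operand B
    result = 0
    carry = carry_in
    for i in range(vlen // chunk):
        if (i * chunk) % elen == 0:
            carry = carry_in  # start of a new element: do not chain the carry
        s = ((a >> (i * chunk)) & mask) + ((b >> (i * chunk)) & mask) + carry
        result |= (s & mask) << (i * chunk)
        carry = s >> chunk
    return result
```

Subtraction is then `vec_add(a, b, elen, invert_b=True, carry_in=1)`, and `elen=256` gives the non-vectorized case where the carry chains through every chunk.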

/**
* OTBN vectorized shifter
*
* This shifter is capable of shifting vectors elementwise as well as concatenate and shift 256

Contributor:

Maybe you can mention somewhere that these are logical shifts as opposed
to arithmetic ones, which are only supported for the GPR registers.

Contributor Author:

Mentioned it
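
As a small illustration of what "logical" means for the elementwise case, the following Python sketch shifts every element independently and fills with zeros, so no bits cross an element boundary and no sign bit is replicated. `vec_shift` and its parameters are hypothetical names, and the concatenate-and-shift mode of the shifter is not modelled.

```python
def vec_shift(v, amount, elen, left, vlen=256):
    """Elementwise logical shift of a vlen-bit vector (hypothetical model)."""
    mask = (1 << elen) - 1
    out = 0
    for i in range(vlen // elen):
        e = (v >> (i * elen)) & mask
        # Logical shift: vacated bits are zero in both directions.
        e = (e << amount) & mask if left else e >> amount
        out |= e << (i * elen)
    return out
```

Shifting the element `0x80000001` right by one yields `0x40000000`. An arithmetic shift would yield `0xC0000000` instead.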

* This module transposes the elements of two input vectors in two different ways.
* It supports 32b, 64b and 128b element lengths.
*
* If there are two vectors with 4 elements the transpositions are as follows:

Contributor:

Here you should mention that trn1 interleaves even coordinates and trn2 odd ones, otherwise
the word transposition is a bit misleading.

Contributor Author:

Thanks, this is indeed more clear. I updated it.
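
The even/odd interleaving can be sketched in Python as follows. The operand order (element from the first vector first, then the matching element from the second) is an assumption borrowed from the usual TRN1/TRN2 convention and is not confirmed in this thread. Vectors are modelled as lists with index 0 as the least significant element, and all names are hypothetical.

```python
def trn(a, b, odd):
    """Interleave the even (odd=0) or odd (odd=1) elements of a and b."""
    out = []
    for i in range(0, len(a), 2):
        out += [a[i + odd], b[i + odd]]
    return out


def trn1(a, b):
    return trn(a, b, 0)  # even-indexed elements of both vectors


def trn2(a, b):
    return trn(a, b, 1)  # odd-indexed elements of both vectors
```

For two 4-element vectors `a = [a0, a1, a2, a3]` and `b = [b0, b1, b2, b3]`, this gives `trn1 = [a0, b0, a2, b2]` and `trn2 = [a1, b1, a3, b3]`.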

assign shifter_out_lower_mux[AluShiftDirLeft] = shifter_out_lower_reverse;
assign shifter_out_lower_mux[AluShiftDirRight] = shifter_out_lower;

prim_onehot_mux #(

Contributor:

Can you mention why there is a onehot mux here but not in the other modules?

Contributor Author:

Let's discuss this offline.
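
For readers unfamiliar with the construct, a one-hot mux can be modelled as an AND-OR structure: each input is gated by its own select bit and the gated values are ORed together. The Python sketch below only illustrates that general behaviour. It does not answer why it is used here and not elsewhere, and `onehot_mux` is a hypothetical name, not the `prim_onehot_mux` interface.

```python
def onehot_mux(inputs, sel):
    """AND-OR mux: OR together every input whose select bit is set."""
    out = 0
    for i, value in enumerate(inputs):
        if (sel >> i) & 1:
            out |= value
    return out
```

With a valid one-hot `sel`, exactly one input reaches the output. With `sel = 0`, the output is 0.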

@etterli etterli force-pushed the otbn-simd-rtl-bnalu branch from ca2aa31 to c64a89a Compare February 20, 2026 13:55
@etterli etterli added the CI:Rerun Rerun failed CI jobs label Feb 21, 2026
@github-actions github-actions bot removed the CI:Rerun Rerun failed CI jobs label Feb 21, 2026
@etterli etterli force-pushed the otbn-simd-rtl-bnalu branch 2 times, most recently from d7d3fa7 to 38f6dd0 Compare February 21, 2026 12:15
This adds a vectorized adder, a modulo result selector, a vectorized shifter and a vector transposer
module. These modules are the building blocks to construct the vectorized BN ALU.

Signed-off-by: Pascal Etterli <pascal.etterli@lowrisc.org>
Add the vectorized instructions implemented in the BN ALU to the OTBN.

Signed-off-by: Pascal Etterli <pascal.etterli@lowrisc.org>
… shifter

The vector transposer and shifter are fully combinatorial and the clk and rst are only used for
assertions.

Signed-off-by: Pascal Etterli <pascal.etterli@lowrisc.org>
@etterli etterli force-pushed the otbn-simd-rtl-bnalu branch from 38f6dd0 to e26ec29 Compare February 21, 2026 13:17