[otbn,simd] Add RTL of SIMD instructions implemented in BN ALU by etterli · Pull Request #29344 · lowRISC/opentitan

etterli · 2026-02-20T09:43:24Z

This PR adds the first part of the SIMD instructions' RTL implementation. It adds the RTL for all instructions implemented in the Bignum ALU. See #29231 for the instruction definition / description.

Note that many regression tests still fail because not all new instructions are implemented in RTL yet.

andrea-caforio

This is cool @etterli. I focused on the first commit because that's where
the math is. ;-)

andrea-caforio · 2026-02-20T12:08:45Z

hw/ip/otbn/rtl/otbn_mod_result_selector.sv

+ * X0 = X[31:0], X1 = X[63:32], ..., X7 = X[255:224], same for Y
+ * Di = Decision by carry bits CXi and CYi
+ *
+ * D7                           D7   D6                             D7   D0


Correct me if I'm wrong but is the first stage (decision) of this diagram
part of this module because the carry bits are generated externally?

Also is D7 D0 correct as the inputs to the first decider stage?

The diagram shows only the selection stage; the decision stage is not depicted. And yes, this module is closely related to the actual adders in the bignum ALU. I factored it out to hide some complexity (not much, to be honest). The decision bits are based on the actual carry bits from the two adders, so yes, they are generated externally.

D7 and D0 are the correct inputs to the selection MUX for the lowest 32 bits. Depending on ELEN (either 32 or 256 bits), this chunk must use the decision for chunk 0 (D0) when ELEN = 32, or D7, which is the decision when operating on 256 bits. In the 256-bit case, the MSB carry decides for all chunks which result to take.
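To make the D7/D0 explanation concrete, here is a minimal behavioural sketch in Python, not the RTL. It assumes eight 32-bit chunks and only the two ELEN values named in the reply (32 and 256); the function name `select_chunks` and the list-based interface are hypothetical.

```python
NUM_CHUNKS = 8  # a 256-bit vector split into 32-bit chunks (X0..X7, Y0..Y7)


def select_chunks(x, y, d, elen):
    """Pick, per chunk, the result of adder X or adder Y.

    x, y: lists of 8 chunk values; d: list of 8 decision bits
    (0 -> take X, 1 -> take Y). Only ELEN = 32 and ELEN = 256 are
    modelled here; other element lengths are out of scope of this sketch.
    """
    if elen not in (32, 256):
        raise ValueError("sketch only models ELEN = 32 and ELEN = 256")
    out = []
    for i in range(NUM_CHUNKS):
        # ELEN = 32: every chunk follows its own decision bit.
        # ELEN = 256: the decision of the most significant chunk (D7)
        # applies to all chunks.
        decision = d[i] if elen == 32 else d[NUM_CHUNKS - 1]
        out.append(y[i] if decision else x[i])
    return out
```

For chunk 0 this gives exactly the two candidates from the diagram: D0 when ELEN = 32 and D7 when ELEN = 256.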

andrea-caforio · 2026-02-20T12:12:19Z

hw/ip/otbn/rtl/otbn_mod_result_selector.sv

+ * The otbn_alu_bignum calculates pseudo modulo addition and subtraction by using two adders and
+ * evaluating their carry bits. Depending on the carry bits adder X or Y is selected as result.
+ *
+ * For addition, subtract mod if a + b >= mod:


Isn't this module in a way independent of the modulus? Because it simply
multiplexes some vector elements. So I'm not sure why the modulus
is mentioned here?

I see your point. However, the selection logic only makes sense in the context of the two adders and what they compute. I don't think this module makes sense in any standalone use.

Would it help if the header introduced this context?

andrea-caforio · 2026-02-20T12:14:44Z

hw/ip/otbn/rtl/otbn_mod_result_selector.sv

+ * - Adder X calculates X = a + b
+ * - Adder Y calculates Y = X - mod
+ *
+ * - If X generates a carry:


I know what is meant here but it is still slightly confusing to use the term "carry" here.
It is a decision bit that indicates whether a value is in the interval [0, mod-1] or
[mod, 2*mod-1].

This ties in with my comment on mentioning the modulus here even though
the module is independent of it.

In the current naming, the decision bit carries the result of this evaluation (0 to take result X, 1 to take result Y). The signal referred to here is the actual carry bit of adder X, which is computed externally.
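The distinction between carry bits and the decision bit can be illustrated with a small Python model of the non-vectorized (256-bit) addition case. This is a sketch under assumptions: the quoted header is truncated after "If X generates a carry:", so the rule used here (take Y if either adder produces a carry) is an assumption, and the helper names are hypothetical.

```python
WLEN = 256
MASK = (1 << WLEN) - 1


def add_with_carry(a, b, carry_in):
    """WLEN-bit adder returning (sum, carry_out)."""
    total = a + b + carry_in
    return total & MASK, total >> WLEN


def pseudo_mod_add(a, b, mod):
    """Return (result, decision) for a + b with a single conditional subtraction."""
    # Adder X calculates X = a + b.
    x, cx = add_with_carry(a, b, 0)
    # Adder Y calculates Y = X - mod as X + ~mod + 1.
    y, cy = add_with_carry(x, ~mod & MASK, 1)
    # Assumed rule: a carry from X means a + b overflowed WLEN bits,
    # a carry from Y means X >= mod. Either way Y is the reduced value.
    decision = cx | cy
    return (y if decision else x), decision
```

With `a = 5`, `b = 9`, `mod = 11`, adder X yields 14 without a carry, adder Y yields 3 with a carry, so the decision bit is 1 and the result is 3.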

andrea-caforio · 2026-02-20T12:17:24Z

hw/ip/otbn/rtl/otbn_mod_result_selector.sv

+ *
+ * For subtraction, this stage generates an additional signal whether any vector element uses the
+ * result of adder Y. This signal is used for MOD integrity checks and blanking assertions. For
+ * addition this signal is always set as the carries of Y are used for the decisions.


I don't understand this. So this additional signal is only used in the subtraction case for
some security checks? Why not unconditionally set it to 1 like for addition?

I followed the behaviour of the current OTBN. I do not know the design rationale for making this check dependent on the result. Let's discuss this offline.
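For reference, the behaviour described in the quoted comment can be written down as a short Python model. It only restates what the comment says (the flag depends on the decisions for subtraction and is always set for addition) and does not explain the rationale, which was left for offline discussion. The 32-bit element width, the borrow-based decision rule, and all names are assumptions for illustration.

```python
ELEN = 32
MASK = (1 << ELEN) - 1


def pseudo_mod_sub(a_elems, b_elems, mod):
    """Elementwise a - b with a conditional addition of mod.

    Returns (results, decisions); decision 1 means result Y was taken.
    """
    results, decisions = [], []
    for a, b in zip(a_elems, b_elems):
        # Adder X: a - b computed as a + ~b + 1; no carry out means a borrow.
        total = a + (~b & MASK) + 1
        x, cx = total & MASK, total >> ELEN
        # Adder Y: X + mod, used when the subtraction went negative.
        y = (x + mod) & MASK
        decision = 0 if cx else 1
        results.append(y if decision else x)
        decisions.append(decision)
    return results, decisions


def adder_y_used(decisions, is_subtraction):
    """Flag from the quoted comment: any element uses Y (always set for addition)."""
    return any(decisions) if is_subtraction else True
```

For example, `pseudo_mod_sub([5, 1], [3, 4], 7)` returns `([2, 4], [0, 1])`, so the flag is set because the second element takes result Y.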

andrea-caforio · 2026-02-20T12:19:05Z

hw/ip/otbn/rtl/otbn_pkg.sv

+
+  // Vector element length type for bignum vec ISA implemented in BN ALU for
+  // bn.addv(m), bn.subv(m) and bn.shv.
+  // The ISA forsees only 4 types (16 to 128 bits). However, only a subset is implemented.


With "4 types (16 to 128 bits)" you mean 16, 32, 64 and 128?

Yes. But this line was updated in the last force push.

andrea-caforio · 2026-02-20T12:25:47Z

hw/ip/otbn/rtl/otbn_vec_adder.sv

+) (
+  input  logic [LVLEN-1:0]     operand_a_i,
+  input  logic [LVLEN-1:0]     operand_b_i,
+  input  logic                 operand_b_invert_i,


This is the indicator bit for performing a subtraction, why not call it like that?

Because executing a subtraction also requires setting the carry-in accordingly (to 1, so that the two's complement is computed correctly). This signal only controls whether operand B is inverted. It could be useful if we ever wanted a one's complement operation (but I don't think we do).

andrea-caforio · 2026-02-20T12:26:28Z

hw/ip/otbn/rtl/otbn_vec_adder.sv

+ *
+ * This carry chaining allows to compute additions over multiples of LVChunkLEN wide elements
+ * including the full vector width (i.e., a non vectorized addition). To perform subtraction the
+ * input B can be inverted and all carries must be set to 1 as: a - b = a + ~b + 1.


This sounds like the caller has to invert B in the subtraction case but it is handled in this
module?

Would something like this be more clear:

A subtraction can be performed by setting the operand_b_invert_i signal and the input carries to one because: a - b = a + ~b + 1.

Rephrased it
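The carry chaining and the subtraction identity a - b = a + ~b + 1 can be checked with a small Python model. This is a behavioural sketch, not the RTL: a chunk width of 32 bits stands in for LVChunkLEN, the vector is assumed to be 256 bits wide, and a single `carry_in` value is applied at every element boundary. The function name and argument names are hypothetical.

```python
CHUNK_LEN = 32   # assumed stand-in for LVChunkLEN
NUM_CHUNKS = 8   # assumed 256-bit vector
CHUNK_MASK = (1 << CHUNK_LEN) - 1


def vec_add(a, b, elen, invert_b, carry_in):
    """Add two 256-bit vectors elementwise, with elen a multiple of CHUNK_LEN."""
    chunks_per_elem = elen // CHUNK_LEN
    result, carry = 0, 0
    for i in range(NUM_CHUNKS):
        a_c = (a >> (i * CHUNK_LEN)) & CHUNK_MASK
        b_c = (b >> (i * CHUNK_LEN)) & CHUNK_MASK
        if invert_b:
            b_c = ~b_c & CHUNK_MASK
        # At an element boundary the external carry is injected; inside an
        # element the carry of the previous chunk is chained through.
        cin = carry_in if i % chunks_per_elem == 0 else carry
        total = a_c + b_c + cin
        result |= (total & CHUNK_MASK) << (i * CHUNK_LEN)
        carry = total >> CHUNK_LEN
    return result
```

Setting `invert_b` with `carry_in = 1` gives a subtraction per element, while `invert_b` with `carry_in = 0` gives a - b - 1, which is why the invert signal alone does not amount to "subtract".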

andrea-caforio · 2026-02-20T12:29:39Z

hw/ip/otbn/rtl/otbn_vec_shifter.sv

+/**
+ * OTBN vectorized shifter
+ *
+ * This shifter is capable of shifting vectors elementwise as well as concatenate and shift 256


Maybe you can mention somewhere that these are logical shifts as opposed
to arithmetic ones, which are only supported for the GPR registers.

Mentioned it
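As an illustration of what "logical" means for the elementwise case, here is a minimal Python sketch. It assumes a 256-bit vector and one shift amount shared by all elements; the function name is hypothetical and the concatenate-and-shift mode mentioned in the header is not modelled.

```python
VLEN = 256  # assumed vector width


def vec_shift_logical(v, elen, amount, left):
    """Shift every elen-bit element of v by amount, filling with zeros."""
    mask = (1 << elen) - 1
    out = 0
    for i in range(VLEN // elen):
        elem = (v >> (i * elen)) & mask
        # Logical shift: bits shifted out of an element are dropped and
        # zeros are shifted in; the sign bit is never replicated.
        shifted = (elem << amount) & mask if left else elem >> amount
        out |= shifted << (i * elen)
    return out
```

Shifting the element `0x80000001` right by one yields `0x40000000`, whereas an arithmetic shift would yield `0xC0000000`.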

andrea-caforio · 2026-02-20T12:32:51Z

hw/ip/otbn/rtl/otbn_vec_transposer.sv

+ * This module transposes the elements of two input vectors in two different ways.
+ * It supports 32b, 64b and 128b element lengths.
+ *
+ * If there are two vectors with 4 elements the transpositions are as follows:


Here you should mention that trn1 interleaves the even coordinates and trn2 the odd ones; otherwise
the word transposition is a bit misleading.

Thanks, this is indeed clearer. I updated it.
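A short Python sketch of the even/odd interleaving mentioned in the review. The exact element ordering of the OTBN instructions is not quoted in this thread, so the sketch assumes the convention of the Arm NEON TRN1/TRN2 instructions, with index 0 as the lowest element; the function names are illustrative only.

```python
def trn1(a, b):
    """Interleave the even-indexed elements of a and b."""
    out = []
    for i in range(0, len(a), 2):
        out += [a[i], b[i]]
    return out


def trn2(a, b):
    """Interleave the odd-indexed elements of a and b."""
    out = []
    for i in range(1, len(a), 2):
        out += [a[i], b[i]]
    return out
```

With `a = ["a0", "a1", "a2", "a3"]` and `b = ["b0", "b1", "b2", "b3"]`, `trn1` gives `["a0", "b0", "a2", "b2"]` and `trn2` gives `["a1", "b1", "a3", "b3"]`. Treating each pair `(a[i], a[i+1])`, `(b[i], b[i+1])` as a 2x2 matrix, the two results together form its transpose.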

andrea-caforio · 2026-02-20T12:33:55Z

hw/ip/otbn/rtl/otbn_vec_shifter.sv

+  assign shifter_out_lower_mux[AluShiftDirLeft]  = shifter_out_lower_reverse;
+  assign shifter_out_lower_mux[AluShiftDirRight] = shifter_out_lower;
+
+  prim_onehot_mux #(


Can you mention why there is a onehot mux here but not in the other modules?

Let's discuss this offline.
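For readers unfamiliar with the structure in the quoted hunk, the following Python sketch models one common way to build such a shifter: a single right shifter, bit reversal around it for left shifts, and a one-hot mux selecting the final result. That the design works this way is an assumption inferred from the signal names `shifter_out_lower` and `shifter_out_lower_reverse`; the 32-bit width, the select encoding, and the helper names are hypothetical. The sketch does not answer why a one-hot mux is used here.

```python
WIDTH = 32  # assumed width for illustration


def reverse_bits(v, width=WIDTH):
    """Mirror the bit order of a width-bit value."""
    return int(format(v, f"0{width}b")[::-1], 2)


def onehot_mux(inputs, sel_onehot):
    """AND-OR mux: OR together the inputs whose select bit is set."""
    out = 0
    for value, sel in zip(inputs, sel_onehot):
        out |= value if sel else 0
    return out


def shift(v, amount, left):
    """Logical shift built from one right shifter plus bit reversal."""
    # A left shift equals: reverse the bits, shift right, reverse again.
    shifter_in = reverse_bits(v) if left else v
    shifter_out = shifter_in >> amount
    candidates = [reverse_bits(shifter_out), shifter_out]  # [left, right]
    sel = [1, 0] if left else [0, 1]
    return onehot_mux(candidates, sel)
```

For example, `shift(0xF0000001, 4, True)` returns `0x00000010`, matching `(0xF0000001 << 4) & 0xFFFFFFFF`.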

This adds a vectorized adder, a modulo result selector, a vectorized shifter and a vector transposer module. These modules are the building blocks to construct the vectorized BN ALU. Signed-off-by: Pascal Etterli <pascal.etterli@lowrisc.org>

etterli force-pushed the otbn-simd-rtl-bnalu branch from 93e5bdf to ca2aa31 Compare February 20, 2026 09:56

etterli requested review from andrea-caforio, andreaskurth, h-filali, nasahlpa, rswarbrick and vogelpi February 20, 2026 10:25

andrea-caforio reviewed Feb 20, 2026

etterli force-pushed the otbn-simd-rtl-bnalu branch from ca2aa31 to c64a89a Compare February 20, 2026 13:55

etterli added the CI:Rerun Rerun failed CI jobs label Feb 21, 2026

github-actions bot removed the CI:Rerun Rerun failed CI jobs label Feb 21, 2026

etterli force-pushed the otbn-simd-rtl-bnalu branch 2 times, most recently from d7d3fa7 to 38f6dd0 Compare February 21, 2026 12:15

etterli added 3 commits February 21, 2026 14:16

[otbn,rtl] Integrate vectorized BN ALU instructions

fb773c5

Add the vectorized instructions implemented in the BN ALU to the OTBN. Signed-off-by: Pascal Etterli <pascal.etterli@lowrisc.org>

[otbn,lint] Waive clk and rst for combinatorial vector transposer and…

e26ec29

… shifter The vector transposer and shifter are fully combinatorial and the clk and rst are only used for assertions. Signed-off-by: Pascal Etterli <pascal.etterli@lowrisc.org>

etterli force-pushed the otbn-simd-rtl-bnalu branch from 38f6dd0 to e26ec29 Compare February 21, 2026 13:17

2 participants