[otbn,simd] Add RTL of SIMD instructions implemented in BN ALU #29344
etterli wants to merge 3 commits into lowRISC:master
93e5bdf to ca2aa31
andrea-caforio left a comment
This is cool @etterli. I focused on the first commit because that's where
the math is. ;-)
 * X0 = X[31:0], X1 = X[63:32], ..., X7 = X[255:224], same for Y
 * Di = Decision by carry bits CXi and CYi
 *
 * D7 D7 D6 D7 D0
Correct me if I'm wrong but is the first stage (decision) of this diagram
part of this module because the carry bits are generated externally?
Also is D7 D0 correct as the inputs to the first decider stage?
The diagram only shows the selection stage. The decision stage is not depicted. And yes, this module is closely related to the actual adders in the bignum ALU. I factored it out to hide some complexity (it is not much, to be honest). The decision bits are based on the actual carry bits from the two adders, so yes, they are generated externally.
D7 and D0 are the correct inputs to the selection MUX for the lowest 32 bits. Depending on ELEN (either 32 or 256 bits), this chunk must use either its own decision D0 (ELEN = 32) or D7, which is the decision when we operate on the full 256 bits. In the 256-bit case, the MSB carry decides for all chunks which result to take.
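To make the D0/D7 selection concrete, here is a minimal behavioural sketch in Python. It is not the RTL from this PR: the function name `select_chunks`, the list-of-bits representation of the decisions, and the restriction to ELEN = 32 and ELEN = 256 are assumptions made for illustration only.

```python
CHUNK_BITS = 32   # width of one chunk (X0 = X[31:0], ...)
NUM_CHUNKS = 8    # 8 x 32 bits = 256 bits
CHUNK_MASK = (1 << CHUNK_BITS) - 1


def select_chunks(x, y, decisions, elen):
    """Pick each 32-bit chunk from x (decision 0) or y (decision 1).

    decisions[i] is the hypothetical decision bit Di of chunk i.
    """
    if elen not in (32, 256):
        raise ValueError("this sketch only models ELEN = 32 and ELEN = 256")
    result = 0
    for i in range(NUM_CHUNKS):
        # ELEN = 32: chunk i follows its own decision Di.
        # ELEN = 256: every chunk follows D7, the decision of the top chunk.
        d = decisions[i] if elen == 32 else decisions[NUM_CHUNKS - 1]
        source = y if d else x
        chunk = (source >> (i * CHUNK_BITS)) & CHUNK_MASK
        result |= chunk << (i * CHUNK_BITS)
    return result
```

With ELEN = 32 the lowest chunk is steered by D0 alone, while with ELEN = 256 the same chunk is steered by D7, which matches the two MUX inputs discussed above.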
 * The otbn_alu_bignum calculates pseudo modulo addition and subtraction by using two adders and
 * evaluating their carry bits. Depending on the carry bits adder X or Y is selected as result.
 *
 * For addition, subtract mod if a + b >= mod:
Isn't this module in a way independent of the modulus? Because it simply
multiplexes some vector elements. So I'm not sure why the modulus
is mentioned here?
I see your point. However, the whole selection logic only makes sense in the context of the two adders and what they compute. I don't think this module makes sense in any other, standalone use.
Would it help if the header introduced this context?
 * - Adder X calculates X = a + b
 * - Adder Y calculates Y = X - mod
 *
 * - If X generates a carry:
I know what is meant here but it is still slightly confusing to use the term "carry" here.
It is a decision bit that indicates whether a value is in the interval [0, mod-1] or
[mod, 2*mod-1].
This ties in with my comment on mentioning the modulus here even though
the module is independent of it.
In the current naming, the decision bit is the bit that carries the result of this evaluation (0 to take result X, 1 to take result Y). The signal referred to here is the actual carry bit of adder X (which is computed externally).
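The relation between the two adder carries and the decision bit can be sketched as a small Python model of a non-vectorized 256-bit pseudo-modulo addition. This is an illustration under assumptions, not the PR's logic: the names `add_with_carry` and `pseudo_mod_add` are hypothetical, and deriving the decision as the OR of the two carries is my reading of the header comment ("subtract mod if a + b >= mod").

```python
WIDTH = 256
MASK = (1 << WIDTH) - 1


def add_with_carry(a, b, carry_in):
    """WIDTH-bit adder returning (sum, carry_out)."""
    total = a + b + carry_in
    return total & MASK, total >> WIDTH


def pseudo_mod_add(a, b, mod):
    """Return (result, decision) for one pseudo-modulo addition.

    decision = 0 selects adder X, decision = 1 selects adder Y.
    """
    # Adder X: X = a + b
    x, carry_x = add_with_carry(a, b, 0)
    # Adder Y: Y = X - mod, computed as X + ~mod + 1
    y, carry_y = add_with_carry(x, ~mod & MASK, 1)
    # Assumed decision: take Y if X overflowed WIDTH bits (carry_x) or if
    # X >= mod (the subtraction did not borrow, so carry_y is set).
    decision = carry_x | carry_y
    return (y if decision else x), decision
```

For example, `pseudo_mod_add(5, 7, 10)` selects Y and `pseudo_mod_add(3, 4, 10)` selects X. The result is only "pseudo" modular because at most one subtraction of mod is performed.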
 *
 * For subtraction, this stage generates an additional signal whether any vector element uses the
 * result of adder Y. This signal is used for MOD integrity checks and blanking assertions. For
 * addition this signal is always set as the carries of Y are used for the decisions.
I don't understand this. So this additional signal is only used in the subtraction case for
some security checks? Why not unconditionally set it to 1 like for addition?
I followed the behaviour of the current OTBN. I do not know the design rationale for making this check dependent on the result. Let's discuss this offline.
hw/ip/otbn/rtl/otbn_pkg.sv
Outdated
// Vector element length type for bignum vec ISA implemented in BN ALU for
// bn.addv(m), bn.subv(m) and bn.shv.
// The ISA foresees only 4 types (16 to 128 bits). However, only a subset is implemented.
With "4 types (16 to 128 bits)" you mean 16, 32, 64 and 128?
Yes. But this line was updated in the last force push.
) (
  input logic [LVLEN-1:0] operand_a_i,
  input logic [LVLEN-1:0] operand_b_i,
  input logic operand_b_invert_i,
This is the indicator bit for performing a subtraction, why not call it like that?
Because executing a subtraction also requires setting the carry-in accordingly (to 1, so that the two's complement is computed correctly). This signal only controls whether operand B is inverted or not. It could be useful if we ever wanted a one's complement (but I don't think we will).
hw/ip/otbn/rtl/otbn_vec_adder.sv
Outdated
 *
 * This carry chaining allows to compute additions over multiples of LVChunkLEN wide elements
 * including the full vector width (i.e., a non vectorized addition). To perform subtraction the
 * input B can be inverted and all carries must be set to 1 as: a - b = a + ~b + 1.
This sounds like the caller has to invert B in the subtraction case but it is handled in this
module?
Would something like this be more clear:
A subtraction can be performed by setting the operand_b_invert_i signal and the input carries to one because: a - b = a + ~b + 1.
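The interplay of carry chaining, the invert signal and the input carries can be sketched as a Python behavioural model. It is a sketch under assumptions, not the module from this PR: the function name `vec_add`, a chunk width of 32 bits for LVChunkLEN, and a single `carry_in` value shared by all elements are hypothetical simplifications.

```python
CHUNK_BITS = 32   # assumed value of LVChunkLEN
NUM_CHUNKS = 8    # 8 x 32 bits = 256 bits
CHUNK_MASK = (1 << CHUNK_BITS) - 1


def vec_add(a, b, b_invert, carry_in, elen):
    """Add two 256-bit vectors element-wise with elements of elen bits.

    b_invert models operand_b_invert_i; carry_in is fed into the lowest
    chunk of every element. Subtraction: b_invert=True and carry_in=1.
    """
    if elen % CHUNK_BITS or (NUM_CHUNKS * CHUNK_BITS) % elen:
        raise ValueError("elen must be a multiple of the chunk width dividing 256")
    chunks_per_elem = elen // CHUNK_BITS
    result = 0
    carry = 0
    for i in range(NUM_CHUNKS):
        a_chunk = (a >> (i * CHUNK_BITS)) & CHUNK_MASK
        b_chunk = (b >> (i * CHUNK_BITS)) & CHUNK_MASK
        if b_invert:
            b_chunk ^= CHUNK_MASK  # bitwise inversion of operand B
        # The first chunk of an element takes the external carry; all other
        # chunks take the carry-out of the previous chunk (carry chaining).
        cin = carry_in if i % chunks_per_elem == 0 else carry
        total = a_chunk + b_chunk + cin
        carry = total >> CHUNK_BITS
        result |= (total & CHUNK_MASK) << (i * CHUNK_BITS)
    return result
```

With `elen=256` the carry ripples through all eight chunks, which gives the non-vectorized addition; with `elen=32` every chunk is an independent element. Setting only `b_invert` without the carry-in would yield a + ~b, i.e. a one's complement subtraction.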
hw/ip/otbn/rtl/otbn_vec_shifter.sv
Outdated
/**
 * OTBN vectorized shifter
 *
 * This shifter is capable of shifting vectors elementwise as well as concatenate and shift 256
Maybe you can mention somewhere that these are logical shifts as opposed
to arithmetic ones, which are only supported for the GPR registers.
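As an illustration of what "logical" means for the element-wise case, here is a small Python sketch. The function name `vec_shift_logical` and its parameters are hypothetical and only model the behaviour described in the comment, not the RTL.

```python
VLEN = 256  # vector width in bits


def vec_shift_logical(value, shamt, elen, left):
    """Shift every elen-bit element of a 256-bit vector logically by shamt."""
    if VLEN % elen or not 0 <= shamt < elen:
        raise ValueError("elen must divide 256 and shamt must be in [0, elen)")
    mask = (1 << elen) - 1
    result = 0
    for i in range(VLEN // elen):
        elem = (value >> (i * elen)) & mask
        # Logical shift: zeros are shifted in on both sides, and bits never
        # cross an element boundary. An arithmetic right shift would instead
        # replicate the element's sign bit.
        shifted = (elem << shamt) & mask if left else elem >> shamt
        result |= shifted << (i * elen)
    return result
```

Shifting an element with its top bit set to the right by one clears the top bit, which is the visible difference to an arithmetic shift.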
 * This module transposes the elements of two input vectors in two different ways.
 * It supports 32b, 64b and 128b element lengths.
 *
 * If there are two vectors with 4 elements the transpositions are as follows:
Here you should mention that trn1 interleaves even coordinates and trn2 odd ones, otherwise
the word transposition is a bit misleading.
Thanks, this is indeed more clear. I updated it.
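The even/odd interleaving can be sketched in Python as follows. The exact output ordering is an assumption that follows the Arm TRN1/TRN2 convention (result element 2i comes from the first operand, element 2i+1 from the second); the helper names `split_elems`, `trn1` and `trn2` are hypothetical and not taken from the PR.

```python
VLEN = 256  # vector width in bits


def split_elems(vec, elen):
    """Split a 256-bit vector into elements, index 0 = least significant."""
    mask = (1 << elen) - 1
    return [(vec >> (i * elen)) & mask for i in range(VLEN // elen)]


def trn1(a, b, elen):
    """Interleave the even-indexed elements of a and b."""
    ea, eb = split_elems(a, elen), split_elems(b, elen)
    out = []
    for i in range(0, len(ea), 2):
        out += [ea[i], eb[i]]
    return out


def trn2(a, b, elen):
    """Interleave the odd-indexed elements of a and b."""
    ea, eb = split_elems(a, elen), split_elems(b, elen)
    out = []
    for i in range(0, len(ea), 2):
        out += [ea[i + 1], eb[i + 1]]
    return out
```

For two vectors with four 64-bit elements, a = [a0, a1, a2, a3] and b = [b0, b1, b2, b3], this model yields [a0, b0, a2, b2] for trn1 and [a1, b1, a3, b3] for trn2.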
assign shifter_out_lower_mux[AluShiftDirLeft] = shifter_out_lower_reverse;
assign shifter_out_lower_mux[AluShiftDirRight] = shifter_out_lower;

prim_onehot_mux #(
Can you mention why there is a onehot mux here but not in the other modules?
Let's discuss this offline.
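For readers unfamiliar with the mux inputs quoted above: the signal names suggest that left shifts reuse a right shifter by reversing the bit order before and after the shift. The Python sketch below illustrates that general technique only. It is an assumption based on the names `shifter_out_lower` and `shifter_out_lower_reverse`, the helper names are hypothetical, and it says nothing about why a one-hot mux is used.

```python
WIDTH = 256
MASK = (1 << WIDTH) - 1


def reverse_bits(value, width=WIDTH):
    """Mirror the bit order: bit i moves to bit width-1-i."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit into width bits")
    return int(format(value, f"0{width}b")[::-1], 2)


def shift_via_right_shifter(value, shamt, left):
    """Logical shift in either direction using only a right shift."""
    if left:
        # reverse -> shift right -> reverse back equals a truncating left shift
        return reverse_bits(reverse_bits(value) >> shamt)
    return value >> shamt
```

The direction select then only has to choose between the plain and the bit-reversed shifter output, which is the shape of the two `assign` statements in the diff.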
ca2aa31 to c64a89a
d7d3fa7 to 38f6dd0
This adds a vectorized adder, a modulo result selector, a vectorized shifter and a vector transposer module. These modules are the building blocks to construct the vectorized BN ALU.
Signed-off-by: Pascal Etterli <pascal.etterli@lowrisc.org>
Add the vectorized instructions implemented in the BN ALU to the OTBN.
Signed-off-by: Pascal Etterli <pascal.etterli@lowrisc.org>
… shifter
The vector transposer and shifter are fully combinatorial and the clk and rst are only used for assertions.
Signed-off-by: Pascal Etterli <pascal.etterli@lowrisc.org>
38f6dd0 to e26ec29
This PR adds the first part of the SIMD instructions' RTL implementation. It adds the RTL for all instructions implemented in the Bignum ALU. See #29231 for the instruction definition / description.
Note that many regression tests still fail because not all new instructions are implemented in RTL yet.