Commit ae1ec24

author
FirstName
committed
final changes to divider and reciprocal documentation for now
1 parent fd00f98 commit ae1ec24

4 files changed

Lines changed: 16 additions & 11 deletions

docs/src/architecture/divider.md

Lines changed: 3 additions & 3 deletions
@@ -44,7 +44,7 @@ The LUT method has an array of preset values, in which the initial guess of the
4444

4545
The LUT is the most commonly cited method for selecting the initial guess for division, especially when area is not a major concern. Below is a Python script that shows how the table is generated. Each entry is an accurate "in between" value for its step, chosen to bring the denominator as close to 1 as possible.
4646
```
47-
entries = 16
47+
entries = x
4848
def generate_lut(size):
4949
step = 1.0 / size
5050
seeds = torch.zeros(size, dtype=torch.bfloat16)
@@ -65,9 +65,9 @@ This specific magic number (`16'h7EF3`) was calculated using a python script on
6565

6666
## Finalized Design Choices
6767

68-
### Simulation
6968
Before deciding which design to keep, we used a Python simulation to check that our math would work and to give ourselves the perspective to pick the design best suited to our needs.
7069

70+
### Simulation
7171
Below is a picture of the simulated ULP error at each iteration of the Goldschmidt division algorithm using both a magic number and a LUT approach. This was simulated using the PyTorch library in Python. Example pseudocode and the simulated graph are below.
7272
```
7373
def goldschmidt_div(method, dividend, divisor, iterations):
@@ -153,7 +153,7 @@ Below is a block diagram and RTL diagram of the design
153153

154154
![2MulDesign](../img/2-Multiplier-Divider.png)
155155

156-
## 3 Multiplier Design
156+
## Pipelined Design
157157
*Primary Author: Brian Zhuang*
158158

159159
This architecture instantiates **3 Multipliers and 1 Subtractor** and is completely pipelined.

docs/src/architecture/reciprocal.md

Lines changed: 13 additions & 8 deletions
@@ -4,9 +4,9 @@ Roy, T. D. (2019). Implementation of Goldschmidt's algorithm with hardware reduc
44

55
# Reciprocal Unit Documentation
66

7-
The Reciprocal Unit largely follows the implemenation outlined in `divider.md`
7+
The Reciprocal Unit largely follows the implementation outlined in `divider.md`. A more thorough explanation of the Goldschmidt algorithm and the various design decisions is located there. This document assumes a working understanding of `divider.md`.
88

9-
### Algorithm & LUT
9+
## Algorithm & LUT
1010
The main difference between the reciprocal unit and the divider lies in the handling of the numerator. Because the numerator is guaranteed to be 1, the numerator multiplier is no longer needed, and N1 can effectively be written as F0 (the initial guess). With the area this frees up, we can use a small 16-entry LUT to get an initial guess for the factor that guarantees a maximum error of 1 ULP. The initial guess is indexed using the most significant bits of the denominator's mantissa. Below is the code snippet used to generate the LUT.
1111

1212
```
@@ -32,13 +32,13 @@ with open("lut_values.txt", "w") as f:
3232
**The Iteration 2 Optimization:**
3333
Because the algorithm only requires 2 iterations, the mathematical sequence looks like this:
3434
```
35-
* **Iteration 1:** F0 = LUT guess
35+
Iteration 1: F0 = LUT guess
3636
D1 = D * F0
3737
F1 = 2.0 - D1
3838
39-
* **Iteration 2:** Result = F0 * F1
39+
Iteration 2: Result = F0 * F1
4040
```
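
As a rough illustration, the two-iteration sequence above can be sketched in plain Python. This is a sketch under assumptions, not the RTL: it uses ordinary Python floats rather than bfloat16, expects a denominator mantissa already normalized to [1, 2), and assumes each LUT entry is the reciprocal of its interval's midpoint. The names `build_lut` and `reciprocal` are hypothetical.

```python
LUT_SIZE = 16  # matches the 16-entry LUT described in this document


def build_lut(size=LUT_SIZE):
    # Assumption: each entry is the reciprocal of the midpoint of its
    # interval of [1, 2). The real generation script may differ.
    step = 1.0 / size
    return [1.0 / (1.0 + (i + 0.5) * step) for i in range(size)]


def reciprocal(d, lut):
    # d is assumed to be a normalized mantissa, 1.0 <= d < 2.0.
    index = int((d - 1.0) * len(lut))  # top bits of the fractional mantissa
    f0 = lut[index]   # Iteration 1: F0 = LUT guess
    d1 = d * f0       #              D1 = D * F0
    f1 = 2.0 - d1     #              F1 = 2.0 - D1
    return f0 * f1    # Iteration 2: Result = F0 * F1


lut = build_lut()
approx = reciprocal(1.5, lut)
exact = 1.0 / 1.5
```

If the seed leaves `D1 = 1 - e`, the result satisfies `D * Result = 1 - e^2`, which is why a coarse seed is enough for two iterations.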
41-
### 2 Multiplier Design
41+
## Pipelined Design
4242
*Primary Author: Brian Zhuang*
4343

4444
This architecture instantiates **2 Multipliers and 1 Subtractor** and is completely pipelined.
@@ -55,8 +55,9 @@ To match the timing requirement, the pipeline is split up to **9 Stages**:
5555
registers, and exits the subtractor. Latches output value at stage 5.
5656
* **Stage 8, 9:** The final exponent is calculated from the initial exponent calculation in stage 1. The final answer is then latched in stage 9, where it is output.
5757

58-
Included below is a block diagram (top) and a RTL diagram (below)
59-
![img](img/Recip_unit_diagrams.png)
58+
Included below are a block diagram (top) and an RTL diagram (bottom)
59+
![recb](../img/recip_block.png)
60+
![recr](../img/recip_rtl.png)
6061

6162
#### Traffic Control
6263
Backpressure occurs when the downstream consumer (writeback in our case) is not ready to accept the divider's output. When this occurs, the pipe enable signal goes low and the stall propagates back to the upstream producer.
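
The stall behavior described above can be modeled in a few lines of Python. This is a simplified sketch, not the actual control logic: it assumes a single global enable that freezes every stage whenever the consumer is not ready, and the class name `StalledPipeline` is hypothetical.

```python
class StalledPipeline:
    """Toy model of a fully pipelined unit with one global enable."""

    def __init__(self, depth=9):
        # None marks an empty stage (a bubble).
        self.stages = [None] * depth

    def step(self, new_input, downstream_ready):
        """Advance one clock. Returns (output, input_accepted)."""
        if not downstream_ready:
            # Enable is low: every register holds its value, and the
            # upstream producer must present the same input again.
            return None, False
        output = self.stages[-1]
        self.stages = [new_input] + self.stages[:-1]
        return output, True


pipe = StalledPipeline(depth=3)
for item in ("a", "b", "c"):
    pipe.step(item, downstream_ready=True)

stalled = pipe.step("d", downstream_ready=False)  # consumer not ready
resumed = pipe.step("d", downstream_ready=True)   # consumer ready again
```

During the stalled cycle nothing moves and `"d"` is not accepted. On the next ready cycle the oldest value leaves the pipeline and `"d"` enters.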
@@ -72,6 +73,10 @@ Below is a table of results for the reciprocal unit. The ULP numbers are pulled
7273
| 44,368 | 21,168 | 0 | 0.65 | 1 | 15030.467 |
7374

7475
### Simulation
75-
Below is a picture of the simulated ULP error at each iteration of the Goldschmidt division algorithm using both a magic number and LUT approach. This was simulated using the PyTorch library in Python.
76+
Below is a picture of the simulated ULP error at each iteration of the Goldschmidt division algorithm using both a magic number and a LUT approach. This was simulated using the PyTorch library in Python. This is the same figure shown in `divider.md`.
7677

7778
![img](../img/Goldschmidt_ULP_Analysis.jpg)
79+
80+
## Potential Changes
81+
### Please read this in case future changes are required
82+
We chose a LUT because this reciprocal unit, which is based on the pipelined divider design, uses one fewer multiplier than that divider, so the overall area is still lower even with the LUT. However, it is important to note that the LUT's average accuracy (see the average ULP discussion in the "Finalized Design Choices" section of `divider.md`) is noticeably worse than the magic number's. This is because the 16-entry LUT is very small. If average accuracy ever becomes a problem, the simplest solution is to increase the LUT size. If the area is too high, switch to the magic number.
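
To get a feel for how much a larger table would help, the sketch below estimates the worst-case relative error of the initial guess for a few LUT sizes. It rests on assumptions: plain Python floats instead of bfloat16, a denominator mantissa in [1, 2], and midpoint-reciprocal entries. The function name `worst_seed_error` is hypothetical.

```python
def worst_seed_error(size, samples_per_entry=64):
    # Assumption: entry i holds the reciprocal of the midpoint of its
    # interval of [1, 2). The real LUT generation may differ.
    step = 1.0 / size
    worst = 0.0
    for i in range(size):
        seed = 1.0 / (1.0 + (i + 0.5) * step)
        for k in range(samples_per_entry + 1):
            d = 1.0 + (i + k / samples_per_entry) * step
            # Relative error of the guess: how far D * F0 is from 1.
            worst = max(worst, abs(1.0 - d * seed))
    return worst


errors = {size: worst_seed_error(size) for size in (16, 32, 64)}
for size, err in errors.items():
    print(f"{size:3d} entries: worst seed error = {err:.5f}")
```

Under these assumptions the worst-case seed error is `1 / (2 * size + 1)`, so doubling the table roughly halves it. Each Goldschmidt iteration then squares the remaining error.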

docs/src/img/recip_block.png

157 KB

docs/src/img/recip_rtl.png

210 KB
