docs/src/architecture/divider.md: 3 additions & 3 deletions
@@ -44,7 +44,7 @@ The LUT method has an array of preset values, in which the initial guess of the

The LUT is the most commonly cited method for selecting the initial guess for division, especially when area is not a major concern. Below is a Python script that shows how the table is generated. Each entry is simply an accurate "in between" value for its step, chosen to best bring the denominator to 1.
```
-entries = 16
+entries = x
def generate_lut(size):
    step = 1.0 / size
    seeds = torch.zeros(size, dtype=torch.bfloat16)
@@ -65,9 +65,9 @@ This specific magic number (`16'h7EF3`) was calculated using a python script on

## Finalized Design Choices

-### Simulation
Before deciding which design we'd stick to, we used a Python simulation to check that our math would work, as well as to give ourselves the perspective to pick the design best suited to our needs.

+### Simulation
Below is a picture of the simulated ULP error at each iteration of the Goldschmidt division algorithm using both a magic number and a LUT approach. This was simulated using the PyTorch library in Python. Example pseudocode and the simulated graph are below.
docs/src/architecture/reciprocal.md: 13 additions & 8 deletions
@@ -4,9 +4,9 @@ Roy, T. D. (2019). Implementation of Goldschmidt's algorithm with hardware reduc

# Reciprocal Unit Documentation

-The Reciprocal Unit largely follows the implemenation outlined in `divider.md`
+The Reciprocal Unit largely follows the implementation outlined in `divider.md`. A more thorough explanation of the Goldschmidt algorithm and the various design decisions is located there. This document expects you to have a decent understanding of `divider.md`.

-###Algorithm & LUT
+## Algorithm & LUT
The main difference between the reciprocal unit and the divider lies in the handling of the numerator. Because the numerator is guaranteed to be 1, the multiplier for the numerator is no longer needed, and N1 can effectively be written as F0 (the initial guess). With the area this frees up, we can use a small 16-entry LUT to get an initial guess for the factor that guarantees a max ULP of 1. The initial guess is indexed using the most significant bits of the denominator's mantissa. Below is a code snippet of the generation used to create the LUT.

```
@@ -32,13 +32,13 @@ with open("lut_values.txt", "w") as f:
**The Iteration 2 Optimization:**
Because the algorithm only requires 2 iterations, the mathematical sequence looks like this:
```
-* **Iteration 1:** F0 = LUT guess
+Iteration 1: F0 = LUT guess
D1 = D * F0
F1 = 2.0 - D1

-* **Iteration 2:** Result = F0 * F1
+Iteration 2: Result = F0 * F1
```
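The two-step sequence above can be sketched in plain Python as follows. This is a minimal illustration, not the project's RTL or its PyTorch simulation: it uses double-precision floats instead of bfloat16, assumes the denominator's mantissa is already normalized to [1, 2), and assumes each LUT entry is the reciprocal of its interval's midpoint. The names `build_lut` and `reciprocal` are hypothetical.

```python
LUT_SIZE = 16  # matches the 16-entry LUT described in this document


def build_lut(size=LUT_SIZE):
    """Build seed values for mantissas in [1, 2).

    Assumption: each entry is the reciprocal of its interval's midpoint.
    The project's actual generation script may choose seeds differently.
    """
    step = 1.0 / size
    return [1.0 / (1.0 + (i + 0.5) * step) for i in range(size)]


def reciprocal(d, lut):
    """Approximate 1/d for a normalized mantissa d in [1, 2)."""
    # The most significant fraction bits of d select the LUT entry.
    index = int((d - 1.0) * len(lut))
    f0 = lut[index]   # Iteration 1: F0 = LUT guess
    d1 = d * f0       #              D1 = D * F0
    f1 = 2.0 - d1     #              F1 = 2.0 - D1
    return f0 * f1    # Iteration 2: Result = F0 * F1


if __name__ == "__main__":
    lut = build_lut()
    for d in (1.0, 1.25, 1.5, 1.99):
        print(d, reciprocal(d, lut), 1.0 / d)
```

Writing `D * F0 = 1 - e`, the result equals `(1 / D) * (1 - e^2)`, so the second step squares the relative error of the LUT seed. That is why a small table can be enough for a low-precision format.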
-### 2 Multiplier Design
+## Pipelined Design
*Primary Author: Brian Zhuang*

This architecture instantiates **2 Multipliers and 1 Subtractor** and is fully pipelined.
@@ -55,8 +55,9 @@ To match the timing requirement, the pipeline is split up to **9 Stages**:
registers, and exits the subtractor. Latches output value at stage 5.
* **Stage 8, 9:** The final exponent is calculated from the initial exponent calculation in stage 1. The final answer is then latched in stage 9, where it is output.

-Included below is a block diagram (top) and a RTL diagram (below)
-
+Included below is a block diagram (top) and an RTL diagram (bottom)
+
+

#### Traffic Control
Backpressure occurs when the downstream consumer (writeback in our case) is not ready to accept the divider's output. When this occurs, the pipe enable signal goes low and the stall propagates back to the upstream producer.
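The stall behavior described above can be modeled with a small Python sketch. It is only a behavioral illustration under a simplifying assumption: a single global enable freezes every stage whenever the consumer is not ready. The `StalledPipeline` class is hypothetical and does not mirror the RTL's signal names.

```python
class StalledPipeline:
    """Behavioral model of a pipeline with one global enable signal."""

    def __init__(self, depth):
        # None marks an empty stage (a bubble).
        self.stages = [None] * depth

    def tick(self, new_input, downstream_ready):
        """Advance one clock cycle.

        Returns (output, input_accepted). When the consumer is not ready,
        the enable is low: every stage holds its value and the upstream
        producer must hold its input as well.
        """
        if not downstream_ready:
            return None, False
        output = self.stages[-1]
        # Shift every stage forward by one and take the new input.
        self.stages = [new_input] + self.stages[:-1]
        return output, True


if __name__ == "__main__":
    pipe = StalledPipeline(depth=3)
    pending = [10, 20, 30]
    for ready in (True, False, True, True, True, True, True):
        item = pending[0] if pending else None
        out, accepted = pipe.tick(item, ready)
        if accepted and pending:
            pending.pop(0)
        print(f"ready={ready} accepted={accepted} out={out}")
```

Because a stalled cycle neither shifts the stages nor accepts a new input, no value is dropped or duplicated, and results leave in the order they entered.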
@@ -72,6 +73,10 @@ Below is a table of results for the reciprocal unit. The ULP numbers are pulled
| 44,368 | 21,168 | 0 | 0.65 | 1 | 15030.467 |

### Simulation
-Below is a picture of the simulated ULP error at each iteration of the Goldschmidt division algorithm using both a magic number and LUT approach. This was simulated using the PyTorch library in Python.
+Below is a picture of the simulated ULP error at each iteration of the Goldschmidt division algorithm using both a magic number and a LUT approach. This was simulated using the PyTorch library in Python. This is the same one seen in `divider.md`.


+
+## Potential Changes
+### Please read this in case future changes are required
+We chose a LUT for the initial guess because the reciprocal unit uses one fewer multiplier than the pipelined divider it is based on, so the overall area is still lower even with the LUT. However, it is important to note that the LUT's average ULP accuracy (discussed in `divider.md`, in the "Finalized Design Choices" section) is a decent bit lower than the magic number's. This is because the LUT in use is very small, at only 16 entries. If average accuracy ever becomes a problem, the simplest solution is to increase the LUT size. If the area is too high, switch to the magic number.
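To get a rough feel for how LUT size affects accuracy, the sketch below measures the relative error of the two-step reciprocal for a few table sizes. It rests on assumptions that are not taken from the project: midpoint-reciprocal seeds, double-precision arithmetic rather than bfloat16, and relative error rather than ULP. The `reciprocal_errors` helper is hypothetical, and the numbers it prints are not comparable to the ULP figures in this document.

```python
def reciprocal_errors(size, samples=4096):
    """Return (worst, mean) relative error of the two-step reciprocal.

    Assumption: each LUT entry is the reciprocal of its interval's
    midpoint on [1, 2); the project's table may be built differently.
    """
    step = 1.0 / size
    lut = [1.0 / (1.0 + (i + 0.5) * step) for i in range(size)]
    errors = []
    for k in range(samples):
        d = 1.0 + k / samples          # sweep mantissas across [1, 2)
        f0 = lut[int((d - 1.0) * size)]
        result = f0 * (2.0 - d * f0)   # Result = F0 * F1
        errors.append(abs(result * d - 1.0))
    return max(errors), sum(errors) / len(errors)


if __name__ == "__main__":
    for size in (16, 32, 64):
        worst, mean = reciprocal_errors(size)
        print(f"{size:3d} entries: worst {worst:.2e}, mean {mean:.2e}")
```

Under these assumptions, each doubling of the table roughly halves the seed error, and the squaring step turns that into about a fourfold drop in the final error, at the cost of twice the table storage.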