Hello,
On page 13 of the 3rd lecture slides, RMSNorm is defined as:
$$
y = \frac{x}{\sqrt{||x||^2_2 + \epsilon}} * \gamma
$$
This definition seems to be missing the division by the number of elements $n$ of $x$ (the "mean" in root mean square), i.e. it should be
$$
y = \frac{x}{\sqrt{\dfrac{||x||^2_2}{n} + \epsilon}} * \gamma
$$
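To illustrate the difference, here is a small numerical sketch in plain Python. The function names `norm_as_on_slide`, `rms_norm`, and `rms` are my own, and the input vector, $\gamma = 1$, and $\epsilon = 10^{-6}$ are arbitrary choices for the check, not something taken from the slides.

```python
import math

def norm_as_on_slide(x, gamma, eps=1e-6):
    # Slide version: divide by sqrt(sum of squares + eps), with no 1/n factor.
    denom = math.sqrt(sum(v * v for v in x) + eps)
    return [v / denom * g for v, g in zip(x, gamma)]

def rms_norm(x, gamma, eps=1e-6):
    # Proposed version: divide by sqrt(mean of squares + eps).
    n = len(x)
    denom = math.sqrt(sum(v * v for v in x) / n + eps)
    return [v / denom * g for v, g in zip(x, gamma)]

def rms(v):
    # Root mean square of a vector, used only to inspect the outputs.
    return math.sqrt(sum(t * t for t in v) / len(v))

# Arbitrary example input; gamma is all ones so only the normalization matters.
x = [3.0, -4.0, 12.0, 0.5]
gamma = [1.0] * len(x)

slide_out = norm_as_on_slide(x, gamma)
rms_out = rms_norm(x, gamma)

print("RMS of slide version:   ", rms(slide_out))
print("RMS of proposed version:", rms(rms_out))
```

With the slide's formula the output has unit L2 norm, so its RMS is about $1/\sqrt{n}$ and shrinks as the vector gets longer. With the division by $n$, the output's RMS is about 1 regardless of length, which is what I would expect from RMSNorm.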
Best,
Uğur