Hi @IamAGP, as a fellow reader, I double-checked the value in LLMs-from-scratch/ch03/01_main-chapter-code/ch03.ipynb (lines 210 to 211 in 35354fa). It looks like you used the truncated values from the figure for calculating the unscaled attention scores, which explains the slight difference.
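For illustration, here's a minimal sketch of how that happens. The full-precision vectors match the chapter's example inputs for x^(1) ("Your") and x^(2) ("journey"); the one-decimal variants are an assumption standing in for what a figure might display, not the figure's exact values:

```python
import torch

# Full-precision inputs, matching the chapter's example embeddings
x_1 = torch.tensor([0.43, 0.15, 0.89])  # x^(1), "Your"
x_2 = torch.tensor([0.55, 0.87, 0.66])  # x^(2), "journey" (the query)

# The same vectors rounded to one decimal place, as a figure might show them
# (assumed values, for illustration only)
x_1_rounded = torch.tensor([0.4, 0.2, 0.9])
x_2_rounded = torch.tensor([0.6, 0.9, 0.7])

# Unnormalized attention score omega_21 = x^(2) . x^(1)
print(torch.dot(x_2, x_1))                  # tensor(0.9544)
print(torch.dot(x_2_rounded, x_1_rounded))  # tensor(1.0500), shifted by rounding
```

So two passes over "the same" numbers can legitimately land on slightly different scores depending on whether the displayed (rounded) or stored (full-precision) values are used.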
In section 3.4.1, computing the attention weights step by step (as shown in the screenshot here), shouldn't the attention score ω21 be 1.1 instead of 1.2? I just wanted to check my understanding of the calculation. Also, a suggestion: throughout the notebook, could we use consistent naming for "attention score" and "attention weight"? An attention score is unnormalized and an attention weight is normalized, so do we need the term "unnormalized attention score" at all, given that it and "attention score" are one and the same?
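For concreteness, here's a minimal sketch of that distinction under the chapter's simple dot-product attention (the three-token input below is illustrative): the attention score is the raw dot product, and the attention weight is the score after softmax normalization.

```python
import torch

# Illustrative 3-token input embeddings (same style as the chapter's example)
inputs = torch.tensor([
    [0.43, 0.15, 0.89],   # x^(1)
    [0.55, 0.87, 0.66],   # x^(2)
    [0.57, 0.85, 0.64],   # x^(3)
])

query = inputs[1]  # x^(2) as the query

# Attention scores (unnormalized): plain dot products with the query
attn_scores = inputs @ query

# Attention weights (normalized): softmax over the scores, so they sum to 1
attn_weights = torch.softmax(attn_scores, dim=0)

print(attn_scores)         # raw scores, arbitrary scale
print(attn_weights)        # normalized weights
print(attn_weights.sum())  # tensor(1.)
```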
Completely unrelated, but a request: could you also dedicate a complete repo to diffusion models? It would be amazing to cover two different paradigms (transformers and diffusion) as well as a different modality (images).