Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/faster rms norm #182

Closed
wants to merge 9 commits into from
Closed

Conversation

S1ro1
Copy link
Contributor

@S1ro1 S1ro1 commented Aug 30, 2024

Summary

Implements partial aggregation in rms_norm, similar to that in layer_norm, as described in #179 .

Testing Done

  • Hardware Type:
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@S1ro1
Copy link
Contributor Author

S1ro1 commented Aug 30, 2024

i have made gradients w.r.t weights work properly, except 1 singular test, which just calculates random number of elements wrong with no apparent cause (every other call the number and indices of elements are different), however it's always the same test (test/transformers/test_rms_norm.py::test_correctness[LlamaRMSNorm-0.0-llama-dtype1-0.2-0.2-16-1024-4096]), so help would be appreciated to fix this issue as im getting clueless. (setting the seed makes the elements wrong be deterministic, however still the same issue)

@lancerts
Copy link
Collaborator

lancerts commented Sep 4, 2024

@S1ro1 did a fix on the main branch rmsnorm, can you check if the same error persist after rebase? ty

@S1ro1
Copy link
Contributor Author

S1ro1 commented Sep 4, 2024

E           Mismatch at index (150,): tensor1[(150,)] = 0.076171875, tensor2[(150,)] = -0.2099609375
E           Mismatch at index (806,): tensor1[(806,)] = -1.6953125, tensor2[(806,)] = -1.1875
E           Mismatch at index (2538,): tensor1[(2538,)] = -1.3125, tensor2[(2538,)] = -0.80078125
E           Mismatch at index (3000,): tensor1[(3000,)] = -0.7578125, tensor2[(3000,)] = -0.373046875
E           Mismatch at index (3927,): tensor1[(3927,)] = -0.1591796875, tensor2[(3927,)] = -0.71484375

@lancerts the error stays the same, just few elements mismatched in always 1 and the test same. I have no clue how to fix that to be fair, so I guess you can unassign me, I'm working on this on/off for 3/4 days but haven't got a sinlge step forward since the bug occurance. I guess I took a too big bite. The branch should have some base work done on the issue, but maybe going from scratch might be a better issue.

@lancerts lancerts closed this Oct 2, 2024
@lancerts
Copy link
Collaborator

lancerts commented Oct 2, 2024

#255 resolved by this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants