Thank you for sharing your results. In return I will share my own:
If you reformulate the code so that the forward pass adds the decompressed MoRa weights to the nn.Linear weights, the number of multiplies drops to that of a plain nn.Linear layer. It also becomes compatible with DoRa. In my testing, alternating between repeat and repeat_interleave (ReMoRa) improves on MoRa in continued training, and ReMoRa + DoRa improves on ReMoRa.
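To make the idea concrete, here is a minimal PyTorch sketch of what I mean, not the actual code from either repository. It assumes that "decompressing" means tiling a trainable r x r matrix up to the full weight shape, and that r divides both layer dimensions. The names `MergedMoRaLinear`, `delta_weight`, and `mode` are made up for illustration, and the schedule for alternating the two modes is left out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MergedMoRaLinear(nn.Module):
    """Hypothetical wrapper: adds a tiled r x r matrix to a frozen nn.Linear weight."""

    def __init__(self, base: nn.Linear, r: int, mode: str = "repeat"):
        super().__init__()
        if base.out_features % r or base.in_features % r:
            raise ValueError("this sketch assumes r divides both dimensions")
        if mode not in ("repeat", "repeat_interleave"):
            raise ValueError("mode must be 'repeat' or 'repeat_interleave'")
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only the square matrix is trained
        self.r = r
        self.mode = mode
        # Zero init, so the wrapped layer starts out identical to the base layer.
        self.M = nn.Parameter(torch.zeros(r, r))

    def delta_weight(self) -> torch.Tensor:
        rows = self.base.out_features // self.r
        cols = self.base.in_features // self.r
        if self.mode == "repeat":
            # Whole-matrix tiling: delta[i, j] == M[i % r, j % r]
            return self.M.repeat(rows, cols)
        # Element-wise stretching: delta[i, j] == M[i // rows, j // cols]
        return self.M.repeat_interleave(rows, dim=0).repeat_interleave(cols, dim=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One matmul of the usual size; building the delta is only copies and an add.
        w = self.base.weight + self.delta_weight()
        return F.linear(x, w, self.base.bias)


torch.manual_seed(0)
layer = MergedMoRaLinear(nn.Linear(8, 12), r=4)
x = torch.randn(3, 8)
y = layer(x)  # same output shape as the wrapped nn.Linear: (3, 12)
```

Since the merged weight `w` exists explicitly in `forward`, a DoRa-style magnitude and direction rescaling could be applied to it before the `F.linear` call. That step is not shown here.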