No luck with this repo: the `bitsandbytes` dependency relies heavily on CUDA.
But there is a repo for CPU inference; just change `prompts` to `prompts[0]` so it doesn't crash with `max_batch_size=1`.
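For reference, a minimal sketch of that edit, assuming the stock `example.py` from the repo (the `prompts` list and `generator.generate` names are taken from there; not a complete program):

```python
# Sketch of the edit in the repo's example script.
prompts = [
    "I believe the meaning of life is",
    "Simply put, the theory of relativity states that",
]

# With max_batch_size=1, passing the full list overflows the batch and crashes:
# results = generator.generate(prompts, max_gen_len=20)

# Feed a single prompt instead (keep it wrapped in a list,
# since generate expects a list of prompts):
results = generator.generate([prompts[0]], max_gen_len=20)
```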
It takes more than 10 minutes to produce output with `max_gen_len=20`; for comparison, even GPT-J 6B took me only around a minute on CPU.
I also tried to make an MPS port with GPU acceleration. It runs faster, but the output quality isn't good enough, in my opinion; I'm not sure whether the CPU output is always good or I just got lucky on my first generation. UPDATE: the model gives good outputs with Python 3.10 + pytorch-nightly.
Actually, I was wrong. After I tried my port with a newer Python + PyTorch, the outputs were as good as the CPU ones. I'm happy that it worked after all!
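In case it helps anyone attempting the same port, here is a minimal, self-contained sketch of the device-selection part (this only shows how PyTorch's MPS backend is picked up; the tiny `Linear` model is a stand-in, the real port moves the LLaMA weights and inputs the same way):

```python
import torch

# Pick the Metal (MPS) backend when available, otherwise fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Stand-in model just to show the device round-trip; a real port
# calls .to(device) on the loaded model and on every input tensor.
model = torch.nn.Linear(8, 8).to(device)
x = torch.randn(1, 8, device=device)
print(device, model(x).shape)
```

Note that MPS support requires macOS 12.3+ and a recent PyTorch build, which may be why pytorch-nightly fixed the output quality for me.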
M1 / M2 with 32GB … 128GB, any hopes?