Absurdly slow training? #388
Comments
@tomhaydn Sure thing! I think I should still have the files for the dataset on my PC, or at least the dataset's config files, so I will respond with a proper explanation once I figure it out. Just out of curiosity, what hardware are you planning to train it on, and what about your dataset? (How many hours of data? Conditional or not?)
Hi @DEBIHOOD, thanks so much for your response. I have paid access to >500GB VRAM. I'm initially testing on 100 hours of music, then plan to increase that to 800 hours of open-source music plus 200 hours of licensed music that I have exclusive access to. I'm not concerned with full conditional generation (i.e. free-form prompt-based generation), but I do need conditioning in some respect, similar to your requirements, i.e. using some combination of artist name, key, and tempo so that I can prompt with those when generating. Since your other response I've managed to get the model training (yet to test the output), but it is incredibly slow. The docs show ~30 minutes per epoch as standard, which, had they completed their full 500 epochs, would have taken over a week to train. I'd love to connect and discuss some of my ideas if you're up for it!
Sure! Do you think we should create another issue regarding that, or do you have any other ideas in mind?
Hi @DEBIHOOD, I saw your post about training a new model from scratch with MusicGen. I'm curious about the results you mentioned. How did you generate the result figures you shared? Did you use any specific tool or script for visualization, and could you share some details on how you processed the output data to get those visualizations? Thanks in advance!
Hi @MaoooMao |
I've been playing around with training my own model: unconditional, initialized from scratch (dim 512, num_heads 8, num_layers 8: 33.57 M params total), with a context size of 6 seconds (300 tokens), on a dataset I collected (133 hours of music).
But the model progresses absurdly slowly. I've been training for the past 20 hours, and the generated samples sound as if it's learning almost nothing (or are at least awfully bad).
I should clarify that I don't have much experience training transformers, but in Andrej Karpathy's nanoGPT he claims that within just 3 minutes of training a newly initialized character-level language model, he was able to get some coherent-looking text:
"we're training a GPT with a context size of up to 256 characters, 384 feature channels, and it is a 6-layer Transformer with 6 heads in each layer. On one A100 GPU this training run takes about 3 minutes and the best validation loss is 1.4697"
Besides that, I've also trained an LM with LLaMA's architecture using the popular portable tool llama.cpp. Its train-text-from-scratch example lets you train (on CPU) a newly initialized model with your own settings (dim, heads, layers) on a .txt dataset file. So I set up the same architecture (dim 512, num_heads 8, num_layers 8, ctx 300), and after 20 minutes of training on CPU(!) I got a basic model that imitated the style of the text it was trained on (I just used a random JSON file as the dataset).
Here's the command that I'm using to start the training:

```shell
dora run solver=musicgen/musicgen_base_32khz model/lm/model_scale=my_nano dset=fullen/fullen133h dataset.batch_size=2 conditioner=none optim.ema.use=false dataset.num_workers=4 checkpoint.save_every=10 optim.updates_per_epoch=2000 dataset.valid.num_samples=2 dataset.evaluate.num_samples=2 dataset.generate.num_samples=10 generate.every=5 autocast=false dataset.min_segment_ratio=1.0 dataset.segment_duration=6 schedule.lr_scheduler=null optim.optimizer=adamw optim.lr=3e-4
```
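From the overrides in that command you can estimate how much audio the model actually sees per epoch. This is only a back-of-the-envelope sketch based on the values in the command, not something MusicGen reports itself:

```python
# Rough per-epoch audio coverage, from the command-line overrides above.
updates_per_epoch = 2000   # optim.updates_per_epoch=2000
batch_size = 2             # dataset.batch_size=2
segment_duration = 6       # dataset.segment_duration=6 (seconds)

seconds_per_epoch = updates_per_epoch * batch_size * segment_duration
hours_per_epoch = seconds_per_epoch / 3600
print(f"{hours_per_epoch:.1f} hours of audio per epoch")  # ~6.7 hours
```

So each epoch covers only a few hours of audio, well under one pass over a 133-hour dataset.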
And these are the samples it's producing after 120 epochs, alongside some of the dataset examples.
generated_samples_epoch120.zip
GPU that I'm training on: GTX 1060 6GB. (I know, but still, is 20 hours of training for this small model "just not enough"?)
I just want to know why it evolves so slowly, when in the other two examples from my (limited) experience of training transformers, good progress was made after just ~30 minutes, while with MusicGen I'm getting nowhere after 20 hours.
The training graphs seem to look fine; as I said, it's just super slow.
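To put "120 epochs" in perspective, here is a rough count of how much data that run covered, assuming (as in the 32 kHz MusicGen setup) 50 EnCodec frames per second, so a 6-second segment is the 300 tokens per codebook mentioned above:

```python
# Rough estimate of data covered by the 120-epoch run described above.
# Assumption: 50 EnCodec frames/sec (the 32 kHz MusicGen configuration);
# MusicGen also models 4 codebooks per frame, so raw token counts are ~4x.
epochs = 120
updates_per_epoch = 2000
batch_size = 2
frames_per_segment = 300              # 6 s * 50 frames/s

frames_seen = epochs * updates_per_epoch * batch_size * frames_per_segment

dataset_hours = 133
dataset_frames = dataset_hours * 3600 * 50
passes_over_dataset = frames_seen / dataset_frames
print(f"{frames_seen:,} frames seen, ~{passes_over_dataset:.1f} passes over the dataset")
```

That works out to only around six passes over the 133-hour dataset, which may partly explain why the samples still sound so poor.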