
Absurdly slow training? #388

Open
DEBIHOOD opened this issue Jan 15, 2024 · 6 comments

Comments

@DEBIHOOD

I've been playing around with training my own model: unconditional, initialized from scratch (dim 512, num_heads 8, num_layers 8: 33.57M params total), with a context size of 6 seconds (300 tokens), on a dataset I collected (133 hours of music).
But the model progresses absurdly slowly. I've been training for the past 20 hours, and the generated samples sound like it's doing almost nothing (or at least they sound awful).
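As a sanity check on that parameter count, here is a rough back-of-the-envelope sketch using the common ~12 · n_layers · d_model² rule of thumb for a transformer's non-embedding weights (the exact total also depends on the embeddings and output heads, so this is an estimate, not the model's true breakdown):

```shell
# Rough non-embedding parameter estimate for dim=512, num_layers=8:
# ~12 * n_layers * d_model^2 covers the attention + MLP weights per layer.
n_layers=8
d_model=512
echo $((12 * n_layers * d_model * d_model))  # prints 25165824 (~25.2M); embeddings/heads account for the rest of the ~33.57M
```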

I should clarify that I don't have much experience training transformers, but in Andrej Karpathy's nanoGPT, he claims that within just 3 minutes of training a newly initialized character-level language model, he was able to get some coherent-looking text:
"we're training a GPT with a context size of up to 256 characters, 384 feature channels, and it is a 6-layer Transformer with 6 heads in each layer. On one A100 GPU this training run takes about 3 minutes and the best validation loss is 1.4697"

Besides that, I've also trained an LM with LLaMA's architecture using the popular portable tool llama.cpp. It has a train-text-from-scratch example that lets you train (on CPU) a newly initialized model with your settings (dim, heads, layers) on a .txt dataset file. So I set up the same architecture (dim 512, num_heads 8, num_layers 8, ctx 300), and after 20 minutes of training on CPU(!) I got a basic model that imitated the style of the text it was trained on (I just used a random JSON file as the dataset).

Here's the command I'm using to start training:

```shell
dora run solver=musicgen/musicgen_base_32khz model/lm/model_scale=my_nano \
  dset=fullen/fullen133h dataset.batch_size=2 conditioner=none \
  optim.ema.use=false dataset.num_workers=4 checkpoint.save_every=10 \
  optim.updates_per_epoch=2000 dataset.valid.num_samples=2 \
  dataset.evaluate.num_samples=2 dataset.generate.num_samples=10 \
  generate.every=5 autocast=false dataset.min_segment_ratio=1.0 \
  dataset.segment_duration=6 schedule.lr_scheduler=null \
  optim.optimizer=adamw optim.lr=3e-4
```

And these are the samples it's producing after 120 epochs, alongside some of the dataset examples.
generated_samples_epoch120.zip
GPU that I'm training on: GTX 1060 6GB. (I know, but still, is 20 hours of training for this small model "just not enough"?)

I just want to know why it evolves so slowly, when the other two examples from my limited experience of training transformers made good progress after just ~30 minutes, while with MusicGen I'm getting nowhere after 20 hours.

The graphs seem to look good; as I said, it's just super slow.
Tensorboard

@tomhaydn

Hey @DEBIHOOD, I'm just reaching out as a stab in the dark to ask how you structured your dataset. The docs aren't very clear, and you seem to have managed to get it training on your own dataset. If you have a few minutes, could you please assist me with my issue:

#462

@DEBIHOOD
Author

@tomhaydn Sure thing! I think I should still have the dataset files on my PC, or at least the config files for the dataset, so I'll respond with a proper explanation once I figure it out.
Regarding my own problem of slow training: I haven't done many more tests since then, but I guess that's just how transformers are in general, and even this config, "about the size of an insect's brain" (34M params), is going to be slow to train without a few high-end GPUs. The success of the other examples I mentioned in my issue comes down to the fact that there, a single word was tokenized into maybe 1-2 tokens, so it's easier to get something coherent at that token scale. MusicGen, however, operates at 50 tokens per 1 second of generated audio/music, so it has to learn much more structured, long-range relationships between tokens before it actually starts to sound anything like music/your dataset.

Just out of curiosity, what hardware are you planning to train it on, and what about your dataset? (How many hours of data? Conditional or not?)

@tomhaydn

Hi @DEBIHOOD, thanks so much for your response. I have paid access to >500GB of VRAM. I'm initially testing on 100 hours of music, then plan to scale up to 800 hours of open-source music plus 200 hours of licensed music that I have exclusive access to.

I'm not concerned with conditional generation (i.e. full prompt-based generation), but I do need it in some respect, similar to your requirements: using some combination of artist name, key, and tempo so that I can prompt with those when generating.

Since your other response I've managed to get the model training. I'm yet to test the output, but it is incredibly slow. The docs show 30 minutes per epoch as standard, which, if they completed their full 500 epochs, would have taken over a week to train.

I'd love to connect and discuss some of my ideas if you're up for it!

@DEBIHOOD
Author

@tomhaydn

I'd love to connect and discuss some of my ideas if you're up for it!

Sure! Do you think we should create another issue for that, or do you have any other ideas in mind?
There's some stuff about training that I've discovered in my own experiments that will probably be helpful.

@MaoooMao

MaoooMao commented Sep 2, 2024

Hi @DEBIHOOD , I saw your post about training a new model from scratch with MusicGen. I’m curious about the results you mentioned. How did you generate the result figures you shared? Did you use any specific tool or script for visualization, and could you share some details on how you processed the output data to get those visualizations?

Thanks in advance!

@DEBIHOOD
Author

DEBIHOOD commented Sep 2, 2024

Hi @MaoooMao
I used TensorBoard to view the graphs.
It isn't mentioned in the docs, but with every training run, the training script also writes a TensorBoard event file that contains all of that.
You can look up how to open this file in the TensorBoard docs; it's just one command in the terminal.
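For example (a sketch; the actual path depends on where dora writes your experiment's outputs on your setup):

```shell
# Point TensorBoard at the directory containing the run's event file,
# then open http://localhost:6006 in a browser.
tensorboard --logdir /path/to/your/dora/experiment/outputs
```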
If you have any more questions, feel free to ask!
