Large dataset running out of vram? #760
-
@12bitmisfit your whole dataset should never go into VRAM at once. All that sits in VRAM is your model and optimizer parameters, a single input batch, and whatever needs to be remembered during a forward/backward pass. If you're running out even with batch size 1, it may be one of, or a combination of:
- a model that is too large for your card,
- an input resolution that is too large,
- a very large number of classes (see the next comment).
Or bugs like:
- keeping references to tensors that are still attached to the computation graph (e.g. accumulating losses without .item() or .detach()),
- running validation without torch.no_grad().
BTW you can't limit the VRAM being used explicitly (that just doesn't make sense in this context). You can only do it implicitly by controlling the things I mentioned above. Hope this helps.
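For reference, here is a minimal sketch (not this repo's actual train.py) of the standard dataset/dataloader pattern. Only the current batch is ever moved to the GPU, so the on-disk size of the dataset has no bearing on VRAM. The path and the stand-in model are placeholders:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = torch.device('cuda')

# The Dataset only indexes files on disk; nothing is loaded to the GPU here.
dataset = datasets.ImageFolder(
    'path/to/train',  # hypothetical path
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ]),
)
# The DataLoader yields one CPU batch at a time via worker processes.
loader = DataLoader(dataset, batch_size=32, num_workers=4, shuffle=True)

model = torch.nn.Linear(3 * 224 * 224, 1000).to(device)  # stand-in for a real model

for images, targets in loader:
    # Only this single batch (plus model/optimizer state and activations)
    # occupies VRAM, never the whole dataset.
    images = images.to(device, non_blocking=True)
    targets = targets.to(device, non_blocking=True)
    loss = torch.nn.functional.cross_entropy(model(images.flatten(1)), targets)
    loss.backward()
    break  # one step is enough to illustrate the point
```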
-
@12bitmisfit to add to @alexander-soare's response, the dataset itself should not impact GPU memory usage if used via the dataset / dataloader setup as in this repo's training script. However, the number of classes does impact memory use. 357K classes (if that's what you actually have) is... insane. You will have to hit the literature on how to deal with this situation, as it will not be easy. It'll be approx 400-700M params just for the classifier, and you have to do a softmax over a dim of 357K, which is also non-trivial. You'll likely need some fancy model parallelism and approximate/hierarchical solutions.
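To put rough numbers on that, here is the back-of-the-envelope arithmetic for the classifier head alone. The feature widths below are just typical examples, not tied to any particular backbone:

```python
# Rough sizing of a Linear classifier head for 357K classes.
num_classes = 357_000
for feat_dim in (1280, 2048):  # example feature widths; depends on the backbone
    params = feat_dim * num_classes + num_classes  # weight matrix + bias
    print(f'feat_dim={feat_dim}: {params / 1e6:.0f}M params, '
          f'{params * 4 / 1e9:.2f} GB in fp32 (optimizer state adds 2-3x more)')
```

With feat_dim=1280 that is roughly 457M parameters (~1.8 GB in fp32), and with feat_dim=2048 roughly 731M parameters (~2.9 GB), before counting gradients and optimizer state.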
-
I'm trying to train on a very large dataset. I can train on smaller portions of it by moving them into another directory and pointing train.py at that, but if I try to use the whole dataset at once it won't run, because I don't have enough VRAM. I have an RTX 3090 with 24 GB of VRAM, so I can't do much better VRAM-wise without getting into ridiculous pricing. I know I have to define num_classes when I have more than 1000 sub-folders in train/validation, and it runs just fine with fairly large datasets, but I'd like to be able to train against all 357K sub-folders I have.
Is there something I'm missing about how to limit the amount of VRAM being used? Is it trying to load all the images into VRAM for training? Is there a way I can limit this? I've tried silly things like reducing the batch size to 1, but the problem seems to be that I just can't fit the dataset into VRAM.
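In case it helps anyone else, here is a quick sanity check for whether building a dataset allocates any GPU memory at all (the path is a placeholder):

```python
import torch
from torchvision import datasets, transforms

print(f'{torch.cuda.memory_allocated() / 1e6:.0f} MB allocated')  # ~0 MB

# Building the dataset only indexes file paths; no image data touches the GPU.
ds = datasets.ImageFolder('path/to/train', transform=transforms.ToTensor())
print(len(ds), 'images indexed')
print(f'{torch.cuda.memory_allocated() / 1e6:.0f} MB allocated')  # still ~0 MB

# A single batch-size-1 input is what actually lands in VRAM during training.
x = torch.zeros(1, 3, 224, 224, device='cuda')
print(f'{torch.cuda.memory_allocated() / 1e6:.0f} MB allocated')  # small, fixed amount
```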