Does this actually work when unet is in cpu? #44
Comments
Well, if you load it onto the CPU, it's simply on the CPU and it won't run there. But you can load the model onto the GPU and let it offload 100% of it via the virtual VRAM size setting. SDXL-based models, for example, work even when fully offloaded to system memory. FLUX should too; if you have fast system RAM, it can probably work really well.
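To illustrate what "offloaded fully to system memory but still computed on the GPU" means in practice, here is a generic PyTorch sketch, not the node's actual implementation: the toy model and the per-layer hook-swapping scheme are assumptions about how such offloading typically works.

```python
import torch
import torch.nn as nn

# Toy model standing in for a diffusion UNet; the real node's internals differ.
model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)]).to("cpu")

def _pre(module, args):
    # Move this layer's weights to the GPU just before it runs.
    module.to("cuda", non_blocking=True)

def _post(module, args, output):
    # Push the weights back to system RAM right after the forward pass,
    # so only one layer occupies VRAM at a time.
    module.to("cpu", non_blocking=True)
    return output

for layer in model:
    layer.register_forward_pre_hook(_pre)
    layer.register_forward_hook(_post)

x = torch.randn(1, 4096, device="cuda")
with torch.no_grad():
    y = model(x)  # compute happens on the GPU even though weights live in RAM
```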
@Mescalamba Perhaps I'm missing something - isn't the virtual VRAM setting available only for DisTorch, i.e. GGUF? As stated, the model is an FP8 finetune. I'd love to convert it, but city's process as it stands is far from trivial.
An fp8 finetune can probably be transformed directly to Q8; there just aren't any tools for that yet. I think I'll make one, because it often happens that I want to GGUF something and can't because it's fp8. You're right that virtual VRAM works only on GGUF. I wasn't exactly sure what you were aiming for by loading the diffusion model onto CPU/RAM - that obviously won't work.
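For the fp8-to-GGUF gap mentioned above, one common workaround is to upcast the fp8 checkpoint to bf16 first and then feed the result to an existing GGUF conversion script. A minimal sketch, assuming the checkpoint is a plain safetensors state dict, the file names are hypothetical, and your PyTorch build supports float8 dtypes:

```python
import torch
from safetensors.torch import load_file, save_file

src = "flux_finetune_fp8.safetensors"   # hypothetical input path
dst = "flux_finetune_bf16.safetensors"  # hypothetical output path

state = load_file(src)
upcast = {}
for name, tensor in state.items():
    # fp8 tensors (e4m3/e5m2) can't be quantized to Q8 directly by most tools,
    # so widen them to bf16; leave everything else untouched.
    if tensor.dtype in (torch.float8_e4m3fn, torch.float8_e5m2):
        upcast[name] = tensor.to(torch.bfloat16)
    else:
        upcast[name] = tensor

save_file(upcast, dst)
# The bf16 file can then go through a standard GGUF conversion/quantization step.
```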
My goal is not actually to run on the CPU, but to temporarily move a model there in order to free memory for other programs (e.g. ollama) and quickly move it back once they're done.
As far as I can tell, this is only possible when the model is fully connected to an actual generation workflow, even a basic dummy one with 1 step - anything less and the model is completely unloaded.
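Stripped of the ComfyUI specifics, the pattern being described is roughly the following; a generic PyTorch sketch in which the `unet` variable is hypothetical and stands in for however the loaded model object is obtained:

```python
import torch

def park_on_cpu(unet):
    # Temporarily move the weights to system RAM and release cached VRAM
    # so another process (e.g. ollama) can use the GPU.
    unet.to("cpu")
    torch.cuda.empty_cache()

def bring_back(unet):
    # Restore the weights to the GPU once the other process is done.
    unet.to("cuda")
```

The catch, as the rest of the thread notes, is that ComfyUI's own model management decides when models are unloaded, so keeping the object parked in RAM rather than dropped seems to require that it still be referenced by a live workflow.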
I tried such a dummy workflow with a Flux FP8 finetune as the UNet on CPU, and CLIP + T5 fp8 on GPU. The move is performed successfully; however, when I reach the KSampler stage, I consistently receive the following errors:
The model can be moved back to the GPU quickly at any point afterwards (so for a separate process it works), but due to the error, this dummy offloading is inadequate as a step in a bigger workflow. Running something else seems to trigger a full reload, though the model doesn't seem to be removed from RAM.
What do you suggest?