
feat: add per-model FP8 layerwise casting for VRAM reduction#8945

Draft
Pfannkuchensack wants to merge 5 commits into invoke-ai:main from Pfannkuchensack:feature/fp8-layerwise-casting

Conversation

@Pfannkuchensack
Collaborator

FP8 Layerwise Casting - Implementation

Summary

Adds a per-model fp8_storage option to model default settings. When enabled, diffusers' enable_layerwise_casting() stores model weights in FP8 (float8_e4m3fn) and casts them to fp16/bf16 layer by layer during inference. This reduces VRAM usage by roughly 50% per model with minimal quality loss.

Supported: SD1/SD2/SDXL/SD3, Flux, Flux2, CogView4, Z-Image, VAE (diffusers-based), ControlNet, T2IAdapter.
Not applicable: Text Encoders, LoRA, GGUF, BnB, custom classes.

Related Issues / Discussions

QA Instructions

  1. Set fp8_storage: true in a model's default_settings (via API or Model Manager UI)
  2. Load the model and generate an image
  3. Verify VRAM usage is reduced compared to normal loading
  4. Verify image quality is acceptable (minimal degradation expected)
  5. Verify Text Encoders are NOT affected (excluded by submodel type filter)
  6. Verify non-CUDA devices gracefully ignore the setting
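For step 1, the default_settings change can be expressed as a payload fragment like the following (an illustrative sketch only — the exact Model Manager API endpoint and the surrounding field shape are not shown and may differ):

```json
{
  "default_settings": {
    "fp8_storage": true
  }
}
```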

Test Matrix

  • SD1.5 Diffusers with fp8_storage=true - load and generate
  • SDXL Diffusers with fp8_storage=true - load and generate
  • Flux Diffusers with fp8_storage=true - load and generate
  • Flux2 Diffusers with fp8_storage=true - load and generate
  • CogView4 with fp8_storage=true - load and generate
  • Z-Image Diffusers with fp8_storage=true - load and generate
  • VAE with fp8_storage=true - check quality
  • ControlNet with fp8_storage=true - load and generate
  • VRAM comparison: with vs. without fp8_storage
  • Image quality comparison: FP8 vs fp16/bf16
  • MPS/CPU: verify fp8_storage is silently ignored
  • Flux Checkpoint (custom class): verify FP8 is gracefully skipped (not a ModelMixin)
  • Text Encoder submodels: verify FP8 is NOT applied
  • GGUF/BnB models: verify FP8 is gracefully skipped

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

Add per-model FP8 storage toggle in Model Manager default settings for
both main models and control adapter models. When enabled, model weights
are stored in FP8 format in VRAM (~50% savings) and cast layer-by-layer
to compute precision during inference via diffusers' enable_layerwise_casting().

Backend: add fp8_storage field to MainModelDefaultSettings and
ControlAdapterDefaultSettings, apply FP8 layerwise casting in all
relevant model loaders (SD, SDXL, FLUX, CogView4, Z-Image, ControlNet,
T2IAdapter, VAE). Gracefully skips non-ModelMixin models (custom
checkpoint loaders, GGUF, BnB).

Frontend: add FP8 Storage switch to model default settings panels with
InformationalPopover, translation keys, and proper form handling.
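The graceful-skip behavior described above can be sketched as follows (stand-in classes for illustration; the real loaders check against diffusers' ModelMixin and the actual quantized-model types, and the function name maybe_apply_fp8 is hypothetical):

```python
class ModelMixin:
    """Stand-in for diffusers.ModelMixin."""
    def enable_layerwise_casting(self, storage_dtype, compute_dtype):
        self.storage_dtype = storage_dtype
        self.compute_dtype = compute_dtype

class DiffusersUNet(ModelMixin):      # e.g. an SDXL or Flux transformer
    pass

class CustomCheckpointModel:          # e.g. a custom Flux checkpoint class
    pass

def maybe_apply_fp8(model, fp8_storage: bool, device: str = "cuda") -> bool:
    """Apply FP8 layerwise casting only when it is safe to do so."""
    if not fp8_storage or device != "cuda":
        return False   # setting disabled, or non-CUDA device: silently ignore
    if not isinstance(model, ModelMixin):
        return False   # custom checkpoint class / GGUF / BnB: gracefully skip
    model.enable_layerwise_casting("float8_e4m3fn", "bfloat16")
    return True

print(maybe_apply_fp8(DiffusersUNet(), True))           # True
print(maybe_apply_fp8(CustomCheckpointModel(), True))   # False
print(maybe_apply_fp8(DiffusersUNet(), True, "mps"))    # False
```

This mirrors the QA expectations: the toggle is a no-op on non-CUDA devices and on models that do not inherit from ModelMixin, rather than an error.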
@github-actions bot added the labels python (PRs that change python files), backend (PRs that change backend files), and frontend (PRs that change frontend files) on Mar 6, 2026.

Labels

backend, frontend, python, v6.13.x

Projects

Status: 6.13.x
