A production-ready Docker setup for ComfyUI that unlocks the full potential of NVIDIA Blackwell GPUs (RTX 50 series) through 4-bit quantization with NVFP4.
Updated Jan 28, 2026 - Dockerfile
Docker container for the RTX 5090 and RTX 5060 with PyTorch and TensorFlow. Described as the first fully tested Blackwell GPU support for ML/AI. CUDA 12.8, Python 3.11, Ubuntu 24.04. Works with RTX 50-series (5090/5080/5070/5060) and RTX 40-series GPUs.
Sample application generated using Opencode and Ollama
NVFP4 LoRA fine-tuning and serving on a single NVIDIA DGX Spark (GB10, 128 GB UMA). Fused Triton dequant; multi-family (Nemotron-3, Mistral-Small-4, Qwen3.x).
🚀 Accelerate image generation with a ComfyUI Docker setup for NVIDIA Blackwell GPUs, improving speed and memory usage through NVFP4 support.
Empirical characterization of ICICLE NTT on consumer NVIDIA Blackwell (RTX 5070, sm_120), with a prototype for the digit-reversal bottleneck.
Sync-free MoE dispatch engine with CUDA-graph-safe routing for Qwen3.5-35B and Gemma4 on RTX Spark and RTX 5090
Edge AI inference runtime: scheduler, memory manager, CUDA graph engine, KV cache, MoE dispatch
NCU-driven autonomous kernel optimization agent: profile → identify bottleneck → propose variant → compile → benchmark
Reproducible MoE inference benchmarks for RTX Spark and RTX 5090: flash decode, grouped GEMM, end-to-end generation
Native C++/CUDA and CuTe DSL kernel library for edge MoE inference: flash decode, sync-free GroupGEMM+SwiGLU, head_dim=512 attention