Update blog

cmdr2 · cmdr2 · commit d3c5d1b5cc64 · 2025-08-19T16:18:35.000+05:30
diff --git a/content/blog/2024-09-04-1725463249.md b/content/blog/2024-09-04-1725463249.md
@@ -1,11 +1,14 @@
 ---
-layout: post
 title: "Post from Sep 04, 2024"
-date: 2024-09-04 15:20:49  +0000
+date: 2024-09-04T15:20:49
 slug: 1725463249
-tags: [easydiffusion, ai, lab, performance, featured]
+tags:
+  - easydiffusion
+  - ai
+  - lab
+  - performance
+  - featured
 ---
-
 **tl;dr**: Explored a possible optimization for Flux with `diffusers` when using `enable_sequential_cpu_offload()`. It did not work.
 
 While trying to use Flux (nearly 22 GB of weights) with `diffusers` on a 12 GB graphics card, I noticed that it barely used any GPU memory when using `enable_sequential_cpu_offload()`. And it was super slow. It turns out that the largest module in Flux's transformer model is around 108 MB, so because diffusers streams modules one at a time, the peak VRAM usage never went above a few hundred MB.
diff --git a/content/blog/2024-10-16-1729102225.md b/content/blog/2024-10-16-1729102225.md
@@ -1,11 +1,16 @@
 ---
-layout: post
 title: "Post from Oct 16, 2024"
-date: 2024-10-16 18:10:25  +0000
+date: 2024-10-16T18:10:25
 slug: 1729102225
-tags: [stable-diffusion, c++, cuda, easydiffusion, lab, performance, featured]
+tags:
+  - stable-diffusion
+  - c++
+  - cuda
+  - easydiffusion
+  - lab
+  - performance
+  - featured
 ---
-
 **tl;dr** - *Today, I worked on using stable-diffusion.cpp in a simple C++ program. As a linked library, as well as compiling sd.cpp from scratch (with and without CUDA). The intent was to get a tiny and fast-starting executable UI for Stable Diffusion working. Also, ChatGPT is very helpful!*
 
 ## Part 1: Using sd.cpp as a library
diff --git a/content/blog/2024-11-19-1732043895.md b/content/blog/2024-11-19-1732043895.md
@@ -1,11 +1,11 @@
 ---
-layout: post
 title: "Post from Nov 19, 2024"
-date: 2024-11-19 19:18:15  +0000
+date: 2024-11-19T19:18:15
 slug: 1732043895
-tags: [easydiffusion, stable-diffusion]
+tags:
+  - easydiffusion
+  - stable-diffusion
 ---
-
 Spent a few days getting a C++ based version of Easy Diffusion working, using stable-diffusion.cpp. I'm working with a fork of stable-diffusion.cpp [here](https://github.com/cmdr2/stable-diffusion.cpp), to add a few changes like per-step callbacks, live image previews etc.
 
 It doesn't have a UI yet, and currently hardcodes a model path. It exposes a RESTful API server (written using the `Crow` C++ library), and uses a simple task manager that runs image generation tasks on a thread. The generated images are available at an API endpoint, and it shows the binary JPEG/PNG image (instead of base64 encoding).
diff --git a/content/blog/2024-11-21-1732202276.md b/content/blog/2024-11-21-1732202276.md
@@ -1,11 +1,12 @@
 ---
-layout: post
 title: "Post from Nov 21, 2024"
-date: 2024-11-21 15:17:56  +0000
+date: 2024-11-21T15:17:56
 slug: 1732202276
-tags: [easydiffusion, stable-diffusion, c++]
+tags:
+  - easydiffusion
+  - stable-diffusion
+  - c++
 ---
-
 Spent some more time on the [v4 experiments](https://github.com/cmdr2/easy-diffusion4) for Easy Diffusion (i.e. C++ based, fast-startup, lightweight). `stable-diffusion.cpp` is missing a few features, which will be necessary for Easy Diffusion's typical workflow. I wasn't keen on forking stable-diffusion.cpp, but it's probably faster to work on [a fork](https://github.com/cmdr2/stable-diffusion.cpp) for now.
 
 For now, I've added live preview and per-step progress callbacks (based on a few pending pull-requests on sd.cpp). And protection from `GGML_ASSERT` killing the entire process. I've been looking at the ability to load individual models (like the vae) without needing to reload the entire SD model.
diff --git a/content/blog/2024-12-14-1734205658.md b/content/blog/2024-12-14-1734205658.md
@@ -1,11 +1,13 @@
 ---
-layout: post
 title: "Post from Dec 14, 2024"
-date: 2024-12-14 19:47:38  +0000
+date: 2024-12-14T19:47:38
 slug: 1734205658
-tags: [easydiffusion, ui, design, v4]
+tags:
+  - easydiffusion
+  - ui
+  - design
+  - v4
 ---
-
 Worked on a few UI design ideas for Easy Diffusion v4. I've uploaded the work-in-progress mockups at [https://github.com/easydiffusion/files](https://github.com/easydiffusion/files).
 
 So far, I've mocked out the design for the outer skeleton. That is, the new tabbed interface, the status bar, and the unified main menu. I also worked on how they would look on mobile devices.
diff --git a/content/blog/2024-12-17-1734433390.md b/content/blog/2024-12-17-1734433390.md
@@ -1,11 +1,12 @@
 ---
-layout: post
 title: "Post from Dec 17, 2024"
-date: 2024-12-17 11:03:10  +0000
+date: 2024-12-17T11:03:10
 slug: 1734433390
-tags: [easydiffusion, v4, ui]
+tags:
+  - easydiffusion
+  - v4
+  - ui
 ---
-
 Notes on two directions for ED4's UI that I'm unlikely to continue on.
 
 One is to start a desktop app with a full-screen webview (for the app UI). The other is writing the tabbed browser-like shell of ED4 in a compiled language (like Go or C++) and loading the contents of the tabs as regular webpages (by using webviews). So it would load URLs like `http://localhost:9000/ui/image_editor` and `http://localhost:9000/ui/settings` etc.
diff --git a/content/blog/2025-01-03-1735918711.md b/content/blog/2025-01-03-1735918711.md
@@ -1,11 +1,12 @@
 ---
-layout: post
 title: "Post from Jan 03, 2025"
-date: 2025-01-03 15:38:31  +0000
+date: 2025-01-03T15:38:31
 slug: 1735918711
-tags: [easydiffusion, ui, v4]
+tags:
+  - easydiffusion
+  - ui
+  - v4
 ---
-
 Spent a few days prototyping a UI for Easy Diffusion v4. Files are at [this repo](https://github.com/easydiffusion/files/blob/main/ED4-ui-design/prototype).
 
 The main focus was to get a simple but pluggable UI, that was backed by a reactive data model, and to allow splitting the codebase into individual components (with their own files). And require only a text editor and a browser to develop, i.e. no compilation or nodejs-based developer experiences.
diff --git a/content/blog/2025-01-04-1736020626.md b/content/blog/2025-01-04-1736020626.md
@@ -1,11 +1,12 @@
 ---
-layout: post
 title: "Post from Jan 04, 2025"
-date: 2025-01-04 19:57:06  +0000
+date: 2025-01-04T19:57:06
 slug: 1736020626
-tags: [easydiffusion, amd, directml]
+tags:
+  - easydiffusion
+  - amd
+  - directml
 ---
-
 Spent most of the day doing some support work for Easy Diffusion, and experimenting with [torch-directml](https://pypi.org/project/torch-directml/) for AMD support on Windows.
 
 From the initial experiments, torch-directml seems to work properly with Easy Diffusion. I ran it on my NVIDIA card, and another user ran it on their AMD Radeon RX 7700 XT.
diff --git a/content/blog/2025-01-13-1736779606.md b/content/blog/2025-01-13-1736779606.md
@@ -1,11 +1,13 @@
 ---
-layout: post
 title: "Post from Jan 13, 2025"
-date: 2025-01-13 14:46:46  +0000
+date: 2025-01-13T14:46:46
 slug: 1736779606
-tags: [easydiffusion, torchruntime, torch, ml]
+tags:
+  - easydiffusion
+  - torchruntime
+  - torch
+  - ml
 ---
-
 Spent the last few days writing [torchruntime](https://github.com/easydiffusion/torchruntime), which will automatically install the correct torch distribution based on the user's OS and graphics card. This package was written by extracting this logic out of Easy Diffusion, and refactoring it into a cleaner implementation (with tests).
 
 It can be installed (on Win/Linux/Mac) using `pip install torchruntime`.
diff --git a/content/blog/2025-01-17-1737134382.md b/content/blog/2025-01-17-1737134382.md
@@ -1,11 +1,13 @@
 ---
-layout: post
 title: "Post from Jan 17, 2025"
-date: 2025-01-17 17:19:42  +0000
+date: 2025-01-17T17:19:42
 slug: 1737134382
-tags: [rocm, pytorch, easydiffusion, torchruntime]
+tags:
+  - rocm
+  - pytorch
+  - easydiffusion
+  - torchruntime
 ---
-
 *Continued in [Part 2](https://cmdr2.github.io/notes/2025/01/1737566382/), where I figured out how to include the required libraries in the wheel.*
 
 Spent all of yesterday trying to compile `pytorch` with the compile-time `PYTORCH_ROCM_ARCH=gfx803` environment variable.
diff --git a/content/blog/2025-01-22-1737566382.md b/content/blog/2025-01-22-1737566382.md
@@ -1,11 +1,13 @@
 ---
-layout: post
 title: "Post from Jan 22, 2025"
-date: 2025-01-22 17:19:42  +0000
+date: 2025-01-22T17:19:42
 slug: 1737566382
-tags: [rocm, pytorch, easydiffusion, torchruntime]
+tags:
+  - rocm
+  - pytorch
+  - easydiffusion
+  - torchruntime
 ---
-
 *Continued from [Part 1](https://cmdr2.github.io/notes/2025/01/1737134382/).*
 
 Spent a few days figuring out how to compile binary wheels of PyTorch and include all the necessary libraries (ROCm libs or CUDA libs).
diff --git a/content/blog/2025-01-27-1738011692.md b/content/blog/2025-01-27-1738011692.md
@@ -1,11 +1,11 @@
 ---
-layout: post
 title: "Post from Jan 27, 2025"
-date: 2025-01-27 21:01:32  +0000
+date: 2025-01-27T21:01:32
 slug: 1738011692
-tags: [easydiffusion, sdkit]
+tags:
+  - easydiffusion
+  - sdkit
 ---
-
 Worked on adding support for DirectML in sdkit. This allows AMD GPUs and Integrated GPUs to generate images on Windows.
 
 DirectML seems like it's really inefficient with memory though. So for now it only manages to generate images using SD 1.5. XL and larger models fail to generate, even though my graphics card has 12 GB of VRAM.
diff --git a/content/blog/2025-01-28-1738102652.md b/content/blog/2025-01-28-1738102652.md
@@ -1,11 +1,13 @@
 ---
-layout: post
 title: "Post from Jan 28, 2025"
-date: 2025-01-28 22:17:32  +0000
+date: 2025-01-28T22:17:32
 slug: 1738102652
-tags: [easydiffusion, sdkit, freebird, worklog]
+tags:
+  - easydiffusion
+  - sdkit
+  - freebird
+  - worklog
 ---
-
 Continued to test and fix issues in sdkit, after the change to support DirectML. The change is fairly intrusive, since it removes direct references to `torch.cuda` with a layer of abstraction.
 
 Fixed a few regressions, and it now passes all the regression tests for CPU and CUDA support (i.e. existing users). Will test for DirectML next, although it will fail (with out-of-memory) for anything but the simplest tests (since DirectML is quirky with memory allocation).
diff --git a/content/blog/2025-02-10-1739186602.md b/content/blog/2025-02-10-1739186602.md
@@ -1,11 +1,12 @@
 ---
-layout: post
 title: "Post from Feb 10, 2025"
-date: 2025-02-10 11:23:22  +0000
+date: 2025-02-10T11:23:22
 slug: 1739186602
-tags: [easydiffusion, torchruntime, sdkit]
+tags:
+  - easydiffusion
+  - torchruntime
+  - sdkit
 ---
-
 Spent the last week or two getting [torchruntime](https://github.com/easydiffusion/torchruntime/) fully integrated into Easy Diffusion, and making sure that it handles all the edge-cases.
 
 Easy Diffusion now uses `torchruntime` to automatically install the best-possible version of `torch` (on the user's computer) and support a wider variety of GPUs (as well as older GPUs). And it uses a GPU-agnostic device API, so Easy Diffusion will automatically support additional GPUs when they are supported by `torchruntime`.
diff --git a/content/blog/2025-02-10-1739186837.md b/content/blog/2025-02-10-1739186837.md
@@ -1,9 +1,15 @@
 ---
-layout: post
 title: "Post from Feb 10, 2025"
-date: 2025-02-10 11:27:17  +0000
+date: 2025-02-10T11:27:17
 slug: 1739186837
-tags: [easydiffusion, sdkit, amd, torchruntime, windows, intel, integrated, directml]
+tags:
+  - easydiffusion
+  - sdkit
+  - amd
+  - torchruntime
+  - windows
+  - intel
+  - integrated
+  - directml
 ---
-
 Easy Diffusion (and `sdkit`) now also support AMD on Windows automatically (using DirectML), thanks to integrating with [torchruntime](https://github.com/easydiffusion/torchruntime/). It also supports integrated GPUs (Intel and AMD) on Windows, making Easy Diffusion faster on PCs without dedicated graphics cards.
diff --git a/content/blog/2025-03-04-1741122446.md b/content/blog/2025-03-04-1741122446.md
@@ -1,11 +1,10 @@
 ---
-layout: post
 title: "Post from Mar 04, 2025"
-date: 2025-03-04 21:07:26  +0000
+date: 2025-03-04T21:07:26
 slug: 1741122446
-tags: [easydiffusion]
+tags:
+  - easydiffusion
 ---
-
 Upgraded the default version of Easy Diffusion to Python 3.9. Newer versions of torch don't support Python 3.8, so this became urgent after the release of NVIDIA's 50xx series GPUs.
 
 I chose 3.9 as a temporary fix (instead of a newer Python version), since it had the least amount of package conflicts. The future direction of Easy Diffusion's backend is unclear right now - there are a bunch of possible paths. So I didn't want to spend too much time on this. I also wanted to minimize the risk to existing users.
diff --git a/content/blog/2025-06-17-1750136474.md b/content/blog/2025-06-17-1750136474.md
@@ -1,11 +1,11 @@
 ---
-layout: post
 title: "Post from Jun 17, 2025"
-date: 2025-06-17 05:01:14  +0000
+date: 2025-06-17T05:01:14
 slug: 1750136474
-tags: [easydiffusion, blog]
+tags:
+  - easydiffusion
+  - blog
 ---
-
 Development update for Easy Diffusion - It's chugging along in starts and stops. Broadly, there are three tracks:
 
 - Maintenance: The past few months have seen increased support for AMD, Intel and integrated GPUs. This includes AMD on Windows. Added support for the new AMD 9060/9070 cards last week, and the new NVIDIA 50xx cards in March.
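
Every hunk in this commit applies the same mechanical rewrite to a post's front matter: drop `layout: post`, turn `date: YYYY-MM-DD HH:MM:SS  +0000` into `YYYY-MM-DDTHH:MM:SS`, expand the inline `tags: [...]` list into a YAML block list, and remove the blank line after the closing `---`. The commit does not say whether this was done by hand or by a script. The following is a hypothetical, stdlib-only Python sketch (the name `convert_front_matter` is made up for illustration) that reproduces the transformation for the front matter shapes visible in these hunks.

```python
import re

# Only the exact UTC form seen in these posts is rewritten; other offsets are left alone.
DATE_RE = re.compile(r"^date:\s*(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})\s*\+0000\s*$")
# Inline (flow-style) tag list, e.g. "tags: [easydiffusion, ai]".
TAGS_RE = re.compile(r"^tags:\s*\[(.*)\]\s*$")


def convert_front_matter(text: str) -> str:
    """Hypothetical sketch: rewrite Jekyll-style front matter into the form used in this commit."""
    lines = text.split("\n")
    if not lines or lines[0] != "---":
        return text  # no front matter, leave the file untouched
    try:
        end = lines.index("---", 1)  # closing delimiter of the front matter
    except ValueError:
        return text  # unterminated front matter, leave it alone

    out = ["---"]
    for line in lines[1:end]:
        if line.strip() == "layout: post":
            continue  # the layout key is dropped entirely
        m = DATE_RE.match(line)
        if m:
            out.append(f"date: {m.group(1)}T{m.group(2)}")  # ISO-style, offset removed
            continue
        m = TAGS_RE.match(line)
        if m:
            out.append("tags:")
            out.extend(f"  - {t.strip()}" for t in m.group(1).split(",") if t.strip())
            continue
        out.append(line)  # title, slug, etc. pass through unchanged
    out.append("---")

    body = lines[end + 1:]
    if body and body[0] == "":
        body = body[1:]  # drop the blank line that separated front matter from the body
    return "\n".join(out + body)
```

Restricting the date pattern to `+0000` means a post with any other offset is left as-is rather than having its offset silently discarded, and running the function twice gives the same result as running it once.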