Update blog

cmdr2 · cmdr2 · commit 1fa7279f999a · 2025-11-19T11:14:14.000+05:30
diff --git a/content/blog/2025-11-19-1763531042.md b/content/blog/2025-11-19-1763531042.md
@@ -0,0 +1,25 @@
+---
+title: "Post from Nov 19, 2025"
+date: 2025-11-19T05:44:02
+slug: "1763531042"
+tags:
+  - sdkit
+  - ggml
+  - compiler
+---
+
+Following up on [the previous post](https://cmdr2.github.io/notes/2025/11/1762336053/) about sdkit v3's design:
+
+The initial experiments with [generating ggml from onnx models](https://cmdr2.github.io/notes/2025/11/1763464399/) were promising, and it looks like a fairly solid path forward. It produces numerically identical results, and there's a clear path to performance parity with [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) after a few basic optimizations, since both will eventually generate the same underlying ggml graph.
+
+But I think it's better to start with the simpler option, i.e. use `stable-diffusion.cpp` directly. After a bit of performance tuning, it mostly meets the [design goals for sdkit v3](https://cmdr2.github.io/notes/2025/10/1760085894/). Everything else is premature optimization and scope bloat.
+
+Here's a possible roadmap instead:
+1. **sdkit v3** - Use `stable-diffusion.cpp`, and change whatever's necessary to support Easy Diffusion's requirements. Upstream the changes where possible.
+2. **sdkit v4** - Develop [graph-compiler](https://github.com/cmdr2/graph-compiler) further to generate ggml automatically from onnx (for the required models). This will remove the need for hand-written model implementations like the ones in sd.cpp, and enable further high-level optimizations that run automatically on the graph.
+3. **sdkit v5** - Add automatic GPU kernel code-generation to `graph-compiler`, removing the need for hand-written kernels like the ones in ggml. This will enable further low-level optimizations around tiling, fusion, memory transfers, etc.
+
+The benefits are:
+1. Saves time. For example, I don't need to reimplement LoRA, ControlNet, etc. right away, or write my own GPU kernel code-generator.
+2. Keeps delivering value to users as soon as possible, instead of waiting for massive projects to finish first.
+3. Helps gain experience progressively. For example, the experience of manually optimizing the `Conv2D` operator bottleneck (and the overall graph) in `stable-diffusion.cpp` will be useful later, when building an automatic optimizer.