Commit 1fa7279

Update blog
1 parent 77fc012 commit 1fa7279

1 file changed: +25 -0 lines changed

@@ -0,0 +1,25 @@
---
title: "Post from Nov 19, 2025"
date: 2025-11-19T05:44:02
slug: "1763531042"
tags:
- sdkit
- ggml
- compiler
---

Following up on [the previous post](https://cmdr2.github.io/notes/2025/11/1762336053/) about sdkit v3's design:

The initial experiments with [generating ggml from ONNX models](https://cmdr2.github.io/notes/2025/11/1763464399/) were promising, and it looks like a fairly solid path forward. It produces numerically identical results, and there's a clear path to performance parity with [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) after a few basic optimizations (since both will eventually generate the same underlying ggml graph).

But I think it's better to use the simpler option first, i.e. use `stable-diffusion.cpp` directly. It mostly meets the [design goals for sdkit v3](https://cmdr2.github.io/notes/2025/10/1760085894/) (after a bit of performance tuning). Everything else is premature optimization and scope bloat.

Here's a possible roadmap instead:
1. **sdkit v3** - Use `stable-diffusion.cpp`, and change whatever's necessary to support Easy Diffusion's requirements. Upstream the changes where possible.
2. **sdkit v4** - Develop [graph-compiler](https://github.com/cmdr2/graph-compiler) further to generate ggml automatically from ONNX (for the required models). This will bypass the need for hand-written models like those in `stable-diffusion.cpp`, and enable further high-level optimizations that'll run automatically on the graph.
3. **sdkit v5** - Add automatic GPU kernel code generation in `graph-compiler`, bypassing the need for hand-written kernels like those in ggml. This will enable further low-level optimizations around tiling, fusion, memory transfers etc.

The benefits are:
1. Saves time. For example, I don't need to reimplement LoRA, ControlNet etc. right away, or write my own GPU kernel code generator.
2. Keeps delivering value to users ASAP. I don't need to wait for massive projects to finish before shipping anything useful.
3. Helps gain experience progressively. For example, the experience of manually optimizing the `Conv2D` operator bottleneck (and the overall graph) in `stable-diffusion.cpp` will be useful later (when building an automatic optimizer).
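
To make "optimizations that run automatically on the graph" a bit more concrete, here is a minimal Python sketch of one such pass: fusing a `Conv2D` node with the `Relu` that follows it. This is only an illustration under stated assumptions. The `Node` class, the `fuse_conv_relu` function and the `Conv2DRelu` op name are all hypothetical, and none of this is the actual API or graph format of `graph-compiler`, ggml or `stable-diffusion.cpp`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One operator in a toy compute graph (hypothetical format, for illustration only)."""
    name: str                      # also used as the name of the node's output
    op: str                        # operator type, e.g. "Conv2D"
    inputs: list[str] = field(default_factory=list)


def fuse_conv_relu(nodes: list[Node]) -> list[Node]:
    """Fuse each Conv2D whose only consumer is a Relu into one Conv2DRelu node.

    Assumes `nodes` is listed in topological order.
    """
    # Count consumers per output, so a Conv2D whose raw output is needed
    # elsewhere is left untouched.
    consumers: dict[str, int] = {}
    for node in nodes:
        for inp in node.inputs:
            consumers[inp] = consumers.get(inp, 0) + 1

    by_name = {node.name: node for node in nodes}
    fused_away: set[str] = set()   # Conv2D nodes absorbed into a fused node
    result: list[Node] = []

    for node in nodes:
        if node.op == "Relu" and len(node.inputs) == 1:
            producer = by_name.get(node.inputs[0])
            if (
                producer is not None
                and producer.op == "Conv2D"
                and consumers[producer.name] == 1
            ):
                # Reuse the Relu's name so downstream nodes need no rewiring.
                fused_away.add(producer.name)
                result.append(Node(node.name, "Conv2DRelu", list(producer.inputs)))
                continue
        result.append(node)

    return [n for n in result if n.name not in fused_away]


# Toy graph: conv1 feeds only relu1 (fusable); conv2 feeds relu2 AND skip (not fusable).
graph = [
    Node("conv1", "Conv2D", ["x", "w1"]),
    Node("relu1", "Relu", ["conv1"]),
    Node("conv2", "Conv2D", ["relu1", "w2"]),
    Node("relu2", "Relu", ["conv2"]),
    Node("skip", "Add", ["conv2", "relu1"]),
]

optimized = fuse_conv_relu(graph)
for n in optimized:
    print(n.name, n.op, n.inputs)
```

A real optimizer would need many such rewrite rules plus a cost model to decide when a rewrite actually helps, which is where the hands-on tuning experience from the earlier roadmap steps should pay off.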
