Commit 627d456

Update blog
1 parent 78a1cf4 commit 627d456

1 file changed: +1 -1 lines changed

content/blog/2025-10-27-1761560082.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ This analogy can help understand the scale and performance penalty for data tran

 For e.g. reading constantly from the Global Memory is like driving between the factory and the warehouse outside the city each time (with the traffic of city roads). This is much slower than going to the shed inside the factory (i.e. Shared Memory), and much much slower than just sticking your hand into the tray next to your stamping machine (i.e. Registers). And reading from the Host Memory (CPU) is like taking an overnight trip to another city.

-A bit of detail: the warehouse of each factory (i.e. Shared Memory) isn't a single building, but is actually a collection of individual sheds called 'banks' (e.g. 32 sheds/banks). Each shed has its own entry, and an employee stands at the entrance of the shed to service your request.
+A bit of detail: the warehouse of each factory (i.e. Shared Memory) isn't a single building, but is actually a collection of individual sheds called 'banks' (e.g. 32 sheds/banks). Each shed has its own entry, and an employee stands at the entrance of each shed to service your request.

 Therefore the job of running a computation graph (like ONNX) efficiently on GPU(s) is like planning the logistics of a manufacturing company. You've got raw materials in the main warehouse that you need to transfer between cities, and store/process/transfer artifacts across different factories and machines. You need to make sure that:
 - the production process follows the chart laid out in the computation graph
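
The changed line describes Shared Memory as a set of sheds (banks), each with one employee at the door. A rough way to see why that matters is to count how many different requests queue up at the busiest shed. The sketch below is only a toy model, not part of the blog post or of any GPU API: it assumes 32 banks (the post's example figure), assumes consecutive 4-byte words sit in consecutive banks, and uses hypothetical helper names `bank_of` and `worst_case_queue`.

```python
from collections import Counter

NUM_BANKS = 32   # assumption: 32 sheds/banks, as in the post's example
WORD_BYTES = 4   # assumption: consecutive 4-byte words map to consecutive banks


def bank_of(byte_address: int) -> int:
    """Return the shed (bank) holding the word at this byte address."""
    return (byte_address // WORD_BYTES) % NUM_BANKS


def worst_case_queue(stride_words: int, num_threads: int = 32) -> int:
    """Longest queue at any one shed when thread t reads word t * stride_words.

    Identical addresses are counted once, modelling a single trip to the
    shed that serves everyone asking for the same item.
    """
    addresses = {t * stride_words * WORD_BYTES for t in range(num_threads)}
    per_bank = Counter(bank_of(a) for a in addresses)
    return max(per_bank.values())


if __name__ == "__main__":
    for stride in (1, 2, 32, 33):
        print(f"stride {stride:>2}: busiest shed serves {worst_case_queue(stride)} request(s)")
```

Under these assumptions, a stride of 1 spreads 32 workers over 32 sheds, while a stride of 32 sends every worker to the same shed, so the single employee there has to serve them one after another. A stride of 33 spreads them out again, which is why padding a row by one word is a common trick.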
