Commit 19bb630

links to floyd-warshall
1 parent a05f571 commit 19bb630

1 file changed (+2, -2 lines)

content/english/hpc/algorithms/matmul.md

Lines changed: 2 additions & 2 deletions
@@ -474,9 +474,9 @@ for (int k = 0; k < n; k++)
 d[i][j] = min(d[i][j], d[i][k] + d[k][j]);
 ```
 
-Interestingly, vectorizing the distance product and executing it $O(\log n)$ times in $O(n^3 \log n)$ total operations is faster than naively executing the Floyd-Warshall algorithm in $O(n^3)$ operations, although not by a lot.
+Interestingly, similarly vectorizing the distance product and executing it $O(\log n)$ times ([or possibly fewer](https://arxiv.org/pdf/1904.01210.pdf)) in $O(n^3 \log n)$ total operations is faster than naively executing the Floyd-Warshall algorithm in $O(n^3)$ operations, although not by a lot.
 
-As an exercise, try to speed up this "for-for-for" computation. It is harder to do than in the matrix multiplication case because now there is a logical dependency between the iterations, and you need to perform updates in a particular order, but it is still possible to design a similar kernel and a block iteration order that achieves a 30-50x total speedup.
+As an exercise, try to speed up this "for-for-for" computation. It is harder to do than in the matrix multiplication case because now there is a logical dependency between the iterations, and you need to perform updates in a particular order, but it is still possible to design [a similar kernel and a block iteration order](https://github.com/sslotin/amh-code/blob/main/floyd/blocked.cc) that achieves a 30-50x total speedup.
 
 ## Acknowledgements

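The changed paragraph compares two ways of computing all-pairs shortest paths. One repeatedly squares the distance matrix under the distance (min-plus) product. The other is the Floyd-Warshall loop from the hunk context. A minimal scalar C++ sketch of both follows. It assumes a dense adjacency matrix with non-negative weights, zeros on the diagonal, and a hypothetical `INF` sentinel for missing edges. The names `Matrix`, `INF`, `distance_product`, `apsp_by_squaring`, and `floyd_warshall` are illustrative. The sketch is not vectorized and is not the kernel from the linked `blocked.cc`; it only shows the loop structure that the optimized versions start from.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Dense n x n weight matrix; INF (hypothetical sentinel) marks a missing edge.
// INF is a quarter of the int range so that INF + INF cannot overflow.
using Matrix = std::vector<std::vector<int>>;
constexpr int INF = std::numeric_limits<int>::max() / 4;

// Distance (min-plus) product: c[i][j] = min over k of a[i][k] + b[k][j].
// Same loops as matrix multiplication with (+, *) replaced by (min, +).
Matrix distance_product(const Matrix &a, const Matrix &b) {
    const int n = static_cast<int>(a.size());
    assert(b.size() == a.size());
    Matrix c(n, std::vector<int>(n, INF));
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                c[i][j] = std::min(c[i][j], a[i][k] + b[k][j]);
    return c;
}

// All-pairs shortest paths by repeated squaring: after t squarings, d covers
// every path of up to 2^t edges, so O(log n) products suffice (O(n^3 log n) total).
// Assumes non-negative weights and d[i][i] == 0.
Matrix apsp_by_squaring(Matrix d) {
    const int n = static_cast<int>(d.size());
    for (int len = 1; len < n - 1; len *= 2)
        d = distance_product(d, d);
    return d;
}

// Floyd-Warshall: O(n^3), but iteration k must finish before k + 1 starts,
// because it reads entries that earlier iterations of k have already updated.
void floyd_warshall(Matrix &d) {
    const int n = static_cast<int>(d.size());
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                d[i][j] = std::min(d[i][j], d[i][k] + d[k][j]);
}
```

With non-negative weights, a shortest path has at most $n - 1$ edges. Squaring $\lceil \log_2 (n - 1) \rceil$ times is therefore enough, which is where the $O(\log n)$ factor in the paragraph comes from. Unlike Floyd-Warshall, each distance product has no dependency between its own iterations.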