
feat: update spmv-hypersparse to support wse-3 #23

Open
david-r-cox wants to merge 1 commit into Cerebras:master from integrated-reasoning:david-r-cox/hypersparse-spmv-wse3


david-r-cox commented Apr 21, 2026

Summary

Port benchmarks/spmv-hypersparse to WSE-3.

Details

  • Remapped SpMV input queues from {4, 1, 6, 7} to {2, 3, 4, 5} to avoid queue 1, which WSE-3 memcpy uses.
  • Expanded SpMV output queues from 2 to 4 so north/south/west/east TX paths use distinct output queues.
  • Added WSE-3 output queue initialization with the appropriate fabric colors.
  • Removed allreduce2R1E from the layout/kernel path, freeing queue 5 for SpMV.
  • Replaced allreduce-based sync with local timestamp-based sync.
  • Removed deprecated .fabric_color fields from fabout DSDs; WSE-3 colors now come from queue initialization.
  • Replaced commands_wse2.sh with commands_wse3.sh, which uses --arch wse3.
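The queue changes above reduce to a set-overlap rule: the queues SpMV claims must not collide with queues held by other components. Below is a minimal sketch of that check. It uses the queue IDs stated in this PR (WSE-3 memcpy on queue 1, allreduce2R1E formerly holding queue 5). The dictionaries and the `queue_conflicts` helper are hypothetical host-side illustrations and are not part of the SDK or of this PR's code.

```python
# Hypothetical check: do SpMV's input queues overlap queues reserved elsewhere?
# Queue IDs come from the PR description; the helper itself is illustrative.

OLD_SPMV_INPUT_QUEUES = {4, 1, 6, 7}  # mapping before the port
NEW_SPMV_INPUT_QUEUES = {2, 3, 4, 5}  # mapping after the port

WSE3_MEMCPY_QUEUES = {1}  # memcpy uses queue 1 on WSE-3
ALLREDUCE_QUEUES = {5}    # held by allreduce2R1E until it was removed


def queue_conflicts(spmv_queues, reserved_by):
    """Return {component: sorted overlapping queue IDs} for every overlap found."""
    conflicts = {}
    for component, queues in reserved_by.items():
        overlap = spmv_queues & queues
        if overlap:
            conflicts[component] = sorted(overlap)
    return conflicts


if __name__ == "__main__":
    # Old mapping collides with WSE-3 memcpy on queue 1.
    print(queue_conflicts(OLD_SPMV_INPUT_QUEUES, {"memcpy": WSE3_MEMCPY_QUEUES}))
    # New mapping would still collide on queue 5 if allreduce2R1E remained.
    print(queue_conflicts(NEW_SPMV_INPUT_QUEUES,
                          {"memcpy": WSE3_MEMCPY_QUEUES,
                           "allreduce2R1E": ALLREDUCE_QUEUES}))
    # With allreduce2R1E removed, the new mapping is conflict-free.
    print(queue_conflicts(NEW_SPMV_INPUT_QUEUES, {"memcpy": WSE3_MEMCPY_QUEUES}))
```

This is why remapping the input queues and removing allreduce2R1E go together: neither change alone yields a conflict-free assignment.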

Testing

Via simulation on SDK version 2.10.0.

Original WSE-2 spmv-hypersparse host logs:
$ ./commands_wse2.sh
[INFO] === Beginning compilation ===
[INFO] Compilation successful
[INFO] === Calling SDK python ===
cslc = cslc
width_west_buf = 0
width_east_buf = 0
channels = 1
width = 4, height = 4
infile_mtx = ./data/rmat4.4x4.lb.mtx
Load matrix A, 16-by-16 with 108 nonzeros
prepare the structure for spmv kernel: 0.012461423873901367s
Generating reference y = A*x ...
Total memory use per PE = 200 bytes = 0.1953125 KB
fabric_width = 11, fabric_height = 6
core_fabric_offset_x = 4, core_fabric_offset_y = 1
store ELFs and log files in the folder  out
[csl_compile_core] use pre-compile ELFs
Compilation done in 8.106231689453125e-06s
*** Load done in 3.8623809814453125e-05s
step 1: enable tsc counter to sample the clock
step 2: copy the structure of A and vector x to the device
step 3: sync all PEs to sample the reference clock
step 4: tic() records time_start
step 5: spmv
step 5: toc() records time_end
step 6: prepare (time_start, time_end)
step 7: fetch the timing time_buf_u16[6] = (time_start, time_end), type = u16
step 8: fetch the output vector y of type f32
step 9: prepare reference clock
step 10: D2H reference clock
*** Run done in 4.725285530090332s
cycles_send = 6308 cycles
time_send = 7.421176470588235 us
bandwidth = 118.57958148383005 MB/S
Comparing result with reference...
reference[16]:
[7.4496174 4.7344646 2.8157272 0.9724683 6.1381197 3.54332   3.2193651
 2.5113463 5.151715  4.040394  3.164474  2.6652527 6.0831847 6.714239
 3.694953  1.3090382]
result   [16]:
[7.4496174 4.7344646 2.8157275 0.9724683 6.1381197 3.5433197 3.219365
 2.5113463 5.151715  4.040394  3.164474  2.6652527 6.0831842 6.714239
 3.694953  1.3090382]
[[ Absolute diff: 1.1920928955078125e-06 ]]
[[ Average diff : 7.450580596923828e-08 ]]
[[ Result within tolerance 1e-08: PASS ]]
[[ Result within tolerance 1e-08: PASS ]]
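The host log lists the timing steps, including a six-word `time_buf_u16` per PE that holds `(time_start, time_end)`. Because this PR replaces the allreduce-based sync with local timestamps, the elapsed time has to be reconstructed on the host from per-PE counters. The following is a minimal sketch under stated assumptions: each timestamp is a 48-bit counter split into three 16-bit words (least significant first, start before end), and subtracting each PE's sampled reference clock puts all PEs on a common origin. `make_u48`, `split_time_buf`, and `elapsed_cycles` are hypothetical helpers, not the benchmark's actual host code.

```python
def make_u48(w0, w1, w2):
    """Join three 16-bit words (least significant first) into one 48-bit value."""
    return w0 | (w1 << 16) | (w2 << 32)


def split_time_buf(time_buf_u16):
    """Assumed layout: [start_lo, start_mid, start_hi, end_lo, end_mid, end_hi]."""
    if len(time_buf_u16) != 6:
        raise ValueError("expected six u16 words per PE")
    start = make_u48(*time_buf_u16[0:3])
    end = make_u48(*time_buf_u16[3:6])
    return start, end


def elapsed_cycles(time_bufs, reference_clocks):
    """Span from the earliest aligned start to the latest aligned end.

    Each PE's local counter is shifted by the reference value it sampled at
    the sync point, so all PEs share a common time origin (assumption).
    """
    starts, ends = [], []
    for buf, ref in zip(time_bufs, reference_clocks):
        start, end = split_time_buf(buf)
        starts.append(start - ref)
        ends.append(end - ref)
    return max(ends) - min(starts)


if __name__ == "__main__":
    # Two made-up PEs whose counters started at different times.
    bufs = [[1100, 0, 0, 7000, 0, 0],    # PE0: start=1100,  end=7000
            [4514, 1, 0, 10864, 1, 0]]   # PE1: start=70050, end=76400
    refs = [1000, 70000]
    print(elapsed_cycles(bufs, refs))  # 6400 - 50 = 6350
```

Taking the earliest start and latest end over all PEs reports the wall-clock span of the whole kernel rather than any single PE's busy time.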
Original WSE-2 spmv-hypersparse simulator logs:
$ cat sim.log
@0 GITREV=4586d3f0d8b1e12bc435ce5c7e6436ee6ae2b9b8, GITCLEAN=1
@0 Create from elfs
@0 sizeof(struct hwtile):     209888 bytes
@0 sizeof(      instr_t):       1584 bytes
@0 sizeof(     decode_t):        148 bytes
@0 sizeof(     memreq_t):         24 bytes
@0 sizeof(        dsr_t):        304 bytes
@0 sizeof(      exops_t):       6584 bytes
@0 sizeof(hwtile_follow):     207840 bytes
@0 sizeof(struct hwtile):     209888 bytes
@0 sizeof(      instr_t):       1584 bytes
@0 sizeof(     decode_t):        148 bytes
@0 sizeof(     memreq_t):         24 bytes
@0 sizeof(        dsr_t):        304 bytes
@0 sizeof(      exops_t):       6584 bytes
@0 sizeof(hwtile_follow):     207840 bytes
@0 dimX=11, dimY=6
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/generated/default.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/generated/coord.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_7_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_15_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_6_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_14_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_4_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_8_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_9_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_3_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_13_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_0_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_12_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_5_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_11_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_2_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_1_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_10_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/east/bin/out_5_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/east/bin/out_1_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/east/bin/out_2_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/east/bin/out_3_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/east/bin/out_0_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/east/bin/out_4_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/west/bin/out_6_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/west/bin/out_5_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/west/bin/out_7_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/west/bin/out_1_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/west/bin/out_4_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/west/bin/out_3_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/west/bin/out_2_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/west/bin/out_0_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/generated/MEMCPY_XY_ROUTES.elf
@0 Invocation key is: sim404D33EEF8A621135E308BD153B4D518
@0 Plan key is: plan34A3D4139F5F8BE0CD504134808A3823
@0 SimTraceCtf, flags:  wavelets=0 landing=1 inst=1 switch_positions=0 stalls=1
@0 P0.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P1.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P2.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P3.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P4.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P5.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P6.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P7.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P8.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P9.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P10.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P0.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P1.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P2.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P3.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P4.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P5.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P6.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P7.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P8.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P9.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P10.1 (nul) tile trace flag: inst_trace,landing,stalls
@0 P0.2 (nul) tile trace flag: inst_trace,landing,stalls
@0 P1.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P2.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P3.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P4.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P5.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P6.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P7.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P8.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P9.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P10.2 (nul) tile trace flag: inst_trace,landing,stalls
@0 P0.3 (nul) tile trace flag: inst_trace,landing,stalls
@0 P1.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P2.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P3.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P4.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P5.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P6.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P7.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P8.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P9.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P10.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P0.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P1.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P2.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P3.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P4.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P5.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P6.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P7.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P8.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P9.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P10.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P0.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P1.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P2.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P3.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P4.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P5.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P6.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P7.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P8.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P9.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P10.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 LW0 (nul) tile trace flag: inst_trace,landing,stalls
@0 LW1 (iotile) tile trace flag: inst_trace,landing,stalls
@0 LW2 (nul) tile trace flag: inst_trace,landing,stalls
@0 LW3 (nul) tile trace flag: inst_trace,landing,stalls
@0 LW4 (iotile) tile trace flag: inst_trace,landing,stalls
@0 LW5 (nul) tile trace flag: inst_trace,landing,stalls
@0 LE0 (nul) tile trace flag: inst_trace,landing,stalls
@0 LE1 (nul) tile trace flag: inst_trace,landing,stalls
@0 LE2 (nul) tile trace flag: inst_trace,landing,stalls
@0 LE3 (iotile) tile trace flag: inst_trace,landing,stalls
@0 LE4 (iotile) tile trace flag: inst_trace,landing,stalls
@0 LE5 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN0 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN1 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN2 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN3 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN4 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN5 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN6 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN7 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN8 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN9 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN10 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS0 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS1 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS2 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS3 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS4 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS5 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS6 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS7 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS8 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS9 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS10 (nul) tile trace flag: inst_trace,landing,stalls
@0 Architecture is: FYN
@0 Simtile is: hwtile
@0 Sim stats: config-time=0.04, nodes=1, threads=5
@0 Sim stats: tiles=100, simulated_tiles=44, hwtile=40, iotile=4, nultile=56
@24661 50 cycles since an instruction was executed, 81 cycles since a wavelet landed
@24661 No data movement in the last 50 cycles; stopping.
@24661 Simulation used 4.68 seconds, init-time=0.05, total-time=4.72, cycles=24661, cyc/sec=5274.36, tile-cyc/sec=232071.86, tile-cyc/sec-thread=46414.37
@24661 CTF Stats: num_stalls_       =      2332354
@24661 CTF Stats: num_wavelets      =        19682
@24661 CTF Stats: num_inst_dispatch =       297197
@24661 CTF Stats: num_inst_pipe     =       598005
@24661 CTF Stats: num_switch_pos    =            0
Patched WSE-3 spmv-hypersparse host logs:
$ ./commands_wse3.sh
[INFO] === Beginning compilation ===
[INFO] Compilation successful
[INFO] === Calling SDK python ===
cslc = cslc
width_west_buf = 0
width_east_buf = 0
channels = 1
width = 4, height = 4
infile_mtx = ./data/rmat4.4x4.lb.mtx
Load matrix A, 16-by-16 with 108 nonzeros
prepare the structure for spmv kernel: 0.011369705200195312s
Generating reference y = A*x ...
Total memory use per PE = 200 bytes = 0.1953125 KB
fabric_width = 11, fabric_height = 6
core_fabric_offset_x = 4, core_fabric_offset_y = 1
store ELFs and log files in the folder  out
[csl_compile_core] use pre-compile ELFs
Compilation done in 5.9604644775390625e-06s
*** Load done in 3.814697265625e-05s
step 1: enable tsc counter to sample the clock
step 2: copy the structure of A and vector x to the device
step 3: sync all PEs to sample the reference clock
step 4: tic() records time_start
step 5: spmv
step 5: toc() records time_end
step 6: prepare (time_start, time_end)
step 7: fetch the timing time_buf_u16[6] = (time_start, time_end), type = u16
step 8: fetch the output vector y of type f32
step 9: prepare reference clock
step 10: D2H reference clock
*** Run done in 4.242101430892944s
cycles_send = 5275 cycles
time_send = 6.205882352941177 us
bandwidth = 141.80094786729856 MB/S
Comparing result with reference...
reference[16]:
[7.4496174 4.7344646 2.8157272 0.9724683 6.1381197 3.54332   3.2193651
 2.5113463 5.151715  4.040394  3.164474  2.6652527 6.0831847 6.714239
 3.694953  1.3090382]
result   [16]:
[7.4496174 4.7344646 2.8157275 0.9724683 6.1381197 3.5433197 3.219365
 2.5113463 5.151715  4.040394  3.164474  2.6652527 6.0831842 6.714239
 3.694953  1.3090382]
[[ Absolute diff: 1.1920928955078125e-06 ]]
[[ Average diff : 7.450580596923828e-08 ]]
[[ Result within tolerance 1e-08: PASS ]]
[[ Result within tolerance 1e-08: PASS ]]
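The timing figures in both host logs are internally consistent. In each run, `time_send` equals `cycles_send` divided by 850, and `bandwidth` equals 880 bytes divided by `time_send` in microseconds. The 850 MHz clock and the 880-byte payload are inferred from the printed numbers, not read from the benchmark source. The following sketch reproduces the logged values and quantifies the WSE-3 improvement.

```python
# Constants inferred from the logged values (assumptions, not from the source):
CLOCK_MHZ = 850.0    # time_send[us] = cycles_send / CLOCK_MHZ
PAYLOAD_BYTES = 880  # bandwidth[MB/s] = PAYLOAD_BYTES / time_send[us]

WSE2_CYCLES = 6308  # cycles_send from the original WSE-2 run
WSE3_CYCLES = 5275  # cycles_send from the patched WSE-3 run


def time_send_us(cycles_send):
    """Convert a cycle count to microseconds at the inferred clock rate."""
    return cycles_send / CLOCK_MHZ


def bandwidth_mb_per_s(cycles_send):
    """Bytes per microsecond, numerically equal to MB/s (1 MB = 1e6 bytes)."""
    return PAYLOAD_BYTES / time_send_us(cycles_send)


if __name__ == "__main__":
    for name, cycles in (("WSE-2", WSE2_CYCLES), ("WSE-3", WSE3_CYCLES)):
        print(f"{name}: {time_send_us(cycles):.3f} us, "
              f"{bandwidth_mb_per_s(cycles):.2f} MB/s")
    reduction = (WSE2_CYCLES - WSE3_CYCLES) / WSE2_CYCLES
    print(f"cycle reduction: {reduction:.1%}, "
          f"speedup: {WSE2_CYCLES / WSE3_CYCLES:.3f}x")
```

Running it reproduces the logged 7.421/6.206 us and 118.58/141.80 MB/s. The WSE-3 run uses about 16.4% fewer cycles for the same payload.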
Patched WSE-3 spmv-hypersparse simulator logs:
$ cat sim.log
@0 GITREV=4586d3f0d8b1e12bc435ce5c7e6436ee6ae2b9b8, GITCLEAN=1
@0 Create from elfs
@0 sizeof(struct hwtile):     209888 bytes
@0 sizeof(      instr_t):       1584 bytes
@0 sizeof(     decode_t):        148 bytes
@0 sizeof(     memreq_t):         24 bytes
@0 sizeof(        dsr_t):        304 bytes
@0 sizeof(      exops_t):       6584 bytes
@0 sizeof(hwtile_follow):     207840 bytes
@0 sizeof(struct hwtile):     209888 bytes
@0 sizeof(      instr_t):       1584 bytes
@0 sizeof(     decode_t):        148 bytes
@0 sizeof(     memreq_t):         24 bytes
@0 sizeof(        dsr_t):        304 bytes
@0 sizeof(      exops_t):       6584 bytes
@0 sizeof(hwtile_follow):     207840 bytes
@0 dimX=11, dimY=6
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/generated/default.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/generated/coord.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_7_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_13_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_6_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_15_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_8_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_3_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_4_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_9_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_1_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_11_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_12_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_0_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_14_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_5_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_10_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/bin/out_2_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/east/bin/out_1_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/east/bin/out_2_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/east/bin/out_0_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/east/bin/out_3_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/east/bin/out_5_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/east/bin/out_4_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/west/bin/out_5_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/west/bin/out_6_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/west/bin/out_1_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/west/bin/out_4_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/west/bin/out_7_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/west/bin/out_0_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/west/bin/out_3_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/west/bin/out_2_0.elf
@0 Reading ELF file /tmp/sdk-examples/benchmarks/spmv-hypersparse/out/generated/MEMCPY_XY_ROUTES.elf
@0 Invocation key is: sim9888033D95A46690D08E1F53987F1198
@0 Plan key is: plan34A3D4139F5F8BE0CD504134808A3823
@0 SimTraceCtf, flags:  wavelets=0 landing=1 inst=1 switch_positions=0 stalls=1
@0 P0.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P1.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P2.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P3.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P4.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P5.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P6.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P7.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P8.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P9.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P10.0 (nul) tile trace flag: inst_trace,landing,stalls
@0 P0.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P1.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P2.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P3.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P4.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P5.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P6.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P7.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P8.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P9.1 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P10.1 (nul) tile trace flag: inst_trace,landing,stalls
@0 P0.2 (nul) tile trace flag: inst_trace,landing,stalls
@0 P1.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P2.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P3.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P4.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P5.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P6.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P7.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P8.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P9.2 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P10.2 (nul) tile trace flag: inst_trace,landing,stalls
@0 P0.3 (nul) tile trace flag: inst_trace,landing,stalls
@0 P1.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P2.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P3.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P4.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P5.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P6.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P7.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P8.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P9.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P10.3 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P0.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P1.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P2.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P3.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P4.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P5.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P6.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P7.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P8.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P9.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P10.4 (hwtile) tile trace flag: inst_trace,landing,stalls
@0 P0.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P1.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P2.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P3.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P4.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P5.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P6.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P7.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P8.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P9.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 P10.5 (nul) tile trace flag: inst_trace,landing,stalls
@0 LW0 (nul) tile trace flag: inst_trace,landing,stalls
@0 LW1 (iotile) tile trace flag: inst_trace,landing,stalls
@0 LW2 (nul) tile trace flag: inst_trace,landing,stalls
@0 LW3 (nul) tile trace flag: inst_trace,landing,stalls
@0 LW4 (iotile) tile trace flag: inst_trace,landing,stalls
@0 LW5 (nul) tile trace flag: inst_trace,landing,stalls
@0 LE0 (nul) tile trace flag: inst_trace,landing,stalls
@0 LE1 (nul) tile trace flag: inst_trace,landing,stalls
@0 LE2 (nul) tile trace flag: inst_trace,landing,stalls
@0 LE3 (iotile) tile trace flag: inst_trace,landing,stalls
@0 LE4 (iotile) tile trace flag: inst_trace,landing,stalls
@0 LE5 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN0 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN1 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN2 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN3 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN4 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN5 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN6 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN7 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN8 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN9 (nul) tile trace flag: inst_trace,landing,stalls
@0 LN10 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS0 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS1 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS2 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS3 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS4 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS5 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS6 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS7 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS8 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS9 (nul) tile trace flag: inst_trace,landing,stalls
@0 LS10 (nul) tile trace flag: inst_trace,landing,stalls
@0 Architecture is: SDR
@0 Simtile is: hwtile
@0 Sim stats: config-time=0.03, nodes=1, threads=5
@0 Sim stats: tiles=100, simulated_tiles=44, hwtile=40, iotile=4, nultile=56
@21926 50 cycles since an instruction was executed, 86 cycles since a wavelet landed
@21926 No data movement in the last 50 cycles; stopping.
@21926 Simulation used 4.13 seconds, init-time=0.11, total-time=4.24, cycles=21926, cyc/sec=5307.84, tile-cyc/sec=233545.09, tile-cyc/sec-thread=46709.02
@21926 CTF Stats: num_stalls_       =            0
@21926 CTF Stats: num_wavelets      =        19305
@21926 CTF Stats: num_inst_dispatch =       204729
@21926 CTF Stats: num_inst_pipe     =       812811
@21926 CTF Stats: num_switch_pos    =            0
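The simulator totals can be compared the same way. This sketch computes the relative change per counter between the two sim.log excerpts: about 11% fewer simulated cycles, and zero reported stalls on WSE-3. Two caveats apply. First, these are whole-simulation totals that include memcpy traffic, not only the SpMV kernel. Second, the runs report different simulated architectures (`FYN` vs `SDR`), so counters such as `num_stalls_` and `num_inst_pipe` may not be defined identically on both. Treat the comparison as indicative only. The `relative_change` helper is illustrative.

```python
# Whole-run totals copied from the two sim.log excerpts.
WSE2_SIM = {
    "cycles": 24661,
    "num_stalls_": 2332354,
    "num_wavelets": 19682,
    "num_inst_dispatch": 297197,
    "num_inst_pipe": 598005,
}
WSE3_SIM = {
    "cycles": 21926,
    "num_stalls_": 0,
    "num_wavelets": 19305,
    "num_inst_dispatch": 204729,
    "num_inst_pipe": 812811,
}


def relative_change(before, after):
    """Fractional change per counter; None where the baseline is zero."""
    return {
        key: None if before[key] == 0 else (after[key] - before[key]) / before[key]
        for key in before
    }


if __name__ == "__main__":
    for key, change in relative_change(WSE2_SIM, WSE3_SIM).items():
        shown = "n/a" if change is None else f"{change:+.1%}"
        print(f"{key:>18}: {shown}")
```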

david-r-cox (Author)

Hey @leightonw-cerebras and @mathias-cerebras, here are the changes we had to make to get SpMV hypersparse working on the WSE-3, rebased on top of the latest changes from 2.10.0.

Let me know if you have any feedback or questions. Cheers!

david-r-cox (Author)

BTW, these changes only work on WSE-3 and are not backwards compatible with WSE-2.

mathias-cerebras (Collaborator)

Thank you @david-r-cox !
