1224 commits
232848d
PYTHONREMU: VOP3P integer operations with constants don't cast to fp1…
sirhcm Feb 5, 2026
e8dace4
clean up UOp.vars [pr] (#14547)
chenyuxyz Feb 5, 2026
c0ca7f9
use more UOp.sum and UOp.prod [pr] (#14549)
chenyuxyz Feb 5, 2026
f9cfb64
test asm_gemm in CI (#14551)
Qazalin Feb 5, 2026
43e7eda
grad_b uses custom gemm (#14550)
geohot Feb 5, 2026
c1ea668
fa: simpler is faster (#14548)
wozeparrot Feb 5, 2026
b398335
assembly/amd: fix saturation in python remu (#14557)
geohot Feb 5, 2026
1900423
llama: faster bf16 matmul / rope backward (#14558)
Qazalin Feb 5, 2026
483bba4
nv: use prof_exec_counter (#14559)
nimlgen Feb 5, 2026
42c18da
add Ops asserts in toposort sched_sink [pr] (#14561)
chenyuxyz Feb 5, 2026
2b47a9a
skip test_xlm_roberta_large (#14563)
chenyuxyz Feb 5, 2026
b47397a
list ml_dtypes as dependency for DSP (#14562)
sirhcm Feb 5, 2026
aa9dc50
dtype decomps don't require bitshifts (#14542)
sirhcm Feb 5, 2026
41a179f
fix test_xlm_roberta_large (#14564)
chenyuxyz Feb 5, 2026
79b7799
clean up linearize schedule [pr] (#14565)
chenyuxyz Feb 5, 2026
cee7ef7
disable threads (#14555)
TheVanadium Feb 5, 2026
b7ef775
more cleanup in create_schedule [pr] (#14566)
chenyuxyz Feb 5, 2026
f73468d
fa: block skipping for fa kv bwd (#14569)
wozeparrot Feb 6, 2026
28c56a7
add CallInfo and viz call toggle (#14570)
geohot Feb 6, 2026
6cbcf98
KernelInfo is required on get_program (#14571)
geohot Feb 6, 2026
d41836f
remove KERNEL special case in realize_assign [pr] (#14573)
chenyuxyz Feb 6, 2026
b09dc64
revert some late_buffer_view change (#14578)
chenyuxyz Feb 6, 2026
50a166a
viz: cleanup amdgpu target mapping (#14579)
Qazalin Feb 6, 2026
15d3344
use int inputs in test_assign (#14580)
chenyuxyz Feb 6, 2026
be77873
llama: contig backward for wk / wv matmul backward (#14581)
Qazalin Feb 6, 2026
cf73d7e
hotfix: disable slower asm gemm shape from llama seqlen 8192 (#14582)
Qazalin Feb 6, 2026
3c26ce2
make disk tensor tests process safe (#14584)
geohot Feb 6, 2026
03af240
small changes and test fixes from kernel is call (#14586)
geohot Feb 6, 2026
7cb996e
bottom up earliest rewrites (#14587)
geohot Feb 6, 2026
fbeb978
diff devices for sdma (#14589)
nimlgen Feb 6, 2026
b7e3fbe
llama: add VIZ=-1 to dev_run (#14583)
Qazalin Feb 6, 2026
a80fb4e
viz: better ordering of device engines in profiler (#14590)
Qazalin Feb 6, 2026
fbb67a3
am_smi: fix after regen (#14594)
nimlgen Feb 6, 2026
197ebcb
log seed with flush=True in fuzz_symbolic (#14597)
chenyuxyz Feb 6, 2026
b9fe8b7
fix opt in process replay [pr] (#14599)
chenyuxyz Feb 6, 2026
7d193a6
fix wgsl bitcast (#14600)
chenyuxyz Feb 6, 2026
81f6cdb
delete realize_assign [pr] (#14575)
chenyuxyz Feb 6, 2026
7bb45e7
decompose fp8 to bigger floats [skip_process_replay] (#14554)
sirhcm Feb 7, 2026
ad9e2f0
decompose bf16 (#14601)
sirhcm Feb 7, 2026
d5652e4
new dtype aliases (#14596)
ttomsa Feb 7, 2026
462b455
cleanup linearize (#14523)
ttomsa Feb 7, 2026
d87ae1c
feat: tinyfs load test in benchmark (#14602)
wozeparrot Feb 7, 2026
ca6604e
kernel is call (#14577)
geohot Feb 7, 2026
7a2a3b5
Remove Ops.KERNEL, it's all Ops.CALL now (#14603)
geohot Feb 7, 2026
884592f
pin z3-solver version (#14605)
chenyuxyz Feb 7, 2026
6838b35
mockgpu: hevc (#14606)
nimlgen Feb 7, 2026
c2544e2
viz: remove outdated comment (#14608)
Qazalin Feb 7, 2026
ce7bfc6
nv: use nv_flags for all fields (#14607)
nimlgen Feb 7, 2026
88c3022
amd: kfd iface early exit (#14612)
nimlgen Feb 7, 2026
b7afd44
use arg instead of 3rd op for ASSIGN [pr] (#14613)
chenyuxyz Feb 7, 2026
510b654
style change rangeify assign [pr] (#14616)
chenyuxyz Feb 7, 2026
b10802e
use existing VIZ ContextVar instead of getenv (#14610)
Qazalin Feb 8, 2026
e29a88c
hive_reset respects lock (#14618)
nimlgen Feb 8, 2026
183d38b
remove CUSTOM_KERNEL / directly construct it (#14604)
geohot Feb 8, 2026
087dab4
gemm/asm: split out cdna tests from CI (#14619)
Qazalin Feb 8, 2026
c28f7d0
remove realize in Tensor.svd (#14623)
chenyuxyz Feb 8, 2026
a615b9d
am: f8_mode for gfx94x only (#14620)
nimlgen Feb 8, 2026
01a4ee4
do not hive_reset when amdgpu (#14624)
nimlgen Feb 8, 2026
1667669
fix: python3 -m tinygrad.device reporting on AMD/CPU (#14622)
PhillCli Feb 8, 2026
0e50595
new style NV/CUDA renderers (#14627)
sirhcm Feb 9, 2026
4ad787e
new style CPULLVMRenderer (#14629)
sirhcm Feb 9, 2026
5f2f2cc
Revert "new style NV/CUDA renderers (#14627)" (#14633)
sirhcm Feb 9, 2026
9eef9f3
new style python renderer (#14631)
sirhcm Feb 9, 2026
0ebb508
new style metal compiler (#14632)
sirhcm Feb 9, 2026
efac5b9
new style NV/CUDA renderers, try 2 (#14634)
sirhcm Feb 9, 2026
27f7ea4
new style DSP renderer (#14636)
sirhcm Feb 9, 2026
e087c58
print tables in llama/profile.sh (#14639)
nimlgen Feb 9, 2026
6c0c8e2
setitem push a realize to basic setitem (#14637)
chenyuxyz Feb 9, 2026
2c3e355
remove a contiguous in basic setitem (#14640)
chenyuxyz Feb 9, 2026
a49e038
dont manually broadcast in setitem (#14641)
chenyuxyz Feb 9, 2026
80b0119
llama: add new asm gemm shape (#14611)
Qazalin Feb 9, 2026
8a2c23d
raise RuntimeError for setitem dtype mismatch (#14642)
chenyuxyz Feb 9, 2026
50d3f6c
EVAL_BS=0 in llama profile (#14643)
Qazalin Feb 9, 2026
20a132b
relax atol for test_uop_scan_matmul (#14646)
chenyuxyz Feb 9, 2026
e9f40f4
explicitly check advanced setitem (#14644)
chenyuxyz Feb 9, 2026
205a121
delegate non Tensor src setitem to assign (#14647)
chenyuxyz Feb 9, 2026
0913c06
clean up setitem disk path (#14648)
chenyuxyz Feb 9, 2026
9e3f24d
assign realize fix (#14649)
chenyuxyz Feb 9, 2026
396e132
bump cache version for z3 (#14650)
sirhcm Feb 10, 2026
e6562a5
remove CompilerPair (#14638)
sirhcm Feb 10, 2026
b36b62e
don't push docker cache for PRs (#14652)
sirhcm Feb 10, 2026
0dedf40
minor test_setitem cleanup (#14654)
chenyuxyz Feb 10, 2026
6957454
fix: use correct fa implementation in eval (#14651)
wozeparrot Feb 10, 2026
83f6d28
two less realize in setitem (#14655)
chenyuxyz Feb 10, 2026
cc9bf8c
move more to null/unit tests (#14658)
geohot Feb 10, 2026
cdb7895
better cl compiler name (#14660)
sirhcm Feb 10, 2026
8dc46dd
everything has dtype.long now (#14661)
geohot Feb 10, 2026
8297492
use PARAM in schedule (#14665)
geohot Feb 10, 2026
42ded7c
amd: bind aql (#14666)
nimlgen Feb 10, 2026
494eec2
test_setitem_const_fused (#14668)
chenyuxyz Feb 10, 2026
aafa9dc
eliminate same-device copy self-assigns (#14671)
nimlgen Feb 10, 2026
ebef63d
update test_self_assign_same_device_copy (#14673)
chenyuxyz Feb 10, 2026
3fab43c
add cache to asm gemm (#14675)
geohot Feb 11, 2026
0662c80
transcendental works with long decomp (#14672)
sirhcm Feb 11, 2026
389e2ee
Revert "transcendental works with long decomp" (#14676)
sirhcm Feb 11, 2026
2d4ad9e
add a waitlist for graph rewrite (#14678)
geohot Feb 11, 2026
4565958
some lil speedups (#14679)
geohot Feb 11, 2026
a60220b
llama3: move dl to numpy & jit more (#14677)
wozeparrot Feb 11, 2026
df8b21e
add real self assign test (#14683)
nimlgen Feb 11, 2026
0d215b9
few setitem test cases diff from numpy (#14684)
chenyuxyz Feb 11, 2026
7465b22
handle setitem target in rangeify (#14685)
chenyuxyz Feb 11, 2026
cbbc2fd
update test_assign_slice_then_read (#14687)
chenyuxyz Feb 11, 2026
869083e
nv: pciiface pma (#14686)
nimlgen Feb 11, 2026
0c63f63
recursive resolve assign dependency (#14688)
chenyuxyz Feb 11, 2026
4b5d3bd
llama3: data seed (#14681)
wozeparrot Feb 12, 2026
c331798
move tests to test/backend (#14691)
geohot Feb 12, 2026
befc1e8
assembly/amd: disasm is test only (#14694)
geohot Feb 12, 2026
b1a3876
IMAGE=1 supports FLOAT16=1 (#14693)
sirhcm Feb 12, 2026
025049c
clean up sqtt / update src formatting in viz (#14696)
geohot Feb 12, 2026
14a1991
viz: sort tracks in timeline (#14591)
nimlgen Feb 12, 2026
095a064
test.yml explicitly says backend (#14700)
geohot Feb 12, 2026
d5fc3ea
assembly/amd: mypy+ruff passes (#14701)
geohot Feb 12, 2026
4680247
renderer/amd: move in tree (#14702)
geohot Feb 12, 2026
b7dade2
hotfix: skip test/amd in macpytest
geohot Feb 12, 2026
19e68a1
skip AMD on not AMD (#14703)
geohot Feb 12, 2026
b376bd7
jit: fix raw in same kernel (#14699)
nimlgen Feb 12, 2026
10c94d2
amd: print more info about device hang (#14705)
nimlgen Feb 12, 2026
557134e
model/test fix that failed with WEBGPU=1 DEBUG=2 (#14706)
chenyuxyz Feb 12, 2026
212789e
fix long_decomp with None tag (#14707)
chenyuxyz Feb 12, 2026
8551fa5
support bitcast in sym_infer (#14708)
chenyuxyz Feb 12, 2026
56caf6a
fix Estimate.from_uops for sliced access (#14695)
chenyuxyz Feb 12, 2026
8635298
update test_uops_stats for setitem (#14710)
chenyuxyz Feb 12, 2026
787998f
fix getitem tensor indexing detection (#14712)
chenyuxyz Feb 12, 2026
9b3b597
minor getitem cleanups (#14713)
chenyuxyz Feb 12, 2026
c30bb0f
fix WEBGPU isnan check (#14711)
sirhcm Feb 12, 2026
084d0d0
cleanup macos webgpu tests (#14715)
sirhcm Feb 12, 2026
d4bc5ab
autogen: download linux sources (#14714)
sirhcm Feb 12, 2026
d3adb84
Revert "hotfix: skip test/amd in macpytest" (#14704)
geohot Feb 13, 2026
9e33a08
use more pad_to and shrink_to in tensor.py (#14719)
chenyuxyz Feb 13, 2026
4088d68
remove llvm requirement from amd (#14717)
geohot Feb 13, 2026
5b624b5
viz: better error message for out of range timestamps (#14722)
Qazalin Feb 13, 2026
50cb40b
clean up test/null/test_indexing.py (#14720)
chenyuxyz Feb 13, 2026
0613c0a
hipkittens fa forward (#14692)
wozeparrot Feb 13, 2026
7993f3a
autogen: use snapshot.debian.org for linux src (#14718)
sirhcm Feb 13, 2026
08a555c
skip test_expand_buffer_before_cast on WEBGPU metal (#14724)
sirhcm Feb 13, 2026
5289b4e
renderer/amd: add cdna emulator (#14721)
geohot Feb 13, 2026
c0de4f7
improve mmapeak, print names with sqtt (#14726)
geohot Feb 13, 2026
ba67425
am: reset mi300 with pm4 (#14727)
nimlgen Feb 13, 2026
d054306
viz: wave color is locally scoped (#14728)
Qazalin Feb 13, 2026
c0fe78f
BUG: metadata is lost with partial assign (#14732)
geohot Feb 13, 2026
7d88626
nv: fix pma_bytes to be system memory (#14733)
nimlgen Feb 13, 2026
3bee663
external_test_hive_reset (#14729)
nimlgen Feb 13, 2026
8b205a0
lazy setitem for realized target (#14735)
chenyuxyz Feb 13, 2026
9f607cf
disk setitem does not need realize either (#14736)
chenyuxyz Feb 13, 2026
dca7819
more setitem into unrealized tests (#14737)
chenyuxyz Feb 14, 2026
e8bd432
move amd emulator out of tree (#14740)
geohot Feb 14, 2026
6dc7ea5
make flash attention tests run on DEV=NULL EMULATE=AMD_CDNA4 (#14742)
Qazalin Feb 14, 2026
f9d2eca
clean up amd/elf.py (#14741)
geohot Feb 14, 2026
c88bb07
hotfix: correct way to get renderer arch (#14743)
Qazalin Feb 14, 2026
9d9ef81
use zip_extract and tar_extract in torch load (#14734)
bautista-garcia Feb 14, 2026
eaa9506
disallow subnormals in emulated test_dtype (#14744)
sirhcm Feb 14, 2026
e35bd96
Revert "use zip_extract and tar_extract in torch load (#14734)" (#14745)
geohot Feb 14, 2026
e1a18da
fix devices for copies (#14747)
nimlgen Feb 14, 2026
4ab51b5
stream pma decoder (#14746)
nimlgen Feb 14, 2026
446909f
more setitem kernel tests (#14748)
chenyuxyz Feb 14, 2026
8f6772f
more setitem kernel mem tests (#14749)
chenyuxyz Feb 14, 2026
0ce4a55
clean up test_setitem_slice (#14750)
chenyuxyz Feb 14, 2026
95f4c7e
fix limit_bufs to not limit index (#14751)
chenyuxyz Feb 14, 2026
d79c63a
test_multi_step_assign_read_write_same_buffer (#14752)
chenyuxyz Feb 14, 2026
043f5db
fix write-after-read tracking (#14754)
chenyuxyz Feb 14, 2026
902dc7c
fix test_numpy_parity_and_backward_2d (#14755)
chenyuxyz Feb 14, 2026
32980c7
hotfix: skip flaky tests, looped many times on tinymac3
geohot Feb 14, 2026
ca68037
lazy basic setitem to unrealized Tensor (#14756)
chenyuxyz Feb 15, 2026
9bb6014
keep existing profile trace in viz cli (#14757)
Qazalin Feb 15, 2026
d176af6
start outerworld call test, fix gate (#14758)
geohot Feb 15, 2026
0e215c4
remove hack from cast (#14760)
geohot Feb 15, 2026
8091661
more more to mixins (#14761)
geohot Feb 15, 2026
42b6bf0
fix sdpa causal failing test on multi (#14762)
Qazalin Feb 15, 2026
9759fd6
dtype mixin (#14763)
geohot Feb 15, 2026
9da7f5e
disable process replay for AMD emulator renderer [pr] (#14766)
Qazalin Feb 15, 2026
713143a
more mixins pt 2 (#14765)
geohot Feb 15, 2026
ceccc8e
unskip now passing multi tests [pr] (#14759)
Qazalin Feb 15, 2026
352845d
update cast to uint tests (#14768)
chenyuxyz Feb 15, 2026
33b31d9
tinykittens flash attention dtype fix, add CI (#14770)
Qazalin Feb 15, 2026
26193cb
nv: prof cpu_access for nvd only (#14769)
nimlgen Feb 15, 2026
17db43a
remove some contiguous call in frontend (#14772)
chenyuxyz Feb 15, 2026
1ded250
remove collapse_nested_assign [pr] (#14775)
chenyuxyz Feb 15, 2026
9c95a11
autogen: handle rocm bump and better error wording (#14776)
sirhcm Feb 16, 2026
ac079e4
ElementwiseMixin (#14777)
geohot Feb 16, 2026
bd18217
add rdna3/rdna4/cdna4 to testamd (#14778)
geohot Feb 16, 2026
3adb506
clean up assign_to_contiguous [pr] (#14779)
chenyuxyz Feb 16, 2026
156b6cb
native bf16 cast in cdna4 (#14574)
Qazalin Feb 16, 2026
33b2ade
Rdna4 emulator test_ops, dtypes pass (#14773)
kevvz Feb 16, 2026
8e7c5f5
remove Tensor.training = True in test_arange (#14781)
Qazalin Feb 16, 2026
0abcb9a
move more to mixins (#14780)
geohot Feb 16, 2026
c2be31e
move Estimates to rewrite rules [pr] (#14782)
Qazalin Feb 16, 2026
55a4dfa
cdna4 asm_gemm tests in CI on the null backend (#14785)
Qazalin Feb 16, 2026
dff9cf3
amd asm emulator fixes + run it in CI (#14786)
geohot Feb 16, 2026
0f1ca8e
torch_load: fix shared storage slicing (#14771)
bautista-garcia Feb 16, 2026
c7a4dbf
viz: get program binary from the UOp (#14787)
Qazalin Feb 16, 2026
401095e
emulator barrier tests (#14789)
geohot Feb 16, 2026
ac62d28
viz: amdgpu arch cleanup (#14790)
Qazalin Feb 16, 2026
20b658b
fuse MULACC after MUL->SHL (#14788)
npinto Feb 16, 2026
45aebe1
hipkittens fa backward (#14723)
wozeparrot Feb 16, 2026
47d39a6
add sqtt support to the emulator (#14791)
geohot Feb 16, 2026
d213fe9
viz: integer ticks on the x axis, fix small cycle numbers (#14792)
Qazalin Feb 16, 2026
2b36708
viz: split all long labels with ... (#14794)
Qazalin Feb 16, 2026
db3db47
viz: add GB/s to SDMA (#14795)
Qazalin Feb 16, 2026
9f8afb5
viz: sdma gb/s in graph (#14798)
nimlgen Feb 16, 2026
7ddc888
am: 48bit for gfx950 (#14799)
nimlgen Feb 16, 2026
131bbbb
am: smu_v13_0_12 (#14800)
nimlgen Feb 16, 2026
e41da0c
use relative address for MOCKGPU rdna4 tracing (#14801)
kevvz Feb 16, 2026
f290af6
test_schedule always test with SPLIT_REDUCEOP=0 (#14802)
chenyuxyz Feb 16, 2026
9b44fbe
update test_assign_add_twice (#14806)
chenyuxyz Feb 16, 2026
ba39a19
viz: remove duplicate Ops.PARAM color (#14808)
ridoy Feb 17, 2026
5bca5be
test slice assign twice retains the buffer (#14807)
chenyuxyz Feb 17, 2026
bc3487d
VIZ display cleanups (#14811)
geohot Feb 17, 2026
f081f15
parameterize the CDNA asm gemm (#14813)
geohot Feb 17, 2026
275319c
IMAGE=1 2d indexing (#14809)
sirhcm Feb 17, 2026
5bd2862
late compile the cdna gemm (#14783)
geohot Feb 17, 2026
f590564
gemm multiple is only for cdna4 asm (#14814)
Qazalin Feb 17, 2026
99a988b
viz: remove ProgramSpec from trace (#14818)
Qazalin Feb 17, 2026
5fc3d81
big sink is on base (#14819)
geohot Feb 17, 2026
d24781f
viz: do not, ever, open devices (#14820)
Qazalin Feb 17, 2026
f8e485e
nvcc/nvdisasm macos shim (#14822)
Qazalin Feb 17, 2026
ff60dab
Revert "big sink is on base (#14819)" (#14825)
geohot Feb 17, 2026
58fa82e
stronger test_assign_add (#14826)
chenyuxyz Feb 17, 2026
f2f039c
fix chained full-buffer assign (#14828)
chenyuxyz Feb 17, 2026
a2586e4
nv: move reset earlier (#14824)
nimlgen Feb 17, 2026
f07898c
move assign chain fix to rangeify (#14829)
chenyuxyz Feb 17, 2026
801677c
am: GCVM_L2_PROTECTION_FAULT_STATUS prints device (#14830)
nimlgen Feb 17, 2026
dda5ccf
hcq: fix usb<->cpu mappings (#14827)
nimlgen Feb 17, 2026
9d4937a
remove assign test @unittest.skip("this test is crashing!") (#14831)
chenyuxyz Feb 17, 2026
f147791
update test to reset and test kernel_count directly (#14832)
chenyuxyz Feb 17, 2026
61867c2
TestRealizeIsRealized (#14834)
chenyuxyz Feb 17, 2026
df7c37f
one run_schedule for assign realize (#14835)
chenyuxyz Feb 17, 2026
aec8a6c
Revert "one run_schedule for assign realize (#14835)" (#14837)
chenyuxyz Feb 17, 2026
72cf603
removed if self.buffer.is_allocated() in realized (#14836)
chenyuxyz Feb 17, 2026
95e97ec
seperate llama optim (#14810)
wozeparrot Feb 17, 2026
5b11519
LLVM actually supports ops (#14843)
sirhcm Feb 17, 2026
7641ed6
remove doublecast in IMAGE=1 (#14839)
sirhcm Feb 17, 2026
e3c120c
exclude 100 in test_assign_add (#14846)
chenyuxyz Feb 18, 2026
ab55e8c
assign should be used as output buffer (#14845)
geohot Feb 18, 2026
d5636fb
assign after copy shouldn't contig (#14847)
geohot Feb 18, 2026
a3d516c
viz: start displaying pma (#14848)
Qazalin Feb 18, 2026
6d301ad
feat: llama wqkv (#14841)
wozeparrot Feb 18, 2026
af839b2
remove all the outerworld stuff, it was too complex (#14852)
geohot Feb 18, 2026
b0110c4
viz: simplify shape clicking (#14853)
Qazalin Feb 18, 2026
a212881
viz: second profiler link goes to source code (#14855)
Qazalin Feb 18, 2026
3b95fa0
am_smi: enable mem usage back (#14858)
nimlgen Feb 18, 2026
5746a60
UOp.axis raises for invalid reshape (#14863)
chenyuxyz Feb 18, 2026
0260406
simplify reshape_multi [pr] (#14864)
chenyuxyz Feb 18, 2026
b3cdb61
clean up expand_multi [pr] (#14865)
chenyuxyz Feb 18, 2026
1c8c17a
am: aca (#14861)
nimlgen Feb 18, 2026
f84a11b
delete uneven shard tests and mentions (#14867)
chenyuxyz Feb 18, 2026
f771de6
gc.collect() to get the correct GlobalCounters.mem_used in tests (#14…
chenyuxyz Feb 18, 2026
0e4cf21
remove handle_allreduce_multirank and group_id [pr] (#14869)
chenyuxyz Feb 18, 2026
4005e9d
Mxfp4 fix (#14866)
Ananta-Ranganathan Feb 18, 2026
d3a0d6c
add llama2 70b lora training
nbarbier-265 Nov 17, 2025
9e3a807
tested on macbook
nbarbier-265 Feb 19, 2026
2 changes: 1 addition & 1 deletion .github/actions/process-replay/action.yml
@@ -11,5 +11,5 @@ runs:
git fetch origin $CURRENT_SHA
export COMMIT_MESSAGE=$(git show -s --format=%B "$CURRENT_SHA")
export CURRENT_HEAD=$(git rev-parse HEAD)
cp test/external/process_replay/process_replay.py ./process_replay.py && git fetch origin master && git -c advice.detachedHead=false checkout origin/master && IGNORE_OOB=1 PYTHONPATH=. python3 process_replay.py
cp test/external/process_replay/process_replay.py ./process_replay.py && git fetch origin master && git -c advice.detachedHead=false checkout origin/master && CHECK_OOB=0 PYTHONPATH=. python3 process_replay.py
git checkout $CURRENT_HEAD # restore to branch
61 changes: 42 additions & 19 deletions .github/actions/setup-tinygrad/action.yml
@@ -56,32 +56,40 @@ runs:

# **** Caching packages ****

- name: Cache Python packages (PR)
if: github.event_name == 'pull_request'
id: restore-venv-pr
uses: actions/cache/restore@v4
with:
path: ${{ github.workspace }}/.venv
key: venv-${{ runner.os }}-python-${{ steps.setup-python.outputs.python-version }}-${{ inputs.deps }}-${{ inputs.pydeps }}-${{ env.CACHE_VERSION }}
- name: Cache Python packages
if: github.event_name != 'pull_request'
id: restore-venv
uses: actions/cache@v4
with:
path: ${{ github.workspace }}/.venv
key: venv-${{ runner.os }}-python-${{ steps.setup-python.outputs.python-version }}-${{ inputs.deps }}-${{ inputs.pydeps }}-${{ hashFiles('**/pyproject.toml') }}-${{ env.CACHE_VERSION }}
key: venv-${{ runner.os }}-python-${{ steps.setup-python.outputs.python-version }}-${{ inputs.deps }}-${{ inputs.pydeps }}-${{ env.CACHE_VERSION }}

# **** Caching downloads ****

- name: Cache downloads (Linux)
if: inputs.key != '' && runner.os == 'Linux'
uses: actions/cache@v4
- name: Cache downloads (PR)
if: inputs.key != '' && github.event_name == 'pull_request'
uses: actions/cache/restore@v4
with:
path: ~/.cache/tinygrad/downloads/
key: downloads-cache-${{ inputs.key }}-${{ env.CACHE_VERSION }}
- name: Cache downloads (macOS)
if: inputs.key != '' && runner.os == 'macOS'
path: ${{ runner.os == 'Linux' && '~/.cache/tinygrad/downloads/' || '~/Library/Caches/tinygrad/downloads/' }}
key: downloads-${{ github.job }}-${{ inputs.key }}-${{ env.CACHE_VERSION }}
- name: Cache downloads
if: inputs.key != '' && github.event_name != 'pull_request'
uses: actions/cache@v4
with:
path: ~/Library/Caches/tinygrad/downloads/
key: osx-downloads-cache-${{ inputs.key }}-${{ env.CACHE_VERSION }}
path: ${{ runner.os == 'Linux' && '~/.cache/tinygrad/downloads/' || '~/Library/Caches/tinygrad/downloads/' }}
key: downloads-${{ github.job }}-${{ inputs.key }}-${{ env.CACHE_VERSION }}
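Note on the cache steps above: each cache is now split into two steps. Pull-request runs use `actions/cache/restore@v4`, which only restores an existing entry, while all other events use `actions/cache@v4`, which restores and also saves at the end of the job. The downloads path is picked per OS with the `runner.os == 'Linux' && A || B` expression. A minimal shell sketch of that selection logic follows; the helper names `cache_action_for_event` and `downloads_path_for_os` are hypothetical and not part of the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not in the workflow) mirroring the selection logic
# of the cache steps above.

# Pull requests get the restore-only action; every other event gets the
# action that restores and saves.
cache_action_for_event() {
  if [ "$1" = "pull_request" ]; then
    echo "actions/cache/restore@v4"
  else
    echo "actions/cache@v4"
  fi
}

# Mirrors `runner.os == 'Linux' && A || B`: anything that is not Linux
# falls through to the macOS path. The tilde is printed literally here.
downloads_path_for_os() {
  if [ "$1" = "Linux" ]; then
    echo "~/.cache/tinygrad/downloads/"
  else
    echo "~/Library/Caches/tinygrad/downloads/"
  fi
}

cache_action_for_event pull_request
downloads_path_for_os macOS
```

Because the fallback branch covers every non-Linux runner, a Windows runner would also receive the macOS path under this expression; that seems acceptable only as long as the action is used on Linux and macOS.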

# **** Python deps ****

- name: Install dependencies in venv (with extra)
if: inputs.deps != '' && steps.restore-venv.outputs.cache-hit != 'true'
if: inputs.deps != '' && steps.restore-venv-pr.outputs.cache-hit != 'true' && steps.restore-venv.outputs.cache-hit != 'true'
shell: bash
run: |
python -m venv .venv
@@ -92,7 +100,7 @@ runs:
fi
python -m pip install -e ".[${{ inputs.deps }}]" ${{ inputs.pydeps }} --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/
- name: Install dependencies in venv (without extra)
if: inputs.deps == '' && steps.restore-venv.outputs.cache-hit != 'true'
if: inputs.deps == '' && steps.restore-venv-pr.outputs.cache-hit != 'true' && steps.restore-venv.outputs.cache-hit != 'true'
shell: bash
run: |
python -m venv .venv
@@ -137,7 +145,7 @@ runs:
run: |
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
sudo tee /etc/apt/sources.list.d/rocm.list <<EOF
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.2 $(lsb_release -cs) main
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/7.1 $(lsb_release -cs) main
EOF
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600

@@ -182,8 +190,14 @@ runs:
echo "pkgs=$pkgs" >> "$GITHUB_OUTPUT"
echo "hash=$(echo -n "$pkgs" | sha256sum | cut -d' ' -f1)" >> "$GITHUB_OUTPUT"
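Note on the hash output above: the apt cache key is derived from the SHA-256 of the package list, so the cache is reused only while the list stays byte-for-byte identical. A small sketch of the same computation, assuming GNU `sha256sum` is available; `pkgs_hash` is a hypothetical name and `printf '%s'` stands in for `echo -n` for portability.

```shell
#!/usr/bin/env bash
# Hypothetical helper: hash a package list the same way as the step above.
# No trailing newline is fed to sha256sum, and only the first field
# (the hex digest) is kept.
pkgs_hash() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Example package list; any change in names, order, or spacing changes the digest.
pkgs_hash "libclang-20-dev llvm-20-dev"
```

Since the digest depends on the exact string, reordering the same packages would produce a different cache key.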

- name: Cache apt (PR)
if: runner.os == 'Linux' && (inputs.opencl == 'true' || inputs.amd == 'true' || inputs.cuda == 'true' || inputs.webgpu == 'true' || inputs.llvm == 'true') && github.event_name == 'pull_request'
uses: actions/cache/restore@v4
with:
path: /var/cache/apt/archives/
key: ${{ runner.os }}-apt-${{ steps.apt-pkgs.outputs.hash }}-${{ env.CACHE_VERSION }}
- name: Cache apt
if: runner.os == 'Linux' && (inputs.opencl == 'true' || inputs.amd == 'true' || inputs.cuda == 'true' || inputs.webgpu == 'true' || inputs.llvm == 'true')
if: runner.os == 'Linux' && (inputs.opencl == 'true' || inputs.amd == 'true' || inputs.cuda == 'true' || inputs.webgpu == 'true' || inputs.llvm == 'true') && github.event_name != 'pull_request'
uses: actions/cache@v4
with:
path: /var/cache/apt/archives/
@@ -221,7 +235,7 @@ runs:
sudo mkdir -p /usr/local/lib
curl -s -H "Authorization: token $GH_TOKEN" curl -s https://api.github.com/repos/nimlgen/amdcomgr_dylib/releases/latest | \
jq -r '.assets[] | select(.name == "libamd_comgr.dylib").browser_download_url' | \
sudo xargs curl -L -o /usr/local/lib/libamd_comgr.dylib
sudo xargs curl -fL -o /usr/local/lib/libamd_comgr.dylib
cargo build --release --manifest-path ./extra/remu/Cargo.toml

# **** gpuocelot ****
@@ -239,8 +253,17 @@ runs:
ln -s /opt/homebrew/opt/boost@1.85 /opt/homebrew/opt/boost || true
ln -s /opt/homebrew/opt/boost/lib/libboost_atomic-mt.dylib /opt/homebrew/opt/boost/lib/libboost_atomic.dylib || true
ln -s /opt/homebrew/opt/boost/lib/libboost_thread-mt.dylib /opt/homebrew/opt/boost/lib/libboost_thread.dylib || true
- name: Cache gpuocelot (PR)
if: inputs.ocelot == 'true' && github.event_name == 'pull_request'
id: cache-build-pr
uses: actions/cache/restore@v4
env:
cache-name: cache-gpuocelot-build-1
with:
path: ${{ github.workspace }}/gpuocelot/ocelot
key: ${{ runner.os }}-gpuocelot-b16039dc940dc6bc4ea0a98380495769ff35ed99-rebuild-${{ env.CACHE_VERSION }}
- name: Cache gpuocelot
if: inputs.ocelot == 'true'
if: inputs.ocelot == 'true' && github.event_name != 'pull_request'
id: cache-build
uses: actions/cache@v4
env:
@@ -249,7 +272,7 @@ runs:
path: ${{ github.workspace }}/gpuocelot/ocelot
key: ${{ runner.os }}-gpuocelot-b16039dc940dc6bc4ea0a98380495769ff35ed99-rebuild-${{ env.CACHE_VERSION }}
- name: Clone/compile gpuocelot
if: inputs.ocelot == 'true' && steps.cache-build.outputs.cache-hit != 'true'
if: inputs.ocelot == 'true' && steps.cache-build-pr.outputs.cache-hit != 'true' && steps.cache-build.outputs.cache-hit != 'true'
shell: bash
run: |
git clone --recurse-submodules https://github.com/gpuocelot/gpuocelot.git ${{ github.workspace }}/gpuocelot
@@ -278,7 +301,7 @@ runs:
if: inputs.webgpu == 'true' && runner.os == 'Linux'
shell: bash
run: |
sudo curl -L https://github.com/wpmed92/pydawn/releases/download/v0.1.6/libwebgpu_dawn.so -o /usr/local/lib/libwebgpu_dawn.so
sudo curl -fL https://github.com/wpmed92/pydawn/releases/download/v0.1.6/libwebgpu_dawn.so -o /usr/local/lib/libwebgpu_dawn.so
sudo ldconfig
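Note on the `curl -L` to `curl -fL` changes in this file: with `-f` (`--fail`), curl exits non-zero on most HTTP error responses instead of writing the error page to the output file, so a missing release asset fails the step rather than installing an HTML page as a shared library. A minimal sketch of a wrapper built on that behavior; `fetch_lib` and the `CURL` override variable are hypothetical and exist only so the function can be exercised without network access.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around `curl -fL` (not part of the workflow).
# CURL can be overridden with a stub command for offline testing.
fetch_lib() {
  local url="$1" dest="$2"
  # -f: exit non-zero on HTTP errors instead of saving the error body
  # -L: follow redirects (release downloads redirect to an asset host)
  if ! "${CURL:-curl}" -fL "$url" -o "$dest"; then
    rm -f "$dest"   # do not leave a partial or bogus file behind
    echo "download failed: $url" >&2
    return 1
  fi
}
```

A call such as `fetch_lib "$url" /usr/local/lib/libwebgpu_dawn.so` would then stop the step on a failed download instead of continuing to `ldconfig`.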
- name: Install WebGPU dawn (macOS)
if: inputs.webgpu == 'true' && runner.os == 'macOS'
@@ -298,7 +321,7 @@ runs:
- name: Install mesa (linux)
if: inputs.mesa == 'true' && runner.os == 'Linux'
shell: bash
run: sudo curl -L https://github.com/sirhcm/tinymesa/releases/download/tinymesa-32dc66c/libtinymesa_cpu-mesa-25.2.4-linux-amd64.so -o /usr/lib/libtinymesa_cpu.so
run: sudo curl -fL https://github.com/sirhcm/tinymesa/releases/download/v1/libtinymesa_cpu-mesa-25.2.7-linux-amd64.so -o /usr/lib/libtinymesa_cpu.so
- name: Install mesa (macOS)
if: inputs.mesa == 'true' && runner.os == 'macOS'
shell: bash
167 changes: 65 additions & 102 deletions .github/workflows/autogen.yml
@@ -13,9 +13,13 @@ on:
pull_request:
paths:
- 'tinygrad/runtime/autogen/**/*'
- 'tinygrad/runtime/support/autogen.py'
- '.github/workflows/autogen.yml'
workflow_dispatch:
paths:
- 'tinygrad/runtime/autogen/**/*'
- 'tinygrad/runtime/support/autogen.py'
- '.github/workflows/autogen.yml'

jobs:
autogen:
@@ -36,105 +40,37 @@ jobs:
mesa: 'true'
pydeps: 'pyyaml mako'
- name: Install autogen support packages
run: sudo apt-get install -y --no-install-recommends libclang-20-dev llvm-20-dev hip-dev libusb-1.0-0-dev
- name: Verify OpenCL autogen
run: sudo apt-get install -y --no-install-recommends libclang-20-dev llvm-20-dev hip-dev libusb-1.0-0-dev libdrm-dev
- name: Regenerate autogen files
run: |
mv tinygrad/runtime/autogen/opencl.py /tmp/opencl.py.bak
find tinygrad/runtime/autogen -type f -name "*.py" -not -path "*/amd/*" -not -name "__init__.py" -not -name "comgr.py" -not -name "metal.py" -not -name "iokit.py" -not -name "corefoundation.py" -not -name "libclang.py" -delete
python3 -c "from tinygrad.runtime.autogen import opencl"
diff /tmp/opencl.py.bak tinygrad/runtime/autogen/opencl.py
- name: Verify CUDA autogen
run: |
mv tinygrad/runtime/autogen/cuda.py /tmp/cuda.py.bak
mv tinygrad/runtime/autogen/nvrtc.py /tmp/nvrtc.py.bak
mv tinygrad/runtime/autogen/nvjitlink.py /tmp/nvjitlink.py.bak
mv tinygrad/runtime/autogen/nv_570.py /tmp/nv_570.py.bak
mv tinygrad/runtime/autogen/nv.py /tmp/nv.py.bak
python3 -c "from tinygrad.runtime.autogen import cuda, nvrtc, nvjitlink, nv_570, nv"
diff /tmp/cuda.py.bak tinygrad/runtime/autogen/cuda.py
diff /tmp/nvrtc.py.bak tinygrad/runtime/autogen/nvrtc.py
diff /tmp/nvjitlink.py.bak tinygrad/runtime/autogen/nvjitlink.py
diff /tmp/nv_570.py.bak tinygrad/runtime/autogen/nv_570.py
diff /tmp/nv.py.bak tinygrad/runtime/autogen/nv.py
- name: Verify AMD autogen
run: |
mv tinygrad/runtime/autogen/comgr.py /tmp/comgr.py.bak
mv tinygrad/runtime/autogen/hsa.py /tmp/hsa.py.bak
mv tinygrad/runtime/autogen/hip.py /tmp/hip.py.bak
mv tinygrad/runtime/autogen/amd_gpu.py /tmp/amd_gpu.py.bak
mv tinygrad/runtime/autogen/sqtt.py /tmp/sqtt.py.bak
mv tinygrad/runtime/autogen/rocprof.py /tmp/rocprof.py.bak
mv tinygrad/runtime/autogen/am/am.py /tmp/am_am.py.bak
mv tinygrad/runtime/autogen/am/pm4_soc15.py /tmp/am_pm4_soc15.py.bak
mv tinygrad/runtime/autogen/am/pm4_nv.py /tmp/am_pm4_nv.py.bak
mv tinygrad/runtime/autogen/am/sdma_4_0_0.py /tmp/am_sdma_4_0_0.py.bak
mv tinygrad/runtime/autogen/am/sdma_5_0_0.py /tmp/am_sdma_5_0_0.py.bak
mv tinygrad/runtime/autogen/am/sdma_6_0_0.py /tmp/am_sdma_6_0_0.py.bak
mv tinygrad/runtime/autogen/am/smu_v13_0_0.py /tmp/am_smu_v13_0_0.py.bak
mv tinygrad/runtime/autogen/am/smu_v14_0_2.py /tmp/am_smu_v14_0_2.py.bak
python3 -c "from tinygrad.runtime.autogen import comgr, hsa, hip, amd_gpu, sqtt, rocprof; from tinygrad.runtime.autogen.am import am, pm4_soc15, pm4_nv, sdma_4_0_0, sdma_5_0_0, sdma_6_0_0, smu_v13_0_0, smu_v14_0_2"
diff /tmp/comgr.py.bak tinygrad/runtime/autogen/comgr.py
diff /tmp/hsa.py.bak tinygrad/runtime/autogen/hsa.py
diff /tmp/hip.py.bak tinygrad/runtime/autogen/hip.py
diff /tmp/amd_gpu.py.bak tinygrad/runtime/autogen/amd_gpu.py
diff /tmp/sqtt.py.bak tinygrad/runtime/autogen/sqtt.py
diff /tmp/rocprof.py.bak tinygrad/runtime/autogen/rocprof.py
diff /tmp/am_am.py.bak tinygrad/runtime/autogen/am/am.py
diff /tmp/am_pm4_soc15.py.bak tinygrad/runtime/autogen/am/pm4_soc15.py
diff /tmp/am_pm4_nv.py.bak tinygrad/runtime/autogen/am/pm4_nv.py
diff /tmp/am_sdma_4_0_0.py.bak tinygrad/runtime/autogen/am/sdma_4_0_0.py
diff /tmp/am_sdma_5_0_0.py.bak tinygrad/runtime/autogen/am/sdma_5_0_0.py
diff /tmp/am_sdma_6_0_0.py.bak tinygrad/runtime/autogen/am/sdma_6_0_0.py
diff /tmp/am_smu_v13_0_0.py.bak tinygrad/runtime/autogen/am/smu_v13_0_0.py
diff /tmp/am_smu_v14_0_2.py.bak tinygrad/runtime/autogen/am/smu_v14_0_2.py
- name: Verify Linux autogen
run: |
mv tinygrad/runtime/autogen/libc.py /tmp/libc.py.bak
mv tinygrad/runtime/autogen/kfd.py /tmp/kfd.py.bak
mv tinygrad/runtime/autogen/io_uring.py /tmp/io_uring.py.bak
mv tinygrad/runtime/autogen/ib.py /tmp/ib.py.bak
mv tinygrad/runtime/autogen/pci.py /tmp/pci.py.bak
mv tinygrad/runtime/autogen/vfio.py /tmp/vfio.py.bak
python3 -c "from tinygrad.runtime.autogen import cuda, nvrtc, nvjitlink, nv_570, nv_580, nv"
python3 -c "from tinygrad.runtime.autogen import comgr_3, hsa, hip, amd_gpu, sqtt, rocprof, amdgpu_kd, amdgpu_drm"
python3 -c "from tinygrad.runtime.autogen.am import am, pm4_soc15, pm4_nv, sdma_4_0_0, sdma_5_0_0, sdma_6_0_0, smu_v13_0_0, smu_v13_0_6, smu_v13_0_12, smu_v14_0_2"
python3 -c "from tinygrad.runtime.autogen import libc, kfd, io_uring, ib, pci, vfio"
diff /tmp/libc.py.bak tinygrad/runtime/autogen/libc.py
diff /tmp/kfd.py.bak tinygrad/runtime/autogen/kfd.py
diff /tmp/io_uring.py.bak tinygrad/runtime/autogen/io_uring.py
diff /tmp/ib.py.bak tinygrad/runtime/autogen/ib.py
diff /tmp/pci.py.bak tinygrad/runtime/autogen/pci.py
diff /tmp/vfio.py.bak tinygrad/runtime/autogen/vfio.py
- name: Verify LLVM autogen
run: |
mv tinygrad/runtime/autogen/llvm.py /tmp/llvm.py.bak
python3 -c "from tinygrad.runtime.autogen import llvm"
diff /tmp/llvm.py.bak tinygrad/runtime/autogen/llvm.py
- name: Verify WebGPU autogen
run: |
mv tinygrad/runtime/autogen/webgpu.py /tmp/webgpu.py.bak
python3 -c "from tinygrad.runtime.autogen import webgpu"
diff /tmp/webgpu.py.bak tinygrad/runtime/autogen/webgpu.py
- name: Verify Qualcomm autogen
run: |
mv tinygrad/runtime/autogen/kgsl.py /tmp/kgsl.py.bak
mv tinygrad/runtime/autogen/adreno.py /tmp/adreno.py.bak
mv tinygrad/runtime/autogen/qcom_dsp.py /tmp/qcom_dsp.py.bak
python3 -c "from tinygrad.runtime.autogen import kgsl, adreno, qcom_dsp"
diff /tmp/kgsl.py.bak tinygrad/runtime/autogen/kgsl.py
diff /tmp/adreno.py.bak tinygrad/runtime/autogen/adreno.py
diff /tmp/qcom_dsp.py.bak tinygrad/runtime/autogen/qcom_dsp.py
- name: Verify libusb autogen
run: |
mv tinygrad/runtime/autogen/libusb.py /tmp/libusb.py.bak
python3 -c "from tinygrad.runtime.autogen import kgsl, qcom_dsp"
python3 -c "from tinygrad.runtime.autogen import libusb"
diff /tmp/libusb.py.bak tinygrad/runtime/autogen/libusb.py
- name: Verify mesa autogen
run: |
mv tinygrad/runtime/autogen/mesa.py /tmp/mesa.py.bak
python3 -c "from tinygrad.runtime.autogen import mesa"
diff /tmp/mesa.py.bak tinygrad/runtime/autogen/mesa.py
- name: Verify libclang autogen
run: |
cp tinygrad/runtime/autogen/libclang.py /tmp/libclang.py.bak
python3 -c "from tinygrad.runtime.autogen import avcodec"
REGEN=1 python3 -c "from tinygrad.runtime.autogen import libclang"
diff /tmp/libclang.py.bak tinygrad/runtime/autogen/libclang.py
- name: Check for differences
run: |
if ! git diff --quiet; then
git diff
git diff > autogen-ubuntu.patch
echo "Autogen mismatch detected. Patch available at: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}#artifacts"
exit 1
fi
- name: Upload patch artifact
if: failure()
uses: actions/upload-artifact@v4
with:
name: autogen-ubuntu-patch
path: autogen-ubuntu.patch

autogen-mac:
name: In-tree Autogen (macos)
runs-on: macos-14
@@ -146,13 +82,27 @@ jobs:
uses: ./.github/actions/setup-tinygrad
with:
llvm: 'true'
- name: Verify macos autogen
- name: Regenerate autogen files
run: |
rm tinygrad/runtime/autogen/metal.py tinygrad/runtime/autogen/iokit.py tinygrad/runtime/autogen/corefoundation.py
python3 -c "from tinygrad.runtime.autogen import metal, iokit, corefoundation"
- name: Check for differences
run: |
mv tinygrad/runtime/autogen/metal.py /tmp/metal.py.bak
LIBCLANG_PATH=/opt/homebrew/opt/llvm@20/lib/libclang.dylib python3 -c "from tinygrad.runtime.autogen import metal"
diff /tmp/metal.py.bak tinygrad/runtime/autogen/metal.py
autogen-comgr-3:
name: In-tree Autogen (comgr 3)
if ! git diff --quiet; then
git diff
git diff > autogen-macos.patch
echo "Autogen mismatch detected. Patch available at: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}#artifacts"
exit 1
fi
- name: Upload patch artifact
if: failure()
uses: actions/upload-artifact@v4
with:
name: autogen-macos-patch
path: autogen-macos.patch

autogen-comgr-2:
name: In-tree Autogen (comgr 2)
runs-on: ubuntu-24.04
timeout-minutes: 15
steps:
@@ -164,13 +114,26 @@ jobs:
run: |
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
sudo tee /etc/apt/sources.list.d/rocm.list <<EOF
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.4 $(lsb_release -cs) main
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.2 $(lsb_release -cs) main
EOF
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt -qq update || true
sudo apt-get install -y --no-install-recommends libclang-20-dev comgr
- name: Verify comgr (3) autogen
- name: Regenerate autogen files
run: |
mv tinygrad/runtime/autogen/comgr_3.py /tmp/comgr_3.py.bak
python3 -c "from tinygrad.runtime.autogen import comgr_3"
diff /tmp/comgr_3.py.bak tinygrad/runtime/autogen/comgr_3.py
rm tinygrad/runtime/autogen/comgr.py
python3 -c "from tinygrad.runtime.autogen import comgr"
- name: Check for differences
run: |
if ! git diff --quiet; then
git diff
git diff > autogen-comgr2.patch
echo "Autogen mismatch detected. Patch available at: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}#artifacts"
exit 1
fi
- name: Upload patch artifact
if: failure()
uses: actions/upload-artifact@v4
with:
name: autogen-comgr2-patch
path: autogen-comgr2.patch
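
The "Check for differences" steps above all follow the same pattern: regenerate the autogen files, compare them with what is committed, and fail with a patch if anything changed. As a rough illustration only, the same logic can be sketched in Python without git. The names `build_patch` and `check` are hypothetical and are not part of tinygrad or the workflow; the sketch uses in-memory file contents rather than the working tree, and `difflib` stands in for `git diff`.

```python
import difflib

def build_patch(committed: dict[str, str], regenerated: dict[str, str]) -> str:
    """Return unified-diff text for every file whose regenerated
    content differs from the committed content (empty string if none)."""
    chunks: list[str] = []
    for name in sorted(committed.keys() | regenerated.keys()):
        old = committed.get(name, "").splitlines(keepends=True)
        new = regenerated.get(name, "").splitlines(keepends=True)
        # unified_diff yields nothing when old and new are identical
        chunks.extend(difflib.unified_diff(old, new, fromfile=f"a/{name}", tofile=f"b/{name}"))
    return "".join(chunks)

def check(committed: dict[str, str], regenerated: dict[str, str]) -> int:
    """Mimic the workflow step: print the patch and return 1 on mismatch, else return 0."""
    patch = build_patch(committed, regenerated)
    if patch:
        print(patch)
        print("Autogen mismatch detected.")
        return 1
    return 0

if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    committed = {"metal.py": "A = 1\n"}
    regenerated = {"metal.py": "A = 2\n"}
    raise SystemExit(check(committed, regenerated))
```

In the workflow itself the non-zero exit is what triggers the `if: failure()` upload step, so the patch is only published when the regenerated files drift from the committed ones.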