add llama2 70b lora training#3

Open
nbarbier-265 wants to merge 1224 commits into master from llama2-70b-lora

Conversation

@nbarbier-265

add llama2 70b lora training

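The PR title describes LoRA training for Llama-2 70B. As a rough illustration of the LoRA idea itself (a low-rank adapter on top of a frozen linear layer — not this branch's implementation, and all names below are hypothetical), a minimal NumPy sketch:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """Linear layer with a LoRA adapter.

    W: frozen (out, in) weight. A: (r, in) and B: (out, r) are the
    trainable low-rank factors; only they receive gradients in training.
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

# toy shapes: batch 2, in 8, out 4, rank 2
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))
W = rng.normal(size=(4, 8))
A = rng.normal(size=(2, 8)) * 0.01  # A starts near zero
B = np.zeros((4, 2))                # B starts at zero, so the adapter is a no-op
y = lora_forward(x, W, A, B)
assert np.allclose(y, x @ W.T)      # with B = 0 the output matches the base layer
```

Initializing B to zero is the standard LoRA trick: the adapted model starts out exactly equal to the base model, and training only perturbs it through the low-rank path.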
sirhcm and others added 30 commits February 4, 2026 20:10
(tinygrad#14546)

* PYTHONREMU: VOP3P integer operations with constants don't cast to fp16

* put that back

* cleaner

* do that once
* test asm_gemm in CI

* default float16

* use a smaller shape for multi

* smaller size

* smaller for CI

* smaller for ci

* need half
* grad_b uses custom gemm

* fix multi backward, acc is in float32

* test_gemm_batched

* square gemm

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: qazal <qazal.software@gmail.com>
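The gemm commits above note that "acc is in float32". As a hedged sketch of why a float16 gemm accumulates in a wider type (a generic illustration, not this repo's custom gemm):

```python
import numpy as np

def gemm_fp16_acc32(a, b):
    """Multiply two float16 matrices, accumulating partial products in
    float32, then cast the result back down to float16."""
    return (a.astype(np.float32) @ b.astype(np.float32)).astype(np.float16)

# summing thousands of small fp16 products needs a wider accumulator
a = np.full((1, 4096), 0.01, dtype=np.float16)
b = np.full((4096, 1), 0.01, dtype=np.float16)
wide = gemm_fp16_acc32(a, b)[0, 0]
ref = float(a.astype(np.float64) @ b.astype(np.float64))  # float64 reference
assert abs(float(wide) - ref) / ref < 0.01  # close to the wide-precision answer
```

Each partial product here is about 1e-4, far below float16's precision once the running sum grows, so a pure-fp16 accumulator would drift; accumulating in float32 keeps the reduction accurate before the final downcast.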
* PYTHONREMU: failing test for V_SUB_NC_U32_E64 clamp

* fix saturation in PYTHON_REMU

* simpler

* more tests, less lines

---------

Co-authored-by: Christopher Milan <chrismilan@ucla.edu>
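The V_SUB_NC_U32 commits above fix saturation in the Python emulator. Assuming the usual meaning of the clamp bit on an unsigned integer subtract (saturate instead of wrap on underflow), a minimal model:

```python
def v_sub_nc_u32(a, b, clamp=False):
    """Toy model of a 32-bit unsigned subtract: wraps modulo 2**32 by
    default, saturates to 0 on underflow when clamp is set."""
    if clamp:
        return max(a - b, 0)       # unsigned underflow clamps to 0
    return (a - b) & 0xFFFFFFFF    # otherwise wrap modulo 2**32

assert v_sub_nc_u32(5, 7) == 2**32 - 2     # wrapping behavior
assert v_sub_nc_u32(5, 7, clamp=True) == 0 # saturating behavior
assert v_sub_nc_u32(7, 5, clamp=True) == 2 # no underflow: same either way
```

A failing test plus a fix like this is exactly the shape of the commit pair above: the emulator previously wrapped where the hardware would have clamped.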
symlinked model not allowed in latest onnxruntime
* pin onnxruntime to 1.23.2 for DSP

* list ml_dtypes instead

This reverts commit 84bb2cc.
* dtype decomps don't require bitshifts

* simplify shr/shl

* ruff
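The "dtype decomps don't require bitshifts" commit rests on a standard identity: for non-negative integers, shifts are just multiplication and floor division by powers of two. A quick check of that identity (illustrative only, not the decomposition code in this branch):

```python
def shl(x, n):
    return x * (2 ** n)   # x << n for non-negative x

def shr(x, n):
    return x // (2 ** n)  # x >> n for non-negative x

# exhaustively verify against the built-in shift operators for small values
for x in range(256):
    for n in range(8):
        assert shl(x, n) == x << n
        assert shr(x, n) == x >> n
```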
onnxruntime does not allow a symlink that points outside the model dir. Update snapshot_download to use local_dir instead of cache_dir, plus an ad hoc migration step to copy the existing model too.
* clean up linearize schedule [pr]

don't mix ScheduleItem and UOp in schedule queue

* ok
fixed wrong comments and simplified queue building
* rangeify always adds KernelInfo

* fix tests

* skip flaky test
* viz: cleanup amdgpu target mapping

* linter

* unwraps
* better

* bottom up earliest rewrites

* fix
* start

* x

* fix

* sdma

* c

* clean

* x

* hm

* clearer
chenyuxyz and others added 26 commits February 17, 2026 10:30
test that after calling .realize(), uop.is_realized is True. Currently not working for empty (and thus disk tensors) and const
concat schedules. separate out the execution part
automatically fixes is_realized issue for empty
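The three commits above tie is_realized to whether a backing buffer actually exists after the schedule runs. A hypothetical sketch of that flag's semantics (invented class, not tinygrad's actual LazyBuffer/UOp types):

```python
class LazyTensor:
    """Hypothetical lazy tensor: computation is deferred until realize()."""

    def __init__(self):
        self.buffer = None  # no device memory until realized

    @property
    def is_realized(self):
        # realized means a backing buffer has been allocated and filled
        return self.buffer is not None

    def realize(self):
        if self.buffer is None:
            self.buffer = bytearray(16)  # stand-in for executing the schedule
        return self

t = LazyTensor()
assert not t.is_realized   # nothing has run yet
t.realize()
assert t.is_realized       # after realize(), the flag must flip
```

Deriving the flag from the buffer's existence, rather than setting it separately, is what makes cases like empty tensors fall out "automatically" as the commit describes.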
LLVM should support e.g. SHL/SHR, but these were never actually rendered
this can crash, not sure why. skip 100 to see if it's better
* assign should be used as buffer

* late removed

* the fix

* better fix

* backward slice
* assign after copy shouldn't contig

* fix assign copy
* viz: start displaying pma

* s

* work

* colors

* cleaner

* max packets

* fine

* work

* pma

* diff cleanup
* setFocus is the clearer name

* do less
reshape is lazy now, so it's better to raise from the .axis call and not have the caller handle the invalid case
remove dead assert, also make it more like a view
* double e2m1 values for mxfp4

* check if assert equal works in ci

* Revert "check if assert equal works in ci"

This reverts commit 8cf902c.

* remove unnecessary whitespace change

* add test case that fails for old implementation but passes for new

* add note that the previous test is bad

* clarification on the methodology for the test

* fix the indent problem that happened to skip this test

* for now update mxfp4 block test to similarly use allclose (bad)

* add gist link and clearer explanation of process for computing test data
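The mxfp4 commits above revolve around the 4-bit e2m1 values. Per the OCP microscaling FP4 layout (1 sign, 2 exponent, 1 mantissa bit, exponent bias 1), the eight magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4, 6. A generic decoder sketch (not this repo's implementation):

```python
def decode_e2m1(code):
    """Decode a 4-bit e2m1 (FP4) code point to its float value."""
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11
    mant = code & 0b1
    if exp == 0:
        mag = mant * 0.5                           # subnormal: 0 or 0.5
    else:
        mag = (1.0 + 0.5 * mant) * 2.0 ** (exp - 1)  # normal, bias 1
    return sign * mag

assert [decode_e2m1(c) for c in range(8)] == [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

Two e2m1 codes pack into each byte (the "double e2m1 values" in the first commit), so unpacking splits every byte into a low and a high nibble before decoding.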