tinygrad#14546)
* PYTHONREMU: VOP3P integer operations with constants don't cast to fp16
* put that back
* cleaner
* do that once
* test asm_gemm in CI
* default float16
* use a smaller shape for multi
* smaller size
* smaller for CI
* smaller for ci
* need half
* grad_b uses custom gemm
* fix multi backward, acc is in float32
* test_gemm_batched
* square gemm
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: qazal <qazal.software@gmail.com>
* PYTHONREMU: failing test for V_SUB_NC_U32_E64 clamp
* fix saturation in PYTHON_REMU
* simpler
* more tests, less lines
---------
Co-authored-by: Christopher Milan <chrismilan@ucla.edu>
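For context on the clamp fix above: with the clamp bit set, unsigned subtraction saturates at 0 instead of wrapping around. A minimal sketch of that semantics (the function name is hypothetical, not the emulator's actual API):

```python
def v_sub_nc_u32_clamp(a: int, b: int) -> int:
    """Hypothetical sketch: 32-bit unsigned subtract with saturation (clamp bit set)."""
    diff = (a - b) & 0xFFFFFFFF  # wrapping result, as without the clamp bit
    return 0 if b > a else diff  # clamp saturates underflow to 0

print(v_sub_nc_u32_clamp(10, 4))  # normal case: 6
print(v_sub_nc_u32_clamp(3, 5))   # underflow saturates: 0
```

Without the clamp bit, `3 - 5` would wrap to `0xFFFFFFFE`, which is the kind of mismatch the failing test would have exposed.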
symlink model not allowed in latest onnxruntime
* pin onnxruntime to 1.23.2 for DSP
* list ml_dtypes instead
This reverts commit 84bb2cc.
* dtype decomps don't require bitshifts
* simplify shr/shl
* ruff
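The shr/shl simplification above rests on a standard identity: a shift by a constant is just a multiply or divide by a power of two, so dtype decompositions can be expressed without bitshift ops. A sketch of that equivalence (helper names are illustrative, not tinygrad's):

```python
def shl_as_mul(x: int, n: int, bits: int = 32) -> int:
    """Left shift expressed as multiplication by 2**n, masked to the dtype width."""
    return (x * (1 << n)) & ((1 << bits) - 1)

def shr_as_div(x: int, n: int) -> int:
    """Logical right shift expressed as floor division by 2**n (non-negative x)."""
    return x // (1 << n)

print(shl_as_mul(3, 4))   # 3 << 4 == 48
print(shr_as_div(48, 4))  # 48 >> 4 == 3
```

This matters on backends where integer shifts are unavailable or poorly supported, since mul/div by constants is universally renderable.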
onnxruntime does not allow symlinks outside the model dir. update snapshot_download to use local_dir instead of cache_dir, with an ad hoc migration step to copy the existing model too
* clean up linearize schedule [pr]: don't mix ScheduleItem and UOp in schedule queue
* ok
fixed wrong comments and simplified queue building
* rangeify always adds KernelInfo
* fix tests
* skip flaky test
revert tinygrad#14478 which breaks tinyfs
* viz: cleanup amdgpu target mapping
* linter
* unwraps
int is less flaky
* better
* bottom up earliest rewrites
* fix
* start
* x
* fix
* sdma
* c
* clean
* x
* hm
* cleaner
test that after calling .realize(), uop.is_realized is True. currently not working for empty (and thus disk tensors) and const
concat schedules. separate out the execution part
automatically fixes is_realized issue for empty
LLVM should support e.g. SHL/SHR, but this was never actually rendered
this can crash, not sure why. skip 100 to see if it's better
* assign should be used as buffer
* late removed
* the fix
* better fix
* backward slice
* assign after copy shouldn't contig
* fix assign copy
* viz: start displaying pma
* s
* work
* colors
* cleaner
* max packets
* fine
* work
* pma
* diff cleanup
* setFocus is the clearer name
* do less
reshape is lazy now, so it's better to raise from the .axis call than to have the caller handle the invalid case
remove dead assert, also make it more like a view
…ygrad#14868) test can be flaky if gc happens in between
leftovers from ops_remote
* double e2m1 values for mxfp4
* check if assert equal works in ci
* Revert "check if assert equal works in ci"
This reverts commit 8cf902c.
* remove unnecessary whitespace change
* add test case that fails for old implementation but passes for new
* add note that the previous test is bad
* clarification on the methodology for the test
* fix the indent problem that happened to skip this test
* for now update mxfp4 block test to similarly use allclose (bad)
* add gist link and clearer explanation of process for computing test data
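For background on the e2m1 values mentioned above: MXFP4 elements are 4-bit E2M1 floats (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1), giving the magnitude set {0, 0.5, 1, 1.5, 2, 3, 4, 6}. A sketch of the decoding, following the standard E2M1 value table (the function name is illustrative, not from the PR):

```python
def decode_e2m1(nibble: int) -> float:
    """Decode a 4-bit E2M1 float: 1 sign, 2 exponent (bias 1), 1 mantissa bit."""
    sign = -1.0 if (nibble >> 3) & 1 else 1.0
    exp = (nibble >> 1) & 0b11
    man = nibble & 1
    if exp == 0:
        return sign * man * 0.5                       # subnormal: 0 or ±0.5
    return sign * (1 + man * 0.5) * 2.0 ** (exp - 1)  # normal values

print([decode_e2m1(i) for i in range(8)])  # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

In the full MXFP4 format, each block of 32 such elements also shares one E8M0 scale factor, which is what the block test referenced above exercises.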
add llama2 70b lora training