Pull requests: patrick-toulme/axlearn

Pull requests list

neuron changes for 1B,3B,8B models
#45 opened Jan 2, 2025 by aws-mengchiy Collaborator
skip previous trained batches
#43 opened Dec 20, 2024 by aws-zhenguo Collaborator
skip previous trained batches
#42 opened Dec 20, 2024 by aws-zhenguo Collaborator
imported os and added ckpt scripts
#41 opened Dec 19, 2024 by dgourab-aws Collaborator
Use default remat policy
#36 opened Dec 13, 2024 by apoorvtintin Collaborator
resume training with next batch of data
#32 opened Dec 11, 2024 by aws-zhenguo Collaborator
logit_bias support for NEW_UNSHARDED_ATTN_KERNEL
#24 opened Dec 4, 2024 by HahTK Collaborator
Jit cache
#23 opened Nov 26, 2024 by amithrm Collaborator
tokens_per_batch fixed to take into account DP and micro-batch accumu…
#13 opened Oct 25, 2024 by amithrm Collaborator
Gradient accumulation with single graph and lax.scan
#6 opened Apr 11, 2024 by apoorvtintin Collaborator
Multi graph gradient accumulation
#5 opened Apr 2, 2024 by apoorvtintin Collaborator
Refactor neuron changes to make it compatible with FS repo
#4 opened Mar 27, 2024 by aws-mengchiy Collaborator
gradient accumulation using optax multisteps*
#2 opened Mar 25, 2024 by apoorvtintin Collaborator