Pre-quantized CausalLM model weights for nntrainer CI and benchmarks.
| Directory | Model | Platform | FC | Embedding | LM head | Tied | Source | Bin Size |
|---|---|---|---|---|---|---|---|---|
| qwen3-0.6b-q40-x86 | Qwen3-0.6B | x86_64 Linux | Q4_0 | Q4_0 | Q4_0 | yes* | Qwen/Qwen3-0.6B | ~404 MB |
| qwen3-0.6b-q40-q6k-x86 | Qwen3-0.6B | x86_64 Linux | Q4_0 | Q6_K | Q6_K | yes | Qwen/Qwen3-0.6B | ~359 MB |
\* qwen3-0.6b-q40-x86 has tied word embeddings in its HuggingFace config, but it currently cannot run inference through nntrainer's tie_word_embedding layer because the tied path only accepts Q6_K or FP32 weights. Use qwen3-0.6b-q40-q6k-x86 when you need tied inference to work end-to-end (Quick.AI unit tests, nntrainer CausalLM smoke tests, etc.).
Q4_0 quantization produces platform-specific binary formats.
An x86-quantized .bin is NOT compatible with ARM, and vice versa.
The directory suffix (-x86, -arm) encodes the target architecture.
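When scripting against this repo, one way to select the matching directory is to branch on uname -m. This is only a sketch: the -arm directory name below is illustrative, since only the x86 directories above are listed today.

```bash
# Pick the model directory matching the host architecture.
# (the -arm name is illustrative; only x86 directories exist in this repo today)
case "$(uname -m)" in
  x86_64)        MODEL_DIR=qwen3-0.6b-q40-x86 ;;
  aarch64|arm64) MODEL_DIR=qwen3-0.6b-q40-arm ;;
  *) echo "unsupported architecture: $(uname -m)" >&2; exit 1 ;;
esac
echo "using ${MODEL_DIR}"
```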
Large .bin files are split into ~95 MB parts (.bin.part_aa, .bin.part_ab, ...)
to stay under GitHub's 100 MB per-file limit. Each model directory includes a
combine.sh script to reassemble and verify the full binary.
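The shipped combine.sh is authoritative; as a rough sketch of what it does, assuming the parts concatenate back in lexicographic order (which is how split(1) names them):

```bash
# Reassemble every split binary in the directory; shell glob expansion
# returns the parts in lexicographic order, matching the original byte order.
for first in *.bin.part_aa; do
  bin="${first%.part_aa}"
  cat "${bin}".part_* > "${bin}"
done

# Check the reassembled binaries against the committed checksums.
sha256sum -c SHA256SUMS
```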
For bandwidth reasons, the .bin parts are not always pre-committed. When a directory ships only metadata (combine.sh, SHA256SUMS, nntr_config.json, tokenizer files), you can rebuild the parts locally by running the matching script under scripts/, as sketched below.
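A rebuild might look like the following. The conversion script is the one listed in the recipe table further down; the model.bin name and the split invocation are assumptions that mirror the ~95 MB part naming described above.

```bash
# Regenerate the quantized weights with the matching recipe script.
./scripts/convert_qwen3_0.6b.sh

# Re-split into ~95 MB parts named .part_aa, .part_ab, ... so combine.sh
# can reassemble them ('model.bin' is a placeholder for the actual name).
split -b 95M model.bin model.bin.part_
```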
```bash
git clone --depth 1 --branch main \
    https://github.com/eunjuyang/nntrainer-causallm-models.git models

# Reassemble the weight binary
cd models/qwen3-0.6b-q40-x86
chmod +x combine.sh && ./combine.sh

# Verify integrity (optional)
sha256sum -c SHA256SUMS
```

Then run inference:

```bash
./build/Applications/CausalLM/nntr_causallm models/qwen3-0.6b-q40-x86
```

Each directory was produced by the matching conversion recipe:

| Directory | Recipe |
|---|---|
| qwen3-0.6b-q40-x86 | scripts/convert_qwen3_0.6b.sh |
| qwen3-0.6b-q40-q6k-x86 | scripts/convert_qwen3_0.6b_q6k_lmhead.sh |
The Q4_0 recipe requires a locally built nntrainer with -Denable-transformer=true. The Q6_K-lmhead recipe can use either nntrainer's nntr_quantize or Quick.AI's quick_dot_ai_quantize, both of which accept --fc_dtype, --embd_dtype, and --lmhead_dtype (see the example below).
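As a sketch, an invocation of the Q6_K-lmhead recipe's quantizer might look like the following. The dtype flags are those documented above; the binary location and the input/output arguments are assumptions, so check the tool's help output for the actual interface.

```bash
# Hypothetical invocation: the dtype flags are documented above, but the
# paths and positional arguments are placeholders, not a confirmed interface.
./nntr_quantize \
  --fc_dtype Q4_0 \
  --embd_dtype Q6_K \
  --lmhead_dtype Q6_K \
  qwen3_0.6b_fp32.bin qwen3_0.6b_q40_q6k.bin
```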
Model weights are subject to their upstream license (see respective HuggingFace model cards). CI tooling in this repository is Apache-2.0.