
# coremlmodels

Convert PyTorch language models to CoreML format.

This library provides utilities to convert HuggingFace transformer models to CoreML, with support for stateful KV caching, chunked models for large architectures, and pre-compiled model caching for faster inference.
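The stateful KV cache mentioned above can be pictured as a fixed-size buffer that each decoding step writes into in place. A minimal NumPy sketch of the idea (the `KVCache` class and its shapes are illustrative, not this library's API):

```python
import numpy as np

class KVCache:
    """Fixed-length key/value buffer updated in place, mimicking a
    stateful CoreML model. Illustrative only; not this library's API."""

    def __init__(self, context_length: int, head_dim: int):
        self.keys = np.zeros((context_length, head_dim), dtype=np.float16)
        self.values = np.zeros((context_length, head_dim), dtype=np.float16)
        self.pos = 0  # next write position

    def write(self, k: np.ndarray, v: np.ndarray) -> None:
        n = k.shape[0]
        if self.pos + n > self.keys.shape[0]:
            raise ValueError("context length exceeded; fixed at conversion time")
        self.keys[self.pos:self.pos + n] = k
        self.values[self.pos:self.pos + n] = v
        self.pos += n

cache = KVCache(context_length=2048, head_dim=128)
cache.write(np.ones((8, 128), np.float16), np.ones((8, 128), np.float16))
print(cache.pos)  # 8
```

Because the buffer is allocated at a fixed `context_length`, generation past that length fails rather than growing the cache, which is the "fixed cache length" limitation noted below.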

## Installation

```sh
uv sync
```

## Pre-converted Models

| Model | Input Length | Context Length | Link |
| --- | --- | --- | --- |
| Qwen3-1.7B | 8 | 2048 | seba/Qwen3-1.7B-CoreML-input-8-ctx-2048 |
| Qwen3-4B-Instruct-2507 | 8 | 2048 | seba/Qwen3-4B-Instruct-2507-CoreML-input-8-ctx-2048 |

## Model Conversion

Convert a model, exporting its embeddings and LM head alongside the transformer:

```sh
uv run python examples/lm_conversion_example.py \
  --model Qwen/Qwen3-1.7B \
  --output qwen3_1.7b \
  --num-chunks 2 \
  --export-embeddings \
  --export-lm-head \
  --cache-compiled
```

For large models, convert chunks individually to reduce memory usage:

```sh
uv run python examples/lm_conversion_example.py \
  --model Qwen/Qwen3-4B-Instruct-2507 \
  --output qwen3_4b_instruct_2507 \
  --num-chunks 4 \
  --chunk-index 2 \
  --skip-model-load
```
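Chunked conversion splits the decoder's layers into contiguous groups, and `--chunk-index 2` converts only the third group. A sketch of one plausible partitioning scheme (the `split_layers` helper is hypothetical; this repo's actual splitting logic may differ):

```python
def split_layers(num_layers: int, num_chunks: int) -> list[range]:
    """Partition layer indices into contiguous, near-equal chunks.
    Hypothetical helper; the library's real splitting may differ."""
    base, extra = divmod(num_layers, num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks

# Assuming, say, 36 decoder layers and --num-chunks 4, --chunk-index 2
# would convert only the third group of layers:
chunks = split_layers(36, 4)
print(chunks[2])  # range(18, 27)
```

Converting one chunk per invocation (with `--skip-model-load` where supported) keeps only a slice of the model in memory at a time.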

GLM-OCR conversion:

```sh
uv run python examples/glm_ocr_text_conversion.py --export-lm-head --export-embeddings
uv run python examples/vision_conversion_example.py
uv run python examples/glm_ocr_mtp_conversion.py
```

## Inference

Run inference with a converted model:

```sh
uv run python examples/inference.py \
  --model-dir ./qwen3_4b_instruct_2507/ \
  --model-name Qwen/Qwen3-4B-Instruct-2507 \
  --max-new-tokens 2048 \
  --chunked --num-chunks 4 \
  --cache-compiled
```
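Because the converted models use a fixed input length (8 in the pre-converted models above), a prompt of arbitrary length has to be consumed in fixed-size windows, with the last window padded. A sketch of that windowing, where `pad_to_windows` and the pad-token handling are assumptions rather than this repo's code:

```python
def pad_to_windows(tokens: list[int], seq_len: int, pad_id: int) -> list[list[int]]:
    """Split a prompt into fixed-length windows, padding the final one.
    Illustrates how a fixed-input-length model consumes a prompt;
    not this repo's actual implementation."""
    windows = []
    for start in range(0, len(tokens), seq_len):
        window = tokens[start:start + seq_len]
        window += [pad_id] * (seq_len - len(window))
        windows.append(window)
    return windows

print(pad_to_windows(list(range(13)), seq_len=8, pad_id=0))
# [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 0, 0, 0]]
```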

GLM-OCR CoreML inference:

```sh
uv run python examples/glm_ocr_coreml_inference.py \
  --image ./assets/realworld.png \
  --vision-model ./glm_ocr_vision.mlpackage \
  --text-model ./glm_ocr_text_seqlen_8.mlpackage \
  --lm-head ./glm_ocr_lm_head.mlpackage \
  --embeddings ./glm_ocr_embeddings.npy --cache-compiled --stream
```
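Note that `--embeddings` points at a plain `.npy` file rather than an `.mlpackage`: the embedding table is just a weight matrix, and lookup is row indexing done on the host. A NumPy sketch (the shapes and file name here are made up for illustration):

```python
import numpy as np

# Stand-in for an exported embedding table (vocab_size x hidden_size);
# these dimensions are illustrative, not GLM-OCR's real ones.
rng = np.random.default_rng(0)
table = rng.standard_normal((1000, 64)).astype(np.float16)
np.save("embeddings_demo.npy", table)

# Embedding lookup on the host side is just row indexing:
embeddings = np.load("embeddings_demo.npy")
token_ids = np.array([5, 42, 7])
hidden = embeddings[token_ids]
print(hidden.shape)  # (3, 64)
```

Keeping the table as a `.npy` file avoids shipping a separate CoreML model for what is a simple gather operation.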

GLM-OCR with MTP speculative decoding (~2x faster):

```sh
uv run python examples/glm_ocr_coreml_inference.py \
  --image ./assets/realworld.png \
  --vision-model ./glm_ocr_vision.mlpackage \
  --text-model ./glm_ocr_text_seqlen_8.mlpackage \
  --lm-head ./glm_ocr_lm_head.mlpackage \
  --embeddings ./glm_ocr_embeddings.npy \
  --mtp-model ./glm_ocr_mtp_seqlen_1.mlpackage \
  --num-spec-steps 3 --cache-compiled --stream
```
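The speedup comes from the MTP draft model proposing `--num-spec-steps` tokens that the main model then verifies in a single pass; several tokens can be accepted per main-model call. A generic sketch of greedy acceptance (not this repo's MTP implementation):

```python
def accept_draft(draft: list[int], verified: list[int]) -> list[int]:
    """Greedy speculative decoding acceptance: keep draft tokens up to the
    first disagreement with the main model, then take the main model's
    token at that position. Generic sketch, not this repo's code."""
    out = []
    for d, v in zip(draft, verified):
        if d == v:
            out.append(d)
        else:
            out.append(v)  # the main model's correction ends this step
            break
    return out

print(accept_draft([11, 12, 99], [11, 12, 13]))  # [11, 12, 13]
```

When the draft agrees often, each verification pass yields multiple tokens, which is where the roughly 2x throughput comes from; the output distribution is unchanged because the main model has the final say on every token.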

## Supported Architectures

- Qwen2
- Qwen3

## Limitations

- Fixed cache length: the KV cache size is set at conversion time and cannot be changed at runtime
- Fixed sequence length: the input sequence length is fixed for both prompt processing and token generation; CoreML multifunction models can address this by providing separate functions for different sequence lengths
- Model size limit (~2 GB): the Neural Engine can only load models up to roughly 2 GB, so larger models require chunked conversion
- FP16 precision: computations run in FP16, which may affect numerical accuracy for some operations
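The ~2 GB limit and FP16 weights give a quick back-of-envelope lower bound on `--num-chunks`; the sketch below counts weights only (activations and per-chunk overhead push the real requirement higher), and `min_chunks` is an illustrative helper, not part of this library:

```python
import math

def min_chunks(num_params: float, bytes_per_param: int = 2,
               ane_limit_bytes: float = 2e9) -> int:
    """Rough lower bound on --num-chunks for the ~2 GB Neural Engine
    limit. FP16 = 2 bytes/param; weights only, so treat this as a floor."""
    return math.ceil(num_params * bytes_per_param / ane_limit_bytes)

print(min_chunks(1.7e9))  # 2  (consistent with --num-chunks 2 above)
print(min_chunks(4e9))    # 4  (consistent with --num-chunks 4 above)
```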

## Documentation

For technical details, implementation guides, and development workflows, see docs/AGENTS.md.


## Development

```sh
# Run tests
uv run pytest tests/ -v

# Lint code
uv run ruff check .
```
