mlx-community/GLM-4.7-Flash-bf16 won't load #16
Description
Starting krasis 0.1.64 with the model mlx-community/GLM-4.7-Flash-bf16 fails with a KeyError while loading the GPU weights.
Running under WSL2 on an AMD 5900X with an Nvidia 4060 Ti 16GB and 64GB of system RAM.
Qwen3_Coder-next and Qwen3.5-35B-A3B worked fine.
Error log:
▸ Loading GPU weights
2026-03-27 12:51:20,914 krasis.model INFO Phase 1: Loading GPU weights (streaming INT8)...
2026-03-27 12:51:20,914 krasis.model INFO Resident attention: all 47 layers permanently on GPU0, 1 GPUs for EP
2026-03-27 12:51:20,914 krasis.model INFO Loading full base model to cuda:0...
2026-03-27 12:51:20,914 krasis.weight_loader INFO Loading embedding: model.embed_tokens.weight
2026-03-27 12:51:21,067 krasis.server CRITICAL Uncaught exception
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/xxx/.krasis/venv/lib/python3.10/site-packages/krasis/server.py", line 1856, in <module>
    main()
  File "/home/xxx/.krasis/venv/lib/python3.10/site-packages/krasis/server.py", line 943, in main
    _model.load(gpu_only=gpu_only)
  File "/home/xxx/.krasis/venv/lib/python3.10/site-packages/krasis/model.py", line 635, in load
    self._load_gpu_weights(loader)
  File "/home/xxx/.krasis/venv/lib/python3.10/site-packages/krasis/model.py", line 1063, in _load_gpu_weights
    weights = loader.load_layer(layer_idx, primary_dev, attn_device=_attn_dev)
  File "/home/xxx/.krasis/venv/lib/python3.10/site-packages/krasis/weight_loader.py", line 492, in load_layer
    result["attention"] = self.load_attention_weights(
  File "/home/xxx/.krasis/venv/lib/python3.10/site-packages/krasis/weight_loader.py", line 202, in load_attention_weights
    return self._load_mla_attention(layer_idx, device)
  File "/home/xxx/.krasis/venv/lib/python3.10/site-packages/krasis/weight_loader.py", line 233, in _load_mla_attention
    kv_b = self._load_bf16(f"{prefix}.kv_b_proj.weight", device)
  File "/home/xxx/.krasis/venv/lib/python3.10/site-packages/krasis/weight_loader.py", line 149, in _load_bf16
    w = self._read_tensor(name).to(torch.bfloat16)
  File "/home/xxx/.krasis/venv/lib/python3.10/site-packages/krasis/weight_loader.py", line 131, in _read_tensor
    shard_name = self._weight_map[name]
KeyError: 'model.layers.0.self_attn.kv_b_proj.weight'
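The KeyError means the loader's weight map has no entry named model.layers.0.self_attn.kv_b_proj.weight. To narrow this down, it may help to list which attention tensors the checkpoint actually contains for layer 0. Below is a minimal sketch, assuming the model directory holds a Hugging Face style model.safetensors.index.json with a "weight_map" entry and that krasis builds its weight map from that file (neither is confirmed by the log). The helper names are hypothetical and not part of krasis.

```python
import json
from pathlib import Path


def load_weight_map(model_dir):
    """Read the tensor-name -> shard-file mapping from the sharded index.

    Assumption: the checkpoint ships a model.safetensors.index.json file
    with a top-level "weight_map" dictionary.
    """
    index_path = Path(model_dir) / "model.safetensors.index.json"
    with index_path.open() as f:
        return json.load(f)["weight_map"]


def attention_keys(weight_map, layer_idx):
    """Return the sorted tensor names under model.layers.<idx>.self_attn."""
    prefix = f"model.layers.{layer_idx}.self_attn."
    return sorted(name for name in weight_map if name.startswith(prefix))


def missing_keys(weight_map, expected):
    """Return the expected tensor names that are absent from the weight map."""
    return [name for name in expected if name not in weight_map]
```

Usage: call load_weight_map() with the local path of the downloaded model, then print attention_keys(weight_map, 0) and compare the result with the name from the traceback. If kv_b_proj is absent but other attention tensors are listed, the checkpoint probably stores that projection under a different name or layout than the loader expects.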