Commit 61d0b82

Update MLX model patterns and reduce max_tokens in eval script
Added '-mlx-' to the list of MLX model patterns in should_use_mlx for broader matching. Reduced max_tokens from 32768 to 8192 in get_llm_response within eval_math500_benchmark.py to limit token usage.
1 parent e004b2e commit 61d0b82

2 files changed (+3, -2 lines)

optillm/inference.py

Lines changed: 2 additions & 1 deletion

@@ -189,7 +189,8 @@ def should_use_mlx(model_id: str) -> bool:
     # Models that should use MLX
     mlx_patterns = [
         "mlx-community/",
-        "mlx-"
+        "mlx-",
+        "-mlx-"
     ]

     # Known problematic models that should prefer MLX on Apple Silicon
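Since the hunk shows only the pattern list, here is a minimal sketch of how such a list is plausibly consumed. This is illustrative, not the verbatim optillm code: the function-body details below (the lowercasing and the substring test) are assumptions.

    # Hedged sketch, not the actual optillm implementation.
    def should_use_mlx_sketch(model_id: str) -> bool:
        # Mirrors the pattern list from the diff above.
        mlx_patterns = [
            "mlx-community/",
            "mlx-",
            "-mlx-",
        ]
        model_lower = model_id.lower()
        # Assumption: a pattern occurring anywhere in the ID selects MLX.
        return any(pattern in model_lower for pattern in mlx_patterns)

Under this assumed substring test, an ID such as "someorg/Qwen2.5-7B-mlx-4bit" (hypothetical) is matched via "-mlx-". Note that a plain "in" check on "mlx-" would already cover that case, so the real check is likely more position-sensitive than this sketch; the diff alone does not show it.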

scripts/eval_math500_benchmark.py

Lines changed: 1 addition & 1 deletion

@@ -692,7 +692,7 @@ def get_llm_response(problem: str, model: str) -> str:
         messages=[
             {"role": "user", "content": SYSTEM_PROMPT + "\n" + problem}
         ],
-        max_tokens=32768, # for thinking models, we need to use a lot more tokens
+        max_tokens=8192, # for thinking models, we need to use a lot more tokens
         # extra_body = {
         #     "decoding" : "thinkdeeper",
         # }
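For context, here is a hedged reconstruction of the call being tuned. It assumes the script uses the official openai Python client against an OpenAI-compatible endpoint and that SYSTEM_PROMPT is defined elsewhere in the script (the hunk references it); client setup and error handling are simplified.

    # Hedged sketch of the surrounding call, not the verbatim script code.
    from openai import OpenAI

    client = OpenAI()  # assumption: an OpenAI-compatible endpoint
    SYSTEM_PROMPT = "..."  # placeholder; the real prompt lives elsewhere in the script

    def get_llm_response_sketch(problem: str, model: str) -> str:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "user", "content": SYSTEM_PROMPT + "\n" + problem}
            ],
            max_tokens=8192,  # this commit lowers the cap from 32768
        )
        return response.choices[0].message.content

Lowering max_tokens bounds per-problem cost and latency; the trade-off, which the inline comment in the diff alludes to, is that long "thinking" completions can now be truncated at 8192 tokens.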
