
Use model's generation_config.json for default sampling parameters#1031

Open
eyupcanakman wants to merge 1 commit into ml-explore:main from eyupcanakman:feat/generation-config-defaults-140

Conversation

@eyupcanakman
Contributor

@eyupcanakman eyupcanakman commented Mar 20, 2026

Fixes #140.

Models like Phi-4 ship a generation_config.json with sampling defaults (temperature, top_p, etc.), but mlx-lm previously read only eos_token_id from it. Now generate, chat, and server read these values and use them when the user does not specify an explicit override.

The priority chain is: user CLI arg > generation_config.json > hardcoded default.
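The priority chain could be implemented along these lines (a sketch with illustrative helper and key names, not the PR's actual code):

```python
import json
from pathlib import Path

# Hardcoded fallbacks mirroring typical mlx-lm defaults (illustrative values).
HARDCODED_DEFAULTS = {"temp": 0.0, "top_p": 1.0, "top_k": 0, "min_p": 0.0}

# Assumed mapping from generation_config.json keys to CLI argument names.
CONFIG_KEY_MAP = {"temperature": "temp", "top_p": "top_p", "top_k": "top_k", "min_p": "min_p"}


def resolve_sampling_defaults(model_path, cli_args):
    """Resolve each parameter as: user CLI arg > generation_config.json > hardcoded default.

    cli_args holds only the values the user explicitly set (unset args are
    absent or None, e.g. via argparse defaults of None).
    """
    config = {}
    config_file = Path(model_path) / "generation_config.json"
    if config_file.is_file():
        raw = json.loads(config_file.read_text())
        config = {CONFIG_KEY_MAP[k]: v for k, v in raw.items() if k in CONFIG_KEY_MAP}

    resolved = {}
    for key, hardcoded in HARDCODED_DEFAULTS.items():
        if cli_args.get(key) is not None:  # explicit user override wins
            resolved[key] = cli_args[key]
        elif key in config:                # then the model's shipped default
            resolved[key] = config[key]
        else:                              # finally the hardcoded fallback
            resolved[key] = hardcoded
    return resolved
```

Keeping unset CLI args as None (rather than the hardcoded value) is what makes the three tiers distinguishable.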

The server resolves defaults lazily per request so model hot-swapping picks up the new model's config correctly.
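The lazy per-request lookup can be sketched as follows (minimal illustrative class; the real ModelProvider has more state):

```python
import json
from pathlib import Path


class ModelProvider:
    """Sketch of lazy default resolution: nothing from generation_config.json
    is cached at load time, so a hot-swapped model brings its own defaults."""

    def __init__(self, model_path):
        self.model_path = model_path

    def swap_model(self, new_path):
        # Hot-swap: only the path changes; defaults are re-read per request.
        self.model_path = new_path

    def resolve_default(self, key, fallback):
        """Read the current model's generation_config.json on each call."""
        config_file = Path(self.model_path) / "generation_config.json"
        if config_file.is_file():
            config = json.loads(config_file.read_text())
            if key in config:
                return config[key]
        return fallback
```

Re-reading a small JSON file per request is cheap compared to generation, and it avoids a stale-cache bug when the served model changes.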

Also adds --min-p and --top-k CLI args to chat.py (already present in generate.py) so generation_config values for those keys are not silently ignored.

@Thump604

The three-tier priority (per-request > generation_config.json > hardcoded defaults) and the lazy resolve_default() on ModelProvider are well designed. One edge case: some HF configs set do_sample: true with temperature: 1.0, which effectively means "use default sampling." The current do_sample: false -> temp: 0.0 mapping is correct, but when do_sample: true the code probably shouldn't inject the config's temperature since 1.0 is just the HF default, not an intentional override.
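The edge case above could be handled roughly as follows (a sketch of the suggested behavior, not code from the PR):

```python
def effective_temperature(config: dict, fallback: float = 0.0) -> float:
    """Map HF-style do_sample/temperature fields to a sampling temperature.

    - do_sample: false                      -> greedy decoding (0.0)
    - do_sample: true with temperature 1.0  -> treat 1.0 as HF's implicit
      default, not an intentional override; use the fallback instead
    - otherwise                             -> use the config's temperature
    """
    if config.get("do_sample") is False:
        return 0.0  # explicit greedy decoding
    temp = config.get("temperature")
    if temp is None:
        return fallback
    if config.get("do_sample") is True and temp == 1.0:
        return fallback  # 1.0 here is just the HF default, not a real override
    return temp
```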

@eyupcanakman force-pushed the feat/generation-config-defaults-140 branch from a23f530 to f9bbfe3 on March 22, 2026

Linked issue: Use a model's default generation config if it exists