Add agent CLI, Qwen3.5 vLLM support, and Docker improvements #7
Merged
zhijian-liu merged 1 commit into main on Mar 8, 2026
- Add paroquant.cli.agent: interactive agent with MCP tool calling
- Unify paroquant.cli.serve: auto-detect vLLM/MLX backend
- Fix vLLM plugin for Qwen3.5: pad Marlin partitions to tile boundary, fix modules_to_not_convert for hybrid Mamba architectures
- Add warmup request in chat and agent for kernel compilation
- Bump Docker vLLM to 0.17.0, add TRITON_PTXAS_BLACKWELL_PATH for Jetson Thor
- Update README with Qwen3.5 examples, agent usage, and install notes
- Add agent optional dependency group (qwen-agent, mcp, soundfile)

Made-with: Cursor
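For readers unfamiliar with the "pad Marlin partitions to tile boundary" item, the idea can be sketched as rounding each partition size up to the next multiple of the kernel's tile size. This is a minimal illustration, not the plugin's code: the helper names and the tile size of 64 are assumptions made for the example.

```python
def pad_to_multiple(size: int, tile: int) -> int:
    """Round `size` up to the nearest multiple of `tile`."""
    if tile <= 0:
        raise ValueError("tile must be positive")
    return ((size + tile - 1) // tile) * tile


def pad_partitions(partition_sizes: list[int], tile: int = 64) -> list[int]:
    """Pad every partition size to a tile boundary.

    `tile=64` is a hypothetical value chosen for illustration only;
    the real tile size depends on the kernel configuration.
    """
    return [pad_to_multiple(size, tile) for size in partition_sizes]


if __name__ == "__main__":
    # Sizes already on a boundary are left unchanged.
    print(pad_partitions([100, 64, 1]))
```

Padding instead of rejecting odd sizes lets a fused layer whose partitions are not tile-aligned still use the tiled kernel, at the cost of a few unused columns.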
Summary
- Add `paroquant.cli.agent` with MCP tool calling (web fetch, filesystem, time), warmup request for kernel compilation
- Fix `modules_to_not_convert` detection for hybrid Mamba architectures (leaf-module-only filtering + nesting-agnostic suffix matching)
- Unify `paroquant.cli.serve`, auto-detecting vLLM/MLX backend
- Add `TRITON_PTXAS_BLACKWELL_PATH` for Jetson Thor
- Add `agent` optional dependency group

Supersedes #6.
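The "leaf-module-only filtering + nesting-agnostic suffix matching" fix can be pictured with a small sketch over dotted module names. This is an assumption-laden illustration rather than the plugin's implementation: the function names and the example module paths (`mixer.in_proj`, `wrapper.…`) are hypothetical.

```python
def leaf_names(names: list[str]) -> list[str]:
    """Return names that are not a parent (dotted prefix) of any other name."""
    all_names = set(names)
    parents = set()
    for name in all_names:
        parts = name.split(".")
        # Every proper dotted prefix of a name is a container module.
        for i in range(1, len(parts)):
            parents.add(".".join(parts[:i]))
    return sorted(n for n in all_names if n and n not in parents)


def matches_suffix(name: str, pattern: str) -> bool:
    """Match on whole trailing components, regardless of nesting depth."""
    return name == pattern or name.endswith("." + pattern)


def select_unconverted(names: list[str], patterns: list[str]) -> list[str]:
    """Leaf modules whose name ends with one of the given patterns."""
    return [
        n for n in leaf_names(names)
        if any(matches_suffix(n, p) for p in patterns)
    ]


if __name__ == "__main__":
    # Hypothetical module tree; the same leaf appears at two nesting depths.
    modules = [
        "model",
        "model.layers.0.mixer",
        "model.layers.0.mixer.in_proj",
        "model.layers.0.mixer.out_proj",
        "wrapper.model.layers.0.mixer.in_proj",
    ]
    print(select_unconverted(modules, ["mixer.in_proj"]))
```

Restricting matches to leaves keeps a pattern such as `mixer` from selecting the container module itself, and matching on a dotted suffix means the same pattern works whether or not the model is wrapped in an extra parent module.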
Test plan
- `paroquant.cli.serve` with `z-lab/Qwen3.5-4B-PARO` on vLLM 0.17 (RTX PRO 6000 Blackwell)
- `paroquant.cli.serve` with `z-lab/Qwen3-8B-PARO` on vLLM 0.17
- `modules_to_not_convert` detects leaf modules only, no false matches on container modules

Made with Cursor