Add agent CLI, Qwen3.5 vLLM support, and Docker improvements #7

Merged
zhijian-liu merged 1 commit into main from dev/agent
Mar 8, 2026

@zhijian-liu
Member

Summary

  • Agent CLI: New paroquant.cli.agent with MCP tool calling (web fetch, filesystem, time), warmup request for kernel compilation
  • Qwen3.5 vLLM fix: Pad Marlin partitions to 64-tile boundary for small output dims; fix modules_to_not_convert detection for hybrid Mamba architectures (leaf-module-only filtering + nesting-agnostic suffix matching)
  • Serve CLI: Unified paroquant.cli.serve auto-detecting vLLM/MLX backend
  • Docker: Bump vLLM to 0.17.0, add TRITON_PTXAS_BLACKWELL_PATH for Jetson Thor
  • README: Qwen3.5 examples, agent section, install notes
  • pyproject.toml: Add agent optional dependency group
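
The Marlin tile-padding fix in the summary can be illustrated with a small arithmetic sketch. This is not the plugin's code: the helper names `round_up`, `pad_columns`, and `trim_columns` are hypothetical, and plain Python lists stand in for the packed weight and output tensors. It only shows the idea of rounding a small output dimension up to a multiple of 64 and trimming the result back afterwards.

```python
TILE = 64  # tile boundary named in the PR summary


def round_up(n: int, multiple: int = TILE) -> int:
    """Round n up to the next multiple of `multiple`."""
    return -(-n // multiple) * multiple


def pad_columns(rows, multiple: int = TILE):
    """Zero-pad every row so its width is a multiple of `multiple`."""
    width = len(rows[0])
    padded_width = round_up(width, multiple)
    return [row + [0] * (padded_width - width) for row in rows]


def trim_columns(rows, width: int):
    """Drop the padding columns again, keeping the first `width` entries."""
    return [row[:width] for row in rows]


# A partition with 48 output columns is padded to 64, then trimmed back.
partition = [[1] * 48, [2] * 48]
padded = pad_columns(partition)
restored = trim_columns(padded, 48)
```

Padding with zeros keeps the extra columns inert, so trimming after the kernel call should recover exactly the original output width.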

Supersedes #6.

Test plan

  • paroquant.cli.serve with z-lab/Qwen3.5-4B-PARO on vLLM 0.17 (RTX PRO 6000 Blackwell)
  • paroquant.cli.serve with z-lab/Qwen3-8B-PARO on vLLM 0.17
  • Verified Marlin tile padding triggers and trims correctly
  • Verified modules_to_not_convert detects leaf modules only, no false matches on container modules
  • Agent CLI tested with MCP tools on MLX backend
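
A minimal sketch of what "leaf modules only" and "nesting-agnostic suffix matching" could look like, assuming module names are dotted paths. The function names and the example module names below are hypothetical and are not taken from the plugin; the actual matching rules in the PR may differ.

```python
def leaf_modules(names):
    """Keep only names that are not a dotted prefix of another name."""
    names = set(names)
    return sorted(
        n for n in names
        if not any(other.startswith(n + ".") for other in names)
    )


def suffix_match(a: str, b: str) -> bool:
    """True if one name is a dot-bounded suffix of the other.

    This makes the comparison independent of how deeply the model is
    nested under wrapper modules.
    """
    return a == b or a.endswith("." + b) or b.endswith("." + a)


def should_skip(name: str, modules_to_not_convert) -> bool:
    """True if `name` matches any entry in the skip list."""
    return any(suffix_match(name, pattern) for pattern in modules_to_not_convert)


# Hypothetical module tree: containers plus two leaf projections.
all_names = [
    "model",
    "model.layers.0",
    "model.layers.0.linear_attn",
    "model.layers.0.linear_attn.in_proj",
    "model.layers.0.mlp",
    "model.layers.0.mlp.gate_proj",
]
leaves = leaf_modules(all_names)
skip_list = ["model.layers.0.linear_attn.in_proj"]
```

Filtering to leaves first means a container such as `model.layers.0.linear_attn` can never end up in the skip list, which is the false-match case the test plan refers to.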

Made with Cursor

- Add paroquant.cli.agent: interactive agent with MCP tool calling
- Unify paroquant.cli.serve: auto-detect vLLM/MLX backend
- Fix vLLM plugin for Qwen3.5: pad Marlin partitions to tile boundary,
  fix modules_to_not_convert for hybrid Mamba architectures
- Add warmup request in chat and agent for kernel compilation
- Bump Docker vLLM to 0.17.0, add TRITON_PTXAS_BLACKWELL_PATH for Jetson Thor
- Update README with Qwen3.5 examples, agent usage, and install notes
- Add agent optional dependency group (qwen-agent, mcp, soundfile)
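
The PR does not say how the backend is auto-detected. One plausible approach, sketched here purely as an assumption, is to probe which inference package is importable. The function name `detect_backend` and the candidate module names are hypothetical and do not describe `paroquant.cli.serve` itself.

```python
import importlib.util


def detect_backend(candidates=(("vllm", "vllm"), ("mlx", "mlx_lm"))):
    """Return the first backend whose Python module can be found.

    `candidates` is a sequence of (backend_name, module_name) pairs,
    checked in order. The default pairs are assumptions, not the
    project's actual detection list.
    """
    for backend, module in candidates:
        if importlib.util.find_spec(module) is not None:
            return backend
    raise RuntimeError("no supported inference backend is installed")
```

Using `importlib.util.find_spec` avoids importing a heavy package just to check whether it is present.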

Made-with: Cursor
@zhijian-liu zhijian-liu merged commit 2be645f into main Mar 8, 2026
@zhijian-liu zhijian-liu deleted the dev/agent branch March 8, 2026 04:30