Conversation

@noeyy-mino
Contributor

@noeyy-mino noeyy-mino commented Jan 29, 2026

What does this PR do?

Type of change: new tests

Overview: Add new test cases for the newly added checkpoints on HuggingFace.

Usage

pytest test_deploy.py --run-release

Testing

None

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

None

Summary by CodeRabbit

  • New Features

    • Added support for NVFP4 model variants across multiple model families (DeepSeek, Llama, Qwen, and others).
  • Improvements

    • Enhanced backend availability detection to automatically identify and manage supported deployment backends at runtime.
  • Tests

    • Improved test infrastructure for better reproducibility and backend compatibility handling.

Signed-off-by: noeyy-mino <174223378+noeyy-mino@users.noreply.github.com>
@coderabbitai
Contributor

coderabbitai bot commented Jan 29, 2026

Walkthrough

This PR updates the test utilities and test files in three ways. The deploy utilities now detect which deployment backends are available, cache that set at import time, and drop any configured backend that is not in it. The GPT-OSS tests now write to the tmp_path parameter instead of the current directory. The deployment tests now use NVFP4 model variants in place of FP4.
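The import-time caching described above can be sketched as follows. This is a minimal illustration, not the PR's code: the function name `get_available_backends()` comes from the walkthrough, but the `BACKEND_MODULES` mapping, the use of `importlib.util.find_spec`, and the `lru_cache` wrapper are assumptions about how such a check could be written.

```python
import importlib.util
from functools import lru_cache

# Assumed mapping from backend name to the Python module that provides it.
BACKEND_MODULES = {"trtllm": "tensorrt_llm", "vllm": "vllm", "sglang": "sglang"}


@lru_cache(maxsize=1)
def get_available_backends() -> frozenset[str]:
    """Return the backends whose top-level module can be located."""
    # find_spec() locates a top-level module without importing it,
    # so the probe stays cheap even for heavy packages.
    return frozenset(
        name
        for name, module in BACKEND_MODULES.items()
        if importlib.util.find_spec(module) is not None
    )


# Calling the function once at module level fills the cache at import time.
print(f"Detected backends: {sorted(get_available_backends())}")
```

Returning a `frozenset` keeps the cached value immutable, so one caller cannot change what another caller sees.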

Changes

| Cohort / File(s) | Summary |
|---|---|
| Backend Detection and Filtering (tests/_test_utils/deploy_utils.py) | Added global backend availability cache with get_available_backends() function; replaced per-backend ImportError handling with centralized checks; added NVFP4 model_id variants for TRTLLM and SGLang; implemented backend filtering in ModelDeployer and ModelDeployerList to exclude unavailable backends; added guard to prevent deployer generation when no backends remain available. |
| Test Path Refactoring (tests/examples/gpt_oss/test_gpt_oss_qat.py) | Replaced hardcoded current directory references with tmp_path parameter across GPT-OSS 20B/120B training pipeline steps (SFT, QAT, conversion, deployment); removed pathlib import. |
| Model Parameterization Updates (tests/examples/llm_ptq/test_deploy.py) | Updated test parameterizations to replace FP4 model IDs with NVFP4 equivalents (DeepSeek-R1, Llama-3.1/3.3, Qwen models) and added new NVFP4 variant entries across multiple model families. |
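To illustrate the tmp_path refactoring listed in the table, here is a small sketch of a pytest test that writes per-stage outputs under pytest's built-in `tmp_path` fixture rather than the current directory. The helper `stage_dirs`, the stage names, and the directory naming scheme are hypothetical and are not taken from test_gpt_oss_qat.py.

```python
from pathlib import Path


def stage_dirs(root: Path, model: str) -> dict[str, Path]:
    """Create one output directory per pipeline stage under `root`."""
    dirs = {}
    for stage in ("sft", "qat", "converted"):  # hypothetical stage names
        path = root / f"{model}-{stage}"
        path.mkdir(parents=True, exist_ok=True)
        dirs[stage] = path
    return dirs


def test_stage_dirs_are_isolated(tmp_path: Path) -> None:
    # pytest injects a fresh temporary directory for each test as `tmp_path`,
    # so nothing is written to the directory the test was launched from.
    dirs = stage_dirs(tmp_path, "gpt-oss-20b")
    assert all(path.parent == tmp_path for path in dirs.values())
```

Because each test receives its own directory, artifacts from one run cannot leak into another, which is the reproducibility benefit the summary refers to.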

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant Detect as Backend Detection
    participant Deploy as ModelDeployer
    participant Filter as Backend Filtering
    participant Gen as Deployer Generation
    participant Backends as Individual Backends<br/>(trtllm, vllm, sglang)

    App->>Detect: Import deploy_utils
    Detect->>Backends: Check availability
    Backends-->>Detect: Available set
    Detect->>App: Cache & print detected backends
    
    App->>Deploy: Create ModelDeployer with config
    Deploy->>Filter: Filter configured backends
    Filter->>Detect: Get available backends cache
    Detect-->>Filter: Available set
    Filter-->>Deploy: Filtered backends (unavailable removed)
    
    alt Backends available
        Deploy->>Gen: Generate deployers
        Gen->>Backends: Initialize each backend
        Backends-->>Gen: Deployer instances
        Gen-->>Deploy: Deployer list
        Deploy-->>App: ModelDeployer instance
    else No backends available
        Deploy->>Deploy: Return early (no deployers)
        Deploy-->>App: Empty or skipped
    end
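The filtering and early-return branches of the diagram can be sketched in a few lines. `DeployerPlan` is a hypothetical stand-in written for this illustration; it is not the PR's `ModelDeployer`, and the model ID is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class DeployerPlan:
    """Hypothetical stand-in for the filtering step; not the PR's ModelDeployer."""

    model_id: str
    requested: tuple[str, ...]
    available: frozenset[str]
    backends: tuple[str, ...] = field(init=False)

    def __post_init__(self) -> None:
        # Keep the configured order, dropping backends that were not detected.
        self.backends = tuple(b for b in self.requested if b in self.available)

    def deployers(self) -> list[tuple[str, str]]:
        # Guard: with nothing left to run, return early instead of failing later.
        if not self.backends:
            return []
        return [(self.model_id, backend) for backend in self.backends]


plan = DeployerPlan(
    model_id="example-org/example-model",  # placeholder ID, not a real checkpoint
    requested=("trtllm", "vllm", "sglang"),
    available=frozenset({"vllm"}),
)
print(plan.deployers())  # [('example-org/example-model', 'vllm')]
```

Passing the available set in explicitly, rather than reading a global, makes the "no backends available" branch easy to exercise in a unit test.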

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title mentions adding test cases for newly added HF checkpoints, which aligns with the PR objectives and changes adding NVFP4 model variants and test cases. |
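For readers who want to reproduce a docstring coverage figure locally, the sketch below computes one with the standard-library `ast` module. The counting rule used here (modules, classes, and functions each count once) is an assumption; the report does not state how CodeRabbit counts, so the result may not match its 71.43%.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Return the percentage of modules, classes and functions with a docstring."""
    tree = ast.parse(source)
    kinds = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    # The module node is always present, so the list is never empty.
    nodes = [node for node in ast.walk(tree) if isinstance(node, kinds)]
    documented = sum(1 for node in nodes if ast.get_docstring(node) is not None)
    return 100.0 * documented / len(nodes)


SAMPLE = '''"""Module docstring."""

def documented():
    """Has a docstring."""

def undocumented():
    return 1
'''

# Two of the three definitions (module, documented) carry a docstring.
print(f"{docstring_coverage(SAMPLE):.2f}%")  # 66.67%
```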

@noeyy-mino noeyy-mino self-assigned this Jan 29, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/examples/llm_ptq/test_deploy.py`:
- Around line 481-488: The model_id value in the ModelDeployerList entry
(ModelDeployerList with base_model="Qwen/Qwen3-235B-A22B-Thinking-2507")
references a non-existent HF repo
("nvidia/Qwen3-235B-A22B-Thinking-2507-FP4-Eagle3"); update the model_id field
to a real HuggingFace repository—either use the Eagle3 head under nvidia
("nvidia/Qwen3-235B-A22B-Eagle3") or the FP4 Thinking-2507 checkpoint under
NVFP4 ("NVFP4/Qwen3-235B-A22B-Thinking-2507-FP4") depending on whether you need
the Eagle3 head or the FP4-quantized base, and keep the rest of the
ModelDeployerList fields (backend, tensor_parallel_size, mini_sm,
eagle3_one_model) unchanged.
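A check like the one behind this comment could be automated before the tests run. The sketch below is a hypothetical helper, not part of the PR: `find_missing_repos` takes the existence check as a parameter so it can run offline, and the model IDs and the `KNOWN` set are placeholders. With network access, recent versions of `huggingface_hub` expose a `repo_exists` helper that could probably be passed as `exists`, though that is untested here.

```python
from collections.abc import Callable, Iterable


def find_missing_repos(
    model_ids: Iterable[str], exists: Callable[[str], bool]
) -> list[str]:
    """Return the model IDs for which `exists` reports no repository."""
    return [model_id for model_id in model_ids if not exists(model_id)]


# Offline stand-in for a real lookup. The set is illustrative only and says
# nothing about which repositories are actually published.
KNOWN = {"example-org/model-a"}

missing = find_missing_repos(
    ["example-org/model-a", "example-org/model-b"],
    KNOWN.__contains__,
)
print(missing)  # ['example-org/model-b']
```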

@codecov

codecov bot commented Jan 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.82%. Comparing base (e6e4efd) to head (9a92642).
⚠️ Report is 28 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #827      +/-   ##
==========================================
- Coverage   74.24%   73.82%   -0.43%     
==========================================
  Files         192      193       +1     
  Lines       19033    19745     +712     
==========================================
+ Hits        14132    14577     +445     
- Misses       4901     5168     +267     
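The figures in the diff can be cross-checked with a few lines of arithmetic, assuming coverage is defined as hits divided by lines. The computed values agree with the reported 74.24%, 73.82%, and -0.43% to within a couple of hundredths of a percentage point; the exact rounding rule Codecov applies is not stated in the report.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage, assuming coverage = hits / lines."""
    return 100.0 * hits / lines


# Values copied from the coverage diff for base (main) and head (#827).
base = {"lines": 19033, "hits": 14132, "misses": 4901}
head = {"lines": 19745, "hits": 14577, "misses": 5168}

for name, row in (("base", base), ("head", head)):
    # Every counted line is either a hit or a miss.
    assert row["hits"] + row["misses"] == row["lines"]
    print(name, f"{coverage_pct(row['hits'], row['lines']):.4f}%")

delta = coverage_pct(head["hits"], head["lines"]) - coverage_pct(
    base["hits"], base["lines"]
)
print("delta", f"{delta:.4f}")
```

The per-row deltas in the table check out the same way: 712 more lines, 445 more hits, and 267 more misses.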
