yuanjingx87
Collaborator

@yuanjingx87 yuanjingx87 commented Sep 29, 2025

Summary by CodeRabbit

  • Chores

    • Updated CI configuration for the L0 test pipeline to use a different shared library version. This change is limited to internal build/test environments and does not affect product functionality.
  • Tests

    • Removed one waiver entry, re-enabling a previously skipped integration test in automated runs. This impacts test execution only, with no user-facing behavior changes.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from the specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option is always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages that don't match the specified backends. Only [pytorch, cpp, tensorrt, triton] are supported. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
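
As a rough illustration of how the quoted, comma-separated values in the options above might be read, here is a minimal Python sketch. It is not the bot's actual implementation: the function name `parse_bot_run`, the helper `split_csv`, and the subset of flags handled are assumptions made for this example only.

```python
import argparse
import shlex


def split_csv(value: str) -> list[str]:
    # "A30, H100_PCIe" -> ["A30", "H100_PCIe"]
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_bot_run(comment: str) -> argparse.Namespace:
    """Parse a '/bot run ...' comment into options (hypothetical helper)."""
    # shlex.split keeps each double-quoted list as a single token.
    tokens = shlex.split(comment)
    if tokens[:2] != ["/bot", "run"]:
        raise ValueError("expected a comment starting with '/bot run'")

    # Only a handful of the documented flags are modelled here.
    parser = argparse.ArgumentParser(prog="/bot run", add_help=False)
    parser.add_argument("--stage-list", type=split_csv, default=[])
    parser.add_argument("--gpu-type", type=split_csv, default=[])
    parser.add_argument("--test-backend", type=split_csv, default=[])
    parser.add_argument("--extra-stage", type=split_csv, default=[])
    parser.add_argument("--disable-fail-fast", action="store_true")
    parser.add_argument("--post-merge", action="store_true")
    return parser.parse_args(tokens[2:])


if __name__ == "__main__":
    opts = parse_bot_run('/bot run --stage-list "GB200-4_GPUs-PyTorch-1"')
    print(opts.stage_list)
```

The quotes matter: without them, a value such as `A30, H100_PCIe` would be split into two shell-style tokens before the comma-separated list is ever seen.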

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

@yuanjingx87 yuanjingx87 requested review from a team as code owners September 29, 2025 19:04
@yuanjingx87 yuanjingx87 marked this pull request as draft September 29, 2025 19:04
@yuanjingx87
Collaborator Author

/bot run --stage-list "GB200-4_GPUs-PyTorch-1"

Contributor

coderabbitai bot commented Sep 29, 2025

📝 Walkthrough

Updated Jenkins pipeline to load a different Bloom shared library branch. Removed one waiver entry from the integration test skip list.

Changes

Cohort / File(s): CI pipeline config (jenkins/L0_Test.groovy)
Summary: Switches Bloom shared library reference from bloom-jenkins-shared-lib@main to bloom-jenkins-shared-lib@dev-yuanjingx-block_failed_nodes. trtllm-jenkins-shared-lib@main unchanged. No other code edits.

Cohort / File(s): Test waivers list (tests/integration/test_lists/waives.txt)
Summary: Removes skip entry: accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_4gpus_online_eplb[mtp_nextn=2].
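
For readers unfamiliar with the waiver list, un-waiving a test amounts to deleting its line. The sketch below shows that operation in Python under an assumed file format (one entry per line, a test ID optionally followed by a space and extra fields such as a skip reason); the real layout of waives.txt is not shown in this PR, and `remove_waiver` is a hypothetical helper, not a repository script.

```python
def remove_waiver(lines: list[str], test_id: str) -> list[str]:
    """Return the waiver lines with any entry for test_id dropped."""
    kept = []
    for line in lines:
        entry = line.strip()
        # Match the bare ID, or the ID followed by further fields (assumed format).
        if entry == test_id or entry.startswith(test_id + " "):
            continue
        kept.append(line)
    return kept


if __name__ == "__main__":
    target = (
        "accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::"
        "test_bfloat16_4gpus_online_eplb[mtp_nextn=2]"
    )
    sample = [
        target + " SKIP (placeholder reason)\n",  # placeholder entry, not real data
        "other/test_x.py::test_y SKIP (placeholder reason)\n",
    ]
    print(remove_waiver(sample, target))
```

Matching on the full ID plus a trailing space avoids accidentally dropping a different parametrization whose ID merely shares a prefix.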

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name: Title Check
Status: ⚠️ Warning
Explanation: The PR title "test gb200" does not accurately reflect the main changes in this pull request. The changeset modifies a Jenkins shared library reference from the main branch to a development branch (dev-yuanjingx-block_failed_nodes) and removes a waived test entry for DeepSeekV3Lite. While "gb200" might refer to hardware being tested, the title doesn't describe the actual code changes: switching to a dev version of the Jenkins library and enabling a previously skipped test. The title appears to describe an intent or goal rather than what was actually changed in the code.
Resolution: Update the PR title to accurately reflect the actual changes. Consider a title that mentions the Jenkins library branch change and/or the test enablement, such as "[None][infra] Switch to dev Jenkins library branch for blocked nodes handling" or "[None][test] Enable DeepSeekV3Lite test and update Jenkins library reference". If this is specifically for GB200 testing, clarify how these changes relate to that hardware in both the title and description.

Check name: Description Check
Status: ⚠️ Warning
Explanation: The PR description is entirely unfilled and consists only of the template with placeholder comments. None of the required sections have been completed: the Description section (explaining the issue and solution) is empty, the Test Coverage section (listing relevant tests) is empty, and the PR Checklist remains unchecked. The description provides no information about what changes were made, why they were necessary, or how they were tested. This makes it impossible for reviewers to understand the context and purpose of the pull request.
Resolution: Complete all required sections of the PR description template. In the Description section, explain why the Jenkins library reference was changed to the dev branch and why the DeepSeekV3Lite test was re-enabled. In the Test Coverage section, list the specific tests that validate these changes (at minimum, mention the DeepSeekV3Lite test that was un-waived). Check the appropriate boxes in the PR Checklist after verifying each item applies to this PR.
✅ Passed checks (1 passed)
Check name: Docstring Coverage
Status: ✅ Passed
Explanation: No functions found in the changes. Docstring coverage check skipped.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 334e2ca and 80739e5.

📒 Files selected for processing (2)
  • jenkins/L0_Test.groovy (1 hunks)
  • tests/integration/test_lists/waives.txt (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/integration/test_lists/waives.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

@@ -1,4 +1,4 @@
-@Library(['bloom-jenkins-shared-lib@main', 'trtllm-jenkins-shared-lib@main']) _
+@Library(['bloom-jenkins-shared-lib@dev-yuanjingx-block_failed_nodes', 'trtllm-jenkins-shared-lib@main']) _
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify this dev branch change is intentional and temporary.

The shared library bloom-jenkins-shared-lib has been switched from @main to @dev-yuanjingx-block_failed_nodes. This appears to be a development/testing change.

Risks:

  • Dev branches may be unstable, deleted, or force-pushed
  • Pipeline behavior becomes unpredictable if the branch diverges
  • Other users running this pipeline will use your experimental branch

If this is for testing the "block_failed_nodes" feature:

  • Consider using a feature flag or parameter instead
  • Document why this change is needed in the PR description
  • Plan to revert to @main before merging, or ensure the dev branch is production-ready and will be maintained

🏁 Script executed:

#!/bin/bash
# Check if this dev branch reference appears in other pipeline files
rg -n "bloom-jenkins-shared-lib@dev-yuanjingx" --type=groovy

Length of output: 132


🏁 Script executed:

#!/bin/bash
# Check all @Library declarations to understand normal patterns
rg -n "@Library" --type=groovy -C2

Length of output: 1011


🏁 Script executed:

#!/bin/bash
# Check git log for this specific file to see the previous state
git log -n 5 --oneline -- jenkins/L0_Test.groovy

Length of output: 306


🏁 Script executed:

#!/bin/bash
# Show the diff for this file to see what changed
git diff HEAD~1 HEAD -- jenkins/L0_Test.groovy

Length of output: 428


Do not merge this dev branch reference to production.

The bloom-jenkins-shared-lib has been changed from @main to @dev-yuanjingx-block_failed_nodes. All other pipeline files in this repository use @main:

  • jenkins/BuildDockerImage.groovy uses @main
  • jenkins/Build.groovy uses @main
  • jenkins/L0_MergeRequest.groovy uses @main

This dev branch reference will cause all users running the L0_Test pipeline to execute your experimental code. If testing the "block_failed_nodes" feature is necessary, revert this line to @main before merging, or merge your changes to the shared library's main branch first.

🤖 Prompt for AI Agents
In jenkins/L0_Test.groovy around line 1, the shared library reference was
changed from @main to @dev-yuanjingx-block_failed_nodes which will cause the
pipeline to run experimental code for all users; revert the library reference
back to @main (or merge your feature branch into the shared library's main) so
the file uses @main like the other pipeline files, and ensure you do not leave
any dev-branch references before merging.
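
The concern raised above could also be checked mechanically before merging. The following Python sketch scans Groovy source for `@Library` entries pinned to anything other than `main`. It is only an illustration: the function name `non_main_library_refs`, the regular expression, and the assumption that every reference is written as a quoted `name@ref` string on the same line as `@Library` are not taken from the repository.

```python
import re

# Matches quoted 'name@ref' entries such as 'trtllm-jenkins-shared-lib@main'.
LIBRARY_REF = re.compile(r"['\"]([\w.-]+)@([\w./-]+)['\"]")


def non_main_library_refs(groovy_source: str, allowed: str = "main") -> list[tuple[str, str]]:
    """List (library, ref) pairs in @Library lines whose ref is not `allowed`."""
    offenders = []
    for line in groovy_source.splitlines():
        if "@Library" not in line:
            continue
        for name, ref in LIBRARY_REF.findall(line):
            if ref != allowed:
                offenders.append((name, ref))
    return offenders


if __name__ == "__main__":
    changed_line = (
        "@Library(['bloom-jenkins-shared-lib@dev-yuanjingx-block_failed_nodes', "
        "'trtllm-jenkins-shared-lib@main']) _"
    )
    for name, ref in non_main_library_refs(changed_line):
        print(f"{name} is pinned to {ref}, not main")
```

Run against the changed line in this PR, the check would flag only the Bloom library, which matches the single reference the review asks to revert.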

@tensorrt-cicd
Collaborator

PR_Github #20292 [ run ] triggered by Bot

Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
@yuanjingx87 yuanjingx87 force-pushed the user/yuanjingx/test_gb200 branch from 80739e5 to d7ac37a Compare September 29, 2025 20:17
@yuanjingx87
Collaborator Author

/bot run --stage-list "GB200-4_GPUs-PyTorch-1"

@tensorrt-cicd
Collaborator

PR_Github #20292 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #15303 (Partly Tested) completed with status: 'FAILURE'

@tensorrt-cicd
Collaborator

PR_Github #20297 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #20297 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #15307 (Partly Tested) completed with status: 'FAILURE'
