
yossiovadia
Collaborator

@yossiovadia yossiovadia commented Sep 18, 2025

What type of PR is this?
test: improve e2e test suite reliability and system validation

What this PR does / why we need it:
Improves the e2e test suite so that it properly validates core system components, including semantic routing, jailbreak detection, PII policy enforcement, tool selection, and model selection. Tests now expose real system issues instead of producing false positives.

Which issue(s) this PR fixes:
Fixes #

Release Notes:
No

Note: as the backend I used Ollama (localhost:11434), with OLLAMA_KEEP_ALIVE=0 so that models do not stay in memory.


netlify bot commented Sep 18, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit 2ab2523
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/68d18431e616a90008a30e83
😎 Deploy Preview https://deploy-preview-173--vllm-semantic-router.netlify.app

@yossiovadia
Collaborator Author

Current status:

✅ PASSED - 00-client-request-test.py
✅ PASSED - 01-envoy-extproc-test.py
✅ PASSED - 02-router-classification-test.py
❌ FAILED - 03-jailbreak-test.py
✅ PASSED - 04-cache-test.py
✅ PASSED - 05-pii-policy-test.py
✅ PASSED - 06-tools-test.py
✅ PASSED - 07-model-selection-test.py
❌ FAILED - 08-metrics-test.py
❌ FAILED - 09-error-handling-test.py
✅ PASSED - test_base.py

❌ Some tests failed

I'll create an issue per failure shortly.


github-actions bot commented Sep 18, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 e2e-tests

Owners: @yossiovadia
Files changed:

  • e2e-tests/05-pii-policy-test.py
  • e2e-tests/06-tools-test.py
  • e2e-tests/07-model-selection-test.py
  • e2e-tests/TEST_STATUS_REPORT.md
  • e2e-tests/00-client-request-test.py
  • e2e-tests/01-envoy-extproc-test.py
  • e2e-tests/02-router-classification-test.py
  • e2e-tests/04-cache-test.py

📁 config

Owners: @rootfs
Files changed:

  • config/config.yaml


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

ENVOY_URL = "http://localhost:8801"
OPENAI_ENDPOINT = "/v1/chat/completions"
DEFAULT_MODEL = "qwen2.5:32b" # Changed to match other tests
DEFAULT_MODEL = "gemma3:27b" # Use configured model
Collaborator

@yossiovadia can you follow up with a PR to retrieve the models from the v1/models endpoint?
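
A minimal sketch of what such a follow-up could look like, assuming the endpoint is reachable through the same `ENVOY_URL` and returns an OpenAI-style list response (`{"data": [{"id": ...}]}`). The helper names `parse_model_ids`, `fetch_model_ids`, and `pick_default_model` are hypothetical and not part of this PR.

```python
import json
import urllib.request

ENVOY_URL = "http://localhost:8801"   # taken from the test file under review
MODELS_ENDPOINT = "/v1/models"        # assumption: served behind the same base URL


def parse_model_ids(payload):
    """Extract model ids from an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def fetch_model_ids(base_url=ENVOY_URL, timeout=30.0):
    """Query the models endpoint and return the advertised model ids."""
    with urllib.request.urlopen(base_url + MODELS_ENDPOINT, timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


def pick_default_model(model_ids, fallback="gemma3:27b"):
    """Use the first advertised model, or the configured fallback if none are listed."""
    return model_ids[0] if model_ids else fallback
```

A test module could then set `DEFAULT_MODEL = pick_default_model(fetch_model_ids())` instead of hardcoding a model name, so the tests follow whatever the backend actually serves.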

rootfs
rootfs previously approved these changes Sep 18, 2025
Collaborator

@rootfs rootfs left a comment

let's merge it for now and use this to test against the integration setup.

@rootfs
Collaborator

rootfs commented Sep 18, 2025

@yossiovadia can you sign the DCO?

@rootfs
Collaborator

rootfs commented Sep 19, 2025

@yossiovadia can you sign the DCO?

In your local branch, run: git rebase HEAD~13 --signoff
Force push your changes to overwrite the branch: git push --force-with-lease origin revive-e2e-tests

yossiovadia and others added 16 commits September 22, 2025 10:15
- Add new e2e test files: jailbreak, pii-policy, tools, model-selection, metrics, error-handling tests
- Update existing e2e tests: client-request, envoy-extproc, router-classification, cache tests
- Add CLAUDE.md with project documentation and instructions

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Increase timeouts from 10s to 30s in failing test files
- Update config health check from /health to /api/version for Ollama compatibility
- Fix metrics naming expectations in jailbreak, PII, and general metrics tests

Co-Authored-By: Claude <noreply@anthropic.com>
Updated timeout values in 5 test files to prevent timeout failures:
- 00-client-request-test.py
- 04-cache-test.py
- 06-tools-test.py
- 07-model-selection-test.py
- 09-error-handling-test.py

This should resolve the remaining timeout issues seen in local testing.
…ailbreak blocking tests

- Remove permissive 503 acceptance from benign request tests
- Add new test_jailbreak_attempts_blocked() to test actual security
- Require 200 status for benign requests (proper service validation)
- Require 4xx status for jailbreak attempts (proper security blocking)
- This will expose real security vulnerabilities instead of hiding them

These changes make tests fail when they should, revealing actual system issues.

Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
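
The status policy described in the commit above can be sketched as follows. This is an illustration only: `is_expected_status` and `find_violations` are hypothetical helpers, not code from the PR, and the sample case names are made up.

```python
def is_expected_status(status_code, is_jailbreak):
    """Benign requests must return exactly 200; jailbreak attempts must be rejected with a 4xx."""
    if is_jailbreak:
        return 400 <= status_code < 500
    return status_code == 200


def find_violations(results):
    """Return the names of cases whose status breaks the policy.

    `results` is a list of (name, status_code, is_jailbreak) tuples.
    """
    return [name for name, status, jailbreak in results
            if not is_expected_status(status, jailbreak)]
```

Under this policy a 503 on a benign request counts as a failure rather than an accepted outcome, which is what lets the suite surface an unavailable backend or a jailbreak prompt that was answered with 200.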
…ection

- Add test_auto_routing_intelligence() to verify semantic routing works
- Use model='auto' to trigger intelligent routing (not fixed model)
- Test that math problems route to phi4 (highest score: 1.0)
- Test that creative writing routes to different model than phi4
- Validate that different query types get different models
- This will expose if routing intelligence is actually working

This test will fail if the router just returns the same model for all queries.

Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
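
The routing expectations in the commit above can be sketched as a pure check over the models the router selected. `routing_failures` is a hypothetical helper, and the input mapping (query category to selected model) is assumed to be collected by the test, for example from the `model` field of each response to a `model='auto'` request.

```python
def routing_failures(routed, math_model="phi4"):
    """Return human-readable failures for a mapping of query category to selected model.

    Expected behaviour: math queries go to `math_model`, creative writing goes
    somewhere else, and the router does not return one model for every query.
    """
    failures = []
    if routed.get("math") != math_model:
        failures.append(f"math routed to {routed.get('math')!r}, expected {math_model!r}")
    if routed.get("creative") == math_model:
        failures.append(f"creative writing routed to {math_model!r}")
    if len(set(routed.values())) < 2:
        failures.append("all queries routed to the same model")
    return failures
```

An empty list means the routing looks category-aware. A router that ignores the query and always picks one model fails the last check regardless of which model it picks.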
…comprehensive status report

- Remove HTTP 503 acceptance from PII, tools, model selection, and error handling tests
- Tests now require 200 status codes for successful operations
- Service failures now properly fail tests instead of false positives
- Add comprehensive TEST_STATUS_REPORT.md documenting all test improvements
- Expose real system bugs: input validation gaps, jailbreak blocking issues
- 6/11 test files now hardened and provide reliable system health assessment

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
- Simplified TEST_STATUS_REPORT.md to focus only on current system issues
- Remove resolved/fixed sections that are no longer relevant
- Replace (200, 200) ranges with simple 200 expected status in error handling
- Remove redundant 'no 503 accepted' comments from all test files
- Clean up unnecessary verbosity while maintaining test functionality

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
- Replace 'DAN role-play jailbreak' with clear 'Role-play jailbreak attempt'
- Improve readability by removing technical jargon from failure descriptions

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Add DCO signoffs to 00, 01, 04, and 08 test files to complete
DCO compliance for all files in the PR.

Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
* metrics: Add request-level token histograms

Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>

* add unknown const

Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>

---------

Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>
Signed-off-by: cryo <zdtna412@gmail.com>
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>
Fix the copy command for tools directory in Dockerfile.

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>
* feat: add basic cache eviction policy: LRU/LFU/FIFO

Signed-off-by: Alex Wang <yesterda9@gmail.com>

* use EvictionPolicyType

Signed-off-by: Alex Wang <yesterda9@gmail.com>

* update doc

Signed-off-by: Alex Wang <yesterda9@gmail.com>

---------

Signed-off-by: Alex Wang <yesterda9@gmail.com>
Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>
rootfs and others added 7 commits September 22, 2025 10:15
Signed-off-by: Huamin Chen <hchen@redhat.com>
Signed-off-by: bitliu <bitliu@tencent.com>
* infra: update Dockerfile.extproc

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: add precommit container, make it easier to run precommit

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

---------

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>
Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>
Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>
Signed-off-by: cryo <zdtna412@gmail.com>
Remove 03-jailbreak-test.py, 08-metrics-test.py, and 09-error-handling-test.py
to be implemented in separate PRs with full backend functionality.

This keeps the current PR focused on passing tests for clean merge.

Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
@yossiovadia
Collaborator Author

Closing this PR due to DCO complications from mixed commits. I'll create a fresh PR with clean e2e tests.

9 participants