-
Notifications
You must be signed in to change notification settings - Fork 139
Revive e2e tests #173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revive e2e tests #173
Conversation
✅ Deploy Preview for vllm-semantic-router ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Current status : ✅ PASSED - 00-client-request-test.py ❌ Some tests failed I'll create issue per failure shortly. |
👥 vLLM Semantic Team NotificationThe following members have been identified for the changed files in this PR and have been automatically assigned: 📁
|
ENVOY_URL = "http://localhost:8801" | ||
OPENAI_ENDPOINT = "/v1/chat/completions" | ||
DEFAULT_MODEL = "qwen2.5:32b" # Changed to match other tests | ||
DEFAULT_MODEL = "gemma3:27b" # Use configured model |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yossiovadia can you follow up with a PR to retrieve the models from v1/models
endpoint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's merge it for now and use this to test against the integration setup.
@yossiovadia can you sign the DCO? |
b1395a8
to
4c6715e
Compare
@yossiovadia can you sign DCO?
|
- Add new e2e test files: jailbreak, pii-policy, tools, model-selection, metrics, error-handling tests - Update existing e2e tests: client-request, envoy-extproc, router-classification, cache tests - Add CLAUDE.md with project documentation and instructions 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
- Increase timeouts from 10s to 30s in failing test files - Update config health check from /health to /api/version for Ollama compatibility - Fix metrics naming expectations in jailbreak, PII, and general metrics tests Co-Authored-By: Claude <noreply@anthropic.com>
Updated timeout values in 5 test files to prevent timeout failures: - 00-client-request-test.py - 04-cache-test.py - 06-tools-test.py - 07-model-selection-test.py - 09-error-handling-test.py This should resolve the remaining timeout issues seen in local testing.
…ailbreak blocking tests - Remove permissive 503 acceptance from benign request tests - Add new test_jailbreak_attempts_blocked() to test actual security - Require 200 status for benign requests (proper service validation) - Require 4xx status for jailbreak attempts (proper security blocking) - This will expose real security vulnerabilities instead of hiding them These changes make tests fail when they should, revealing actual system issues. Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
…ection - Add test_auto_routing_intelligence() to verify semantic routing works - Use model='auto' to trigger intelligent routing (not fixed model) - Test that math problems route to phi4 (highest score: 1.0) - Test that creative writing routes to different model than phi4 - Validate that different query types get different models - This will expose if routing intelligence is actually working This test will fail if the router just returns the same model for all queries. Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
…comprehensive status report - Remove HTTP 503 acceptance from PII, tools, model selection, and error handling tests - Tests now require 200 status codes for successful operations - Service failures now properly fail tests instead of false positives - Add comprehensive TEST_STATUS_REPORT.md documenting all test improvements - Expose real system bugs: input validation gaps, jailbreak blocking issues - 6/11 test files now hardened and provide reliable system health assessment Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
- Simplified TEST_STATUS_REPORT.md to focus only on current system issues - Remove resolved/fixed sections that are no longer relevant - Replace (200, 200) ranges with simple 200 expected status in error handling - Remove redundant 'no 503 accepted' comments from all test files - Clean up unnecessary verbosity while maintaining test functionality Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
- Replace 'DAN role-play jailbreak' with clear 'Role-play jailbreak attempt' - Improve readability by removing technical jargon from failure descriptions Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Add DCO signoffs to 00, 01, 04, and 08 test files to complete DCO compliance for all files in the PR. Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
* metrics: Add request-level token histograms Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com> * add unknown const Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com> --------- Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>
Signed-off-by: cryo <zdtna412@gmail.com>
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io> Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>
Fix the copy command for tools directory in Dockerfile. Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>
* feat: add basic cache eviction policy: LRU/LFU/FIFO Signed-off-by: Alex Wang <yesterda9@gmail.com> * use EvictionPolicyType Signed-off-by: Alex Wang <yesterda9@gmail.com> * update doc Signed-off-by: Alex Wang <yesterda9@gmail.com> --------- Signed-off-by: Alex Wang <yesterda9@gmail.com> Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
Signed-off-by: bitliu <bitliu@tencent.com>
* infra: update Dockerfile.extproc Signed-off-by: yuluo-yx <yuluo08290126@gmail.com> * feat: add precommit container, make it easier to run precommit Signed-off-by: yuluo-yx <yuluo08290126@gmail.com> --------- Signed-off-by: yuluo-yx <yuluo08290126@gmail.com> Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>
Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>
Signed-off-by: cryo <zdtna412@gmail.com>
Remove 03-jailbreak-test.py, 08-metrics-test.py, and 09-error-handling-test.py to be implemented in separate PRs with full backend functionality. This keeps the current PR focused on passing tests for clean merge. Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
5ff3e56
to
2ab2523
Compare
Closing this PR due to DCO complications from mixed commits. Creating fresh PR with clean e2e tests. |
What type of PR is this?
test: improve e2e test suite reliability and system validation
What this PR does / why we need it:
Improves e2e test suite to properly validate core system components including semantic routing, jailbreak detection, PII policy enforcement, tool selection, and model selection. Tests now expose real
system issues instead of providing false positives.
Which issue(s) this PR fixes:
Fixes #
Release Notes:
No
Note that as backend, i was using ollama ( localhost:11434 ) and i was using OLLAMA_KEEP_ALIVE=0 to avoid models stay in memory.