-
Notifications
You must be signed in to change notification settings - Fork 22
test(evals): expand eval_data Q&A coverage #279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
test(evals): expand eval_data Q&A coverage #279
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: ItzikEzra-rh The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @ItzikEzra-rh. Thanks for your PR. I'm waiting for a rh-ecosystem-edge member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
WalkthroughAdds many new evaluation conversation groups to Changes
Sequence Diagram(s)(No sequence diagrams generated — changes are primarily test data additions and a small utility script without substantial multi-component control flow requiring visualization.) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/lgtm |
|
@eliajahshan: changing LGTM is restricted to collaborators DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
- Add empty lines between conversation_groups for consistency - Create multi-turn conversation for cluster creation + follow-up queries - Update ask_about_platform_none with VIP differences question - Add list_versions_for_env follow-up query - Add verify_latest_version.sh script for version validation
- Add empty lines between conversation_groups for consistency - Create multi-turn conversation for cluster creation + follow-up queries - Update ask_about_platform_none with VIP differences question - Add list_versions_for_env follow-up query - Add verify_latest_version.sh script for version validation
50b8b49 to
631ee14
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@test/evals/eval_data.yaml`:
- Around line 749-752: The test's eval_query contains an incorrectly formatted
UUID (last segment has 10 hex chars) which can make the test fail for format
validation instead of intent handling; update the cluster ID in the eval_query
string to a syntactically valid UUID (e.g., change
12345678-1234-1234-1234-1234567890 to a 36-char UUID such as
12345678-1234-1234-1234-1234567890ab) so the test exercises the intended
non-existent-cluster boundary case rather than a malformed-ID error.
🧹 Nitpick comments (2)
test/evals/eval_data.yaml (2)
579-593: Consider adding a verification script for version filtering.The test expects the response to list versions "compatible with the s390x CPU architecture," but there's no verification script to confirm the returned versions actually support s390x. A past review comment suggested adding validation to check that the version list fits the CPU architecture.
595-608: Consider integratingverify_latest_version.shfor validation.The PR adds
test/scripts/verify_latest_version.shwhich appears designed to support this test case. However, the test doesn't useeval_verify_scriptto validate that the returned version is actually the latest stable version. Consider adding:eval_verify_script: ../scripts/verify_latest_version.shThis would ensure the response is validated against the actual latest stable version from the API, rather than relying solely on intent matching.
| eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-1234567890? | ||
| eval_types: | ||
| - response_eval:intent | ||
| expected_intent: Declining the request and explaining that the Assisted Installer is used for initial cluster deployment (Day 1) and cannot be used to modify a running cluster (Day 2 operations). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verify cluster ID format in test query.
The cluster ID 12345678-1234-1234-1234-1234567890 in the boundary test query is missing a digit (should be 12 hex characters in the last segment, but has only 10). While this is intentionally testing a boundary case with a non-existent cluster, an invalid UUID format might cause the test to fail for the wrong reason.
Suggested fix
- eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-1234567890?
+ eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-123456789012?📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-1234567890? | |
| eval_types: | |
| - response_eval:intent | |
| expected_intent: Declining the request and explaining that the Assisted Installer is used for initial cluster deployment (Day 1) and cannot be used to modify a running cluster (Day 2 operations). | |
| eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-123456789012? | |
| eval_types: | |
| - response_eval:intent | |
| expected_intent: Declining the request and explaining that the Assisted Installer is used for initial cluster deployment (Day 1) and cannot be used to modify a running cluster (Day 2 operations). |
🤖 Prompt for AI Agents
In `@test/evals/eval_data.yaml` around lines 749 - 752, The test's eval_query
contains an incorrectly formatted UUID (last segment has 10 hex chars) which can
make the test fail for format validation instead of intent handling; update the
cluster ID in the eval_query string to a syntactically valid UUID (e.g., change
12345678-1234-1234-1234-1234567890 to a 36-char UUID such as
12345678-1234-1234-1234-1234567890ab) so the test exercises the intended
non-existent-cluster boundary case rather than a malformed-ID error.
- Add empty lines between conversation_groups for consistency - Create multi-turn conversation for cluster creation + follow-up queries - Update ask_about_platform_none with VIP differences question - Add list_versions_for_env follow-up query - Add verify_latest_version.sh script for version validation
631ee14 to
11706f5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@test/scripts/verify_latest_version.sh`:
- Around line 12-14: The fetch_versions function uses curl without timeouts,
which can hang CI; update the curl invocation in fetch_versions to add
connection and total timeouts (e.g., --connect-timeout and --max-time) and a
small retry (e.g., --retry and --retry-delay) so network stalls fail fast and
deterministically when calling "${ASSISTED_SERVICE_URL}/openshift-versions" with
Authorization: Bearer ${OCM_TOKEN}; keep the same output piping to jq and
existing empty-response handling.
| fetch_versions() { | ||
| curl -sSf -H "Authorization: Bearer ${OCM_TOKEN}" \ | ||
| "${ASSISTED_SERVICE_URL}/openshift-versions" | jq -r 'keys[]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's find and examine the file in question
find . -name "verify_latest_version.sh" -type f 2>/dev/null | head -5Repository: rh-ecosystem-edge/assisted-chat
Length of output: 114
🏁 Script executed:
# Get the actual content around the mentioned lines
if [ -f "test/scripts/verify_latest_version.sh" ]; then
cat -n "test/scripts/verify_latest_version.sh" | head -30
fiRepository: rh-ecosystem-edge/assisted-chat
Length of output: 1067
🏁 Script executed:
# Search for other curl usages in test scripts to see if there's a pattern
rg "curl" test/scripts/ -n --context 2 | head -50Repository: rh-ecosystem-edge/assisted-chat
Length of output: 3343
🏁 Script executed:
# Check if this script is used in CI/CD pipelines
find . -name "*.yml" -o -name "*.yaml" -o -name ".github" -type d 2>/dev/null | xargs grep -l "verify_latest_version" 2>/dev/null || echo "No CI references found"Repository: rh-ecosystem-edge/assisted-chat
Length of output: 97
🏁 Script executed:
# Look for any timeout configurations in similar test scripts
rg "(timeout|--connect-timeout|--max-time)" test/scripts/ -n || echo "No timeouts found in test scripts"Repository: rh-ecosystem-edge/assisted-chat
Length of output: 108
🏁 Script executed:
# Check the complete verify_latest_version.sh file
cat -n "test/scripts/verify_latest_version.sh"Repository: rh-ecosystem-edge/assisted-chat
Length of output: 1790
🏁 Script executed:
# Look for CI/CD configuration files
find . -type d \( -name ".github" -o -name ".gitlab-ci" \) 2>/dev/null
ls -la | grep -E "\.(yml|yaml)$|\.github"Repository: rh-ecosystem-edge/assisted-chat
Length of output: 213
🏁 Script executed:
# Check if there's a Makefile or similar with timeout settings
find . -maxdepth 2 -name "Makefile" -o -name "*.mk" | xargs cat 2>/dev/null | head -50Repository: rh-ecosystem-edge/assisted-chat
Length of output: 2116
🏁 Script executed:
# Look at common.sh to see if there are any utility functions for timeouts
cat -n "test/scripts/common.sh" | head -60Repository: rh-ecosystem-edge/assisted-chat
Length of output: 2538
Add curl timeouts to prevent CI hangs.
A stalled network call (DNS timeout, connection hang, or slow/unresponsive server) would block the job indefinitely. The current curl lacks timeout protections, and error handling on lines 34-37 only catches empty responses, not hangs. Add connect and max-time timeouts with a small retry to make failures deterministic and improve test reliability.
🔧 Suggested change
fetch_versions() {
- curl -sSf -H "Authorization: Bearer ${OCM_TOKEN}" \
+ curl -sSf --connect-timeout 10 --max-time 30 --retry 3 --retry-delay 2 \
+ -H "Authorization: Bearer ${OCM_TOKEN}" \
"${ASSISTED_SERVICE_URL}/openshift-versions" | jq -r 'keys[]'
}🤖 Prompt for AI Agents
In `@test/scripts/verify_latest_version.sh` around lines 12 - 14, The
fetch_versions function uses curl without timeouts, which can hang CI; update
the curl invocation in fetch_versions to add connection and total timeouts
(e.g., --connect-timeout and --max-time) and a small retry (e.g., --retry and
--retry-delay) so network stalls fail fast and deterministically when calling
"${ASSISTED_SERVICE_URL}/openshift-versions" with Authorization: Bearer
${OCM_TOKEN}; keep the same output piping to jq and existing empty-response
handling.
- Add empty lines between conversation_groups for consistency - Create multi-turn conversation for cluster creation + follow-up queries - Update ask_about_platform_none with VIP differences question - Add list_versions_for_env follow-up query - Add verify_latest_version.sh script for version validation
11706f5 to
f9f69a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@test/evals/eval_data.yaml`:
- Around line 552-559: The YAML has incorrect indentation causing `conversation`
and its nested items (`- eval_id`, `eval_query`, `eval_types`,
`expected_intent`) to be parsed as top-level entries; fix by indenting the list
under `conversation:` and the `eval_types:` list items so they are children of
the `conversation` mapping (e.g., make `- eval_id: add_operator_virtualization`
and its `eval_query`, `eval_types` and `expected_intent` lines indented one
level under `conversation:`), and apply the same indentation pattern to all
subsequent blocks that follow this structure.
♻️ Duplicate comments (1)
test/evals/eval_data.yaml (1)
749-751: Malformed UUID in boundary test query.The cluster ID ends with only 10 hex characters, so it may fail format validation rather than the intended boundary behavior.
Suggested fix
- eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-1234567890? + eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-1234567890ab?
| - conversation_group: add_operator_virtualization | ||
| description: Tests adding the OpenShift Virtualization operator. | ||
| conversation: | ||
| - eval_id: add_operator_virtualization | ||
| eval_query: I want to run VMs on my cluster. How do I enable that? | ||
| eval_types: | ||
| - response_eval:intent | ||
| expected_intent: Recommending the OpenShift Virtualization operator and offering to add it to a cluster if the cluster ID is provided. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix YAML indentation for conversation and nested lists (currently parsed as top-level).
As written, the list item under conversation: and the eval_types list are not indented, which makes the YAML invalid for this group and likely many of the following groups. Please indent list items under their parent keys and apply the same fix to the subsequent newly added blocks that follow this pattern.
✅ Proposed fix (apply same pattern to following groups)
- conversation:
- - eval_id: add_operator_virtualization
- eval_query: I want to run VMs on my cluster. How do I enable that?
- eval_types:
- - response_eval:intent
- expected_intent: Recommending the OpenShift Virtualization operator and offering to add it to a cluster if the cluster ID is provided.
+ conversation:
+ - eval_id: add_operator_virtualization
+ eval_query: I want to run VMs on my cluster. How do I enable that?
+ eval_types:
+ - response_eval:intent
+ expected_intent: Recommending the OpenShift Virtualization operator and offering to add it to a cluster if the cluster ID is provided.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - conversation_group: add_operator_virtualization | |
| description: Tests adding the OpenShift Virtualization operator. | |
| conversation: | |
| - eval_id: add_operator_virtualization | |
| eval_query: I want to run VMs on my cluster. How do I enable that? | |
| eval_types: | |
| - response_eval:intent | |
| expected_intent: Recommending the OpenShift Virtualization operator and offering to add it to a cluster if the cluster ID is provided. | |
| - conversation_group: add_operator_virtualization | |
| description: Tests adding the OpenShift Virtualization operator. | |
| conversation: | |
| - eval_id: add_operator_virtualization | |
| eval_query: I want to run VMs on my cluster. How do I enable that? | |
| eval_types: | |
| - response_eval:intent | |
| expected_intent: Recommending the OpenShift Virtualization operator and offering to add it to a cluster if the cluster ID is provided. |
🤖 Prompt for AI Agents
In `@test/evals/eval_data.yaml` around lines 552 - 559, The YAML has incorrect
indentation causing `conversation` and its nested items (`- eval_id`,
`eval_query`, `eval_types`, `expected_intent`) to be parsed as top-level
entries; fix by indenting the list under `conversation:` and the `eval_types:`
list items so they are children of the `conversation` mapping (e.g., make `-
eval_id: add_operator_virtualization` and its `eval_query`, `eval_types` and
`expected_intent` lines indented one level under `conversation:`), and apply the
same indentation pattern to all subsequent blocks that follow this structure.
|
@ItzikEzra-rh: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Added 51 evaluation queries across 47 conversation groups.
In five local runs, the suite achieved a 96–100% pass rate.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.