@AnilSorathiya
Contributor

@AnilSorathiya AnilSorathiya commented Jan 27, 2026

Pull Request Description

What and why?

We don't have a hard dependency on the langgraph library; only this notebook uses it, to build an agent.

  • Pin the langgraph version in the notebook's install command.
  • Change the system prompt to generate output of at least 500 words. This helps the faithfulness test break the output into sentences and analyse them.
  • Remove the toxicity test from the notebook.
  • Slim down the test dataset to reduce test execution time.

How to test

Run the notebook

What needs special review?

Dependencies, breaking changes, and deployment notes

Release notes

Checklist

  • What and why
  • Screenshots or videos (Frontend)
  • How to test
  • What needs special review
  • Dependencies, breaking changes, and deployment notes
  • Labels applied
  • PR linked to Shortcut
  • Unit tests added (Backend)
  • Tested locally
  • Documentation updated (if required)
  • Environment variable additions/changes documented (if required)

@AnilSorathiya AnilSorathiya added bug Something isn't working internal Not to be externalized in the release notes labels Jan 27, 2026
@cachafla
Contributor

Merging this PR should fix the dependency failures: #470

@validbeck
Collaborator

validbeck commented Jan 28, 2026

@AnilSorathiya Before you merge, I think the "AI Agent Evaluation Metrics > Execution Layer" is missing the scorer/score assignment for StepEfficiencyMetric — can you please add an example?

Edit:

I see we're actually missing that test entirely. I've added it (and a section for the metric score in the working new version of the notebook) in my WIP branch here:

When I have the PR up for review someone will need to confirm the test does what we want it to do as I'm not the expert there.

@github-actions
Contributor

PR Summary

This PR introduces several changes aimed at refining the banking-related notebooks and test datasets, along with minor code style improvements in the test loading module. Notable changes include:

  1. In the banking test dataset file, one credit risk test case and several fraud detection test cases were removed. This streamlines the dataset to focus on core account management and credit risk scenarios.

  2. Within the banking demo notebook, the installation command now pins a fixed version of the 'langgraph' dependency, ensuring compatibility and predictability. The instructions within a code cell have been expanded to request responses that are detailed and user-friendly and that include a concise execution plan. This change is intended to make the banking assistance agent produce more comprehensive output.

  3. The data handling in the notebook has been updated to use the complete dataset instead of a sampled subset, which may make tests and performance evaluation more consistent.

  4. Formatting changes across the notebooks adjust how data frames are displayed (e.g., showing vm_test_dataset._df instead of vm_test_dataset._df.head()) so that the full underlying data is visible.

  5. In the tests loading file, the function signature in _get_test_function_from_provider was reformatted and the error message was consolidated into one line for improved readability.

The overall functional changes focus on refining the test cases, clarifying output expectations of the LLM-driven banking responses, and ensuring that dependency versions are strictly managed for predictable behavior during agent execution.

Test Suggestions

  • Run the notebook cells to confirm that all cells execute successfully, particularly after the dependency version change and dataset modifications.
  • Verify that the removal of extra test cases from the banking dataset does not affect other modules dependent on the dataset.
  • Write unit tests for the _get_test_function_from_provider function to ensure that the correct exception is raised when a test provider is not found.
  • Test the LLM output to ensure that the newly added instructions (detailed explanation, 500+ words, concise execution plan) are adhered to in the generated responses.
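The last test suggestion can be checked mechanically. Below is a minimal sketch, assuming the 500-word target in the updated system prompt is a lower bound and that "words" means whitespace-separated tokens; `word_count` and `meets_min_word_count` are hypothetical helpers, not part of the notebook or the ValidMind library.

```python
MIN_WORDS = 500  # assumption: lower bound taken from the updated system prompt


def word_count(text: str) -> int:
    """Count whitespace-separated tokens in a response."""
    return len(text.split())


def meets_min_word_count(response: str, min_words: int = MIN_WORDS) -> bool:
    """Hypothetical helper: True if the response has at least `min_words` words."""
    return word_count(response) >= min_words


if __name__ == "__main__":
    long_response = "word " * 520
    short_response = "Your account balance is available in the app."
    print(meets_min_word_count(long_response))   # True
    print(meets_min_word_count(short_response))  # False
```

Whitespace splitting is only a rough proxy for word count; it is enough to flag agent responses that clearly ignore the length instruction.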

@AnilSorathiya
Contributor Author

AnilSorathiya commented Jan 28, 2026

@validbeck The StepEfficiency metric has been removed from our codebase because of an underlying bug in deepeval.
The latest version of vm-lib doesn't include the StepEfficiency test file. I have also cleaned up the descriptions in the notebooks that I had missed.

@AnilSorathiya AnilSorathiya merged commit 031c591 into main Jan 28, 2026
17 checks passed
@AnilSorathiya AnilSorathiya deleted the anilsorathiya/sc-14175/fix-langgraph-version-in-agentic-demo-notebook branch January 28, 2026 12:23