
Add Support for LLM Agentic Model in VM Library#384

Merged
AnilSorathiya merged 15 commits into main from
anilsorathiya/sc-10863/add-support-for-llm-agentic-model-in-vm-library
Jul 18, 2025

Conversation

Contributor

@AnilSorathiya AnilSorathiya commented Jun 24, 2025

Pull Request Description

This PR provides the initial proof-of-concept (PoC) implementation for LLM agent model documentation support.

What and why?

This PR introduces the first proof-of-concept (PoC) implementation for LLM agent model documentation support. As this has become a common use case, adding native support is both timely and valuable for internal and external use cases.

How to test

Run the notebook notebooks/agents/langgraph_agent_demo.ipynb

What needs special review?

Dependencies, breaking changes, and deployment notes

Release notes

Checklist

  • What and why
  • Screenshots or videos (Frontend)
  • How to test
  • What needs special review
  • Dependencies, breaking changes, and deployment notes
  • Labels applied
  • PR linked to Shortcut
  • Unit tests added (Backend)
  • Tested locally
  • Documentation updated (if required)
  • Environment variable additions/changes documented (if required)

@AnilSorathiya AnilSorathiya added the internal Not to be externalized in the release notes label Jun 24, 2025
Contributor

cachafla commented Jul 4, 2025

Some feedback on the notebook:

Given that we want to demonstrate how to write and run VM tests that take an agent (an LLM with tools) as input, we should simplify the agent construction code, at least for an introductory notebook, so the user doesn't have to invest a lot of time understanding the agent code and can instead focus on how VM enables the testing. I think LangGraph with many different tools and manual routing could be overwhelming. One idea to simplify this:

Use LangChain with the following adjustments:

  • Use only 2 tools: a document search engine and a task assistant. No need to use that many tools.
  • Use the bind_tools interface that LangChain provides for function calling, instead of manually injecting tool context into the prompt and doing manual tool routing. That's an example of extra overhead we shouldn't distract the VM user with.

The rest of the code can stay the same, but if we don't use LangGraph for this demo then we won't need the example LangGraphVisualization test.

Contributor

cachafla commented Jul 4, 2025

Given this is initial exploratory work I'd suggest we don't add any init_agent() method until we have validated different use cases once we have more demo notebooks.

@AnilSorathiya AnilSorathiya requested a review from johnwalz97 July 11, 2025 13:28
Contributor

@johnwalz97 johnwalz97 left a comment


very nice!! lgtm

@johnwalz97 johnwalz97 requested review from hunner and nateshim and removed request for hunner and nateshim July 11, 2025 15:49
Contributor

@juanmleng juanmleng left a comment


Great work! Just left a couple of cosmetic comments.

Contributor

@cachafla cachafla left a comment


Looks good! Some notes to address before merging:

notebooks/agents/langchain_agent_simple_demo.ipynb

  • The Prepare Sample Test Dataset paragraph talks about multi-tool requests, but we don't have multi-tool examples in the dataset.
  • Some possible outputs have .pdf, .txt, and .doc file extensions. This seems to be breaking the tests, since we know those extensions don't exist in the search engine tool. Maybe we just need to remove the extensions?
  • The Tool Call Accuracy Test section seems to retain text from the previous, more complex version of the code: it mentions router intelligence, multi-tool handling, and other things the test isn't doing at all. Maybe we just need to simplify the text here.

@github-actions
Contributor

PR Summary

This PR introduces significant enhancements to the agent demonstration notebooks and testing frameworks. The key changes include:

  1. New Notebook Implementations for LangChain and LangGraph Agents

    • Two main types of agent demos have been added: one using LangChain with direct tool calling and one using LangGraph with an LLM-powered router. Both demonstrations provide detailed documentation on how the agents work, what tools they offer (such as search_engine and task_assistant for the simplified version, and a broader list including advanced_calculator, weather_service, etc. for the full LangGraph agent), and how they integrate with ValidMind for testing and monitoring.
    • The notebooks include instructions for setup, installation, and environment variable loading along with rich docstrings and inline examples.
  2. Integration with ValidMind and Comprehensive Testing

    • The agents are wrapped with a standardized interface (agent_fn) to integrate with ValidMind. This allows running validation tests that assess various performance metrics such as accuracy, tool call accuracy, and several RAGAS (Retrieval-Augmented Generation Assessment) dimensions including Faithfulness, Response Relevancy, Context Recall, and AspectCritic.
    • Test datasets are created using Pandas with unique session IDs and expected tools, and tests are run using ValidMind’s test runner, with adjustments to dataframe columns and visualization of workflow graphs.
  3. Utility Functions and Enhancements

    • New utility functions in the langchain_utils.py and utils.py modules extract and format tool outputs, capture metadata from agent responses, and display a summarized report. These functions aid in the debugging and evaluation of agent performance.
  4. Dependency Updates in Poetry Lock File

    • The poetry.lock file has been updated to newer versions of key libraries such as LangChain (upgraded from v0.2.x to v0.3.x), langchain-core, langchain-text-splitters, and langsmith among others. These updates may include minor breaking changes and improvements which need to be verified in integration tests.
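A minimal sketch of the agent_fn wrapper and test-dataset construction described in item 2 above, with hypothetical column names and a stubbed agent (the actual ValidMind calls appear only as comments, since their exact signatures aren't shown in this PR):

```python
import uuid
import pandas as pd

def agent_fn(prompt: str, session_id: str) -> dict:
    """Standardized agent interface; this stub stands in for the real agent."""
    # A real implementation would invoke the LangChain/LangGraph agent here,
    # threading session_id through for conversation state.
    return {"output": f"echo: {prompt}", "tools_used": ["search_engine"]}

# Test dataset with unique session IDs and the tools each prompt is
# expected to trigger (column names are illustrative).
df = pd.DataFrame({
    "input": ["Find the vacation policy", "Create a task to review the Q3 report"],
    "expected_tools": [["search_engine"], ["task_assistant"]],
})
df["session_id"] = [str(uuid.uuid4()) for _ in range(len(df))]

# With ValidMind, the dataset would then be registered and the tests run,
# roughly along these lines (indicative only, not the library's exact API):
# vm_ds = vm.init_dataset(dataset=df, input_id="agent_test_dataset")
# vm.tests.run_test("...Faithfulness", inputs={"dataset": vm_ds, ...})
```

Keeping agent_fn as a plain callable with a fixed signature is what lets the same test suite run against either the LangChain or the LangGraph agent.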

Overall, the PR combines functional improvements for agent demonstrations, robust testing integration with ValidMind, and dependency maintenance which collectively improve the codebase's quality, maintainability, and testability.

Test Suggestions

  • Run all ValidMind tests (accuracy, tool call accuracy, RAGAS tests including Faithfulness, Response Relevancy, Context Recall, and AspectCritic) to verify agent performance and output consistency.
  • Manually test the interactive notebooks to ensure that environments load correctly, the LLM tool bindings work as expected, and that printed outputs and visualizations (e.g., Mermaid diagrams) render properly.
  • Execute edge case tests for the advanced_calculator tool by providing malformed or potentially unsafe expressions to verify that the regex sanitization prevents code injection.
  • Perform integration tests across both LangChain and LangGraph agents to ensure that session management and state transitions work seamlessly.
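For the advanced_calculator edge-case suggestion, the kind of regex sanitization being tested might look like the following (a hypothetical sketch; the tool's actual implementation isn't shown in this PR):

```python
import re

# Allow only digits, basic arithmetic operators, parentheses, and whitespace.
SAFE_EXPR = re.compile(r"^[0-9+\-*/().\s]+$")

def advanced_calculator(expression: str) -> str:
    """Evaluate a simple arithmetic expression after sanitizing it."""
    if not SAFE_EXPR.fullmatch(expression):
        # Names, underscores, quotes, etc. never pass the whitelist,
        # so attempts like __import__('os') are rejected outright.
        return "Error: expression contains unsupported characters."
    try:
        # Empty builtins as a second layer of defense behind the regex.
        return str(eval(expression, {"__builtins__": {}}, {}))
    except Exception as exc:
        return f"Error: {exc}"
```

An edge-case test would feed malformed and malicious strings through this path and assert that they produce an error string rather than executing.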

@AnilSorathiya AnilSorathiya merged commit e20db06 into main Jul 18, 2025
8 checks passed
@AnilSorathiya AnilSorathiya deleted the anilsorathiya/sc-10863/add-support-for-llm-agentic-model-in-vm-library branch July 18, 2025 12:23