
Add Support for LLM Agentic Model in VM Library#384

Merged
AnilSorathiya merged 15 commits into main from
anilsorathiya/sc-10863/add-support-for-llm-agentic-model-in-vm-library
Jul 18, 2025

Conversation

Contributor

@AnilSorathiya AnilSorathiya commented Jun 24, 2025

Pull Request Description

This PR provides the initial proof-of-concept (PoC) implementation for LLM agent model documentation support.

What and why?

This PR introduces the first proof-of-concept (PoC) implementation for LLM agent model documentation support. As this has become a common use case, adding native support is both timely and valuable for internal and external use cases.

How to test

Run the notebook notebooks/agents/langgraph_agent_demo.ipynb

What needs special review?

Dependencies, breaking changes, and deployment notes

Release notes

Checklist

  • What and why
  • Screenshots or videos (Frontend)
  • How to test
  • What needs special review
  • Dependencies, breaking changes, and deployment notes
  • Labels applied
  • PR linked to Shortcut
  • Unit tests added (Backend)
  • Tested locally
  • Documentation updated (if required)
  • Environment variable additions/changes documented (if required)

@AnilSorathiya AnilSorathiya added the internal Not to be externalized in the release notes label Jun 24, 2025
Contributor

cachafla commented Jul 4, 2025

Some feedback on the notebook:

Given that we want to demonstrate how to write and run VM tests that take an agent (an LLM with tools) as input, we should simplify the agent construction code, at least for an introductory notebook, so the user doesn't have to invest a lot of time understanding the agent code and can instead focus on how VM enables the testing. I think LangGraph with many different tools and manual routing could be overwhelming. One idea to simplify this:

Use LangChain with the following adjustments:

  • Use only 2 tools: a document search engine and a task assistant. No need to use that many tools.
  • Use the bind_tools interface that LangChain provides for function calling, instead of manually injecting tool context into the prompt and doing manual tool routing. That's an example of extra overhead we shouldn't distract the VM user with.

The rest of the code can stay the same, but if we don't use LangGraph for this demo then we won't need the example LangGraphVisualization test.

Contributor

cachafla commented Jul 4, 2025

Given this is initial exploratory work I'd suggest we don't add any init_agent() method until we have validated different use cases once we have more demo notebooks.

@AnilSorathiya AnilSorathiya requested a review from johnwalz97 July 11, 2025 13:28
Contributor

@johnwalz97 johnwalz97 left a comment


very nice!! lgtm

@johnwalz97 johnwalz97 requested review from hunner and nateshim and removed request for hunner and nateshim July 11, 2025 15:49
Contributor

@juanmleng juanmleng left a comment


Great work! Just left a couple of cosmetic comments.

Contributor

@cachafla cachafla left a comment


Looks good! Some notes to address before merging:

notebooks/agents/langchain_agent_simple_demo.ipynb

  • The Prepare Sample Test Dataset paragraph talks about multi-tool requests, but we don't have multi-tool examples in the dataset.
  • Some possible outputs have .pdf, .txt, and .doc file extensions. This seems to be breaking the tests, since we know those extensions don't exist in the search engine tool. Maybe we just need to remove the extensions?
  • The Tool Call Accuracy Test section seems to retain text from the previous, more complex version of the code: it mentions router intelligence, multi-tool handling, and other things the test isn't doing at all. Maybe we just need to simplify the text here.

@github-actions
Contributor

PR Summary

This PR introduces significant enhancements to the agent demonstration notebooks and testing frameworks. The key changes include:

  1. New Notebook Implementations for LangChain and LangGraph Agents

    • Two main types of agent demos have been added: one using LangChain with direct tool calling and one using LangGraph with an LLM-powered router. Both demonstrations provide detailed documentation on how the agents work, what tools they offer (such as search_engine and task_assistant for the simplified version, and a broader list including advanced_calculator, weather_service, etc. for the full LangGraph agent), and how they integrate with ValidMind for testing and monitoring.
    • The notebooks include instructions for setup, installation, and environment variable loading along with rich docstrings and inline examples.
  2. Integration with ValidMind and Comprehensive Testing

    • The agents are wrapped with a standardized interface (agent_fn) to integrate with ValidMind. This allows running validation tests that assess various performance metrics such as accuracy, tool call accuracy, and several RAGAS (Retrieval-Augmented Generation Assessment) dimensions including Faithfulness, Response Relevancy, Context Recall, and AspectCritic.
    • Test datasets are created using Pandas with unique session IDs and expected tools, and tests are run using ValidMind’s test runner, with adjustments to dataframe columns and visualization of workflow graphs.
  3. Utility Functions and Enhancements

    • New utility functions in the langchain_utils.py and utils.py modules extract and format tool outputs, capture metadata from agent responses, and display a summarized report. These functions aid in the debugging and evaluation of agent performance.
  4. Dependency Updates in Poetry Lock File

    • The poetry.lock file has been updated to newer versions of key libraries such as LangChain (upgraded from v0.2.x to v0.3.x), langchain-core, langchain-text-splitters, and langsmith among others. These updates may include minor breaking changes and improvements which need to be verified in integration tests.
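A minimal sketch of the agent_fn wrapper and test-dataset construction described in item 2 above, with hypothetical column names and a stubbed agent (the actual ValidMind calls appear only as comments, since their exact signatures aren't shown in this PR):

```python
import uuid
import pandas as pd

def agent_fn(prompt: str, session_id: str) -> dict:
    """Standardized agent interface; this stub stands in for the real agent."""
    # A real implementation would invoke the LangChain/LangGraph agent here,
    # threading session_id through for conversation state.
    return {"output": f"echo: {prompt}", "tools_used": ["search_engine"]}

# Test dataset with unique session IDs and the tools each prompt is
# expected to trigger (column names are illustrative).
df = pd.DataFrame({
    "input": ["Find the vacation policy", "Create a task to review the Q3 report"],
    "expected_tools": [["search_engine"], ["task_assistant"]],
})
df["session_id"] = [str(uuid.uuid4()) for _ in range(len(df))]

# With ValidMind, the dataset would then be registered and the tests run,
# roughly along these lines (indicative only, not the library's exact API):
# vm_ds = vm.init_dataset(dataset=df, input_id="agent_test_dataset")
# vm.tests.run_test("...Faithfulness", inputs={"dataset": vm_ds, ...})
```

Keeping agent_fn as a plain callable with a fixed signature is what lets the same test suite run against either the LangChain or the LangGraph agent.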

Overall, the PR combines functional improvements for agent demonstrations, robust testing integration with ValidMind, and dependency maintenance which collectively improve the codebase's quality, maintainability, and testability.

Test Suggestions

  • Run all ValidMind tests (accuracy, tool call accuracy, RAGAS tests including Faithfulness, Response Relevancy, Context Recall, and AspectCritic) to verify agent performance and output consistency.
  • Manually test the interactive notebooks to ensure that environments load correctly, the LLM tool bindings work as expected, and that printed outputs and visualizations (e.g., Mermaid diagrams) render properly.
  • Execute edge case tests for the advanced_calculator tool by providing malformed or potentially unsafe expressions to verify that the regex sanitization prevents code injection.
  • Perform integration tests across both LangChain and LangGraph agents to ensure that session management and state transitions work seamlessly.
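For the advanced_calculator edge-case suggestion, the kind of regex sanitization being tested might look like the following (a hypothetical sketch; the tool's actual implementation isn't shown in this PR):

```python
import re

# Allow only digits, basic arithmetic operators, parentheses, and whitespace.
SAFE_EXPR = re.compile(r"^[0-9+\-*/().\s]+$")

def advanced_calculator(expression: str) -> str:
    """Evaluate a simple arithmetic expression after sanitizing it."""
    if not SAFE_EXPR.fullmatch(expression):
        # Names, underscores, quotes, etc. never pass the whitelist,
        # so attempts like __import__('os') are rejected outright.
        return "Error: expression contains unsupported characters."
    try:
        # Empty builtins as a second layer of defense behind the regex.
        return str(eval(expression, {"__builtins__": {}}, {}))
    except Exception as exc:
        return f"Error: {exc}"
```

An edge-case test would feed malformed and malicious strings through this path and assert that they produce an error string rather than executing.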

@AnilSorathiya AnilSorathiya merged commit e20db06 into main Jul 18, 2025
8 checks passed
@AnilSorathiya AnilSorathiya deleted the anilsorathiya/sc-10863/add-support-for-llm-agentic-model-in-vm-library branch July 18, 2025 12:23