Merged
Changes from all commits
68 commits
1b3f67a
support agent use case
AnilSorathiya Jun 24, 2025
723fcab
wrapper function for agent
AnilSorathiya Jun 24, 2025
28d9fbb
ragas metrics
AnilSorathiya Jun 30, 2025
ecf8e09
update ragas metrics
AnilSorathiya Jun 30, 2025
53e8879
fix lint error
AnilSorathiya Jun 30, 2025
1662368
create helper functions
AnilSorathiya Jul 1, 2025
cc84cbc
Merge branch 'main' into anilsorathiya/sc-10863/add-support-for-llm-a…
AnilSorathiya Jul 2, 2025
6f09780
delete old notebook
AnilSorathiya Jul 2, 2025
0bb731e
update description for each section
AnilSorathiya Jul 2, 2025
e758979
simplify agent
AnilSorathiya Jul 9, 2025
7c35cfe
simple demo notebook using langchain agent
AnilSorathiya Jul 10, 2025
9bb70e9
Update description of the simplified langgraph agent demo notebook
AnilSorathiya Jul 10, 2025
894d52a
add brief description to tests
AnilSorathiya Jul 14, 2025
d86a9af
add brief description to tests
AnilSorathiya Jul 14, 2025
884000f
Allow dict return type predict_fn
AnilSorathiya Jul 17, 2025
fbd5aa9
update notebook and refactor utils
AnilSorathiya Jul 18, 2025
daceabf
lint fix
AnilSorathiya Jul 18, 2025
5f8823a
Merge branch 'main' into anilsorathiya/sc-11324/extend-the-predict-fn…
AnilSorathiya Jul 18, 2025
70a5636
fix the test failure
AnilSorathiya Jul 18, 2025
33b06fb
new unit tests for multiple columns return in assign_predictions
AnilSorathiya Jul 18, 2025
8e12bd2
update notebooks to return multiple values in predict_fn
AnilSorathiya Jul 18, 2025
e38929d
general plotting and stats tests
AnilSorathiya Jul 23, 2025
e900a65
clear output
AnilSorathiya Jul 23, 2025
a08e881
Merge branch 'main' into anilsorathiya/sc-11380/add-generlize-plots-a…
AnilSorathiya Jul 24, 2025
16f4700
remove duplicate tests
AnilSorathiya Jul 24, 2025
bb9f9af
update notebook
AnilSorathiya Jul 24, 2025
5078a7a
Integration between deepeval and validmind
AnilSorathiya Jul 25, 2025
2eb6abb
Merge branch 'main' into anilsorathiya/sc-11452/support-for-the-deepe…
AnilSorathiya Aug 12, 2025
ad0b719
add MetricValues class for metric return type
AnilSorathiya Aug 15, 2025
94ca006
Return MetricValues in the unit tests
AnilSorathiya Aug 15, 2025
c4c885a
update all the unit metric tests
AnilSorathiya Aug 15, 2025
a1f3220
add unit tests for MetricValues class
AnilSorathiya Aug 15, 2025
1a7d0b6
update result to support MetricValues for unit metric tests
AnilSorathiya Aug 15, 2025
1d785ba
add copyright statement
AnilSorathiya Aug 15, 2025
271e85b
add deepeval lib as an extra dependency
AnilSorathiya Aug 15, 2025
f806fc6
fix the error
AnilSorathiya Aug 15, 2025
61c7ef6
demo draft change
AnilSorathiya Aug 18, 2025
b646d0b
demo draft change
AnilSorathiya Aug 18, 2025
dda4ced
fix api issue
AnilSorathiya Aug 18, 2025
dd8e0df
Merge branch 'main' into anilsorathiya/sc-11452/support-for-the-deepe…
AnilSorathiya Aug 21, 2025
81249c2
separate unit metrics and row metrics
AnilSorathiya Aug 22, 2025
794a322
draft notebook
AnilSorathiya Aug 22, 2025
a27bc48
Merge branch 'main' into anilsorathiya/sc-11452/support-for-the-deepe…
AnilSorathiya Aug 22, 2025
84dfa2f
update assign_score notebook
AnilSorathiya Aug 22, 2025
7aa2acc
update assign score notebook
AnilSorathiya Sep 1, 2025
247eacc
rename notebook
AnilSorathiya Sep 1, 2025
394c57c
update deepeval and VM integration notebook
AnilSorathiya Sep 1, 2025
a2ca13c
Merge branch 'main' into anilsorathiya/sc-11452/support-for-the-deepe…
AnilSorathiya Sep 4, 2025
5ebe51f
rename row metrics to scorer
AnilSorathiya Sep 4, 2025
15df53b
add scorer decorator
AnilSorathiya Sep 4, 2025
e28ba37
remove UnitMetricValue and RowMetricValues as they are not needed any…
AnilSorathiya Sep 4, 2025
d8a48c8
remove MetricValue class
AnilSorathiya Sep 5, 2025
d425576
support complex output for scorer
AnilSorathiya Sep 5, 2025
9c7e7e9
remove simple testcases
AnilSorathiya Sep 9, 2025
bbd6cd4
fix the list_scorers
AnilSorathiya Sep 9, 2025
c7b83f3
update notebook
AnilSorathiya Sep 9, 2025
a33f2a4
remove circular dependency of load_test
AnilSorathiya Sep 9, 2025
30c3abc
remove circular dependency of load_test
AnilSorathiya Sep 9, 2025
e91e6e4
move the AnswerRelevancy scorer in deepeval namespace
AnilSorathiya Sep 9, 2025
a284cd1
unit metric can return int and float only
AnilSorathiya Sep 9, 2025
1ec1c75
update notebook
AnilSorathiya Sep 9, 2025
427ddf5
fix lint error
AnilSorathiya Sep 9, 2025
917831c
remove scores listing from list_tests interface
AnilSorathiya Sep 10, 2025
58b3bde
add custom scorer support
AnilSorathiya Sep 10, 2025
cb52104
full path required to run scorer
AnilSorathiya Sep 11, 2025
36f2f96
remove circular dependency
AnilSorathiya Sep 11, 2025
439bd1d
make model parameter optional in the assign_scores function
AnilSorathiya Sep 11, 2025
66dde16
fix lint error
AnilSorathiya Sep 11, 2025
963 changes: 963 additions & 0 deletions notebooks/code_sharing/deepeval_integration_demo.ipynb
Contributor:
With the new assign_scores interface, is it still required to provide a predict_fn? This notebook has this:

def agent_fn(input):
    """
    Invoke the simplified agent with the given input.
    """

    return 1.23


vm_model = vm.init_model(
    predict_fn=agent_fn,
    input_id="test_model",
)
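
For context, a rough sketch of what skipping predict_fn might look like, assuming the assign_scores interface added in this PR works roughly like assign_predictions; the dataset setup, the full scorer path, and the no-model call below are guesses pieced together from the commit messages, not the actual implementation:

import pandas as pd
import validmind as vm

# Hypothetical usage only — assign_scores, the optional model argument, and the
# full scorer path are assumptions based on the commit history, not the real API.
eval_df = pd.DataFrame({"input": ["What is KYC?"], "output": ["Know Your Customer checks."]})
vm_dataset = vm.init_dataset(dataset=eval_df, input_id="eval_dataset")

# If the model parameter is now optional, a scorer could presumably run
# directly against existing dataset columns, with no predict_fn at all:
vm_dataset.assign_scores("validmind.scorer.deepeval.AnswerRelevancy")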

Contributor:
Found a small issue in this section:

# Initialize ValidMind
print("Integrating with ValidMind framework...")

try:
    # Initialize ValidMind
    vm.init()
    print("ValidMind initialized")

Error:

ERROR: ValidMind integration failed: Model ID must be provided either as an environment variable or as an argument to init.
Note: Some ValidMind features may require additional setup
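
A possible fix, assuming the model ID can be supplied either through an environment variable or as an argument to init, as the error message suggests; the VM_API_MODEL name and the model parameter below are assumptions, so the library docs should be checked for the exact names:

import os
import validmind as vm

# Assumed names — the error only says a model ID must come from an
# environment variable or an init argument.
os.environ.setdefault("VM_API_MODEL", "<your-model-id>")
vm.init()

# or, equivalently, pass it explicitly:
# vm.init(model="<your-model-id>", api_key="...", api_secret="...")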

Contributor:
Why is the Custom Metrics with G-Eval section not running any tests? If that's intentional, we should clarify for the user what that section is meant to demonstrate.
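
If the section is meant to stay, it could run at least one metric end to end, for example along the lines of deepeval's documented GEval usage (the criteria and inputs below are only illustrative, and an OpenAI key is assumed):

from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# Define a custom criteria-based metric; GEval scores it with an LLM judge.
correctness = GEval(
    name="Correctness",
    criteria="Is the actual output factually consistent with the expected output?",
    evaluation_params=[
        LLMTestCaseParams.ACTUAL_OUTPUT,
        LLMTestCaseParams.EXPECTED_OUTPUT,
    ],
)

test_case = LLMTestCase(
    input="What does KYC stand for?",
    actual_output="Know Your Customer.",
    expected_output="KYC stands for Know Your Customer.",
)

correctness.measure(test_case)
print(correctness.score, correctness.reason)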

Large diffs are not rendered by default.


1,230 changes: 996 additions & 234 deletions poetry.lock

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion pyproject.toml
@@ -27,7 +27,7 @@ dependencies = [
"scikit-learn",
"seaborn",
"sentry-sdk (>=1.24.0,<2.0.0)",
"tabulate (>=0.8.9,<0.9.0)",
"tabulate (>=0.9.0,<0.10.0)",
"tiktoken",
"tqdm",
"anywidget",
@@ -66,6 +66,7 @@ llm = [
"ragas (>=0.2.3,<=0.2.7)",
"sentencepiece (>=0.2.0,<0.3.0)",
"langchain-openai (>=0.1.8)",
"deepeval (>3.3.9)",
]
nlp = [
"langdetect",
412 changes: 351 additions & 61 deletions tests/test_dataset.py

Large diffs are not rendered by default.

89 changes: 81 additions & 8 deletions tests/test_results.py
@@ -1,38 +1,36 @@
import asyncio
import json
import unittest
from unittest.mock import MagicMock, Mock, patch
from unittest.mock import patch
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objs as go
from ipywidgets import HTML, VBox

from validmind.vm_models.result import (
Result,
TestResult,
ErrorResult,
TextGenerationResult,
ResultTable,
RawData,
)

from validmind.vm_models.figure import Figure
from validmind.errors import InvalidParameterError

loop = asyncio.new_event_loop()


class MockAsyncResponse:
def __init__(self, status, text=None, json=None):
def __init__(self, status, text=None, json_data=None):
self.status = status
self.status_code = status
self._text = text
self._json = json
self._json_data = json_data

async def text(self):
return self._text

async def json(self):
return self._json
return self._json_data

async def __aexit__(self, exc_type, exc, tb):
pass
@@ -50,7 +48,7 @@ def run_async(self, func, *args, **kwargs):

def test_raw_data_initialization(self):
"""Test RawData initialization and methods"""
raw_data = RawData(log=True, dataset_duplicates=pd.DataFrame({"col1": [1, 2]}))
raw_data = RawData(log=True, dataset_duplicates=pd.DataFrame({'col1': [1, 2]}))

self.assertTrue(raw_data.log)
self.assertIsInstance(raw_data.dataset_duplicates, pd.DataFrame)
@@ -238,6 +236,81 @@ async def test_metadata_update_content_id_handling(self, mock_update_metadata):
content_id="test_description:test_1::ai", text="Test description"
)

def test_test_result_metric_values_integration(self):
"""Test metric values integration with TestResult"""
test_result = TestResult(result_id="test_metric_values")

# Test setting metric with scalar using set_metric
test_result.set_metric(0.85)
self.assertEqual(test_result.metric, 0.85)
self.assertIsNone(test_result.scorer)
self.assertEqual(test_result._get_metric_display_value(), 0.85)
self.assertEqual(test_result._get_metric_serialized_value(), 0.85)

# Test setting metric with list using set_metric
test_result.set_metric([0.1, 0.2, 0.3])
self.assertEqual(test_result.scorer, [0.1, 0.2, 0.3])
self.assertIsNone(test_result.metric)
self.assertEqual(test_result._get_metric_display_value(), [0.1, 0.2, 0.3])
self.assertEqual(test_result._get_metric_serialized_value(), [0.1, 0.2, 0.3])

def test_test_result_metric_type_detection(self):
"""Test metric type detection for both metric and scorer fields"""
test_result = TestResult(result_id="test_metric_type")

# Test unit metric type
test_result.set_metric(42.0)
self.assertEqual(test_result._get_metric_type(), "unit_metric")

# Test row metric type
test_result.set_metric([1.0, 2.0, 3.0])
self.assertEqual(test_result._get_metric_type(), "scorer")

# Test with no metric
test_result.metric = None
test_result.scorer = None
self.assertIsNone(test_result._get_metric_type())

def test_test_result_backward_compatibility(self):
"""Test backward compatibility with direct metric assignment"""
test_result = TestResult(result_id="test_backward_compat")

# Direct assignment of raw values (old style)
test_result.metric = 42.0
self.assertEqual(test_result._get_metric_display_value(), 42.0)
self.assertEqual(test_result._get_metric_serialized_value(), 42.0)

# Direct assignment of list (old style)
test_result.metric = [1.0, 2.0, 3.0]
self.assertEqual(test_result._get_metric_display_value(), [1.0, 2.0, 3.0])
self.assertEqual(test_result._get_metric_serialized_value(), [1.0, 2.0, 3.0])

# Mixed usage - set with set_metric then access display value
test_result.set_metric(100)
self.assertEqual(test_result.metric, 100)
self.assertEqual(test_result._get_metric_display_value(), 100)

def test_test_result_metric_values_widget_display(self):
"""Test MetricValues display in TestResult widgets"""
# Test scalar metric display
test_result_scalar = TestResult(result_id="test_scalar_widget")
test_result_scalar.set_metric(0.95)

widget_scalar = test_result_scalar.to_widget()
self.assertIsInstance(widget_scalar, HTML)
# Check that the metric value appears in the HTML
self.assertIn("0.95", widget_scalar.value)

# Test list metric display
test_result_list = TestResult(result_id="test_list_widget")
test_result_list.set_metric([0.1, 0.2, 0.3])

widget_list = test_result_list.to_widget()
# Even with lists, when no tables/figures exist, it returns HTML
self.assertIsInstance(widget_list, HTML)
# Check that the list values appear in the HTML
self.assertIn("[0.1, 0.2, 0.3]", widget_list.value)


if __name__ == "__main__":
unittest.main()