Support for the Deepeval dataset (LLMTestCase) for LLM tests #401
Merged: AnilSorathiya merged 68 commits into main from anilsorathiya/sc-11452/support-for-the-deepeval-dataset-llmtestcase on Sep 12, 2025
Commits
1b3f67a support agent use case (AnilSorathiya)
723fcab wrapper function for agent (AnilSorathiya)
28d9fbb ragas metrics (AnilSorathiya)
ecf8e09 update ragas metrics (AnilSorathiya)
53e8879 fix lint error (AnilSorathiya)
1662368 create helper functions (AnilSorathiya)
cc84cbc Merge branch 'main' into anilsorathiya/sc-10863/add-support-for-llm-a… (AnilSorathiya)
6f09780 delete old notebook (AnilSorathiya)
0bb731e update description for each section (AnilSorathiya)
e758979 simplify agent (AnilSorathiya)
7c35cfe simple demo notebook using langchain agent (AnilSorathiya)
9bb70e9 Update description of the simplified langgraph agent demo notebook (AnilSorathiya)
894d52a add brief description to tests (AnilSorathiya)
d86a9af add brief description to tests (AnilSorathiya)
884000f Allow dict return type predict_fn (AnilSorathiya)
fbd5aa9 update notebook and refactor utils (AnilSorathiya)
daceabf lint fix (AnilSorathiya)
5f8823a Merge branch 'main' into anilsorathiya/sc-11324/extend-the-predict-fn… (AnilSorathiya)
70a5636 fix the test failure (AnilSorathiya)
33b06fb new unit tests for multiple columns return in assign_predictions (AnilSorathiya)
8e12bd2 update notebooks to return multiple values in predict_fn (AnilSorathiya)
e38929d general plotting and stats tests (AnilSorathiya)
e900a65 clear output (AnilSorathiya)
a08e881 Merge branch 'main' into anilsorathiya/sc-11380/add-generlize-plots-a… (AnilSorathiya)
16f4700 remove duplicate tests (AnilSorathiya)
bb9f9af update notebook (AnilSorathiya)
5078a7a Integration between deepeval and validmind (AnilSorathiya)
2eb6abb Merge branch 'main' into anilsorathiya/sc-11452/support-for-the-deepe… (AnilSorathiya)
ad0b719 add MetricValues class for metric return type (AnilSorathiya)
94ca006 Return MetricValues in the unit tests (AnilSorathiya)
c4c885a update all the unit metric tests (AnilSorathiya)
a1f3220 add unit tests for MetricValues class (AnilSorathiya)
1a7d0b6 update result to support MetricValues for unit metric tests (AnilSorathiya)
1d785ba add copyright statement (AnilSorathiya)
271e85b add deepeval lib as an extra dependency (AnilSorathiya)
f806fc6 fix the error (AnilSorathiya)
61c7ef6 demo draft change (AnilSorathiya)
b646d0b demo draft change (AnilSorathiya)
dda4ced fix api issue (AnilSorathiya)
dd8e0df Merge branch 'main' into anilsorathiya/sc-11452/support-for-the-deepe… (AnilSorathiya)
81249c2 separate unit metrics and row metrics (AnilSorathiya)
794a322 draft notebook (AnilSorathiya)
a27bc48 Merge branch 'main' into anilsorathiya/sc-11452/support-for-the-deepe… (AnilSorathiya)
84dfa2f update assign_score notebook (AnilSorathiya)
7aa2acc update assign score notebook (AnilSorathiya)
247eacc rename notebook (AnilSorathiya)
394c57c update deepeval and VM integration notebook (AnilSorathiya)
a2ca13c Merge branch 'main' into anilsorathiya/sc-11452/support-for-the-deepe… (AnilSorathiya)
5ebe51f rename row metrics to scorer (AnilSorathiya)
15df53b add scorer decorator (AnilSorathiya)
e28ba37 remove UnitMetricValue and RowMetricValues as they are not needed any… (AnilSorathiya)
d8a48c8 remove MetricValue class (AnilSorathiya)
d425576 support complex output for scorer (AnilSorathiya)
9c7e7e9 remove simple testcases (AnilSorathiya)
bbd6cd4 fix the list_scorers (AnilSorathiya)
c7b83f3 update notebook (AnilSorathiya)
a33f2a4 remove circular dependency of load_test (AnilSorathiya)
30c3abc remove circular dependency of load_test (AnilSorathiya)
e91e6e4 move the AnswerRelevancy scorer in deepeval namespace (AnilSorathiya)
a284cd1 unit metric can return int and float only (AnilSorathiya)
1ec1c75 update notebook (AnilSorathiya)
427ddf5 fix lint error (AnilSorathiya)
917831c remove scores listing from list_tests interface (AnilSorathiya)
58b3bde add custom scorer support (AnilSorathiya)
cb52104 full path required to run scorer (AnilSorathiya)
36f2f96 remove circular dependency (AnilSorathiya)
439bd1d make model parameter option in the assign_scores function (AnilSorathiya)
66dde16 fix lint error (AnilSorathiya)
232 changes: 174 additions & 58 deletions
...w_to/assign_score_complete_tutorial.ipynb → ..._to/assign_scores_complete_tutorial.ipynb
With the new `assign_scores` interface, is it required to provide a `predict_fn`? This notebook has this:
Found a small issue in this section:
Error:
Why is the `Custom Metrics with G-Eval` section not running any tests? If that is intentional, we should clarify for the user what we are trying to demonstrate in that section.