4 changes: 2 additions & 2 deletions site/releases/validmind-library-releases.qmd
@@ -1,7 +1,7 @@
---
title: "{{< var validmind.developer >}} releases"
subtitle: "Latest: `v2.9.5`"
date: "October 7, 2025"
subtitle: "Latest: `v2.10.0`"
date: "October 16, 2025"
listing:
contents: ../releases/validmind-library/**/pr-*.qmd
fields: [date, title, categories, description]
35 changes: 34 additions & 1 deletion site/validmind/_sidebar.yml
@@ -10,7 +10,7 @@ website:
- text: "---"
- text: "Python API"
# Root level items from validmind.qmd
- text: "<span class='version'>`2.8.25`</span>"
- text: "<span class='version'>`2.10.0`</span>"
file: validmind/validmind.qmd#version__
- text: "init<span class='suffix'></span>"
file: validmind/validmind.qmd#init
@@ -40,6 +40,8 @@ website:
file: validmind/validmind.qmd#tasks
- text: "test<span class='suffix'></span>"
file: validmind/validmind.qmd#test
- text: "scorer_decorator<span class='suffix'></span>"
file: validmind/validmind.qmd#scorer_decorator
- text: "log_text<span class='suffix'></span>"
file: validmind/validmind.qmd#log_text
- text: "experimental_agent<span class='suffix'></span>"
@@ -75,6 +77,13 @@ website:
file: validmind/validmind/datasets/credit_risk/lending_club.qmd
- text: "lending_club_bias"
file: validmind/validmind/datasets/credit_risk/lending_club_bias.qmd
- text: "llm"
file: validmind/validmind/datasets/llm.qmd
contents:
- text: "rag"
file: validmind/validmind/datasets/llm/rag.qmd
- text: "rfp"
file: validmind/validmind/datasets/llm/rag/rfp.qmd
- text: "nlp"
file: validmind/validmind/datasets/nlp.qmd
contents:
@@ -91,6 +100,8 @@ website:
file: validmind/validmind/datasets/regression/lending_club.qmd
- text: "errors"
file: validmind/validmind/errors.qmd
- text: "scorer"
file: validmind/validmind/scorer.qmd
- text: "test_suites"
file: validmind/validmind/test_suites.qmd
contents:
@@ -407,6 +418,17 @@ website:
file: validmind/validmind/tests/model_validation/statsmodels.qmd
- text: "statsutils"
file: validmind/validmind/tests/model_validation/statsmodels/statsutils.qmd
- text: "plots"
file: validmind/validmind/tests/plots.qmd
contents:
- text: "BoxPlot"
file: validmind/validmind/tests/plots/BoxPlot.qmd
- text: "CorrelationHeatmap"
file: validmind/validmind/tests/plots/CorrelationHeatmap.qmd
- text: "HistogramPlot"
file: validmind/validmind/tests/plots/HistogramPlot.qmd
- text: "ViolinPlot"
file: validmind/validmind/tests/plots/ViolinPlot.qmd
- text: "prompt_validation"
file: validmind/validmind/tests/prompt_validation.qmd
contents:
@@ -426,6 +448,17 @@ website:
file: validmind/validmind/tests/prompt_validation/Specificity.qmd
- text: "ai_powered_test"
file: validmind/validmind/tests/prompt_validation/ai_powered_test.qmd
- text: "stats"
file: validmind/validmind/tests/stats.qmd
contents:
- text: "CorrelationAnalysis"
file: validmind/validmind/tests/stats/CorrelationAnalysis.qmd
- text: "DescriptiveStats"
file: validmind/validmind/tests/stats/DescriptiveStats.qmd
- text: "NormalityTests"
file: validmind/validmind/tests/stats/NormalityTests.qmd
- text: "OutlierDetection"
file: validmind/validmind/tests/stats/OutlierDetection.qmd
- text: "unit_metrics"
file: validmind/validmind/unit_metrics.qmd
- text: "vm_models"
66 changes: 61 additions & 5 deletions site/validmind/validmind.qmd
@@ -27,7 +27,7 @@ To initialize the ValidMind Library, paste the code snippet with the model identifier
import validmind as vm

vm.init(
api_host = "https://api.dev.vm.validmind.ai/api/v1/tracking/tracking",
api_host = "https://app.prod.validmind.ai/api/v1/tracking/tracking",
api_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
api_secret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
project = "<project-identifier>"
@@ -44,7 +44,7 @@ After you have pasted the code snippet into your development source code and exe

::: {.signature}

<span class="name">2.8.25</span>
<span class="name">2.10.0</span>

:::

@@ -66,7 +66,7 @@ If the API key and secret are not provided, the client will attempt to retrieve

**Arguments**

- `project (str, optional)`: The project CUID. Alias for model. Defaults to None. \[DEPRECATED\]
- `project (str, optional)`: The project CUID. Alias for model. Defaults to None. [DEPRECATED]
- `model (str, optional)`: The model CUID. Defaults to None.
- `api_key (str, optional)`: The API key. Defaults to None.
- `api_secret (str, optional)`: The API secret. Defaults to None.
@@ -213,7 +213,7 @@ This function provides an interface to retrieve the TestSuite instance for the c

::: {.signature}

<span class="kw">def</span><span class="name">log_metric</span>(<span class="params"><span class="n">key</span><span class="p">:</span><span class="nb">str</span><span class="muted">,</span></span><span class="params"><span class="n">value</span><span class="p">:</span><span class="nb">float</span><span class="muted">,</span></span><span class="params"><span class="n">inputs</span><span class="p">:</span><span class="n">Optional</span><span class="p">\[</span><span class="n">List</span><span class="p">\[</span><span class="nb">str</span><span class="p">\]</span><span class="p">\]</span><span class="o">=</span><span class="kc">None</span><span class="muted">,</span></span><span class="params"><span class="n">params</span><span class="p">:</span><span class="n">Optional</span><span class="p">\[</span><span class="n">Dict</span><span class="p">\[</span><span class="nb">str</span><span class="p">, </span><span class="n">Any</span><span class="p">\]</span><span class="p">\]</span><span class="o">=</span><span class="kc">None</span><span class="muted">,</span></span><span class="params"><span class="n">recorded_at</span><span class="p">:</span><span class="n">Optional</span><span class="p">\[</span><span class="nb">str</span><span class="p">\]</span><span class="o">=</span><span class="kc">None</span><span class="muted">,</span></span><span class="params"><span class="n">thresholds</span><span class="p">:</span><span class="n">Optional</span><span class="p">\[</span><span class="n">Dict</span><span class="p">\[</span><span class="nb">str</span><span class="p">, </span><span class="n">Any</span><span class="p">\]</span><span class="p">\]</span><span class="o">=</span><span class="kc">None</span><span class="muted">,</span></span><span class="params"><span class="n">passed</span><span class="p">:</span><span class="n">Optional</span><span class="p">\[</span><span class="nb">bool</span><span class="p">\]</span><span class="o">=</span><span class="kc">None</span></span>):
<span class="kw">def</span><span class="name">log_metric</span>(<span class="params"><span class="n">key</span><span class="p">:</span><span class="nb">str</span><span class="muted">,</span></span><span class="params"><span class="n">value</span><span class="p">:</span><span class="n">Union</span><span class="p">\[</span><span class="nb">int</span><span class="p">, </span><span class="nb">float</span><span class="p">\]</span><span class="muted">,</span></span><span class="params"><span class="n">inputs</span><span class="p">:</span><span class="n">Optional</span><span class="p">\[</span><span class="n">List</span><span class="p">\[</span><span class="nb">str</span><span class="p">\]</span><span class="p">\]</span><span class="o">=</span><span class="kc">None</span><span class="muted">,</span></span><span class="params"><span class="n">params</span><span class="p">:</span><span class="n">Optional</span><span class="p">\[</span><span class="n">Dict</span><span class="p">\[</span><span class="nb">str</span><span class="p">, </span><span class="n">Any</span><span class="p">\]</span><span class="p">\]</span><span class="o">=</span><span class="kc">None</span><span class="muted">,</span></span><span class="params"><span class="n">recorded_at</span><span class="p">:</span><span class="n">Optional</span><span class="p">\[</span><span class="nb">str</span><span class="p">\]</span><span class="o">=</span><span class="kc">None</span><span class="muted">,</span></span><span class="params"><span class="n">thresholds</span><span class="p">:</span><span class="n">Optional</span><span class="p">\[</span><span class="n">Dict</span><span class="p">\[</span><span class="nb">str</span><span class="p">, </span><span class="n">Any</span><span class="p">\]</span><span class="p">\]</span><span class="o">=</span><span class="kc">None</span><span class="muted">,</span></span><span class="params"><span class="n">passed</span><span class="p">:</span><span class="n">Optional</span><span class="p">\[</span><span class="nb">bool</span><span class="p">\]</span><span class="o">=</span><span class="kc">None</span></span>):

:::

@@ -226,11 +226,12 @@ Unit metrics are key-value pairs where the key is the metric name and the value
**Arguments**

- `key (str)`: The metric key
- `value (Union[int, float])`: The metric value
- `value (Union[int, float])`: The metric value (scalar)
- `inputs (List[str])`: List of input IDs
- `params (Dict[str, Any])`: Parameters used to generate the metric
- `recorded_at (str)`: Timestamp when the metric was recorded
- `thresholds (Dict[str, Any])`: Thresholds for the metric
- `passed (bool)`: Whether the metric passed validation thresholds
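
For example, a minimal sketch of logging a scalar metric against the signature above; the metric key, input ID, and threshold shape are illustrative values, not fixed names:

```python
import validmind as vm

# Log a scalar unit metric; key, inputs, params, and thresholds
# below are example values for this sketch.
vm.log_metric(
    key="AUC",
    value=0.87,
    inputs=["test_dataset"],
    params={"n_bootstraps": 100},
    thresholds={"min": 0.80},
    passed=True,
)
```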

## preview_template<span class="suffix"></span>

@@ -422,6 +423,61 @@ The function may also include a docstring. This docstring will be used and logge

- The decorated function.

## scorer_decorator<span class="suffix"></span>

<!-- signatures.jinja2 -->

::: {.signature}

<span class="kw">def</span><span class="name">scorer</span>(<span class="param"><span class="n">func_or_id</span><span class="p">:</span><span class="n">Union</span><span class="p">\[</span><span class="n">Callable</span><span class="p">\[</span><span class="n">...</span><span class="p">, </span><span class="n">Any</span><span class="p">\]</span><span class="p">, </span><span class="nb">str</span><span class="p">, </span><span class="n">None</span><span class="p">\]</span><span class="o">=</span><span class="kc">None</span></span>)<span class="p"> → </span><span class="return-annotation"><span class="n">Callable</span><span class="p">\[</span><span class="p">\[</span><a href="/validmind/validmind/vm_models.qmd#f">validmind.vm_models.F</a><span class="p">\]</span><span class="p">, </span><a href="/validmind/validmind/vm_models.qmd#f">validmind.vm_models.F</a><span class="p">\]</span></span>:

:::

<!-- docstring.jinja2 -->

Decorator for creating and registering custom scorers

This decorator registers the function it wraps as a scorer function within ValidMind under the provided ID. Once decorated, the function can be run using the `validmind.scorer.run_scorer` function.

The scorer ID can be provided in three ways:

1. Explicit ID: `@scorer("validmind.scorer.classification.BrierScore")`
1. Auto-generated from path: `@scorer()` - automatically generates ID from file path
1. Function name only: `@scorer` - uses function name with validmind.scorer prefix

The function can take two different types of arguments:

- Inputs: ValidMind model or dataset (or list of models/datasets). These arguments must use the following names: `model`, `models`, `dataset`, `datasets`.
- Parameters: Any additional keyword arguments, of any name and type, each of which must have a default value.

The function should return one of the following types:

- Table: Either a list of dictionaries or a pandas DataFrame
- Plot: Either a matplotlib figure or a plotly figure
- Scalar: A single number (int or float)
- Boolean: A single boolean value indicating whether the test passed or failed
- List: A list of values (for row-level metrics) or a list of dictionaries with consistent keys
- Any other type: The output will be stored as raw data for use by calling code

When returning a list of dictionaries:

- All dictionaries must have the same keys
- The list length must match the number of rows in the dataset
- Each dictionary key will become a separate column when using assign_scores
- Column naming follows the pattern: `{model_id}_{metric_name}_{dict_key}`
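
For instance, a scorer run over a three-row dataset could return the value sketched below; with a hypothetical scorer named `row_error`, `assign_scores` would then add columns like `{model_id}_row_error_error` and `{model_id}_row_error_abs_error`:

```python
# Hypothetical list-of-dictionaries return for a 3-row dataset:
# one dictionary per row, all sharing the same keys.
[
    {"error": -0.10, "abs_error": 0.10},
    {"error": 0.25, "abs_error": 0.25},
    {"error": 0.00, "abs_error": 0.00},
]
```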

Note: Scorer outputs are not logged to the backend and are intended for use by other parts of the system (e.g., assign_scores method).

The function may also include a docstring. This docstring will be used and logged as the scorer's description.

**Arguments**

- `func_or_id (Union[Callable[..., Any], str, None], optional)`: Either the function to decorate or the scorer ID. If None or empty string, the ID is auto-generated from the file path. Defaults to None.

**Returns**

- The decorated function.
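
Putting this together, a minimal sketch of a row-level scorer using the explicit-ID form from the docstring above; the import path, the `probability` column, and the `dataset.df` / `dataset.target_column` accessors are assumptions for illustration:

```python
from validmind import scorer  # import path assumed for this sketch

@scorer("validmind.scorer.classification.BrierScore")
def brier_score(dataset, prob_column: str = "probability"):
    """Per-row Brier score between predicted probabilities and targets."""
    df = dataset.df  # assumed accessor for the dataset's underlying DataFrame
    return ((df[prob_column] - df[dataset.target_column]) ** 2).tolist()
```

The docstring becomes the scorer's description, and the registered ID can then be run via `validmind.scorer.run_scorer` or consumed by `assign_scores`, as described above.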

## log_text<span class="suffix"></span>

<!-- signatures.jinja2 -->
1 change: 1 addition & 0 deletions site/validmind/validmind/datasets.qmd
@@ -12,5 +12,6 @@ Example datasets that can be used with the ValidMind Library.

- [classification](datasets/classification.qmd)
- [credit_risk](datasets/credit_risk.qmd)
- [llm](datasets/llm.qmd)
- [nlp](datasets/nlp.qmd)
- [regression](datasets/regression.qmd)