diff --git a/notebooks/code_sharing/deepeval_integration_demo.ipynb b/notebooks/code_sharing/deepeval_integration_demo.ipynb new file mode 100644 index 000000000..78b9ce0ff --- /dev/null +++ b/notebooks/code_sharing/deepeval_integration_demo.ipynb @@ -0,0 +1,963 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "# DeepEval Integration with ValidMind\n", + "\n", + "Learn how to integrate [DeepEval](https://github.com/confident-ai/deepeval) with the ValidMind Library to evaluate Large Language Models (LLMs) and AI agents. This notebook demonstrates the complete integration through the new `LLMAgentDataset` class, enabling you to leverage DeepEval's 30+ evaluation metrics within ValidMind's testing infrastructure.\n", + "\n", + "To integrate DeepEval with ValidMind, we'll:\n", + "\n", + "1. Set up both frameworks and install required dependencies\n", + "2. Create and evaluate LLM test cases for different scenarios\n", + "3. Work with RAG systems and agent evaluations\n", + "4. Use Golden templates for standardized testing\n", + "5. Create custom evaluation metrics with G-Eval\n", + "6. Integrate everything with ValidMind's testing framework\n", + "7. Apply production-ready evaluation patterns\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "## Contents \n", + "- [Introduction](#toc1_) \n", + "- [About DeepEval Integration](#toc2_) \n", + " - [Before you begin](#toc2_1_) \n", + " - [Key concepts](#toc2_2_) \n", + "- [Setting up](#toc3_) \n", + " - [Install required packages](#toc3_1_) \n", + " - [Initialize ValidMind](#toc3_2_) \n", + "- [Basic Usage - Simple Q&A Evaluation](#toc4_) \n", + "- [RAG System Evaluation](#toc5_) \n", + "- [LLM Agent Evaluation](#toc6_) \n", + "- [Working with Golden Templates](#toc7_) \n", + "- [ValidMind Integration](#toc8_) \n", + "- [Custom Metrics with G-Eval](#toc9_) \n", + "- [In summary](#toc10_) \n", + "- [Next steps](#toc11_) \n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "\n", + "\n", + "## Introduction\n", + "\n", + "Large Language Model (LLM) evaluation is critical for understanding model performance across different tasks and scenarios. This notebook demonstrates how to integrate DeepEval's comprehensive evaluation framework with ValidMind's testing infrastructure to create a robust LLM evaluation pipeline.\n", + "\n", + "DeepEval provides over 30 evaluation metrics specifically designed for LLMs, covering scenarios from simple Q&A to complex agent interactions. By integrating with ValidMind, you can leverage these metrics within a structured testing framework that supports documentation, collaboration, and compliance requirements.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "\n", + "\n", + "## About DeepEval Integration\n", + "\n", + "DeepEval is a comprehensive evaluation framework for LLMs that provides metrics for various scenarios including hallucination detection, answer relevancy, faithfulness, and custom evaluation criteria. 
ValidMind is a platform for managing model risk and documentation through automated testing.\n", + "\n", + "Together, these tools enable comprehensive LLM evaluation within a structured, compliant framework.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "\n", + "\n", + "### Before you begin\n", + "\n", + "This notebook assumes you have basic familiarity with Python and Large Language Models. You'll need:\n", + "\n", + "- Python 3.8 or higher\n", + "- Access to OpenAI API (for DeepEval metrics evaluation)\n", + "- ValidMind account and model registration\n", + "\n", + "If you encounter errors due to missing modules, install them with `pip install` and re-run the notebook.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "\n", + "\n", + "### Key concepts\n", + "\n", + "**LLMTestCase**: A DeepEval object that represents a single test case with input, expected output, actual output, and optional context.\n", + "\n", + "**Golden Templates**: Pre-defined test templates with inputs and expected outputs that can be converted to test cases by generating actual outputs.\n", + "\n", + "**G-Eval**: Generative evaluation using LLMs to assess response quality based on custom criteria.\n", + "\n", + "**LLMAgentDataset**: A ValidMind dataset class that bridges DeepEval test cases with ValidMind's testing infrastructure.\n", + "\n", + "**RAG Evaluation**: Testing retrieval-augmented generation systems that combine document retrieval with generation.\n", + "\n", + "**Agent Evaluation**: Testing LLM agents that can use tools and perform multi-step reasoning.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "\n", + "\n", + "## Setting up\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "\n", + "\n", + "### Install required packages\n", + "\n", + "First, let's install the required packages and set up our environment.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install -q validmind" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "\n", + "\n", + "### Initialize ValidMind\n", + "\n", + "ValidMind generates a unique _code snippet_ for each registered model to connect with your developer environment. You initialize the ValidMind Library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.\n", + "\n", + "
For access to all features available in this notebook, you'll need access to a ValidMind account.\n", + "\n", + "Register with ValidMind
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load your model identifier credentials from an `.env` file\n", + "%load_ext dotenv\n", + "%dotenv .env\n", + "\n", + "# Or replace with your code snippet\n", + "import validmind as vm\n", + "\n", + "vm.init(\n", + " api_host=\"...\",\n", + " api_key=\"...\",\n", + " api_secret=\"...\",\n", + " model=\"...\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Core imports\n", + "import pandas as pd\n", + "import warnings\n", + "from deepeval.test_case import LLMTestCase, ToolCall, LLMTestCaseParams\n", + "from deepeval.dataset import Golden\n", + "from deepeval.metrics import GEval\n", + "from validmind.datasets.llm import LLMAgentDataset\n", + "\n", + "warnings.filterwarnings('ignore')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "\n", + "\n", + "## Basic Usage - Simple Q&A Evaluation\n", + "\n", + "Let's start with the simplest use case: evaluating a basic question-and-answer interaction with an LLM. This demonstrates how to create LLMTestCase objects and integrate them with ValidMind's dataset infrastructure.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Step 1: Create a simple LLM test case\n", + "print(\"Creating a simple Q&A test case...\")\n", + "\n", + "simple_test_cases = [\n", + "LLMTestCase(\n", + " input=\"What is machine learning?\",\n", + " actual_output=\"\"\"Machine learning is a subset of artificial intelligence (AI) that enables \n", + " computers to learn and make decisions from data without being explicitly programmed for every task. \n", + " It uses algorithms to find patterns in data and make predictions or decisions based on those patterns.\"\"\",\n", + " expected_output=\"\"\"Machine learning is a method of data analysis that automates analytical \n", + " model building. It uses algorithms that iteratively learn from data, allowing computers to find \n", + " hidden insights without being explicitly programmed where to look.\"\"\",\n", + " context=[\"Machine learning is a branch of AI that focuses on algorithms that can learn from data.\"],\n", + " retrieval_context=[\"Machine learning is a branch of AI that focuses on algorithms that can learn from data.\"]\n", + "),\n", + "LLMTestCase(\n", + " input=\"What is deep learning?\",\n", + " actual_output=\"\"\"Bananas are yellow fruits that grow on trees in tropical climates. \n", + " They are rich in potassium and make a great healthy snack. 
You can also use them \n", + " in smoothies and baking.\"\"\",\n", + " expected_output=\"\"\"Deep learning is an advanced machine learning technique that uses neural networks\n", + " with many layers to automatically learn representations of data with multiple levels of abstraction.\n", + " It has enabled major breakthroughs in AI applications.\"\"\",\n", + " context=[\"Deep learning is a specialized machine learning approach that uses deep neural networks to learn from data.\"],\n", + " retrieval_context=[\"Deep learning is a specialized machine learning approach that uses deep neural networks to learn from data.\"]\n", + ")]\n", + "\n", + "\n", + "# Step 2: Create LLMAgentDataset from the test case\n", + "print(\"\\nCreating ValidMind dataset...\")\n", + "\n", + "simple_dataset = LLMAgentDataset.from_test_cases(\n", + " test_cases=simple_test_cases,\n", + " input_id=\"simple_qa_dataset\"\n", + ")\n", + "\n", + "# Display the dataset\n", + "print(\"\\nDataset preview:\")\n", + "display(simple_dataset.df)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def agent_fn(input):\n", + " \"\"\"\n", + " Invoke the simplified agent with the given input.\n", + " \"\"\"\n", + " \n", + " return 1.23\n", + "\n", + " \n", + "vm_model = vm.init_model(\n", + " predict_fn=agent_fn,\n", + " input_id=\"test_model\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "simple_dataset._df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "simple_dataset.assign_scores(metrics = \"validmind.scorer.llm.deepeval.AnswerRelevancy\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "simple_dataset._df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "\n", + "\n", + "## RAG System Evaluation\n", + "\n", + "Now let's evaluate a more complex use case: a Retrieval-Augmented Generation (RAG) system that retrieves relevant documents and generates responses based on them. RAG systems combine document retrieval with text generation, requiring specialized evaluation approaches.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create multiple RAG test cases\n", + "print(\"Creating RAG evaluation test cases...\")\n", + "\n", + "rag_test_cases = [\n", + " LLMTestCase(\n", + " input=\"How do I return a product that doesn't fit?\",\n", + " actual_output=\"\"\"You can return any product within 30 days of purchase for a full refund. \n", + " Simply visit our returns page on the website and follow the step-by-step instructions. \n", + " You'll need your order number and email address. No questions asked!\"\"\",\n", + " expected_output=\"We offer a 30-day return policy for full refunds. 
Visit our returns page to start the process.\",\n", + " context=[\"Company policy allows 30-day returns for full refund with no restocking fees.\"],\n", + " retrieval_context=[\n", + " \"Return Policy: All items can be returned within 30 days of purchase for a full refund.\",\n", + " \"Return Process: Visit our website's returns page and enter your order details.\",\n", + " \"Customer Service: Available 24/7 to help with returns and refunds.\",\n", + " \"No restocking fees apply to returns within the 30-day window.\"\n", + " ]\n", + " ),\n", + " LLMTestCase(\n", + " input=\"What are your shipping options and costs?\",\n", + " actual_output=\"\"\"We offer three shipping options: Standard (5-7 days, $5.99), \n", + " Express (2-3 days, $9.99), and Overnight (next day, $19.99). \n", + " Free shipping is available on orders over $50 with Standard delivery.\"\"\",\n", + " expected_output=\"Multiple shipping options available with costs ranging from $5.99 to $19.99. Free shipping on orders over $50.\",\n", + " context=[\"Shipping information includes various speed and cost options.\"],\n", + " retrieval_context=[\n", + " \"Standard Shipping: 5-7 business days, $5.99\",\n", + " \"Express Shipping: 2-3 business days, $9.99\", \n", + " \"Overnight Shipping: Next business day, $19.99\",\n", + " \"Free Standard Shipping on orders over $50\"\n", + " ]\n", + " ),\n", + " LLMTestCase(\n", + " input=\"Do you have a warranty on electronics?\",\n", + " actual_output=\"\"\"Yes, all electronics come with a manufacturer's warranty. \n", + " Most items have a 1-year warranty, while premium products may have 2-3 years. \n", + " We also offer extended warranty options for purchase.\"\"\",\n", + " expected_output=\"Electronics include manufacturer warranty, typically 1-year, with extended options available.\",\n", + " context=[\"Electronics warranty information varies by product type and manufacturer.\"],\n", + " retrieval_context=[\n", + " \"Electronics Warranty: Manufacturer warranty included with all electronic items\",\n", + " \"Standard Coverage: 1 year for most electronics\",\n", + " \"Premium Products: May include 2-3 year coverage\",\n", + " \"Extended Warranty: Available for purchase at checkout\"\n", + " ]\n", + " )\n", + "]\n", + "\n", + "print(f\"Created {len(rag_test_cases)} RAG test cases\")\n", + "\n", + "# Create RAG dataset\n", + "rag_dataset = LLMAgentDataset.from_test_cases(\n", + " test_cases=rag_test_cases,\n", + " input_id=\"rag_evaluation_dataset\"\n", + ")\n", + "\n", + "print(f\"RAG Dataset: {rag_dataset}\")\n", + "print(f\"Shape: {rag_dataset.df.shape}\")\n", + "\n", + "# Show dataset structure\n", + "print(\"\\nRAG Dataset Preview:\")\n", + "display(rag_dataset.df[['input', 'actual_output', 'context', 'retrieval_context']].head())\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "\n", + "\n", + "## LLM Agent Evaluation\n", + "\n", + "Let's evaluate LLM agents that can use tools to accomplish tasks. 
This is one of the most advanced evaluation scenarios, requiring assessment of both response quality and tool usage appropriateness.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create LLM Agent test cases with tool usage\n", + "print(\"Creating Agent evaluation test cases...\")\n", + "\n", + "agent_test_cases = [\n", + " LLMTestCase(\n", + " input=\"What's the weather like in New York City today?\",\n", + " actual_output=\"\"\"Based on current weather data, New York City is experiencing partly cloudy skies \n", + " with a temperature of 72°F (22°C). The humidity is at 60% and there's a light breeze from the west at 8 mph. \n", + " No precipitation is expected today.\"\"\",\n", + " expected_output=\"Current weather in New York shows mild temperatures with partly cloudy conditions.\",\n", + " tools_called=[\n", + " ToolCall(\n", + " name=\"WeatherAPI\",\n", + " description=\"Fetches current weather information for a specified location\",\n", + " input_parameters={\"city\": \"New York City\", \"units\": \"fahrenheit\", \"include_forecast\": False},\n", + " output={\n", + " \"temperature\": 72,\n", + " \"condition\": \"partly_cloudy\", \n", + " \"humidity\": 60,\n", + " \"wind_speed\": 8,\n", + " \"wind_direction\": \"west\"\n", + " },\n", + " reasoning=\"User asked for current weather in NYC, so I need to call the weather API\"\n", + " )\n", + " ],\n", + " expected_tools=[\n", + " ToolCall(\n", + " name=\"WeatherAPI\",\n", + " description=\"Should fetch weather information for New York City\",\n", + " input_parameters={\"city\": \"New York City\"}\n", + " )\n", + " ]\n", + " ),\n", + " LLMTestCase(\n", + " input=\"Calculate the compound interest on $1000 invested at 5% annually for 3 years\",\n", + " actual_output=\"\"\"Using the compound interest formula A = P(1 + r)^t, where P = $1000, r = 0.05, and t = 3:\n", + " A = 1000(1 + 0.05)^3 = 1000(1.05)^3 = 1000 × 1.157625 = $1,157.63\n", + " The investment will grow to $1,157.63, earning $157.63 in compound interest.\"\"\",\n", + " expected_output=\"The compound interest calculation shows the investment grows to approximately $1,157.63.\",\n", + " tools_called=[\n", + " ToolCall(\n", + " name=\"Calculator\",\n", + " description=\"Performs mathematical calculations\",\n", + " input_parameters={\"expression\": \"1000 * (1.05 ** 3)\", \"precision\": 2},\n", + " output=1157.63,\n", + " reasoning=\"Need to calculate compound interest using the standard formula\"\n", + " )\n", + " ],\n", + " expected_tools=[\n", + " ToolCall(\n", + " name=\"Calculator\", \n", + " description=\"Should perform compound interest calculation\",\n", + " input_parameters={\"calculation_type\": \"compound_interest\"}\n", + " )\n", + " ]\n", + " ),\n", + " LLMTestCase(\n", + " input=\"Send an email to john@example.com about our meeting tomorrow at 2 PM\",\n", + " actual_output=\"\"\"I've sent an email to john@example.com with the subject \"Meeting Reminder - Tomorrow at 2 PM\". \n", + " The email confirms your meeting scheduled for tomorrow at 2:00 PM and asks John to confirm his attendance. 
\n", + " The email was successfully delivered to the recipient.\"\"\",\n", + " expected_output=\"Email sent successfully to john@example.com about the 2 PM meeting tomorrow.\",\n", + " tools_called=[\n", + " ToolCall(\n", + " name=\"EmailSender\",\n", + " description=\"Sends emails to specified recipients\",\n", + " input_parameters={\n", + " \"to\": \"john@example.com\",\n", + " \"subject\": \"Meeting Reminder - Tomorrow at 2 PM\", \n", + " \"body\": \"Hi John,\\n\\nThis is a reminder about our meeting scheduled for tomorrow at 2:00 PM. Please confirm your attendance.\\n\\nBest regards\"\n", + " },\n", + " output={\"status\": \"sent\", \"message_id\": \"msg_12345\", \"timestamp\": \"2024-01-15T10:30:00Z\"},\n", + " reasoning=\"User requested to send email, so I need to use the email tool with appropriate content\"\n", + " )\n", + " ],\n", + " expected_tools=[\n", + " ToolCall(\n", + " name=\"EmailSender\",\n", + " description=\"Should send an email about the meeting\",\n", + " input_parameters={\"recipient\": \"john@example.com\"}\n", + " )\n", + " ]\n", + " )\n", + "]\n", + "\n", + "print(f\"Created {len(agent_test_cases)} Agent test cases\")\n", + "\n", + "# Create Agent dataset\n", + "agent_dataset = LLMAgentDataset.from_test_cases(\n", + " test_cases=agent_test_cases,\n", + " input_id=\"agent_evaluation_dataset\"\n", + ")\n", + "\n", + "print(f\"Agent Dataset: {agent_dataset}\")\n", + "print(f\"Shape: {agent_dataset.df.shape}\")\n", + "\n", + "# Analyze tool usage\n", + "tool_usage = {}\n", + "for case in agent_test_cases:\n", + " if case.tools_called:\n", + " for tool in case.tools_called:\n", + " tool_usage[tool.name] = tool_usage.get(tool.name, 0) + 1\n", + "\n", + "print(\"\\nTool Usage Analysis:\")\n", + "for tool, count in tool_usage.items():\n", + " print(f\" - {tool}: {count} times\")\n", + "\n", + "print(\"\\nAgent Dataset Preview:\")\n", + "display(agent_dataset.df[['input', 'actual_output', 'tools_called']].head())\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "\n", + "\n", + "## Working with Golden Templates\n", + "\n", + "Golden templates are a powerful feature of DeepEval that allow you to define test inputs and expected outputs, then generate actual outputs at evaluation time. 
This approach enables systematic testing across multiple scenarios.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create Golden templates\n", + "print(\"Creating Golden templates...\")\n", + "\n", + "goldens = [\n", + " Golden(\n", + " input=\"Explain the concept of neural networks in simple terms\",\n", + " expected_output=\"Neural networks are computing systems inspired by biological neural networks that constitute animal brains.\",\n", + " context=[\"Neural networks are a key component of machine learning and artificial intelligence.\"]\n", + " ),\n", + " Golden(\n", + " input=\"What are the main benefits of cloud computing for businesses?\", \n", + " expected_output=\"Cloud computing offers scalability, cost-effectiveness, accessibility, and reduced infrastructure maintenance.\",\n", + " context=[\"Cloud computing provides on-demand access to computing resources over the internet.\"]\n", + " ),\n", + " Golden(\n", + " input=\"How does password encryption protect user data?\",\n", + " expected_output=\"Password encryption converts passwords into unreadable formats using cryptographic algorithms, protecting against unauthorized access.\",\n", + " context=[\"Encryption is a fundamental security technique used to protect sensitive information.\"]\n", + " ),\n", + " Golden(\n", + " input=\"What is the difference between machine learning and deep learning?\",\n", + " expected_output=\"Machine learning is a broad field of AI, while deep learning is a subset that uses neural networks with multiple layers.\",\n", + " context=[\"Both are important areas of artificial intelligence with different approaches and applications.\"]\n", + " )\n", + "]\n", + "\n", + "print(f\"Created {len(goldens)} Golden templates\")\n", + "\n", + "# Create dataset from goldens\n", + "golden_dataset = LLMAgentDataset.from_goldens(\n", + " goldens=goldens,\n", + " input_id=\"golden_templates_dataset\"\n", + ")\n", + "\n", + "print(f\"Golden Dataset: {golden_dataset}\")\n", + "print(f\"Shape: {golden_dataset.df.shape}\")\n", + "\n", + "print(\"\\nGolden Templates Preview:\")\n", + "display(golden_dataset.df[['input', 'expected_output', 'context', 'type']].head())\n", + "\n", + "# Mock LLM application function for demonstration\n", + "def mock_llm_application(input_text: str) -> str:\n", + " \"\"\"\n", + " Simulate an LLM application generating responses.\n", + " In production, this would be your actual LLM application.\n", + " \"\"\"\n", + " \n", + " responses = {\n", + " \"neural networks\": \"\"\"Neural networks are computational models inspired by the human brain. \n", + " They consist of interconnected nodes (neurons) that process information by learning patterns from data. \n", + " These networks can recognize complex patterns and make predictions, making them useful for tasks like \n", + " image recognition, natural language processing, and decision-making.\"\"\",\n", + " \n", + " \"cloud computing\": \"\"\"Cloud computing provides businesses with flexible, scalable access to computing resources \n", + " over the internet. 
Key benefits include reduced upfront costs, automatic scaling based on demand, \n", + " improved collaboration through shared access, enhanced security through professional data centers, \n", + " and reduced need for internal IT maintenance.\"\"\",\n", + " \n", + " \"password encryption\": \"\"\"Password encryption protects user data by converting passwords into complex, \n", + " unreadable strings using mathematical algorithms. When you enter your password, it's immediately encrypted \n", + " before storage or transmission. Even if data is intercepted, the encrypted password appears as random characters, \n", + " making it virtually impossible for attackers to determine the original password.\"\"\",\n", + " \n", + " \"machine learning\": \"\"\"Machine learning is a broad approach to artificial intelligence where computers learn \n", + " to make predictions or decisions by finding patterns in data. Deep learning is a specialized subset that uses \n", + " artificial neural networks with multiple layers (hence 'deep') to process information in ways that mimic \n", + " human brain function, enabling more sophisticated pattern recognition and decision-making.\"\"\"\n", + " }\n", + " \n", + " # Simple keyword matching for demonstration\n", + " input_lower = input_text.lower()\n", + " for keyword, response in responses.items():\n", + " if keyword in input_lower:\n", + " return response.strip()\n", + " \n", + " return f\"Thank you for your question about: {input_text}. I'd be happy to provide a comprehensive answer based on current knowledge and best practices.\"\n", + "\n", + "print(f\"\\nMock LLM application ready - will generate responses for {len(goldens)} templates\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Convert goldens to test cases by generating actual outputs\n", + "print(\"Converting Golden templates to test cases...\")\n", + "\n", + "print(\"Before conversion:\")\n", + "print(f\" - Test cases: {len(golden_dataset.test_cases)}\")\n", + "print(f\" - Goldens: {len(golden_dataset.goldens)}\")\n", + "\n", + "# Convert goldens to test cases using our mock LLM\n", + "golden_dataset.convert_goldens_to_test_cases(mock_llm_application)\n", + "\n", + "print(\"\\nAfter conversion:\")\n", + "print(f\" - Test cases: {len(golden_dataset.test_cases)}\")\n", + "print(f\" - Goldens: {len(golden_dataset.goldens)}\")\n", + "\n", + "print(\"\\nConversion completed!\")\n", + "\n", + "# Show the updated dataset\n", + "print(\"\\nUpdated Dataset with Generated Outputs:\")\n", + "dataset_df = golden_dataset.df\n", + "# Filter for rows with actual output\n", + "mask = pd.notna(dataset_df['actual_output']) & (dataset_df['actual_output'] != '')\n", + "converted_df = dataset_df[mask]\n", + "\n", + "if not converted_df.empty:\n", + " display(converted_df[['input', 'actual_output', 'expected_output']])\n", + " \n", + " # Analyze output lengths using pandas string methods\n", + " actual_lengths = pd.Series([len(str(x)) for x in converted_df['actual_output']])\n", + " expected_lengths = pd.Series([len(str(x)) for x in converted_df['expected_output']])\n", + "else:\n", + " print(\"No converted test cases found\")\n", + "\n", + "print(f\"\\nOutput Analysis:\")\n", + "print(f\"Average actual output length: {actual_lengths.mean():.0f} characters\")\n", + "print(f\"Average expected output length: {expected_lengths.mean():.0f} characters\")\n", + "print(f\"Ratio (actual/expected): {(actual_lengths.mean() / expected_lengths.mean()):.2f}x\")\n" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "\n", + "\n", + "## ValidMind Integration\n", + "\n", + "Now let's demonstrate how to integrate our LLMAgentDataset with ValidMind's testing framework, enabling comprehensive documentation and compliance features.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Initialize ValidMind\n", + "print(\"Integrating with ValidMind framework...\")\n", + "\n", + "try:\n", + " # Initialize ValidMind\n", + " vm.init()\n", + " print(\"ValidMind initialized\")\n", + " \n", + " # Register our datasets with ValidMind\n", + " datasets_to_register = [\n", + " (simple_dataset, \"simple_qa_dataset\"),\n", + " (rag_dataset, \"rag_evaluation_dataset\"),\n", + " (agent_dataset, \"agent_evaluation_dataset\"),\n", + " (golden_dataset, \"golden_templates_dataset\")\n", + " ]\n", + " \n", + " for dataset, dataset_id in datasets_to_register:\n", + " try:\n", + " vm.init_dataset(\n", + " dataset=dataset.df,\n", + " input_id=dataset_id,\n", + " text_column=\"input\",\n", + " target_column=\"expected_output\"\n", + " )\n", + " print(f\"Registered: {dataset_id}\")\n", + " except Exception as e:\n", + " print(f\"WARNING: Failed to register {dataset_id}: {e}\")\n", + " \n", + " # Note: ValidMind datasets are now registered and can be used in test suites\n", + " print(\"\\nValidMind Integration Complete:\")\n", + " print(\" - Datasets registered successfully\")\n", + " print(\" - Ready for use in ValidMind test suites\")\n", + " print(\" - Can be referenced by their input_id in test configurations\")\n", + " \n", + "except Exception as e:\n", + " print(f\"ERROR: ValidMind integration failed: {e}\")\n", + " print(\"Note: Some ValidMind features may require additional setup\")\n", + "\n", + "# Demonstrate dataset compatibility\n", + "print(f\"\\nDataset Compatibility Check:\")\n", + "print(f\"All datasets inherit from VMDataset: SUCCESS\")\n", + "\n", + "for dataset, name in [(simple_dataset, \"Simple Q&A\"), (rag_dataset, \"RAG\"), (agent_dataset, \"Agent\"), (golden_dataset, \"Golden\")]:\n", + " print(f\"\\n{name} Dataset:\")\n", + " print(f\" - Type: {type(dataset).__name__}\")\n", + " print(f\" - Inherits VMDataset: {hasattr(dataset, 'df')}\")\n", + " print(f\" - Has text_column: {hasattr(dataset, 'text_column')}\")\n", + " print(f\" - Has target_column: {hasattr(dataset, 'target_column')}\")\n", + " print(f\" - DataFrame shape: {dataset.df.shape}\")\n", + " print(f\" - Columns: {len(dataset.columns)}\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "\n", + "\n", + "## Custom Metrics with G-Eval\n", + "\n", + "One of DeepEval's most powerful features is the ability to create custom evaluation metrics using G-Eval (Generative Evaluation). This enables domain-specific evaluation criteria tailored to your use case.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create custom evaluation metrics using G-Eval\n", + "print(\"Creating custom evaluation metrics...\")\n", + "\n", + "# Custom metric 1: Technical Accuracy\n", + "technical_accuracy_metric = GEval(\n", + " name=\"Technical Accuracy\",\n", + " criteria=\"\"\"Evaluate whether the response is technically accurate and uses appropriate \n", + " terminology for the domain. 
Consider if the explanations are scientifically sound \n", + " and if technical concepts are explained correctly.\"\"\",\n", + " evaluation_params=[\n", + " LLMTestCaseParams.INPUT,\n", + " LLMTestCaseParams.ACTUAL_OUTPUT,\n", + " LLMTestCaseParams.CONTEXT\n", + " ],\n", + " threshold=0.8\n", + ")\n", + "\n", + "# Custom metric 2: Clarity and Comprehensiveness \n", + "clarity_metric = GEval(\n", + " name=\"Clarity and Comprehensiveness\",\n", + " criteria=\"\"\"Assess whether the response is clear, well-structured, and comprehensive. \n", + " The response should be easy to understand, logically organized, and address all \n", + " aspects of the user's question without being overly verbose.\"\"\",\n", + " evaluation_params=[\n", + " LLMTestCaseParams.INPUT,\n", + " LLMTestCaseParams.ACTUAL_OUTPUT\n", + " ],\n", + " threshold=0.75\n", + ")\n", + "\n", + "# Custom metric 3: Business Context Appropriateness\n", + "business_context_metric = GEval(\n", + " name=\"Business Context Appropriateness\", \n", + " criteria=\"\"\"Evaluate whether the response is appropriate for a business context. \n", + " Consider if the tone is professional, if the content is relevant to business needs, \n", + " and if it provides actionable information that would be valuable to a business user.\"\"\",\n", + " evaluation_params=[\n", + " LLMTestCaseParams.INPUT,\n", + " LLMTestCaseParams.ACTUAL_OUTPUT,\n", + " LLMTestCaseParams.EXPECTED_OUTPUT\n", + " ],\n", + " threshold=0.7\n", + ")\n", + "\n", + "# Custom metric 4: Tool Usage Appropriateness (for agents)\n", + "tool_usage_metric = GEval(\n", + " name=\"Tool Usage Appropriateness\",\n", + " criteria=\"\"\"Evaluate whether the agent used appropriate tools for the given task. \n", + " Consider if the tools were necessary, if they were used correctly, and if the \n", + " agent's reasoning for tool selection was sound.\"\"\",\n", + " evaluation_params=[\n", + " LLMTestCaseParams.INPUT,\n", + " LLMTestCaseParams.ACTUAL_OUTPUT\n", + " ],\n", + " threshold=0.8\n", + ")\n", + "\n", + "custom_metrics = [\n", + " technical_accuracy_metric,\n", + " clarity_metric, \n", + " business_context_metric,\n", + " tool_usage_metric\n", + "]\n", + "\n", + "print(\"Custom metrics created:\")\n", + "for metric in custom_metrics:\n", + " print(f\" - {metric.name}: threshold {metric.threshold}\")\n", + "\n", + "# Demonstrate metric application to different dataset types\n", + "print(f\"\\nMetric-Dataset Matching:\")\n", + "metric_dataset_pairs = [\n", + " (\"Technical Accuracy\", \"golden_templates_dataset (tech questions)\"),\n", + " (\"Clarity and Comprehensiveness\", \"simple_qa_dataset (general Q&A)\"),\n", + " (\"Business Context Appropriateness\", \"rag_evaluation_dataset (business support)\"),\n", + " (\"Tool Usage Appropriateness\", \"agent_evaluation_dataset (agent actions)\")\n", + "]\n", + "\n", + "for metric_name, dataset_name in metric_dataset_pairs:\n", + " print(f\" - {metric_name} → {dataset_name}\")\n", + "\n", + "print(f\"\\nEvaluation Setup (Demo Mode):\")\n", + "print(\"Note: Actual evaluation requires OpenAI API key\")\n", + "print(\"These metrics would evaluate:\")\n", + "print(\" - Technical accuracy of AI/ML explanations\") \n", + "print(\" - Clarity of business support responses\")\n", + "print(\" - Appropriateness of agent tool usage\")\n", + "print(\" - Overall comprehensiveness across all domains\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "\n", + "\n", + "## In summary\n", + "\n", + 
"This notebook demonstrated the comprehensive integration between DeepEval and ValidMind for LLM evaluation:\n", + "\n", + "**Key Achievements:**\n", + "- Successfully created and evaluated different types of LLM test cases (Q&A, RAG, Agents)\n", + "- Integrated DeepEval metrics with ValidMind's testing infrastructure\n", + "- Demonstrated Golden template workflows for systematic testing\n", + "- Created custom evaluation metrics using G-Eval\n", + "- Showed how to handle complex agent scenarios with tool usage\n", + "\n", + "**Integration Benefits:**\n", + "- **Comprehensive Coverage**: Evaluate LLMs across 30+ specialized metrics\n", + "- **Structured Documentation**: Leverage ValidMind's compliance and documentation features\n", + "- **Flexibility**: Support for custom metrics and domain-specific evaluation criteria\n", + "- **Production Ready**: Handle real-world LLM evaluation scenarios at scale\n", + "\n", + "The `LLMAgentDataset` class provides a seamless bridge between DeepEval's evaluation capabilities and ValidMind's testing infrastructure, enabling robust LLM evaluation within a structured, compliant framework.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "\n", + "\n", + "## Next steps\n", + "\n", + "**Explore Advanced Features:**\n", + "- **Continuous Evaluation**: Set up automated LLM evaluation pipelines\n", + "- **A/B Testing**: Compare different LLM models and configurations\n", + "- **Metrics Customization**: Create domain-specific evaluation criteria\n", + "- **Integration Patterns**: Embed evaluation into your LLM development workflow\n", + "\n", + "**Additional Resources:**\n", + "- [ValidMind Library Documentation](https://docs.validmind.ai/developer/validmind-library.html) - Complete API reference and tutorials\n", + "\n", + "**Try These Examples:**\n", + "- Implement custom business-specific evaluation metrics\n", + "- Create automated evaluation pipelines for model deployment\n", + "- Integrate with your existing ML infrastructure and workflows\n", + "- Explore multi-modal evaluation scenarios (text, code, images)\n", + "\n", + "Start building comprehensive LLM evaluation workflows that combine the power of DeepEval's specialized metrics with ValidMind's structured testing and documentation framework.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "ValidMind Library", + "language": "python", + "name": "validmind" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/how_to/assign_score_complete_tutorial.ipynb b/notebooks/how_to/assign_scores_complete_tutorial.ipynb similarity index 69% rename from notebooks/how_to/assign_score_complete_tutorial.ipynb rename to notebooks/how_to/assign_scores_complete_tutorial.ipynb index cbb1d14bd..66904ce3d 100644 --- a/notebooks/how_to/assign_score_complete_tutorial.ipynb +++ b/notebooks/how_to/assign_scores_complete_tutorial.ipynb @@ -19,31 +19,32 @@ } }, "source": [ - "The `assign_scores()` method is a powerful feature that allows you to compute and add unit metric scores as new columns in your dataset. This method takes a model and metric(s) as input, computes the specified metrics from the ValidMind unit_metrics library, and adds them as new columns. 
The computed metrics can be scalar values that apply to the entire dataset or per-row values, providing flexibility in how performance is measured and tracked.\n", + "The `assign_scores()` method is a powerful feature that allows you to compute and add scorer scores as new columns in your dataset. This method takes a model and metric(s) as input, computes the specified metrics from the ValidMind scorer library, and adds them as new columns. The computed metrics provide per-row values, giving you granular insights into model performance at the individual prediction level.\n", "\n", - "In this interactive notebook, we demonstrate how to use the `assign_scores()` method effectively. We'll walk through a complete example using a customer churn dataset, showing how to compute and assign both dataset-level metrics (like overall F1 score) and row-level metrics (like prediction probabilities). You'll learn how to work with single and multiple unit metrics, pass custom parameters, and handle different metric types - all while maintaining a clean, organized dataset structure. Currently, assign_scores() supports all metrics available in the validmind.unit_metrics module.\n", + "In this interactive notebook, we demonstrate how to use the `assign_scores()` method effectively. We'll walk through a complete example using a customer churn dataset, showing how to compute and assign row-level metrics (like Brier Score and Log Loss) that provide detailed performance insights for each prediction. You'll learn how to work with single and multiple scorers, pass custom parameters, and handle different metric types - all while maintaining a clean, organized dataset structure. Currently, assign_scores() supports all metrics available in the validmind.scorer module.\n", "\n", - "**The Power of Integrated Scoring**\n", + "**The Power of Row-Level Scoring**\n", "\n", - "Traditional model evaluation workflows often involve computing metrics separately from your core dataset, leading to fragmented analysis and potential data misalignment. The `assign_scores()` method addresses this challenge by:\n", + "Traditional model evaluation workflows often focus on aggregate metrics that provide overall performance summaries. The `assign_scores()` method complements this by providing granular, row-level insights that help you:\n", "\n", - "- **Seamless Integration**: Directly embedding computed metrics as dataset columns using a consistent naming convention\n", - "- **Enhanced Traceability**: Maintaining clear links between model predictions and performance metrics\n", - "- **Simplified Analysis**: Enabling straightforward comparison of metrics across different models and datasets\n", - "- **Standardized Workflow**: Providing a unified approach to metric computation and storage\n", + "- **Identify Problematic Predictions**: Spot individual cases where your model performs poorly\n", + "- **Understand Model Behavior**: Analyze how model performance varies across different types of inputs\n", + "- **Enable Detailed Analysis**: Perform targeted investigations on specific subsets of your data\n", + "- **Support Model Debugging**: Pinpoint exactly where and why your model makes errors\n", "\n", "**Understanding assign_scores()**\n", "\n", - "The `assign_scores()` method computes unit metrics for a given model-dataset combination and adds the results as new columns to your dataset. 
Each new column follows the naming convention: `{model.input_id}_{metric_name}`, ensuring clear identification of which model and metric combination generated each score.\n", + "The `assign_scores()` method computes row metrics for a given model-dataset combination and adds the results as new columns to your dataset. Each new column follows the naming convention: `{model.input_id}_{metric_name}`, ensuring clear identification of which model and metric combination generated each score.\n", "\n", "Key features:\n", "\n", + "- **Row-Level Focus**: Computes per-prediction metrics rather than aggregate scores\n", "- **Flexible Input**: Accepts single metrics or lists of metrics\n", "- **Parameter Support**: Allows passing additional parameters to underlying metric implementations\n", "- **Multi-Model Support**: Can assign scores from multiple models to the same dataset\n", "- **Type Agnostic**: Works with classification, regression, and other model types\n", "\n", - "This approach streamlines your model evaluation workflow, making performance metrics an integral part of your dataset rather than external calculations.\n" + "This approach provides detailed insights into your model's performance at the individual prediction level, enabling more sophisticated analysis and debugging workflows." ] }, { @@ -67,13 +68,14 @@ "- [Assign predictions](#toc7_) \n", "- [Using assign_scores()](#toc8_) \n", " - [Basic Usage](#toc8_1_) \n", - " - [Single Metric Assignment](#toc8_2_) \n", - " - [Multiple Metrics Assignment](#toc8_3_) \n", - " - [Passing Parameters to Metrics](#toc8_4_) \n", - " - [Working with Different Metric Types](#toc8_5_) \n", + " - [Single Scorer Assignment](#toc8_2_) \n", + " - [A Scorer returns complex object](#toc8_2_1) \n", + " - [Multiple Scorers Assignment](#toc8_3_) \n", + " - [Passing Parameters to Scorer](#toc8_4_) \n", "- [Advanced assign_scores() Usage](#toc9_) \n", - " - [Multi-Model Scoring](#toc9_1_) \n", - " - [Individual Metrics](#toc9_2_) \n", + " - [Multi-Model scorers](#toc9_1_) \n", + " - [Scorer Metrics](#toc9_2_) \n", + " - [Custom Scorer](#toc9_2_) \n", "- [Next steps](#toc12_) \n", " - [Work with your model documentation](#toc12_1_) \n", " - [Discover more learning resources](#toc12_2_) \n", @@ -148,7 +150,7 @@ "metadata": {}, "outputs": [], "source": [ - "%pip install -q validmind\n" + "%pip install -q validmind" ] }, { @@ -203,10 +205,10 @@ "import validmind as vm\n", "\n", "vm.init(\n", - " # api_host=\"...\",\n", - " # api_key=\"...\",\n", - " # api_secret=\"...\",\n", - " # model=\"...\",\n", + " api_host=\"...\",\n", + " api_key=\"...\",\n", + " api_secret=\"...\",\n", + " model=\"...\",\n", ")\n" ] }, @@ -432,9 +434,54 @@ "source": [ "\n", "\n", - "### Single Metric Assignment\n", + "### Single Scorer Assignment\n", + " \n", + "Let's start by assigning a single Scorer - the Brier Score - for our XGBoost model on the test dataset.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Assign Brier Score for XGBoost model\n", + "vm_test_ds.assign_scores(metrics = \"validmind.scorer.classification.BrierScore\", model = vm_xgb_model)\n", + "\n", + "print(\"After assigning Brier Score:\")\n", + "print(f\"New column added: {vm_test_ds.df.columns}\")\n", + "# Display the metric values\n", + "vm_test_ds.df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "### A Scorer returns complex object \n", + " The OutlierScore scorer demonstrates how scorers can 
return complex objects. It returns a dictionary containing per-row outlier detection results. For each row, it includes:\n", + " - is_outlier: Boolean indicating if the row is an outlier\n", + " - anomaly_score: Numerical score indicating degree of outlierness\n", + " - isolation_path: Length of isolation path in the tree\n", + "\n", + "When assigned to a dataset, these dictionary values are automatically unpacked into separate columns with appropriate prefixes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Assign Brier Score for XGBoost model\n", + "vm_test_ds.assign_scores(metrics = \"validmind.scorer.classification.OutlierScore\", model = vm_xgb_model)\n", "\n", - "Let's start by assigning a single metric - the F1 score - for our XGBoost model on the test dataset.\n" + "print(\"After assigning Score With Confidence:\")\n", + "print(f\"New column added: {vm_test_ds.df.columns}\")\n", + "# Display the metric values\n", + "vm_test_ds.df.head()" ] }, { @@ -443,11 +490,13 @@ "metadata": {}, "outputs": [], "source": [ - "# Assign F1 score for XGBoost model\n", - "vm_test_ds.assign_scores(vm_xgb_model, \"F1\")\n", + "# Assign Brier Score for XGBoost model\n", + "vm_test_ds.assign_scores(\"validmind.scorer.classification.OutlierScore\")\n", "\n", - "print(\"After assigning F1 score:\")\n", - "print(f\"New column added: {vm_test_ds.df.columns}\")\n" + "print(\"After assigning Score With Confidence:\")\n", + "print(f\"New column added: {vm_test_ds.df.columns}\")\n", + "# Display the metric values\n", + "vm_test_ds.df.head()" ] }, { @@ -460,9 +509,9 @@ "source": [ "\n", "\n", - "### Multiple Metrics Assignment\n", + "### Multiple Scorers Assignment\n", "\n", - "We can assign multiple metrics at once by passing a list of metric names. This is more efficient than calling assign_scores() multiple times.\n" + "We can assign multiple metrics at once by passing a list of Scorer names. This is more efficient than calling assign_scores() multiple times.\n" ] }, { @@ -472,20 +521,20 @@ "outputs": [], "source": [ "# Assign multiple classification metrics for the Random Forest model\n", - "classification_metrics = [\"Precision\", \"Recall\", \"Accuracy\", \"ROC_AUC\"]\n", + "scorer = [\n", + " \"validmind.scorer.classification.BrierScore\",\n", + " \"validmind.scorer.classification.LogLoss\",\n", + " \"validmind.scorer.classification.Confidence\"\n", + "]\n", "\n", - "vm_test_ds.assign_scores(vm_rf_model, classification_metrics)\n", + "vm_test_ds.assign_scores(metrics = scorer, model = vm_rf_model)\n", "\n", - "print(\"After assigning multiple metrics for Random Forest:\")\n", + "print(\"After assigning multiple row metrics for Random Forest:\")\n", "rf_columns = [col for col in vm_test_ds.df.columns if 'random_forest_model' in col]\n", "print(f\"Random Forest columns: {rf_columns}\")\n", "\n", "# Display the metric values\n", - "for metric in classification_metrics:\n", - " col_name = f\"random_forest_model_{metric}\"\n", - " if col_name in vm_test_ds.df.columns:\n", - " value = vm_test_ds.df[col_name].iloc[0]\n", - " print(f\"{metric}: {value:.4f}\")\n" + "vm_test_ds.df[rf_columns].head()\n" ] }, { @@ -498,9 +547,9 @@ "source": [ "\n", "\n", - "### Passing Parameters to Metrics\n", + "### Passing Parameters to Scorer\n", "\n", - "Many unit metrics accept additional parameters that are passed through to the underlying sklearn implementations. 
Let's demonstrate this with the ROC_AUC metric.\n" + "Many row metrics accept additional parameters that are passed through to the underlying implementations. Let's demonstrate this with the LogLoss metric.\n" ] }, { @@ -509,21 +558,23 @@ "metadata": {}, "outputs": [], "source": [ - "# Assign ROC_AUC with different averaging strategies\n", - "vm_test_ds.assign_scores(vm_xgb_model, \"ROC_AUC\", average=\"macro\")\n", + "# Assign LogLoss\n", + "vm_test_ds.assign_scores(metrics = \"validmind.scorer.classification.LogLoss\", model = vm_xgb_model, eps = 1e-16)\n", "\n", "# We can also assign with different parameters by calling assign_scores again\n", "# Note: This will overwrite the previous column with the same name\n", - "print(\"ROC_AUC assigned with macro averaging\")\n", + "print(\"LogLoss assigned successfully\")\n", "\n", - "# Let's also assign precision and recall with different averaging\n", - "vm_test_ds.assign_scores(vm_xgb_model, [\"Precision\", \"Recall\"], average=\"weighted\")\n", + "# Let's also assign BrierScore and Confidence\n", + "vm_test_ds.assign_scores(metrics = [\"validmind.scorer.classification.BrierScore\",\"validmind.scorer.classification.Confidence\"], model = vm_xgb_model)\n", "\n", - "print(\"Precision and Recall assigned with weighted averaging\")\n", + "print(\"BrierScore and Confidence assigned successfully\")\n", "\n", "# Display current XGBoost metric columns\n", "xgb_columns = [col for col in vm_test_ds.df.columns if 'xgboost_model' in col]\n", - "print(f\"\\nXGBoost model columns: {xgb_columns}\")\n" + "print(f\"\\nXGBoost model columns: {xgb_columns}\")\n", + "\n", + "vm_test_ds.df[xgb_columns].head()\n" ] }, { @@ -536,9 +587,9 @@ "source": [ "\n", "\n", - "### Multi-Model Scoring\n", + "### Multi-Model scorers\n", "\n", - "One of the powerful features of assign_scores() is the ability to assign scores from multiple models to the same dataset, enabling easy model comparison.\n" + "One of the powerful features of assign_scores() is the ability to assign scores from multiple models to the same dataset, enabling detailed model comparison at the prediction level.\n" ] }, { @@ -548,15 +599,20 @@ "outputs": [], "source": [ "# Let's assign a comprehensive set of metrics for both models\n", - "comprehensive_metrics = [\"F1\", \"Precision\", \"Recall\", \"Accuracy\", \"ROC_AUC\"]\n", + "comprehensive_metrics = [\n", + " \"validmind.scorer.classification.BrierScore\",\n", + " \"validmind.scorer.classification.LogLoss\",\n", + " \"validmind.scorer.classification.Confidence\",\n", + " \"validmind.scorer.classification.Correctness\"\n", + "]\n", "\n", "# Assign for XGBoost model\n", - "vm_test_ds.assign_scores(vm_xgb_model, comprehensive_metrics)\n", + "vm_test_ds.assign_scores(metrics = comprehensive_metrics, model = vm_xgb_model)\n", "\n", "# Assign for Random Forest model}\n", - "vm_test_ds.assign_scores(vm_rf_model, comprehensive_metrics)\n", + "vm_test_ds.assign_scores(metrics = comprehensive_metrics, model = vm_rf_model)\n", "\n", - "print(\"Comprehensive metrics assigned for both models!\")\n" + "print(\"Row-level metrics assigned for both models!\")\n" ] }, { @@ -565,14 +621,16 @@ "source": [ "\n", "\n", - "### Individual Metrics\n", + "### Scorer Metrics\n", "The next section demonstrates how to assign individual metrics that compute scores per row, rather than aggregate metrics.\n", - "We'll use two important metrics:\n", + "We'll use several important row metrics:\n", " \n", "- Brier Score: Measures how well calibrated the model's probability predictions are 
for each individual prediction\n", "- Log Loss: Evaluates how well the predicted probabilities match the true labels on a per-prediction basis\n", + "- Confidence: Measures the model's confidence in its predictions for each row\n", + "- Correctness: Indicates whether each prediction is correct (1) or incorrect (0)\n", "\n", - "Both metrics provide more granular insights into model performance at the individual prediction level.\n" + "All these metrics provide granular insights into model performance at the individual prediction level.\n" ] }, { @@ -585,16 +643,16 @@ "print(\"Adding individual metrics...\")\n", "\n", "# Add Brier Score - measures accuracy of probabilistic predictions per row\n", - "vm_test_ds.assign_scores(vm_xgb_model, \"BrierScore\")\n", + "vm_test_ds.assign_scores(metrics = \"validmind.scorer.classification.BrierScore\", model = vm_xgb_model)\n", "print(\"Added Brier Score - lower values indicate better calibrated probabilities\")\n", "\n", "# Add Log Loss - measures how well the predicted probabilities match true labels per row\n", - "vm_test_ds.assign_scores(vm_xgb_model, \"LogLoss\")\n", + "vm_test_ds.assign_scores(metrics = \"validmind.scorer.classification.LogLoss\", model = vm_xgb_model)\n", "print(\"Added Log Loss - lower values indicate better probability estimates\")\n", "\n", "# Create a comparison summary showing first few rows of individual metrics\n", "print(\"\\nFirst few rows of individual metrics:\")\n", - "individual_metrics = [col for col in vm_test_ds.df.columns if any(m in col for m in ['BrierScore', 'LogLoss'])]\n", + "individual_metrics = [col for col in vm_test_ds.df.columns if any(m in col for m in ['BrierScore', 'LogLoss', 'Confidence', 'Correctness'])]\n", "print(vm_test_ds.df[individual_metrics].head())\n" ] }, @@ -607,6 +665,64 @@ "vm_test_ds._df.head()" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "### Custom Scorer \n", + "Let's see how to create your own custom scorers using the `@scorer` decorator.\n", + " \n", + "The example below demonstrates a scorer that looks at the class balance in the neighborhood around each data point. For each row, it will give you a score from 0 to 1, where a score closer to 1 means there's a nice even balance of classes in that area of your data. 
This can help you identify regions where your classes are well-mixed vs regions dominated by a single class.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from validmind.scorer import scorer\n", + "import numpy as np\n", + "\n", + "@scorer(\"my_scorers.TestScorer\") \n", + "def test_scorer(model, dataset):\n", + " \"\"\"Custom scorer that calculates class balance ratio.\n", + " \n", + " Args:\n", + " model: Not used in this scorer\n", + " dataset: The dataset to analyze\n", + " \n", + " Returns:\n", + " numpy.ndarray: Array of class balance ratios between 0 and 1,\n", + " where values closer to 1 indicate better class balance in the local neighborhood\n", + " \"\"\"\n", + " # Get target values\n", + " y = dataset.df[dataset.target_column].values\n", + " \n", + " # Calculate local class balance in sliding windows\n", + " window_size = 100\n", + " balance_scores = []\n", + " \n", + " for i in range(len(y)):\n", + " start_idx = max(0, i - window_size//2)\n", + " end_idx = min(len(y), i + window_size//2)\n", + " window = y[start_idx:end_idx]\n", + " \n", + " # Calculate ratio of minority class\n", + " class_ratio = np.mean(window)\n", + " # Adjust to be symmetric around 0.5\n", + " balance_score = 1 - abs(0.5 - class_ratio) * 2\n", + " \n", + " balance_scores.append(balance_score)\n", + " \n", + " return np.array(balance_scores)\n", + "\n", + "# Assign the class balance scores to the dataset\n", + "vm_test_ds.assign_scores(metrics = \"my_scorers.TestScorer\", model = vm_xgb_model)\n", + " " + ] + }, { "cell_type": "markdown", "metadata": { diff --git a/poetry.lock b/poetry.lock index da85d1a0c..d7b0c8774 100644 --- a/poetry.lock +++ b/poetry.lock @@ -194,6 +194,33 @@ files = [ {file = "ansicolors-1.1.8.zip", hash = "sha256:99f94f5e3348a0bcd43c82e5fc4414013ccc19d70bd939ad71e0133ce9c372e0"}, ] +[[package]] +name = "anthropic" +version = "0.64.0" +description = "The official Python library for the anthropic API" +optional = true +python-versions = ">=3.8" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "anthropic-0.64.0-py3-none-any.whl", hash = "sha256:6f5f7d913a6a95eb7f8e1bda4e75f76670e8acd8d4cd965e02e2a256b0429dd1"}, + {file = "anthropic-0.64.0.tar.gz", hash = "sha256:3d496c91a63dff64f451b3e8e4b238a9640bf87b0c11d0b74ddc372ba5a3fe58"}, +] + +[package.dependencies] +anyio = ">=3.5.0,<5" +distro = ">=1.7.0,<2" +httpx = ">=0.25.0,<1" +jiter = ">=0.4.0,<1" +pydantic = ">=1.9.0,<3" +sniffio = "*" +typing-extensions = ">=4.10,<5" + +[package.extras] +aiohttp = ["aiohttp", "httpx-aiohttp (>=0.1.8)"] +bedrock = ["boto3 (>=1.28.57)", "botocore (>=1.31.57)"] +vertex = ["google-auth[requests] (>=2,<3)"] + [[package]] name = "anyio" version = "4.10.0" @@ -474,6 +501,32 @@ files = [ [package.extras] dev = ["backports.zoneinfo ; python_version < \"3.9\"", "freezegun (>=1.0,<2.0)", "jinja2 (>=3.0)", "pytest (>=6.0)", "pytest-cov", "pytz", "setuptools", "tzdata ; sys_platform == \"win32\""] +[[package]] +name = "backoff" +version = "2.2.1" +description = "Function decoration for backoff and retry" +optional = true +python-versions = ">=3.7,<4.0" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "backoff-2.2.1-py3-none-any.whl", hash = "sha256:63579f9a0628e06278f7e47b7d7d5b6ce20dc65c5e96a6f3ca99a6adca0396e8"}, + {file = "backoff-2.2.1.tar.gz", hash = "sha256:03f829f5bb1923180821643f8753b0502c3b682293992485b0eef2807afa5cba"}, +] + +[[package]] +name = "backports-asyncio-runner" +version = "1.2.0" 
+description = "Backport of asyncio.Runner, a context manager that controls event loop life cycle." +optional = true +python-versions = "<3.11,>=3.8" +groups = ["main"] +markers = "python_version < \"3.11\" and extra == \"llm\"" +files = [ + {file = "backports_asyncio_runner-1.2.0-py3-none-any.whl", hash = "sha256:0da0a936a8aeb554eccb426dc55af3ba63bcdc69fa1a600b5bb305413a4477b5"}, + {file = "backports_asyncio_runner-1.2.0.tar.gz", hash = "sha256:a5aa7b2b7d8f8bfcaa2b57313f70792df84e32a2a746f585213373f900b42162"}, +] + [[package]] name = "backports-tarfile" version = "1.2.0" @@ -662,10 +715,6 @@ files = [ {file = "Brotli-1.1.0-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:a37b8f0391212d29b3a91a799c8e4a2855e0576911cdfb2515487e30e322253d"}, {file = "Brotli-1.1.0-cp310-cp310-musllinux_1_1_ppc64le.whl", hash = "sha256:e84799f09591700a4154154cab9787452925578841a94321d5ee8fb9a9a328f0"}, {file = "Brotli-1.1.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:f66b5337fa213f1da0d9000bc8dc0cb5b896b726eefd9c6046f699b169c41b9e"}, - {file = "Brotli-1.1.0-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:5dab0844f2cf82be357a0eb11a9087f70c5430b2c241493fc122bb6f2bb0917c"}, - {file = "Brotli-1.1.0-cp310-cp310-musllinux_1_2_i686.whl", hash = "sha256:e4fe605b917c70283db7dfe5ada75e04561479075761a0b3866c081d035b01c1"}, - {file = "Brotli-1.1.0-cp310-cp310-musllinux_1_2_ppc64le.whl", hash = "sha256:1e9a65b5736232e7a7f91ff3d02277f11d339bf34099a56cdab6a8b3410a02b2"}, - {file = "Brotli-1.1.0-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:58d4b711689366d4a03ac7957ab8c28890415e267f9b6589969e74b6e42225ec"}, {file = "Brotli-1.1.0-cp310-cp310-win32.whl", hash = "sha256:be36e3d172dc816333f33520154d708a2657ea63762ec16b62ece02ab5e4daf2"}, {file = "Brotli-1.1.0-cp310-cp310-win_amd64.whl", hash = "sha256:0c6244521dda65ea562d5a69b9a26120769b7a9fb3db2fe9545935ed6735b128"}, {file = "Brotli-1.1.0-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:a3daabb76a78f829cafc365531c972016e4aa8d5b4bf60660ad8ecee19df7ccc"}, @@ -678,14 +727,8 @@ files = [ {file = "Brotli-1.1.0-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:19c116e796420b0cee3da1ccec3b764ed2952ccfcc298b55a10e5610ad7885f9"}, {file = "Brotli-1.1.0-cp311-cp311-musllinux_1_1_ppc64le.whl", hash = "sha256:510b5b1bfbe20e1a7b3baf5fed9e9451873559a976c1a78eebaa3b86c57b4265"}, {file = "Brotli-1.1.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:a1fd8a29719ccce974d523580987b7f8229aeace506952fa9ce1d53a033873c8"}, - {file = "Brotli-1.1.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:c247dd99d39e0338a604f8c2b3bc7061d5c2e9e2ac7ba9cc1be5a69cb6cd832f"}, - {file = "Brotli-1.1.0-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:1b2c248cd517c222d89e74669a4adfa5577e06ab68771a529060cf5a156e9757"}, - {file = "Brotli-1.1.0-cp311-cp311-musllinux_1_2_ppc64le.whl", hash = "sha256:2a24c50840d89ded6c9a8fdc7b6ed3692ed4e86f1c4a4a938e1e92def92933e0"}, - {file = "Brotli-1.1.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:f31859074d57b4639318523d6ffdca586ace54271a73ad23ad021acd807eb14b"}, {file = "Brotli-1.1.0-cp311-cp311-win32.whl", hash = "sha256:39da8adedf6942d76dc3e46653e52df937a3c4d6d18fdc94a7c29d263b1f5b50"}, {file = "Brotli-1.1.0-cp311-cp311-win_amd64.whl", hash = "sha256:aac0411d20e345dc0920bdec5548e438e999ff68d77564d5e9463a7ca9d3e7b1"}, - {file = "Brotli-1.1.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:32d95b80260d79926f5fab3c41701dbb818fde1c9da590e77e571eefd14abe28"}, - {file = "Brotli-1.1.0-cp312-cp312-macosx_10_13_x86_64.whl", 
hash = "sha256:b760c65308ff1e462f65d69c12e4ae085cff3b332d894637f6273a12a482d09f"}, {file = "Brotli-1.1.0-cp312-cp312-macosx_10_9_universal2.whl", hash = "sha256:316cc9b17edf613ac76b1f1f305d2a748f1b976b033b049a6ecdfd5612c70409"}, {file = "Brotli-1.1.0-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:caf9ee9a5775f3111642d33b86237b05808dafcd6268faa492250e9b78046eb2"}, {file = "Brotli-1.1.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:70051525001750221daa10907c77830bc889cb6d865cc0b813d9db7fefc21451"}, @@ -696,24 +739,8 @@ files = [ {file = "Brotli-1.1.0-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:4093c631e96fdd49e0377a9c167bfd75b6d0bad2ace734c6eb20b348bc3ea180"}, {file = "Brotli-1.1.0-cp312-cp312-musllinux_1_1_ppc64le.whl", hash = "sha256:7e4c4629ddad63006efa0ef968c8e4751c5868ff0b1c5c40f76524e894c50248"}, {file = "Brotli-1.1.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:861bf317735688269936f755fa136a99d1ed526883859f86e41a5d43c61d8966"}, - {file = "Brotli-1.1.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:87a3044c3a35055527ac75e419dfa9f4f3667a1e887ee80360589eb8c90aabb9"}, - {file = "Brotli-1.1.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:c5529b34c1c9d937168297f2c1fde7ebe9ebdd5e121297ff9c043bdb2ae3d6fb"}, - {file = "Brotli-1.1.0-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:ca63e1890ede90b2e4454f9a65135a4d387a4585ff8282bb72964fab893f2111"}, - {file = "Brotli-1.1.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:e79e6520141d792237c70bcd7a3b122d00f2613769ae0cb61c52e89fd3443839"}, {file = "Brotli-1.1.0-cp312-cp312-win32.whl", hash = "sha256:5f4d5ea15c9382135076d2fb28dde923352fe02951e66935a9efaac8f10e81b0"}, {file = "Brotli-1.1.0-cp312-cp312-win_amd64.whl", hash = "sha256:906bc3a79de8c4ae5b86d3d75a8b77e44404b0f4261714306e3ad248d8ab0951"}, - {file = "Brotli-1.1.0-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:8bf32b98b75c13ec7cf774164172683d6e7891088f6316e54425fde1efc276d5"}, - {file = "Brotli-1.1.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:7bc37c4d6b87fb1017ea28c9508b36bbcb0c3d18b4260fcdf08b200c74a6aee8"}, - {file = "Brotli-1.1.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:3c0ef38c7a7014ffac184db9e04debe495d317cc9c6fb10071f7fefd93100a4f"}, - {file = "Brotli-1.1.0-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:91d7cc2a76b5567591d12c01f019dd7afce6ba8cba6571187e21e2fc418ae648"}, - {file = "Brotli-1.1.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a93dde851926f4f2678e704fadeb39e16c35d8baebd5252c9fd94ce8ce68c4a0"}, - {file = "Brotli-1.1.0-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f0db75f47be8b8abc8d9e31bc7aad0547ca26f24a54e6fd10231d623f183d089"}, - {file = "Brotli-1.1.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:6967ced6730aed543b8673008b5a391c3b1076d834ca438bbd70635c73775368"}, - {file = "Brotli-1.1.0-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:7eedaa5d036d9336c95915035fb57422054014ebdeb6f3b42eac809928e40d0c"}, - {file = "Brotli-1.1.0-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:d487f5432bf35b60ed625d7e1b448e2dc855422e87469e3f450aa5552b0eb284"}, - {file = "Brotli-1.1.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:832436e59afb93e1836081a20f324cb185836c617659b07b129141a8426973c7"}, - {file = "Brotli-1.1.0-cp313-cp313-win32.whl", hash = 
"sha256:43395e90523f9c23a3d5bdf004733246fba087f2948f87ab28015f12359ca6a0"}, - {file = "Brotli-1.1.0-cp313-cp313-win_amd64.whl", hash = "sha256:9011560a466d2eb3f5a6e4929cf4a09be405c64154e12df0dd72713f6500e32b"}, {file = "Brotli-1.1.0-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:a090ca607cbb6a34b0391776f0cb48062081f5f60ddcce5d11838e67a01928d1"}, {file = "Brotli-1.1.0-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2de9d02f5bda03d27ede52e8cfe7b865b066fa49258cbab568720aa5be80a47d"}, {file = "Brotli-1.1.0-cp36-cp36m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:2333e30a5e00fe0fe55903c8832e08ee9c3b1382aacf4db26664a16528d51b4b"}, @@ -723,10 +750,6 @@ files = [ {file = "Brotli-1.1.0-cp36-cp36m-musllinux_1_1_i686.whl", hash = "sha256:fd5f17ff8f14003595ab414e45fce13d073e0762394f957182e69035c9f3d7c2"}, {file = "Brotli-1.1.0-cp36-cp36m-musllinux_1_1_ppc64le.whl", hash = "sha256:069a121ac97412d1fe506da790b3e69f52254b9df4eb665cd42460c837193354"}, {file = "Brotli-1.1.0-cp36-cp36m-musllinux_1_1_x86_64.whl", hash = "sha256:e93dfc1a1165e385cc8239fab7c036fb2cd8093728cbd85097b284d7b99249a2"}, - {file = "Brotli-1.1.0-cp36-cp36m-musllinux_1_2_aarch64.whl", hash = "sha256:aea440a510e14e818e67bfc4027880e2fb500c2ccb20ab21c7a7c8b5b4703d75"}, - {file = "Brotli-1.1.0-cp36-cp36m-musllinux_1_2_i686.whl", hash = "sha256:6974f52a02321b36847cd19d1b8e381bf39939c21efd6ee2fc13a28b0d99348c"}, - {file = "Brotli-1.1.0-cp36-cp36m-musllinux_1_2_ppc64le.whl", hash = "sha256:a7e53012d2853a07a4a79c00643832161a910674a893d296c9f1259859a289d2"}, - {file = "Brotli-1.1.0-cp36-cp36m-musllinux_1_2_x86_64.whl", hash = "sha256:d7702622a8b40c49bffb46e1e3ba2e81268d5c04a34f460978c6b5517a34dd52"}, {file = "Brotli-1.1.0-cp36-cp36m-win32.whl", hash = "sha256:a599669fd7c47233438a56936988a2478685e74854088ef5293802123b5b2460"}, {file = "Brotli-1.1.0-cp36-cp36m-win_amd64.whl", hash = "sha256:d143fd47fad1db3d7c27a1b1d66162e855b5d50a89666af46e1679c496e8e579"}, {file = "Brotli-1.1.0-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:11d00ed0a83fa22d29bc6b64ef636c4552ebafcef57154b4ddd132f5638fbd1c"}, @@ -738,10 +761,6 @@ files = [ {file = "Brotli-1.1.0-cp37-cp37m-musllinux_1_1_i686.whl", hash = "sha256:919e32f147ae93a09fe064d77d5ebf4e35502a8df75c29fb05788528e330fe74"}, {file = "Brotli-1.1.0-cp37-cp37m-musllinux_1_1_ppc64le.whl", hash = "sha256:23032ae55523cc7bccb4f6a0bf368cd25ad9bcdcc1990b64a647e7bbcce9cb5b"}, {file = "Brotli-1.1.0-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:224e57f6eac61cc449f498cc5f0e1725ba2071a3d4f48d5d9dffba42db196438"}, - {file = "Brotli-1.1.0-cp37-cp37m-musllinux_1_2_aarch64.whl", hash = "sha256:cb1dac1770878ade83f2ccdf7d25e494f05c9165f5246b46a621cc849341dc01"}, - {file = "Brotli-1.1.0-cp37-cp37m-musllinux_1_2_i686.whl", hash = "sha256:3ee8a80d67a4334482d9712b8e83ca6b1d9bc7e351931252ebef5d8f7335a547"}, - {file = "Brotli-1.1.0-cp37-cp37m-musllinux_1_2_ppc64le.whl", hash = "sha256:5e55da2c8724191e5b557f8e18943b1b4839b8efc3ef60d65985bcf6f587dd38"}, - {file = "Brotli-1.1.0-cp37-cp37m-musllinux_1_2_x86_64.whl", hash = "sha256:d342778ef319e1026af243ed0a07c97acf3bad33b9f29e7ae6a1f68fd083e90c"}, {file = "Brotli-1.1.0-cp37-cp37m-win32.whl", hash = "sha256:587ca6d3cef6e4e868102672d3bd9dc9698c309ba56d41c2b9c85bbb903cdb95"}, {file = "Brotli-1.1.0-cp37-cp37m-win_amd64.whl", hash = "sha256:2954c1c23f81c2eaf0b0717d9380bd348578a94161a65b3a2afc62c86467dd68"}, {file = "Brotli-1.1.0-cp38-cp38-macosx_10_9_universal2.whl", hash = 
"sha256:efa8b278894b14d6da122a72fefcebc28445f2d3f880ac59d46c90f4c13be9a3"}, @@ -754,10 +773,6 @@ files = [ {file = "Brotli-1.1.0-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:1ab4fbee0b2d9098c74f3057b2bc055a8bd92ccf02f65944a241b4349229185a"}, {file = "Brotli-1.1.0-cp38-cp38-musllinux_1_1_ppc64le.whl", hash = "sha256:141bd4d93984070e097521ed07e2575b46f817d08f9fa42b16b9b5f27b5ac088"}, {file = "Brotli-1.1.0-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:fce1473f3ccc4187f75b4690cfc922628aed4d3dd013d047f95a9b3919a86596"}, - {file = "Brotli-1.1.0-cp38-cp38-musllinux_1_2_aarch64.whl", hash = "sha256:d2b35ca2c7f81d173d2fadc2f4f31e88cc5f7a39ae5b6db5513cf3383b0e0ec7"}, - {file = "Brotli-1.1.0-cp38-cp38-musllinux_1_2_i686.whl", hash = "sha256:af6fa6817889314555aede9a919612b23739395ce767fe7fcbea9a80bf140fe5"}, - {file = "Brotli-1.1.0-cp38-cp38-musllinux_1_2_ppc64le.whl", hash = "sha256:2feb1d960f760a575dbc5ab3b1c00504b24caaf6986e2dc2b01c09c87866a943"}, - {file = "Brotli-1.1.0-cp38-cp38-musllinux_1_2_x86_64.whl", hash = "sha256:4410f84b33374409552ac9b6903507cdb31cd30d2501fc5ca13d18f73548444a"}, {file = "Brotli-1.1.0-cp38-cp38-win32.whl", hash = "sha256:db85ecf4e609a48f4b29055f1e144231b90edc90af7481aa731ba2d059226b1b"}, {file = "Brotli-1.1.0-cp38-cp38-win_amd64.whl", hash = "sha256:3d7954194c36e304e1523f55d7042c59dc53ec20dd4e9ea9d151f1b62b4415c0"}, {file = "Brotli-1.1.0-cp39-cp39-macosx_10_9_universal2.whl", hash = "sha256:5fb2ce4b8045c78ebbc7b8f3c15062e435d47e7393cc57c25115cfd49883747a"}, @@ -770,10 +785,6 @@ files = [ {file = "Brotli-1.1.0-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:949f3b7c29912693cee0afcf09acd6ebc04c57af949d9bf77d6101ebb61e388c"}, {file = "Brotli-1.1.0-cp39-cp39-musllinux_1_1_ppc64le.whl", hash = "sha256:89f4988c7203739d48c6f806f1e87a1d96e0806d44f0fba61dba81392c9e474d"}, {file = "Brotli-1.1.0-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:de6551e370ef19f8de1807d0a9aa2cdfdce2e85ce88b122fe9f6b2b076837e59"}, - {file = "Brotli-1.1.0-cp39-cp39-musllinux_1_2_aarch64.whl", hash = "sha256:0737ddb3068957cf1b054899b0883830bb1fec522ec76b1098f9b6e0f02d9419"}, - {file = "Brotli-1.1.0-cp39-cp39-musllinux_1_2_i686.whl", hash = "sha256:4f3607b129417e111e30637af1b56f24f7a49e64763253bbc275c75fa887d4b2"}, - {file = "Brotli-1.1.0-cp39-cp39-musllinux_1_2_ppc64le.whl", hash = "sha256:6c6e0c425f22c1c719c42670d561ad682f7bfeeef918edea971a79ac5252437f"}, - {file = "Brotli-1.1.0-cp39-cp39-musllinux_1_2_x86_64.whl", hash = "sha256:494994f807ba0b92092a163a0a283961369a65f6cbe01e8891132b7a320e61eb"}, {file = "Brotli-1.1.0-cp39-cp39-win32.whl", hash = "sha256:f0d8a7a6b5983c2496e364b969f0e526647a06b075d034f3297dc66f3b360c64"}, {file = "Brotli-1.1.0-cp39-cp39-win_amd64.whl", hash = "sha256:cdad5b9014d83ca68c25d2e9444e28e967ef16e80f6b436918c700c117a85467"}, {file = "Brotli-1.1.0.tar.gz", hash = "sha256:81de08ac11bcb85841e440c13611c00b67d3bf82698314928d0b676362546724"}, @@ -820,6 +831,19 @@ files = [ [package.dependencies] cffi = ">=1.0.0" +[[package]] +name = "cachetools" +version = "5.5.2" +description = "Extensible memoizing collections and decorators" +optional = true +python-versions = ">=3.7" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "cachetools-5.5.2-py3-none-any.whl", hash = "sha256:d26a22bcc62eb95c3beabd9f1ee5e820d3d2704fe2967cbe350e20c8ffcd3f0a"}, + {file = "cachetools-5.5.2.tar.gz", hash = "sha256:1a661caa9175d26759571b2e19580f9d6393969e5dfca11fdb1f947a23e640d4"}, +] + [[package]] name = "catalogue" version = "2.0.10" @@ -1655,6 +1679,49 @@ files = [ {file = 
"decorator-5.2.1.tar.gz", hash = "sha256:65f266143752f734b0a7cc83c46f4618af75b8c5911b00ccb61d0ac9b6da0360"}, ] +[[package]] +name = "deepeval" +version = "3.4.0" +description = "The LLM Evaluation Framework" +optional = true +python-versions = "<4.0,>=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "deepeval-3.4.0-py3-none-any.whl", hash = "sha256:ae95fd290f47861e004e5174c995dd0902def477d537cb8c80eff4bd9b93b9bd"}, + {file = "deepeval-3.4.0.tar.gz", hash = "sha256:c21af882f078b220e28a4455e3363abc1f57a45a04fcf380e1b3a2d4a526f5ef"}, +] + +[package.dependencies] +aiohttp = "*" +anthropic = "*" +click = ">=8.0.0,<8.3.0" +google-genai = ">=1.9.0,<2.0.0" +grpcio = ">=1.67.1,<2.0.0" +nest_asyncio = "*" +ollama = "*" +openai = "*" +opentelemetry-api = ">=1.24.0,<2.0.0" +opentelemetry-exporter-otlp-proto-grpc = ">=1.24.0,<2.0.0" +opentelemetry-sdk = ">=1.24.0,<2.0.0" +portalocker = "*" +posthog = ">=6.3.0,<7.0.0" +pyfiglet = "*" +pytest = "*" +pytest-asyncio = "*" +pytest-repeat = "*" +pytest-rerunfailures = ">=12.0,<13.0" +pytest-xdist = "*" +requests = ">=2.31.0,<3.0.0" +rich = ">=13.6.0,<15.0.0" +sentry-sdk = "*" +setuptools = "*" +tabulate = ">=0.9.0,<0.10.0" +tenacity = ">=8.0.0,<=10.0.0" +tqdm = ">=4.66.1,<5.0.0" +typer = ">=0.9,<1.0.0" +wheel = "*" + [[package]] name = "defusedxml" version = "0.7.1" @@ -1805,6 +1872,22 @@ typing-extensions = {version = ">=4.6.0", markers = "python_version < \"3.13\""} [package.extras] test = ["pytest (>=6)"] +[[package]] +name = "execnet" +version = "2.1.1" +description = "execnet: rapid multi-Python deployment" +optional = true +python-versions = ">=3.8" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "execnet-2.1.1-py3-none-any.whl", hash = "sha256:26dee51f1b80cebd6d0ca8e74dd8745419761d3bef34163928cbebbdc4749fdc"}, + {file = "execnet-2.1.1.tar.gz", hash = "sha256:5189b52c6121c24feae288166ab41b32549c7e2348652736540b9e6e7d4e72e3"}, +] + +[package.extras] +testing = ["hatch", "pre-commit", "pytest", "tox"] + [[package]] name = "executing" version = "2.2.0" @@ -2117,6 +2200,79 @@ test-downstream = ["aiobotocore (>=2.5.4,<3.0.0)", "dask-expr", "dask[dataframe, test-full = ["adlfs", "aiohttp (!=4.0.0a0,!=4.0.0a1)", "cloudpickle", "dask", "distributed", "dropbox", "dropboxdrivefs", "fastparquet", "fusepy", "gcsfs", "jinja2", "kerchunk", "libarchive-c", "lz4", "notebook", "numpy", "ocifs", "pandas", "panel", "paramiko", "pyarrow", "pyarrow (>=1)", "pyftpdlib", "pygit2", "pytest", "pytest-asyncio (!=0.22.0)", "pytest-benchmark", "pytest-cov", "pytest-mock", "pytest-recording", "pytest-rerunfailures", "python-snappy", "requests", "smbprotocol", "tqdm", "urllib3", "zarr", "zstandard"] tqdm = ["tqdm"] +[[package]] +name = "google-auth" +version = "2.40.3" +description = "Google Authentication Library" +optional = true +python-versions = ">=3.7" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "google_auth-2.40.3-py2.py3-none-any.whl", hash = "sha256:1370d4593e86213563547f97a92752fc658456fe4514c809544f330fed45a7ca"}, + {file = "google_auth-2.40.3.tar.gz", hash = "sha256:500c3a29adedeb36ea9cf24b8d10858e152f2412e3ca37829b3fa18e33d63b77"}, +] + +[package.dependencies] +cachetools = ">=2.0.0,<6.0" +pyasn1-modules = ">=0.2.1" +rsa = ">=3.1.4,<5" + +[package.extras] +aiohttp = ["aiohttp (>=3.6.2,<4.0.0)", "requests (>=2.20.0,<3.0.0)"] +enterprise-cert = ["cryptography", "pyopenssl"] +pyjwt = ["cryptography (<39.0.0) ; python_version < \"3.8\"", "cryptography (>=38.0.3)", "pyjwt (>=2.0)"] +pyopenssl = 
["cryptography (<39.0.0) ; python_version < \"3.8\"", "cryptography (>=38.0.3)", "pyopenssl (>=20.0.0)"] +reauth = ["pyu2f (>=0.1.5)"] +requests = ["requests (>=2.20.0,<3.0.0)"] +testing = ["aiohttp (<3.10.0)", "aiohttp (>=3.6.2,<4.0.0)", "aioresponses", "cryptography (<39.0.0) ; python_version < \"3.8\"", "cryptography (>=38.0.3)", "flask", "freezegun", "grpcio", "mock", "oauth2client", "packaging", "pyjwt (>=2.0)", "pyopenssl (<24.3.0)", "pyopenssl (>=20.0.0)", "pytest", "pytest-asyncio", "pytest-cov", "pytest-localserver", "pyu2f (>=0.1.5)", "requests (>=2.20.0,<3.0.0)", "responses", "urllib3"] +urllib3 = ["packaging", "urllib3"] + +[[package]] +name = "google-genai" +version = "1.31.0" +description = "GenAI Python SDK" +optional = true +python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "google_genai-1.31.0-py3-none-any.whl", hash = "sha256:5c6959bcf862714e8ed0922db3aaf41885bacf6318751b3421bf1e459f78892f"}, + {file = "google_genai-1.31.0.tar.gz", hash = "sha256:8572b47aa684357c3e5e10d290ec772c65414114939e3ad2955203e27cd2fcbc"}, +] + +[package.dependencies] +anyio = ">=4.8.0,<5.0.0" +google-auth = ">=2.14.1,<3.0.0" +httpx = ">=0.28.1,<1.0.0" +pydantic = ">=2.0.0,<3.0.0" +requests = ">=2.28.1,<3.0.0" +tenacity = ">=8.2.3,<9.2.0" +typing-extensions = ">=4.11.0,<5.0.0" +websockets = ">=13.0.0,<15.1.0" + +[package.extras] +aiohttp = ["aiohttp (<4.0.0)"] + +[[package]] +name = "googleapis-common-protos" +version = "1.70.0" +description = "Common protobufs used in Google APIs" +optional = true +python-versions = ">=3.7" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "googleapis_common_protos-1.70.0-py3-none-any.whl", hash = "sha256:b8bfcca8c25a2bb253e0e0b0adaf8c00773e5e6af6fd92397576680b807e0fd8"}, + {file = "googleapis_common_protos-1.70.0.tar.gz", hash = "sha256:0e1b44e0ea153e6594f9f394fef15193a68aaaea2d843f83e2742717ca753257"}, +] + +[package.dependencies] +protobuf = ">=3.20.2,<4.21.1 || >4.21.1,<4.21.2 || >4.21.2,<4.21.3 || >4.21.3,<4.21.4 || >4.21.4,<4.21.5 || >4.21.5,<7.0.0" + +[package.extras] +grpc = ["grpcio (>=1.44.0,<2.0.0)"] + [[package]] name = "greenlet" version = "3.2.4" @@ -2201,6 +2357,71 @@ files = [ [package.dependencies] colorama = ">=0.4" +[[package]] +name = "grpcio" +version = "1.74.0" +description = "HTTP/2-based RPC framework" +optional = true +python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "grpcio-1.74.0-cp310-cp310-linux_armv7l.whl", hash = "sha256:85bd5cdf4ed7b2d6438871adf6afff9af7096486fcf51818a81b77ef4dd30907"}, + {file = "grpcio-1.74.0-cp310-cp310-macosx_11_0_universal2.whl", hash = "sha256:68c8ebcca945efff9d86d8d6d7bfb0841cf0071024417e2d7f45c5e46b5b08eb"}, + {file = "grpcio-1.74.0-cp310-cp310-manylinux_2_17_aarch64.whl", hash = "sha256:e154d230dc1bbbd78ad2fdc3039fa50ad7ffcf438e4eb2fa30bce223a70c7486"}, + {file = "grpcio-1.74.0-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:e8978003816c7b9eabe217f88c78bc26adc8f9304bf6a594b02e5a49b2ef9c11"}, + {file = "grpcio-1.74.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c3d7bd6e3929fd2ea7fbc3f562e4987229ead70c9ae5f01501a46701e08f1ad9"}, + {file = "grpcio-1.74.0-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:136b53c91ac1d02c8c24201bfdeb56f8b3ac3278668cbb8e0ba49c88069e1bdc"}, + {file = "grpcio-1.74.0-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:fe0f540750a13fd8e5da4b3eaba91a785eea8dca5ccd2bc2ffe978caa403090e"}, + {file = 
"grpcio-1.74.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:4e4181bfc24413d1e3a37a0b7889bea68d973d4b45dd2bc68bb766c140718f82"}, + {file = "grpcio-1.74.0-cp310-cp310-win32.whl", hash = "sha256:1733969040989f7acc3d94c22f55b4a9501a30f6aaacdbccfaba0a3ffb255ab7"}, + {file = "grpcio-1.74.0-cp310-cp310-win_amd64.whl", hash = "sha256:9e912d3c993a29df6c627459af58975b2e5c897d93287939b9d5065f000249b5"}, + {file = "grpcio-1.74.0-cp311-cp311-linux_armv7l.whl", hash = "sha256:69e1a8180868a2576f02356565f16635b99088da7df3d45aaa7e24e73a054e31"}, + {file = "grpcio-1.74.0-cp311-cp311-macosx_11_0_universal2.whl", hash = "sha256:8efe72fde5500f47aca1ef59495cb59c885afe04ac89dd11d810f2de87d935d4"}, + {file = "grpcio-1.74.0-cp311-cp311-manylinux_2_17_aarch64.whl", hash = "sha256:a8f0302f9ac4e9923f98d8e243939a6fb627cd048f5cd38595c97e38020dffce"}, + {file = "grpcio-1.74.0-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:2f609a39f62a6f6f05c7512746798282546358a37ea93c1fcbadf8b2fed162e3"}, + {file = "grpcio-1.74.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c98e0b7434a7fa4e3e63f250456eaef52499fba5ae661c58cc5b5477d11e7182"}, + {file = "grpcio-1.74.0-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:662456c4513e298db6d7bd9c3b8df6f75f8752f0ba01fb653e252ed4a59b5a5d"}, + {file = "grpcio-1.74.0-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:3d14e3c4d65e19d8430a4e28ceb71ace4728776fd6c3ce34016947474479683f"}, + {file = "grpcio-1.74.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:1bf949792cee20d2078323a9b02bacbbae002b9e3b9e2433f2741c15bdeba1c4"}, + {file = "grpcio-1.74.0-cp311-cp311-win32.whl", hash = "sha256:55b453812fa7c7ce2f5c88be3018fb4a490519b6ce80788d5913f3f9d7da8c7b"}, + {file = "grpcio-1.74.0-cp311-cp311-win_amd64.whl", hash = "sha256:86ad489db097141a907c559988c29718719aa3e13370d40e20506f11b4de0d11"}, + {file = "grpcio-1.74.0-cp312-cp312-linux_armv7l.whl", hash = "sha256:8533e6e9c5bd630ca98062e3a1326249e6ada07d05acf191a77bc33f8948f3d8"}, + {file = "grpcio-1.74.0-cp312-cp312-macosx_11_0_universal2.whl", hash = "sha256:2918948864fec2a11721d91568effffbe0a02b23ecd57f281391d986847982f6"}, + {file = "grpcio-1.74.0-cp312-cp312-manylinux_2_17_aarch64.whl", hash = "sha256:60d2d48b0580e70d2e1954d0d19fa3c2e60dd7cbed826aca104fff518310d1c5"}, + {file = "grpcio-1.74.0-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:3601274bc0523f6dc07666c0e01682c94472402ac2fd1226fd96e079863bfa49"}, + {file = "grpcio-1.74.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:176d60a5168d7948539def20b2a3adcce67d72454d9ae05969a2e73f3a0feee7"}, + {file = "grpcio-1.74.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:e759f9e8bc908aaae0412642afe5416c9f983a80499448fcc7fab8692ae044c3"}, + {file = "grpcio-1.74.0-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:9e7c4389771855a92934b2846bd807fc25a3dfa820fd912fe6bd8136026b2707"}, + {file = "grpcio-1.74.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:cce634b10aeab37010449124814b05a62fb5f18928ca878f1bf4750d1f0c815b"}, + {file = "grpcio-1.74.0-cp312-cp312-win32.whl", hash = "sha256:885912559974df35d92219e2dc98f51a16a48395f37b92865ad45186f294096c"}, + {file = "grpcio-1.74.0-cp312-cp312-win_amd64.whl", hash = "sha256:42f8fee287427b94be63d916c90399ed310ed10aadbf9e2e5538b3e497d269bc"}, + {file = "grpcio-1.74.0-cp313-cp313-linux_armv7l.whl", hash = "sha256:2bc2d7d8d184e2362b53905cb1708c84cb16354771c04b490485fa07ce3a1d89"}, + {file = 
"grpcio-1.74.0-cp313-cp313-macosx_11_0_universal2.whl", hash = "sha256:c14e803037e572c177ba54a3e090d6eb12efd795d49327c5ee2b3bddb836bf01"}, + {file = "grpcio-1.74.0-cp313-cp313-manylinux_2_17_aarch64.whl", hash = "sha256:f6ec94f0e50eb8fa1744a731088b966427575e40c2944a980049798b127a687e"}, + {file = "grpcio-1.74.0-cp313-cp313-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:566b9395b90cc3d0d0c6404bc8572c7c18786ede549cdb540ae27b58afe0fb91"}, + {file = "grpcio-1.74.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e1ea6176d7dfd5b941ea01c2ec34de9531ba494d541fe2057c904e601879f249"}, + {file = "grpcio-1.74.0-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:64229c1e9cea079420527fa8ac45d80fc1e8d3f94deaa35643c381fa8d98f362"}, + {file = "grpcio-1.74.0-cp313-cp313-musllinux_1_1_i686.whl", hash = "sha256:0f87bddd6e27fc776aacf7ebfec367b6d49cad0455123951e4488ea99d9b9b8f"}, + {file = "grpcio-1.74.0-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:3b03d8f2a07f0fea8c8f74deb59f8352b770e3900d143b3d1475effcb08eec20"}, + {file = "grpcio-1.74.0-cp313-cp313-win32.whl", hash = "sha256:b6a73b2ba83e663b2480a90b82fdae6a7aa6427f62bf43b29912c0cfd1aa2bfa"}, + {file = "grpcio-1.74.0-cp313-cp313-win_amd64.whl", hash = "sha256:fd3c71aeee838299c5887230b8a1822795325ddfea635edd82954c1eaa831e24"}, + {file = "grpcio-1.74.0-cp39-cp39-linux_armv7l.whl", hash = "sha256:4bc5fca10aaf74779081e16c2bcc3d5ec643ffd528d9e7b1c9039000ead73bae"}, + {file = "grpcio-1.74.0-cp39-cp39-macosx_11_0_universal2.whl", hash = "sha256:6bab67d15ad617aff094c382c882e0177637da73cbc5532d52c07b4ee887a87b"}, + {file = "grpcio-1.74.0-cp39-cp39-manylinux_2_17_aarch64.whl", hash = "sha256:655726919b75ab3c34cdad39da5c530ac6fa32696fb23119e36b64adcfca174a"}, + {file = "grpcio-1.74.0-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:1a2b06afe2e50ebfd46247ac3ba60cac523f54ec7792ae9ba6073c12daf26f0a"}, + {file = "grpcio-1.74.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:5f251c355167b2360537cf17bea2cf0197995e551ab9da6a0a59b3da5e8704f9"}, + {file = "grpcio-1.74.0-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:8f7b5882fb50632ab1e48cb3122d6df55b9afabc265582808036b6e51b9fd6b7"}, + {file = "grpcio-1.74.0-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:834988b6c34515545b3edd13e902c1acdd9f2465d386ea5143fb558f153a7176"}, + {file = "grpcio-1.74.0-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:22b834cef33429ca6cc28303c9c327ba9a3fafecbf62fae17e9a7b7163cc43ac"}, + {file = "grpcio-1.74.0-cp39-cp39-win32.whl", hash = "sha256:7d95d71ff35291bab3f1c52f52f474c632db26ea12700c2ff0ea0532cb0b5854"}, + {file = "grpcio-1.74.0-cp39-cp39-win_amd64.whl", hash = "sha256:ecde9ab49f58433abe02f9ed076c7b5be839cf0153883a6d23995937a82392fa"}, + {file = "grpcio-1.74.0.tar.gz", hash = "sha256:80d1f4fbb35b0742d3e3d3bb654b7381cd5f015f8497279a1e9c21ba623e01b1"}, +] + +[package.extras] +protobuf = ["grpcio-tools (>=1.74.0)"] + [[package]] name = "h11" version = "0.16.0" @@ -2400,7 +2621,7 @@ files = [ {file = "importlib_metadata-8.7.0-py3-none-any.whl", hash = "sha256:e5dd1551894c77868a30651cef00984d50e1002d06942a7101d34870c5f02afd"}, {file = "importlib_metadata-8.7.0.tar.gz", hash = "sha256:d13b81ad223b890aa16c5471f2ac3056cf76c5f10f82d6f9292f0b415f389000"}, ] -markers = {main = "platform_system == \"Linux\" and platform_machine == \"x86_64\" and (extra == \"all\" or extra == \"llm\" or extra == \"pytorch\" or extra == \"nlp\") and python_version == \"3.9\""} +markers = {main = "platform_system == 
\"Linux\" and python_version == \"3.9\" and platform_machine == \"x86_64\" and (extra == \"llm\" or extra == \"all\" or extra == \"pytorch\" or extra == \"nlp\") or extra == \"llm\""} [package.dependencies] zipp = ">=3.20" @@ -2438,6 +2659,19 @@ enabler = ["pytest-enabler (>=2.2)"] test = ["jaraco.test (>=5.4)", "pytest (>=6,!=8.1.*)", "zipp (>=3.17)"] type = ["pytest-mypy"] +[[package]] +name = "iniconfig" +version = "2.1.0" +description = "brain-dead simple config-ini parsing" +optional = true +python-versions = ">=3.8" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "iniconfig-2.1.0-py3-none-any.whl", hash = "sha256:9deba5723312380e77435581c6bf4935c94cbfab9b1ed33ef8d238ea168eb760"}, + {file = "iniconfig-2.1.0.tar.gz", hash = "sha256:3abbd2e30b36733fee78f9c7f7308f2d0050e88f0087fd25c2645f63c773e1c7"}, +] + [[package]] name = "ipykernel" version = "6.30.1" @@ -3054,14 +3288,14 @@ jupyter_server = ">=1.1.2" [[package]] name = "jupyter-server" -version = "2.16.0" +version = "2.17.0" description = "The backend—i.e. core services, APIs, and REST endpoints—to Jupyter web applications." optional = false python-versions = ">=3.9" groups = ["dev"] files = [ - {file = "jupyter_server-2.16.0-py3-none-any.whl", hash = "sha256:3d8db5be3bc64403b1c65b400a1d7f4647a5ce743f3b20dbdefe8ddb7b55af9e"}, - {file = "jupyter_server-2.16.0.tar.gz", hash = "sha256:65d4b44fdf2dcbbdfe0aa1ace4a842d4aaf746a2b7b168134d5aaed35621b7f6"}, + {file = "jupyter_server-2.17.0-py3-none-any.whl", hash = "sha256:e8cb9c7db4251f51ed307e329b81b72ccf2056ff82d50524debde1ee1870e13f"}, + {file = "jupyter_server-2.17.0.tar.gz", hash = "sha256:c38ea898566964c888b4772ae1ed58eca84592e88251d2cfc4d171f81f7e99d5"}, ] [package.dependencies] @@ -3074,7 +3308,7 @@ jupyter-events = ">=0.11.0" jupyter-server-terminals = ">=0.4.4" nbconvert = ">=6.4.4" nbformat = ">=5.3.0" -overrides = ">=5.0" +overrides = {version = ">=5.0", markers = "python_version < \"3.12\""} packaging = ">=22.0" prometheus-client = ">=0.9" pywinpty = {version = ">=2.0.1", markers = "os_name == \"nt\""} @@ -3572,15 +3806,15 @@ typing-extensions = ">=4.7" [[package]] name = "langchain-openai" -version = "0.3.30" +version = "0.3.31" description = "An integration package connecting OpenAI and LangChain" optional = true python-versions = ">=3.9" groups = ["main"] markers = "extra == \"all\" or extra == \"llm\"" files = [ - {file = "langchain_openai-0.3.30-py3-none-any.whl", hash = "sha256:280f1f31004393228e3f75ff8353b1aae86bbc282abc7890a05beb5f43b89923"}, - {file = "langchain_openai-0.3.30.tar.gz", hash = "sha256:90df37509b2dcf5e057f491326fcbf78cf2a71caff5103a5a7de560320171842"}, + {file = "langchain_openai-0.3.31-py3-none-any.whl", hash = "sha256:b5b2ae7d3f996f189d400d864e1884e6c368ab6b1a0c1305042761ab946c3a26"}, + {file = "langchain_openai-0.3.31.tar.gz", hash = "sha256:3a039f81f2aa64e85fd18be14f72b8f79bbb1d58efd57327918289aed6eedd3d"}, ] [package.dependencies] @@ -3642,15 +3876,15 @@ six = "*" [[package]] name = "langsmith" -version = "0.4.14" +version = "0.4.16" description = "Client library to connect to the LangSmith LLM Tracing and Evaluation Platform." 
optional = true python-versions = ">=3.9" groups = ["main"] markers = "extra == \"all\" or extra == \"llm\"" files = [ - {file = "langsmith-0.4.14-py3-none-any.whl", hash = "sha256:b6d070ac425196947d2a98126fb0e35f3b8c001a2e6e5b7049dd1c56f0767d0b"}, - {file = "langsmith-0.4.14.tar.gz", hash = "sha256:4d29c7a9c85b20ba813ab9c855407bccdf5eb4f397f512ffa89959b2a2cb83ed"}, + {file = "langsmith-0.4.16-py3-none-any.whl", hash = "sha256:9ba95ed09b057dfe227e882f5446e1824bfc9f2c89de542ee6f0f8d90ab953a7"}, + {file = "langsmith-0.4.16.tar.gz", hash = "sha256:a94f374c7fa0f406757f95f311e84873258563961e1af0ba8996411822cd7241"}, ] [package.dependencies] @@ -3861,7 +4095,7 @@ files = [ {file = "markdown-it-py-3.0.0.tar.gz", hash = "sha256:e3f60a94fa066dc52ec76661e37c851cb232d92f9886b15cb560aaada2df8feb"}, {file = "markdown_it_py-3.0.0-py3-none-any.whl", hash = "sha256:355216845c60bd96232cd8d8c40e8f9765cc86f46880e43a8fd22dc1a1a8cab1"}, ] -markers = {main = "extra == \"pii-detection\""} +markers = {main = "extra == \"llm\" or extra == \"pii-detection\""} [package.dependencies] mdurl = ">=0.1,<1.0" @@ -4171,7 +4405,7 @@ files = [ {file = "mdurl-0.1.2-py3-none-any.whl", hash = "sha256:84008a41e51615a49fc9966191ff91509e3c40b939176e643fd50a5c2196b8f8"}, {file = "mdurl-0.1.2.tar.gz", hash = "sha256:bb413d29f5eea38f31dd4754dd7377d4465116fb207585f97bf925588687c1ba"}, ] -markers = {main = "extra == \"pii-detection\""} +markers = {main = "extra == \"llm\" or extra == \"pii-detection\""} [[package]] name = "mistune" @@ -5208,16 +5442,33 @@ files = [ {file = "nvidia_nvtx_cu12-12.8.90-py3-none-win_amd64.whl", hash = "sha256:619c8304aedc69f02ea82dd244541a83c3d9d40993381b3b590f1adaed3db41e"}, ] +[[package]] +name = "ollama" +version = "0.5.3" +description = "The official Python client for Ollama." 
+optional = true +python-versions = ">=3.8" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "ollama-0.5.3-py3-none-any.whl", hash = "sha256:a8303b413d99a9043dbf77ebf11ced672396b59bec27e6d5db67c88f01b279d2"}, + {file = "ollama-0.5.3.tar.gz", hash = "sha256:40b6dff729df3b24e56d4042fd9d37e231cee8e528677e0d085413a1d6692394"}, +] + +[package.dependencies] +httpx = ">=0.27" +pydantic = ">=2.9" + [[package]] name = "openai" -version = "1.100.2" +version = "1.101.0" description = "The official Python library for the openai API" optional = false python-versions = ">=3.8" groups = ["main"] files = [ - {file = "openai-1.100.2-py3-none-any.whl", hash = "sha256:54d3457b2c8d7303a1bc002a058de46bdd8f37a8117751c7cf4ed4438051f151"}, - {file = "openai-1.100.2.tar.gz", hash = "sha256:787b4c3c8a65895182c58c424f790c25c790cc9a0330e34f73d55b6ee5a00e32"}, + {file = "openai-1.101.0-py3-none-any.whl", hash = "sha256:6539a446cce154f8d9fb42757acdfd3ed9357ab0d34fcac11096c461da87133b"}, + {file = "openai-1.101.0.tar.gz", hash = "sha256:29f56df2236069686e64aca0e13c24a4ec310545afb25ef7da2ab1a18523f22d"}, ] [package.dependencies] @@ -5236,6 +5487,112 @@ datalib = ["numpy (>=1)", "pandas (>=1.2.3)", "pandas-stubs (>=1.1.0.11)"] realtime = ["websockets (>=13,<16)"] voice-helpers = ["numpy (>=2.0.2)", "sounddevice (>=0.5.1)"] +[[package]] +name = "opentelemetry-api" +version = "1.36.0" +description = "OpenTelemetry Python API" +optional = true +python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "opentelemetry_api-1.36.0-py3-none-any.whl", hash = "sha256:02f20bcacf666e1333b6b1f04e647dc1d5111f86b8e510238fcc56d7762cda8c"}, + {file = "opentelemetry_api-1.36.0.tar.gz", hash = "sha256:9a72572b9c416d004d492cbc6e61962c0501eaf945ece9b5a0f56597d8348aa0"}, +] + +[package.dependencies] +importlib-metadata = ">=6.0,<8.8.0" +typing-extensions = ">=4.5.0" + +[[package]] +name = "opentelemetry-exporter-otlp-proto-common" +version = "1.36.0" +description = "OpenTelemetry Protobuf encoding" +optional = true +python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "opentelemetry_exporter_otlp_proto_common-1.36.0-py3-none-any.whl", hash = "sha256:0fc002a6ed63eac235ada9aa7056e5492e9a71728214a61745f6ad04b923f840"}, + {file = "opentelemetry_exporter_otlp_proto_common-1.36.0.tar.gz", hash = "sha256:6c496ccbcbe26b04653cecadd92f73659b814c6e3579af157d8716e5f9f25cbf"}, +] + +[package.dependencies] +opentelemetry-proto = "1.36.0" + +[[package]] +name = "opentelemetry-exporter-otlp-proto-grpc" +version = "1.36.0" +description = "OpenTelemetry Collector Protobuf over gRPC Exporter" +optional = true +python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "opentelemetry_exporter_otlp_proto_grpc-1.36.0-py3-none-any.whl", hash = "sha256:734e841fc6a5d6f30e7be4d8053adb703c70ca80c562ae24e8083a28fadef211"}, + {file = "opentelemetry_exporter_otlp_proto_grpc-1.36.0.tar.gz", hash = "sha256:b281afbf7036b325b3588b5b6c8bb175069e3978d1bd24071f4a59d04c1e5bbf"}, +] + +[package.dependencies] +googleapis-common-protos = ">=1.57,<2.0" +grpcio = {version = ">=1.63.2,<2.0.0", markers = "python_version < \"3.13\""} +opentelemetry-api = ">=1.15,<2.0" +opentelemetry-exporter-otlp-proto-common = "1.36.0" +opentelemetry-proto = "1.36.0" +opentelemetry-sdk = ">=1.36.0,<1.37.0" +typing-extensions = ">=4.6.0" + +[[package]] +name = "opentelemetry-proto" +version = "1.36.0" +description = "OpenTelemetry Python Proto" +optional = true 
+python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "opentelemetry_proto-1.36.0-py3-none-any.whl", hash = "sha256:151b3bf73a09f94afc658497cf77d45a565606f62ce0c17acb08cd9937ca206e"}, + {file = "opentelemetry_proto-1.36.0.tar.gz", hash = "sha256:0f10b3c72f74c91e0764a5ec88fd8f1c368ea5d9c64639fb455e2854ef87dd2f"}, +] + +[package.dependencies] +protobuf = ">=5.0,<7.0" + +[[package]] +name = "opentelemetry-sdk" +version = "1.36.0" +description = "OpenTelemetry Python SDK" +optional = true +python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "opentelemetry_sdk-1.36.0-py3-none-any.whl", hash = "sha256:19fe048b42e98c5c1ffe85b569b7073576ad4ce0bcb6e9b4c6a39e890a6c45fb"}, + {file = "opentelemetry_sdk-1.36.0.tar.gz", hash = "sha256:19c8c81599f51b71670661ff7495c905d8fdf6976e41622d5245b791b06fa581"}, +] + +[package.dependencies] +opentelemetry-api = "1.36.0" +opentelemetry-semantic-conventions = "0.57b0" +typing-extensions = ">=4.5.0" + +[[package]] +name = "opentelemetry-semantic-conventions" +version = "0.57b0" +description = "OpenTelemetry Semantic Conventions" +optional = true +python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "opentelemetry_semantic_conventions-0.57b0-py3-none-any.whl", hash = "sha256:757f7e76293294f124c827e514c2a3144f191ef175b069ce8d1211e1e38e9e78"}, + {file = "opentelemetry_semantic_conventions-0.57b0.tar.gz", hash = "sha256:609a4a79c7891b4620d64c7aac6898f872d790d75f22019913a660756f27ff32"}, +] + +[package.dependencies] +opentelemetry-api = "1.36.0" +typing-extensions = ">=4.5.0" + [[package]] name = "orjson" version = "3.11.2" @@ -5337,6 +5694,7 @@ description = "A decorator to automatically detect mismatch when overriding a me optional = false python-versions = ">=3.6" groups = ["dev"] +markers = "python_version <= \"3.11\"" files = [ {file = "overrides-7.7.0-py3-none-any.whl", hash = "sha256:c7ed9d062f78b8e4c1a7b70bd8796b35ead4d9f510227ef9c5dc7626c60d7e49"}, {file = "overrides-7.7.0.tar.gz", hash = "sha256:55158fa3d93b98cc75299b1e67078ad9003ca27945c76162c1c0766d6f91820a"}, @@ -5356,54 +5714,54 @@ files = [ [[package]] name = "pandas" -version = "2.3.1" +version = "2.3.2" description = "Powerful data structures for data analysis, time series, and statistics" optional = false python-versions = ">=3.9" groups = ["main"] files = [ - {file = "pandas-2.3.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:22c2e866f7209ebc3a8f08d75766566aae02bcc91d196935a1d9e59c7b990ac9"}, - {file = "pandas-2.3.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:3583d348546201aff730c8c47e49bc159833f971c2899d6097bce68b9112a4f1"}, - {file = "pandas-2.3.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0f951fbb702dacd390561e0ea45cdd8ecfa7fb56935eb3dd78e306c19104b9b0"}, - {file = "pandas-2.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:cd05b72ec02ebfb993569b4931b2e16fbb4d6ad6ce80224a3ee838387d83a191"}, - {file = "pandas-2.3.1-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:1b916a627919a247d865aed068eb65eb91a344b13f5b57ab9f610b7716c92de1"}, - {file = "pandas-2.3.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:fe67dc676818c186d5a3d5425250e40f179c2a89145df477dd82945eaea89e97"}, - {file = "pandas-2.3.1-cp310-cp310-win_amd64.whl", hash = "sha256:2eb789ae0274672acbd3c575b0598d213345660120a257b47b5dafdc618aec83"}, - {file = "pandas-2.3.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = 
"sha256:2b0540963d83431f5ce8870ea02a7430adca100cec8a050f0811f8e31035541b"}, - {file = "pandas-2.3.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:fe7317f578c6a153912bd2292f02e40c1d8f253e93c599e82620c7f69755c74f"}, - {file = "pandas-2.3.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:e6723a27ad7b244c0c79d8e7007092d7c8f0f11305770e2f4cd778b3ad5f9f85"}, - {file = "pandas-2.3.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3462c3735fe19f2638f2c3a40bd94ec2dc5ba13abbb032dd2fa1f540a075509d"}, - {file = "pandas-2.3.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:98bcc8b5bf7afed22cc753a28bc4d9e26e078e777066bc53fac7904ddef9a678"}, - {file = "pandas-2.3.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:4d544806b485ddf29e52d75b1f559142514e60ef58a832f74fb38e48d757b299"}, - {file = "pandas-2.3.1-cp311-cp311-win_amd64.whl", hash = "sha256:b3cd4273d3cb3707b6fffd217204c52ed92859533e31dc03b7c5008aa933aaab"}, - {file = "pandas-2.3.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:689968e841136f9e542020698ee1c4fbe9caa2ed2213ae2388dc7b81721510d3"}, - {file = "pandas-2.3.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:025e92411c16cbe5bb2a4abc99732a6b132f439b8aab23a59fa593eb00704232"}, - {file = "pandas-2.3.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9b7ff55f31c4fcb3e316e8f7fa194566b286d6ac430afec0d461163312c5841e"}, - {file = "pandas-2.3.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7dcb79bf373a47d2a40cf7232928eb7540155abbc460925c2c96d2d30b006eb4"}, - {file = "pandas-2.3.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:56a342b231e8862c96bdb6ab97170e203ce511f4d0429589c8ede1ee8ece48b8"}, - {file = "pandas-2.3.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:ca7ed14832bce68baef331f4d7f294411bed8efd032f8109d690df45e00c4679"}, - {file = "pandas-2.3.1-cp312-cp312-win_amd64.whl", hash = "sha256:ac942bfd0aca577bef61f2bc8da8147c4ef6879965ef883d8e8d5d2dc3e744b8"}, - {file = "pandas-2.3.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:9026bd4a80108fac2239294a15ef9003c4ee191a0f64b90f170b40cfb7cf2d22"}, - {file = "pandas-2.3.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:6de8547d4fdb12421e2d047a2c446c623ff4c11f47fddb6b9169eb98ffba485a"}, - {file = "pandas-2.3.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:782647ddc63c83133b2506912cc6b108140a38a37292102aaa19c81c83db2928"}, - {file = "pandas-2.3.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2ba6aff74075311fc88504b1db890187a3cd0f887a5b10f5525f8e2ef55bfdb9"}, - {file = "pandas-2.3.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:e5635178b387bd2ba4ac040f82bc2ef6e6b500483975c4ebacd34bec945fda12"}, - {file = "pandas-2.3.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:6f3bf5ec947526106399a9e1d26d40ee2b259c66422efdf4de63c848492d91bb"}, - {file = "pandas-2.3.1-cp313-cp313-win_amd64.whl", hash = "sha256:1c78cf43c8fde236342a1cb2c34bcff89564a7bfed7e474ed2fffa6aed03a956"}, - {file = "pandas-2.3.1-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:8dfc17328e8da77be3cf9f47509e5637ba8f137148ed0e9b5241e1baf526e20a"}, - {file = "pandas-2.3.1-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:ec6c851509364c59a5344458ab935e6451b31b818be467eb24b0fe89bd05b6b9"}, - {file = "pandas-2.3.1-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:911580460fc4884d9b05254b38a6bfadddfcc6aaef856fb5859e7ca202e45275"}, 
- {file = "pandas-2.3.1-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2f4d6feeba91744872a600e6edbbd5b033005b431d5ae8379abee5bcfa479fab"}, - {file = "pandas-2.3.1-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:fe37e757f462d31a9cd7580236a82f353f5713a80e059a29753cf938c6775d96"}, - {file = "pandas-2.3.1-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:5db9637dbc24b631ff3707269ae4559bce4b7fd75c1c4d7e13f40edc42df4444"}, - {file = "pandas-2.3.1-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:4645f770f98d656f11c69e81aeb21c6fca076a44bed3dcbb9396a4311bc7f6d8"}, - {file = "pandas-2.3.1-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:342e59589cc454aaff7484d75b816a433350b3d7964d7847327edda4d532a2e3"}, - {file = "pandas-2.3.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1d12f618d80379fde6af007f65f0c25bd3e40251dbd1636480dfffce2cf1e6da"}, - {file = "pandas-2.3.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:dd71c47a911da120d72ef173aeac0bf5241423f9bfea57320110a978457e069e"}, - {file = "pandas-2.3.1-cp39-cp39-musllinux_1_2_aarch64.whl", hash = "sha256:09e3b1587f0f3b0913e21e8b32c3119174551deb4a4eba4a89bc7377947977e7"}, - {file = "pandas-2.3.1-cp39-cp39-musllinux_1_2_x86_64.whl", hash = "sha256:2323294c73ed50f612f67e2bf3ae45aea04dce5690778e08a09391897f35ff88"}, - {file = "pandas-2.3.1-cp39-cp39-win_amd64.whl", hash = "sha256:b4b0de34dc8499c2db34000ef8baad684cfa4cbd836ecee05f323ebfba348c7d"}, - {file = "pandas-2.3.1.tar.gz", hash = "sha256:0a95b9ac964fe83ce317827f80304d37388ea77616b1425f0ae41c9d2d0d7bb2"}, + {file = "pandas-2.3.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:52bc29a946304c360561974c6542d1dd628ddafa69134a7131fdfd6a5d7a1a35"}, + {file = "pandas-2.3.2-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:220cc5c35ffaa764dd5bb17cf42df283b5cb7fdf49e10a7b053a06c9cb48ee2b"}, + {file = "pandas-2.3.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:42c05e15111221384019897df20c6fe893b2f697d03c811ee67ec9e0bb5a3424"}, + {file = "pandas-2.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:cc03acc273c5515ab69f898df99d9d4f12c4d70dbfc24c3acc6203751d0804cf"}, + {file = "pandas-2.3.2-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:d25c20a03e8870f6339bcf67281b946bd20b86f1a544ebbebb87e66a8d642cba"}, + {file = "pandas-2.3.2-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:21bb612d148bb5860b7eb2c10faacf1a810799245afd342cf297d7551513fbb6"}, + {file = "pandas-2.3.2-cp310-cp310-win_amd64.whl", hash = "sha256:b62d586eb25cb8cb70a5746a378fc3194cb7f11ea77170d59f889f5dfe3cec7a"}, + {file = "pandas-2.3.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:1333e9c299adcbb68ee89a9bb568fc3f20f9cbb419f1dd5225071e6cddb2a743"}, + {file = "pandas-2.3.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:76972bcbd7de8e91ad5f0ca884a9f2c477a2125354af624e022c49e5bd0dfff4"}, + {file = "pandas-2.3.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:b98bdd7c456a05eef7cd21fd6b29e3ca243591fe531c62be94a2cc987efb5ac2"}, + {file = "pandas-2.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:1d81573b3f7db40d020983f78721e9bfc425f411e616ef019a10ebf597aedb2e"}, + {file = "pandas-2.3.2-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:e190b738675a73b581736cc8ec71ae113d6c3768d0bd18bffa5b9a0927b0b6ea"}, + {file = "pandas-2.3.2-cp311-cp311-musllinux_1_2_x86_64.whl", hash = 
"sha256:c253828cb08f47488d60f43c5fc95114c771bbfff085da54bfc79cb4f9e3a372"}, + {file = "pandas-2.3.2-cp311-cp311-win_amd64.whl", hash = "sha256:9467697b8083f9667b212633ad6aa4ab32436dcbaf4cd57325debb0ddef2012f"}, + {file = "pandas-2.3.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:3fbb977f802156e7a3f829e9d1d5398f6192375a3e2d1a9ee0803e35fe70a2b9"}, + {file = "pandas-2.3.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:1b9b52693123dd234b7c985c68b709b0b009f4521000d0525f2b95c22f15944b"}, + {file = "pandas-2.3.2-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0bd281310d4f412733f319a5bc552f86d62cddc5f51d2e392c8787335c994175"}, + {file = "pandas-2.3.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:96d31a6b4354e3b9b8a2c848af75d31da390657e3ac6f30c05c82068b9ed79b9"}, + {file = "pandas-2.3.2-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:df4df0b9d02bb873a106971bb85d448378ef14b86ba96f035f50bbd3688456b4"}, + {file = "pandas-2.3.2-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:213a5adf93d020b74327cb2c1b842884dbdd37f895f42dcc2f09d451d949f811"}, + {file = "pandas-2.3.2-cp312-cp312-win_amd64.whl", hash = "sha256:8c13b81a9347eb8c7548f53fd9a4f08d4dfe996836543f805c987bafa03317ae"}, + {file = "pandas-2.3.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:0c6ecbac99a354a051ef21c5307601093cb9e0f4b1855984a084bfec9302699e"}, + {file = "pandas-2.3.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:c6f048aa0fd080d6a06cc7e7537c09b53be6642d330ac6f54a600c3ace857ee9"}, + {file = "pandas-2.3.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0064187b80a5be6f2f9c9d6bdde29372468751dfa89f4211a3c5871854cfbf7a"}, + {file = "pandas-2.3.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:4ac8c320bded4718b298281339c1a50fb00a6ba78cb2a63521c39bec95b0209b"}, + {file = "pandas-2.3.2-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:114c2fe4f4328cf98ce5716d1532f3ab79c5919f95a9cfee81d9140064a2e4d6"}, + {file = "pandas-2.3.2-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:48fa91c4dfb3b2b9bfdb5c24cd3567575f4e13f9636810462ffed8925352be5a"}, + {file = "pandas-2.3.2-cp313-cp313-win_amd64.whl", hash = "sha256:12d039facec710f7ba305786837d0225a3444af7bbd9c15c32ca2d40d157ed8b"}, + {file = "pandas-2.3.2-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:c624b615ce97864eb588779ed4046186f967374185c047070545253a52ab2d57"}, + {file = "pandas-2.3.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:0cee69d583b9b128823d9514171cabb6861e09409af805b54459bd0c821a35c2"}, + {file = "pandas-2.3.2-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2319656ed81124982900b4c37f0e0c58c015af9a7bbc62342ba5ad07ace82ba9"}, + {file = "pandas-2.3.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:b37205ad6f00d52f16b6d09f406434ba928c1a1966e2771006a9033c736d30d2"}, + {file = "pandas-2.3.2-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:837248b4fc3a9b83b9c6214699a13f069dc13510a6a6d7f9ba33145d2841a012"}, + {file = "pandas-2.3.2-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:d2c3554bd31b731cd6490d94a28f3abb8dd770634a9e06eb6d2911b9827db370"}, + {file = "pandas-2.3.2-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:88080a0ff8a55eac9c84e3ff3c7665b3b5476c6fbc484775ca1910ce1c3e0b87"}, + {file = "pandas-2.3.2-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:d4a558c7620340a0931828d8065688b3cc5b4c8eb674bcaf33d18ff4a6870b4a"}, + {file = 
"pandas-2.3.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:45178cf09d1858a1509dc73ec261bf5b25a625a389b65be2e47b559905f0ab6a"}, + {file = "pandas-2.3.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:77cefe00e1b210f9c76c697fedd8fdb8d3dd86563e9c8adc9fa72b90f5e9e4c2"}, + {file = "pandas-2.3.2-cp39-cp39-musllinux_1_2_aarch64.whl", hash = "sha256:13bd629c653856f00c53dc495191baa59bcafbbf54860a46ecc50d3a88421a96"}, + {file = "pandas-2.3.2-cp39-cp39-musllinux_1_2_x86_64.whl", hash = "sha256:36d627906fd44b5fd63c943264e11e96e923f8de77d6016dc2f667b9ad193438"}, + {file = "pandas-2.3.2-cp39-cp39-win_amd64.whl", hash = "sha256:a9d7ec92d71a420185dec44909c32e9a362248c4ae2238234b76d5be37f208cc"}, + {file = "pandas-2.3.2.tar.gz", hash = "sha256:ab7b58f8f82706890924ccdfb5f48002b83d2b5a3845976a9fb705d36c34dcdb"}, ] [package.dependencies] @@ -5766,6 +6124,23 @@ dev-optional = ["anywidget", "colorcet", "fiona (<=1.9.6) ; python_version <= \" express = ["numpy"] kaleido = ["kaleido (>=1.0.0)"] +[[package]] +name = "pluggy" +version = "1.6.0" +description = "plugin and hook calling mechanisms for python" +optional = true +python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746"}, + {file = "pluggy-1.6.0.tar.gz", hash = "sha256:7dcc130b76258d33b90f61b658791dede3486c3e6bfb003ee5c9bfb396dd22f3"}, +] + +[package.extras] +dev = ["pre-commit", "tox"] +testing = ["coverage", "pytest", "pytest-benchmark"] + [[package]] name = "polars" version = "1.32.3" @@ -5810,6 +6185,53 @@ timezone = ["tzdata ; platform_system == \"Windows\""] xlsx2csv = ["xlsx2csv (>=0.8.0)"] xlsxwriter = ["xlsxwriter"] +[[package]] +name = "portalocker" +version = "3.2.0" +description = "Wraps the portalocker recipe for easy usage" +optional = true +python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "portalocker-3.2.0-py3-none-any.whl", hash = "sha256:3cdc5f565312224bc570c49337bd21428bba0ef363bbcf58b9ef4a9f11779968"}, + {file = "portalocker-3.2.0.tar.gz", hash = "sha256:1f3002956a54a8c3730586c5c77bf18fae4149e07eaf1c29fc3faf4d5a3f89ac"}, +] + +[package.dependencies] +pywin32 = {version = ">=226", markers = "platform_system == \"Windows\""} + +[package.extras] +docs = ["portalocker[tests]"] +redis = ["redis"] +tests = ["coverage-conditional-plugin (>=0.9.0)", "portalocker[redis]", "pytest (>=5.4.1)", "pytest-cov (>=2.8.1)", "pytest-mypy (>=0.8.0)", "pytest-rerunfailures (>=15.0)", "pytest-timeout (>=2.1.0)", "sphinx (>=6.0.0)", "types-pywin32 (>=310.0.0.20250429)", "types-redis"] + +[[package]] +name = "posthog" +version = "6.6.1" +description = "Integrate PostHog into any python application." 
+optional = true +python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "posthog-6.6.1-py3-none-any.whl", hash = "sha256:cba48af9af1df2a611d08fd10a2014dbee99433118973b8c51881d9ef1aa6667"}, + {file = "posthog-6.6.1.tar.gz", hash = "sha256:87dfc67d48a50eed737b77d6dd306c340f0da2f32101533e8e17b2f22ad572e0"}, +] + +[package.dependencies] +backoff = ">=1.10.0" +distro = ">=1.5.0" +python-dateutil = ">=2.2" +requests = ">=2.7,<3.0" +six = ">=1.5" +typing-extensions = ">=4.2.0" + +[package.extras] +dev = ["django-stubs", "lxml", "mypy", "mypy-baseline", "packaging", "pre-commit", "pydantic", "ruff", "setuptools", "tomli", "tomli_w", "twine", "types-mock", "types-python-dateutil", "types-requests", "types-setuptools", "types-six", "wheel"] +langchain = ["langchain (>=0.2.0)"] +test = ["anthropic", "coverage", "django", "freezegun (==1.5.1)", "google-genai", "langchain-anthropic (>=0.3.15)", "langchain-community (>=0.3.25)", "langchain-core (>=0.3.65)", "langchain-openai (>=0.3.22)", "langgraph (>=0.4.8)", "mock (>=2.0.0)", "openai", "parameterized (>=0.8.1)", "pydantic", "pytest", "pytest-asyncio", "pytest-timeout"] + [[package]] name = "pre-commit" version = "3.8.0" @@ -5924,23 +6346,6 @@ cryptography = "<44.1" [package.extras] server = ["flask (>=1.1)", "gunicorn"] -[[package]] -name = "presidio-structured" -version = "0.0.4a0" -description = "Presidio structured package - analyzes and anonymizes structured and semi-structured data." -optional = true -python-versions = "<4.0,>=3.9" -groups = ["main"] -markers = "python_version < \"3.11\" and extra == \"pii-detection\"" -files = [ - {file = "presidio_structured-0.0.4a0-py3-none-any.whl", hash = "sha256:7cc63b48038a177684cb9512d481571814c04331a0f4ddeb09299cc76803258b"}, -] - -[package.dependencies] -pandas = ">=1.5.2" -presidio-analyzer = ">=2.2" -presidio-anonymizer = ">=2.2" - [[package]] name = "presidio-structured" version = "0.0.6" @@ -5948,7 +6353,7 @@ description = "Presidio structured package - analyzes and anonymizes structured optional = true python-versions = "<4.0,>=3.9" groups = ["main"] -markers = "python_version >= \"3.11\" and extra == \"pii-detection\"" +markers = "extra == \"pii-detection\"" files = [ {file = "presidio_structured-0.0.6-py3-none-any.whl", hash = "sha256:f3454c86857a00db9828e684895da43411bcc7d750cac0a52e15d68f6c6455a1"}, ] @@ -5957,6 +6362,7 @@ files = [ pandas = ">=1.5.2" presidio-analyzer = ">=2.2" presidio-anonymizer = ">=2.2" +spacy = {version = "<3.8.4", markers = "python_version < \"3.10\""} [[package]] name = "prometheus-client" @@ -6097,6 +6503,26 @@ files = [ ] markers = {dev = "python_version == \"3.12\""} +[[package]] +name = "protobuf" +version = "6.32.0" +description = "" +optional = true +python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "protobuf-6.32.0-cp310-abi3-win32.whl", hash = "sha256:84f9e3c1ff6fb0308dbacb0950d8aa90694b0d0ee68e75719cb044b7078fe741"}, + {file = "protobuf-6.32.0-cp310-abi3-win_amd64.whl", hash = "sha256:a8bdbb2f009cfc22a36d031f22a625a38b615b5e19e558a7b756b3279723e68e"}, + {file = "protobuf-6.32.0-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:d52691e5bee6c860fff9a1c86ad26a13afbeb4b168cd4445c922b7e2cf85aaf0"}, + {file = "protobuf-6.32.0-cp39-abi3-manylinux2014_aarch64.whl", hash = "sha256:501fe6372fd1c8ea2a30b4d9be8f87955a64d6be9c88a973996cef5ef6f0abf1"}, + {file = "protobuf-6.32.0-cp39-abi3-manylinux2014_x86_64.whl", hash = 
"sha256:75a2aab2bd1aeb1f5dc7c5f33bcb11d82ea8c055c9becbb41c26a8c43fd7092c"}, + {file = "protobuf-6.32.0-cp39-cp39-win32.whl", hash = "sha256:7db8ed09024f115ac877a1427557b838705359f047b2ff2f2b2364892d19dacb"}, + {file = "protobuf-6.32.0-cp39-cp39-win_amd64.whl", hash = "sha256:15eba1b86f193a407607112ceb9ea0ba9569aed24f93333fe9a497cf2fda37d3"}, + {file = "protobuf-6.32.0-py3-none-any.whl", hash = "sha256:ba377e5b67b908c8f3072a57b63e2c6a4cbd18aea4ed98d2584350dbf46f2783"}, + {file = "protobuf-6.32.0.tar.gz", hash = "sha256:a81439049127067fc49ec1d36e25c6ee1d1a2b7be930675f919258d03c04e7d2"}, +] + [[package]] name = "psutil" version = "7.0.0" @@ -6247,6 +6673,35 @@ files = [ [package.extras] test = ["cffi", "hypothesis", "pandas", "pytest", "pytz"] +[[package]] +name = "pyasn1" +version = "0.6.1" +description = "Pure-Python implementation of ASN.1 types and DER/BER/CER codecs (X.208)" +optional = true +python-versions = ">=3.8" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "pyasn1-0.6.1-py3-none-any.whl", hash = "sha256:0d632f46f2ba09143da3a8afe9e33fb6f92fa2320ab7e886e2d0f7672af84629"}, + {file = "pyasn1-0.6.1.tar.gz", hash = "sha256:6f580d2bdd84365380830acf45550f2511469f673cb4a5ae3857a3170128b034"}, +] + +[[package]] +name = "pyasn1-modules" +version = "0.4.2" +description = "A collection of ASN.1-based protocols modules" +optional = true +python-versions = ">=3.8" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "pyasn1_modules-0.4.2-py3-none-any.whl", hash = "sha256:29253a9207ce32b64c3ac6600edc75368f98473906e8fd1043bd6b5b1de2c14a"}, + {file = "pyasn1_modules-0.4.2.tar.gz", hash = "sha256:677091de870a80aae844b1ca6134f54652fa2c8c5a52aa396440ac3106e941e6"}, +] + +[package.dependencies] +pyasn1 = ">=0.6.1,<0.7.0" + [[package]] name = "pycares" version = "4.10.0" @@ -6599,6 +7054,19 @@ typing-extensions = ">3.10,<4.6.0 || >4.6.0" [package.extras] dev = ["build", "coverage", "furo", "invoke", "mypy", "pytest", "pytest-cov", "pytest-mypy-testing", "ruff", "sphinx", "sphinx-autodoc-typehints", "tox", "twine", "wheel"] +[[package]] +name = "pyfiglet" +version = "1.0.4" +description = "Pure-python FIGlet implementation" +optional = true +python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "pyfiglet-1.0.4-py3-none-any.whl", hash = "sha256:65b57b7a8e1dff8a67dc8e940a117238661d5e14c3e49121032bd404d9b2b39f"}, + {file = "pyfiglet-1.0.4.tar.gz", hash = "sha256:db9c9940ed1bf3048deff534ed52ff2dafbbc2cd7610b17bb5eca1df6d4278ef"}, +] + [[package]] name = "pyflakes" version = "3.4.0" @@ -6653,6 +7121,108 @@ files = [ {file = "pysbd-0.3.4-py3-none-any.whl", hash = "sha256:cd838939b7b0b185fcf86b0baf6636667dfb6e474743beeff878e9f42e022953"}, ] +[[package]] +name = "pytest" +version = "8.4.1" +description = "pytest: simple powerful testing with Python" +optional = true +python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "pytest-8.4.1-py3-none-any.whl", hash = "sha256:539c70ba6fcead8e78eebbf1115e8b589e7565830d7d006a8723f19ac8a0afb7"}, + {file = "pytest-8.4.1.tar.gz", hash = "sha256:7c67fd69174877359ed9371ec3af8a3d2b04741818c51e5e99cc1742251fa93c"}, +] + +[package.dependencies] +colorama = {version = ">=0.4", markers = "sys_platform == \"win32\""} +exceptiongroup = {version = ">=1", markers = "python_version < \"3.11\""} +iniconfig = ">=1" +packaging = ">=20" +pluggy = ">=1.5,<2" +pygments = ">=2.7.2" +tomli = {version = ">=1", markers = "python_version < \"3.11\""} + +[package.extras] +dev 
= ["argcomplete", "attrs (>=19.2)", "hypothesis (>=3.56)", "mock", "requests", "setuptools", "xmlschema"] + +[[package]] +name = "pytest-asyncio" +version = "1.1.0" +description = "Pytest support for asyncio" +optional = true +python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "pytest_asyncio-1.1.0-py3-none-any.whl", hash = "sha256:5fe2d69607b0bd75c656d1211f969cadba035030156745ee09e7d71740e58ecf"}, + {file = "pytest_asyncio-1.1.0.tar.gz", hash = "sha256:796aa822981e01b68c12e4827b8697108f7205020f24b5793b3c41555dab68ea"}, +] + +[package.dependencies] +backports-asyncio-runner = {version = ">=1.1,<2", markers = "python_version < \"3.11\""} +pytest = ">=8.2,<9" +typing-extensions = {version = ">=4.12", markers = "python_version < \"3.10\""} + +[package.extras] +docs = ["sphinx (>=5.3)", "sphinx-rtd-theme (>=1)"] +testing = ["coverage (>=6.2)", "hypothesis (>=5.7.1)"] + +[[package]] +name = "pytest-repeat" +version = "0.9.4" +description = "pytest plugin for repeating tests" +optional = true +python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "pytest_repeat-0.9.4-py3-none-any.whl", hash = "sha256:c1738b4e412a6f3b3b9e0b8b29fcd7a423e50f87381ad9307ef6f5a8601139f3"}, + {file = "pytest_repeat-0.9.4.tar.gz", hash = "sha256:d92ac14dfaa6ffcfe6917e5d16f0c9bc82380c135b03c2a5f412d2637f224485"}, +] + +[package.dependencies] +pytest = "*" + +[[package]] +name = "pytest-rerunfailures" +version = "12.0" +description = "pytest plugin to re-run tests to eliminate flaky failures" +optional = true +python-versions = ">=3.7" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "pytest-rerunfailures-12.0.tar.gz", hash = "sha256:784f462fa87fe9bdf781d0027d856b47a4bfe6c12af108f6bd887057a917b48e"}, + {file = "pytest_rerunfailures-12.0-py3-none-any.whl", hash = "sha256:9a1afd04e21b8177faf08a9bbbf44de7a0fe3fc29f8ddbe83b9684bd5f8f92a9"}, +] + +[package.dependencies] +packaging = ">=17.1" +pytest = ">=6.2" + +[[package]] +name = "pytest-xdist" +version = "3.8.0" +description = "pytest xdist plugin for distributed testing, most importantly across multiple CPUs" +optional = true +python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "pytest_xdist-3.8.0-py3-none-any.whl", hash = "sha256:202ca578cfeb7370784a8c33d6d05bc6e13b4f25b5053c30a152269fd10f0b88"}, + {file = "pytest_xdist-3.8.0.tar.gz", hash = "sha256:7e578125ec9bc6050861aa93f2d59f1d8d085595d6551c2c90b6f4fad8d3a9f1"}, +] + +[package.dependencies] +execnet = ">=2.1" +pytest = ">=7.0.0" + +[package.extras] +psutil = ["psutil (>=3.0)"] +setproctitle = ["setproctitle"] +testing = ["filelock"] + [[package]] name = "python-dateutil" version = "2.9.0.post0" @@ -6719,8 +7289,7 @@ version = "311" description = "Python for Window Extensions" optional = false python-versions = "*" -groups = ["dev"] -markers = "sys_platform == \"win32\" and platform_python_implementation != \"PyPy\"" +groups = ["main", "dev"] files = [ {file = "pywin32-311-cp310-cp310-win32.whl", hash = "sha256:d03ff496d2a0cd4a5893504789d4a15399133fe82517455e78bad62efbb7f0a3"}, {file = "pywin32-311-cp310-cp310-win_amd64.whl", hash = "sha256:797c2772017851984b97180b0bebe4b620bb86328e8a884bb626156295a63b3b"}, @@ -6743,6 +7312,7 @@ files = [ {file = "pywin32-311-cp39-cp39-win_amd64.whl", hash = "sha256:e0c4cfb0621281fe40387df582097fd796e80430597cb9944f0ae70447bacd91"}, {file = "pywin32-311-cp39-cp39-win_arm64.whl", hash = 
"sha256:62ea666235135fee79bb154e695f3ff67370afefd71bd7fea7512fc70ef31e3d"}, ] +markers = {main = "extra == \"llm\" and platform_system == \"Windows\"", dev = "sys_platform == \"win32\" and platform_python_implementation != \"PyPy\""} [[package]] name = "pywin32-ctypes" @@ -6841,104 +7411,104 @@ markers = {main = "extra == \"all\" or extra == \"huggingface\" or extra == \"ll [[package]] name = "pyzmq" -version = "27.0.1" +version = "27.0.2" description = "Python bindings for 0MQ" optional = false python-versions = ">=3.8" groups = ["dev"] files = [ - {file = "pyzmq-27.0.1-cp310-cp310-macosx_10_15_universal2.whl", hash = "sha256:90a4da42aa322de8a3522461e3b5fe999935763b27f69a02fced40f4e3cf9682"}, - {file = "pyzmq-27.0.1-cp310-cp310-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:e648dca28178fc879c814cf285048dd22fd1f03e1104101106505ec0eea50a4d"}, - {file = "pyzmq-27.0.1-cp310-cp310-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4bca8abc31799a6f3652d13f47e0b0e1cab76f9125f2283d085a3754f669b607"}, - {file = "pyzmq-27.0.1-cp310-cp310-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:092f4011b26d6b0201002f439bd74b38f23f3aefcb358621bdc3b230afc9b2d5"}, - {file = "pyzmq-27.0.1-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:6f02f30a4a6b3efe665ab13a3dd47109d80326c8fd286311d1ba9f397dc5f247"}, - {file = "pyzmq-27.0.1-cp310-cp310-musllinux_1_2_i686.whl", hash = "sha256:f293a1419266e3bf3557d1f8778f9e1ffe7e6b2c8df5c9dca191caf60831eb74"}, - {file = "pyzmq-27.0.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:ce181dd1a7c6c012d0efa8ab603c34b5ee9d86e570c03415bbb1b8772eeb381c"}, - {file = "pyzmq-27.0.1-cp310-cp310-win32.whl", hash = "sha256:f65741cc06630652e82aa68ddef4986a3ab9073dd46d59f94ce5f005fa72037c"}, - {file = "pyzmq-27.0.1-cp310-cp310-win_amd64.whl", hash = "sha256:44909aa3ed2234d69fe81e1dade7be336bcfeab106e16bdaa3318dcde4262b93"}, - {file = "pyzmq-27.0.1-cp310-cp310-win_arm64.whl", hash = "sha256:4401649bfa0a38f0f8777f8faba7cd7eb7b5b8ae2abc7542b830dd09ad4aed0d"}, - {file = "pyzmq-27.0.1-cp311-cp311-macosx_10_15_universal2.whl", hash = "sha256:9729190bd770314f5fbba42476abf6abe79a746eeda11d1d68fd56dd70e5c296"}, - {file = "pyzmq-27.0.1-cp311-cp311-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:696900ef6bc20bef6a242973943574f96c3f97d2183c1bd3da5eea4f559631b1"}, - {file = "pyzmq-27.0.1-cp311-cp311-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f96a63aecec22d3f7fdea3c6c98df9e42973f5856bb6812c3d8d78c262fee808"}, - {file = "pyzmq-27.0.1-cp311-cp311-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:c512824360ea7490390566ce00bee880e19b526b312b25cc0bc30a0fe95cb67f"}, - {file = "pyzmq-27.0.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:dfb2bb5e0f7198eaacfb6796fb0330afd28f36d985a770745fba554a5903595a"}, - {file = "pyzmq-27.0.1-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:4f6886c59ba93ffde09b957d3e857e7950c8fe818bd5494d9b4287bc6d5bc7f1"}, - {file = "pyzmq-27.0.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:b99ea9d330e86ce1ff7f2456b33f1bf81c43862a5590faf4ef4ed3a63504bdab"}, - {file = "pyzmq-27.0.1-cp311-cp311-win32.whl", hash = "sha256:571f762aed89025ba8cdcbe355fea56889715ec06d0264fd8b6a3f3fa38154ed"}, - {file = "pyzmq-27.0.1-cp311-cp311-win_amd64.whl", hash = "sha256:ee16906c8025fa464bea1e48128c048d02359fb40bebe5333103228528506530"}, - {file = "pyzmq-27.0.1-cp311-cp311-win_arm64.whl", hash = "sha256:ba068f28028849da725ff9185c24f832ccf9207a40f9b28ac46ab7c04994bd41"}, - {file = 
"pyzmq-27.0.1-cp312-abi3-macosx_10_15_universal2.whl", hash = "sha256:af7ebce2a1e7caf30c0bb64a845f63a69e76a2fadbc1cac47178f7bb6e657bdd"}, - {file = "pyzmq-27.0.1-cp312-abi3-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:8f617f60a8b609a13099b313e7e525e67f84ef4524b6acad396d9ff153f6e4cd"}, - {file = "pyzmq-27.0.1-cp312-abi3-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1d59dad4173dc2a111f03e59315c7bd6e73da1a9d20a84a25cf08325b0582b1a"}, - {file = "pyzmq-27.0.1-cp312-abi3-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:f5b6133c8d313bde8bd0d123c169d22525300ff164c2189f849de495e1344577"}, - {file = "pyzmq-27.0.1-cp312-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:58cca552567423f04d06a075f4b473e78ab5bdb906febe56bf4797633f54aa4e"}, - {file = "pyzmq-27.0.1-cp312-abi3-musllinux_1_2_i686.whl", hash = "sha256:4b9d8e26fb600d0d69cc9933e20af08552e97cc868a183d38a5c0d661e40dfbb"}, - {file = "pyzmq-27.0.1-cp312-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:2329f0c87f0466dce45bba32b63f47018dda5ca40a0085cc5c8558fea7d9fc55"}, - {file = "pyzmq-27.0.1-cp312-abi3-win32.whl", hash = "sha256:57bb92abdb48467b89c2d21da1ab01a07d0745e536d62afd2e30d5acbd0092eb"}, - {file = "pyzmq-27.0.1-cp312-abi3-win_amd64.whl", hash = "sha256:ff3f8757570e45da7a5bedaa140489846510014f7a9d5ee9301c61f3f1b8a686"}, - {file = "pyzmq-27.0.1-cp312-abi3-win_arm64.whl", hash = "sha256:df2c55c958d3766bdb3e9d858b911288acec09a9aab15883f384fc7180df5bed"}, - {file = "pyzmq-27.0.1-cp313-cp313-android_24_arm64_v8a.whl", hash = "sha256:497bd8af534ae55dc4ef67eebd1c149ff2a0b0f1e146db73c8b5a53d83c1a5f5"}, - {file = "pyzmq-27.0.1-cp313-cp313-android_24_x86_64.whl", hash = "sha256:a066ea6ad6218b4c233906adf0ae67830f451ed238419c0db609310dd781fbe7"}, - {file = "pyzmq-27.0.1-cp313-cp313t-macosx_10_15_universal2.whl", hash = "sha256:72d235d6365ca73d8ce92f7425065d70f5c1e19baa458eb3f0d570e425b73a96"}, - {file = "pyzmq-27.0.1-cp313-cp313t-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:313a7b374e3dc64848644ca348a51004b41726f768b02e17e689f1322366a4d9"}, - {file = "pyzmq-27.0.1-cp313-cp313t-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:119ce8590409702394f959c159d048002cbed2f3c0645ec9d6a88087fc70f0f1"}, - {file = "pyzmq-27.0.1-cp313-cp313t-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:45c3e00ce16896ace2cd770ab9057a7cf97d4613ea5f2a13f815141d8b6894b9"}, - {file = "pyzmq-27.0.1-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:678e50ec112bdc6df5a83ac259a55a4ba97a8b314c325ab26b3b5b071151bc61"}, - {file = "pyzmq-27.0.1-cp313-cp313t-musllinux_1_2_i686.whl", hash = "sha256:d0b96c30be9f9387b18b18b6133c75a7b1b0065da64e150fe1feb5ebf31ece1c"}, - {file = "pyzmq-27.0.1-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:88dc92d9eb5ea4968123e74db146d770b0c8d48f0e2bfb1dbc6c50a8edb12d64"}, - {file = "pyzmq-27.0.1-cp313-cp313t-win32.whl", hash = "sha256:6dcbcb34f5c9b0cefdfc71ff745459241b7d3cda5b27c7ad69d45afc0821d1e1"}, - {file = "pyzmq-27.0.1-cp313-cp313t-win_amd64.whl", hash = "sha256:b9fd0fda730461f510cfd9a40fafa5355d65f5e3dbdd8d6dfa342b5b3f5d1949"}, - {file = "pyzmq-27.0.1-cp313-cp313t-win_arm64.whl", hash = "sha256:56a3b1853f3954ec1f0e91085f1350cc57d18f11205e4ab6e83e4b7c414120e0"}, - {file = "pyzmq-27.0.1-cp314-cp314t-macosx_10_15_universal2.whl", hash = "sha256:f98f6b7787bd2beb1f0dde03f23a0621a0c978edf673b7d8f5e7bc039cbe1b60"}, - {file = "pyzmq-27.0.1-cp314-cp314t-manylinux2014_i686.manylinux_2_17_i686.whl", hash = 
"sha256:351bf5d8ca0788ca85327fda45843b6927593ff4c807faee368cc5aaf9f809c2"}, - {file = "pyzmq-27.0.1-cp314-cp314t-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5268a5a9177afff53dc6d70dffe63114ba2a6e7b20d9411cc3adeba09eeda403"}, - {file = "pyzmq-27.0.1-cp314-cp314t-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a4aca06ba295aa78bec9b33ec028d1ca08744c36294338c41432b7171060c808"}, - {file = "pyzmq-27.0.1-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:1c363c6dc66352331d5ad64bb838765c6692766334a6a02fdb05e76bd408ae18"}, - {file = "pyzmq-27.0.1-cp314-cp314t-musllinux_1_2_i686.whl", hash = "sha256:87aebf4acd7249bdff8d3df03aed4f09e67078e6762cfe0aecf8d0748ff94cde"}, - {file = "pyzmq-27.0.1-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:e4f22d67756518d71901edf73b38dc0eb4765cce22c8fe122cc81748d425262b"}, - {file = "pyzmq-27.0.1-cp314-cp314t-win32.whl", hash = "sha256:8c62297bc7aea2147b472ca5ca2b4389377ad82898c87cabab2a94aedd75e337"}, - {file = "pyzmq-27.0.1-cp314-cp314t-win_amd64.whl", hash = "sha256:bee5248d5ec9223545f8cc4f368c2d571477ae828c99409125c3911511d98245"}, - {file = "pyzmq-27.0.1-cp314-cp314t-win_arm64.whl", hash = "sha256:0fc24bf45e4a454e55ef99d7f5c8b8712539200ce98533af25a5bfa954b6b390"}, - {file = "pyzmq-27.0.1-cp38-cp38-macosx_10_15_universal2.whl", hash = "sha256:9d16fdfd7d70a6b0ca45d36eb19f7702fa77ef6256652f17594fc9ce534c9da6"}, - {file = "pyzmq-27.0.1-cp38-cp38-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:d0356a21e58c3e99248930ff73cc05b1d302ff50f41a8a47371aefb04327378a"}, - {file = "pyzmq-27.0.1-cp38-cp38-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:a27fa11ebaccc099cac4309c799aa33919671a7660e29b3e465b7893bc64ec81"}, - {file = "pyzmq-27.0.1-cp38-cp38-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b25e72e115399a4441aad322258fa8267b873850dc7c276e3f874042728c2b45"}, - {file = "pyzmq-27.0.1-cp38-cp38-musllinux_1_2_aarch64.whl", hash = "sha256:f8c3b74f1cd577a5a9253eae7ed363f88cbb345a990ca3027e9038301d47c7f4"}, - {file = "pyzmq-27.0.1-cp38-cp38-musllinux_1_2_i686.whl", hash = "sha256:19dce6c93656f9c469540350d29b128cd8ba55b80b332b431b9a1e9ff74cfd01"}, - {file = "pyzmq-27.0.1-cp38-cp38-musllinux_1_2_x86_64.whl", hash = "sha256:da81512b83032ed6cdf85ca62e020b4c23dda87f1b6c26b932131222ccfdbd27"}, - {file = "pyzmq-27.0.1-cp38-cp38-win32.whl", hash = "sha256:7418fb5736d0d39b3ecc6bec4ff549777988feb260f5381636d8bd321b653038"}, - {file = "pyzmq-27.0.1-cp38-cp38-win_amd64.whl", hash = "sha256:af2ee67b3688b067e20fea3fe36b823a362609a1966e7e7a21883ae6da248804"}, - {file = "pyzmq-27.0.1-cp39-cp39-macosx_10_15_universal2.whl", hash = "sha256:05a94233fdde585eb70924a6e4929202a747eea6ed308a6171c4f1c715bbe39e"}, - {file = "pyzmq-27.0.1-cp39-cp39-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:c96702e1082eab62ae583d64c4e19c9b848359196697e536a0c57ae9bd165bd5"}, - {file = "pyzmq-27.0.1-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:c9180d1f5b4b73e28b64e63cc6c4c097690f102aa14935a62d5dd7426a4e5b5a"}, - {file = "pyzmq-27.0.1-cp39-cp39-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e971d8680003d0af6020713e52f92109b46fedb463916e988814e04c8133578a"}, - {file = "pyzmq-27.0.1-cp39-cp39-musllinux_1_2_aarch64.whl", hash = "sha256:fe632fa4501154d58dfbe1764a0495734d55f84eaf1feda4549a1f1ca76659e9"}, - {file = "pyzmq-27.0.1-cp39-cp39-musllinux_1_2_i686.whl", hash = "sha256:4c3874344fd5fa6d58bb51919708048ac4cab21099f40a227173cddb76b4c20b"}, - {file = 
"pyzmq-27.0.1-cp39-cp39-musllinux_1_2_x86_64.whl", hash = "sha256:0ec09073ed67ae236785d543df3b322282acc0bdf6d1b748c3e81f3043b21cb5"}, - {file = "pyzmq-27.0.1-cp39-cp39-win32.whl", hash = "sha256:f44e7ea288d022d4bf93b9e79dafcb4a7aea45a3cbeae2116792904931cefccf"}, - {file = "pyzmq-27.0.1-cp39-cp39-win_amd64.whl", hash = "sha256:ffe6b809a97ac6dea524b3b837d5b28743d8c2f121141056d168ff0ba8f614ef"}, - {file = "pyzmq-27.0.1-cp39-cp39-win_arm64.whl", hash = "sha256:fde26267416c8478c95432c81489b53f57b0b5d24cd5c8bfaebf5bbaac4dc90c"}, - {file = "pyzmq-27.0.1-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:544b995a6a1976fad5d7ff01409b4588f7608ccc41be72147700af91fd44875d"}, - {file = "pyzmq-27.0.1-pp310-pypy310_pp73-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:0f772eea55cccce7f45d6ecdd1d5049c12a77ec22404f6b892fae687faa87bee"}, - {file = "pyzmq-27.0.1-pp310-pypy310_pp73-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c9d63d66059114a6756d09169c9209ffceabacb65b9cb0f66e6fc344b20b73e6"}, - {file = "pyzmq-27.0.1-pp310-pypy310_pp73-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:1da8e645c655d86f0305fb4c65a0d848f461cd90ee07d21f254667287b5dbe50"}, - {file = "pyzmq-27.0.1-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:1843fd0daebcf843fe6d4da53b8bdd3fc906ad3e97d25f51c3fed44436d82a49"}, - {file = "pyzmq-27.0.1-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:7fb0ee35845bef1e8c4a152d766242164e138c239e3182f558ae15cb4a891f94"}, - {file = "pyzmq-27.0.1-pp311-pypy311_pp73-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:f379f11e138dfd56c3f24a04164f871a08281194dd9ddf656a278d7d080c8ad0"}, - {file = "pyzmq-27.0.1-pp311-pypy311_pp73-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b978c0678cffbe8860ec9edc91200e895c29ae1ac8a7085f947f8e8864c489fb"}, - {file = "pyzmq-27.0.1-pp311-pypy311_pp73-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7ebccf0d760bc92a4a7c751aeb2fef6626144aace76ee8f5a63abeb100cae87f"}, - {file = "pyzmq-27.0.1-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:77fed80e30fa65708546c4119840a46691290efc231f6bfb2ac2a39b52e15811"}, - {file = "pyzmq-27.0.1-pp38-pypy38_pp73-macosx_10_15_x86_64.whl", hash = "sha256:9d7b6b90da7285642f480b48c9efd1d25302fd628237d8f6f6ee39ba6b2d2d34"}, - {file = "pyzmq-27.0.1-pp38-pypy38_pp73-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:d2976b7079f09f48d59dc123293ed6282fca6ef96a270f4ea0364e4e54c8e855"}, - {file = "pyzmq-27.0.1-pp38-pypy38_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:2852f67371918705cc18b321695f75c5d653d5d8c4a9b946c1eec4dab2bd6fdf"}, - {file = "pyzmq-27.0.1-pp38-pypy38_pp73-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:be45a895f98877271e8a0b6cf40925e0369121ce423421c20fa6d7958dc753c2"}, - {file = "pyzmq-27.0.1-pp38-pypy38_pp73-win_amd64.whl", hash = "sha256:64ca3c7c614aefcdd5e358ecdd41d1237c35fe1417d01ec0160e7cdb0a380edc"}, - {file = "pyzmq-27.0.1-pp39-pypy39_pp73-macosx_10_15_x86_64.whl", hash = "sha256:d97b59cbd8a6c8b23524a8ce237ff9504d987dc07156258aa68ae06d2dd5f34d"}, - {file = "pyzmq-27.0.1-pp39-pypy39_pp73-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:27a78bdd384dbbe7b357af95f72efe8c494306b5ec0a03c31e2d53d6763e5307"}, - {file = "pyzmq-27.0.1-pp39-pypy39_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:b007e5dcba684e888fbc90554cb12a2f4e492927c8c2761a80b7590209821743"}, - {file = 
"pyzmq-27.0.1-pp39-pypy39_pp73-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:95594b2ceeaa94934e3e94dd7bf5f3c3659cf1a26b1fb3edcf6e42dad7e0eaf2"}, - {file = "pyzmq-27.0.1-pp39-pypy39_pp73-win_amd64.whl", hash = "sha256:70b719a130b81dd130a57ac0ff636dc2c0127c5b35ca5467d1b67057e3c7a4d2"}, - {file = "pyzmq-27.0.1.tar.gz", hash = "sha256:45c549204bc20e7484ffd2555f6cf02e572440ecf2f3bdd60d4404b20fddf64b"}, + {file = "pyzmq-27.0.2-cp310-cp310-macosx_10_15_universal2.whl", hash = "sha256:8b32c4636ced87dce0ac3d671e578b3400215efab372f1b4be242e8cf0b11384"}, + {file = "pyzmq-27.0.2-cp310-cp310-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:f9528a4b3e24189cb333a9850fddbbafaa81df187297cfbddee50447cdb042cf"}, + {file = "pyzmq-27.0.2-cp310-cp310-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:3b02ba0c0b2b9ebe74688002e6c56c903429924a25630804b9ede1f178aa5a3f"}, + {file = "pyzmq-27.0.2-cp310-cp310-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9e4dc5c9a6167617251dea0d024d67559795761aabb4b7ea015518be898be076"}, + {file = "pyzmq-27.0.2-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:f1151b33aaf3b4fa9da26f4d696e38eebab67d1b43c446184d733c700b3ff8ce"}, + {file = "pyzmq-27.0.2-cp310-cp310-musllinux_1_2_i686.whl", hash = "sha256:4ecfc7999ac44c9ef92b5ae8f0b44fb935297977df54d8756b195a3cd12f38f0"}, + {file = "pyzmq-27.0.2-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:31c26a5d0b00befcaeeb600d8b15ad09f5604b6f44e2057ec5e521a9e18dcd9a"}, + {file = "pyzmq-27.0.2-cp310-cp310-win32.whl", hash = "sha256:25a100d2de2ac0c644ecf4ce0b509a720d12e559c77aff7e7e73aa684f0375bc"}, + {file = "pyzmq-27.0.2-cp310-cp310-win_amd64.whl", hash = "sha256:a1acf091f53bb406e9e5e7383e467d1dd1b94488b8415b890917d30111a1fef3"}, + {file = "pyzmq-27.0.2-cp310-cp310-win_arm64.whl", hash = "sha256:b38e01f11e9e95f6668dc8a62dccf9483f454fed78a77447507a0e8dcbd19a63"}, + {file = "pyzmq-27.0.2-cp311-cp311-macosx_10_15_universal2.whl", hash = "sha256:063845960df76599ad4fad69fa4d884b3ba38304272104fdcd7e3af33faeeb1d"}, + {file = "pyzmq-27.0.2-cp311-cp311-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:845a35fb21b88786aeb38af8b271d41ab0967985410f35411a27eebdc578a076"}, + {file = "pyzmq-27.0.2-cp311-cp311-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:515d20b5c3c86db95503faa989853a8ab692aab1e5336db011cd6d35626c4cb1"}, + {file = "pyzmq-27.0.2-cp311-cp311-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:862aedec0b0684a5050cdb5ec13c2da96d2f8dffda48657ed35e312a4e31553b"}, + {file = "pyzmq-27.0.2-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:2cb5bcfc51c7a4fce335d3bc974fd1d6a916abbcdd2b25f6e89d37b8def25f57"}, + {file = "pyzmq-27.0.2-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:38ff75b2a36e3a032e9fef29a5871e3e1301a37464e09ba364e3c3193f62982a"}, + {file = "pyzmq-27.0.2-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:7a5709abe8d23ca158a9d0a18c037f4193f5b6afeb53be37173a41e9fb885792"}, + {file = "pyzmq-27.0.2-cp311-cp311-win32.whl", hash = "sha256:47c5dda2018c35d87be9b83de0890cb92ac0791fd59498847fc4eca6ff56671d"}, + {file = "pyzmq-27.0.2-cp311-cp311-win_amd64.whl", hash = "sha256:f54ca3e98f8f4d23e989c7d0edcf9da7a514ff261edaf64d1d8653dd5feb0a8b"}, + {file = "pyzmq-27.0.2-cp311-cp311-win_arm64.whl", hash = "sha256:2ef3067cb5b51b090fb853f423ad7ed63836ec154374282780a62eb866bf5768"}, + {file = "pyzmq-27.0.2-cp312-abi3-macosx_10_15_universal2.whl", hash = "sha256:5da05e3c22c95e23bfc4afeee6ff7d4be9ff2233ad6cb171a0e8257cd46b169a"}, + 
{file = "pyzmq-27.0.2-cp312-abi3-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:4e4520577971d01d47e2559bb3175fce1be9103b18621bf0b241abe0a933d040"}, + {file = "pyzmq-27.0.2-cp312-abi3-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:56d7de7bf73165b90bd25a8668659ccb134dd28449116bf3c7e9bab5cf8a8ec9"}, + {file = "pyzmq-27.0.2-cp312-abi3-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:340e7cddc32f147c6c00d116a3f284ab07ee63dbd26c52be13b590520434533c"}, + {file = "pyzmq-27.0.2-cp312-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:ba95693f9df8bb4a9826464fb0fe89033936f35fd4a8ff1edff09a473570afa0"}, + {file = "pyzmq-27.0.2-cp312-abi3-musllinux_1_2_i686.whl", hash = "sha256:ca42a6ce2d697537da34f77a1960d21476c6a4af3e539eddb2b114c3cf65a78c"}, + {file = "pyzmq-27.0.2-cp312-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:3e44e665d78a07214b2772ccbd4b9bcc6d848d7895f1b2d7653f047b6318a4f6"}, + {file = "pyzmq-27.0.2-cp312-abi3-win32.whl", hash = "sha256:272d772d116615397d2be2b1417b3b8c8bc8671f93728c2f2c25002a4530e8f6"}, + {file = "pyzmq-27.0.2-cp312-abi3-win_amd64.whl", hash = "sha256:734be4f44efba0aa69bf5f015ed13eb69ff29bf0d17ea1e21588b095a3147b8e"}, + {file = "pyzmq-27.0.2-cp312-abi3-win_arm64.whl", hash = "sha256:41f0bd56d9279392810950feb2785a419c2920bbf007fdaaa7f4a07332ae492d"}, + {file = "pyzmq-27.0.2-cp313-cp313-android_24_arm64_v8a.whl", hash = "sha256:7f01118133427cd7f34ee133b5098e2af5f70303fa7519785c007bca5aa6f96a"}, + {file = "pyzmq-27.0.2-cp313-cp313-android_24_x86_64.whl", hash = "sha256:e4b860edf6379a7234ccbb19b4ed2c57e3ff569c3414fadfb49ae72b61a8ef07"}, + {file = "pyzmq-27.0.2-cp313-cp313t-macosx_10_15_universal2.whl", hash = "sha256:cb77923ea163156da14295c941930bd525df0d29c96c1ec2fe3c3806b1e17cb3"}, + {file = "pyzmq-27.0.2-cp313-cp313t-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:61678b7407b04df8f9423f188156355dc94d0fb52d360ae79d02ed7e0d431eea"}, + {file = "pyzmq-27.0.2-cp313-cp313t-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e3c824b70925963bdc8e39a642672c15ffaa67e7d4b491f64662dd56d6271263"}, + {file = "pyzmq-27.0.2-cp313-cp313t-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:c4833e02fcf2751975457be1dfa2f744d4d09901a8cc106acaa519d868232175"}, + {file = "pyzmq-27.0.2-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:b18045668d09cf0faa44918af2a67f0dbbef738c96f61c2f1b975b1ddb92ccfc"}, + {file = "pyzmq-27.0.2-cp313-cp313t-musllinux_1_2_i686.whl", hash = "sha256:bbbb7e2f3ac5a22901324e7b086f398b8e16d343879a77b15ca3312e8cd8e6d5"}, + {file = "pyzmq-27.0.2-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:b751914a73604d40d88a061bab042a11d4511b3ddbb7624cd83c39c8a498564c"}, + {file = "pyzmq-27.0.2-cp313-cp313t-win32.whl", hash = "sha256:3e8f833dd82af11db5321c414638045c70f61009f72dd61c88db4a713c1fb1d2"}, + {file = "pyzmq-27.0.2-cp313-cp313t-win_amd64.whl", hash = "sha256:5b45153cb8eadcab14139970643a84f7a7b08dda541fbc1f6f4855c49334b549"}, + {file = "pyzmq-27.0.2-cp313-cp313t-win_arm64.whl", hash = "sha256:86898f5c9730df23427c1ee0097d8aa41aa5f89539a79e48cd0d2c22d059f1b7"}, + {file = "pyzmq-27.0.2-cp314-cp314t-macosx_10_15_universal2.whl", hash = "sha256:d2b4b261dce10762be5c116b6ad1f267a9429765b493c454f049f33791dd8b8a"}, + {file = "pyzmq-27.0.2-cp314-cp314t-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:4e4d88b6cff156fed468903006b24bbd85322612f9c2f7b96e72d5016fd3f543"}, + {file = "pyzmq-27.0.2-cp314-cp314t-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = 
"sha256:8426c0ebbc11ed8416a6e9409c194142d677c2c5c688595f2743664e356d9e9b"}, + {file = "pyzmq-27.0.2-cp314-cp314t-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:565bee96a155fe6452caed5fb5f60c9862038e6b51a59f4f632562081cdb4004"}, + {file = "pyzmq-27.0.2-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:5de735c745ca5cefe9c2d1547d8f28cfe1b1926aecb7483ab1102fd0a746c093"}, + {file = "pyzmq-27.0.2-cp314-cp314t-musllinux_1_2_i686.whl", hash = "sha256:ea4f498f8115fd90d7bf03a3e83ae3e9898e43362f8e8e8faec93597206e15cc"}, + {file = "pyzmq-27.0.2-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:d00e81cb0afd672915257a3927124ee2ad117ace3c256d39cd97ca3f190152ad"}, + {file = "pyzmq-27.0.2-cp314-cp314t-win32.whl", hash = "sha256:0f6e9b00d81b58f859fffc112365d50413954e02aefe36c5b4c8fb4af79f8cc3"}, + {file = "pyzmq-27.0.2-cp314-cp314t-win_amd64.whl", hash = "sha256:2e73cf3b127a437fef4100eb3ac2ebe6b49e655bb721329f667f59eca0a26221"}, + {file = "pyzmq-27.0.2-cp314-cp314t-win_arm64.whl", hash = "sha256:4108785f2e5ac865d06f678a07a1901e3465611356df21a545eeea8b45f56265"}, + {file = "pyzmq-27.0.2-cp38-cp38-macosx_10_15_universal2.whl", hash = "sha256:59a50f5eedf8ed20b7dbd57f1c29b2de003940dea3eedfbf0fbfea05ee7f9f61"}, + {file = "pyzmq-27.0.2-cp38-cp38-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:a00e6390e52770ba1ec753b2610f90b4f00e74c71cfc5405b917adf3cc39565e"}, + {file = "pyzmq-27.0.2-cp38-cp38-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:49d8d05d9844d83cddfbc86a82ac0cafe7ab694fcc9c9618de8d015c318347c3"}, + {file = "pyzmq-27.0.2-cp38-cp38-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:3660d85e2b6a28eb2d586dedab9c61a7b7c64ab0d89a35d2973c7be336f12b0d"}, + {file = "pyzmq-27.0.2-cp38-cp38-musllinux_1_2_aarch64.whl", hash = "sha256:bccfee44b392f4d13bbf05aa88d8f7709271b940a8c398d4216fde6b717624ae"}, + {file = "pyzmq-27.0.2-cp38-cp38-musllinux_1_2_i686.whl", hash = "sha256:989066d51686415f1da646d6e2c5364a9b084777c29d9d1720aa5baf192366ef"}, + {file = "pyzmq-27.0.2-cp38-cp38-musllinux_1_2_x86_64.whl", hash = "sha256:cc283595b82f0db155a52f6462945c7b6b47ecaae2f681746eeea537c95cf8c9"}, + {file = "pyzmq-27.0.2-cp38-cp38-win32.whl", hash = "sha256:ad38daf57495beadc0d929e8901b2aa46ff474239b5a8a46ccc7f67dc01d2335"}, + {file = "pyzmq-27.0.2-cp38-cp38-win_amd64.whl", hash = "sha256:36508466a266cf78bba2f56529ad06eb38ba827f443b47388d420bec14d331ba"}, + {file = "pyzmq-27.0.2-cp39-cp39-macosx_10_15_universal2.whl", hash = "sha256:aa9c1c208c263b84386ac25bed6af5672397dc3c232638114fc09bca5c7addf9"}, + {file = "pyzmq-27.0.2-cp39-cp39-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:795c4884cfe7ea59f2b67d82b417e899afab889d332bfda13b02f8e0c155b2e4"}, + {file = "pyzmq-27.0.2-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:47eb65bb25478358ba3113dd9a08344f616f417ad3ffcbb190cd874fae72b1b1"}, + {file = "pyzmq-27.0.2-cp39-cp39-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a6fc24f00293f10aff04d55ca37029b280474c91f4de2cad5e911e5e10d733b7"}, + {file = "pyzmq-27.0.2-cp39-cp39-musllinux_1_2_aarch64.whl", hash = "sha256:58d4cc9b6b768478adfc40a5cbee545303db8dbc81ba688474e0f499cc581028"}, + {file = "pyzmq-27.0.2-cp39-cp39-musllinux_1_2_i686.whl", hash = "sha256:cea2f26c5972796e02b222968a21a378d09eb4ff590eb3c5fafa8913f8c2bdf5"}, + {file = "pyzmq-27.0.2-cp39-cp39-musllinux_1_2_x86_64.whl", hash = "sha256:a0621ec020c49fc1b6e31304f1a820900d54e7d9afa03ea1634264bf9387519e"}, + {file = "pyzmq-27.0.2-cp39-cp39-win32.whl", hash = 
"sha256:1326500792a9cb0992db06bbaf5d0098459133868932b81a6e90d45c39eca99d"}, + {file = "pyzmq-27.0.2-cp39-cp39-win_amd64.whl", hash = "sha256:5ee9560cb1e3094ef01fc071b361121a57ebb8d4232912b6607a6d7d2d0a97b4"}, + {file = "pyzmq-27.0.2-cp39-cp39-win_arm64.whl", hash = "sha256:85e3c6fb0d25ea046ebcfdc2bcb9683d663dc0280645c79a616ff5077962a15b"}, + {file = "pyzmq-27.0.2-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:d67a0960803a37b60f51b460c58444bc7033a804c662f5735172e21e74ee4902"}, + {file = "pyzmq-27.0.2-pp310-pypy310_pp73-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:dd4d3e6a567ffd0d232cfc667c49d0852d0ee7481458a2a1593b9b1bc5acba88"}, + {file = "pyzmq-27.0.2-pp310-pypy310_pp73-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5e558be423631704803bc6a642e2caa96083df759e25fe6eb01f2d28725f80bd"}, + {file = "pyzmq-27.0.2-pp310-pypy310_pp73-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:c4c20ba8389f495c7b4f6b896bb1ca1e109a157d4f189267a902079699aaf787"}, + {file = "pyzmq-27.0.2-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:c5be232f7219414ff672ff7ab8c5a7e8632177735186d8a42b57b491fafdd64e"}, + {file = "pyzmq-27.0.2-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:e297784aea724294fe95e442e39a4376c2f08aa4fae4161c669f047051e31b02"}, + {file = "pyzmq-27.0.2-pp311-pypy311_pp73-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:e3659a79ded9745bc9c2aef5b444ac8805606e7bc50d2d2eb16dc3ab5483d91f"}, + {file = "pyzmq-27.0.2-pp311-pypy311_pp73-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f3dba49ff037d02373a9306b58d6c1e0be031438f822044e8767afccfdac4c6b"}, + {file = "pyzmq-27.0.2-pp311-pypy311_pp73-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:de84e1694f9507b29e7b263453a2255a73e3d099d258db0f14539bad258abe41"}, + {file = "pyzmq-27.0.2-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:f0944d65ba2b872b9fcece08411d6347f15a874c775b4c3baae7f278550da0fb"}, + {file = "pyzmq-27.0.2-pp38-pypy38_pp73-macosx_10_15_x86_64.whl", hash = "sha256:05288947797dcd6724702db2056972dceef9963a83041eb734aea504416094ec"}, + {file = "pyzmq-27.0.2-pp38-pypy38_pp73-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:dff9198adbb6810ad857f3bfa59b4859c45acb02b0d198b39abeafb9148474f3"}, + {file = "pyzmq-27.0.2-pp38-pypy38_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:849123fd9982c7f63911fdceba9870f203f0f32c953a3bab48e7f27803a0e3ec"}, + {file = "pyzmq-27.0.2-pp38-pypy38_pp73-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c5ee06945f3069e3609819890a01958c4bbfea7a2b31ae87107c6478838d309e"}, + {file = "pyzmq-27.0.2-pp38-pypy38_pp73-win_amd64.whl", hash = "sha256:6156ad5e8bbe8a78a3f5b5757c9a883b0012325c83f98ce6d58fcec81e8b3d06"}, + {file = "pyzmq-27.0.2-pp39-pypy39_pp73-macosx_10_15_x86_64.whl", hash = "sha256:400f34321e3bd89b1165b91ea6b18ad26042ba9ad0dfed8b35049e2e24eeab9b"}, + {file = "pyzmq-27.0.2-pp39-pypy39_pp73-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:9cbad4ef12e4c15c94d2c24ecd15a8ed56bf091c62f121a2b0c618ddd4b7402b"}, + {file = "pyzmq-27.0.2-pp39-pypy39_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:6b2b74aac3392b8cf508ccb68c980a8555298cd378434a2d065d6ce0f4211dff"}, + {file = "pyzmq-27.0.2-pp39-pypy39_pp73-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7db5db88c24cf9253065d69229a148ff60821e5d6f8ff72579b1f80f8f348bab"}, + {file = "pyzmq-27.0.2-pp39-pypy39_pp73-win_amd64.whl", hash = 
"sha256:8ffe40c216c41756ca05188c3e24a23142334b304f7aebd75c24210385e35573"}, + {file = "pyzmq-27.0.2.tar.gz", hash = "sha256:b398dd713b18de89730447347e96a0240225e154db56e35b6bb8447ffdb07798"}, ] [package.dependencies] @@ -7234,7 +7804,7 @@ files = [ {file = "rich-14.1.0-py3-none-any.whl", hash = "sha256:536f5f1785986d6dbdea3c75205c473f970777b4a0d6c6dd1b696aa05a3fa04f"}, {file = "rich-14.1.0.tar.gz", hash = "sha256:e497a48b844b0320d45007cdebfeaeed8db2a4f4bcf49f15e455cfc4af11eaa8"}, ] -markers = {main = "extra == \"pii-detection\""} +markers = {main = "extra == \"llm\" or extra == \"pii-detection\""} [package.dependencies] markdown-it-py = ">=2.2.0" @@ -7424,6 +7994,22 @@ files = [ {file = "rpds_py-0.27.0.tar.gz", hash = "sha256:8b23cf252f180cda89220b378d917180f29d313cd6a07b2431c0d3b776aae86f"}, ] +[[package]] +name = "rsa" +version = "4.9.1" +description = "Pure-Python RSA implementation" +optional = true +python-versions = "<4,>=3.6" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "rsa-4.9.1-py3-none-any.whl", hash = "sha256:68635866661c6836b8d39430f97a996acbd61bfa49406748ea243539fe239762"}, + {file = "rsa-4.9.1.tar.gz", hash = "sha256:e7bdbfdb5497da4c07dfd35530e1a902659db6ff241e39d9953cad06ebd0ae75"}, +] + +[package.dependencies] +pyasn1 = ">=0.1.3" + [[package]] name = "safetensors" version = "0.6.2" @@ -7905,7 +8491,7 @@ files = [ {file = "setuptools-80.9.0-py3-none-any.whl", hash = "sha256:062d34222ad13e0cc312a4c02d73f059e86a4acbfbdea8f8f76b28c99f306922"}, {file = "setuptools-80.9.0.tar.gz", hash = "sha256:f36b47402ecde768dbfafc46e8e4207b4360c654f1f3bb84475f0a28628fb19c"}, ] -markers = {main = "platform_system == \"Linux\" and platform_machine == \"x86_64\" and (extra == \"all\" or extra == \"llm\" or extra == \"pytorch\" or extra == \"nlp\" or extra == \"pii-detection\") or python_version == \"3.12\" and (extra == \"pii-detection\" or extra == \"all\" or extra == \"llm\" or extra == \"pytorch\" or extra == \"nlp\") or extra == \"pii-detection\""} +markers = {main = "platform_system == \"Linux\" and platform_machine == \"x86_64\" and (extra == \"all\" or extra == \"llm\" or extra == \"pytorch\" or extra == \"nlp\" or extra == \"pii-detection\") or python_version == \"3.12\" and (extra == \"llm\" or extra == \"pii-detection\" or extra == \"all\" or extra == \"pytorch\" or extra == \"nlp\") or extra == \"llm\" or extra == \"pii-detection\""} [package.extras] check = ["pytest-checkdocs (>=2.4)", "pytest-ruff (>=0.2.1) ; sys_platform != \"cygwin\"", "ruff (>=0.8.0) ; sys_platform != \"cygwin\""] @@ -7985,7 +8571,7 @@ description = "Tool to Detect Surrounding Shell" optional = true python-versions = ">=3.7" groups = ["main"] -markers = "extra == \"pii-detection\"" +markers = "extra == \"llm\" or extra == \"pii-detection\"" files = [ {file = "shellingham-1.5.4-py2.py3-none-any.whl", hash = "sha256:7ecfff8f2fd72616f7481040475a65b2bf8af90a56c89140852d1120324e8686"}, {file = "shellingham-1.5.4.tar.gz", hash = "sha256:8dbca0739d487e5bd35ab3ca4b36e11c4078f3a234bfce294b0a0291363404de"}, @@ -8079,6 +8665,86 @@ files = [ {file = "soupsieve-2.7.tar.gz", hash = "sha256:ad282f9b6926286d2ead4750552c8a6142bc4c783fd66b0293547c8fe6ae126a"}, ] +[[package]] +name = "spacy" +version = "3.8.3" +description = "Industrial-strength Natural Language Processing (NLP) in Python" +optional = true +python-versions = "<3.13,>=3.9" +groups = ["main"] +markers = "python_version < \"3.11\" and extra == \"pii-detection\"" +files = [ + {file = "spacy-3.8.3-cp310-cp310-macosx_10_9_x86_64.whl", hash = 
"sha256:b530a5cbb077601d03bdd71bf1ded4de4b7fb0362b5443c5183c628cfa81ffdc"}, + {file = "spacy-3.8.3-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:b28a5f7b77400ebf7e23aa24a82a2d35f97071cd5ef1ad0f859aa9b323fff59a"}, + {file = "spacy-3.8.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bbcfd24a00da30ca53570f5b1c3535c1fa95b633f2a12b3d08395c9552ffb53c"}, + {file = "spacy-3.8.3-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:e3630ea33608a6db8045fad7e0ba22f864c61ea351445488a89af1734e434a37"}, + {file = "spacy-3.8.3-cp310-cp310-win_amd64.whl", hash = "sha256:20839fa04cc2156ab613e40db54c25031304fdc1dd369930bc01c366586d0079"}, + {file = "spacy-3.8.3-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:b16b8f9c544cdccd1bd23fc6bf6e2f1d667a1ee285a9b31bdb4a89e2d61345b4"}, + {file = "spacy-3.8.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:f62e45a2259acc51cd8eb185f978848928f2f698ba174b283253485fb7691b04"}, + {file = "spacy-3.8.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:57a267ea25dd8b7ec3e55accd1592d2d0847f0c6277a55145af5bb08e318bab4"}, + {file = "spacy-3.8.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:45bc5fc8d399089607e3e759aee98362ffb007e39386531f195f42dcddcc94dc"}, + {file = "spacy-3.8.3-cp311-cp311-win_amd64.whl", hash = "sha256:9e348359d54418a5752305975f1268013135255bd656a783aa3397b3bd4dd5e9"}, + {file = "spacy-3.8.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:b01e50086515fa6d43275be11a762a3a3285d9aabbe27b4f3b98a08083f1d2a1"}, + {file = "spacy-3.8.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:187f9732362d0dc52b16c80e67decf58ff91605e34b251c50c7dc5212082fcb4"}, + {file = "spacy-3.8.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d7517bc969bca924cbdba4e14e0ce16e66d32967468ad27490e95c9b4d8d8aa8"}, + {file = "spacy-3.8.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:460948437c5571367105554b1e67549f957ba8dd6ee7e1594e719f9a88c398bb"}, + {file = "spacy-3.8.3-cp312-cp312-win_amd64.whl", hash = "sha256:1f14d4e2b1e6ab144ee546236f2c32b255f91f24939e62436c3a9c2ee200c6d1"}, + {file = "spacy-3.8.3-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:6f6020603633ec47374af71e936671d5992d68e592661dffac940f5596d77696"}, + {file = "spacy-3.8.3-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:72b492651534460bf4fe842f7efa462887f9e215de86146b862df6238b952650"}, + {file = "spacy-3.8.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:6a630119aaa7a6180635eb8f21b27509654882847480c8423a657582b4a9bdd3"}, + {file = "spacy-3.8.3-cp39-cp39-musllinux_1_2_x86_64.whl", hash = "sha256:8563ba9cbb71a629c7dc8c2db98f0348416dc0f0927de0e9ed8b448f707b5248"}, + {file = "spacy-3.8.3-cp39-cp39-win_amd64.whl", hash = "sha256:608beca075f7611083e93c91625d7e6c5885e2672cb5ec1b9f274cab6c82c816"}, + {file = "spacy-3.8.3.tar.gz", hash = "sha256:81a967dc3d6a5a0a9ab250559483fe2092306582a9192f98be7a63bdce2797f7"}, +] + +[package.dependencies] +catalogue = ">=2.0.6,<2.1.0" +cymem = ">=2.0.2,<2.1.0" +jinja2 = "*" +langcodes = ">=3.2.0,<4.0.0" +murmurhash = ">=0.28.0,<1.1.0" +numpy = {version = ">=1.19.0", markers = "python_version >= \"3.9\""} +packaging = ">=20.0" +preshed = ">=3.0.2,<3.1.0" +pydantic = ">=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<3.0.0" +requests = ">=2.13.0,<3.0.0" +setuptools = "*" +spacy-legacy = ">=3.0.11,<3.1.0" +spacy-loggers = ">=1.0.0,<2.0.0" +srsly = ">=2.4.3,<3.0.0" +thinc = ">=8.3.0,<8.4.0" +tqdm = ">=4.38.0,<5.0.0" +typer = ">=0.3.0,<1.0.0" +wasabi = ">=0.9.1,<1.2.0" 
+weasel = ">=0.1.0,<0.5.0" + +[package.extras] +apple = ["thinc-apple-ops (>=1.0.0,<2.0.0)"] +cuda = ["cupy (>=5.0.0b4,<13.0.0)"] +cuda-autodetect = ["cupy-wheel (>=11.0.0,<13.0.0)"] +cuda100 = ["cupy-cuda100 (>=5.0.0b4,<13.0.0)"] +cuda101 = ["cupy-cuda101 (>=5.0.0b4,<13.0.0)"] +cuda102 = ["cupy-cuda102 (>=5.0.0b4,<13.0.0)"] +cuda110 = ["cupy-cuda110 (>=5.0.0b4,<13.0.0)"] +cuda111 = ["cupy-cuda111 (>=5.0.0b4,<13.0.0)"] +cuda112 = ["cupy-cuda112 (>=5.0.0b4,<13.0.0)"] +cuda113 = ["cupy-cuda113 (>=5.0.0b4,<13.0.0)"] +cuda114 = ["cupy-cuda114 (>=5.0.0b4,<13.0.0)"] +cuda115 = ["cupy-cuda115 (>=5.0.0b4,<13.0.0)"] +cuda116 = ["cupy-cuda116 (>=5.0.0b4,<13.0.0)"] +cuda117 = ["cupy-cuda117 (>=5.0.0b4,<13.0.0)"] +cuda11x = ["cupy-cuda11x (>=11.0.0,<13.0.0)"] +cuda12x = ["cupy-cuda12x (>=11.5.0,<13.0.0)"] +cuda80 = ["cupy-cuda80 (>=5.0.0b4,<13.0.0)"] +cuda90 = ["cupy-cuda90 (>=5.0.0b4,<13.0.0)"] +cuda91 = ["cupy-cuda91 (>=5.0.0b4,<13.0.0)"] +cuda92 = ["cupy-cuda92 (>=5.0.0b4,<13.0.0)"] +ja = ["sudachidict_core (>=20211220)", "sudachipy (>=0.5.2,!=0.6.1)"] +ko = ["natto-py (>=0.9.0)"] +lookups = ["spacy_lookups_data (>=1.0.3,<1.1.0)"] +th = ["pythainlp (>=2.0)"] +transformers = ["spacy_transformers (>=1.1.2,<1.4.0)"] + [[package]] name = "spacy" version = "3.8.7" @@ -8086,7 +8752,7 @@ description = "Industrial-strength Natural Language Processing (NLP) in Python" optional = true python-versions = "<3.14,>=3.9" groups = ["main"] -markers = "extra == \"pii-detection\"" +markers = "python_version >= \"3.11\" and extra == \"pii-detection\"" files = [ {file = "spacy-3.8.7-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:6ec0368ce96cd775fb14906f04b771c912ea8393ba30f8b35f9c4dc47a420b8e"}, {file = "spacy-3.8.7-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:5672f8a0fe7a3847e925544890be60015fbf48a60a838803425f82e849dd4f18"}, @@ -8631,14 +9297,14 @@ dev = ["hypothesis (>=6.70.0)", "pytest (>=7.1.0)"] [[package]] name = "tabulate" -version = "0.8.10" +version = "0.9.0" description = "Pretty-print tabular data" optional = false -python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" +python-versions = ">=3.7" groups = ["main"] files = [ - {file = "tabulate-0.8.10-py3-none-any.whl", hash = "sha256:0ba055423dbaa164b9e456abe7920c5e8ed33fcc16f6d1b2f2d152c8e1e8b4fc"}, - {file = "tabulate-0.8.10.tar.gz", hash = "sha256:6c57f3f3dd7ac2782770155f3adb2db0b1a269637e42f27599925e64b114f519"}, + {file = "tabulate-0.9.0-py3-none-any.whl", hash = "sha256:024ca478df22e9340661486f85298cff5f6dcdba14f3813e8830015b9ed1948f"}, + {file = "tabulate-0.9.0.tar.gz", hash = "sha256:0095b12bf5966de529c0feb1fa08671671b3368eec77d7ef7ab114be2c068b3c"}, ] [package.extras] @@ -8933,8 +9599,7 @@ version = "2.2.1" description = "A lil' TOML parser" optional = false python-versions = ">=3.8" -groups = ["dev"] -markers = "python_version < \"3.11\"" +groups = ["main", "dev"] files = [ {file = "tomli-2.2.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:678e4fa69e4575eb77d103de3df8a895e1591b48e740211bd1067378c69e8249"}, {file = "tomli-2.2.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:023aa114dd824ade0100497eb2318602af309e5a55595f76b626d6d9f3b7b0a6"}, @@ -8969,6 +9634,7 @@ files = [ {file = "tomli-2.2.1-py3-none-any.whl", hash = "sha256:cb55c73c5f4408779d0cf3eef9f762b9c9f147a77de7b258bef0a5628adc85cc"}, {file = "tomli-2.2.1.tar.gz", hash = "sha256:cd45e1dc79c835ce60f7404ec8119f2eb06d38b1deba146f07ced3bbc44505ff"}, ] +markers = {main = "python_version < \"3.11\" and extra == \"llm\"", dev = "python_version < \"3.11\""} 
[[package]] name = "torch" @@ -9096,15 +9762,15 @@ test = ["argcomplete (>=3.0.3)", "mypy (>=1.7.0)", "pre-commit", "pytest (>=7.0, [[package]] name = "transformers" -version = "4.55.2" +version = "4.55.4" description = "State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow" optional = true python-versions = ">=3.9.0" groups = ["main"] markers = "extra == \"all\" or extra == \"huggingface\" or extra == \"llm\" or extra == \"nlp\"" files = [ - {file = "transformers-4.55.2-py3-none-any.whl", hash = "sha256:097e3c2e2c0c9681db3da9d748d8f9d6a724c644514673d0030e8c5a1109f1f1"}, - {file = "transformers-4.55.2.tar.gz", hash = "sha256:a45ec60c03474fd67adbce5c434685051b7608b3f4f167c25aa6aeb1cad16d4f"}, + {file = "transformers-4.55.4-py3-none-any.whl", hash = "sha256:df28f3849665faba4af5106f0db4510323277c4bb595055340544f7e59d06458"}, + {file = "transformers-4.55.4.tar.gz", hash = "sha256:574a30559bc273c7a4585599ff28ab6b676e96dc56ffd2025ecfce2fd0ab915d"}, ] [package.dependencies] @@ -9225,7 +9891,7 @@ description = "Typer, build great CLIs. Easy to code. Based on Python type hints optional = true python-versions = ">=3.7" groups = ["main"] -markers = "extra == \"pii-detection\"" +markers = "extra == \"llm\" or extra == \"pii-detection\"" files = [ {file = "typer-0.16.1-py3-none-any.whl", hash = "sha256:90ee01cb02d9b8395ae21ee3368421faf21fa138cb2a541ed369c08cec5237c9"}, {file = "typer-0.16.1.tar.gz", hash = "sha256:d358c65a464a7a90f338e3bb7ff0c74ac081449e53884b12ba658cbd72990614"}, @@ -9239,14 +9905,14 @@ typing-extensions = ">=3.7.4.3" [[package]] name = "types-python-dateutil" -version = "2.9.0.20250809" +version = "2.9.0.20250822" description = "Typing stubs for python-dateutil" optional = false python-versions = ">=3.9" groups = ["dev"] files = [ - {file = "types_python_dateutil-2.9.0.20250809-py3-none-any.whl", hash = "sha256:768890cac4f2d7fd9e0feb6f3217fce2abbfdfc0cadd38d11fba325a815e4b9f"}, - {file = "types_python_dateutil-2.9.0.20250809.tar.gz", hash = "sha256:69cbf8d15ef7a75c3801d65d63466e46ac25a0baa678d89d0a137fc31a608cc1"}, + {file = "types_python_dateutil-2.9.0.20250822-py3-none-any.whl", hash = "sha256:849d52b737e10a6dc6621d2bd7940ec7c65fcb69e6aa2882acf4e56b2b508ddc"}, + {file = "types_python_dateutil-2.9.0.20250822.tar.gz", hash = "sha256:84c92c34bd8e68b117bff742bc00b692a1e8531262d4507b33afcc9f7716cd53"}, ] [[package]] @@ -9478,6 +10144,102 @@ docs = ["Sphinx (>=6.0)", "myst-parser (>=2.0.0)", "sphinx-rtd-theme (>=1.1.0)"] optional = ["python-socks", "wsaccel"] test = ["websockets"] +[[package]] +name = "websockets" +version = "15.0.1" +description = "An implementation of the WebSocket Protocol (RFC 6455 & 7692)" +optional = true +python-versions = ">=3.9" +groups = ["main"] +markers = "extra == \"llm\"" +files = [ + {file = "websockets-15.0.1-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:d63efaa0cd96cf0c5fe4d581521d9fa87744540d4bc999ae6e08595a1014b45b"}, + {file = "websockets-15.0.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:ac60e3b188ec7574cb761b08d50fcedf9d77f1530352db4eef1707fe9dee7205"}, + {file = "websockets-15.0.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:5756779642579d902eed757b21b0164cd6fe338506a8083eb58af5c372e39d9a"}, + {file = "websockets-15.0.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0fdfe3e2a29e4db3659dbd5bbf04560cea53dd9610273917799f1cde46aa725e"}, + {file = "websockets-15.0.1-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = 
"sha256:4c2529b320eb9e35af0fa3016c187dffb84a3ecc572bcee7c3ce302bfeba52bf"}, + {file = "websockets-15.0.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ac1e5c9054fe23226fb11e05a6e630837f074174c4c2f0fe442996112a6de4fb"}, + {file = "websockets-15.0.1-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:5df592cd503496351d6dc14f7cdad49f268d8e618f80dce0cd5a36b93c3fc08d"}, + {file = "websockets-15.0.1-cp310-cp310-musllinux_1_2_i686.whl", hash = "sha256:0a34631031a8f05657e8e90903e656959234f3a04552259458aac0b0f9ae6fd9"}, + {file = "websockets-15.0.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:3d00075aa65772e7ce9e990cab3ff1de702aa09be3940d1dc88d5abf1ab8a09c"}, + {file = "websockets-15.0.1-cp310-cp310-win32.whl", hash = "sha256:1234d4ef35db82f5446dca8e35a7da7964d02c127b095e172e54397fb6a6c256"}, + {file = "websockets-15.0.1-cp310-cp310-win_amd64.whl", hash = "sha256:39c1fec2c11dc8d89bba6b2bf1556af381611a173ac2b511cf7231622058af41"}, + {file = "websockets-15.0.1-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:823c248b690b2fd9303ba00c4f66cd5e2d8c3ba4aa968b2779be9532a4dad431"}, + {file = "websockets-15.0.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:678999709e68425ae2593acf2e3ebcbcf2e69885a5ee78f9eb80e6e371f1bf57"}, + {file = "websockets-15.0.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:d50fd1ee42388dcfb2b3676132c78116490976f1300da28eb629272d5d93e905"}, + {file = "websockets-15.0.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d99e5546bf73dbad5bf3547174cd6cb8ba7273062a23808ffea025ecb1cf8562"}, + {file = "websockets-15.0.1-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:66dd88c918e3287efc22409d426c8f729688d89a0c587c88971a0faa2c2f3792"}, + {file = "websockets-15.0.1-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8dd8327c795b3e3f219760fa603dcae1dcc148172290a8ab15158cf85a953413"}, + {file = "websockets-15.0.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:8fdc51055e6ff4adeb88d58a11042ec9a5eae317a0a53d12c062c8a8865909e8"}, + {file = "websockets-15.0.1-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:693f0192126df6c2327cce3baa7c06f2a117575e32ab2308f7f8216c29d9e2e3"}, + {file = "websockets-15.0.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:54479983bd5fb469c38f2f5c7e3a24f9a4e70594cd68cd1fa6b9340dadaff7cf"}, + {file = "websockets-15.0.1-cp311-cp311-win32.whl", hash = "sha256:16b6c1b3e57799b9d38427dda63edcbe4926352c47cf88588c0be4ace18dac85"}, + {file = "websockets-15.0.1-cp311-cp311-win_amd64.whl", hash = "sha256:27ccee0071a0e75d22cb35849b1db43f2ecd3e161041ac1ee9d2352ddf72f065"}, + {file = "websockets-15.0.1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:3e90baa811a5d73f3ca0bcbf32064d663ed81318ab225ee4f427ad4e26e5aff3"}, + {file = "websockets-15.0.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:592f1a9fe869c778694f0aa806ba0374e97648ab57936f092fd9d87f8bc03665"}, + {file = "websockets-15.0.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:0701bc3cfcb9164d04a14b149fd74be7347a530ad3bbf15ab2c678a2cd3dd9a2"}, + {file = "websockets-15.0.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:e8b56bdcdb4505c8078cb6c7157d9811a85790f2f2b3632c7d1462ab5783d215"}, + {file = "websockets-15.0.1-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = 
"sha256:0af68c55afbd5f07986df82831c7bff04846928ea8d1fd7f30052638788bc9b5"}, + {file = "websockets-15.0.1-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:64dee438fed052b52e4f98f76c5790513235efaa1ef7f3f2192c392cd7c91b65"}, + {file = "websockets-15.0.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:d5f6b181bb38171a8ad1d6aa58a67a6aa9d4b38d0f8c5f496b9e42561dfc62fe"}, + {file = "websockets-15.0.1-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:5d54b09eba2bada6011aea5375542a157637b91029687eb4fdb2dab11059c1b4"}, + {file = "websockets-15.0.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:3be571a8b5afed347da347bfcf27ba12b069d9d7f42cb8c7028b5e98bbb12597"}, + {file = "websockets-15.0.1-cp312-cp312-win32.whl", hash = "sha256:c338ffa0520bdb12fbc527265235639fb76e7bc7faafbb93f6ba80d9c06578a9"}, + {file = "websockets-15.0.1-cp312-cp312-win_amd64.whl", hash = "sha256:fcd5cf9e305d7b8338754470cf69cf81f420459dbae8a3b40cee57417f4614a7"}, + {file = "websockets-15.0.1-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:ee443ef070bb3b6ed74514f5efaa37a252af57c90eb33b956d35c8e9c10a1931"}, + {file = "websockets-15.0.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:5a939de6b7b4e18ca683218320fc67ea886038265fd1ed30173f5ce3f8e85675"}, + {file = "websockets-15.0.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:746ee8dba912cd6fc889a8147168991d50ed70447bf18bcda7039f7d2e3d9151"}, + {file = "websockets-15.0.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:595b6c3969023ecf9041b2936ac3827e4623bfa3ccf007575f04c5a6aa318c22"}, + {file = "websockets-15.0.1-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:3c714d2fc58b5ca3e285461a4cc0c9a66bd0e24c5da9911e30158286c9b5be7f"}, + {file = "websockets-15.0.1-cp313-cp313-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:0f3c1e2ab208db911594ae5b4f79addeb3501604a165019dd221c0bdcabe4db8"}, + {file = "websockets-15.0.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:229cf1d3ca6c1804400b0a9790dc66528e08a6a1feec0d5040e8b9eb14422375"}, + {file = "websockets-15.0.1-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:756c56e867a90fb00177d530dca4b097dd753cde348448a1012ed6c5131f8b7d"}, + {file = "websockets-15.0.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:558d023b3df0bffe50a04e710bc87742de35060580a293c2a984299ed83bc4e4"}, + {file = "websockets-15.0.1-cp313-cp313-win32.whl", hash = "sha256:ba9e56e8ceeeedb2e080147ba85ffcd5cd0711b89576b83784d8605a7df455fa"}, + {file = "websockets-15.0.1-cp313-cp313-win_amd64.whl", hash = "sha256:e09473f095a819042ecb2ab9465aee615bd9c2028e4ef7d933600a8401c79561"}, + {file = "websockets-15.0.1-cp39-cp39-macosx_10_9_universal2.whl", hash = "sha256:5f4c04ead5aed67c8a1a20491d54cdfba5884507a48dd798ecaf13c74c4489f5"}, + {file = "websockets-15.0.1-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:abdc0c6c8c648b4805c5eacd131910d2a7f6455dfd3becab248ef108e89ab16a"}, + {file = "websockets-15.0.1-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:a625e06551975f4b7ea7102bc43895b90742746797e2e14b70ed61c43a90f09b"}, + {file = "websockets-15.0.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d591f8de75824cbb7acad4e05d2d710484f15f29d4a915092675ad3456f11770"}, + {file = "websockets-15.0.1-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = 
"sha256:47819cea040f31d670cc8d324bb6435c6f133b8c7a19ec3d61634e62f8d8f9eb"}, + {file = "websockets-15.0.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ac017dd64572e5c3bd01939121e4d16cf30e5d7e110a119399cf3133b63ad054"}, + {file = "websockets-15.0.1-cp39-cp39-musllinux_1_2_aarch64.whl", hash = "sha256:4a9fac8e469d04ce6c25bb2610dc535235bd4aa14996b4e6dbebf5e007eba5ee"}, + {file = "websockets-15.0.1-cp39-cp39-musllinux_1_2_i686.whl", hash = "sha256:363c6f671b761efcb30608d24925a382497c12c506b51661883c3e22337265ed"}, + {file = "websockets-15.0.1-cp39-cp39-musllinux_1_2_x86_64.whl", hash = "sha256:2034693ad3097d5355bfdacfffcbd3ef5694f9718ab7f29c29689a9eae841880"}, + {file = "websockets-15.0.1-cp39-cp39-win32.whl", hash = "sha256:3b1ac0d3e594bf121308112697cf4b32be538fb1444468fb0a6ae4feebc83411"}, + {file = "websockets-15.0.1-cp39-cp39-win_amd64.whl", hash = "sha256:b7643a03db5c95c799b89b31c036d5f27eeb4d259c798e878d6937d71832b1e4"}, + {file = "websockets-15.0.1-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:0c9e74d766f2818bb95f84c25be4dea09841ac0f734d1966f415e4edfc4ef1c3"}, + {file = "websockets-15.0.1-pp310-pypy310_pp73-macosx_11_0_arm64.whl", hash = "sha256:1009ee0c7739c08a0cd59de430d6de452a55e42d6b522de7aa15e6f67db0b8e1"}, + {file = "websockets-15.0.1-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:76d1f20b1c7a2fa82367e04982e708723ba0e7b8d43aa643d3dcd404d74f1475"}, + {file = "websockets-15.0.1-pp310-pypy310_pp73-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f29d80eb9a9263b8d109135351caf568cc3f80b9928bccde535c235de55c22d9"}, + {file = "websockets-15.0.1-pp310-pypy310_pp73-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:b359ed09954d7c18bbc1680f380c7301f92c60bf924171629c5db97febb12f04"}, + {file = "websockets-15.0.1-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:cad21560da69f4ce7658ca2cb83138fb4cf695a2ba3e475e0559e05991aa8122"}, + {file = "websockets-15.0.1-pp39-pypy39_pp73-macosx_10_15_x86_64.whl", hash = "sha256:7f493881579c90fc262d9cdbaa05a6b54b3811c2f300766748db79f098db9940"}, + {file = "websockets-15.0.1-pp39-pypy39_pp73-macosx_11_0_arm64.whl", hash = "sha256:47b099e1f4fbc95b701b6e85768e1fcdaf1630f3cbe4765fa216596f12310e2e"}, + {file = "websockets-15.0.1-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:67f2b6de947f8c757db2db9c71527933ad0019737ec374a8a6be9a956786aaf9"}, + {file = "websockets-15.0.1-pp39-pypy39_pp73-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:d08eb4c2b7d6c41da6ca0600c077e93f5adcfd979cd777d747e9ee624556da4b"}, + {file = "websockets-15.0.1-pp39-pypy39_pp73-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:4b826973a4a2ae47ba357e4e82fa44a463b8f168e1ca775ac64521442b19e87f"}, + {file = "websockets-15.0.1-pp39-pypy39_pp73-win_amd64.whl", hash = "sha256:21c1fa28a6a7e3cbdc171c694398b6df4744613ce9b36b1a498e816787e28123"}, + {file = "websockets-15.0.1-py3-none-any.whl", hash = "sha256:f7a866fbc1e97b5c617ee4116daaa09b722101d4a3c170c787450ba409f9736f"}, + {file = "websockets-15.0.1.tar.gz", hash = "sha256:82544de02076bafba038ce055ee6412d68da13ab47f0c60cab827346de828dee"}, +] + +[[package]] +name = "wheel" +version = "0.45.1" +description = "A built-package format for Python" +optional = true +python-versions = ">=3.8" +groups = ["main"] 
+markers = "extra == \"llm\"" +files = [ + {file = "wheel-0.45.1-py3-none-any.whl", hash = "sha256:708e7481cc80179af0e556bbf0cc00b8444c7321e2700b8d8580231d13017248"}, + {file = "wheel-0.45.1.tar.gz", hash = "sha256:661e1abd9198507b1409a20c02106d9670b2576e916d58f520316666abca6729"}, +] + +[package.extras] +test = ["pytest (>=6.0.0)", "setuptools (>=65)"] + [[package]] name = "widgetsnbextension" version = "4.0.14" @@ -9895,7 +10657,7 @@ files = [ {file = "zipp-3.23.0-py3-none-any.whl", hash = "sha256:071652d6115ed432f5ce1d34c336c0adfd6a884660d1e9712a256d3d3bd4b14e"}, {file = "zipp-3.23.0.tar.gz", hash = "sha256:a07157588a12518c9d4034df3fbbee09c814741a33ff63c05fa29d26a2404166"}, ] -markers = {main = "python_version == \"3.9\""} +markers = {main = "extra == \"llm\" or python_version == \"3.9\""} [package.extras] check = ["pytest-checkdocs (>=2.4)", "pytest-ruff (>=0.2.1) ; sys_platform != \"cygwin\""] @@ -10024,7 +10786,7 @@ credit-risk = ["scorecardpy"] datasets = ["datasets"] explainability = ["shap"] huggingface = ["sentencepiece", "transformers"] -llm = ["langchain-openai", "pycocoevalcap", "ragas", "sentencepiece", "torch", "transformers"] +llm = ["deepeval", "langchain-openai", "pycocoevalcap", "ragas", "sentencepiece", "torch", "transformers"] nlp = ["bert-score", "evaluate", "langdetect", "nltk", "rouge", "textblob"] pii-detection = ["presidio-analyzer", "presidio-structured"] pytorch = ["torch"] @@ -10034,4 +10796,4 @@ xgboost = ["xgboost"] [metadata] lock-version = "2.1" python-versions = ">=3.9,<3.13" -content-hash = "8ee77fe173c5abeed209af25844ec7bf38da5cc43a648ef9561a2eee08cbe84c" +content-hash = "c0d19b5f56a04e23ab24ef3dda0ff866f7cadc21ab47721b083021feea7a0104" diff --git a/pyproject.toml b/pyproject.toml index 92b9957b3..b27773081 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -27,7 +27,7 @@ dependencies = [ "scikit-learn", "seaborn", "sentry-sdk (>=1.24.0,<2.0.0)", - "tabulate (>=0.8.9,<0.9.0)", + "tabulate (>=0.9.0,<0.10.0)", "tiktoken", "tqdm", "anywidget", @@ -66,6 +66,7 @@ llm = [ "ragas (>=0.2.3,<=0.2.7)", "sentencepiece (>=0.2.0,<0.3.0)", "langchain-openai (>=0.1.8)", + "deepeval (>3.3.9)", ] nlp = [ "langdetect", diff --git a/tests/test_dataset.py b/tests/test_dataset.py index c15aa07fe..f5e6e590d 100644 --- a/tests/test_dataset.py +++ b/tests/test_dataset.py @@ -534,19 +534,19 @@ def test_assign_scores_single_metric(self): vm_dataset.assign_predictions(model=vm_model) # Test assign_scores with single metric - vm_dataset.assign_scores(vm_model, "F1") + vm_dataset.assign_scores(model = vm_model, metrics = "validmind.scorer.classification.LogLoss") # Check that the metric column was added - expected_column = f"{vm_model.input_id}_F1" + expected_column = f"{vm_model.input_id}_LogLoss" self.assertTrue(expected_column in vm_dataset.df.columns) - # Verify the column has the same value for all rows (scalar metric) + # Verify the column has different values for different rows (row metric) metric_values = vm_dataset.df[expected_column] - self.assertEqual(metric_values.nunique(), 1, "All rows should have the same metric value") + self.assertGreater(metric_values.nunique(), 1, "Row metric should have different values per row") - # Verify the value is reasonable for F1 score (between 0 and 1) - f1_value = metric_values.iloc[0] - self.assertTrue(0 <= f1_value <= 1, f"F1 score should be between 0 and 1, got {f1_value}") + # Verify the values are reasonable for LogLoss (non-negative) + logloss_values = metric_values + self.assertTrue((logloss_values >= 0).all(), "LogLoss should 
be non-negative, got negative values") def test_assign_scores_multiple_metrics(self): """ @@ -566,21 +566,23 @@ def test_assign_scores_multiple_metrics(self): vm_dataset.assign_predictions(model=vm_model) # Test assign_scores with multiple metrics - metrics = ["F1", "Precision", "Recall"] - vm_dataset.assign_scores(vm_model, metrics) + metrics = ["validmind.scorer.classification.LogLoss", "validmind.scorer.classification.BrierScore", "validmind.scorer.classification.Confidence"] + metrics_column_name = [metric.split(".")[-1] for metric in metrics] + + vm_dataset.assign_scores(model = vm_model, metrics = metrics) # Check that all metric columns were added - for metric in metrics: + for metric in metrics_column_name: expected_column = f"{vm_model.input_id}_{metric}" self.assertTrue(expected_column in vm_dataset.df.columns) - # Verify each column has the same value for all rows + # Verify each column has different values for different rows (row metrics) metric_values = vm_dataset.df[expected_column] - self.assertEqual(metric_values.nunique(), 1, f"All rows should have the same {metric} value") + self.assertGreater(metric_values.nunique(), 1, f"Row metric {metric} should have different values per row") - # Verify the value is reasonable (between 0 and 1 for these metrics) - metric_value = metric_values.iloc[0] - self.assertTrue(0 <= metric_value <= 1, f"{metric} should be between 0 and 1, got {metric_value}") + # Verify the values are reasonable (non-negative for these metrics) + metric_values_array = metric_values + self.assertTrue((metric_values_array >= 0).all(), f"{metric} should be non-negative, got negative values") def test_assign_scores_with_parameters(self): """ @@ -600,16 +602,15 @@ def test_assign_scores_with_parameters(self): vm_dataset.assign_predictions(model=vm_model) # Test assign_scores with parameters - vm_dataset.assign_scores(vm_model, "ROC_AUC", **{"average": "weighted"}) + vm_dataset.assign_scores(model = vm_model, metrics = "validmind.scorer.classification.LogLoss") # Check that the metric column was added - expected_column = f"{vm_model.input_id}_ROC_AUC" + expected_column = f"{vm_model.input_id}_LogLoss" self.assertTrue(expected_column in vm_dataset.df.columns) - # Verify the value is reasonable for ROC AUC (between 0 and 1) - roc_values = vm_dataset.df[expected_column] - roc_value = roc_values.iloc[0] - self.assertTrue(0 <= roc_value <= 1, f"ROC AUC should be between 0 and 1, got {roc_value}") + # Verify the values are reasonable for LogLoss (non-negative) + logloss_values = vm_dataset.df[expected_column] + self.assertTrue((logloss_values >= 0).all(), "LogLoss should be non-negative") def test_assign_scores_full_metric_id(self): """ @@ -629,17 +630,16 @@ def test_assign_scores_full_metric_id(self): vm_dataset.assign_predictions(model=vm_model) # Test assign_scores with full metric ID - full_metric_id = "validmind.unit_metrics.classification.Accuracy" - vm_dataset.assign_scores(vm_model, full_metric_id) + full_metric_id = "validmind.scorer.classification.LogLoss" + vm_dataset.assign_scores(model = vm_model, metrics = full_metric_id) # Check that the metric column was added with correct name - expected_column = f"{vm_model.input_id}_Accuracy" + expected_column = f"{vm_model.input_id}_LogLoss" self.assertTrue(expected_column in vm_dataset.df.columns) - # Verify the value is reasonable for accuracy (between 0 and 1) - accuracy_values = vm_dataset.df[expected_column] - accuracy_value = accuracy_values.iloc[0] - self.assertTrue(0 <= accuracy_value <= 1, f"Accuracy should be 
between 0 and 1, got {accuracy_value}") + # Verify the values are reasonable for LogLoss (non-negative) + logloss_values = vm_dataset.df[expected_column] + self.assertTrue((logloss_values >= 0).all(), "LogLoss should be non-negative") def test_assign_scores_regression_model(self): """ @@ -658,27 +658,25 @@ def test_assign_scores_regression_model(self): # Assign predictions first vm_dataset.assign_predictions(model=vm_model) - # Test assign_scores with regression metrics - vm_dataset.assign_scores(vm_model, ["MeanSquaredError", "RSquaredScore"]) + # Test assign_scores with available row metrics (using classification metrics for testing) + vm_dataset.assign_scores(model=vm_model, metrics=["validmind.scorer.classification.LogLoss", "validmind.scorer.classification.BrierScore"]) # Check that both metric columns were added - expected_columns = ["reg_model_MeanSquaredError", "reg_model_RSquaredScore"] + expected_columns = ["reg_model_LogLoss", "reg_model_BrierScore"] for column in expected_columns: self.assertTrue(column in vm_dataset.df.columns) - # Verify R-squared is reasonable (can be negative, but typically between -1 and 1 for reasonable models) - r2_values = vm_dataset.df["reg_model_RSquaredScore"] - r2_value = r2_values.iloc[0] - self.assertTrue(-2 <= r2_value <= 1, f"R-squared should be reasonable, got {r2_value}") + # Verify LogLoss is reasonable (non-negative) + logloss_values = vm_dataset.df["reg_model_LogLoss"] + self.assertTrue((logloss_values >= 0).all(), "LogLoss should be non-negative") - # Verify MSE is non-negative - mse_values = vm_dataset.df["reg_model_MeanSquaredError"] - mse_value = mse_values.iloc[0] - self.assertTrue(mse_value >= 0, f"MSE should be non-negative, got {mse_value}") + # Verify BrierScore is reasonable (non-negative) + brier_values = vm_dataset.df["reg_model_BrierScore"] + self.assertTrue((brier_values >= 0).all(), "BrierScore should be non-negative") def test_assign_scores_no_model_input_id(self): """ - Test that assign_scores raises error when model has no input_id + Test that assign_scores works when model has no input_id (creates columns without prefix) """ df = pd.DataFrame({"x1": [1, 2, 3], "x2": [4, 5, 6], "y": [0, 1, 0]}) vm_dataset = DataFrameDataset( @@ -690,14 +688,22 @@ def test_assign_scores_no_model_input_id(self): model.fit(vm_dataset.x, vm_dataset.y.ravel()) vm_model = init_model(model=model, __log=False) # No input_id provided - # Clear the input_id to test the error case + # Clear the input_id to test the no prefix case vm_model.input_id = None - # Should raise ValueError - with self.assertRaises(ValueError) as context: - vm_dataset.assign_scores(vm_model, "F1") + # Assign predictions first (after clearing input_id) + vm_dataset.assign_predictions(model=vm_model) + + # Should work and create column without prefix + vm_dataset.assign_scores(model = vm_model, metrics = "validmind.scorer.classification.LogLoss") - self.assertIn("Model input_id must be set", str(context.exception)) + # Check that the metric column was added without prefix + expected_column = "LogLoss" # No model prefix + self.assertTrue(expected_column in vm_dataset.df.columns) + + # Verify the values are reasonable for LogLoss (non-negative) + logloss_values = vm_dataset.df[expected_column] + self.assertTrue((logloss_values >= 0).all(), "LogLoss should be non-negative") def test_assign_scores_invalid_metric(self): """ @@ -718,9 +724,9 @@ def test_assign_scores_invalid_metric(self): # Should raise ValueError for invalid metric with self.assertRaises(ValueError) as context: - 
vm_dataset.assign_scores(vm_model, "InvalidMetricName") + vm_dataset.assign_scores(model = vm_model, metrics = "InvalidMetricName") - self.assertIn("Metric 'InvalidMetricName' not found", str(context.exception)) + self.assertIn("Failed to compute metric InvalidMetricName:", str(context.exception)) def test_assign_scores_no_predictions(self): """ @@ -737,9 +743,9 @@ def test_assign_scores_no_predictions(self): vm_model = init_model(input_id="test_model", model=model, __log=False) # Don't assign predictions - test that assign_scores raises error - # (unit metrics require predictions to be available) + # (row metrics require predictions to be available) with self.assertRaises(ValueError) as context: - vm_dataset.assign_scores(vm_model, "F1") + vm_dataset.assign_scores(model = vm_model, metrics = "validmind.scorer.classification.LogLoss") self.assertIn("No prediction column found", str(context.exception)) @@ -761,11 +767,12 @@ def test_assign_scores_column_naming_convention(self): vm_dataset.assign_predictions(model=vm_model) # Test multiple metrics to verify naming convention - metrics = ["F1", "Precision", "Recall"] - vm_dataset.assign_scores(vm_model, metrics) + metrics = ["validmind.scorer.classification.LogLoss", "validmind.scorer.classification.BrierScore", "validmind.scorer.classification.Confidence"] + metrics_column_name = [metric.split(".")[-1] for metric in metrics] + vm_dataset.assign_scores(model = vm_model, metrics = metrics) # Verify all columns follow the naming convention: {model.input_id}_{metric_name} - for metric in metrics: + for metric in metrics_column_name: expected_column = f"my_special_model_{metric}" self.assertTrue(expected_column in vm_dataset.df.columns, f"Expected column '{expected_column}' not found") @@ -793,23 +800,306 @@ def test_assign_scores_multiple_models(self): vm_dataset.assign_predictions(model=vm_rf_model) # Assign scores for both models - vm_dataset.assign_scores(vm_lr_model, "F1") - vm_dataset.assign_scores(vm_rf_model, "F1") + vm_dataset.assign_scores(model = vm_lr_model, metrics = "validmind.scorer.classification.LogLoss") + vm_dataset.assign_scores(model = vm_rf_model, metrics = "validmind.scorer.classification.LogLoss") # Check that both metric columns exist with correct names - lr_column = "lr_model_F1" - rf_column = "rf_model_F1" + lr_column = "lr_model_LogLoss" + rf_column = "rf_model_LogLoss" self.assertTrue(lr_column in vm_dataset.df.columns) self.assertTrue(rf_column in vm_dataset.df.columns) # Verify that the values might be different (different models) - lr_f1 = vm_dataset.df[lr_column].iloc[0] - rf_f1 = vm_dataset.df[rf_column].iloc[0] + lr_logloss = vm_dataset.df[lr_column].iloc[0] + rf_logloss = vm_dataset.df[rf_column].iloc[0] + + # Both should be valid LogLoss scores (non-negative) + self.assertTrue(lr_logloss >= 0) + self.assertTrue(rf_logloss >= 0) + + def test_assign_scores_without_model(self): + """ + Test that assign_scores works without a model (creates columns without prefix) + """ + df = pd.DataFrame({"x1": [1, 2, 3], "x2": [4, 5, 6], "y": [0, 1, 0]}) + vm_dataset = DataFrameDataset( + raw_dataset=df, target_column="y", feature_columns=["x1", "x2"] + ) + + # Test assign_scores without model using a data validation test that doesn't require model + vm_dataset.assign_scores(metrics = "validmind.data_validation.MissingValues") + + # Check that the metric column was added without prefix + expected_column = "MissingValues" # No model prefix + self.assertTrue(expected_column in vm_dataset.df.columns) + + # Verify the values are 
reasonable (should be boolean or numeric) + missing_values = vm_dataset.df[expected_column] + self.assertTrue(len(missing_values) == len(df), "Should have one value per row") + + def test_assign_scores_without_model_multiple_metrics(self): + """ + Test that assign_scores works without a model for multiple metrics + """ + df = pd.DataFrame({"x1": [1, 2, 3], "x2": [4, 5, 6], "y": [0, 1, 0]}) + vm_dataset = DataFrameDataset( + raw_dataset=df, target_column="y", feature_columns=["x1", "x2"] + ) + + # Test assign_scores without model for multiple data validation metrics + metrics = ["validmind.data_validation.MissingValues", "validmind.data_validation.UniqueRows"] + vm_dataset.assign_scores(metrics) + + # Check that both metric columns were added without prefix + expected_columns = ["MissingValues", "UniqueRows"] + for column in expected_columns: + self.assertTrue(column in vm_dataset.df.columns) + + # Verify the values are reasonable (should have one value per row) + for column in expected_columns: + values = vm_dataset.df[column] + self.assertTrue(len(values) == len(df), f"{column} should have one value per row") + + def test_assign_scores_column_overwriting(self): + """ + Test that assign_scores overwrites existing columns with warning + """ + df = pd.DataFrame({"x1": [1, 2, 3], "x2": [4, 5, 6], "y": [0, 1, 0]}) + vm_dataset = DataFrameDataset( + raw_dataset=df, target_column="y", feature_columns=["x1", "x2"] + ) + + # First, add a column manually + vm_dataset.add_extra_column("MissingValues", [0.1, 0.2, 0.3]) + original_values = vm_dataset.df["MissingValues"].copy() + + # Now assign scores without model (should overwrite) + # Note: The warning is logged but not raised as an exception + vm_dataset.assign_scores("validmind.data_validation.MissingValues") + + # Check that the column still exists + self.assertTrue("MissingValues" in vm_dataset.df.columns) + + # Check that values were overwritten (should be different from original) + new_values = vm_dataset.df["MissingValues"] + self.assertFalse(original_values.equals(new_values), "Column values should have been overwritten") + + def test_assign_scores_mixed_model_scenarios(self): + """ + Test assign_scores with mixed scenarios: model with input_id, model without input_id, and no model + """ + df = pd.DataFrame({"x1": [1, 2, 3], "x2": [4, 5, 6], "y": [0, 1, 0]}) + vm_dataset = DataFrameDataset( + raw_dataset=df, target_column="y", feature_columns=["x1", "x2"] + ) + + # Train a model + model = LogisticRegression() + model.fit(vm_dataset.x, vm_dataset.y.ravel()) + vm_model = init_model(input_id="test_model", model=model, __log=False) + + # Assign predictions + vm_dataset.assign_predictions(model=vm_model) + + # Scenario 1: Model with input_id (should have prefix) + vm_dataset.assign_scores(model = vm_model, metrics = "validmind.scorer.classification.LogLoss") + self.assertTrue("test_model_LogLoss" in vm_dataset.df.columns) + + # Scenario 2: Model without input_id (should not have prefix) + vm_model_no_id = init_model(model=model, __log=False) + vm_model_no_id.input_id = None + # Assign predictions for this model too + vm_dataset.assign_predictions(model=vm_model_no_id) + vm_dataset.assign_scores(model = vm_model_no_id, metrics = "validmind.scorer.classification.BrierScore") + self.assertTrue("BrierScore" in vm_dataset.df.columns) + + # Scenario 3: No model (should not have prefix) + vm_dataset.assign_scores(metrics = "validmind.data_validation.MissingValues") + self.assertTrue("MissingValues" in vm_dataset.df.columns) + + # Verify all columns 
exist and have reasonable values + for column in ["test_model_LogLoss", "BrierScore", "MissingValues"]: + values = vm_dataset.df[column] + self.assertTrue(len(values) == len(df), f"{column} should have one value per row") + + def test_assign_scores_dict_output_without_model(self): + """ + Test assign_scores with dictionary output without model (no prefix) + """ + df = pd.DataFrame({"x1": [1, 2, 3], "x2": [4, 5, 6], "y": [0, 1, 0]}) + vm_dataset = DataFrameDataset( + raw_dataset=df, target_column="y", feature_columns=["x1", "x2"] + ) + + # Test with a data validation metric that doesn't require model + vm_dataset.assign_scores(metrics = "validmind.data_validation.MissingValues") + + # Check that the main column was created without prefix + self.assertTrue("MissingValues" in vm_dataset.df.columns) + + def test_assign_scores_scalar_output_without_model(self): + """ + Test assign_scores with scalar output without model (no prefix) + """ + df = pd.DataFrame({"x1": [1, 2, 3], "x2": [4, 5, 6], "y": [0, 1, 0]}) + vm_dataset = DataFrameDataset( + raw_dataset=df, target_column="y", feature_columns=["x1", "x2"] + ) + + # Test assign_scores without model using data validation metric + vm_dataset.assign_scores(metrics = "validmind.data_validation.MissingValues") + + # Check that the metric column was added without prefix + expected_column = "MissingValues" + self.assertTrue(expected_column in vm_dataset.df.columns) + + # Verify the column has values for all rows + values = vm_dataset.df[expected_column] + self.assertTrue(len(values) == len(df), "Should have one value per row") + + def test_process_dict_list_scorer_output(self): + """Test that _process_dict_list_scorer_output correctly handles list of dictionaries.""" + # Create a sample dataset + df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"], "target": [0, 1, 0]}) + vm_dataset = DataFrameDataset(raw_dataset=df, target_column="target") + + # Test with valid list of dictionaries + scorer_output = [ + {"score": 0.1, "confidence": 0.9}, + {"score": 0.2, "confidence": 0.8}, + {"score": 0.3, "confidence": 0.7} + ] + + vm_dataset._process_dict_list_scorer_output(scorer_output, "test_model", "TestMetric") + + # Check that columns were added + self.assertTrue("test_model_TestMetric_score" in vm_dataset.df.columns) + self.assertTrue("test_model_TestMetric_confidence" in vm_dataset.df.columns) + + # Check values + expected_scores = [0.1, 0.2, 0.3] + expected_confidences = [0.9, 0.8, 0.7] + np.testing.assert_array_equal(vm_dataset.df["test_model_TestMetric_score"].values, expected_scores) + np.testing.assert_array_equal(vm_dataset.df["test_model_TestMetric_confidence"].values, expected_confidences) + + def test_process_dict_list_scorer_output_inconsistent_keys(self): + """Test that _process_dict_list_scorer_output raises error for inconsistent keys.""" + # Create a sample dataset + df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"], "target": [0, 1, 0]}) + vm_dataset = DataFrameDataset(raw_dataset=df, target_column="target") + + # Test with inconsistent keys + scorer_output = [ + {"score": 0.1, "confidence": 0.9}, + {"score": 0.2, "confidence": 0.8}, + {"score": 0.3, "error": 0.1} # Different key + ] + + with self.assertRaises(ValueError) as context: + vm_dataset._process_dict_list_scorer_output(scorer_output, "test_model", "TestMetric") + + self.assertIn("All dictionaries must have the same keys", str(context.exception)) + + def test_process_dict_list_scorer_output_non_dict_items(self): + """Test that _process_dict_list_scorer_output 
raises error for non-dict items.""" + # Create a sample dataset + df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"], "target": [0, 1, 0]}) + vm_dataset = DataFrameDataset(raw_dataset=df, target_column="target") + + # Test with non-dict items + scorer_output = [ + {"score": 0.1, "confidence": 0.9}, + {"score": 0.2, "confidence": 0.8}, + "not_a_dict" # Not a dictionary + ] + + with self.assertRaises(ValueError) as context: + vm_dataset._process_dict_list_scorer_output(scorer_output, "test_model", "TestMetric") + + self.assertIn("All items in list must be dictionaries", str(context.exception)) + + def test_process_list_scorer_output_dict_list(self): + """Test that _process_list_scorer_output correctly handles list of dictionaries.""" + # Create a sample dataset + df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"], "target": [0, 1, 0]}) + vm_dataset = DataFrameDataset(raw_dataset=df, target_column="target") + + # Test with valid list of dictionaries + scorer_output = [ + {"score": 0.1, "confidence": 0.9}, + {"score": 0.2, "confidence": 0.8}, + {"score": 0.3, "confidence": 0.7} + ] + + vm_dataset._process_list_scorer_output(scorer_output, "test_model", "TestMetric") + + # Check that columns were added + self.assertTrue("test_model_TestMetric_score" in vm_dataset.df.columns) + self.assertTrue("test_model_TestMetric_confidence" in vm_dataset.df.columns) + + def test_process_list_scorer_output_regular_list(self): + """Test that _process_list_scorer_output correctly handles regular list.""" + # Create a sample dataset + df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"], "target": [0, 1, 0]}) + vm_dataset = DataFrameDataset(raw_dataset=df, target_column="target") + + # Test with regular list + scorer_output = [0.1, 0.2, 0.3] + + vm_dataset._process_list_scorer_output(scorer_output, "test_model", "TestMetric") + + # Check that single column was added + self.assertTrue("test_model_TestMetric" in vm_dataset.df.columns) + np.testing.assert_array_equal(vm_dataset.df["test_model_TestMetric"].values, [0.1, 0.2, 0.3]) + + def test_process_list_scorer_output_wrong_length(self): + """Test that _process_list_scorer_output raises error for wrong length.""" + # Create a sample dataset + df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"], "target": [0, 1, 0]}) + vm_dataset = DataFrameDataset(raw_dataset=df, target_column="target") + + # Test with wrong length + scorer_output = [0.1, 0.2] # Only 2 items, but dataset has 3 rows + + with self.assertRaises(ValueError) as context: + vm_dataset._process_list_scorer_output(scorer_output, "test_model", "TestMetric") + + self.assertIn("does not match dataset length", str(context.exception)) + + def test_process_and_add_scorer_output_dict_list(self): + """Test that _process_and_add_scorer_output correctly handles list of dictionaries.""" + # Create a sample dataset + df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"], "target": [0, 1, 0]}) + vm_dataset = DataFrameDataset(raw_dataset=df, target_column="target") + + # Test with valid list of dictionaries + scorer_output = [ + {"score": 0.1, "confidence": 0.9}, + {"score": 0.2, "confidence": 0.8}, + {"score": 0.3, "confidence": 0.7} + ] + + vm_dataset._process_and_add_scorer_output(scorer_output, "test_model", "TestMetric") + + # Check that columns were added + self.assertTrue("test_model_TestMetric_score" in vm_dataset.df.columns) + self.assertTrue("test_model_TestMetric_confidence" in vm_dataset.df.columns) + + def test_process_and_add_scorer_output_scalar(self): + 
"""Test that _process_and_add_scorer_output correctly handles scalar values.""" + # Create a sample dataset + df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"], "target": [0, 1, 0]}) + vm_dataset = DataFrameDataset(raw_dataset=df, target_column="target") + + # Test with scalar + scorer_output = 0.5 + + vm_dataset._process_and_add_scorer_output(scorer_output, "test_model", "TestMetric") - # Both should be valid F1 scores - self.assertTrue(0 <= lr_f1 <= 1) - self.assertTrue(0 <= rf_f1 <= 1) + # Check that single column was added with repeated values + self.assertTrue("test_model_TestMetric" in vm_dataset.df.columns) + np.testing.assert_array_equal(vm_dataset.df["test_model_TestMetric"].values, [0.5, 0.5, 0.5]) if __name__ == "__main__": diff --git a/tests/test_results.py b/tests/test_results.py index afa1e7dea..a6f4d58e9 100644 --- a/tests/test_results.py +++ b/tests/test_results.py @@ -1,20 +1,18 @@ import asyncio -import json import unittest -from unittest.mock import MagicMock, Mock, patch +from unittest.mock import patch import pandas as pd import matplotlib.pyplot as plt -import plotly.graph_objs as go from ipywidgets import HTML, VBox from validmind.vm_models.result import ( - Result, TestResult, ErrorResult, TextGenerationResult, ResultTable, RawData, ) + from validmind.vm_models.figure import Figure from validmind.errors import InvalidParameterError @@ -22,17 +20,17 @@ class MockAsyncResponse: - def __init__(self, status, text=None, json=None): + def __init__(self, status, text=None, json_data=None): self.status = status self.status_code = status self._text = text - self._json = json + self._json_data = json_data async def text(self): return self._text async def json(self): - return self._json + return self._json_data async def __aexit__(self, exc_type, exc, tb): pass @@ -50,7 +48,7 @@ def run_async(self, func, *args, **kwargs): def test_raw_data_initialization(self): """Test RawData initialization and methods""" - raw_data = RawData(log=True, dataset_duplicates=pd.DataFrame({"col1": [1, 2]})) + raw_data = RawData(log=True, dataset_duplicates=pd.DataFrame({'col1': [1, 2]})) self.assertTrue(raw_data.log) self.assertIsInstance(raw_data.dataset_duplicates, pd.DataFrame) @@ -238,6 +236,81 @@ async def test_metadata_update_content_id_handling(self, mock_update_metadata): content_id="test_description:test_1::ai", text="Test description" ) + def test_test_result_metric_values_integration(self): + """Test metric values integration with TestResult""" + test_result = TestResult(result_id="test_metric_values") + + # Test setting metric with scalar using set_metric + test_result.set_metric(0.85) + self.assertEqual(test_result.metric, 0.85) + self.assertIsNone(test_result.scorer) + self.assertEqual(test_result._get_metric_display_value(), 0.85) + self.assertEqual(test_result._get_metric_serialized_value(), 0.85) + + # Test setting metric with list using set_metric + test_result.set_metric([0.1, 0.2, 0.3]) + self.assertEqual(test_result.scorer, [0.1, 0.2, 0.3]) + self.assertIsNone(test_result.metric) + self.assertEqual(test_result._get_metric_display_value(), [0.1, 0.2, 0.3]) + self.assertEqual(test_result._get_metric_serialized_value(), [0.1, 0.2, 0.3]) + + def test_test_result_metric_type_detection(self): + """Test metric type detection for both metric and scorer fields""" + test_result = TestResult(result_id="test_metric_type") + + # Test unit metric type + test_result.set_metric(42.0) + self.assertEqual(test_result._get_metric_type(), "unit_metric") + + # Test row metric type + 
test_result.set_metric([1.0, 2.0, 3.0]) + self.assertEqual(test_result._get_metric_type(), "scorer") + + # Test with no metric + test_result.metric = None + test_result.scorer = None + self.assertIsNone(test_result._get_metric_type()) + + def test_test_result_backward_compatibility(self): + """Test backward compatibility with direct metric assignment""" + test_result = TestResult(result_id="test_backward_compat") + + # Direct assignment of raw values (old style) + test_result.metric = 42.0 + self.assertEqual(test_result._get_metric_display_value(), 42.0) + self.assertEqual(test_result._get_metric_serialized_value(), 42.0) + + # Direct assignment of list (old style) + test_result.metric = [1.0, 2.0, 3.0] + self.assertEqual(test_result._get_metric_display_value(), [1.0, 2.0, 3.0]) + self.assertEqual(test_result._get_metric_serialized_value(), [1.0, 2.0, 3.0]) + + # Mixed usage - set with set_metric then access display value + test_result.set_metric(100) + self.assertEqual(test_result.metric, 100) + self.assertEqual(test_result._get_metric_display_value(), 100) + + def test_test_result_metric_values_widget_display(self): + """Test MetricValues display in TestResult widgets""" + # Test scalar metric display + test_result_scalar = TestResult(result_id="test_scalar_widget") + test_result_scalar.set_metric(0.95) + + widget_scalar = test_result_scalar.to_widget() + self.assertIsInstance(widget_scalar, HTML) + # Check that the metric value appears in the HTML + self.assertIn("0.95", widget_scalar.value) + + # Test list metric display + test_result_list = TestResult(result_id="test_list_widget") + test_result_list.set_metric([0.1, 0.2, 0.3]) + + widget_list = test_result_list.to_widget() + # Even with lists, when no tables/figures exist, it returns HTML + self.assertIsInstance(widget_list, HTML) + # Check that the list values appear in the HTML + self.assertIn("[0.1, 0.2, 0.3]", widget_list.value) + if __name__ == "__main__": unittest.main() diff --git a/tests/test_scorer_decorator.py b/tests/test_scorer_decorator.py new file mode 100644 index 000000000..50b8a05e8 --- /dev/null +++ b/tests/test_scorer_decorator.py @@ -0,0 +1,445 @@ +#!/usr/bin/env python3 +""" +Unit tests for the @scorer decorator functionality (merged). 
+ +This module includes two kinds of tests: +1) Integration tests that exercise the real ValidMind imports (skipped if imports fail) +2) Standalone tests that use lightweight mocks and always run + +Coverage: +- Registration (explicit and auto IDs) +- Separation from regular tests +- Metadata (tags, tasks) +- Save function +- Parameter handling +- Path-based ID generation (integration only) +""" + +import unittest +from unittest.mock import patch, MagicMock + +# Real imports for integration tests; may fail in certain dev environments +from validmind.tests.decorator import scorer, _generate_scorer_id_from_path, tags, tasks +from validmind.tests._store import scorer_store, test_store + + +class TestScorerDecorator(unittest.TestCase): + """Integration tests for the @scorer decorator.""" + + def setUp(self): + """Set up test fixtures before each test method.""" + # Clear the scorer store before each test + scorer_store.scorers.clear() + test_store.tests.clear() + + def tearDown(self): + """Clean up after each test method.""" + # Clear the scorer store after each test + scorer_store.scorers.clear() + test_store.tests.clear() + + def test_scorer_with_explicit_id(self): + """Test @scorer decorator with explicit ID.""" + @scorer("validmind.scorer.test.ExplicitScorer") + def explicit_scorer(model, dataset): + """A scorer with explicit ID.""" + return [1.0, 2.0, 3.0] + + # Check that the scorer is registered + registered_scorer = scorer_store.get_scorer("validmind.scorer.test.ExplicitScorer") + self.assertIsNotNone(registered_scorer) + self.assertEqual(registered_scorer, explicit_scorer) + self.assertEqual(explicit_scorer.scorer_id, "validmind.scorer.test.ExplicitScorer") + + def test_scorer_with_empty_parentheses(self): + """Test @scorer() decorator with empty parentheses.""" + @scorer() + def empty_parentheses_scorer(model, dataset): + """A scorer with empty parentheses.""" + return list([4.0, 5.0, 6.0]) + + # Check that the scorer is registered with auto-generated ID + # The ID will be based on the file path since we're in a test file + actual_id = empty_parentheses_scorer.scorer_id + self.assertIsNotNone(actual_id) + self.assertTrue(actual_id.startswith("validmind.scorer")) + + registered_scorer = scorer_store.get_scorer(actual_id) + self.assertIsNotNone(registered_scorer) + self.assertEqual(registered_scorer, empty_parentheses_scorer) + self.assertEqual(empty_parentheses_scorer.scorer_id, actual_id) + + def test_scorer_without_parentheses(self): + """Test @scorer decorator without parentheses.""" + @scorer + def no_parentheses_scorer(model, dataset): + """A scorer without parentheses.""" + return list([7.0, 8.0, 9.0]) + + # Check that the scorer is registered with auto-generated ID + # The ID will be based on the file path since we're in a test file + actual_id = no_parentheses_scorer.scorer_id + self.assertIsNotNone(actual_id) + self.assertTrue(actual_id.startswith("validmind.scorer")) + + registered_scorer = scorer_store.get_scorer(actual_id) + self.assertIsNotNone(registered_scorer) + self.assertEqual(registered_scorer, no_parentheses_scorer) + self.assertEqual(no_parentheses_scorer.scorer_id, actual_id) + + def test_scorer_separation_from_tests(self): + """Test that scorers are stored separately from regular tests.""" + @scorer("validmind.scorer.test.SeparationTest") + def separation_scorer(model, dataset): + """A scorer for separation testing.""" + return list([1.0]) + + # Check that scorer is in scorer store + scorer_in_store = 
scorer_store.get_scorer("validmind.scorer.test.SeparationTest") + self.assertIsNotNone(scorer_in_store) + self.assertEqual(scorer_in_store, separation_scorer) + + # Check that scorer is NOT in regular test store + test_in_store = test_store.get_test("validmind.scorer.test.SeparationTest") + self.assertIsNone(test_in_store) + + def test_scorer_with_tags_and_tasks(self): + """Test that @scorer decorator works with @tags and @tasks decorators.""" + @scorer("validmind.scorer.test.TaggedScorer") + @tags("test", "scorer", "tagged") + @tasks("classification") + def tagged_scorer(model, dataset): + """A scorer with tags and tasks.""" + return list([1.0]) + + # Check that the scorer is registered + registered_scorer = scorer_store.get_scorer("validmind.scorer.test.TaggedScorer") + self.assertIsNotNone(registered_scorer) + + # Check that tags and tasks are preserved + self.assertTrue(hasattr(tagged_scorer, '__tags__')) + self.assertEqual(tagged_scorer.__tags__, ["test", "scorer", "tagged"]) + + self.assertTrue(hasattr(tagged_scorer, '__tasks__')) + self.assertEqual(tagged_scorer.__tasks__, ["classification"]) + + def test_scorer_save_functionality(self): + """Test that the save functionality is available.""" + @scorer("validmind.scorer.test.SaveTest") + def save_test_scorer(model, dataset): + """A scorer for testing save functionality.""" + return list([1.0]) + + # Check that save function is available + self.assertTrue(hasattr(save_test_scorer, 'save')) + self.assertTrue(callable(save_test_scorer.save)) + + def test_multiple_scorers_registration(self): + """Test that multiple scorers can be registered without conflicts.""" + @scorer("validmind.scorer.test.Multiple1") + def scorer1(model, dataset): + return list([1.0]) + + @scorer("validmind.scorer.test.Multiple2") + def scorer2(model, dataset): + return list([2.0]) + + @scorer("validmind.scorer.test.Multiple3") + def scorer3(model, dataset): + return list([3.0]) + + # Check that all scorers are registered + self.assertIsNotNone(scorer_store.get_scorer("validmind.scorer.test.Multiple1")) + self.assertIsNotNone(scorer_store.get_scorer("validmind.scorer.test.Multiple2")) + self.assertIsNotNone(scorer_store.get_scorer("validmind.scorer.test.Multiple3")) + + # Check that they are different functions + self.assertNotEqual( + scorer_store.get_scorer("validmind.scorer.test.Multiple1"), + scorer_store.get_scorer("validmind.scorer.test.Multiple2") + ) + + def test_scorer_with_parameters(self): + """Test that scorers can have parameters.""" + @scorer("validmind.scorer.test.ParameterScorer") + def parameter_scorer(model, dataset, threshold: float = 0.5, multiplier: int = 2): + """A scorer with parameters.""" + return list([threshold * multiplier]) + + # Check that the scorer is registered + registered_scorer = scorer_store.get_scorer("validmind.scorer.test.ParameterScorer") + self.assertIsNotNone(registered_scorer) + self.assertEqual(registered_scorer, parameter_scorer) + + def test_scorer_docstring_preservation(self): + """Test that docstrings are preserved.""" + @scorer("validmind.scorer.test.DocstringTest") + def docstring_scorer(model, dataset): + """This is a test docstring for the scorer.""" + return list([1.0]) + + # Check that docstring is preserved + self.assertEqual(docstring_scorer.__doc__, "This is a test docstring for the scorer.") + + +class TestScorerIdGeneration(unittest.TestCase): + """Integration tests for automatic scorer ID generation from file paths.""" + + def setUp(self): + """Set up test fixtures.""" + scorer_store.scorers.clear() + + 
def tearDown(self): + """Clean up after each test.""" + scorer_store.scorers.clear() + + @patch('validmind.tests.decorator.inspect.getfile') + @patch('validmind.tests.decorator.os.path.relpath') + @patch('validmind.tests.decorator.os.path.abspath') + def test_generate_id_from_path_classification(self, mock_abspath, mock_relpath, mock_getfile): + """Test ID generation for classification scorer.""" + # Mock the file path + mock_getfile.return_value = "/path/to/validmind/scorer/classification/BrierScore.py" + mock_abspath.return_value = "/path/to/validmind/scorer" + mock_relpath.return_value = "classification/BrierScore.py" + + def mock_function(): + pass + + scorer_id = _generate_scorer_id_from_path(mock_function) + expected_id = "validmind.scorer.classification.BrierScore" + self.assertEqual(scorer_id, expected_id) + + @patch('validmind.tests.decorator.inspect.getfile') + @patch('validmind.tests.decorator.os.path.relpath') + @patch('validmind.tests.decorator.os.path.abspath') + def test_generate_id_from_path_llm(self, mock_abspath, mock_relpath, mock_getfile): + """Test ID generation for LLM scorer.""" + # Mock the file path + mock_getfile.return_value = "/path/to/validmind/scorer/llm/deepeval/AnswerRelevancy.py" + mock_abspath.return_value = "/path/to/validmind/scorer" + mock_relpath.return_value = "llm/deepeval/AnswerRelevancy.py" + + def mock_function(): + pass + + scorer_id = _generate_scorer_id_from_path(mock_function) + expected_id = "validmind.scorer.llm.deepeval.AnswerRelevancy" + self.assertEqual(scorer_id, expected_id) + + @patch('validmind.tests.decorator.inspect.getfile') + @patch('validmind.tests.decorator.os.path.relpath') + @patch('validmind.tests.decorator.os.path.abspath') + def test_generate_id_from_path_root_scorer(self, mock_abspath, mock_relpath, mock_getfile): + """Test ID generation for scorer in root scorer directory.""" + # Mock the file path + mock_getfile.return_value = "/path/to/validmind/scorer/MyScorer.py" + mock_abspath.return_value = "/path/to/validmind/scorer" + mock_relpath.return_value = "MyScorer.py" + + def mock_function(): + pass + + scorer_id = _generate_scorer_id_from_path(mock_function) + expected_id = "validmind.scorer.MyScorer" + self.assertEqual(scorer_id, expected_id) + + @patch('validmind.tests.decorator.inspect.getfile') + def test_generate_id_fallback_on_error(self, mock_getfile): + """Test ID generation fallback when path detection fails.""" + # Mock getfile to raise an exception + mock_getfile.side_effect = OSError("Cannot determine file path") + + def mock_function(): + pass + + scorer_id = _generate_scorer_id_from_path(mock_function) + expected_id = "validmind.scorer.mock_function" + self.assertEqual(scorer_id, expected_id) + + @patch('validmind.tests.decorator.inspect.getfile') + @patch('validmind.tests.decorator.os.path.relpath') + @patch('validmind.tests.decorator.os.path.abspath') + def test_generate_id_fallback_on_value_error(self, mock_abspath, mock_relpath, mock_getfile): + """Test ID generation fallback when relative path calculation fails.""" + # Mock getfile to return a path outside the scorer directory + mock_getfile.return_value = "/path/to/some/other/directory/MyScorer.py" + mock_abspath.return_value = "/path/to/validmind/scorer" + mock_relpath.side_effect = ValueError("Path not under scorer directory") + + def mock_function(): + pass + + scorer_id = _generate_scorer_id_from_path(mock_function) + expected_id = "validmind.scorer.mock_function" + self.assertEqual(scorer_id, expected_id) + + +class 
TestScorerIntegration(unittest.TestCase): + """More integration tests for scorer behavior with the broader system.""" + + def setUp(self): + """Set up test fixtures.""" + scorer_store.scorers.clear() + test_store.tests.clear() + + def tearDown(self): + """Clean up after each test.""" + scorer_store.scorers.clear() + test_store.tests.clear() + + def test_scorer_store_singleton(self): + """Test that scorer store is a singleton.""" + from validmind.tests._store import ScorerStore + + store1 = ScorerStore() + store2 = ScorerStore() + + self.assertIs(store1, store2) + + def test_scorer_registration_and_retrieval(self): + """Test complete registration and retrieval cycle.""" + @scorer("validmind.scorer.test.IntegrationTest") + def integration_scorer(model, dataset): + """Integration test scorer.""" + return list([1.0, 2.0, 3.0]) + + # Test registration + self.assertIsNotNone(scorer_store.get_scorer("validmind.scorer.test.IntegrationTest")) + + # Test retrieval + retrieved_scorer = scorer_store.get_scorer("validmind.scorer.test.IntegrationTest") + self.assertEqual(retrieved_scorer, integration_scorer) + + # Test that it's callable + self.assertTrue(callable(retrieved_scorer)) + + def test_scorer_with_mock_model_and_dataset(self): + """Test scorer execution with mock model and dataset.""" + @scorer("validmind.scorer.test.MockExecution") + def mock_execution_scorer(model, dataset): + """Scorer for mock execution testing.""" + return list([1.0, 2.0, 3.0]) + + # Create mock model and dataset + mock_model = MagicMock() + mock_dataset = MagicMock() + + # Execute the scorer + result = mock_execution_scorer(mock_model, mock_dataset) + + # Check result + self.assertIsInstance(result, list) + self.assertEqual(result, [1.0, 2.0, 3.0]) + + +# --------------------------- +# Standalone (mock-based) tests +# --------------------------- + +from typing import Any, Callable, Optional, Union, List # noqa: E402 + + +class _MockList: + def __init__(self, values): + self.values = values + + def __eq__(self, other): + if isinstance(other, list): + return self.values == other + return getattr(other, "values", None) == self.values + + +class _MockScorerStore: + def __init__(self): + self.scorers = {} + + def register_scorer(self, scorer_id: str, scorer: Callable[..., Any]) -> None: + self.scorers[scorer_id] = scorer + + def get_scorer(self, scorer_id: str) -> Optional[Callable[..., Any]]: + return self.scorers.get(scorer_id) + + +class _MockTestStore: + def __init__(self): + self.tests = {} + + def get_test(self, test_id: str) -> Optional[Callable[..., Any]]: + return self.tests.get(test_id) + + +_mock_scorer_store = _MockScorerStore() +_mock_test_store = _MockTestStore() + + +def _mock_scorer(func_or_id: Union[Callable[..., Any], str, None] = None) -> Callable[[Callable[..., Any]], Callable[..., Any]]: + """Lightweight scorer decorator used for mock-based tests.""" + + def _decorator(func: Callable[..., Any]) -> Callable[..., Any]: + if func_or_id is None or func_or_id == "": + scorer_id = f"validmind.scorer.{func.__name__}" + elif isinstance(func_or_id, str): + scorer_id = func_or_id + else: + scorer_id = f"validmind.scorer.{func.__name__}" + + _mock_scorer_store.register_scorer(scorer_id, func) + func.scorer_id = scorer_id + return func + + if callable(func_or_id): + return _decorator(func_or_id) + return _decorator + + +class TestScorerDecoratorEdgeCases(unittest.TestCase): + def setUp(self): + _mock_scorer_store.scorers.clear() + _mock_test_store.tests.clear() + + def tearDown(self): + 
_mock_scorer_store.scorers.clear() + _mock_test_store.tests.clear() + + def test_scorer_with_empty_string_id(self): + @_mock_scorer("") + def empty_string_scorer(model, dataset): + return _MockList([1.0]) + self.assertEqual(empty_string_scorer.scorer_id, "validmind.scorer.empty_string_scorer") + self.assertIsNotNone(_mock_scorer_store.get_scorer("validmind.scorer.empty_string_scorer")) + + def test_scorer_with_none_id(self): + @_mock_scorer(None) + def none_id_scorer(model, dataset): + return _MockList([1.0]) + self.assertEqual(none_id_scorer.scorer_id, "validmind.scorer.none_id_scorer") + self.assertIsNotNone(_mock_scorer_store.get_scorer("validmind.scorer.none_id_scorer")) + + def test_scorer_with_complex_parameters(self): + @_mock_scorer("validmind.scorer.test.ComplexParams") + def complex_params_scorer( + model, + dataset, + threshold: float = 0.5, + enabled: bool = True, + categories: List[str] = None, + config: dict = None, + ): + if categories is None: + categories = ["A", "B", "C"] + if config is None: + config = {"key": "value"} + return _MockList([threshold, float(enabled), len(categories)]) + + self.assertIsNotNone(_mock_scorer_store.get_scorer("validmind.scorer.test.ComplexParams")) + + def test_scorer_with_no_parameters(self): + @_mock_scorer("validmind.scorer.test.NoParams") + def no_params_scorer(model, dataset): + return _MockList([1.0]) + self.assertIsNotNone(_mock_scorer_store.get_scorer("validmind.scorer.test.NoParams")) + + +if __name__ == '__main__': + unittest.main(verbosity=2) diff --git a/validmind/__init__.py b/validmind/__init__.py index 898872631..45554259d 100644 --- a/validmind/__init__.py +++ b/validmind/__init__.py @@ -48,6 +48,7 @@ except ImportError: ... +from . import scorer from .__version__ import __version__ # noqa: E402 from .api_client import init, log_metric, log_text, reload from .client import ( # noqa: E402 @@ -60,6 +61,7 @@ run_test_suite, ) from .experimental import agents as experimental_agent +from .tests.decorator import scorer as scorer_decorator from .tests.decorator import tags, tasks, test from .tests.run import print_env from .utils import is_notebook, parse_version @@ -128,6 +130,9 @@ def check_version(): "tags", "tasks", "test", + "scorer_decorator", + # scorer module + "scorer", # raw data (for post-processing test results and building tests) "RawData", # submodules diff --git a/validmind/api_client.py b/validmind/api_client.py index 0071b8884..a09abf139 100644 --- a/validmind/api_client.py +++ b/validmind/api_client.py @@ -24,7 +24,7 @@ from .errors import MissingAPICredentialsError, MissingModelIdError, raise_api_error from .logging import get_logger, init_sentry, log_api_operation, send_single_error from .utils import NumpyEncoder, is_html, md_to_html, run_async -from .vm_models import Figure +from .vm_models.figure import Figure logger = get_logger(__name__) @@ -459,11 +459,11 @@ async def alog_metric( if value is None: raise ValueError("Must provide a value for the metric") + # Validate that value is a scalar (int or float) if not isinstance(value, (int, float)): - try: - value = float(value) - except (ValueError, TypeError): - raise ValueError("`value` must be a scalar (int or float)") + raise ValueError( + "Only scalar values (int or float) are allowed for logging metrics." 
+ ) if thresholds is not None and not isinstance(thresholds, dict): raise ValueError("`thresholds` must be a dictionary or None") @@ -492,7 +492,7 @@ async def alog_metric( def log_metric( key: str, - value: float, + value: Union[int, float], inputs: Optional[List[str]] = None, params: Optional[Dict[str, Any]] = None, recorded_at: Optional[str] = None, @@ -502,18 +502,19 @@ def log_metric( """Logs a unit metric. Unit metrics are key-value pairs where the key is the metric name and the value is - a scalar (int or float). These key-value pairs are associated with the currently - selected model (inventory model in the ValidMind Platform) and keys can be logged - to over time to create a history of the metric. On the ValidMind Platform, these metrics - will be used to create plots/visualizations for documentation and dashboards etc. + a scalar (int or float). These key-value pairs are associated + with the currently selected model (inventory model in the ValidMind Platform) and keys + can be logged to over time to create a history of the metric. On the ValidMind Platform, + these metrics will be used to create plots/visualizations for documentation and dashboards etc. Args: key (str): The metric key - value (Union[int, float]): The metric value + value (Union[int, float]): The metric value (scalar) inputs (List[str], optional): List of input IDs params (Dict[str, Any], optional): Parameters used to generate the metric recorded_at (str, optional): Timestamp when the metric was recorded thresholds (Dict[str, Any], optional): Thresholds for the metric + passed (bool, optional): Whether the metric passed validation thresholds """ return run_async( alog_metric, diff --git a/validmind/datasets/llm/__init__.py b/validmind/datasets/llm/__init__.py new file mode 100644 index 000000000..1e5937374 --- /dev/null +++ b/validmind/datasets/llm/__init__.py @@ -0,0 +1,14 @@ +# Copyright © 2023-2024 ValidMind Inc. All rights reserved. +# See the LICENSE file in the root of this repository for details. +# SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial + +""" +Entrypoint for LLM datasets. +""" + +from .agent_dataset import LLMAgentDataset + +__all__ = [ + "rag", + "LLMAgentDataset", +] diff --git a/validmind/datasets/llm/agent_dataset.py b/validmind/datasets/llm/agent_dataset.py new file mode 100644 index 000000000..c6dbba5ca --- /dev/null +++ b/validmind/datasets/llm/agent_dataset.py @@ -0,0 +1,459 @@ +# Copyright © 2023-2024 ValidMind Inc. All rights reserved. +# See the LICENSE file in the root of this repository for details. +# SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial + +""" +LLM Agent Dataset for integrating with DeepEval evaluation framework. + +This module provides an LLMAgentDataset class that inherits from VMDataset +and enables the use of all DeepEval tests and metrics within the ValidMind library. 
+""" + +from typing import Any, Dict, List, Optional + +import pandas as pd + +from validmind.logging import get_logger +from validmind.vm_models.dataset import VMDataset + +logger = get_logger(__name__) + +# Optional DeepEval imports with graceful fallback +try: + from deepeval import evaluate + from deepeval.dataset import EvaluationDataset, Golden + from deepeval.metrics import BaseMetric + from deepeval.test_case import LLMTestCase, ToolCall + + DEEPEVAL_AVAILABLE = True +except ImportError: + DEEPEVAL_AVAILABLE = False + LLMTestCase = None + ToolCall = None + EvaluationDataset = None + Golden = None + BaseMetric = None + evaluate = None + + +class LLMAgentDataset(VMDataset): + """ + LLM Agent Dataset for DeepEval integration with ValidMind. + + This dataset class allows you to use all DeepEval tests and metrics + within the ValidMind evaluation framework. It stores LLM interaction data + in a format compatible with both frameworks. + + Attributes: + test_cases (List[LLMTestCase]): List of DeepEval test cases + goldens (List[Golden]): List of DeepEval golden templates + deepeval_dataset (EvaluationDataset): DeepEval dataset instance + + Example: + ```python + # Create from DeepEval test cases + test_cases = [ + LLMTestCase( + input="What is machine learning?", + actual_output="Machine learning is a subset of AI...", + expected_output="ML is a method of data analysis...", + context=["Machine learning context..."] + ) + ] + + dataset = LLMAgentDataset.from_test_cases( + test_cases=test_cases, + input_id="llm_eval_dataset" + ) + + # Run DeepEval metrics + from deepeval.metrics import AnswerRelevancyMetric + results = dataset.evaluate_with_deepeval([AnswerRelevancyMetric()]) + ``` + """ + + def __init__( + self, + input_id: str = None, + test_cases: Optional[List] = None, + goldens: Optional[List] = None, + deepeval_dataset: Optional[Any] = None, + **kwargs, + ): + """ + Initialize LLMAgentDataset. + + Args: + input_id: Identifier for the dataset + test_cases: List of DeepEval LLMTestCase objects + goldens: List of DeepEval Golden objects + deepeval_dataset: DeepEval EvaluationDataset instance + **kwargs: Additional arguments passed to VMDataset + """ + if not DEEPEVAL_AVAILABLE: + raise ImportError( + "DeepEval is required to use LLMAgentDataset. 
" + "Install it with: pip install deepeval" + ) + + # Store DeepEval objects + self.test_cases = test_cases or [] + self.goldens = goldens or [] + self.deepeval_dataset = deepeval_dataset + + # Convert to pandas DataFrame for VMDataset compatibility + df = self._convert_to_dataframe() + + # Initialize VMDataset with the converted data + super().__init__( + raw_dataset=df.values, + input_id=input_id or "llm_agent_dataset", + columns=df.columns.tolist(), + text_column="input", # The input text for LLM + target_column="expected_output", # Expected response + extra_columns={ + "actual_output": "actual_output", + "context": "context", + "retrieval_context": "retrieval_context", + "tools_called": "tools_called", + "expected_tools": "expected_tools", + }, + **kwargs, + ) + + def _convert_to_dataframe(self) -> pd.DataFrame: + """Convert DeepEval test cases/goldens to pandas DataFrame.""" + data = [] + + # Process test cases + for i, test_case in enumerate(self.test_cases): + row = { + "id": f"test_case_{i}", + "input": test_case.input, + "actual_output": test_case.actual_output, + "expected_output": getattr(test_case, "expected_output", None), + "context": self._serialize_list_field( + getattr(test_case, "context", None) + ), + "retrieval_context": self._serialize_list_field( + getattr(test_case, "retrieval_context", None) + ), + "tools_called": self._serialize_tools_field( + getattr(test_case, "tools_called", None) + ), + "expected_tools": self._serialize_tools_field( + getattr(test_case, "expected_tools", None) + ), + "type": "test_case", + } + data.append(row) + + # Process goldens + for i, golden in enumerate(self.goldens): + row = { + "id": f"golden_{i}", + "input": golden.input, + "actual_output": getattr(golden, "actual_output", None), + "expected_output": getattr(golden, "expected_output", None), + "context": self._serialize_list_field(getattr(golden, "context", None)), + "retrieval_context": self._serialize_list_field( + getattr(golden, "retrieval_context", None) + ), + "tools_called": self._serialize_tools_field( + getattr(golden, "tools_called", None) + ), + "expected_tools": self._serialize_tools_field( + getattr(golden, "expected_tools", None) + ), + "type": "golden", + } + data.append(row) + + if not data: + # Create empty DataFrame with expected columns + data = [ + { + "id": "", + "input": "", + "actual_output": "", + "expected_output": "", + "context": "", + "retrieval_context": "", + "tools_called": "", + "expected_tools": "", + "type": "", + } + ] + + return pd.DataFrame(data) + + def _serialize_list_field(self, field: Optional[List[str]]) -> str: + """Serialize list field to string for DataFrame storage.""" + if field is None: + return "" + return "|".join(str(item) for item in field) + + def _serialize_tools_field(self, tools: Optional[List]) -> str: + """Serialize tools list to string for DataFrame storage.""" + if tools is None: + return "" + tool_strs = [] + for tool in tools: + if hasattr(tool, "name"): + tool_strs.append(tool.name) + else: + tool_strs.append(str(tool)) + return "|".join(tool_strs) + + def _deserialize_list_field(self, field_str: str) -> List[str]: + """Deserialize string back to list.""" + if not field_str: + return [] + return field_str.split("|") + + @classmethod + def from_test_cases( + cls, test_cases: List, input_id: str = "llm_agent_dataset", **kwargs + ) -> "LLMAgentDataset": + """ + Create LLMAgentDataset from DeepEval test cases. 
+ + Args: + test_cases: List of DeepEval LLMTestCase objects + input_id: Dataset identifier + **kwargs: Additional arguments + + Returns: + LLMAgentDataset instance + """ + return cls(input_id=input_id, test_cases=test_cases, **kwargs) + + @classmethod + def from_goldens( + cls, goldens: List, input_id: str = "llm_agent_dataset", **kwargs + ) -> "LLMAgentDataset": + """ + Create LLMAgentDataset from DeepEval goldens. + + Args: + goldens: List of DeepEval Golden objects + input_id: Dataset identifier + **kwargs: Additional arguments + + Returns: + LLMAgentDataset instance + """ + return cls(input_id=input_id, goldens=goldens, **kwargs) + + @classmethod + def from_deepeval_dataset( + cls, deepeval_dataset, input_id: str = "llm_agent_dataset", **kwargs + ) -> "LLMAgentDataset": + """ + Create LLMAgentDataset from DeepEval EvaluationDataset. + + Args: + deepeval_dataset: DeepEval EvaluationDataset instance + input_id: Dataset identifier + **kwargs: Additional arguments + + Returns: + LLMAgentDataset instance + """ + return cls( + input_id=input_id, + test_cases=getattr(deepeval_dataset, "test_cases", []), + goldens=getattr(deepeval_dataset, "goldens", []), + deepeval_dataset=deepeval_dataset, + **kwargs, + ) + + def add_test_case(self, test_case) -> None: + """ + Add a DeepEval test case to the dataset. + + Args: + test_case: DeepEval LLMTestCase instance + """ + if not DEEPEVAL_AVAILABLE: + raise ImportError("DeepEval is required to add test cases") + + self.test_cases.append(test_case) + # Refresh the DataFrame + df = self._convert_to_dataframe() + self._df = df + self.columns = df.columns.tolist() + + def add_golden(self, golden) -> None: + """ + Add a DeepEval golden to the dataset. + + Args: + golden: DeepEval Golden instance + """ + if not DEEPEVAL_AVAILABLE: + raise ImportError("DeepEval is required to add goldens") + + self.goldens.append(golden) + # Refresh the DataFrame + df = self._convert_to_dataframe() + self._df = df + self.columns = df.columns.tolist() + + def convert_goldens_to_test_cases(self, llm_app_function) -> None: + """ + Convert goldens to test cases by generating actual outputs. + + Args: + llm_app_function: Function that takes input and returns LLM output + """ + if not DEEPEVAL_AVAILABLE: + raise ImportError("DeepEval is required for conversion") + + new_test_cases = [] + for golden in self.goldens: + try: + actual_output = llm_app_function(golden.input) + if LLMTestCase is not None: + test_case = LLMTestCase( + input=golden.input, + actual_output=actual_output, + expected_output=getattr(golden, "expected_output", None), + context=getattr(golden, "context", None), + retrieval_context=getattr(golden, "retrieval_context", None), + tools_called=getattr(golden, "tools_called", None), + expected_tools=getattr(golden, "expected_tools", None), + ) + else: + raise ImportError("DeepEval LLMTestCase is not available") + new_test_cases.append(test_case) + except Exception as e: + logger.warning(f"Failed to convert golden to test case: {e}") + continue + + self.test_cases.extend(new_test_cases) + # Refresh the DataFrame + df = self._convert_to_dataframe() + self._df = df + self.columns = df.columns.tolist() + + def evaluate_with_deepeval(self, metrics: List, **kwargs) -> Dict[str, Any]: + """ + Evaluate the dataset using DeepEval metrics. 
+ + Args: + metrics: List of DeepEval metric instances + **kwargs: Additional arguments passed to deepeval.evaluate() + + Returns: + Evaluation results dictionary + """ + if not DEEPEVAL_AVAILABLE: + raise ImportError("DeepEval is required for evaluation") + + if not self.test_cases: + raise ValueError("No test cases available for evaluation") + + try: + # Use DeepEval's evaluate function + if evaluate is not None: + results = evaluate( + test_cases=self.test_cases, metrics=metrics, **kwargs + ) + return results + else: + raise ImportError("DeepEval evaluate function is not available") + except Exception as e: + logger.error(f"DeepEval evaluation failed: {e}") + raise + + def get_deepeval_dataset(self): + """ + Get or create a DeepEval EvaluationDataset instance. + + Returns: + DeepEval EvaluationDataset instance + """ + if not DEEPEVAL_AVAILABLE: + raise ImportError("DeepEval is required to get dataset") + + if self.deepeval_dataset is None: + if EvaluationDataset is not None: + self.deepeval_dataset = EvaluationDataset(goldens=self.goldens) + # Add test cases if available + for test_case in self.test_cases: + self.deepeval_dataset.add_test_case(test_case) + else: + raise ImportError("DeepEval EvaluationDataset is not available") + + return self.deepeval_dataset + + def to_deepeval_test_cases(self) -> List: + """ + Convert dataset rows back to DeepEval test cases. + + Returns: + List of DeepEval LLMTestCase objects + """ + if not DEEPEVAL_AVAILABLE: + raise ImportError("DeepEval is required for conversion") + + test_cases = [] + for _, row in self.df.iterrows(): + # Check if this row has actual output (is a test case) + has_actual_output = ( + pd.notna(row["actual_output"]) + and str(row["actual_output"]).strip() != "" + ) + is_test_case = str(row["type"]) == "test_case" + + if is_test_case or has_actual_output: + if LLMTestCase is not None: + # Safely get context fields + context_val = ( + row["context"] + if pd.notna(row["context"]) and str(row["context"]).strip() + else None + ) + retrieval_context_val = ( + row["retrieval_context"] + if pd.notna(row["retrieval_context"]) + and str(row["retrieval_context"]).strip() + else None + ) + expected_output_val = ( + row["expected_output"] + if pd.notna(row["expected_output"]) + and str(row["expected_output"]).strip() + else None + ) + + test_case = LLMTestCase( + input=str(row["input"]), + actual_output=str(row["actual_output"]) + if pd.notna(row["actual_output"]) + else "", + expected_output=expected_output_val, + context=self._deserialize_list_field(context_val) + if context_val + else None, + retrieval_context=self._deserialize_list_field( + retrieval_context_val + ) + if retrieval_context_val + else None, + # Note: tools_called deserialization would need more complex logic + # for now we'll keep it simple + ) + test_cases.append(test_case) + else: + raise ImportError("DeepEval LLMTestCase is not available") + + return test_cases + + def __repr__(self) -> str: + return ( + f"LLMAgentDataset(input_id='{self.input_id}', " + f"test_cases={len(self.test_cases)}, " + f"goldens={len(self.goldens)})" + ) diff --git a/validmind/scorer/__init__.py b/validmind/scorer/__init__.py new file mode 100644 index 000000000..51032d109 --- /dev/null +++ b/validmind/scorer/__init__.py @@ -0,0 +1,69 @@ +# Copyright © 2023-2024 ValidMind Inc. All rights reserved. +# See the LICENSE file in the root of this repository for details. 
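For orientation, here is a minimal usage sketch of the LLMAgentDataset class defined above; the import path is an assumption (adjust it to wherever the class is exposed in your install), and running DeepEval metrics requires an OpenAI API key:

    from deepeval.metrics import AnswerRelevancyMetric
    from deepeval.test_case import LLMTestCase
    from validmind.vm_models.dataset import LLMAgentDataset  # assumed import path

    # Wrap one DeepEval test case in a ValidMind-compatible dataset
    test_cases = [
        LLMTestCase(
            input="What is the capital of France?",
            actual_output="The capital of France is Paris.",
            expected_output="Paris",
        )
    ]
    qa_dataset = LLMAgentDataset.from_test_cases(test_cases, input_id="qa_eval")

    # Evaluate the wrapped test cases with any DeepEval metric
    results = qa_dataset.evaluate_with_deepeval(
        metrics=[AnswerRelevancyMetric(threshold=0.7)]
    )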
+# SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial + +from validmind.tests._store import test_provider_store +from validmind.tests.decorator import scorer +from validmind.tests.load import describe_test +from validmind.tests.run import run_test + + +def list_scorers(**kwargs): + """List all scorers""" + vm_provider = test_provider_store.get_test_provider("validmind") + vm_scorers_provider = vm_provider.scorers_provider + + prefix = "validmind.scorer." + + return [ + f"{prefix}{test_id}" for test_id in vm_scorers_provider.list_tests(**kwargs) + ] + + +def describe_scorer(scorer_id: str, **kwargs): + """Describe a scorer""" + return describe_test(scorer_id, **kwargs) + + +def run_scorer(scorer_id: str, **kwargs): + """Run a scorer""" + from validmind.tests._store import scorer_store + + # First check if it's a custom scorer in the scorer store + custom_scorer = scorer_store.get_scorer(scorer_id) + if custom_scorer is not None: + # Run the custom scorer directly + from inspect import getdoc + + from validmind.tests.load import _inspect_signature + from validmind.tests.run import _get_test_kwargs, build_test_result + + # Set inputs and params attributes on the scorer function (like load_test does) + if not hasattr(custom_scorer, "inputs") or not hasattr(custom_scorer, "params"): + custom_scorer.inputs, custom_scorer.params = _inspect_signature( + custom_scorer + ) + + input_kwargs, param_kwargs = _get_test_kwargs( + test_func=custom_scorer, + inputs=kwargs.get("inputs", {}), + params=kwargs.get("params", {}), + ) + + raw_result = custom_scorer(**input_kwargs, **param_kwargs) + + return build_test_result( + outputs=raw_result, + test_id=scorer_id, + test_doc=getdoc(custom_scorer), + inputs=input_kwargs, + params=param_kwargs, + title=kwargs.get("title"), + test_func=custom_scorer, + ) + + # Fall back to the test system for built-in scorers + return run_test(scorer_id, **kwargs) + + +__all__ = ["list_scorers", "describe_scorer", "run_scorer", "scorer"] diff --git a/validmind/unit_metrics/classification/individual/AbsoluteError.py b/validmind/scorer/classification/AbsoluteError.py similarity index 96% rename from validmind/unit_metrics/classification/individual/AbsoluteError.py rename to validmind/scorer/classification/AbsoluteError.py index 403e10657..8c31c8b52 100644 --- a/validmind/unit_metrics/classification/individual/AbsoluteError.py +++ b/validmind/scorer/classification/AbsoluteError.py @@ -7,9 +7,11 @@ import numpy as np from validmind import tags, tasks +from validmind.tests.decorator import scorer from validmind.vm_models import VMDataset, VMModel +@scorer() @tasks("classification") @tags("classification") def AbsoluteError(model: VMModel, dataset: VMDataset, **kwargs) -> List[float]: diff --git a/validmind/unit_metrics/classification/individual/BrierScore.py b/validmind/scorer/classification/BrierScore.py similarity index 97% rename from validmind/unit_metrics/classification/individual/BrierScore.py rename to validmind/scorer/classification/BrierScore.py index 279cfa500..d383f87c0 100644 --- a/validmind/unit_metrics/classification/individual/BrierScore.py +++ b/validmind/scorer/classification/BrierScore.py @@ -7,9 +7,11 @@ import numpy as np from validmind import tags, tasks +from validmind.tests.decorator import scorer from validmind.vm_models import VMDataset, VMModel +@scorer() @tasks("classification") @tags("classification") def BrierScore(model: VMModel, dataset: VMDataset, **kwargs) -> List[float]: diff --git 
a/validmind/unit_metrics/classification/individual/CalibrationError.py b/validmind/scorer/classification/CalibrationError.py similarity index 98% rename from validmind/unit_metrics/classification/individual/CalibrationError.py rename to validmind/scorer/classification/CalibrationError.py index ba05c83fc..411bf63b9 100644 --- a/validmind/unit_metrics/classification/individual/CalibrationError.py +++ b/validmind/scorer/classification/CalibrationError.py @@ -7,9 +7,11 @@ import numpy as np from validmind import tags, tasks +from validmind.tests.decorator import scorer from validmind.vm_models import VMDataset, VMModel +@scorer() @tasks("classification") @tags("classification") def CalibrationError( diff --git a/validmind/unit_metrics/classification/individual/ClassBalance.py b/validmind/scorer/classification/ClassBalance.py similarity index 97% rename from validmind/unit_metrics/classification/individual/ClassBalance.py rename to validmind/scorer/classification/ClassBalance.py index 1c38da453..4058e79b2 100644 --- a/validmind/unit_metrics/classification/individual/ClassBalance.py +++ b/validmind/scorer/classification/ClassBalance.py @@ -7,9 +7,11 @@ import numpy as np from validmind import tags, tasks +from validmind.tests.decorator import scorer from validmind.vm_models import VMDataset, VMModel +@scorer() @tasks("classification") @tags("classification") def ClassBalance(model: VMModel, dataset: VMDataset, **kwargs) -> List[float]: diff --git a/validmind/unit_metrics/classification/individual/Confidence.py b/validmind/scorer/classification/Confidence.py similarity index 97% rename from validmind/unit_metrics/classification/individual/Confidence.py rename to validmind/scorer/classification/Confidence.py index a60394525..e54ef9f94 100644 --- a/validmind/unit_metrics/classification/individual/Confidence.py +++ b/validmind/scorer/classification/Confidence.py @@ -7,9 +7,11 @@ import numpy as np from validmind import tags, tasks +from validmind.tests.decorator import scorer from validmind.vm_models import VMDataset, VMModel +@scorer() @tasks("classification") @tags("classification") def Confidence(model: VMModel, dataset: VMDataset, **kwargs) -> List[float]: diff --git a/validmind/unit_metrics/classification/individual/Correctness.py b/validmind/scorer/classification/Correctness.py similarity index 96% rename from validmind/unit_metrics/classification/individual/Correctness.py rename to validmind/scorer/classification/Correctness.py index 81d45368c..b969007a7 100644 --- a/validmind/unit_metrics/classification/individual/Correctness.py +++ b/validmind/scorer/classification/Correctness.py @@ -7,9 +7,11 @@ import numpy as np from validmind import tags, tasks +from validmind.tests.decorator import scorer from validmind.vm_models import VMDataset, VMModel +@scorer() @tasks("classification") @tags("classification") def Correctness(model: VMModel, dataset: VMDataset, **kwargs) -> List[int]: diff --git a/validmind/unit_metrics/classification/individual/LogLoss.py b/validmind/scorer/classification/LogLoss.py similarity index 97% rename from validmind/unit_metrics/classification/individual/LogLoss.py rename to validmind/scorer/classification/LogLoss.py index 9a9b61a9b..8347e9423 100644 --- a/validmind/unit_metrics/classification/individual/LogLoss.py +++ b/validmind/scorer/classification/LogLoss.py @@ -7,9 +7,11 @@ import numpy as np from validmind import tags, tasks +from validmind.tests.decorator import scorer from validmind.vm_models import VMDataset, VMModel +@scorer() @tasks("classification") 
@tags("classification") def LogLoss( diff --git a/validmind/scorer/classification/OutlierScore.py b/validmind/scorer/classification/OutlierScore.py new file mode 100644 index 000000000..14685ad57 --- /dev/null +++ b/validmind/scorer/classification/OutlierScore.py @@ -0,0 +1,158 @@ +# Copyright © 2023-2024 ValidMind Inc. All rights reserved. +# See the LICENSE file in the root of this repository for details. +# SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial + +from typing import Any, Dict, List + +import numpy as np +from sklearn.ensemble import IsolationForest +from sklearn.preprocessing import StandardScaler + +from validmind import tags, tasks +from validmind.tests.decorator import scorer +from validmind.vm_models import VMDataset + + +@scorer() +@tasks("classification") +@tags("classification", "outlier", "anomaly") +def OutlierScore( + dataset: VMDataset, contamination: float = 0.1, **kwargs +) -> List[Dict[str, Any]]: + """Calculates outlier scores and isolation paths for a classification model. + + Uses Isolation Forest to identify samples that deviate significantly from + the typical patterns in the feature space. Returns both outlier scores and + isolation paths, which provide insights into how anomalous each sample is + and the path length through the isolation forest trees. + + Args: + dataset: The dataset containing feature data + contamination: Expected proportion of outliers, defaults to 0.1 + **kwargs: Additional parameters (unused for compatibility) + + Returns: + List[Dict[str, Any]]: Per-row outlier metrics as a list of dictionaries. + Each dictionary contains: + - "outlier_score": float - Normalized outlier score (0-1, higher = more outlier-like) + - "isolation_path": float - Average path length through isolation forest trees + - "anomaly_score": float - Raw anomaly score from isolation forest + - "is_outlier": bool - Whether the sample is classified as an outlier + + Note: + Outlier scores are normalized to [0, 1] where higher values indicate more outlier-like samples. + Isolation paths represent the average number of splits required to isolate a sample. 
+ """ + # Get feature data + X = dataset.x_df() + + # Handle case where we have no features or only categorical features + if X.empty or X.shape[1] == 0: + # Return zero outlier scores if no features available + return [ + { + "outlier_score": 0.0, + "isolation_path": 0.0, + "anomaly_score": 0.0, + "is_outlier": False, + } + ] * len(dataset.y) + + # Select only numeric features for outlier detection + numeric_features = dataset.feature_columns_numeric + if not numeric_features: + # If no numeric features, return zero outlier scores + return [ + { + "outlier_score": 0.0, + "isolation_path": 0.0, + "anomaly_score": 0.0, + "is_outlier": False, + } + ] * len(dataset.y) + + X_numeric = X[numeric_features] + + # Handle missing values by filling with median + X_filled = X_numeric.fillna(X_numeric.median()) + + # Standardize features for better outlier detection + scaler = StandardScaler() + X_scaled = scaler.fit_transform(X_filled) + + # Fit Isolation Forest + isolation_forest = IsolationForest( + contamination=contamination, random_state=42, n_estimators=100 + ) + + # Fit the model on the data + isolation_forest.fit(X_scaled) + + # Get anomaly scores (negative values for outliers) + anomaly_scores = isolation_forest.decision_function(X_scaled) + + # Get outlier predictions (True for outliers) + outlier_predictions = isolation_forest.predict(X_scaled) == -1 + + # Calculate isolation paths (average path length through trees) + isolation_paths = _calculate_isolation_paths(isolation_forest, X_scaled) + + # Convert to outlier scores (0 to 1, where 1 is most outlier-like) + # Normalize using min-max scaling + min_score = np.min(anomaly_scores) + max_score = np.max(anomaly_scores) + + if max_score == min_score: + # All samples have same score, no outliers detected + outlier_scores = np.zeros_like(anomaly_scores) + else: + # Invert and normalize: higher values = more outlier-like + outlier_scores = (max_score - anomaly_scores) / (max_score - min_score) + + # Create list of dictionaries with all metrics + results = [] + for i in range(len(outlier_scores)): + results.append( + { + "outlier_score": float(outlier_scores[i]), + "isolation_path": float(isolation_paths[i]), + "anomaly_score": float(anomaly_scores[i]), + "is_outlier": bool(outlier_predictions[i]), + } + ) + + return results + + +def _calculate_isolation_paths(isolation_forest, X): + """Calculate average isolation path lengths for each sample.""" + paths = [] + + for sample in X: + # Get path lengths from all trees + sample_paths = [] + for tree in isolation_forest.estimators_: + # Get the path length for this sample in this tree + path_length = _get_path_length(tree, sample.reshape(1, -1)) + sample_paths.append(path_length) + + # Average path length across all trees + avg_path_length = np.mean(sample_paths) + paths.append(avg_path_length) + + return np.array(paths) + + +def _get_path_length(tree, X): + """Get the path length for a sample in a single tree.""" + # This is a simplified version - in practice, you might want to use + # the tree's decision_path method for more accurate path lengths + try: + # Use the tree's decision_path to get the path + path = tree.decision_path(X) + # Count the number of nodes in the path (excluding leaf) + path_length = path.nnz - 1 + return path_length + except Exception: + # Fallback: estimate path length based on tree depth + return tree.get_depth() diff --git a/validmind/unit_metrics/classification/individual/ProbabilityError.py b/validmind/scorer/classification/ProbabilityError.py similarity index 97% rename 
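To make the per-row return shape concrete, a hedged sketch of running this scorer directly; the dataset object name and the values in the trailing comment are illustrative only:

    from validmind.scorer import run_scorer

    result = run_scorer(
        "validmind.scorer.classification.OutlierScore",
        inputs={"dataset": vm_test_ds},  # a VMDataset created via vm.init_dataset
        params={"contamination": 0.05},
    )

    # Scorer outputs stay on the result object instead of being logged to the backend:
    # [{"outlier_score": 0.82, "isolation_path": 4.3, "anomaly_score": -0.12, "is_outlier": True}, ...]
    per_row = result.raw_data.scorer_output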
from validmind/unit_metrics/classification/individual/ProbabilityError.py rename to validmind/scorer/classification/ProbabilityError.py index c96929820..a32a7b9a6 100644 --- a/validmind/unit_metrics/classification/individual/ProbabilityError.py +++ b/validmind/scorer/classification/ProbabilityError.py @@ -7,9 +7,11 @@ import numpy as np from validmind import tags, tasks +from validmind.tests.decorator import scorer from validmind.vm_models import VMDataset, VMModel +@scorer() @tasks("classification") @tags("classification") def ProbabilityError(model: VMModel, dataset: VMDataset, **kwargs) -> List[float]: diff --git a/validmind/unit_metrics/classification/individual/Uncertainty.py b/validmind/scorer/classification/Uncertainty.py similarity index 97% rename from validmind/unit_metrics/classification/individual/Uncertainty.py rename to validmind/scorer/classification/Uncertainty.py index 0d28fbac8..9bbceba6a 100644 --- a/validmind/unit_metrics/classification/individual/Uncertainty.py +++ b/validmind/scorer/classification/Uncertainty.py @@ -7,9 +7,11 @@ import numpy as np from validmind import tags, tasks +from validmind.tests.decorator import scorer from validmind.vm_models import VMDataset, VMModel +@scorer() @tasks("classification") @tags("classification") def Uncertainty(model: VMModel, dataset: VMDataset, **kwargs) -> List[float]: diff --git a/validmind/unit_metrics/classification/individual/__init__.py b/validmind/scorer/classification/__init__.py similarity index 100% rename from validmind/unit_metrics/classification/individual/__init__.py rename to validmind/scorer/classification/__init__.py diff --git a/validmind/scorer/llm/deepeval/AnswerRelevancy.py b/validmind/scorer/llm/deepeval/AnswerRelevancy.py new file mode 100644 index 000000000..86addeb88 --- /dev/null +++ b/validmind/scorer/llm/deepeval/AnswerRelevancy.py @@ -0,0 +1,96 @@ +# Copyright © 2023-2024 ValidMind Inc. All rights reserved. +# See the LICENSE file in the root of this repository for details. +# SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial + +from typing import Any, Dict, List + +from validmind import tags, tasks +from validmind.ai.utils import get_client_and_model +from validmind.errors import MissingDependencyError +from validmind.tests.decorator import scorer +from validmind.vm_models.dataset import VMDataset + +try: + from deepeval import evaluate + from deepeval.metrics import AnswerRelevancyMetric + from deepeval.test_case import LLMTestCase +except ImportError as e: + if "deepeval" in str(e): + raise MissingDependencyError( + "Missing required package `deepeval` for AnswerRelevancy. " + "Please run `pip install validmind[llm]` to use LLM tests", + required_dependencies=["deepeval"], + extra="llm", + ) from e + + raise e + + +# Create custom ValidMind tests for DeepEval metrics +@scorer() +@tags("llm", "AnswerRelevancy", "deepeval") +@tasks("llm") +def AnswerRelevancy( + dataset: VMDataset, + threshold: float = 0.8, + input_column: str = "input", + actual_output_column: str = "actual_output", +) -> List[Dict[str, Any]]: + """Calculates answer relevancy scores with explanations for LLM responses. + + This scorer evaluates how relevant an LLM's answer is to the given input question. + It returns a list of dictionaries, where each dictionary contains both the relevancy + score and the reasoning behind the score for each row in the dataset. 
+ + Args: + dataset: The dataset containing input questions and LLM responses + threshold: The threshold for determining relevancy (default: 0.8) + input_column: Name of the column containing input questions (default: "input") + actual_output_column: Name of the column containing LLM responses (default: "actual_output") + + Returns: + List[Dict[str, Any]]: Per-row relevancy scores and reasons as a list of dictionaries. + Each dictionary contains: + - "score": float - The relevancy score (0.0 to 1.0) + - "reason": str - Explanation of why the score was assigned + + Raises: + ValueError: If required columns are not found in the dataset + """ + + # Validate required columns exist in dataset + if input_column not in dataset.df.columns: + raise ValueError( + f"Input column '{input_column}' not found in dataset. Available columns: {dataset.df.columns.tolist()}" + ) + + if actual_output_column not in dataset.df.columns: + raise ValueError( + f"Actual output column '{actual_output_column}' not found in dataset. Available columns: {dataset.df.columns.tolist()}" + ) + + _, model = get_client_and_model() + + metric = AnswerRelevancyMetric( + threshold=threshold, model=model, include_reason=True, verbose_mode=False + ) + results = [] + for _, test_case in dataset.df.iterrows(): + input = test_case["input"] + actual_output = test_case["actual_output"] + + test_case = LLMTestCase( + input=input, + actual_output=actual_output, + ) + result = evaluate(test_cases=[test_case], metrics=[metric]) + + # Extract score and reason from the metric result + metric_data = result.test_results[0].metrics_data[0] + score = metric_data.score + reason = getattr(metric_data, "reason", "No reason provided") + + # Create dictionary with score and reason + results.append({"score": score, "reason": reason}) + + return results diff --git a/validmind/scorer/llm/deepeval/__init__.py b/validmind/scorer/llm/deepeval/__init__.py new file mode 100644 index 000000000..0b0547949 --- /dev/null +++ b/validmind/scorer/llm/deepeval/__init__.py @@ -0,0 +1,7 @@ +# Copyright © 2023-2024 ValidMind Inc. All rights reserved. +# See the LICENSE file in the root of this repository for details. 
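A hedged sketch of calling the AnswerRelevancy scorer on a small dataset; the DataFrame column names match the scorer's defaults, the object names are illustrative, and an OpenAI API key is needed for the underlying DeepEval metric:

    import pandas as pd
    import validmind as vm
    from validmind.scorer import run_scorer

    qa_df = pd.DataFrame(
        {
            "input": ["What is ValidMind used for?"],
            "actual_output": ["ValidMind helps document and test models for risk management."],
        }
    )
    qa_ds = vm.init_dataset(dataset=qa_df, input_id="qa_ds")

    result = run_scorer(
        "validmind.scorer.llm.deepeval.AnswerRelevancy",
        inputs={"dataset": qa_ds},
        params={"threshold": 0.7},
    )
    # Each row yields a dict like {"score": 0.93, "reason": "..."}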
+# SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial + +from .AnswerRelevancy import AnswerRelevancy + +__all__ = ["AnswerRelevancy"] diff --git a/validmind/tests/__init__.py b/validmind/tests/__init__.py index 5112a527e..ae4fb385b 100644 --- a/validmind/tests/__init__.py +++ b/validmind/tests/__init__.py @@ -7,7 +7,7 @@ from ..errors import LoadTestError from ..logging import get_logger from ._store import test_provider_store -from .decorator import tags, tasks, test +from .decorator import scorer, tags, tasks, test from .load import ( describe_test, list_tags, @@ -59,6 +59,7 @@ def register_test_provider(namespace: str, test_provider: TestProvider) -> None: "list_tasks_and_tags", # Decorators for functional metrics "test", + "scorer", "tags", "tasks", ] diff --git a/validmind/tests/__types__.py b/validmind/tests/__types__.py index dd919a68b..43c084c42 100644 --- a/validmind/tests/__types__.py +++ b/validmind/tests/__types__.py @@ -207,16 +207,17 @@ "validmind.unit_metrics.classification.Precision", "validmind.unit_metrics.classification.ROC_AUC", "validmind.unit_metrics.classification.Recall", - "validmind.unit_metrics.classification.individual.AbsoluteError", - "validmind.unit_metrics.classification.individual.BrierScore", - "validmind.unit_metrics.classification.individual.CalibrationError", - "validmind.unit_metrics.classification.individual.ClassBalance", - "validmind.unit_metrics.classification.individual.Confidence", - "validmind.unit_metrics.classification.individual.Correctness", - "validmind.unit_metrics.classification.individual.LogLoss", - "validmind.unit_metrics.classification.individual.OutlierScore", - "validmind.unit_metrics.classification.individual.ProbabilityError", - "validmind.unit_metrics.classification.individual.Uncertainty", + "validmind.scorer.classification.AbsoluteError", + "validmind.scorer.classification.BrierScore", + "validmind.scorer.classification.CalibrationError", + "validmind.scorer.classification.ClassBalance", + "validmind.scorer.classification.Confidence", + "validmind.scorer.classification.Correctness", + "validmind.scorer.classification.LogLoss", + "validmind.scorer.classification.OutlierScore", + "validmind.scorer.classification.ProbabilityError", + "validmind.scorer.classification.Uncertainty", + "validmind.scorer.llm.deepeval.AnswerRelevancy", "validmind.unit_metrics.regression.AdjustedRSquaredScore", "validmind.unit_metrics.regression.GiniCoefficient", "validmind.unit_metrics.regression.HuberLoss", diff --git a/validmind/tests/_store.py b/validmind/tests/_store.py index 569094d6f..ae6fb9273 100644 --- a/validmind/tests/_store.py +++ b/validmind/tests/_store.py @@ -90,7 +90,38 @@ def register_test( self.tests[test_id] = test +@singleton +class ScorerStore: + """Singleton class for storing loaded scorers""" + + def __init__(self): + self.scorers = {} + + def get_scorer(self, scorer_id: str) -> Optional[Callable[..., Any]]: + """Get a scorer by scorer ID + + Args: + scorer_id (str): The scorer ID + + Returns: + Optional[Callable[..., Any]]: The scorer function if found, None otherwise + """ + return self.scorers.get(scorer_id) + + def register_scorer( + self, scorer_id: str, scorer: Optional[Callable[..., Any]] = None + ) -> None: + """Register a scorer + + Args: + scorer_id (str): The scorer ID + scorer (Optional[Callable[..., Any]], optional): The scorer function. Defaults to None. 
+ """ + self.scorers[scorer_id] = scorer + + test_store = TestStore() +scorer_store = ScorerStore() test_provider_store = TestProviderStore() # setup built-in test providers diff --git a/validmind/tests/decorator.py b/validmind/tests/decorator.py index 26aa78f90..a7d5e8279 100644 --- a/validmind/tests/decorator.py +++ b/validmind/tests/decorator.py @@ -11,7 +11,7 @@ from validmind.logging import get_logger -from ._store import test_store +from ._store import scorer_store, test_store from .load import load_test logger = get_logger(__name__) @@ -165,3 +165,132 @@ def decorator(func: F) -> F: return func return decorator + + +def scorer(func_or_id: Union[Callable[..., Any], str, None] = None) -> Callable[[F], F]: + """Decorator for creating and registering custom scorers + + This decorator registers the function it wraps as a scorer function within ValidMind + under the provided ID. Once decorated, the function can be run using the + `validmind.scorer.run_scorer` function. + + The scorer ID can be provided in three ways: + 1. Explicit ID: `@scorer("validmind.scorer.classification.BrierScore")` + 2. Auto-generated from path: `@scorer()` - automatically generates ID from file path + 3. Function name only: `@scorer` - uses function name with validmind.scorer prefix + + The function can take two different types of arguments: + + - Inputs: ValidMind model or dataset (or list of models/datasets). These arguments + must use the following names: `model`, `models`, `dataset`, `datasets`. + - Parameters: Any additional keyword arguments of any type (must have a default + value) that can have any name. + + The function should return one of the following types: + + - Table: Either a list of dictionaries or a pandas DataFrame + - Plot: Either a matplotlib figure or a plotly figure + - Scalar: A single number (int or float) + - Boolean: A single boolean value indicating whether the test passed or failed + - List: A list of values (for row-level metrics) or a list of dictionaries with consistent keys + - Any other type: The output will be stored as raw data for use by calling code + + When returning a list of dictionaries: + - All dictionaries must have the same keys + - The list length must match the number of rows in the dataset + - Each dictionary key will become a separate column when using assign_scores + - Column naming follows the pattern: {model_id}_{metric_name}_{dict_key} + + Note: Scorer outputs are not logged to the backend and are intended for use + by other parts of the system (e.g., assign_scores method). + + The function may also include a docstring. This docstring will be used and logged + as the scorer's description. + + Args: + func_or_id (Union[Callable[..., Any], str, None], optional): Either the function to decorate + or the scorer ID. If None or empty string, the ID is auto-generated from the file path. + Defaults to None. + + Returns: + Callable[[F], F]: The decorated function. 
+ """ + + def decorator(func: F) -> F: + # Determine the scorer ID + if func_or_id is None or func_or_id == "": + # Auto-generate ID from file path + scorer_id = _generate_scorer_id_from_path(func) + elif isinstance(func_or_id, str): + scorer_id = func_or_id + else: + # func_or_id is the function itself, auto-generate ID + scorer_id = _generate_scorer_id_from_path(func) + + # Don't call load_test during registration to avoid circular imports + # Just register the function directly in the scorer store + # Scorers should only be stored in the scorer store, not the test store + scorer_store.register_scorer(scorer_id, func) + + # special function to allow the function to be saved to a file + save_func = _get_save_func(func, scorer_id) + + # Add attributes to the function + func.scorer_id = scorer_id + func.save = save_func + func._is_scorer = True # Mark this function as a scorer + + return func + + if callable(func_or_id): + return decorator(func_or_id) + elif func_or_id is None: + # Handle @scorer() case - return decorator that will auto-generate ID + return decorator + + return decorator + + +def _generate_scorer_id_from_path(func: Callable[..., Any]) -> str: + """Generate a scorer ID from the function's file path. + + This function automatically generates a scorer ID based on the file path + where the function is defined, following the same pattern as the test system. + + Args: + func: The function to generate an ID for + + Returns: + str: The generated scorer ID in the format validmind.scorer.path.to.function + """ + import inspect + + try: + # Get the file path of the function + file_path = inspect.getfile(func) + + # Find the scorer directory in the path + scorer_dir = os.path.join(os.path.dirname(__file__), "..", "scorer") + scorer_dir = os.path.abspath(scorer_dir) + + # Get relative path from scorer directory + try: + rel_path = os.path.relpath(file_path, scorer_dir) + except ValueError: + # If file is not under scorer directory, fall back to function name + return f"validmind.scorer.{func.__name__}" + + # Convert path to scorer ID + # Remove .py extension and replace path separators with dots + scorer_path = os.path.splitext(rel_path)[0].replace(os.sep, ".") + + # If the path is just the filename (no subdirectories), use it as is + if scorer_path == func.__name__: + return f"validmind.scorer.{func.__name__}" + + # Otherwise, use the full path + return f"validmind.scorer.{scorer_path}" + + except (OSError, TypeError): + # Fallback to function name if we can't determine the path + return f"validmind.scorer.{func.__name__}" diff --git a/validmind/tests/load.py b/validmind/tests/load.py index f1b1e7b84..04e7088e0 100644 --- a/validmind/tests/load.py +++ b/validmind/tests/load.py @@ -27,8 +27,9 @@ from ..html_templates.content_blocks import test_content_block_html from ..logging import get_logger from ..utils import display, format_dataframe, fuzzy_match, md_to_html, test_id_to_name -from ..vm_models import VMDataset, VMModel +from ..vm_models.dataset.dataset import VMDataset from ..vm_models.figure import Figure +from ..vm_models.model import VMModel from ..vm_models.result import ResultTable from .__types__ import TestID from ._store import test_provider_store, test_store diff --git a/validmind/tests/output.py b/validmind/tests/output.py index 6c428930d..2837de9ca 100644 --- a/validmind/tests/output.py +++ b/validmind/tests/output.py @@ -3,7 +3,7 @@ # SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial from abc import ABC, abstractmethod -from typing import Any, Dict, List, Union 
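To illustrate the decorator above, a minimal custom scorer registered under an explicit (hypothetical) ID; it assumes predictions have already been assigned to the dataset for the given model:

    from validmind import tags, tasks
    from validmind.tests import scorer
    from validmind.vm_models import VMDataset, VMModel


    @scorer("validmind.scorer.custom.SquaredError")
    @tasks("regression")
    @tags("regression")
    def SquaredError(model: VMModel, dataset: VMDataset) -> list:
        """Per-row squared error between the target and the model prediction."""
        y_true = dataset.y
        y_pred = dataset.y_pred(model)  # requires dataset.assign_predictions(model) beforehand
        return [float((t - p) ** 2) for t, p in zip(y_true, y_pred)]

    # Once imported, it can be run like the built-ins:
    # run_scorer("validmind.scorer.custom.SquaredError", inputs={"model": vm_model, "dataset": vm_test_ds})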
+from typing import Any, Callable, Dict, List, Optional, Union from uuid import uuid4 import numpy as np @@ -45,13 +45,7 @@ def process(self, item: Any, result: TestResult) -> None: class MetricOutputHandler(OutputHandler): def can_handle(self, item: Any) -> bool: - # Accept individual numbers - if isinstance(item, (int, float)): - return True - # Accept lists/arrays of numbers for per-row metrics - if isinstance(item, (list, tuple, np.ndarray)): - return all(isinstance(x, (int, float, np.number)) for x in item) - return False + return isinstance(item, (int, float)) def process(self, item: Any, result: TestResult) -> None: if result.metric is not None: @@ -171,7 +165,24 @@ def process(self, item: Any, result: TestResult) -> None: result.description = item -def process_output(item: Any, result: TestResult) -> None: +class ScorerOutputHandler(OutputHandler): + """Handler for scorer outputs that should not be logged to backend""" + + def can_handle(self, item: Any) -> bool: + # This handler is only called when we've already determined it's a scorer + # based on the _is_scorer marker on the test function + return True + + def process(self, item: Any, result: TestResult) -> None: + # For scorers, we just store the raw output without special processing + # The output will be used by the calling code (e.g., assign_scores) + # but won't be logged to the backend + result.raw_data = RawData(scorer_output=item) + + +def process_output( + item: Any, result: TestResult, test_func: Optional[Callable] = None +) -> None: """Process a single test output item and update the TestResult.""" handlers = [ BooleanOutputHandler(), @@ -183,6 +194,15 @@ def process_output(item: Any, result: TestResult) -> None: MetricOutputHandler(), ] + # Check if this is a scorer first by looking for the _is_scorer marker + if test_func and hasattr(test_func, "_is_scorer") and test_func._is_scorer: + # For scorers, handle the output specially + scorer_handler = ScorerOutputHandler() + scorer_handler._result = result + if scorer_handler.can_handle(item): + scorer_handler.process(item, result) + return + for handler in handlers: if handler.can_handle(item): handler.process(item, result) diff --git a/validmind/tests/run.py b/validmind/tests/run.py index 2a32a3a81..5fc7d8145 100644 --- a/validmind/tests/run.py +++ b/validmind/tests/run.py @@ -22,7 +22,7 @@ from .__types__ import TestID from .comparison import combine_results, get_comparison_test_configs -from .load import _test_description, describe_test, load_test +from .load import _test_description from .output import process_output logger = get_logger(__name__) @@ -139,6 +139,7 @@ def build_test_result( inputs: Dict[str, Union[VMInput, List[VMInput]]], params: Union[Dict[str, Any], None], title: Optional[str] = None, + test_func: Optional[Callable] = None, ): """Build a TestResult object from a set of raw test function outputs""" ref_id = str(uuid4()) @@ -150,13 +151,16 @@ def build_test_result( inputs=inputs, params=params if params else None, # None if empty dict or None doc=test_doc, + _is_scorer_result=test_func is not None + and hasattr(test_func, "_is_scorer") + and test_func._is_scorer, ) if not isinstance(outputs, tuple): outputs = (outputs,) for item in outputs: - process_output(item, result) + process_output(item, result, test_func) return result @@ -171,6 +175,8 @@ def _run_composite_test( title: Optional[str] = None, ): """Run a composite test i.e. 
a test made up of multiple metrics""" + # no-op: _test_description imported at module scope now that circular import is resolved + results = [ run_test( test_id=metric_id, @@ -226,6 +232,8 @@ def _run_comparison_test( ): """Run a comparison test i.e. a test that compares multiple outputs of a test across different input and/or param combinations""" + from .load import describe_test + run_test_configs = get_comparison_test_configs( input_grid=input_grid, param_grid=param_grid, @@ -276,6 +284,8 @@ def _run_test( title: Optional[str] = None, ): """Run a standard test and return a TestResult object""" + from .load import load_test + test_func = load_test(test_id) input_kwargs, param_kwargs = _get_test_kwargs( test_func=test_func, @@ -292,6 +302,7 @@ def _run_test( inputs=input_kwargs, params=param_kwargs, title=title, + test_func=test_func, ) diff --git a/validmind/tests/test_providers.py b/validmind/tests/test_providers.py index 47bf8470e..06c67c139 100644 --- a/validmind/tests/test_providers.py +++ b/validmind/tests/test_providers.py @@ -158,25 +158,32 @@ class ValidMindTestProvider: """Provider for built-in ValidMind tests""" def __init__(self) -> None: - # two subproviders: unit_metrics and normal tests + # three subproviders: unit_metrics, scorers, and normal tests self.unit_metrics_provider = LocalTestProvider( os.path.join(os.path.dirname(__file__), "..", "unit_metrics") ) + self.scorers_provider = LocalTestProvider( + os.path.join(os.path.dirname(__file__), "..", "scorer") + ) self.test_provider = LocalTestProvider(os.path.dirname(__file__)) def list_tests(self) -> List[str]: - """List all tests in the given namespace""" - metric_ids = [ + """List all tests in the given namespace (excludes scorers)""" + unit_metric_ids = [ f"unit_metrics.{test}" for test in self.unit_metrics_provider.list_tests() ] + # Exclude scorers from general test list - they have their own list_scorers() function test_ids = self.test_provider.list_tests() - return metric_ids + test_ids + return unit_metric_ids + test_ids def load_test(self, test_id: str) -> Callable[..., Any]: """Load the test function identified by the given test_id""" - return ( - self.unit_metrics_provider.load_test(test_id.replace("unit_metrics.", "")) - if test_id.startswith("unit_metrics.") - else self.test_provider.load_test(test_id) - ) + if test_id.startswith("unit_metrics."): + return self.unit_metrics_provider.load_test( + test_id.replace("unit_metrics.", "") + ) + elif test_id.startswith("scorer."): + return self.scorers_provider.load_test(test_id.replace("scorer.", "")) + else: + return self.test_provider.load_test(test_id) diff --git a/validmind/unit_metrics/classification/individual/OutlierScore.py b/validmind/unit_metrics/classification/individual/OutlierScore.py deleted file mode 100644 index 1e54fbc38..000000000 --- a/validmind/unit_metrics/classification/individual/OutlierScore.py +++ /dev/null @@ -1,86 +0,0 @@ -# Copyright © 2023-2024 ValidMind Inc. All rights reserved. -# See the LICENSE file in the root of this repository for details. 
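Because scorers are excluded from the general test listing above, they are discovered through their own helpers; a quick sketch using IDs added in this diff:

    from validmind.scorer import describe_scorer, list_scorers

    print(list_scorers())
    # e.g. [..., "validmind.scorer.classification.BrierScore",
    #       ..., "validmind.scorer.llm.deepeval.AnswerRelevancy"]

    describe_scorer("validmind.scorer.classification.BrierScore")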
-# SPDX-License-Identifier: AGPL-3.0 AND ValidMind Commercial - -from typing import List - -import numpy as np -from sklearn.ensemble import IsolationForest -from sklearn.preprocessing import StandardScaler - -from validmind import tags, tasks -from validmind.vm_models import VMDataset, VMModel - - -@tasks("classification") -@tags("classification") -def OutlierScore( - model: VMModel, dataset: VMDataset, contamination: float = 0.1, **kwargs -) -> List[float]: - """Calculates the outlier score per row for a classification model. - - Uses Isolation Forest to identify samples that deviate significantly from - the typical patterns in the feature space. Higher scores indicate more - anomalous/outlier-like samples. This can help identify out-of-distribution - samples or data points that might be harder to predict accurately. - - Args: - model: The classification model to evaluate (unused but kept for consistency) - dataset: The dataset containing feature data - contamination: Expected proportion of outliers, defaults to 0.1 - **kwargs: Additional parameters (unused for compatibility) - - Returns: - List[float]: Per-row outlier scores as a list of float values - - Note: - Scores are normalized to [0, 1] where higher values indicate more outlier-like samples - """ - # Get feature data - X = dataset.x_df() - - # Handle case where we have no features or only categorical features - if X.empty or X.shape[1] == 0: - # Return zero outlier scores if no features available - return [0.0] * len(dataset.y) - - # Select only numeric features for outlier detection - numeric_features = dataset.feature_columns_numeric - if not numeric_features: - # If no numeric features, return zero outlier scores - return [0.0] * len(dataset.y) - - X_numeric = X[numeric_features] - - # Handle missing values by filling with median - X_filled = X_numeric.fillna(X_numeric.median()) - - # Standardize features for better outlier detection - scaler = StandardScaler() - X_scaled = scaler.fit_transform(X_filled) - - # Fit Isolation Forest - isolation_forest = IsolationForest( - contamination=contamination, random_state=42, n_estimators=100 - ) - - # Fit the model on the data - isolation_forest.fit(X_scaled) - - # Get anomaly scores (negative values for outliers) - anomaly_scores = isolation_forest.decision_function(X_scaled) - - # Convert to outlier scores (0 to 1, where 1 is most outlier-like) - # Normalize using min-max scaling - min_score = np.min(anomaly_scores) - max_score = np.max(anomaly_scores) - - if max_score == min_score: - # All samples have same score, no outliers detected - outlier_scores = np.zeros_like(anomaly_scores) - else: - # Invert and normalize: higher values = more outlier-like - outlier_scores = (max_score - anomaly_scores) / (max_score - min_score) - - # Return as a list of floats - return outlier_scores.tolist() diff --git a/validmind/vm_models/__init__.py b/validmind/vm_models/__init__.py index 9961db7e0..afa7d1a6d 100644 --- a/validmind/vm_models/__init__.py +++ b/validmind/vm_models/__init__.py @@ -11,8 +11,6 @@ from .input import VMInput from .model import R_MODEL_TYPES, ModelAttributes, VMModel from .result import ResultTable, TestResult -from .test_suite.runner import TestSuiteRunner -from .test_suite.test_suite import TestSuite __all__ = [ "VMInput", @@ -26,3 +24,15 @@ "TestSuite", "TestSuiteRunner", ] + + +def __getattr__(name): # Lazy access to avoid circular imports at module import time + if name == "TestSuite": + from .test_suite.test_suite import TestSuite as _TestSuite + + return _TestSuite + if 
name == "TestSuiteRunner": + from .test_suite.runner import TestSuiteRunner as _TestSuiteRunner + + return _TestSuiteRunner + raise AttributeError(f"module 'validmind.vm_models' has no attribute {name!r}") diff --git a/validmind/vm_models/dataset/dataset.py b/validmind/vm_models/dataset/dataset.py index 9e597ba19..168094ffe 100644 --- a/validmind/vm_models/dataset/dataset.py +++ b/validmind/vm_models/dataset/dataset.py @@ -460,49 +460,47 @@ def probability_column(self, model: VMModel, column_name: str = None) -> str: def assign_scores( self, - model: VMModel, metrics: Union[str, List[str]], + model: Optional[VMModel] = None, **kwargs: Dict[str, Any], ) -> None: - """Assign computed unit metric scores to the dataset as new columns. + """Assign computed row metric scores to the dataset as new columns. - This method computes unit metrics for the given model and dataset, then adds + This method computes row metrics for the given model and dataset, then adds the computed scores as new columns to the dataset using the naming convention: {model.input_id}_{metric_name} Args: - model (VMModel): The model used to compute the scores. + model (Optional[VMModel]): Optional model used to compute the scores. If provided and + it has a valid `input_id`, that will be used as a prefix for column names. + If not provided (or no `input_id`), columns will be created without a prefix. metrics (Union[str, List[str]]): Single metric ID or list of metric IDs. Can be either: - - Short name (e.g., "F1", "Precision") - - Full metric ID (e.g., "validmind.unit_metrics.classification.F1") - **kwargs: Additional parameters passed to the unit metrics. + - Short name (e.g., "BrierScore", "LogLoss") + - Full metric ID (e.g., "validmind.scorer.classification.BrierScore") + **kwargs: Additional parameters passed to the row metrics. Examples: # Single metric - dataset.assign_scores(model, "F1") + dataset.assign_scores(model, "BrierScore") # Multiple metrics - dataset.assign_scores(model, ["F1", "Precision", "Recall"]) + dataset.assign_scores(model, ["BrierScore", "LogLoss"]) # With parameters - dataset.assign_scores(model, "ROC_AUC", average="weighted") + dataset.assign_scores(model, "ClassBalance", threshold=0.5) Raises: - ValueError: If the model input_id is None or if metric computation fails. - ImportError: If unit_metrics module cannot be imported. + ValueError: If metric computation fails. + ImportError: If scorer module cannot be imported. """ - if model.input_id is None: - raise ValueError("Model input_id must be set to use assign_scores") - - # Import unit_metrics module - try: - from validmind.unit_metrics import run_metric - except ImportError as e: - raise ImportError( - f"Failed to import unit_metrics module: {e}. " - "Make sure validmind.unit_metrics is available." - ) from e + model_input_id = None + if model is not None: + model_input_id = getattr(model, "input_id", None) + if not model_input_id: + logger.warning( + "Model has no input_id; creating score columns without prefix." 
+ ) # Normalize metrics to a list if isinstance(metrics, str): @@ -510,50 +508,258 @@ def assign_scores( # Process each metric for metric in metrics: - # Normalize metric ID - metric_id = self._normalize_metric_id(metric) + self._assign_single_score(metric, model, model_input_id, kwargs) - # Extract metric name for column naming - metric_name = self._extract_metric_name(metric_id) + def _assign_single_score( + self, + metric: str, + model: Optional[VMModel], + model_input_id: Optional[str], + params: Dict[str, Any], + ) -> None: + """Compute and add a single metric's scores as dataset columns.""" + # Import scorer module + try: + from validmind.scorer import run_scorer + except ImportError as e: + raise ImportError( + f"Failed to import scorer module: {e}. " + "Make sure validmind.scorer is available." + ) from e - # Generate column name - column_name = f"{model.input_id}_{metric_name}" + # Normalize metric ID and name + metric_id = self._normalize_metric_id(metric) + metric_name = self._extract_metric_name(metric_id) + column_name = self._build_score_column_name(model_input_id, metric_name) - try: - # Run the unit metric - result = run_metric( - metric_id, - inputs={ - "model": model, - "dataset": self, - }, - params=kwargs, - show=False, # Don't show widget output + try: + inputs = {"dataset": self} + if model is not None: + inputs["model"] = model + result = run_scorer( + metric_id, + inputs=inputs, + params=params, + show=False, + ) + + if result.raw_data and hasattr(result.raw_data, "scorer_output"): + scorer_output = result.raw_data.scorer_output + self._process_and_add_scorer_output( + scorer_output, model_input_id, metric_name ) + else: + column_values = self._process_metric_value(result.metric) + self.add_extra_column(column_name, column_values) - # Extract the metric value - metric_value = result.metric + logger.info(f"Added metric column(s) for '{metric_name}'") + except Exception as e: + logger.error(f"Failed to compute metric {metric_id}: {e}") + raise ValueError(f"Failed to compute metric {metric_id}: {e}") from e - # Create column values (repeat the scalar value for all rows) - if np.isscalar(metric_value): - column_values = np.full(len(self._df), metric_value) - else: - if len(metric_value) != len(self._df): - raise ValueError( - f"Metric value length {len(metric_value)} does not match dataset length {len(self._df)}" - ) - column_values = metric_value + def _process_and_add_scorer_output( + self, scorer_output: Any, model_input_id: Optional[str], metric_name: str + ) -> None: + """Process scorer output and add appropriate columns to the dataset. - # Add the column to the dataset - self.add_extra_column(column_name, column_values) + Args: + scorer_output: The raw scorer output (list, scalar, list of dicts, etc.) 
+ model_input_id: The model input ID for column naming + metric_name: The metric name for column naming + + Raises: + ValueError: If scorer output length doesn't match dataset length or + if list of dictionaries has inconsistent keys + """ + if isinstance(scorer_output, list): + self._process_list_scorer_output(scorer_output, model_input_id, metric_name) + elif np.isscalar(scorer_output): + self._process_scalar_scorer_output( + scorer_output, model_input_id, metric_name + ) + else: + self._process_other_scorer_output( + scorer_output, model_input_id, metric_name + ) + + def _process_list_scorer_output( + self, scorer_output: list, model_input_id: Optional[str], metric_name: str + ) -> None: + """Process list scorer output and add appropriate columns.""" + if len(scorer_output) != len(self._df): + raise ValueError( + f"Scorer output length {len(scorer_output)} does not match dataset length {len(self._df)}" + ) + + if scorer_output and isinstance(scorer_output[0], dict): + self._process_dict_list_scorer_output( + scorer_output, model_input_id, metric_name + ) + else: + self._process_regular_list_scorer_output( + scorer_output, model_input_id, metric_name + ) + + def _process_dict_list_scorer_output( + self, scorer_output: list, model_input_id: Optional[str], metric_name: str + ) -> None: + """Process list of dictionaries scorer output.""" + # Validate that all dictionaries have the same keys + first_keys = set(scorer_output[0].keys()) + for i, item in enumerate(scorer_output): + if not isinstance(item, dict): + raise ValueError( + f"All items in list must be dictionaries, but item at index {i} is {type(item)}" + ) + if set(item.keys()) != first_keys: + raise ValueError( + f"All dictionaries must have the same keys. " + f"First dict has keys {sorted(first_keys)}, " + f"but dict at index {i} has keys {sorted(item.keys())}" + ) + + # Add a column for each key in the dictionaries + for key in first_keys: + column_name = self._build_score_column_name( + model_input_id, metric_name, key + ) + column_values = np.array([item[key] for item in scorer_output]) + self.add_extra_column(column_name, column_values) + logger.info(f"Added metric column '{column_name}'") + + def _process_regular_list_scorer_output( + self, scorer_output: list, model_input_id: Optional[str], metric_name: str + ) -> None: + """Process regular list scorer output.""" + column_name = self._build_score_column_name(model_input_id, metric_name) + column_values = np.array(scorer_output) + self.add_extra_column(column_name, column_values) + logger.info(f"Added metric column '{column_name}'") + + def _process_scalar_scorer_output( + self, scorer_output: Any, model_input_id: Optional[str], metric_name: str + ) -> None: + """Process scalar scorer output.""" + column_name = self._build_score_column_name(model_input_id, metric_name) + column_values = np.full(len(self._df), scorer_output) + self.add_extra_column(column_name, column_values) + logger.info(f"Added metric column '{column_name}'") + + def _process_other_scorer_output( + self, scorer_output: Any, model_input_id: Optional[str], metric_name: str + ) -> None: + """Process other types of scorer output.""" + try: + output_array = np.array(scorer_output) + if len(output_array) != len(self._df): + raise ValueError( + f"Scorer output length {len(output_array)} does not match dataset length {len(self._df)}" + ) + column_name = self._build_score_column_name(model_input_id, metric_name) + self.add_extra_column(column_name, output_array) + logger.info(f"Added metric column '{column_name}'") + 
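A sketch of how the dictionary handling above surfaces on the dataset; the model input_id "xgb" and the object names are illustrative:

    # OutlierScore returns a list of dicts, so each key becomes its own column
    vm_test_ds.assign_scores(
        metrics="validmind.scorer.classification.OutlierScore",
        model=vm_model,
        contamination=0.05,
    )

    # Expected new columns (prefix comes from model.input_id):
    #   xgb_OutlierScore_outlier_score
    #   xgb_OutlierScore_isolation_path
    #   xgb_OutlierScore_anomaly_score
    #   xgb_OutlierScore_is_outlier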
except Exception as e: + raise ValueError(f"Could not process scorer output: {e}") from e + + def _build_score_column_name( + self, model_input_id: Optional[str], metric_name: str, key: Optional[str] = None + ) -> str: + """Build a score column name with optional model prefix and optional key suffix. + + Args: + model_input_id: Optional model input_id to prefix the column name. + metric_name: The metric name. + key: Optional sub-key to append (for dict outputs). + + Returns: + str: The constructed column name. + """ + parts: List[str] = [] + if model_input_id: + parts.append(model_input_id) + parts.append(metric_name) + if key: + parts.append(str(key)) + return "_".join(parts) + + def _process_scorer_output(self, scorer_output: Any) -> np.ndarray: + """Process scorer output and return column values for the dataset. + + Args: + scorer_output: The raw scorer output (list, scalar, etc.) + + Returns: + np.ndarray: Column values for the dataset - logger.info(f"Added metric column '{column_name}'") + Raises: + ValueError: If scorer output length doesn't match dataset length + """ + if isinstance(scorer_output, list): + # List output - should be one value per row + if len(scorer_output) != len(self._df): + raise ValueError( + f"Scorer output length {len(scorer_output)} does not match dataset length {len(self._df)}" + ) + return np.array(scorer_output) + elif np.isscalar(scorer_output): + # Scalar output - repeat for all rows + return np.full(len(self._df), scorer_output) + else: + # Other types - try to convert to array + try: + output_array = np.array(scorer_output) + if len(output_array) != len(self._df): + raise ValueError( + f"Scorer output length {len(output_array)} does not match dataset length {len(self._df)}" + ) + return output_array except Exception as e: - logger.error(f"Failed to compute metric {metric_id}: {e}") - raise ValueError(f"Failed to compute metric {metric_id}: {e}") from e + raise ValueError(f"Could not process scorer output: {e}") from e + + def _process_metric_value(self, metric_value: Any) -> np.ndarray: + """Process metric value and return column values for the dataset. + + Args: + metric_value: The metric value to process (could be MetricValues object or raw value) + + Returns: + np.ndarray: Column values for the dataset + + Raises: + ValueError: If metric value length doesn't match dataset length + """ + # Handle None case (some tests don't return metric values) + if metric_value is None: + # Return zeros for all rows as a default + return np.zeros(len(self._df)) + + # Handle different metric value types + if hasattr(metric_value, "get_values"): + # New MetricValues object (UnitMetricValue or RowMetricValues) + values = metric_value.get_values() + if metric_value.is_list(): + # Row metrics - should be one value per row + if len(values) != len(self._df): + raise ValueError( + f"Row metric value length {len(values)} does not match dataset length {len(self._df)}" + ) + return np.array(values) + else: + # Unit metrics - repeat scalar value for all rows + return np.full(len(self._df), values) + elif np.isscalar(metric_value): + # Legacy scalar value - repeat for all rows + return np.full(len(self._df), metric_value) + else: + # Legacy list value - use directly + if len(metric_value) != len(self._df): + raise ValueError( + f"Metric value length {len(metric_value)} does not match dataset length {len(self._df)}" + ) + return np.array(metric_value) def _normalize_metric_id(self, metric: str) -> str: - """Normalize metric identifier to full validmind unit metric ID. 
+ """Normalize metric identifier to full validmind row metric ID. Args: metric (str): Metric identifier (short name or full ID) @@ -562,35 +768,44 @@ def _normalize_metric_id(self, metric: str) -> str: str: Full metric ID """ # If already a full ID, return as-is - if metric.startswith("validmind.unit_metrics."): + if metric.startswith("validmind.scorer."): return metric # Try to find the metric by short name try: - from validmind.unit_metrics import list_metrics + from validmind.scorer import list_scorers + from validmind.tests._store import scorer_store + + # Get built-in scorers + available_metrics = list_scorers() - available_metrics = list_metrics() + # Add custom scorers from scorer store + # Register custom metric if not already in scorer store + if metric not in scorer_store.scorers: + scorer_store.register_scorer(metric) + all_scorers = list(scorer_store.scorers.keys()) + # Find metrics in custom_scorers that aren't already in available_metrics + new_metrics = [m for m in all_scorers if m not in available_metrics] + available_metrics.extend(new_metrics) # Look for exact match with short name for metric_id in available_metrics: - if metric_id.endswith(f".{metric}"): + if metric_id == metric: return metric_id # If no exact match found, raise error with suggestions suggestions = [m for m in available_metrics if metric.lower() in m.lower()] if suggestions: raise ValueError( - f"Metric '{metric}' not found. Did you mean one of: {suggestions[:5]}" + f"Metric '{metric}' not found in scorer. Did you mean one of: {suggestions[:5]}" ) else: raise ValueError( - f"Metric '{metric}' not found. Available metrics: {available_metrics[:10]}..." + f"Metric '{metric}' not found in scorer. Available metrics: {available_metrics[:10]}..." ) except ImportError as e: - raise ImportError( - f"Failed to import unit_metrics for metric lookup: {e}" - ) from e + raise ImportError(f"Failed to import scorer for metric lookup: {e}") from e def _extract_metric_name(self, metric_id: str) -> str: """Extract the metric name from a full metric ID. diff --git a/validmind/vm_models/result/result.py b/validmind/vm_models/result/result.py index db7000902..4b4ee82dd 100644 --- a/validmind/vm_models/result/result.py +++ b/validmind/vm_models/result/result.py @@ -8,7 +8,6 @@ import asyncio import json import os -from abc import abstractmethod from dataclasses import dataclass from typing import Any, Dict, List, Optional, Union from uuid import uuid4 @@ -136,12 +135,10 @@ def __str__(self) -> str: """May be overridden by subclasses.""" return self.__class__.__name__ - @abstractmethod def to_widget(self): """Create an ipywidget representation of the result... Must be overridden by subclasses.""" raise NotImplementedError - @abstractmethod def log(self): """Log the result... 
         raise NotImplementedError
@@ -178,7 +175,8 @@ class TestResult(Result):
     title: Optional[str] = None
     doc: Optional[str] = None
     description: Optional[Union[str, DescriptionFuture]] = None
-    metric: Optional[Union[int, float, List[Union[int, float]]]] = None
+    metric: Optional[Union[int, float]] = None
+    scorer: Optional[List[Union[int, float]]] = None
     tables: Optional[List[ResultTable]] = None
     raw_data: Optional[RawData] = None
     figures: Optional[List[Figure]] = None
@@ -189,6 +187,7 @@ class TestResult(Result):
     _was_description_generated: bool = False
     _unsafe: bool = False
    _client_config_cache: Optional[Any] = None
+    _is_scorer_result: bool = False
 
     def __post_init__(self):
         if self.ref_id is None:
@@ -246,6 +245,67 @@ def _get_flat_inputs(self):
 
         return list(inputs.values())
 
+    def set_metric(self, values: Union[int, float, List[Union[int, float]]]) -> None:
+        """Set the metric value.
+        Args:
+            values: The metric values to set. Can be int, float, or List[Union[int, float]].
+        """
+        if isinstance(values, list):
+            # Lists should be stored in scorer
+            self.scorer = values
+            self.metric = None  # Clear metric field when using scorer
+        else:
+            # Single values should be stored in metric
+            self.metric = values
+            self.scorer = None  # Clear scorer field when using metric
+
+    def _get_metric_display_value(
+        self,
+    ) -> Union[int, float, List[Union[int, float]], None]:
+        """Get the metric value for display purposes.
+        Returns:
+            The raw metric value, handling both metric and scorer fields.
+        """
+        # Check metric field first
+        if self.metric is not None:
+            return self.metric
+
+        # Check scorer field
+        if self.scorer is not None:
+            return self.scorer
+
+        return None
+
+    def _get_metric_serialized_value(
+        self,
+    ) -> Union[int, float, List[Union[int, float]], None]:
+        """Get the metric value for API serialization.
+        Returns:
+            The serialized metric value, handling both metric and scorer fields.
+        """
+        # Check metric field first
+        if self.metric is not None:
+            return self.metric
+
+        # Check scorer field
+        if self.scorer is not None:
+            return self.scorer
+
+        return None
+
+    def _get_metric_type(self) -> Optional[str]:
+        """Get the type of metric being stored.
+        Returns:
+            The metric type identifier or None if no metric is set.
+        """
+        if self.metric is not None:
+            return "unit_metric"
+
+        if self.scorer is not None:
+            return "scorer"
+
+        return None
+
     def add_table(
         self,
         table: Union[ResultTable, pd.DataFrame, List[Dict[str, Any]]],
@@ -328,8 +388,15 @@ def remove_figure(self, index: int = 0):
         self.figures.pop(index)
 
     def to_widget(self):
-        if self.metric is not None and not self.tables and not self.figures:
-            return HTML(f"<h3>{self.test_name}: {self.metric}</h3>")
+        metric_display_value = self._get_metric_display_value()
+        if (
+            (self.metric is not None or self.scorer is not None)
+            and not self.tables
+            and not self.figures
+        ):
+            return HTML(
+                f"<h3>{self.test_name}: {metric_display_value}</h3>"
+            )
 
         template_data = {
             "test_name": self.test_name,
@@ -341,7 +408,7 @@
                 else None
             ),
             "show_metric": self.metric is not None,
-            "metric": self.metric,
+            "metric": metric_display_value,
         }
 
         rendered = get_result_template().render(**template_data)
@@ -435,7 +502,7 @@ def _validate_section_id_for_block(
     def serialize(self):
         """Serialize the result for the API."""
-        return {
+        serialized = {
             "test_name": self.result_id,
             "title": self.title,
             "ref_id": self.ref_id,
@@ -446,6 +513,13 @@
             "metadata": self.metadata,
         }
 
+        # Add metric type information if available
+        metric_type = self._get_metric_type()
+        if metric_type:
+            serialized["metric_type"] = metric_type
+
+        return serialized
+
     async def log_async(
         self,
         section_id: str = None,
@@ -453,6 +527,10 @@
         position: int = None,
         config: Dict[str, bool] = None,
     ):
+        # Skip logging for scorers - they should not be saved to the backend
+        if self._is_scorer_result:
+            return
+
         tasks = []  # collect tasks to run in parallel (async)
 
         # Default empty dict if None
@@ -467,14 +545,20 @@
                 )
             )
 
-        # Only log unit metrics when the metric is a scalar value.
-        # Some tests may assign a list/array of per-row metrics to `self.metric`.
-        # Those should not be sent to the unit-metric endpoint which expects scalars.
-        if self.metric is not None and not hasattr(self.metric, "__len__"):
+        if self.metric is not None or self.scorer is not None:
+            # metrics are logged as separate entities
+            metric_value = self._get_metric_serialized_value()
+            metric_type = self._get_metric_type()
+
+            # Use appropriate metric key based on type
+            metric_key = self.result_id
+            if metric_type == "scorer":
+                metric_key = f"{self.result_id}_scorer"
+
             tasks.append(
                 api_client.alog_metric(
-                    key=self.result_id,
-                    value=self.metric,
+                    key=metric_key,
+                    value=metric_value,
                     inputs=[input.input_id for input in self._get_flat_inputs()],
                     params=self.params,
                 )
diff --git a/validmind/vm_models/test_suite/test.py b/validmind/vm_models/test_suite/test.py
index 76acddbae..2c4687230 100644
--- a/validmind/vm_models/test_suite/test.py
+++ b/validmind/vm_models/test_suite/test.py
@@ -6,7 +6,6 @@
 from ...errors import LoadTestError, should_raise_on_fail_fast
 from ...logging import get_logger, log_performance
-from ...tests.load import load_test
 from ...tests.run import run_test
 from ...utils import test_id_to_name
 from ..result import ErrorResult, Result, TestResult
@@ -43,6 +42,8 @@ def __init__(self, test_id_or_obj):
 
     def get_default_config(self):
         """Returns the default configuration for the test."""
+        from ...tests.load import load_test
+
         try:
             test_func = load_test(self.test_id)
         except LoadTestError as e: