
feat: Add Chat and Judge supporting methods#64

Merged
jsonbailey merged 28 commits into main from jb/temp-judge
Dec 17, 2025

Conversation

Contributor

@edwinokonkwo edwinokonkwo commented Nov 18, 2025

Tracking internally: REL-10772

Summary

This PR puts us in a known state; the next step is to split the package structure as follows:

packages
- ai-providers
    - server-ai-langchain
- sdk
    - server-ai

@edwinokonkwo edwinokonkwo requested a review from a team as a code owner November 18, 2025 15:18
@edwinokonkwo edwinokonkwo marked this pull request as draft November 19, 2025 07:34
@edwinokonkwo edwinokonkwo changed the title [REL-10772] Implement Langchain provider for online evals feat: [REL-10772] Implement Langchain provider for online evals Nov 19, 2025
Comment thread .github/workflows/release-please.yml Outdated
Comment thread .github/workflows/manual-publish.yml Outdated
@edwinokonkwo edwinokonkwo marked this pull request as ready for review November 19, 2025 17:54
Comment thread packages/core/ldai/client.py
Comment thread ldai/judge/__init__.py
Comment thread ldai/chat/__init__.py Outdated
Comment thread ldai/providers/types.py Outdated
Comment thread ldai/judge/__init__.py Outdated
Comment thread ldai/providers/ai_provider_factory.py
Comment thread ldai/tracker.py Outdated
Comment thread ldai/testing/test_tracker.py Outdated
Comment thread ldai/models.py
@jsonbailey jsonbailey changed the title feat: [REL-10772] Implement Langchain provider for online evals feat: Add Chat and Judge supporting methods Dec 17, 2025
@jsonbailey jsonbailey merged commit b63dbb5 into main Dec 17, 2025
15 checks passed
@jsonbailey jsonbailey deleted the jb/temp-judge branch December 17, 2025 04:18
edwinokonkwo added a commit that referenced this pull request Dec 18, 2025
Follows on from
#64,
arranging the project structure to align with the other SDK projects.
knfreemLD added a commit to launchdarkly/go-server-sdk that referenced this pull request Feb 10, 2026
**Requirements**

- [X] I have added test coverage for new or changed functionality
- [X] I have followed the repository's [pull request submission
guidelines](../blob/v5/CONTRIBUTING.md#submitting-pull-requests)
- [X] I have validated my changes against all supported platform
versions

**Related issues**

See
https://docs.google.com/document/d/1lzYwQqCcTzN_2zkxJZDfJtgUcEJ4jbpx0KSsJ2bRENw/edit?tab=t.0#heading=h.5d8l30brvyuw
for context

For other SDK implementations, see:
- launchdarkly/js-core#1073
- launchdarkly/python-server-sdk-ai#86 &
launchdarkly/python-server-sdk-ai#64

**Describe the solution you've provided**

Extending the Go SDK to support AI Config evaluations. This includes
custom evaluator support as well.
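The custom-evaluator contract itself is not shown in this description. As a rough, self-contained sketch of what such a contract could look like (every name here — `Evaluator`, `EvalResult`, the `eval.length` metric key — is hypothetical and not the SDK's actual API):

```go
package main

import "fmt"

// EvalResult holds a single evaluator score for an AI Config generation.
// Hypothetical type for illustration only.
type EvalResult struct {
	MetricKey string
	Score     float64
}

// Evaluator is a minimal custom-evaluator contract: given the model's
// response, produce a named score between 0 and 1.
type Evaluator interface {
	Evaluate(response string) (EvalResult, error)
}

// lengthEvaluator is a toy custom evaluator that scores shorter
// responses higher (purely for demonstration).
type lengthEvaluator struct{}

func (lengthEvaluator) Evaluate(response string) (EvalResult, error) {
	// Integer division buckets the length into 100-char steps.
	score := 1.0 / float64(1+len(response)/100)
	return EvalResult{MetricKey: "eval.length", Score: score}, nil
}

func main() {
	var e Evaluator = lengthEvaluator{}
	res, err := e.Evaluate("a short response")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %.2f\n", res.MetricKey, res.Score)
}
```

An interface-based contract like this lets the SDK invoke user-supplied evaluators alongside built-in judge configs without knowing their scoring logic.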

This implementation was written to be congruent with the Python and Node
implementations. Changes were verified with a locally created test app;
[the resultant data can be observed in the evaluator metrics for this AI
config](https://ld-stg.launchdarkly.com/projects/default/ai-configs/kf-comp-feb-3/monitoring?from_ts=1770094800000&to_ts=1770353999999&env=staging&selected-env=staging&chartTypes=Tokens%2CSatisfaction%2CGenerations%2CTime+to+generate%2CError+rate%2CTime+to+first+token%2CCosts%2CEvaluator+metrics+%28avg%29).


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds new evaluation and metric-tracking paths (including dynamic
metric keys and new event payload fields), which could affect analytics
correctness and runtime behavior if misconfigured. Changes are
well-covered by tests but touch core SDK tracking surfaces.
> 
> **Overview**
> Adds **judge-mode support** to AI Configs by extending the config
datamodel and builder with `mode`,
`evaluationMetricKey`/`evaluationMetricKeys`, and `judgeConfiguration`
(with defensive copying to keep configs immutable).
> 
> Introduces `Client.JudgeConfig` to fetch judge configs while
preserving `{{message_history}}` / `{{response_to_evaluate}}`
placeholders for a second Mustache interpolation pass during evaluation,
and adds a new `ldai/judge` package that samples, interpolates, invokes
a structured provider, and parses judge responses.
> 
> Extends `Tracker` with `TrackJudgeResponse` to emit evaluation scores
as metrics (including optional `judgeConfigKey` in event data), and adds
comprehensive tests covering parsing, placeholder preservation, schema
generation, sampling, and response validation.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
41141b9. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
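The placeholder-preservation behavior the summary describes — a first interpolation pass that fills known variables while leaving `{{message_history}}` and `{{response_to_evaluate}}` intact for a second pass at evaluation time — can be illustrated with a self-contained sketch. This is not the SDK's code; the `interpolate` helper and its regex are assumptions made for demonstration:

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches Mustache-style variables such as {{name}}.
var placeholder = regexp.MustCompile(`\{\{\s*([a-zA-Z_]+)\s*\}\}`)

// interpolate substitutes variables present in vars and leaves unknown
// placeholders untouched, so a later pass can fill them.
func interpolate(template string, vars map[string]string) string {
	return placeholder.ReplaceAllStringFunc(template, func(m string) string {
		name := placeholder.FindStringSubmatch(m)[1]
		if v, ok := vars[name]; ok {
			return v
		}
		return m // preserve for the second pass
	})
}

func main() {
	tpl := "Evaluate {{response_to_evaluate}} against {{message_history}} for {{product}}."

	// First pass: only config-time variables are known.
	pass1 := interpolate(tpl, map[string]string{"product": "ChatApp"})
	fmt.Println(pass1)

	// Second pass: evaluation-time values fill the preserved placeholders.
	pass2 := interpolate(pass1, map[string]string{
		"message_history":      "[user: hi]",
		"response_to_evaluate": "hello!",
	})
	fmt.Println(pass2)
}
```

The key design point is that the first pass must not treat unknown variables as empty (the default Mustache behavior), since that would erase the judge's inputs before evaluation runs.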