
feat: Add Chat and Judge supporting methods#64

Merged
jsonbailey merged 28 commits into main from jb/temp-judge
Dec 17, 2025

Conversation

Contributor

@edwinokonkwo edwinokonkwo commented Nov 18, 2025

Tracking internally: REL-10772

Summary

This PR puts us in a known state; the next step is to split the package structure as follows:

packages
- ai-providers
    - server-ai-langchain
- sdk
    - server-ai

@edwinokonkwo edwinokonkwo requested a review from a team as a code owner November 18, 2025 15:18
@edwinokonkwo edwinokonkwo marked this pull request as draft November 19, 2025 07:34
@edwinokonkwo edwinokonkwo changed the title [REL-10772] Implement Langchain provider for online evals feat: [REL-10772] Implement Langchain provider for online evals Nov 19, 2025
Comment thread .github/workflows/release-please.yml Outdated
Comment thread .github/workflows/manual-publish.yml Outdated
@edwinokonkwo edwinokonkwo marked this pull request as ready for review November 19, 2025 17:54
Comment thread packages/core/ldai/client.py
Comment thread ldai/judge/__init__.py
Comment thread ldai/chat/__init__.py Outdated
Comment thread ldai/providers/types.py Outdated
Comment thread ldai/judge/__init__.py Outdated
Comment thread ldai/providers/ai_provider_factory.py
Comment thread ldai/tracker.py Outdated
Comment thread ldai/testing/test_tracker.py Outdated
Comment thread ldai/models.py
@jsonbailey jsonbailey changed the title feat: [REL-10772] Implement Langchain provider for online evals feat: Add Chat and Judge supporting methods Dec 17, 2025
@jsonbailey jsonbailey merged commit b63dbb5 into main Dec 17, 2025
15 checks passed
@jsonbailey jsonbailey deleted the jb/temp-judge branch December 17, 2025 04:18
edwinokonkwo added a commit that referenced this pull request Dec 18, 2025
Follows on from
#64,
arranging the project structure to align with the other SDK projects.
knfreemLD added a commit to launchdarkly/go-server-sdk that referenced this pull request Feb 10, 2026
**Requirements**

- [X] I have added test coverage for new or changed functionality
- [X] I have followed the repository's [pull request submission
guidelines](../blob/v5/CONTRIBUTING.md#submitting-pull-requests)
- [X] I have validated my changes against all supported platform
versions

**Related issues**

See
https://docs.google.com/document/d/1lzYwQqCcTzN_2zkxJZDfJtgUcEJ4jbpx0KSsJ2bRENw/edit?tab=t.0#heading=h.5d8l30brvyuw
for context

For other SDK implementations, see:
- launchdarkly/js-core#1073
- launchdarkly/python-server-sdk-ai#86 &
launchdarkly/python-server-sdk-ai#64

**Describe the solution you've provided**

Extending the Go SDK to support AI Config evaluations. This includes
custom evaluator support as well.
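The custom-evaluator contract itself is not shown in this description. As a rough, self-contained sketch of what such a contract could look like (every name here — `Evaluator`, `EvalResult`, the `eval.length` metric key — is hypothetical and not the SDK's actual API):

```go
package main

import "fmt"

// EvalResult holds a single evaluator score for an AI Config generation.
// Hypothetical type for illustration only.
type EvalResult struct {
	MetricKey string
	Score     float64
}

// Evaluator is a minimal custom-evaluator contract: given the model's
// response, produce a named score between 0 and 1.
type Evaluator interface {
	Evaluate(response string) (EvalResult, error)
}

// lengthEvaluator is a toy custom evaluator that scores shorter
// responses higher (purely for demonstration).
type lengthEvaluator struct{}

func (lengthEvaluator) Evaluate(response string) (EvalResult, error) {
	// Integer division buckets the length into 100-char steps.
	score := 1.0 / float64(1+len(response)/100)
	return EvalResult{MetricKey: "eval.length", Score: score}, nil
}

func main() {
	var e Evaluator = lengthEvaluator{}
	res, err := e.Evaluate("a short response")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %.2f\n", res.MetricKey, res.Score)
}
```

An interface-based contract like this lets the SDK invoke user-supplied evaluators alongside built-in judge configs without knowing their scoring logic.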

This implementation was written to be congruent with the Python and Node
implementations. Changes were verified with a locally created test app;
[the resultant data can be observed in the evaluator metrics for this AI
config](https://ld-stg.launchdarkly.com/projects/default/ai-configs/kf-comp-feb-3/monitoring?from_ts=1770094800000&to_ts=1770353999999&env=staging&selected-env=staging&chartTypes=Tokens%2CSatisfaction%2CGenerations%2CTime+to+generate%2CError+rate%2CTime+to+first+token%2CCosts%2CEvaluator+metrics+%28avg%29).


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds new evaluation and metric-tracking paths (including dynamic
metric keys and new event payload fields), which could affect analytics
correctness and runtime behavior if misconfigured. Changes are
well-covered by tests but touch core SDK tracking surfaces.
> 
> **Overview**
> Adds **judge-mode support** to AI Configs by extending the config
datamodel and builder with `mode`,
`evaluationMetricKey`/`evaluationMetricKeys`, and `judgeConfiguration`
(with defensive copying to keep configs immutable).
> 
> Introduces `Client.JudgeConfig` to fetch judge configs while
preserving `{{message_history}}` / `{{response_to_evaluate}}`
placeholders for a second Mustache interpolation pass during evaluation,
and adds a new `ldai/judge` package that samples, interpolates, invokes
a structured provider, and parses judge responses.
> 
> Extends `Tracker` with `TrackJudgeResponse` to emit evaluation scores
as metrics (including optional `judgeConfigKey` in event data), and adds
comprehensive tests covering parsing, placeholder preservation, schema
generation, sampling, and response validation.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
41141b9. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
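The placeholder-preservation behavior the summary describes — a first interpolation pass that fills known variables while leaving `{{message_history}}` and `{{response_to_evaluate}}` intact for a second pass at evaluation time — can be illustrated with a self-contained sketch. This is not the SDK's code; the `interpolate` helper and its regex are assumptions made for demonstration:

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches Mustache-style variables such as {{name}}.
var placeholder = regexp.MustCompile(`\{\{\s*([a-zA-Z_]+)\s*\}\}`)

// interpolate substitutes variables present in vars and leaves unknown
// placeholders untouched, so a later pass can fill them.
func interpolate(template string, vars map[string]string) string {
	return placeholder.ReplaceAllStringFunc(template, func(m string) string {
		name := placeholder.FindStringSubmatch(m)[1]
		if v, ok := vars[name]; ok {
			return v
		}
		return m // preserve for the second pass
	})
}

func main() {
	tpl := "Evaluate {{response_to_evaluate}} against {{message_history}} for {{product}}."

	// First pass: only config-time variables are known.
	pass1 := interpolate(tpl, map[string]string{"product": "ChatApp"})
	fmt.Println(pass1)

	// Second pass: evaluation-time values fill the preserved placeholders.
	pass2 := interpolate(pass1, map[string]string{
		"message_history":      "[user: hi]",
		"response_to_evaluate": "hello!",
	})
	fmt.Println(pass2)
}
```

The key design point is that the first pass must not treat unknown variables as empty (the default Mustache behavior), since that would erase the judge's inputs before evaluation runs.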