
feat: Added custom judge support for ai configs #1073

Merged
knfreemLD merged 2 commits into main from kfreeman/REL-11510/custom-judge-node on Jan 27, 2026

Conversation

@knfreemLD (Contributor) commented Jan 22, 2026

Requirements

  • I have added test coverage for new or changed functionality
  • I have followed the repository's pull request submission guidelines
  • I have validated my changes against all supported platform versions

Related issues

Node version of launchdarkly/python-server-sdk-ai#86

Describe the solution you've provided

See launchdarkly/python-server-sdk-ai#86

Describe alternatives you've considered

Provide a clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context about the pull request here.


Note

Switches judge evaluation to a single metric key while preserving backward compatibility.

  • API/types: Add optional evaluationMetricKey to LDAIJudgeConfig(Default) and deprecate array usage; examples updated in LDAIClient.ts
  • Config utils: Map flag values to prefer evaluationMetricKey and fall back to the first valid entry in evaluationMetricKeys; include key when converting defaults (see the sketch below)
  • Schema: EvaluationSchemaBuilder now builds response schema for one required metric key
  • Judge behavior: Determine metric via _getEvaluationMetricKey; require messages; parse/validate only that key; mark success: false if missing/invalid; updated warnings
  • Client/tests: LDAIClientImpl and Judge tests updated for new key semantics and legacy fallbacks; added tests for invalid/whitespace keys and sampling
  • Tracking: Add tests for trackJudgeResponse handling single/multiple eval metrics
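
The key-resolution and validation behavior described in these bullets can be sketched roughly as follows (TypeScript; the helper names, exact fallback rules, and the 0–1 score range are assumptions for illustration, not the SDK's actual API):

```typescript
// Hypothetical helpers illustrating the single-key precedence and
// validation described in the bullets above; names are illustrative.
interface JudgeFlagValue {
  evaluationMetricKey?: string;
  /** @deprecated legacy array form, kept for backward compatibility */
  evaluationMetricKeys?: string[];
}

// Prefer the new single key; fall back to the first valid legacy entry.
function resolveEvaluationMetricKey(value: JudgeFlagValue): string | undefined {
  if (value.evaluationMetricKey?.trim()) {
    return value.evaluationMetricKey;
  }
  return value.evaluationMetricKeys?.find((key) => key.trim().length > 0);
}

// Parse and validate only the resolved key. A missing or invalid value
// yields success: false rather than a thrown error (score range assumed).
function parseEvaluation(
  data: Record<string, unknown>,
  metricKey: string,
): { success: boolean; score?: number } {
  const score = data[metricKey];
  if (typeof score !== 'number' || score < 0 || score > 1) {
    return { success: false };
  }
  return { success: true, score };
}
```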

Written by Cursor Bugbot for commit 288ee6d. This will update automatically on new commits.

@knfreemLD requested a review from a team as a code owner on January 22, 2026 20:17
@github-actions

@launchdarkly/js-sdk-common size report
This is the brotli compressed size of the ESM build.
Compressed size: 25394 bytes
Compressed size limit: 26000 bytes
Uncompressed size: 124693 bytes

@github-actions

@launchdarkly/browser size report
This is the brotli compressed size of the ESM build.
Compressed size: 171289 bytes
Compressed size limit: 200000 bytes
Uncompressed size: 798441 bytes

@github-actions

@launchdarkly/js-client-sdk size report
This is the brotli compressed size of the ESM build.
Compressed size: 23330 bytes
Compressed size limit: 25000 bytes
Uncompressed size: 81328 bytes

@github-actions

@launchdarkly/js-client-sdk-common size report
This is the brotli compressed size of the ESM build.
Compressed size: 19322 bytes
Compressed size limit: 20000 bytes
Uncompressed size: 99589 bytes

@joker23 requested review from a team and jsonbailey on January 22, 2026 21:23
  let { success } = response.metrics;

- const evals = this._parseEvaluationResponse(response.data);
+ const evals = this._parseEvaluationResponse(response.data, evaluationMetricKey);
A reviewer (Contributor) commented:

I didn't call this out in Python, but we used `evals` originally to support multiple metric keys in a judge response. We can likely flatten this structure now that there will be only one. It doesn't need to be done in this PR, but it's a push to make the breaking change sooner rather than later, so fewer people end up relying on this code. We might consider adding the eval directly to the judge response and marking `evals` as deprecated.
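
A rough sketch of what that flattening might look like (hypothetical shape, not the SDK's current API):

```typescript
// Illustrative only: the field names below are hypothetical.
interface EvalResult {
  score: number;
  reasoning?: string;
}

interface JudgeResponse {
  /** @deprecated map keyed by metric; kept while callers migrate */
  evals?: Record<string, EvalResult>;
  /** the single evaluation result once only one metric key is supported */
  eval?: EvalResult;
  success: boolean;
}
```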

@knfreemLD merged commit 2066e70 into main on Jan 27, 2026
36 checks passed
@knfreemLD deleted the kfreeman/REL-11510/custom-judge-node branch on January 27, 2026 03:37
@github-actions bot mentioned this pull request on Jan 27, 2026
jsonbailey pushed a commit that referenced this pull request Jan 27, 2026
🤖 I have created a release *beep* *boop*
---


<details><summary>server-sdk-ai: 0.16.0</summary>

## [0.16.0](server-sdk-ai-v0.15.2...server-sdk-ai-v0.16.0) (2026-01-27)


### Features

* Added custom judge support for ai configs ([#1073](#1073)) ([2066e70](2066e70))
</details>

<details><summary>server-sdk-ai-langchain: 0.4.3</summary>

## [0.4.3](server-sdk-ai-langchain-v0.4.2...server-sdk-ai-langchain-v0.4.3) (2026-01-27)


### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.2 to ^0.16.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.0 to ^0.16.0
</details>

<details><summary>server-sdk-ai-openai: 0.4.3</summary>

## [0.4.3](server-sdk-ai-openai-v0.4.2...server-sdk-ai-openai-v0.4.3) (2026-01-27)


### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.2 to ^0.16.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.0 to ^0.16.0
</details>

<details><summary>server-sdk-ai-vercel: 0.4.3</summary>

## [0.4.3](server-sdk-ai-vercel-v0.4.2...server-sdk-ai-vercel-v0.4.3) (2026-01-27)


### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.2 to ^0.16.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.0 to ^0.16.0
</details>

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Releases `@launchdarkly/server-sdk-ai` 0.16.0 and aligns provider packages and examples to this version.
>
> - **Feature:** `server-ai` 0.16.0 adds custom judge support for AI configs
> - Bumps `server-ai-langchain`, `server-ai-openai`, `server-ai-vercel` to 0.4.3 with deps/peers updated to `@launchdarkly/server-sdk-ai` ^0.16.0
> - Updates example apps and `sdkInfo` to reference 0.16.0; manifest updated accordingly
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 04e6ea2. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
knfreemLD added a commit to launchdarkly/go-server-sdk that referenced this pull request Feb 10, 2026
**Requirements**

- [X] I have added test coverage for new or changed functionality
- [X] I have followed the repository's [pull request submission guidelines](../blob/v5/CONTRIBUTING.md#submitting-pull-requests)
- [X] I have validated my changes against all supported platform versions

**Related issues**

See https://docs.google.com/document/d/1lzYwQqCcTzN_2zkxJZDfJtgUcEJ4jbpx0KSsJ2bRENw/edit?tab=t.0#heading=h.5d8l30brvyuw for context.

For other SDK implementations, see:
- launchdarkly/js-core#1073
- launchdarkly/python-server-sdk-ai#86 &
launchdarkly/python-server-sdk-ai#64

**Describe the solution you've provided**

Extends the Go SDK to support AI Config evaluations, including custom evaluator support.

This implementation aims to be congruent with the Python and Node implementations. Changes were verified with a local test app; [the resulting data can be observed in the evaluator metrics for this AI config](https://ld-stg.launchdarkly.com/projects/default/ai-configs/kf-comp-feb-3/monitoring?from_ts=1770094800000&to_ts=1770353999999&env=staging&selected-env=staging&chartTypes=Tokens%2CSatisfaction%2CGenerations%2CTime+to+generate%2CError+rate%2CTime+to+first+token%2CCosts%2CEvaluator+metrics+%28avg%29).

**Describe alternatives you've considered**

Provide a clear and concise description of any alternative solutions or features you've considered.

**Additional context**

Add any other context about the pull request here.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds new evaluation and metric-tracking paths (including dynamic metric keys and new event payload fields), which could affect analytics correctness and runtime behavior if misconfigured. Changes are well-covered by tests but touch core SDK tracking surfaces.
>
> **Overview**
> Adds **judge-mode support** to AI Configs by extending the config datamodel and builder with `mode`, `evaluationMetricKey`/`evaluationMetricKeys`, and `judgeConfiguration` (with defensive copying to keep configs immutable).
>
> Introduces `Client.JudgeConfig` to fetch judge configs while preserving `{{message_history}}` / `{{response_to_evaluate}}` placeholders for a second Mustache interpolation pass during evaluation, and adds a new `ldai/judge` package that samples, interpolates, invokes a structured provider, and parses judge responses.
>
> Extends `Tracker` with `TrackJudgeResponse` to emit evaluation scores as metrics (including optional `judgeConfigKey` in event data), and adds comprehensive tests covering parsing, placeholder preservation, schema generation, sampling, and response validation.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 41141b9. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
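
The placeholder-preserving, two-pass interpolation described in the summary above can be sketched as follows (TypeScript for consistency with this repo; the Go SDK's actual implementation differs, and all names here are illustrative):

```typescript
// Illustrative two-pass interpolation; not the Go SDK's actual code.
// Unknown placeholders are re-emitted verbatim, which is what lets
// {{message_history}} and {{response_to_evaluate}} survive the first
// pass and get filled at evaluation time.
function interpolate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name: string) =>
    name in vars ? vars[name] : match,
  );
}

// Pass 1 (config time): fill user-provided variables only.
const firstPass = interpolate(
  'Evaluate this {{product}} reply: {{response_to_evaluate}}',
  { product: 'Acme' },
);
// => 'Evaluate this Acme reply: {{response_to_evaluate}}'

// Pass 2 (evaluation time): fill the judge placeholders.
const prompt = interpolate(firstPass, {
  response_to_evaluate: 'Sure, I can help with that!',
});
```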