feat: Added custom judge support for ai configs #1073
Conversation
```diff
  let { success } = response.metrics;
- const evals = this._parseEvaluationResponse(response.data);
+ const evals = this._parseEvaluationResponse(response.data, evaluationMetricKey);
```
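For context on the diff above, here is a minimal sketch of how a single-key parse could work. The types, the `[0, 1]` score range, and the response shape are assumptions for illustration, not the SDK's actual internals:

```typescript
// Hypothetical sketch only; not the SDK's actual implementation.
interface JudgeEval {
  metricKey: string;
  score: number; // assumed to be normalized to [0, 1]
}

function parseEvaluationResponse(
  data: Record<string, unknown>,
  evaluationMetricKey: string,
): JudgeEval[] {
  // Only the single configured key is parsed; any other keys in the
  // provider response are ignored.
  const raw = data[evaluationMetricKey];
  if (typeof raw !== 'number' || raw < 0 || raw > 1) {
    // The caller can mark success: false when nothing valid was recovered.
    return [];
  }
  return [{ metricKey: evaluationMetricKey, score: raw }];
}
```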
I didn't call this out in Python, but we originally used `evals` to support multiple metric keys in a judge response. We can likely flatten this structure out now that there will be only one. It doesn't need to be done in this PR, but it's more of a push to make the breaking change sooner rather than later, so fewer people are relying on this code. We might consider adding the eval directly to the judge response and marking `evals` as deprecated.
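A rough sketch of what that flattening could look like; the field names here are hypothetical, not a committed API:

```typescript
// Illustrative shape only; field names are hypothetical.
interface JudgeEval {
  metricKey: string;
  score: number;
}

interface JudgeResponse {
  success: boolean;
  /** @deprecated With a single metric key per judge, prefer `eval`. */
  evals?: JudgeEval[];
  // The single evaluation could live directly on the response.
  eval?: JudgeEval;
}
```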
🤖 I have created a release *beep* *boop*

---

<details><summary>server-sdk-ai: 0.16.0</summary>

## [0.16.0](server-sdk-ai-v0.15.2...server-sdk-ai-v0.16.0) (2026-01-27)

### Features

* Added custom judge support for ai configs ([#1073](#1073)) ([2066e70](2066e70))
</details>

<details><summary>server-sdk-ai-langchain: 0.4.3</summary>

## [0.4.3](server-sdk-ai-langchain-v0.4.2...server-sdk-ai-langchain-v0.4.3) (2026-01-27)

### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.2 to ^0.16.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.0 to ^0.16.0
</details>

<details><summary>server-sdk-ai-openai: 0.4.3</summary>

## [0.4.3](server-sdk-ai-openai-v0.4.2...server-sdk-ai-openai-v0.4.3) (2026-01-27)

### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.2 to ^0.16.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.0 to ^0.16.0
</details>

<details><summary>server-sdk-ai-vercel: 0.4.3</summary>

## [0.4.3](server-sdk-ai-vercel-v0.4.2...server-sdk-ai-vercel-v0.4.3) (2026-01-27)

### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.2 to ^0.16.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.15.0 to ^0.16.0
</details>

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

> [!NOTE]
> Releases `@launchdarkly/server-sdk-ai` 0.16.0 and aligns provider packages and examples to this version.
>
> - **Feature:** `server-ai` 0.16.0 adds custom judge support for AI configs
> - Bumps `server-ai-langchain`, `server-ai-openai`, `server-ai-vercel` to 0.4.3 with deps/peers updated to `@launchdarkly/server-sdk-ai` ^0.16.0
> - Updates example apps and `sdkInfo` to reference 0.16.0; manifest updated accordingly
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 04e6ea2. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
**Requirements**

- [X] I have added test coverage for new or changed functionality
- [X] I have followed the repository's [pull request submission guidelines](../blob/v5/CONTRIBUTING.md#submitting-pull-requests)
- [X] I have validated my changes against all supported platform versions

**Related issues**

See https://docs.google.com/document/d/1lzYwQqCcTzN_2zkxJZDfJtgUcEJ4jbpx0KSsJ2bRENw/edit?tab=t.0#heading=h.5d8l30brvyuw for context.

For other SDK implementations, see:

- launchdarkly/js-core#1073
- launchdarkly/python-server-sdk-ai#86 & launchdarkly/python-server-sdk-ai#64

**Describe the solution you've provided**

Extends the Go SDK to support AI Config evaluations, including custom evaluator support. The implementation aims to stay congruent with the Python and Node implementations. Changes were verified with a local test app; [the resulting data can be observed in the evaluator metrics for this AI config](https://ld-stg.launchdarkly.com/projects/default/ai-configs/kf-comp-feb-3/monitoring?from_ts=1770094800000&to_ts=1770353999999&env=staging&selected-env=staging&chartTypes=Tokens%2CSatisfaction%2CGenerations%2CTime+to+generate%2CError+rate%2CTime+to+first+token%2CCosts%2CEvaluator+metrics+%28avg%29).

**Describe alternatives you've considered**

Provide a clear and concise description of any alternative solutions or features you've considered.

**Additional context**

Add any other context about the pull request here.

> [!NOTE]
> **Medium Risk**
> Adds new evaluation and metric-tracking paths (including dynamic metric keys and new event payload fields), which could affect analytics correctness and runtime behavior if misconfigured. Changes are well covered by tests but touch core SDK tracking surfaces.
>
> **Overview**
> Adds **judge-mode support** to AI Configs by extending the config datamodel and builder with `mode`, `evaluationMetricKey`/`evaluationMetricKeys`, and `judgeConfiguration` (with defensive copying to keep configs immutable).
>
> Introduces `Client.JudgeConfig` to fetch judge configs while preserving `{{message_history}}` / `{{response_to_evaluate}}` placeholders for a second Mustache interpolation pass during evaluation, and adds a new `ldai/judge` package that samples, interpolates, invokes a structured provider, and parses judge responses.
>
> Extends `Tracker` with `TrackJudgeResponse` to emit evaluation scores as metrics (including optional `judgeConfigKey` in event data), and adds comprehensive tests covering parsing, placeholder preservation, schema generation, sampling, and response validation.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 41141b9. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
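The placeholder-preservation idea in the note above can be sketched roughly as follows. This is TypeScript for consistency with this repo (the PR itself targets the Go SDK); the `message_history` / `response_to_evaluate` variable names come from the note, and everything else is illustrative:

```typescript
import Mustache from 'mustache';

// First pass: interpolate the user's variables when the judge config is
// fetched, but map the judge placeholders to themselves so they survive
// verbatim for the second pass.
function interpolateJudgeConfig(
  template: string,
  variables: Record<string, unknown>,
): string {
  return Mustache.render(template, {
    ...variables,
    message_history: '{{message_history}}',
    response_to_evaluate: '{{response_to_evaluate}}',
  });
}

// Second pass: at evaluation time, fill in the preserved placeholders with
// the actual conversation and the response being judged.
function renderForEvaluation(
  prompt: string,
  messageHistory: string,
  responseToEvaluate: string,
): string {
  return Mustache.render(prompt, {
    message_history: messageHistory,
    response_to_evaluate: responseToEvaluate,
  });
}
```

A real implementation would also need to account for Mustache's default HTML escaping of interpolated values, for example via triple mustaches in templates or a custom escape function.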
**Requirements**

**Related issues**

Node version of launchdarkly/python-server-sdk-ai#86

**Describe the solution you've provided**

See launchdarkly/python-server-sdk-ai#86

**Describe alternatives you've considered**

Provide a clear and concise description of any alternative solutions or features you've considered.

**Additional context**

Add any other context about the pull request here.
> [!NOTE]
> Switches judge evaluation to a single metric key while preserving backward compatibility.
>
> - Adds `evaluationMetricKey` to `LDAIJudgeConfig(Default)` and deprecates array usage; examples updated in `LDAIClient.ts`
> - Prefers `evaluationMetricKey` and falls back to the first valid entry in `evaluationMetricKeys`; includes the key when converting defaults
> - `EvaluationSchemaBuilder` now builds the response schema for one required metric key
> - Adds `_getEvaluationMetricKey`; requires messages; parses/validates only that key; marks `success: false` if missing/invalid; updated warnings
> - `LDAIClientImpl` and `Judge` tests updated for the new key semantics and legacy fallbacks; added tests for invalid/whitespace keys and sampling
> - `trackJudgeResponse` handles single/multiple eval metrics
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 288ee6d. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
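As a concrete reading of the fallback rules in the note above, here is a sketch of the key resolution. The name `_getEvaluationMetricKey` and the field names come from the note; the body is illustrative, not the shipped code:

```typescript
// Illustrative only; mirrors the fallback rules described in the note.
interface JudgeConfig {
  evaluationMetricKey?: string;
  /** @deprecated Use `evaluationMetricKey` instead. */
  evaluationMetricKeys?: string[];
}

function getEvaluationMetricKey(config: JudgeConfig): string | undefined {
  // Prefer the new single key when it is non-empty after trimming,
  // so whitespace-only keys are rejected.
  const single = config.evaluationMetricKey?.trim();
  if (single) {
    return single;
  }
  // Legacy fallback: the first valid (non-whitespace) entry in the
  // deprecated array.
  return config.evaluationMetricKeys
    ?.map((key) => key.trim())
    .find((key) => key.length > 0);
}
```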
evaluationMetricKeytoLDAIJudgeConfig(Default)and deprecate array usage; examples updated inLDAIClient.tsevaluationMetricKeyand fallback to first valid entry inevaluationMetricKeys; include key when converting defaultsEvaluationSchemaBuildernow builds response schema for one required metric key_getEvaluationMetricKey; require messages; parse/validate only that key; marksuccess: falseif missing/invalid; updated warningsLDAIClientImplandJudgetests updated for new key semantics and legacy fallbacks; added tests for invalid/whitespace keys and samplingtrackJudgeResponsehandling single/multiple eval metricsWritten by Cursor Bugbot for commit 288ee6d. This will update automatically on new commits. Configure here.