Add public docs page for hosted scorers (preview)#2049

Draft
dmontagu wants to merge 2 commits into main from dm/hosted-scorers-docs

@dmontagu dmontagu commented Jul 2, 2026

Contributor

What

Adds docs/guides/web-ui/scorers.md — the public docs page for the hosted-scorers preview — plus the mkdocs nav entries (site nav + llmstxt), placed right after Live Evaluations.

The page covers:

  • What scorers are (platform-run LLM-as-judge over live agent runs; counterpart to Pydantic AI online evals, but with no evaluator code and no judge API key of your own).
  • How the select → judge → write-back loop works, and that scores land as gen_ai.evaluation.result OTel log events.
  • Creating a scorer from the agent detail page's Scorers tab (field list matches the live UI form: Score name / Rubric / Sample rate / Enabled).
  • The dry-run-before-enable loop (Dry-run on recent runs; nothing written back, no quota spent).
  • Preview quota: free, hard cap of 10,000 scores per project per month.
  • Transparency note: written-back scores are ordinary ingested telemetry and count toward ingest usage.
  • Viewing scores in Live Evaluations, the trace view, and via SQL in Explore.
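The select → judge → write-back loop, the dry-run behaviour, and the preview cap described above can be sketched as follows. This is only an illustration under stated assumptions, not the platform's implementation: `score_runs`, `Judgement`, and the event dictionary shape are hypothetical names. Only the `gen_ai.evaluation.result` event name, the 0-1 score with a reason, and the 10,000-per-month cap come from this PR.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

EVENT_NAME = "gen_ai.evaluation.result"  # event name stated in this PR
MONTHLY_CAP = 10_000                     # preview quota stated in this PR


@dataclass
class Judgement:
    """Hypothetical container for what the judge returns."""
    score: float  # 0-1
    reason: str


def score_runs(
    runs: Iterable[str],
    judge: Callable[[str], Judgement],
    *,
    dry_run: bool,
    used: int = 0,
) -> tuple[list[Judgement], list[dict], int]:
    """Judge each selected run; write back events unless dry_run is set."""
    judgements: list[Judgement] = []
    events: list[dict] = []
    for run in runs:
        if not dry_run and used >= MONTHLY_CAP:
            break  # hard cap reached: stop scoring for this month
        result = judge(run)
        judgements.append(result)
        if dry_run:
            continue  # dry run: nothing written back, no quota spent
        # Illustrative event shape; the real attribute names are not shown here.
        events.append(
            {"event": EVENT_NAME, "score": result.score, "reason": result.reason}
        )
        used += 1
    return judgements, events, used
```

Calling it with `dry_run=True` returns judgements but an empty event list and an unchanged `used` count, which mirrors the "nothing written back, no quota spent" behaviour the page documents.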

Verification

Drafted and then verified end-to-end against a running platform stack with the feature enabled: UI labels and form fields checked against the real Scorers tab (one inaccuracy fixed in the second commit — the form has no separate "Name" field), and the example SQL query was run verbatim and returned real score rows.

Ships together with

  • pydantic/unified-docs#53: a one-line addition to the Evaluate section's include: allow-list in src/config/libraries.ts. Without it, this page never renders on the docs site (mkdocs nav only controls ordering there). This PR must merge before that one, because its include references the page added here.
  • The platform-side feature is in preview behind the hosted_scorers flag (platform draft PR #25540 and follow-ups). This PR should merge as part of opening the public preview, not before.

https://claude.ai/code/session_01JsVLds2HfEKkcBU9t37H71

dmontagu added 2 commits July 1, 2026 17:15
Document the preview "Scorers" feature — platform-run LLM-judge
evaluators that continuously score an agent's live runs server-side,
with no evaluator code or judge API key of your own.

The page mirrors the internal walkthrough at
`src/walkthroughs/hosted-evaluators/` in the platform repo and covers:
creating a scorer from the agent's Scorers tab (name, rubric, score
name, sample rate, enable toggle), the dry-run → refine → enable loop,
the free 10,000-scores/project/month preview quota, the transparency
note that scores are written back as `gen_ai.evaluation.result`
telemetry and count toward ingest usage, and how to view scores in
Live Evaluations, the trace view, and SQL.

Added to the "Evaluate" nav in `mkdocs.yml` (both the site nav and the
llmstxt sections), alongside the Live Evaluations guide.

Claude-Session: https://claude.ai/code/session_01JsVLds2HfEKkcBU9t37H71

Verified against a running stack: the form exposes Score name / Rubric /
Sample rate / Enabled (no separate "Name" field), sample rate is a
percentage with deterministic sampling, and the judge returns a 0-1 score
with a reason.

Claude-Session: https://claude.ai/code/session_01JsVLds2HfEKkcBU9t37H71
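
The second commit notes that the sample rate is a percentage with deterministic sampling. One common way to get that property is to hash a stable run identifier into a bucket and compare it to the rate, sketched below. The hashing scheme and the `is_sampled` name are assumptions for illustration; the PR does not say how the platform actually samples.

```python
import hashlib


def is_sampled(run_id: str, sample_rate_percent: float) -> bool:
    """Return the same decision for the same run_id on every call."""
    digest = hashlib.sha256(run_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in 0..9999,
    # which gives a resolution of 0.01 percentage points.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < sample_rate_percent * 100
```

Because the decision depends only on the identifier and the rate, re-evaluating the same run never flips its sampling outcome, and a rate of 100 selects every run while a rate of 0 selects none.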
@dmontagu dmontagu self-assigned this Jul 2, 2026