feat(guardrails): add Cleanlab TLM hallucination/trustworthiness guardrail #170
Draft
dni138 wants to merge 1 commit into
…drail

Wraps Cleanlab's Trustworthy Language Model (TLM) as a hosted scoring guardrail for detecting hallucinations and low-confidence answers in RAG/agent pipelines. Inherits from Guardrail (not ThreeStageGuardrail) because validate() takes (prompt, response) instead of the standard (input_text) signature. Output semantics are inverted: higher trustworthiness score means more valid. Adds a new optional extra `cleanlab-tlm` (also pulled in by `all`). Unit tests mock the `cleanlab_tlm.TLM` client so no real API calls are made; integration testing is deferred until an API key is provisioned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
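The design described in this commit message can be sketched roughly as follows. This is an illustrative sketch, not the PR's code: `TlmGuardrailSketch`, `SketchOutput`, and `make_mock_client` are hypothetical names, and the client is assumed to expose a `get_trustworthiness_score(prompt, response)` method returning a dict with a `trustworthiness_score` key. The 0.7 default comes from the triage notes; treating a score exactly at the threshold as valid is an assumption. The client is a `MagicMock`, so no real API calls are made.

```python
from dataclasses import dataclass
from unittest.mock import MagicMock

DEFAULT_THRESHOLD = 0.7  # value mentioned in the PR's triage notes


@dataclass
class SketchOutput:
    """Hypothetical result type; the library's real output class may differ."""

    valid: bool
    score: float


class TlmGuardrailSketch:
    """Illustrative stand-in for the guardrail in this PR, not its actual code."""

    def __init__(self, client, threshold: float = DEFAULT_THRESHOLD) -> None:
        self.client = client
        self.threshold = threshold

    def validate(self, prompt: str, response: str) -> SketchOutput:
        # Takes a (prompt, response) pair rather than a single input_text.
        # Assumed client method name and result shape.
        result = self.client.get_trustworthiness_score(prompt, response)
        score = float(result["trustworthiness_score"])
        # Higher trustworthiness means more valid. Passing at exactly the
        # threshold is an assumption of this sketch.
        return SketchOutput(valid=score >= self.threshold, score=score)


def make_mock_client(score: float) -> MagicMock:
    """Build a mocked client that always returns the given score."""
    client = MagicMock()
    client.get_trustworthiness_score.return_value = {"trustworthiness_score": score}
    return client


trusted = TlmGuardrailSketch(make_mock_client(0.92)).validate(
    "What is the capital of France?", "Paris."
)
suspect = TlmGuardrailSketch(make_mock_client(0.31)).validate(
    "What is the capital of France?", "Lyon."
)
```

Injecting the client through the constructor is one way to keep unit tests offline; patching `cleanlab_tlm.TLM` at import time, as the commit message describes, would achieve the same isolation.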
Summary
Adds a `CleanlabTlm` guardrail wrapping Cleanlab's Trustworthy Language Model, a hosted scoring service that returns a trustworthiness score for any (prompt, response) pair. Designed for hallucination detection in RAG and agent pipelines.
Research backing
Why commercial-only
TLM combines black-box intrinsic + extrinsic uncertainty signals over arbitrary LLM APIs as a hosted scoring service. No OSS package replicates the full pipeline (the underlying BSDetector method is published, but the production scoring service with its calibration and multi-signal fusion is closed-source).
Access
`CLEANLAB_TLM_API_KEY`.
Triage notes
`pip install 'any-guardrail[cleanlab-tlm]'`.
`threshold=0.7`.
`Guardrail` (not `ThreeStageGuardrail`) because `validate(prompt, response)` doesn't match the standard `(input_text)` signature.
Test plan
`tests/integration/` with `@pytest.mark.e2e` using a known hallucinated response
🤖 Generated with Claude Code