
Add safety and content moderation with open LMs notebook #215


Open · anakin87 wants to merge 1 commit into main

Conversation

anakin87 (Member) commented:

Fixes #214

I am proposing a notebook that shows how to use the new LLMMessagesRouter (available from Haystack 2.15.0) to perform safety and content moderation with several open models: Llama Guard, IBM Granite Guardian, ShieldGemma, and NVIDIA NeMo Guard.
It also includes an example of content moderation in a RAG pipeline.
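
To give a rough idea of what this looks like in code, here is a minimal sketch (not the notebook's exact content). It assumes Haystack >= 2.15.0 with LLMMessagesRouter taking chat_generator, output_names, and output_patterns, and uses Llama Guard via the Hugging Face serverless Inference API purely as an illustration; the model ID, its availability on that API, and the example prompt are assumptions.

# Minimal sketch (illustrative, not the notebook's code): classify user messages
# with a safety model and route them to an "unsafe" or "safe" output.
# Assumes Haystack >= 2.15.0 and a Hugging Face API token in the environment.
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.components.routers import LLMMessagesRouter
from haystack.dataclasses import ChatMessage

# Generative model used as the safety classifier (model ID is illustrative)
moderation_model = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "meta-llama/Llama-Guard-4-12B"},
)

# "unsafe" is listed first because, treated as a pattern, "safe" would also
# match inside the string "unsafe"
router = LLMMessagesRouter(
    chat_generator=moderation_model,
    output_names=["unsafe", "safe"],
    output_patterns=["unsafe", "safe"],
)

messages = [ChatMessage.from_user("How can I hot-wire a car?")]
result = router.run(messages=messages)

# The returned dict is expected to contain the matched branch ("unsafe" or
# "safe") with the original messages, plus the raw classification text.
print(result)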


anakin87 marked this pull request as ready for review on June 25, 2025
anakin87 requested a review from a team as a code owner on June 25, 2025
anakin87 requested a review from bilgeyucel on June 25, 2025
bilgeyucel self-assigned this on June 25, 2025
bilgeyucel (Contributor) commented on the notebook, Jun 30, 2025:

My suggestion for the title and description here:

AI Guardrails: Content Moderation and Safety with Open-Source Language Models

Deploying safe and responsible AI applications requires robust guardrails to detect and handle harmful, biased, or inappropriate content. In response to this need, several open source language models have been specifically trained for content moderation, toxicity detection, and safety-related tasks.

Unlike traditional classifiers that return probabilities for predefined labels, generative models can produce natural language outputs even when used for classification, making them more adaptable for real-world moderation scenarios. To support these use cases in Haystack, we've introduced the LLMMessagesRouter, a component that intelligently routes chat messages based on safety classifications provided by a generative language model.

In this notebook, you’ll learn how to implement AI safety mechanisms using leading open source generative models like Llama Guard (Meta), Granite Guardian (IBM), ShieldGemma (Google), and NeMo Guardrails (NVIDIA). You'll also see how to integrate content moderation into your Haystack RAG pipeline, enabling safer and more trustworthy LLM-powered applications.



bilgeyucel (Contributor) commented on the notebook, Jun 30, 2025:

...run Ollama for some open source models.



bilgeyucel (Contributor) commented on the notebook, Jun 30, 2025:

  •  typo: "classify the safety of the user input."
  •  remove "responds with": the Llama Guard 4 model card shows that it responds with safe or unsafe


bilgeyucel (Contributor) commented on the notebook, Jun 30, 2025:

Would this approach work if malicious information is somehow retrieved from the database?



anakin87 (Member, Author) replied:

Yes, in theory, because the text passed to the Router includes the Documents coming from the Retriever.
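
To illustrate how that can fit together, here is a rough, hypothetical wiring, not the notebook's exact pipeline: the component names, the system_prompt, the prompt template, and the gpt-4o-mini stand-in models are all assumptions. The prompt builder renders the retrieved Documents into the chat messages, and the router only forwards them to the answering LLM when they are classified as safe.

# Hypothetical sketch of moderating retrieved content inside a RAG pipeline.
# The prompt rendered by ChatPromptBuilder embeds the retrieved Documents,
# so the router classifies their content together with the user question.
# Assumes Haystack >= 2.15.0 and OPENAI_API_KEY set; any chat generator could be swapped in.
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.routers import LLMMessagesRouter
from haystack.dataclasses import ChatMessage
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()  # assume it has been populated elsewhere

template = [ChatMessage.from_user(
    "Answer the question using the documents below.\n"
    "Documents:\n{% for doc in documents %}{{ doc.content }}\n{% endfor %}\n"
    "Question: {{ query }}"
)]

pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipeline.add_component("prompt_builder", ChatPromptBuilder(template=template))
pipeline.add_component(
    "moderation_router",
    LLMMessagesRouter(
        chat_generator=OpenAIChatGenerator(model="gpt-4o-mini"),  # stand-in safety model
        system_prompt="Classify the following conversation as safe or unsafe. Answer only with 'safe' or 'unsafe'.",
        output_names=["unsafe", "safe"],
        output_patterns=["unsafe", "safe"],
    ),
)
pipeline.add_component("llm", OpenAIChatGenerator(model="gpt-4o-mini"))

pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "moderation_router.messages")
# Only messages classified as safe reach the answering model; the "unsafe"
# branch could be connected to a fallback or simply inspected in the result.
pipeline.connect("moderation_router.safe", "llm.messages")

question = "What does the knowledge base say about X?"
result = pipeline.run({"retriever": {"query": question}, "prompt_builder": {"query": question}})
print(result)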

bilgeyucel (Contributor) left a review comment:

Left my comments @anakin87!

new = true

[[cookbook]]
title = "Safety and content moderation with Open Language Models"
bilgeyucel (Contributor) commented:

I suggest putting "guardrails" into the title

[[cookbook]]
title = "Safety and content moderation with Open Language Models"
notebook = "safety_moderation_open_lms.ipynb"
topics = ["Safety", "Evaluation", "RAG"]
bilgeyucel (Contributor) commented:

Also, as a topic:

Suggested change:
- topics = ["Safety", "Evaluation", "RAG"]
+ topics = ["Guardrails", "Evaluation", "RAG"]


Successfully merging this pull request may close this issue: Notebook for content moderation with LLMMessagesRouter (#214)