Releases: NVIDIA/NeMo-Guardrails
Release v0.8.1
This minor release mainly focuses on fixing Colang 2.0 parser and runtime issues. It fixes a bug related to logging the prompt for chat models in verbose mode and a small issue in the installation guide. It also adds an example of using streaming with a custom action.
What's Changed
Added
- #377 Add example for streaming from custom action.
Changed
- #380 Update installation guide for OpenAI usage.
- #401 Replace YAML import with new import statement in multi-modal example.
Fixed
- #398 Colang parser fixes and improvements.
- #394 Fixes and improvements for Colang 2.0 runtime.
- #381 Fix typo by @serhatgktp.
- #379 Fix missing prompt in verbose mode for chat models.
- #400 Fix Authorization header showing up in logs for NeMo LLM.
Full Changelog: v0.8.0...v0.8.1
Release v0.8.0
This release adds three main new features:
- A new type of input rail that uses a set of jailbreak heuristics. More heuristics will be added in the future.
- Support for generation options, allowing fine-grained control over which types of rails are triggered, what data is returned, and what logging information is included in the response (see the sketch below).
- Support for making API calls to the guardrails server using multiple configuration ids.
This release also improves the support for working with embeddings (better async support, batching, and caching), adds support for stop tokens per task template, and adds streaming support for HuggingFace pipelines. Last but not least, this release includes the core implementation for Colang 2.0 as a preview for early testing (version 0.9.0 will include documentation and examples).
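For illustration, here is a minimal sketch of using generation options; the option keys shown and the `./config` path are assumptions, so check the generation options user guide for the exact schema:

```python
from nemoguardrails import LLMRails, RailsConfig

# Load a guardrails configuration from a local folder (path is an assumption).
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Trigger only the input rails and include rail activation info in the response.
response = rails.generate(
    messages=[{"role": "user", "content": "Hello!"}],
    options={
        "rails": ["input"],                # which types of rails to run
        "log": {"activated_rails": True},  # what logging info to return
    },
)
print(response.response)
print(response.log)
```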
What's Changed
Added
- #292 Jailbreak heuristics by @erickgalinkin.
- #256 Support generation options.
- #307 Added support for multi-config API calls by @makeshn.
- #293 Adds configurable stop tokens by @zmackie.
- #334 Colang 2.0 - Preview by @schuellc.
- #208 Implement cache embeddings (resolves #200) by @Pouyanpi.
- #331 Huggingface pipeline streaming by @trebedea.
Documentation:
- #311 Update documentation to demonstrate the use of output rails when using a custom RAG by @niels-garve.
- #347 Add detailed logging docs by @erickgalinkin.
- #354 Input and output rails only guide by @trebedea.
- #359 Added user guide for jailbreak detection heuristics by @makeshn.
- #363 Add multi-config API call user guide (see the sketch after this list).
- #297 Example configurations for using only the guardrails, without LLM generation.
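A hedged sketch of a multi-config API call to the guardrails server, referenced from the list above; the server URL, the configuration ids, and the `config_ids` request field are assumptions based on the user guide:

```python
import requests

# Combine two (hypothetical) configurations in a single server call.
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "config_ids": ["input_checking", "output_checking"],  # assumed ids
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=30,
)
print(response.json())
```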
Changed
- #309 Change the paper citation from arXiv to EMNLP 2023 by @manuelciosici.
- #319 Enable embeddings model caching.
- #267 Make embeddings computing async and add support for batching.
- #281 Follow symlinks when building knowledge base by @piotrm0.
- #280 Add more information to the results of `retrieve_relevant_chunks` by @piotrm0.
- #332 Update docs for batch embedding computations.
- #244 Docs/edit getting started by @DougAtNvidia.
- #333 Follow-up to PR 244.
- #341 Updated 'fastembed' version to 0.2.2 by @NirantK.
Fixed
- #286 Fixed #285 - using the same evaluation set given a random seed for topical rails by @trebedea.
- #336 Fix #320. Reuse the asyncio loop between sync calls.
- #337 Fix stats gathering in a parallel async setup.
- #342 Fixes OpenAI embeddings support.
- #346 Fix issues with KB embeddings cache, bot intent detection and config ids validator logic.
- #349 Fix multi-config bug, asyncio loop issue and cache folder for embeddings.
- #350 Fix the incorrect logging of an extra dialog rail.
- #358 Fix OpenAI embeddings async support.
- #362 Fix the issue with the server being pointed to a folder with a single config.
- #352 Fix a few issues related to jailbreak detection heuristics.
- #356 Redo followlinks PR in new code by @piotrm0.
New Contributors
- @manuelciosici made their first contribution in #309
- @erickgalinkin made their first contribution in #292
- @trebedea made their first contribution in #286
- @piotrm0 made their first contribution in #281
- @Pouyanpi made their first contribution in #208
- @niels-garve made their first contribution in #311
- @zmackie made their first contribution in #293
- @DougAtNvidia made their first contribution in #244
- @NirantK made their first contribution in #341
- @makeshn made their first contribution in #359
Full Changelog: v0.7.1...v0.8.0
Release v0.7.1
What's Changed
Full Changelog: v0.7.0...v0.7.1
Release v0.7.0
This release adds three new features: support for Llama Guard, improved LangChain integration, and support for server-side threads. It also adds support for Python 3.11 and solves the issue with pinned dependencies (e.g., `langchain>=0.1.0,<2.0`, `typer>=0.7.0`). Last but not least, it includes multiple feature and security-related fixes.
What's Changed
Added
- #254 Support for Llama Guard input and output content moderation.
- #253 Support for server-side threads.
- #235 Improved LangChain integration through `RunnableRails` (see the sketch after this list).
- #190 Add example for using `generate_events_async` with streaming.
- Support for Python 3.11.
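Below is a minimal sketch of the improved LangChain integration via `RunnableRails`, assuming a `./config` guardrails folder and an OpenAI chat model; treat it as an illustration of the pattern rather than a definitive recipe:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

from nemoguardrails import RailsConfig
from nemoguardrails.integrations.langchain.runnable_rails import RunnableRails

# A simple LangChain chain: prompt -> LLM -> string output.
prompt = ChatPromptTemplate.from_template("Answer briefly: {input}")
chain = prompt | ChatOpenAI() | StrOutputParser()

# Wrap the chain with guardrails loaded from a local folder (path is an assumption).
config = RailsConfig.from_path("./config")
guardrails = RunnableRails(config)
chain_with_guardrails = guardrails | chain

print(chain_with_guardrails.invoke({"input": "What is NeMo Guardrails?"}))
```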
Fixed
- #239 Fixed logging issue where the `verbose=true` flag did not trigger the expected log output.
- #228 Fix docstrings for various functions.
- #242 Fix Azure LLM support.
- #225 Fix `annoy` import, to allow using the package without it.
- #209 Fix user messages missing from prompt.
- #261 Fix small bug in `print_llm_calls_summary`.
- #252 Fixed duplicate loading for the default config.
- Fixed the dependency pinning, allowing a wider range of dependency versions.
- Fixed several security issues related to uncontrolled data used in path expression and information exposure through an exception.
New Contributors
- @spehl-max made their first contribution in #239
- @rajveer43 made their first contribution in #228
- @smartestrobotdai made their first contribution in #242
- @prasoonvarshney made their first contribution in #269
- @eneadodi made their first contribution in #276
- @baggiponte made their first contribution in #240
Full Changelog: v0.6.1...v0.7.0
Release v0.6.1
This patch release upgrades two dependencies (`langchain` and `httpx`) and replaces the deprecated `text-davinci-003` model with `gpt-3.5-turbo-instruct` in all configurations and examples.
Added
- Support for the `--version` flag in the CLI.
Changed
- Upgraded `langchain` to `0.0.352`.
- Upgraded `httpx` to `0.24.1`.
- Replaced the deprecated `text-davinci-003` model with `gpt-3.5-turbo-instruct`.
Fixed
- #191: Fix chat generation chunk issue.
Release v0.6.0
This release builds on the feedback received over the last few months and brings many improvements and new features. It is also the first beta release for NeMo Guardrails. Equally important, this release is the first to include LLM vulnerability scan results for one of the sample bots.
Release highlights include:
- Better configuration and support for input, output, dialog, retrieval, and execution rails.
- Ability to reduce the overall latency using the `single_call` or `embeddings_only` mode for dialog rails.
- Support for streaming.
- First version of the Guardrails Library.
- Fast fact-checking using AlignScore.
- Updated Getting Started guide.
- Docker image for easy deployment.
Detailed changes are included below.
Added
- Support for explicit definition of input/output/retrieval rails.
- Support for custom tasks and their prompts.
- Support for fact-checking using AlignScore.
- Support for NeMo LLM Service as an LLM provider.
- Support for making a single LLM call for both the guardrails process and generating the response (by setting `rails.dialog.single_call.enabled` to `True`).
- Support for sensitive data detection guardrails using Presidio.
- Example using NeMo Guardrails with the LLaMa2-13B model.
- Dockerfile for building a Docker image.
- Support for prompting modes using `prompting_mode`.
- Support for TRT-LLM as an LLM provider.
- Support for streaming the LLM responses when no output rails are used.
- Integration of ActiveFence ActiveScore API as an input rail.
- Support for `--prefix` and `--auto-reload` in the guardrails server.
- Support for loading a configuration from a dictionary, i.e. `RailsConfig.from_content(config=...)` (see the sketch after this list).
- Guidance on LLM support.
- Support for `LLMRails.explain()` (see the Getting Started guide for sample usage).
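As a minimal sketch combining two of the items above, a configuration can be loaded from a dictionary with `single_call` mode enabled; the model entry is an assumption:

```python
from nemoguardrails import LLMRails, RailsConfig

# Load a configuration from a dictionary instead of a folder on disk.
config = RailsConfig.from_content(config={
    "models": [
        # Assumed model entry; replace with your own engine/model.
        {"type": "main", "engine": "openai", "model": "gpt-3.5-turbo-instruct"},
    ],
    # Use a single LLM call for both the dialog rails and the response.
    "rails": {"dialog": {"single_call": {"enabled": True}}},
})

rails = LLMRails(config)
print(rails.generate(messages=[{"role": "user", "content": "Hello!"}]))
```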
Changed
- Allow context data directly in the `/v1/chat/completions` endpoint using messages with the type `"role"`.
- Allow calling a subflow whose name is in a variable, e.g. `do $some_name`.
- Allow using actions which are not `async` functions.
- Disabled pretty exceptions in CLI.
- Upgraded dependencies.
- Updated the Getting Started Guide.
- Main README now provides more details.
- Merged original examples into a single ABC Bot and removed the original ones.
- Documentation improvements.
Fixed
- Fix going over the maximum prompt length using the `max_length` attribute in prompt templates.
- Fixed problem with `nest_asyncio` initialization.
- #144 Fixed TypeError in logging call.
- #121 Detect chat model using the openai engine.
- #109 Fixed minor logging issue.
- Parallel flow support.
- Fix `HuggingFacePipeline` bug related to the LangChain version upgrade.
Release v0.5.0
This release adds support for custom embedding search providers (beyond Annoy/SentenceTransformers) and support for OpenAI embeddings in the default embedding search provider. It also includes an advanced example of using multiple knowledge bases (i.e., a tabular one and a regular one), fixes an old issue related to using the `generate` method inside an async environment (e.g., a notebook), and includes multiple small fixes. Detailed change log below.
Added
- Support for custom configuration data.
- Example for using a custom LLM and multiple KBs.
- Support for `PROMPTS_DIR`.
- First set of end-to-end QA tests for the example configurations.
- Support for configurable embedding search providers.
Changed
- Moved to using `nest_asyncio` for implementing the blocking API; fixes #3 and #32.
- Improved event property validation in `new_event_dict`.
- Refactored imports to allow installing from source without Annoy/SentenceTransformers (would need a custom embedding search provider to work).
Release v0.4.0
This release focused on multiple areas:
- Extending the guardrails interface to support generic events.
- Adding experimental support for running a red teaming process.
- Adding experimental support for `vicuna-7b-v1.3` and `mpt-7b-instruct`.
- Extending Colang 1.0 with support for bot message instructions and using variables inside bot message definitions.
- Fixing several bugs reported by the community.
Detailed change log below.
Added
- Event-based API for guardrails.
- Support for messages with type "event" in `LLMRails.generate_async`.
- Support for bot message instructions.
- Support for using variables inside bot message definitions.
- Support for `vicuna-7b-v1.3` and `mpt-7b-instruct`.
- Topical evaluation results for `vicuna-7b-v1.3` and `mpt-7b-instruct`.
- Support for using different models for different LLM tasks (see the sketch after this list).
- Support for red-teaming using challenges.
- Support for disabling the Chat UI when running the server, using `--disable-chat-ui`.
- Support for accessing the API request headers in server mode.
- Support for enabling CORS settings for the guardrails server.
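For the per-task models support mentioned above, here is a hedged sketch of what the configuration might look like; the task name `generate_user_intent` and the models used are assumptions for illustration:

```python
from nemoguardrails import RailsConfig

# Inline YAML configuration with a dedicated model for one LLM task.
config = RailsConfig.from_content(yaml_content="""
models:
  - type: main                  # model used for general generation
    engine: openai
    model: text-davinci-003
  - type: generate_user_intent  # assumed task name; see the docs for the full list
    engine: openai
    model: gpt-3.5-turbo
""")
```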
Changed
- Changed the naming of the internal events to align to the upcoming UMIM spec (Unified Multimodal Interaction Management).
- If there are no user message examples, the bot message examples lookup is disabled as well.
Fixed
- #58: Fix install on Mac OS 13.
- #55: Fix bug in example causing config.py to crash on computers with no CUDA-enabled GPUs.
- Fixed the model name initialization for LLMs that use the `model` kwarg.
- Fixed the Cohere prompt templates.
- #55: Fix bug related to LangChain callbacks initialization.
- Fixed the generation of `"..."` during value generation.
- Fixed the parameter type conversion when invoking actions from Colang (previously, everything was a string).
- Fixed the `model_kwargs` property for the `WrapperLLM`.
- Fixed a bug when `stop` was used inside flows.
- Fixed a Chat UI bug when an invalid guardrails configuration was used.
Release v0.3.0
This release focuses on enhancing support for integrating additional LLMs with NeMo Guardrails. It adds the ability to customize the prompts for various LLMs, including support for completion and chat models. This release adds examples for using HuggingFace pipelines and inference endpoints. Last but not least, it provides an initial evaluation of the core prompting technique and some of the rails.
Added
- Support for defining subflows.
- Improved support for customizing LLM prompts (see the sketch after this list):
  - Support for using filters to change how variables are included in a prompt template.
  - Output parsers for prompt templates.
  - The `verbose_v1` formatter and output parser, to be used for smaller models that don't understand Colang very well in a few-shot manner.
  - Support for including context variables in prompt templates.
  - Support for chat models, i.e., prompting with a sequence of messages.
- Experimental support for allowing the LLM to generate multi-step flows.
- Example of using Llama Index from a guardrails configuration (#40).
- Example for using HuggingFace Endpoint LLMs with a guardrails configuration.
- Example for using HuggingFace Pipeline LLMs with a guardrails configuration.
- Support to alter LLM parameters passed as `model_kwargs` in LangChain.
- Initial evaluation results for `text-davinci-003` and `gpt-3.5-turbo`.
- The `lowest_temperature` can be set through the guardrails config (to be used for deterministic tasks).
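To give a flavor of the prompt customization support, here is a hedged sketch of overriding the prompt for a single task; the task name, the template variables, and the `colang` filter are assumptions to verify against the prompt customization docs:

```python
from nemoguardrails import RailsConfig

# Inline YAML with a custom prompt for one task (names are assumptions).
config = RailsConfig.from_content(yaml_content="""
models:
  - type: main
    engine: openai
    model: text-davinci-003
prompts:
  - task: generate_user_intent
    content: |
      {{ general_instructions }}

      Your task is to generate the canonical form for the last user message.

      {{ history | colang }}
""")
```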
Changed
- The core templates now use Jinja2 as the rendering engine.
- Improved the internal prompting architecture, now using an LLM Task Manager.
Release v0.2.0
Update CHANGELOG and setup.py.