bug: NeMo-Guardrails responses apparently breaking/terminating on line breaks with different models #936
Comments
thank you @ta-dr0aid for opening this issue and using NeMo Guardrails 👍🏻 Would you please create a gist (https://gist.github.com/) similar to this example, following the contributing guidelines? That way you can include the config required for reproduction and any necessary Python code, and you can add the output files there too.
Hi @ta-dr0aid,
The LLM is adding additional explanations and comments around the actual answer and does not follow the prompt structure at all. This is something I cannot observe when using the OpenAI models directly via their interface. You stated that you are using an Azure GPT-4o model. Did you add additional system prompts, or perhaps do some fine-tuning?
Hi @schuellc-nvidia, thanks a bunch for your response. I think there might be a slight misunderstanding here. The gist linked above, from which I also took the logs, uses a llama3.1 model hosted via a local ollama. To answer your question: in both cases I did not change the system prompts. For the ollama llama instance, there is no fine-tuning involved - it is just run with a plain, unmodified ollama setup.
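For reference, a minimal sketch of what such a plain ollama-backed model config could look like (the engine name, model tag, and base_url below are assumptions for illustration, not values taken from the actual gist):

```python
# Sketch of a plain ollama-backed config; engine name, model tag, and
# base_url are assumptions, not the values from the attached gist/zip.
from nemoguardrails import LLMRails, RailsConfig

yaml_config = """
models:
  - type: main
    engine: ollama          # assumes the LangChain ollama integration is available
    model: llama3.1         # plain model, no fine-tuning, default system prompts
    parameters:
      base_url: http://localhost:11434
"""

config = RailsConfig.from_content(yaml_content=yaml_config)
rails = LLMRails(config)
```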
Did you check docs and existing issues?
Python version (python --version)
Python 3.11.9
Operating system/version
Windows 11 23H2
NeMo-Guardrails version (if you must use a specific version and not the latest)
latest pip install; latest develop branch
Describe the bug
I'm running into issues with LLM responses that are formatted in a specific way. In this particular case, I was able to narrow it down to line breaks being present in the LLM response.
The issue was first found on Azure GPT-4o instances, but it also reproduces with llama3.1 models hosted locally via ollama. I cannot say whether it is present for other llama3.x models too. In both cases, at the time it was found, the installation was outdated (commit-id 3265b39, 12.12.2024).
Both setups have since been updated to the latest beta branch, leading to the same result on Azure and ollama. On their own (without NeMo), both endpoints respond to the same question as expected. Therefore, I suspect that the issue is within NeMo-Guardrails.
I've added logs of potential outputs from the 12.12. llama3 version, which I'm still seeing in a similar form with the latest development branch on Azure instances.
Output_one_paragraph.txt (this seems fine)
Output_with_paragraph.txt (this breaks)
Edit: tested with the develop branch on both Azure and ollama - same result.
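For context, a driver script along the following lines is enough to show the difference (the config path and the prompt are placeholders, not the exact inputs from the attachments below):

```python
# Hypothetical driver; the config folder and the prompt are placeholders.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")  # placeholder for the attached config
rails = LLMRails(config)

# A question whose answer spans several paragraphs (i.e. contains blank lines)
# ends in "I'm sorry, an internal error has occurred.", while a question with a
# single-paragraph answer is returned normally.
response = rails.generate(messages=[
    {"role": "user", "content": "Please explain this in two separate paragraphs."}
])
print(response["content"])
```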
Steps To Reproduce
steps to reproduce.txt
config_test_6.zip
I've cloned the NeMo-Guardrails repo (which is present in the NeMo-Guardrails folder) and added the folder local_files at its root level, with the other paths as in the outputs/logs. For the Azure instances, I use the following config instead:
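A rough sketch of such an Azure model entry (the endpoint, deployment name, and API version below are placeholders, not the actual values from my setup):

```python
# Rough sketch of an Azure OpenAI model entry; endpoint, deployment name,
# and API version are placeholders, not the actual values.
from nemoguardrails import RailsConfig

azure_yaml = """
models:
  - type: main
    engine: azure
    model: gpt-4o
    parameters:
      azure_endpoint: https://<your-resource>.openai.azure.com/
      deployment_name: <your-gpt-4o-deployment>
      api_version: "2024-02-15-preview"
"""

config = RailsConfig.from_content(yaml_content=azure_yaml)
```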
Expected Behavior
For the line-break versions, I'd expect an output similar to this in white text on a green background, as seen in the output_with_paragraph file (there it was printed as - if I understand correctly - a preliminary answer in black text on a green background):
For the Azure version, there is no trailing "Now, let's continue the conversation!" or leading "I think there may be some confusion! The AI's previous response should have been:", but the result "I'm sorry, an internal error has occurred." is the same.
Actual Behavior
See the attached outputs above. The error text is "I'm sorry, an internal error has occurred."
Output_one_paragraph.txt
Output_with_paragraph.txt