Skip to content

Conversation

sanjay20m
Copy link

Summary

This PR fixes a memory safety issue in StringToArrayFast() where the parser
could read past the end of the model string when the declared array size was
larger than the available data.

Vulnerability

An attacker could create a malicious LightGBM text model file with:

  • A large declared array size
  • Fewer actual numeric values

This mismatch would cause the parser to read beyond the allocated buffer,
triggering undefined behavior. This can result in:

  • Denial of Service (process crash)
  • Possible information disclosure

Fix

  • Added a check to ensure the parser does not read beyond the string's end.
  • Logs a fatal error and aborts safely if the file is malformed.

Security Impact

This hardens the model loading path against malicious or corrupted model files.
The patch does not change public APIs or intended parsing behavior.


…ffer over-read


This patch fixes a heap buffer over-read vulnerability in the C++ core of LightGBM.
The `StringToArrayFast()` function did not check if the parser had reached the
end of the string before reading the next array element.
@jameslamb jameslamb changed the title [security] Add bounds check in StringToArrayFast() to prevent heap bu… [c++] Add bounds check in StringToArrayFast() to prevent heap buffer over-read Aug 11, 2025
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your interest in LightGBM.

How did you discover this? How can we test it? Can you share a normal model file and one modified in the way you say this protects against, so we can understand what's being proposed here?

@sanjay20m
Copy link
Author

sanjay20m commented Aug 11, 2025

Hi @jameslamb
I discovered this while inspecting the StringToArrayFast() implementation and noticed that if the declared array size (n) is larger than the number of values in the input string, the parser will keep advancing the pointer past the end of the string buffer. This could happen if a model file is corrupted or manually edited.

How to reproduce

  1. Train any LightGBM model (e.g., using examples/regression) and save it as a text model file.
  2. Open the file and locate a numeric array line, such as:
    thresholds=0.1 0.5 0.9
  3. Edit the corresponding metadata or header so that the declared array size is larger than the number of values in that array (for example, 10 instead of 3).
  4. Load the modified model:
  • Before this patch → parser reads beyond the buffer, which may cause a crash or unexpected values.
  • With this patch → parser stops and logs:
    Fatal: Malformed model file: not enough values in string for array of size X
    

This demonstrates that the change prevents a possible heap buffer over-read when loading malformed model files.

@sanjay20m sanjay20m requested a review from jameslamb August 12, 2025 04:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants