[c++] Add bounds check in StringToArrayFast() to prevent heap buffer over-read #6998

sanjay20m · 2025-08-11T14:09:34Z

Summary

This PR fixes a memory safety issue in StringToArrayFast() where the parser
could read past the end of the model string when the declared array size was
larger than the available data.

Vulnerability

An attacker could create a malicious LightGBM text model file with:

A large declared array size
Fewer actual numeric values

This mismatch would cause the parser to read beyond the allocated buffer,
triggering undefined behavior. This can result in:

Denial of Service (process crash)
Possible information disclosure

Fix

Added a check to ensure the parser does not read beyond the string's end.
Logs a fatal error and aborts safely if the file is malformed.
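The check described in Fix can be pictured with the minimal sketch below. It assumes a space-separated list of doubles and uses a hypothetical function, `ParseArrayChecked()`. This is NOT LightGBM's actual `StringToArrayFast()`, and it throws `std::runtime_error` where LightGBM would log a fatal error. The key idea is that the parser compares its cursor against an explicit end pointer before reading each of the `n` declared elements.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch, NOT LightGBM's StringToArrayFast(): parses exactly n
// whitespace-separated doubles from `str` and never steps past the end of it.
std::vector<double> ParseArrayChecked(const std::string& str, std::size_t n) {
  std::vector<double> out;
  // Each value needs at least one character, so never reserve more than
  // str.size() slots, even if the declared n comes from an untrusted file.
  out.reserve(std::min(n, str.size()));
  const char* p = str.c_str();  // c_str() is NUL-terminated
  const char* const end = p + str.size();
  for (std::size_t i = 0; i < n; ++i) {
    // Skip separators, but never move past the end of the data.
    while (p < end && std::isspace(static_cast<unsigned char>(*p))) ++p;
    if (p == end) {
      // Bounds check: the string ran out before n values were read.
      throw std::runtime_error(
          "Malformed model file: not enough values in string for array of size " +
          std::to_string(n));
    }
    char* next = nullptr;
    const double value = std::strtod(p, &next);
    if (next == p) {  // strtod consumed nothing, so the token is not a number
      throw std::runtime_error("Malformed model file: non-numeric array value");
    }
    out.push_back(value);
    p = next;
  }
  return out;
}
```

With this sketch, `ParseArrayChecked("0.1 0.5 0.9", 3)` returns three values, and `ParseArrayChecked("0.1 0.5 0.9", 10)` throws instead of reading past the buffer. Capping `reserve()` is a design choice worth noting: an oversized declared size should not be able to trigger a huge allocation before the shortage is detected.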

Security Impact

This hardens the model loading path against malicious or corrupted model files.
The patch does not change public APIs or intended parsing behavior.

…ffer over-read

This patch fixes a heap buffer over-read vulnerability in the C++ core of LightGBM. The `StringToArrayFast()` function did not check whether the parser had reached the end of the string before reading the next array element.

jameslamb

Thanks for your interest in LightGBM.

How did you discover this? How can we test it? Can you share a normal model file and one modified in the way you say this protects against, so we can understand what's being proposed here?

sanjay20m · 2025-08-11T16:52:05Z

Hi @jameslamb
I discovered this while inspecting the StringToArrayFast() implementation and noticed that if the declared array size (n) is larger than the number of values in the input string, the parser will keep advancing the pointer past the end of the string buffer. This could happen if a model file is corrupted or manually edited.
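The mismatch between the declared size `n` and the number of values actually present could, as an alternative, be detected before parsing starts. The following is a minimal sketch under that assumption. `CountTokens()` and `HasEnoughValues()` are hypothetical helpers, not part of LightGBM. They count whitespace-separated tokens with `std::istringstream` and mirror the condition this PR guards against (fewer values than declared).

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not part of LightGBM): counts whitespace-separated
// tokens in the value part of a model-file line such as "0.1 0.5 0.9".
std::size_t CountTokens(const std::string& values) {
  std::istringstream in(values);
  std::string token;
  std::size_t count = 0;
  while (in >> token) {
    ++count;
  }
  return count;
}

// Mirrors the PR's condition: reject the string when it holds fewer values
// than the declared array size n. A stricter loader could require equality.
bool HasEnoughValues(const std::string& values, std::size_t n) {
  return CountTokens(values) >= n;
}
```

This pre-validation costs an extra pass over the string. Checking the end pointer inside the parsing loop itself avoids that cost, which is presumably why a bounds check inside `StringToArrayFast()` is the more natural place for the fix.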

How to reproduce

1. Train any LightGBM model (e.g., using examples/regression) and save it as a text model file.
2. Open the file and locate a numeric array line, such as:

   thresholds=0.1 0.5 0.9

3. Edit the corresponding metadata or header so that the declared array size is larger than the number of values in that array (for example, 10 instead of 3).
4. Load the modified model.

Before this patch → the parser reads beyond the buffer, which may crash the process or produce unexpected values.

With this patch → the parser stops and logs:

Fatal: Malformed model file: not enough values in string for array of size X

This demonstrates that the change prevents a possible heap buffer over-read when loading malformed model files.

sanjay20m requested review from StrikerRUS, borchero, guolinke, jameslamb, jmoralez and shiyu1994 as code owners August 11, 2025 14:09

jameslamb changed the title from "[security] Add bounds check in StringToArrayFast() to prevent heap bu…" to "[c++] Add bounds check in StringToArrayFast() to prevent heap buffer over-read" Aug 11, 2025

jameslamb requested changes Aug 11, 2025


jameslamb added the maintenance label Aug 11, 2025

sanjay20m requested a review from jameslamb August 12, 2025 04:18

Merge branch 'master' into patch-4 (1d4e0f2)
