Description
bytes::complete::tag_no_case (and bytes::streaming::tag_no_case) panic when matching an &str input where the matched character case-folds to a character of a different UTF-8 byte length than the tag character. An attacker who controls the parser input can crash any nom-based parser that uses tag_no_case with &str input.
Reproduction (nom 7.1.3)
use nom::{bytes::complete::tag_no_case, error::Error, IResult};

fn main() {
    // KELVIN SIGN (U+212A, 3 bytes) lowercases to ASCII 'k' (1 byte)
    let input: &str = "\u{212A}xyz"; // 4 chars, 6 bytes
    let _: IResult<&str, &str, Error<&str>> = tag_no_case("k")(input);
    // PANIC: byte index 1 is not a char boundary; it is inside 'K' (bytes 0..3)

    // Not reached once the call above panics; comment that call out to see this one.
    // OHM SIGN (U+2126, 3 bytes) lowercases to 'ω' (U+03C9, 2 bytes)
    let input2: &str = "\u{2126}xyz";
    let _: IResult<&str, &str, Error<&str>> = tag_no_case("ω")(input2);
    // PANIC: byte index 2 is not a char boundary; it is inside 'Ω' (bytes 0..3)
}
Root cause
Compare<&str> for &str::compare_no_case() (src/traits.rs:845) does char-level comparison with to_lowercase(). After deciding "this matches", tag_no_case (src/bytes/complete.rs:85) slices the input using the byte length of the tag, not the byte length of the matched prefix in the input:
let tag_len = tag.input_len(); // byte length of the LITERAL tag
…
CompareResult::Ok => Ok(i.take_split(tag_len)),
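When the matched input character occupies more bytes than the tag character it compared equal to, tag_len falls inside a multi-byte UTF-8 character and the underlying split_at panics. In the opposite case (input character shorter than the tag character), tag_len overshoots the match, so the split either lands inside the following character and panics, or silently consumes bytes beyond the match.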
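The mismatch can be seen without nom at all. The following is a minimal std-only sketch; the helper name first_chars_match_no_case is made up for illustration and only mirrors the per-char to_lowercase() comparison described above.

```rust
/// Hypothetical helper: true if the first char of `input` equals the first
/// char of `tag` after lowercasing (the same per-char rule described above).
fn first_chars_match_no_case(input: &str, tag: &str) -> bool {
    match (input.chars().next(), tag.chars().next()) {
        (Some(a), Some(b)) => a.to_lowercase().eq(b.to_lowercase()),
        _ => false,
    }
}

fn main() {
    let input = "\u{212A}xyz"; // KELVIN SIGN followed by ASCII
    let tag = "k";

    // The comparison succeeds...
    println!("match: {}", first_chars_match_no_case(input, tag)); // true
    // ...but the two sides occupy different numbers of bytes.
    println!("input char bytes: {}", input.chars().next().unwrap().len_utf8()); // 3
    println!("tag bytes: {}", tag.len()); // 1
    // Slicing `input` at the tag's byte length is therefore not allowed.
    println!("is_char_boundary: {}", input.is_char_boundary(tag.len())); // false
}
```

Any slice of input at tag.len() (for example via split_at) panics in this situation, which is the failure shown in the reproduction.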
Property that fails
use proptest::prelude::*;
use nom::{bytes::complete::tag_no_case, error::Error, IResult};

proptest! {
    #[test]
    fn tag_no_case_should_never_panic(tag in "[a-zA-Z]{1,5}", input in ".*") {
        // tag_no_case must either return Ok or Err, never panic
        let _result: IResult<&str, &str, Error<&str>> =
            tag_no_case::<_, _, Error<&str>>(tag.as_str())(input.as_str());
    }
}
// Shrinks to tag="k", input="\u{212A}"
Threat model
Any nom parser using tag_no_case on &str and exposed to untrusted input is vulnerable to denial-of-service. Examples: HTTP/SMTP header parsers, config-file parsers, URL/email validators, query parsers.
The attacker only needs to place U+212A (Kelvin sign), U+2126 (Ohm sign), or any other character whose lowercase form has a different UTF-8 byte length at a position where the parser applies tag_no_case. The crash is an ordinary Rust panic: unless the caller wraps the call in std::panic::catch_unwind, it unwinds and terminates the current thread, and it takes down the whole process when it happens on the main thread or the binary is built with panic = "abort".
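Until a fix lands, a caller could contain the panic at the call site. The sketch below is std-only and makes assumptions: slice_at_tag_len is a hypothetical stand-in for the faulty slice (it is not nom code), and contain_panic is an illustrative wrapper name.

```rust
use std::panic;

/// Hypothetical stand-in for the faulty step: slices `input` at the tag's
/// byte length, which panics if that index is not a char boundary.
fn slice_at_tag_len<'a>(input: &'a str, tag: &str) -> (&'a str, &'a str) {
    input.split_at(tag.len())
}

/// Runs the slice inside catch_unwind and turns a panic into `None`.
fn contain_panic<'a>(input: &'a str, tag: &str) -> Option<(&'a str, &'a str)> {
    panic::catch_unwind(|| slice_at_tag_len(input, tag)).ok()
}

fn main() {
    // Plain ASCII input: the slice is fine.
    println!("{:?}", contain_panic("kxyz", "k"));
    // KELVIN SIGN input: the panic is caught and reported as None.
    println!("{:?}", contain_panic("\u{212A}xyz", "k"));
}
```

This is a stopgap rather than a fix: the default panic hook still prints the panic message to stderr, and catch_unwind has no effect when the binary is built with panic = "abort".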
Suggested fix
After confirming a case-insensitive match, slice the input by the input byte length of the matched prefix, not the tag's byte length. Concretely, in bytes/complete.rs::tag_no_case, derive the slice length from iterating the input's chars and summing c.len_utf8() for as many chars as the tag has, not from tag.input_len().
Equivalent fix at the Compare trait level: have compare_no_case return the matched input prefix length alongside CompareResult::Ok.
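A rough std-only sketch of the first variant follows. It is not a patch against nom: matched_prefix_len and tag_no_case_sketch are hypothetical names, the return type is simplified to Option, and the streaming parser's Incomplete case is not modelled.

```rust
/// Hypothetical helper: byte length of the prefix of `input` that matches
/// `tag` char by char after lowercasing, or None if there is no match.
fn matched_prefix_len(input: &str, tag: &str) -> Option<usize> {
    let mut input_chars = input.chars();
    let mut len = 0;
    for t in tag.chars() {
        let c = input_chars.next()?; // input ran out before the tag did
        if !c.to_lowercase().eq(t.to_lowercase()) {
            return None;
        }
        len += c.len_utf8(); // count bytes of the INPUT char, not the tag char
    }
    Some(len)
}

/// Simplified stand-in for tag_no_case on &str: returns (remaining, matched).
fn tag_no_case_sketch<'a>(input: &'a str, tag: &str) -> Option<(&'a str, &'a str)> {
    let len = matched_prefix_len(input, tag)?;
    // `len` is a sum of whole input chars, so it is always a char boundary.
    let (matched, rest) = input.split_at(len);
    Some((rest, matched))
}

fn main() {
    println!("{:?}", tag_no_case_sketch("\u{212A}xyz", "k"));
    println!("{:?}", tag_no_case_sketch("\u{2126}xyz", "ω"));
    println!("{:?}", tag_no_case_sketch("Hello", "hEL"));
}
```

Because the slice length is accumulated from the input's own characters, the matched prefix is returned in its original form (the Kelvin sign, not 'k') and the split can no longer land inside a character.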
Environment
Other affected codepoints
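U+212B (Å, Angstrom sign → å, 3 → 2 bytes) and U+1E9E (ẞ, capital sharp s → ß, 3 → 2 bytes) lowercase to a shorter character in the same way as the Kelvin and Ohm signs. U+017F (ſ → s, 2 → 1) and U+1FBE (ι → ι, 3 → 2) change byte length only under Unicode case folding; to_lowercase() leaves both unchanged, so with the comparison described under Root cause they should not reach the faulty slice. U+0130 (İ → i̇, 2 → 3) lowercases to two chars and, under the per-char comparison, appears to match only itself.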
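A rough way to enumerate candidates is to scan every char and keep those whose to_lowercase() expansion has a different UTF-8 byte length. This is a std-only sketch with a made-up function name; the result depends on the Unicode version shipped with the Rust standard library, and it is a superset of the characters that can actually trigger the panic, since a tag character with the same lowercase form must also exist.

```rust
/// Hypothetical helper: every char whose lowercase expansion occupies a
/// different number of UTF-8 bytes than the char itself.
fn length_changing_chars() -> Vec<(char, String)> {
    ('\0'..=char::MAX)
        .filter_map(|c| {
            let lower: String = c.to_lowercase().collect();
            (lower.len() != c.len_utf8()).then(|| (c, lower))
        })
        .collect()
}

fn main() {
    for (c, lower) in length_changing_chars() {
        println!(
            "U+{:04X} {} -> {} ({} -> {} bytes)",
            c as u32,
            c,
            lower,
            c.len_utf8(),
            lower.len()
        );
    }
}
```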