tag_no_case panics on &str input when Unicode case-fold changes byte length (DoS) #1884

@zhangjiashuo-cs

Description

bytes::complete::tag_no_case (and bytes::streaming::tag_no_case) panic when matching an &str input where the matched character case-folds to a character of a different UTF-8 byte length than the tag character. An attacker who controls the parser input can crash any nom-based parser that uses tag_no_case with &str input.

Reproduction (nom 7.1.3)

use nom::{bytes::complete::tag_no_case, error::Error, IResult};

fn main() {
    // KELVIN SIGN (U+212A, 3 bytes) case-folds to ASCII 'k' (1 byte)
    let input: &str = "\u{212A}xyz";   // 4 chars, 6 bytes
    let _: IResult<&str, &str, Error<&str>> = tag_no_case("k")(input);
    // PANIC: byte index 1 is not a char boundary; it is inside 'K' (bytes 0..3)

    // OHM SIGN (U+2126, 3 bytes) lowercases to ω (U+03C9, 2 bytes).
    // Run this case on its own: the call above already panics.
    let input2: &str = "\u{2126}xyz";
    let _: IResult<&str, &str, Error<&str>> = tag_no_case("ω")(input2);
    // PANIC: byte index 2 is not a char boundary; it is inside 'Ω' (bytes 0..3)
}

Root cause

Compare<&str> for &str::compare_no_case() (src/traits.rs:845) does char-level comparison with to_lowercase(). After deciding "this matches", tag_no_case (src/bytes/complete.rs:85) slices the input using the byte length of the tag, not the byte length of the matched prefix in the input:

let tag_len = tag.input_len();     // byte length of the LITERAL tag
CompareResult::Ok => Ok(i.take_split(tag_len)),

When the matched character in the input has more bytes than the tag character it case-folded to, tag_len lands inside a multi-byte UTF-8 character and split_at panics.
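The mismatch can be observed with the standard library alone. The sketch below is not nom's code: `lowercase_prefix_matches` and `tag_len_is_char_boundary` are hypothetical helpers that imitate the char-level comparison and the byte-length split described above, assuming the comparison pairs characters and compares their `to_lowercase()` expansions.

```rust
/// Imitates the character-level check: pair up the chars of `input` and
/// `tag` and compare their lowercase expansions. (Hypothetical helper,
/// not part of nom.)
fn lowercase_prefix_matches(input: &str, tag: &str) -> bool {
    input.len() >= tag.len()
        && input
            .chars()
            .zip(tag.chars())
            .all(|(a, b)| a.to_lowercase().eq(b.to_lowercase()))
}

/// True when cutting `input` at the tag's byte length lands on a UTF-8
/// character boundary, i.e. when `input.split_at(tag.len())` would not panic.
fn tag_len_is_char_boundary(input: &str, tag: &str) -> bool {
    input.is_char_boundary(tag.len())
}
```

For the input "\u{212A}xyz" and the tag "k", the first helper reports a match while the second reports that byte offset 1 is not a character boundary. That combination is exactly the state in which the split panics.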

Property that fails

use proptest::prelude::*;
use nom::{bytes::complete::tag_no_case, error::Error, IResult};

proptest! {
    #[test]
    fn tag_no_case_should_never_panic(tag in "[a-zA-Z]{1,5}", input in ".*") {
        // tag_no_case must either return Ok or Err, never panic
        let _result: IResult<&str, &str, Error<&str>> =
            tag_no_case::<_, _, Error<&str>>(tag.as_str())(input.as_str());
    }
}
// Shrinks to tag="k", input="\u{212A}"

Threat model

Any nom parser using tag_no_case on &str and exposed to untrusted input is vulnerable to denial-of-service. Examples: HTTP/SMTP header parsers, config-file parsers, URL/email validators, query parsers.

The attacker only needs to place U+212A (Kelvin sign), U+2126 (Ohm sign), or any other character whose lowercase form has a different UTF-8 byte length at a position where the parser applies tag_no_case. (U+017F, long s, folds to 's' under Unicode case folding, but char::to_lowercase() leaves it unchanged, so it does not match a tag of 's' on this code path.) The crash is an ordinary Rust panic!, which terminates the thread or process unless the caller wraps the call in std::panic::catch_unwind, something most async/web frameworks don't do.
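Until a fix lands, a caller could contain the panic with `std::panic::catch_unwind`, as mentioned above. The following is a minimal std-only sketch, not a recommended long-term design: `split_at_tag_len` is a hypothetical stand-in for the failing split, and `contained_split` is an assumed wrapper name. It only helps when the binary is built with unwinding panics (the default); with `panic = "abort"` the process still terminates.

```rust
use std::panic;

/// Stand-in for the failing step: cut `input` at the tag's byte length and
/// return (rest, matched). `str::split_at` panics when that offset is not
/// a char boundary. (Hypothetical helper, not nom's code.)
fn split_at_tag_len<'a>(input: &'a str, tag: &str) -> (&'a str, &'a str) {
    let (matched, rest) = input.split_at(tag.len());
    (rest, matched)
}

/// Runs the split inside `catch_unwind`, turning a panic into `None`
/// instead of unwinding through the caller.
fn contained_split<'a>(input: &'a str, tag: &str) -> Option<(&'a str, &'a str)> {
    panic::catch_unwind(|| split_at_tag_len(input, tag)).ok()
}
```

The default panic hook still prints the panic message to stderr, so this hides the crash from the caller but not from the logs.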

Suggested fix

After confirming a case-insensitive match, slice the input by the input byte length of the matched prefix, not the tag's byte length. Concretely, in bytes/complete.rs::tag_no_case, derive the slice length from iterating the input's chars and summing c.len_utf8() for as many chars as the tag has, not from tag.input_len().

Equivalent fix at the Compare trait level: have compare_no_case return the matched input prefix length alongside CompareResult::Ok.
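A minimal sketch of the first suggestion, written against the standard library only so it stays independent of nom's internals. The names `matched_prefix_len` and `split_no_case` are hypothetical, and the lowercase comparison is assumed to work the way the root-cause section describes. The point is that the split offset is accumulated from the input's characters, so it always falls on a character boundary.

```rust
/// Byte length of the prefix of `input` that matches `tag` case-insensitively,
/// or `None` when the input does not start with the tag (including when the
/// input is too short). The length is summed from the INPUT's characters.
fn matched_prefix_len(input: &str, tag: &str) -> Option<usize> {
    let mut input_chars = input.chars();
    let mut len = 0;
    for t in tag.chars() {
        let c = input_chars.next()?;
        if !c.to_lowercase().eq(t.to_lowercase()) {
            return None;
        }
        len += c.len_utf8();
    }
    Some(len)
}

/// Hypothetical std-only stand-in for tag_no_case: returns (rest, matched).
fn split_no_case<'a>(input: &'a str, tag: &str) -> Option<(&'a str, &'a str)> {
    let n = matched_prefix_len(input, tag)?;
    // `n` is a sum of whole-character lengths, so this cannot panic.
    let (matched, rest) = input.split_at(n);
    Some((rest, matched))
}
```

With this approach `split_no_case("\u{212A}xyz", "k")` yields the remainder "xyz" and the 3-byte matched prefix instead of panicking. A real patch would also need to map the too-short case to Incomplete in the streaming variant, which this sketch does not model.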

Environment

  • nom: 7.1.3
  • Rust: 1.80+

Other affected codepoints

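U+0130 (İ → i̇, 2 → 3), U+1E9E (capital sharp s, ẞ → ß, 3 → 2), and various Greek letters. U+017F (ſ → s, 2 → 1) and U+1FBE (ι → ι, 3 → 2) change length under Unicode case folding, but char::to_lowercase() maps both to themselves, so they are affected only if the comparison moves to full case folding.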
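Byte-length changes like these can be checked mechanically. The sketch below uses a hypothetical helper, `lowercase_len_change`, which reports a character's UTF-8 length before and after `char::to_lowercase()`. Note that `to_lowercase()` applies Unicode lowercase mappings rather than case folding, so its results can differ from a case-folding table.

```rust
/// Returns (UTF-8 bytes of `c`, UTF-8 bytes of `c.to_lowercase()`).
/// The lowercase form may expand to more than one char, so lengths are summed.
fn lowercase_len_change(c: char) -> (usize, usize) {
    let lowered: usize = c.to_lowercase().map(char::len_utf8).sum();
    (c.len_utf8(), lowered)
}

/// Collects every Unicode scalar value whose lowercase form has a different
/// UTF-8 byte length, as a starting point for a regression-test corpus.
fn length_changing_chars() -> Vec<char> {
    (0..=0x10FFFFu32)
        .filter_map(char::from_u32)
        .filter(|&c| {
            let (before, after) = lowercase_len_change(c);
            before != after
        })
        .collect()
}
```

For example, `lowercase_len_change('\u{212A}')` gives `(3, 1)` and `lowercase_len_change('\u{0130}')` gives `(2, 3)`, while `lowercase_len_change('\u{017F}')` gives `(2, 2)` because long s is already lowercase.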
