tag_no_case panics on Unicode chars whose lowercase has different byte length (e.g. U+212A KELVIN SIGN) #1883

@zhangjiashuo-cs

Description

tag_no_case panics with "byte index N is not a char boundary" when applied to &str input whose first character case-folds to a shorter UTF-8 representation than itself (e.g. U+212A KELVIN SIGN → ASCII 'k', or U+2126 OHM SIGN → 'ω').

Any nom-based parser that uses tag_no_case on attacker-controlled &str input can be crashed (DoS) by a single character.

Reproducer (nom 7.1.3, also reproduces on tip; cargo run --release)

Cargo.toml:

[dependencies]
nom = "7.1.3"

src/main.rs:

use nom::{bytes::complete::tag_no_case, error::Error, IResult};

fn main() {
    // U+212A (KELVIN SIGN, 3 bytes) case-folds to ASCII 'k' (1 byte)
    let input = "\u{212A}xyz";        // 6 bytes
    let _: IResult<&str, &str, Error<&str>> = tag_no_case("k")(input);
    //  ^ panic: "end byte index 1 is not a char boundary; it is inside 'K' (bytes 0..3) of `Kxyz`"
}

Output:

thread 'main' panicked at .../core/src/str/mod.rs:849:21:
end byte index 1 is not a char boundary; it is inside 'K' (bytes 0..3) of `Kxyz`

Similarly:

  • tag_no_case("\u{03C9}") on input starting with "\u{2126}" (OHM → omega): panics
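  • The same input through tag_no_case as &[u8] works fine: the byte-level compare folds only ASCII letters, so each input byte pairs with exactly one tag byte and the matched prefix is always exactly tag.input_len() bytes long. Only the &str path panics.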
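To check which characters behave like U+212A and U+2126 without depending on nom, a std-only sketch can compare each character's UTF-8 length with the length of its char::to_lowercase() output (the same per-char lowercasing the &str comparison uses). The helper names lowercase_utf8_len and shrinking_chars are hypothetical; the scan simply enumerates every Unicode scalar value.

```rust
/// UTF-8 byte length of `c` after char::to_lowercase(), which may yield
/// more than one char. Hypothetical helper for this sketch.
fn lowercase_utf8_len(c: char) -> usize {
    c.to_lowercase().map(char::len_utf8).sum()
}

/// Every Unicode scalar value whose lowercase form is *shorter* in UTF-8
/// than the character itself.
fn shrinking_chars() -> Vec<char> {
    (0..=u32::from(char::MAX))
        .filter_map(char::from_u32) // skips the surrogate range
        .filter(|&c| lowercase_utf8_len(c) < c.len_utf8())
        .collect()
}

fn main() {
    // ASCII 'K' for contrast, then the two characters from this report.
    for c in ['K', '\u{212A}', '\u{2126}'] {
        println!(
            "U+{:04X}: {} byte(s), lowercase is {} byte(s)",
            c as u32,
            c.len_utf8(),
            lowercase_utf8_len(c)
        );
    }
    let shrinking = shrinking_chars();
    println!("{} chars lowercase to a shorter UTF-8 encoding", shrinking.len());
}
```

The exact count depends on the Unicode version bundled with the Rust toolchain, so it is printed rather than hard-coded.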

Root cause

compare_no_case() in impl Compare<&str> for &str (src/traits.rs) compares the two strings character by character using char::to_lowercase():

let pos = self
    .chars()
    .zip(t.chars())
    .position(|(a, b)| a.to_lowercase().ne(b.to_lowercase()));

This correctly returns CompareResult::Ok for "\u{212A}xyz" vs "k", because '\u{212A}'.to_lowercase() yields exactly one char, 'k'.
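A std-only model of that comparison makes the mismatch concrete: the per-char check accepts the pair, yet the tag's byte length is not a char boundary in the input. The CompareResult enum below is a local stand-in for nom's type, and the final Ok/Incomplete length check is an assumption modeled on the nom 7.1.3 source rather than copied from it.

```rust
/// Local stand-in for nom's CompareResult so the sketch is std-only.
#[derive(Debug, PartialEq)]
enum CompareResult {
    Ok,
    Incomplete,
    Error,
}

/// Mirrors the per-char comparison quoted above. The byte-length check
/// that picks Ok vs Incomplete is an assumption about nom's behavior.
fn compare_no_case(input: &str, tag: &str) -> CompareResult {
    let pos = input
        .chars()
        .zip(tag.chars())
        .position(|(a, b)| a.to_lowercase().ne(b.to_lowercase()));
    match pos {
        Some(_) => CompareResult::Error,
        None if input.len() >= tag.len() => CompareResult::Ok,
        None => CompareResult::Incomplete,
    }
}

fn main() {
    let input = "\u{212A}xyz";
    let tag = "k";
    // The case-insensitive comparison accepts the match...
    println!("{:?}", compare_no_case(input, tag));
    // ...but the tag's byte length (1) falls inside U+212A in the input.
    println!(
        "tag len = {}, char boundary in input? {}",
        tag.len(),
        input.is_char_boundary(tag.len())
    );
}
```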

But tag_no_case then slices the input using the byte length of the tag (tag.input_len() = 1 for "k"), not the byte length of the matched prefix in the input (3 bytes for U+212A):

let tag_len = tag.input_len();
...
CompareResult::Ok => Ok(i.take_split(tag_len)),   // split_at(1) panics — char boundary at byte 3

split_at(1) on "Kxyz" (where 'K' occupies bytes 0..3) panics.
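The panic itself can be isolated with the standard library alone. This sketch assumes the default panic = "unwind" strategy, so that std::panic::catch_unwind can observe the panic; it shows that byte 1 of "\u{212A}xyz" is not a char boundary, that the checked accessor str::get returns None there, and that str::split_at panics at the same offset.

```rust
use std::panic;

fn main() {
    let input = "\u{212A}xyz"; // U+212A occupies bytes 0..3
    let tag_len = "k".len(); // 1, the byte length of the tag "k"

    // The checked accessors report the problem instead of panicking.
    assert!(!input.is_char_boundary(tag_len));
    assert!(input.get(..tag_len).is_none());

    // split_at panics on a non-boundary index; the default panic hook
    // still prints the message to stderr before catch_unwind returns.
    let result = panic::catch_unwind(|| input.split_at(tag_len));
    assert!(result.is_err());
    println!("split_at({}) panicked as expected", tag_len);
}
```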

Suggested fix

In bytes::complete::tag_no_case, after CompareResult::Ok, compute the actual matched byte length in the input by iterating chars and summing their len_utf8() until the tag is exhausted, then take_split using that length instead of tag.input_len().
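A minimal, std-only sketch of that fix, assuming the tag is matched one input char per tag char exactly as compare_no_case does. matched_prefix_len and tag_no_case_str are hypothetical names, not nom API; the helper re-checks the match itself so it can stand alone, and it folds nom's Incomplete and Error outcomes into None.

```rust
/// Byte length of the prefix of `input` that matches `tag` case-insensitively,
/// pairing one input char with one tag char like compare_no_case.
/// Returns None on a mismatch or if the input runs out first.
/// Hypothetical helper, not part of nom's API.
fn matched_prefix_len(input: &str, tag: &str) -> Option<usize> {
    let mut input_chars = input.chars();
    let mut consumed = 0;
    for t in tag.chars() {
        let c = input_chars.next()?; // input exhausted before the tag
        if c.to_lowercase().ne(t.to_lowercase()) {
            return None;
        }
        consumed += c.len_utf8(); // count the *input* char's bytes, not the tag's
    }
    Some(consumed)
}

/// Corrected slicing step: split at the matched length and return
/// (remaining, matched), the same order take_split uses.
fn tag_no_case_str<'a>(tag: &str, input: &'a str) -> Option<(&'a str, &'a str)> {
    let len = matched_prefix_len(input, tag)?;
    let (matched, rest) = input.split_at(len);
    Some((rest, matched))
}

fn main() {
    println!("{:?}", tag_no_case_str("k", "\u{212A}xyz"));
    println!("{:?}", tag_no_case_str("\u{03C9}", "\u{2126}!"));
    println!("{:?}", tag_no_case_str("keep", "KEEP-alive"));
    println!("{:?}", tag_no_case_str("k", "xyz"));
}
```

Because the split point is accumulated from the input's own len_utf8() values, it always lands on a char boundary, so split_at cannot panic.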

Alternatively: in compare_no_case, return both CompareResult and the consumed byte count, and have tag_no_case use that.
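The alternative can be sketched as a single pass that returns the comparison result together with the number of input bytes consumed. The names and the (CompareResult, usize) return type are illustrative assumptions, not nom's Compare trait; note that Incomplete here means the input ran out of chars before the tag did, rather than a comparison of total byte lengths.

```rust
/// Local stand-in for nom's CompareResult so the sketch is std-only.
#[derive(Debug, PartialEq)]
enum CompareResult {
    Ok,
    Incomplete,
    Error,
}

/// Hypothetical single-pass comparison that also reports how many *input*
/// bytes the match consumed. Not nom's actual trait signature.
fn compare_no_case_len(input: &str, tag: &str) -> (CompareResult, usize) {
    let mut consumed = 0;
    let mut input_chars = input.chars();
    for t in tag.chars() {
        match input_chars.next() {
            // Input ended before the tag did.
            None => return (CompareResult::Incomplete, consumed),
            Some(c) if c.to_lowercase().ne(t.to_lowercase()) => {
                return (CompareResult::Error, consumed);
            }
            // Advance by the input char's own width, not the tag char's.
            Some(c) => consumed += c.len_utf8(),
        }
    }
    (CompareResult::Ok, consumed)
}

/// Hypothetical caller: split at the reported length, returning
/// (remaining, matched) in the same order as take_split.
fn take_tag<'a>(input: &'a str, tag: &str) -> Option<(&'a str, &'a str)> {
    match compare_no_case_len(input, tag) {
        (CompareResult::Ok, n) => {
            let (matched, rest) = input.split_at(n);
            Some((rest, matched))
        }
        _ => None,
    }
}

fn main() {
    println!("{:?}", compare_no_case_len("\u{212A}xyz", "k"));
    println!("{:?}", take_tag("\u{212A}xyz", "k"));
}
```

Returning the consumed length from the comparison keeps the "how far did we match" knowledge in the one place that walked the characters, instead of reconstructing it from the tag.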

Threat model

Any nom-based parser that uses tag_no_case on attacker-controlled &str input. Concrete examples in the wild:

  • HTTP header parsers matching keep-alive, content-type, etc.
  • Config / DSL parsers matching keywords case-insensitively.
  • Protocol keyword detection.

A single character in a network request (U+212A, the three UTF-8 bytes E2 84 AA) is enough to crash the worker thread or process.
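For reference, this is what the reproducer's input looks like before a &str-based parser sees it. The std-only check below confirms that the bytes E2 84 AA are valid UTF-8 for U+212A and that the character lowercases to ASCII 'k'; the byte values are derived from the UTF-8 encoding of U+212A, not taken from a captured request.

```rust
fn main() {
    // The reproducer's input as raw bytes: E2 84 AA is U+212A, then "xyz".
    let wire: &[u8] = &[0xE2, 0x84, 0xAA, b'x', b'y', b'z'];
    // A &str-based parser must validate UTF-8 first; this input is valid.
    let text = std::str::from_utf8(wire).expect("valid UTF-8");
    assert_eq!(text, "\u{212A}xyz");

    let first = text.chars().next().expect("non-empty input");
    // One character, three bytes, lowercasing to the one-byte ASCII 'k'.
    assert_eq!(first.len_utf8(), 3);
    assert_eq!(first.to_lowercase().collect::<String>(), "k");
    println!("U+{:04X} decodes from {} bytes", first as u32, first.len_utf8());
}
```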

Environment

  • nom: 7.1.3 (also reproduces in 8.x per code inspection)
  • rustc: 1.95.0

Happy to submit a PR.
