Description
tag_no_case panics with "byte index N is not a char boundary" when applied to &str input whose first character case-folds to a shorter UTF-8 representation than itself (e.g. U+212A KELVIN SIGN → ASCII 'k', or U+2126 OHM SIGN → 'ω').
Any nom-based parser that uses tag_no_case on attacker-controlled &str input can be crashed (DoS) by a single character.
Reproducer (nom 7.1.3, also reproduces on tip; run with cargo run --release)
Cargo.toml:
[dependencies]
nom = "7.1.3"
src/main.rs:
use nom::{bytes::complete::tag_no_case, error::Error, IResult};
fn main() {
    // U+212A (KELVIN SIGN, 3 bytes) case-folds to ASCII 'k' (1 byte)
    let input = "\u{212A}xyz"; // 6 bytes
    let _: IResult<&str, &str, Error<&str>> = tag_no_case("k")(input);
    // ^ panic: "end byte index 1 is not a char boundary; it is inside 'K' (bytes 0..3) of `Kxyz`"
}
Output:
thread 'main' panicked at .../core/src/str/mod.rs:849:21:
end byte index 1 is not a char boundary; it is inside 'K' (bytes 0..3) of `Kxyz`
Similarly:
- tag_no_case("\u{03C9}") on input starting with "\u{2126}" (OHM → omega): panics
- The same input through tag_no_case as &[u8] works fine (byte-level compare, no Unicode folding); only the &str path panics.
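The size mismatch behind both cases can be checked with the standard library alone. The helper below is purely illustrative (fold_lengths is a made-up name, not part of nom); it reports the UTF-8 length of a character next to the UTF-8 length of its char::to_lowercase() mapping.

```rust
// Illustrative helper, not part of nom: returns
// (UTF-8 length of `c`, UTF-8 length of `c.to_lowercase()`).
fn fold_lengths(c: char) -> (usize, usize) {
    // to_lowercase() yields an iterator of chars, so sum their encoded lengths.
    let lowered: usize = c.to_lowercase().map(char::len_utf8).sum();
    (c.len_utf8(), lowered)
}
```

fold_lengths('\u{212A}') gives (3, 1) and fold_lengths('\u{2126}') gives (3, 2), so in both cases the matched input character is longer than the tag character it compares equal to.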
Root cause
Compare<&str> for &str::compare_no_case() (src/traits.rs) compares character-by-character using char::to_lowercase():
let pos = self
    .chars()
    .zip(t.chars())
    .position(|(a, b)| a.to_lowercase().ne(b.to_lowercase()));
This correctly returns CompareResult::Ok for "Kxyz" vs "k" (because 'K'.to_lowercase() == "k").
But tag_no_case then slices the input using the byte length of the tag (tag.input_len() = 1 for "k"), not the byte length of the matched prefix in the input (3 bytes for U+212A):
let tag_len = tag.input_len();
...
CompareResult::Ok => Ok(i.take_split(tag_len)), // split_at(1) panics — char boundary at byte 3
split_at(1) on "Kxyz" (where 'K' occupies bytes 0..3) panics.
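The two steps can be modelled without nom to make the mismatch visible. This is a simplified sketch, not nom's actual code: prefix_matches_no_case and split_at_tag_len are hypothetical names, the first mirrors only the character comparison quoted above, and the second returns None where the real split_at would panic.

```rust
// Simplified model of the quoted comparison (hypothetical name, not nom's API):
// compare input and tag char by char after lowercasing.
fn prefix_matches_no_case(input: &str, tag: &str) -> bool {
    input
        .chars()
        .zip(tag.chars())
        .all(|(a, b)| a.to_lowercase().eq(b.to_lowercase()))
}

// Non-panicking stand-in for splitting at the tag's byte length
// (hypothetical name): None where `split_at(tag.len())` would panic.
fn split_at_tag_len<'a>(input: &'a str, tag: &str) -> Option<(&'a str, &'a str)> {
    let n = tag.len();
    if input.is_char_boundary(n) {
        Some(input.split_at(n))
    } else {
        None
    }
}
```

For the input "\u{212A}xyz" and the tag "k", the comparison succeeds but split_at_tag_len returns None, because byte offset 1 falls inside the three-byte KELVIN SIGN.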
Suggested fix
In bytes::complete::tag_no_case, after CompareResult::Ok, compute the actual matched byte length in the input by iterating chars and summing their len_utf8() until the tag is exhausted, then take_split using that length instead of tag.input_len().
Alternatively: in compare_no_case, return both CompareResult and the consumed byte count, and have tag_no_case use that.
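A minimal sketch of the first option, written against std only so it stands apart from nom's traits. The names matched_prefix_len and split_no_case are hypothetical, and the sketch collapses nom's Error and Incomplete outcomes into None for brevity; a real patch would need to keep them distinct.

```rust
// Hypothetical helper: byte length of the prefix of `input` that matches
// `tag` case-insensitively, measured in the input's own bytes.
fn matched_prefix_len(input: &str, tag: &str) -> Option<usize> {
    let mut input_chars = input.chars();
    let mut consumed = 0;
    for t in tag.chars() {
        // Input ran out before the tag did: no full match.
        let c = input_chars.next()?;
        if !c.to_lowercase().eq(t.to_lowercase()) {
            return None;
        }
        // Count the bytes of the input char, not of the tag char.
        consumed += c.len_utf8();
    }
    Some(consumed)
}

// Hypothetical wrapper returning (remaining, matched), the order nom uses.
fn split_no_case<'a>(input: &'a str, tag: &str) -> Option<(&'a str, &'a str)> {
    let n = matched_prefix_len(input, tag)?;
    // `n` is a sum of whole-char lengths from the start, so it is a char boundary.
    let (matched, rest) = input.split_at(n);
    Some((rest, matched))
}
```

With this approach split_no_case("\u{212A}xyz", "k") returns the remainder "xyz" and the three-byte matched prefix instead of panicking, and plain ASCII inputs behave as before.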
Threat model
Any nom-based parser that uses tag_no_case on attacker-controlled &str input is affected. Typical examples:
- HTTP header parsers matching keep-alive, content-type, etc.
- Config / DSL parsers matching keywords case-insensitively.
- Protocol keyword detection.
A single character in a network request (the three UTF-8 bytes of U+212A: E2 84 AA) is enough to crash the worker thread or process.
Environment
- nom: 7.1.3 (8.x also appears affected, based on code inspection)
- rustc: 1.95.0
Happy to submit a PR.