tag_no_case panics on Unicode chars whose lowercase has different byte length (e.g. U+212A KELVIN SIGN)

### Description

`tag_no_case` panics with `byte index N is not a char boundary` when applied to `&str` input whose first character case-folds to a shorter UTF-8 representation than itself (e.g. **U+212A KELVIN SIGN** → ASCII `'k'`, or **U+2126 OHM SIGN** → `'ω'`).

Any nom-based parser that uses `tag_no_case` on attacker-controlled `&str` input can be crashed (DoS) by a single character.

### Reproducer (nom 7.1.3, also reproduces on tip; run with `cargo run --release`)

`Cargo.toml`:
```toml
[dependencies]
nom = "7.1.3"
```

`src/main.rs`:
```rust
use nom::{bytes::complete::tag_no_case, error::Error, IResult};

fn main() {
    // U+212A (KELVIN SIGN, 3 bytes) case-folds to ASCII 'k' (1 byte)
    let input = "\u{212A}xyz";        // 6 bytes
    let _: IResult<&str, &str, Error<&str>> = tag_no_case("k")(input);
    //  ^ panic: "end byte index 1 is not a char boundary; it is inside 'K' (bytes 0..3) of `Kxyz`"
}
```

Output:
```
thread 'main' panicked at .../core/src/str/mod.rs:849:21:
end byte index 1 is not a char boundary; it is inside 'K' (bytes 0..3) of `Kxyz`
```

Similarly:
- `tag_no_case("\u{03C9}")` on input starting with `"\u{2126}"` (OHM → omega): panics
- The same input through `tag_no_case` as `&[u8]` works fine (byte-level compare with ASCII-only folding, no Unicode lowercasing); only the `&str` path panics.
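
As a quick sanity check of the byte lengths involved, the following std-only sketch prints the UTF-8 length of each character before and after `char::to_lowercase()`. The helper name `byte_lengths` is made up for illustration and is not part of nom.

```rust
/// Hypothetical helper: UTF-8 byte length of `c` and of its lowercase form.
fn byte_lengths(c: char) -> (usize, usize) {
    let lowered: String = c.to_lowercase().collect();
    (c.len_utf8(), lowered.len())
}

fn main() {
    // KELVIN SIGN, OHM SIGN, and ASCII 'K' as a control case.
    for c in ['\u{212A}', '\u{2126}', 'K'] {
        let (orig, lower) = byte_lengths(c);
        println!(
            "U+{:04X}: {} byte(s) -> {} byte(s) after to_lowercase()",
            c as u32, orig, lower
        );
    }
}
```

Only the ASCII control case keeps the same byte length, which is why plain ASCII tags and inputs never hit the panic.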

### Root cause

The `compare_no_case` method of `impl Compare<&str> for &str` (`src/traits.rs`) compares character by character using `char::to_lowercase()`:

```rust
let pos = self
    .chars()
    .zip(t.chars())
    .position(|(a, b)| a.to_lowercase().ne(b.to_lowercase()));
```

This correctly returns `CompareResult::Ok` for `"\u{212A}xyz"` vs `"k"` (because `'\u{212A}'.to_lowercase()` yields `'k'`).

But `tag_no_case` then slices the input using the **byte length of the tag** (`tag.input_len()` = 1 for `"k"`), not the byte length of the matched prefix in the input (3 bytes for U+212A):

```rust
let tag_len = tag.input_len();
...
CompareResult::Ok => Ok(i.take_split(tag_len)),   // split_at(1) panics — char boundary at byte 3
```

`split_at(1)` on `"Kxyz"` (where `'K'` occupies bytes 0..3) panics.
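
The mismatch can be observed without nom at all. The sketch below is std-only; `prefix_matches_no_case` is a hypothetical helper that mirrors the char-by-char comparison quoted above, and it is not nom's actual code path.

```rust
// Std-only illustration of the mismatch; no nom dependency.
// `prefix_matches_no_case` is a hypothetical helper mirroring the
// char-by-char lowercase comparison.
fn prefix_matches_no_case(input: &str, tag: &str) -> bool {
    input
        .chars()
        .zip(tag.chars())
        .all(|(a, b)| a.to_lowercase().eq(b.to_lowercase()))
}

fn main() {
    let input = "\u{212A}xyz"; // KELVIN SIGN (3 bytes) + "xyz"
    let tag = "k"; // 1 byte

    // The comparison succeeds: both sides lowercase to 'k'.
    println!("matches: {}", prefix_matches_no_case(input, tag));

    // But the tag's byte length falls inside the first char of the input,
    // so `input.split_at(tag.len())` would panic.
    println!("tag.len() = {}", tag.len());
    println!("is_char_boundary: {}", input.is_char_boundary(tag.len()));

    // The first valid boundary after the matched char is its UTF-8 length.
    let first_len = input.chars().next().map_or(0, char::len_utf8);
    println!("matched prefix length: {}", first_len);
}
```

The comparison and the split disagree about how many bytes were matched: the comparison walks chars, while the split uses the tag's byte count.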

### Suggested fix

In `bytes::complete::tag_no_case`, after `CompareResult::Ok`, compute the actual matched byte length in the input by iterating chars and summing their `len_utf8()` until the tag is exhausted, then `take_split` using that length instead of `tag.input_len()`.

Alternatively: in `compare_no_case`, return both `CompareResult` and the consumed byte count, and have `tag_no_case` use that.
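
A minimal sketch of the length computation, written against `&str` only and independent of nom's traits. The function name `matched_len_no_case` is hypothetical, and unlike `compare_no_case` it does not distinguish a mismatch from incomplete input; a real patch would need to keep that distinction.

```rust
/// Hypothetical helper: number of bytes of `input` consumed by a
/// case-insensitive match of `tag`, or `None` if the tag does not match
/// (including the case where `input` runs out before `tag` does).
fn matched_len_no_case(input: &str, tag: &str) -> Option<usize> {
    let mut input_chars = input.chars();
    let mut consumed = 0;
    for t in tag.chars() {
        let i = input_chars.next()?;
        // Same per-char comparison as the existing code.
        if !i.to_lowercase().eq(t.to_lowercase()) {
            return None;
        }
        // Count the bytes of the *input* char, not of the tag char.
        consumed += i.len_utf8();
    }
    Some(consumed)
}

fn main() {
    let input = "\u{212A}xyz";
    if let Some(len) = matched_len_no_case(input, "k") {
        // `len` is always a char boundary in `input`, so this cannot panic.
        let (matched, rest) = input.split_at(len);
        println!("matched {} byte(s), rest = {:?}", matched.len(), rest);
    }
}
```

Because the length is accumulated from the input's own chars, the resulting split point is a char boundary by construction.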

### Threat model

Any nom-based parser that uses `tag_no_case` on attacker-controlled `&str` input. Concrete examples in the wild:
- HTTP header parsers matching `keep-alive`, `content-type`, etc.
- Config / DSL parsers matching keywords case-insensitively.
- Protocol keyword detection.

A single character in a network request (the three UTF-8 bytes of U+212A, `E2 84 AA`) is enough to crash the worker thread / process.

### Environment

- nom: 7.1.3 (also reproduces in 8.x per code inspection)
- rustc: 1.95.0

Happy to submit a PR.

