`-i` matches multi-character sequences via Unicode case folding where GNU matches one #32

Under `-i`, `uu_grep` lets a single-character pattern match a *multi-character* sequence when that sequence is the Unicode full case folding of one character. Examples are `st` (the folding of the ligatures `ﬅ`/`ﬆ`), `ss` (from `ß`), and `ff`/`fi`/`ffi` (from `ﬀ`/`ﬁ`/`ﬃ`). GNU grep under `LC_ALL=C` folds case 1:1, so the same pattern matches a single character. The input here is plain ASCII, so this is not the locale/encoding limitation: the extra matching comes from the case folder, not from byte-vs-codepoint handling.
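To make the 1:1 versus one-to-many distinction concrete, here is a minimal Rust sketch. It is not `uu_grep`'s code. It approximates Unicode *full* case folding as "lowercase of the uppercase" using the standard library's `char::to_uppercase` and `char::to_lowercase`, both of which may return more than one character. That approximation is an assumption: it agrees with Unicode's `CaseFolding.txt` for the characters listed above but is not a general substitute for it. `simple_fold` uses ASCII-only lowercasing as a rough stand-in for C-locale 1:1 folding. Both helper names are hypothetical.

```rust
/// Rough stand-in for C-locale folding: ASCII-only and strictly 1:1,
/// so one input character always yields exactly one output character.
/// Non-ASCII characters are returned unchanged.
fn simple_fold(c: char) -> char {
    c.to_ascii_lowercase()
}

/// ASSUMPTION: approximates Unicode *full* case folding as
/// lowercase(uppercase(c)). This matches CaseFolding.txt for the
/// ligatures and sharp s used here, but is NOT correct in general.
fn approx_full_fold(c: char) -> String {
    c.to_uppercase().flat_map(char::to_lowercase).collect()
}

fn main() {
    // Characters named in the report, plus a plain ASCII letter for contrast.
    for c in ['ß', 'ﬅ', 'ﬆ', 'ﬀ', 'ﬁ', 'ﬃ', 's'] {
        let full = approx_full_fold(c);
        println!(
            "U+{:04X} {:?}: full fold {:?} ({} chars), simple fold {:?}",
            c as u32,
            c,
            full,
            full.chars().count(),
            simple_fold(c)
        );
    }
}
```

Each of `ß`, `ﬅ`, `ﬆ`, `ﬀ`, `ﬁ`, `ﬃ` expands to two or three characters under the full fold. That expansion is what lets a fold-aware matcher treat one bracket-expression character as equivalent to a multi-character run of plain ASCII input.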

Found by the differential fuzzer (`fuzz_grep`).

**Rust (incorrect)**
```bash
$ printf 'st\n' | ./target/release/grep -o -i '[[:alpha:]]'
st
# one match spanning two characters
```

**GNU (correct)**
```bash
$ printf 'st\n' | LC_ALL=C /usr/bin/grep -o -i '[[:alpha:]]'
s
t
# two separate single-character matches
```

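More cases, showing the `-o` output of each implementation:

| Input | Rust output | GNU output |
|-------|-------------|------------|
| `ss`  | `ss`        | `s`, `s`   |
| `ff`  | `ff`        | `f`, `f`   |
| `ffi` | `ffi`       | `f`, `f`, `i` |

The behavior is order-sensitive (`st` merges but `ts` does not) and changes match *counts*, not just `-o` output.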
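The order sensitivity is consistent with the fold tables being one-directional: `st` is the folding of some character, but `ts` is not. The hypothetical helper `chars_folding_to` below scans every Unicode scalar value and collects those whose approximate full fold equals a given string. As an assumption for illustration, full folding is approximated as lowercase-of-uppercase via the Rust standard library rather than a real `CaseFolding.txt` lookup. A fold-aware matcher can only collapse a sequence into "one character" when that set is non-empty.

```rust
/// ASSUMPTION: approximates Unicode full case folding as
/// lowercase(uppercase(c)); illustrative only, not a CaseFolding.txt lookup.
fn approx_full_fold(c: char) -> String {
    c.to_uppercase().flat_map(char::to_lowercase).collect()
}

/// Hypothetical helper: every scalar value whose approximate full fold is
/// exactly `target`. An empty result means no single character folds to
/// `target`, so a fold-aware matcher has nothing to merge it into.
fn chars_folding_to(target: &str) -> Vec<char> {
    (0..=0x10FFFFu32)
        .filter_map(char::from_u32) // skips surrogate code points
        .filter(|&c| approx_full_fold(c) == target)
        .collect()
}

fn main() {
    for seq in ["st", "ts", "ss", "ffi"] {
        let sources = chars_folding_to(seq);
        println!("{:?} is the fold of {:?}", seq, sources);
    }
}
```

Under this approximation, `"st"` maps back to `ﬅ` and `ﬆ` while `"ts"` maps back to nothing, which lines up with `st` merging and `ts` staying as two matches.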