Under -i, uu_grep lets a single-character pattern match a multi-character sequence when that sequence is the Unicode case folding of a single character: for example st (the folding of the ligatures ſt/ﬆ), ss (of ß), or ff/fi/ffi (of ﬀ/ﬁ/ﬃ). GNU grep under LC_ALL=C folds case 1:1, so a single-character pattern matches exactly one character. The input here is plain ASCII, so this is not the locale/encoding limitation: the extra matching comes from the case folder, not from byte-vs-codepoint handling.
Found by the differential fuzzer (fuzz_grep).
Rust (incorrect)
$ printf 'st\n' | ./target/release/grep -o -i '[[:alpha:]]'
st
# one match spanning two characters
GNU (correct)
$ printf 'st\n' | LC_ALL=C /usr/bin/grep -o -i '[[:alpha:]]'
s
t
# two separate single-character matches
More cases (input → Rust output vs GNU output, with / separating individual matches): ss → ss vs s/s; ff → ff vs f/f; ffi → ffi vs f/f/i. The behavior is order-sensitive (st merges but ts does not) and it changes match counts, not just -o output.
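The match-count difference can be checked with a small loop. This is only a sketch, not part of the fuzzer: the GREP variable and the count_matches helper are hypothetical names introduced here, and GREP falls back to whatever grep is on PATH. Under 1:1 case folding, the number of -o matches should equal the input length for every ASCII input, so any row where the two numbers differ points at the merging described above.

```shell
#!/bin/sh
# Hypothetical helper: GREP selects the binary under test.
# Assumption: it accepts the same -o and -i flags used in the report.
GREP="${GREP:-grep}"

count_matches() {
    # $1 = one line of ASCII input.
    # Prints how many separate matches -o emits for a one-character class.
    printf '%s\n' "$1" | LC_ALL=C "$GREP" -o -i '[[:alpha:]]' | wc -l | tr -d ' '
}

# Columns: input, input length, number of matches.
for input in st ts ss ff fi ffi; do
    printf '%s %s %s\n' "$input" "${#input}" "$(count_matches "$input")"
done
```

To compare implementations, run it once as is and once with GREP=./target/release/grep, then diff the two outputs.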