Hi there -- my usual data-cleaning pipeline is normalize-punctuation.perl | remove-non-printing-char.perl | tokenizer.perl, but it doesn't remove non-breaking spaces (U+00A0), which can break some things downstream. Any chance we can add removal of non-breaking spaces? In UTF-8 the character is encoded as the two bytes 0xC2 0xA0 (source: https://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=A0&mode=hex)
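
For illustration, here is a rough sketch in Python of the kind of filter I have in mind. It is not part of the existing scripts. The names `replace_nbsp` and `filter_stream` are made up for this example, and swapping the non-breaking space for a plain space (rather than deleting it outright) is my own assumption, chosen so adjacent words don't get glued together. Pass an empty string as `replacement` to delete instead.

```python
from typing import BinaryIO

# NO-BREAK SPACE, U+00A0; its UTF-8 encoding is the byte pair 0xC2 0xA0.
NBSP = "\u00a0"


def replace_nbsp(text: str, replacement: str = " ") -> str:
    """Return text with every U+00A0 replaced (hypothetical helper).

    The default replacement of a plain space is an assumption;
    use replacement="" to remove the character entirely.
    """
    return text.replace(NBSP, replacement)


def filter_stream(src: BinaryIO, dst: BinaryIO, replacement: str = " ") -> None:
    """Copy src to dst line by line, treating both as UTF-8 (hypothetical helper)."""
    for raw in src:
        line = raw.decode("utf-8")
        dst.write(replace_nbsp(line, replacement).encode("utf-8"))
```

To use it as a pipeline step, one could call `filter_stream(sys.stdin.buffer, sys.stdout.buffer)` from a small script placed before tokenizer.perl. Working on the binary streams keeps the encoding fixed at UTF-8 regardless of the locale.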