Token robustness

@danieldk danieldk released this 13 Aug 08:44
  • Improve reading of embeddings that contain unicode whitespace in tokens.
  • Add lossy variants of the text/word2vec/fasttext reading methods. The lossy variants can read tokens that contain invalid UTF-8 byte sequences, which the regular methods do not accept.
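
The two changes above can be illustrated with a small standalone sketch. It is not this library's API: `parse_line` and `decode_token` are hypothetical helpers, and the sketch assumes that a text-format line separates the token from its vector components with ASCII spaces and that a lossy read substitutes U+FFFD for invalid bytes. The release notes do not confirm either detail.

```python
from typing import List, Tuple


def parse_line(line: str) -> Tuple[str, List[float]]:
    """Split a text-format embedding line into a token and its components.

    Splitting on the ASCII space only (rather than on any whitespace) keeps
    tokens that contain Unicode whitespace, such as U+00A0, in one piece.
    """
    token, *components = line.rstrip("\n").split(" ")
    return token, [float(c) for c in components]


def decode_token(raw: bytes, lossy: bool = False) -> str:
    """Decode a token's bytes as UTF-8.

    In strict mode invalid byte sequences raise UnicodeDecodeError.
    In lossy mode they are replaced with U+FFFD (an assumption about
    what "lossy" means here).
    """
    return raw.decode("utf-8", errors="replace" if lossy else "strict")


# A token with a no-break space survives parsing intact.
token, vector = parse_line("New\u00a0York 0.1 0.2\n")

# An invalid byte is tolerated only by the lossy variant.
recovered = decode_token(b"caf\xff", lossy=True)
```

With a plain `line.split()` the first example would break `New\u00a0York` into two fields, because Python treats the no-break space as whitespace. The lossy decode trades exactness for robustness: the token can be read, but the original bytes cannot be recovered from it.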