Address feedback on the tokenizer library (dotnet#7024)
* Fix cache when calling EncodeToIds
* Make EnglishRoberta _mergeRanks thread safe
* Delete Trainer
* Remove the setters on the Bpe properties
* Remove Roberta and Tiktoken special casing in the Tokenizer and support the cases in the Model abstraction
* Support the text-embedding-3-small/large embedding models
* Remove redundant TokenToId abstraction and keep the one with the extra parameters
* Enable creating Tiktoken asynchronously or directly using the tokenizer data
* Add cancellationToken support in CreateAsync APIs
* Rename sequence to text and Tokenize to Encode
* Rename skipSpecialTokens to considerSpecialTokens
* Rename TokenizerResult to EncodingResult
* Make Token publicly immutable
* Change offset tuples from (Index, End) to (Index, Length)
* Rename NormalizedString method's parameters
* Rename Model's methods to start with a verb
* Convert Model.GetVocab() method to a Vocab property
* Rename some method parameters and variables
* Remove Vocab and VocabSize from the abstraction
* Clean up normalization support
* Minor Bpe cleanup
* Resolve rebase change
* Address the feedback