Releases · finalfusion/finalfusion-rust
0.18.0
0.17.2
- Add `WriteEmbeddings::write_embeddings_len` (sketched below). This method returns the serialized length of embeddings in finalfusion format, without performing any serialization.
- Add `WriteChunk::chunk_len`. This method returns the serialized length of a finalfusion chunk, without performing any serialization.
- Switch the license to Apache License 2.0 or MIT.
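A minimal sketch of using the new length method to preallocate an in-memory buffer before serializing. The file name is a placeholder, and the `0` offset argument to `write_embeddings_len` is an assumption, not a documented signature:

```rust
use std::fs::File;
use std::io::{BufReader, Cursor};

use finalfusion::io::WriteEmbeddings;
use finalfusion::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut reader = BufReader::new(File::open("embeddings.fifu")?);
    let embeddings = Embeddings::<VocabWrap, StorageWrap>::read_embeddings(&mut reader)?;

    // Query the serialized size without serializing anything.
    let len = embeddings.write_embeddings_len(0);

    // Preallocate exactly the right amount of memory, then serialize.
    let mut buf = Vec::with_capacity(len as usize);
    embeddings.write_embeddings(&mut Cursor::new(&mut buf))?;
    assert_eq!(buf.len(), len as usize);

    Ok(())
}
```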
Add support for Floret embeddings
- Add support for reading, writing, and using Floret embeddings.
- Add a finalfusion chunk type for Floret-like vocabularies.
- Add support for batched embedding lookups (`embedding_batch` and `embedding_batch_into`); see the sketch after this list.
- Improve error handling:
  - Mark wrapped errors using `#[source]` to get better chains of error messages.
  - Split `Error::Io` into `Error::Read` and `Error::Write`.
  - Rename some `Error` variants.
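A sketch of a batched lookup. The assumption here is that `embedding_batch` returns the embedding matrix together with a per-word mask of which words could be embedded; the exact return type may differ:

```rust
use std::fs::File;
use std::io::BufReader;

use finalfusion::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut reader = BufReader::new(File::open("embeddings.fifu")?);
    let embeddings = Embeddings::<VocabWrap, StorageWrap>::read_embeddings(&mut reader)?;

    // Look up a whole batch of words in one call.
    let words = vec!["berlin", "amsterdam", "oslo"];
    let (matrix, found) = embeddings.embedding_batch(&words);

    println!("batch shape: {:?}", matrix.shape());
    for (word, found) in words.iter().zip(found) {
        println!("{}: embedded = {}", word, found);
    }

    Ok(())
}
```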
Subword vocabulary conversion
- Add conversion from bucketed subword to explicit subword embeddings.
- Hide the `WordSimilarityResult` fields. Use the `cosine_similarity` and `word` methods instead (example below).
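A minimal similarity query using the accessor methods. Reading into `StorageViewWrap` (rather than `StorageWrap`) is an assumption here, since similarity queries need a viewable storage:

```rust
use std::fs::File;
use std::io::BufReader;

use finalfusion::prelude::*;
use finalfusion::similarity::WordSimilarity;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut reader = BufReader::new(File::open("embeddings.fifu")?);
    let embeddings =
        Embeddings::<VocabWrap, StorageViewWrap>::read_embeddings(&mut reader)?;

    // The fields of WordSimilarityResult are hidden; use the
    // accessor methods to get the word and its similarity.
    if let Some(results) = embeddings.word_similarity("berlin", 10) {
        for similar in results {
            println!("{}\t{}", similar.word(), similar.cosine_similarity());
        }
    }

    Ok(())
}
```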
Faster lookup of OPQ-quantized embeddings
- Make lookups of unknown words in OPQ-quantized embedding matrices 2.6× faster, resulting in roughly 1.6× faster lookups overall.
- Add the `Reconstruct` trait as a counterpart to `Quantize`. This trait can be used to reconstruct quantized embedding matrices and is much faster than reconstructing individual embeddings (sketched below).
- Add more I/O checks to ensure that the embedding matrix can actually be represented in the native `usize`.
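A sketch of reconstructing a full matrix from quantized storage. The `Reconstruct` import path, the `reconstruct` method name, and its return type are assumptions based on the release notes:

```rust
use std::fs::File;
use std::io::BufReader;

use finalfusion::prelude::*;
use finalfusion::storage::{QuantizedArray, Reconstruct};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut reader = BufReader::new(File::open("quantized.fifu")?);
    let embeddings =
        Embeddings::<VocabWrap, QuantizedArray>::read_embeddings(&mut reader)?;

    // Reconstruct the whole embedding matrix in one pass; this is
    // much faster than reconstructing one embedding at a time.
    let matrix = embeddings.storage().reconstruct();
    println!("reconstructed shape: {:?}", matrix.shape());

    Ok(())
}
```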
Improved error handling
This release modernizes and improves error handling:
- Merge the `Error` and `ErrorKind` enums.
- Move the `Error` enum to the `error` module.
- Derive trait implementations using the `thiserror` crate (see the sketch below).
- Make the `Error` enum non-exhaustive.
- Replace the `ChunkIdentifier::try_from` method by an implementation of the `TryFrom` trait.
This release also feature-gates the `memmap` dependency (the `memmap` feature is enabled by default).
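Since the `thiserror` derive wires up `Display` and `Error::source`, a failure's full cause chain can be recovered with the standard library alone; a small sketch:

```rust
use std::error::Error as StdError;
use std::fs::File;
use std::io::BufReader;

use finalfusion::prelude::*;

fn main() {
    let mut reader = BufReader::new(File::open("embeddings.fifu").unwrap());
    if let Err(err) = Embeddings::<VocabWrap, StorageWrap>::read_embeddings(&mut reader) {
        // Walk the chain of causes, from the finalfusion error down
        // to the underlying I/O error.
        let mut source: Option<&(dyn StdError + 'static)> = Some(&err);
        while let Some(cause) = source {
            eprintln!("{}", cause);
            source = cause.source();
        }
    }
}
```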
Explicit n-gram vocabularies and first API-stable release
- Add `ExplicitVocab`, a subword vocabulary that stores n-grams explicitly.
- Add the `Embedding::into` method. This method realizes an embedding into a user-provided array.
- Support big-endian architectures.
- Add the `WordIndex::word` and `WordIndex::subword` methods. These will return an `Option` with the word index or subword indices, as applicable (sketched below).
- Expose the quantizer in `(Mmap)QuantizedArray` through the `quantizer` method.
- Add benchmarks for array and quantized embeddings.
- Split `WordSimilarity` into `WordSimilarity` and `WordSimilarityBy`; `EmbeddingSimilarity` into `EmbeddingSimilarity` and `EmbeddingSimilarityBy`.
- Rename `FinalfusionSubwordVocab` to `BucketSubwordVocab`.
- Expose fewer types through the prelude.
- Hide the `chunks` module. E.g. `chunks::storage` becomes `storage`.
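A sketch of inspecting a lookup with the new `WordIndex` accessors. It assumes both accessors borrow the index and that subword indices print with `Debug`; treat the exact types as assumptions:

```rust
use std::fs::File;
use std::io::BufReader;

use finalfusion::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut reader = BufReader::new(File::open("embeddings.fifu")?);
    let embeddings = Embeddings::<VocabWrap, StorageWrap>::read_embeddings(&mut reader)?;

    // An in-vocabulary word yields a word index; an out-of-vocabulary
    // word in a subword vocabulary yields subword indices.
    match embeddings.vocab().idx("unconditionally") {
        Some(index) => {
            if let Some(word) = index.word() {
                println!("word index: {}", word);
            } else if let Some(subwords) = index.subword() {
                println!("subword indices: {:?}", subwords);
            }
        }
        None => println!("cannot be represented"),
    }

    Ok(())
}
```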
Reductive 0.3
This is a small update that bumps the reductive dependency to 0.3, which contains a crucial bug fix for training product quantizers in multiple attempts. However, reductive 0.3 also requires rand 0.7, resulting in a changed API. Therefore, we have to bump the leading version number from 0.9 to 0.10.
Memory-mapped quantized arrays
- Add the `MmapQuantizedArray` storage type (sketched below).
- Rename `Vocab::len` to `Vocab::words_len`.
- Add `Vocab::vocab_len` to get the vocabulary size including subword indices.
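A sketch of memory-mapping embeddings instead of reading them into memory, via the `MmapEmbeddings` reading trait; the file name is a placeholder:

```rust
use std::fs::File;
use std::io::BufReader;

use finalfusion::io::MmapEmbeddings;
use finalfusion::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Map the storage chunk into memory instead of loading it; for a
    // quantized file, this yields memory-mapped quantized storage.
    let mut reader = BufReader::new(File::open("quantized.fifu")?);
    let embeddings = Embeddings::<VocabWrap, StorageWrap>::mmap_embeddings(&mut reader)?;

    if let Some(embedding) = embeddings.embedding("berlin") {
        println!("dims: {}", embedding.len());
    }

    Ok(())
}
```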
Token robustness
- Improve reading of embeddings that contain Unicode whitespace in tokens.
- Add lossy variants of the text/word2vec/fasttext reading methods. The lossy variants read tokens with invalid UTF-8 byte sequences (see the sketch below).
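A sketch of a lossy read for word2vec binary embeddings. The trait path, type parameters, and method signature here are assumptions patterned on the non-lossy readers:

```rust
use std::fs::File;
use std::io::BufReader;

use finalfusion::compat::word2vec::ReadWord2Vec;
use finalfusion::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut reader = BufReader::new(File::open("vectors.bin")?);

    // The lossy reader accepts tokens that contain invalid UTF-8
    // byte sequences instead of failing the whole read.
    let embeddings: Embeddings<SimpleVocab, NdArray> =
        Embeddings::read_word2vec_binary_lossy(&mut reader)?;

    println!("words read: {}", embeddings.vocab().words_len());

    Ok(())
}
```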