glowrs

Library Usage

glowrs provides an easy and familiar interface for using pre-trained models for embeddings and sentence similarity. It is inspired by the sentence-transformers library, a Python library for sentence embeddings that offers a wide range of models and utilities.

Example

use glowrs::{SentenceTransformer, Device, PoolingStrategy, Error};

fn main() -> Result<(), Error> {
    // Load a pre-trained model from the Hugging Face Hub by repository name
    let encoder = SentenceTransformer::from_repo_string("sentence-transformers/all-MiniLM-L6-v2", &Device::Cpu)?;

    let sentences = vec![
        "Hello, how are you?",
        "Hey, how are you doing?"
    ];

    // Encode the batch of sentences using mean pooling
    let embeddings = encoder.encode_batch(sentences, true, PoolingStrategy::Mean)?;

    println!("{:?}", embeddings);

    Ok(())
}

Features

  • Load models from Hugging Face Hub
  • Use hardware acceleration (Metal, CUDA; see the sketch below)
  • More to come!
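
To run on an accelerator instead of the CPU, pass a different device when loading the model. A minimal sketch, assuming glowrs re-exports candle's Device constructors (new_metal / new_cuda are assumptions here, not confirmed glowrs API) and that the corresponding build feature is enabled:

use glowrs::{Device, SentenceTransformer};

fn main() {
    // Assumption: Device follows candle's API, where new_metal(0) selects the
    // first Metal GPU. Requires the `metal` build feature; on NVIDIA hardware,
    // Device::new_cuda(0) with the `cuda` feature plays the same role.
    let device = Device::new_metal(0).expect("no Metal device available");

    let encoder = SentenceTransformer::from_repo_string(
        "sentence-transformers/all-MiniLM-L6-v2",
        &device,
    )
    .expect("failed to load model");

    // ... encode sentences as in the example above
}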

Server Usage

glowrs-server provides a web server for sentence embedding inference, using candle as its tensor framework. It currently supports BERT-type models hosted on the Hugging Face Hub, such as those provided by sentence-transformers, Tom Aarsen, or Jina AI, as long as safetensors model weights are provided.

Example usage with the jina-embeddings-v2-base-en model:

cargo run --bin glowrs-server --release -- --model-repo jinaai/jina-embeddings-v2-base-en

If you want to use a specific revision of the model, you can append it to the repository name like so:

cargo run --bin glowrs-server --release -- --model-repo jinaai/jina-embeddings-v2-base-en:main

If you want to serve multiple models, you can pass multiple model repositories to --model-repo:

cargo run --bin glowrs-server --release -- --model-repo jinaai/jina-embeddings-v2-base-en sentence-transformers/paraphrase-multilingual-mpnet-base-v2

Warning: This is not yet supported with Metal acceleration.

Instructions:

Usage: glowrs-server [OPTIONS]

Options:
  -m, --model-repo <MODEL_REPO>  
  -r, --revision <REVISION>      [default: main]
  -h, --help                     Print help

Build features

  • metal: Compile with Metal acceleration
  • cuda: Compile with CUDA acceleration
  • accelerate: Compile with Accelerate framework acceleration (CPU)
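
For example, to compile and run the server with Metal acceleration (a sketch; it assumes the feature flags above are defined on the glowrs-server package so that cargo's --features flag picks them up):

cargo run --bin glowrs-server --release --features metal -- --model-repo <MODEL_REPO>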

Docker Usage

For now, the Docker image only supports CPU inference on x86_64 and arm64.

docker run -p 3000:3000 ghcr.io/wdoppenberg/glowrs-server:latest --model-repo <MODEL_REPO>

Features

  • OpenAI API compatible (/v1/embeddings) REST API endpoint
  • candle inference for bert and jina-bert models
  • Hardware acceleration (Metal for now)
  • Queueing
  • Multiple models
  • Batching
  • Performance metrics

curl

curl -X POST http://localhost:3000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["The food was delicious and the waiter...", "was too"], 
    "model": "sentence-transformers/all-MiniLM-L6-v2",
    "encoding_format": "float"
  }'

Python openai client

Install the OpenAI Python library:

pip install openai

Then use the embeddings endpoint as you normally would:

from openai import OpenAI
from time import time

# Point the client at the local glowrs-server; any placeholder API key works
client = OpenAI(
    api_key="sk-something",
    base_url="http://127.0.0.1:3000/v1"
)

start = time()
print(client.embeddings.create(
    input=["This is a sentence that requires an embedding"] * 50,
    model="jinaai/jina-embeddings-v2-base-en"
))

print(f"Done in {time() - start:.2f}s")

# List models
print(client.models.list())

Details

  • Use TOKIO_WORKER_THREADS to set the number of threads per queue, as shown below.
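
For example, to start the server with four Tokio worker threads (the thread count here is just an illustrative value):

TOKIO_WORKER_THREADS=4 cargo run --bin glowrs-server --release -- --model-repo jinaai/jina-embeddings-v2-base-en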

Disclaimer

This is still a work in progress. The embedding performance is decent but could do with some benchmarking. Furthermore, at higher batch sizes the program is killed due to a bug.

Do not use this in a production environment.

Credits