Add utf8-converter.
jorendorff committed Nov 1, 2024
1 parent ace76c1 commit b5a2f5c
Showing 5 changed files with 1,104 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -4,6 +4,7 @@ A collection of useful algorithms written in Rust. Currently contains:

- [`geo_filters`](crates/geo_filters): probabilistic data structures that solve the [Distinct Count Problem](https://en.wikipedia.org/wiki/Count-distinct_problem) using geometric filters.
- [`bpe`](crates/bpe): fast, correct, and novel algorithms for the [Byte Pair Encoding Algorithm](https://en.wikipedia.org/wiki/Large_language_model#BPE) which are particularly useful for chunking of documents.
- [`utf8-converter`](crates/utf8-converter): converts string positions between bytes, chars, UTF-16 code units, and line numbers. Useful when sending string indices across language boundaries.

## Background

10 changes: 10 additions & 0 deletions crates/utf8-converter/Cargo.toml
@@ -0,0 +1,10 @@
[package]
authors = ["The blackbird team <[email protected]>"]
edition = "2021"
name = "utf8-converter"
version = "0.1.0"

[dependencies]
itertools = "0.13"
rand = "0.8"
rand_chacha = "0.3"
12 changes: 12 additions & 0 deletions crates/utf8-converter/README.md
@@ -0,0 +1,12 @@
# UTF-8 Converter

This crate converts string positions between Rust style (UTF-8 byte offsets) and styles used by other programming languages, as well as line numbers.

## Usage

Add this to your `Cargo.toml`:

```toml
[dependencies]
utf8-converter = "0.1"
```
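
The kinds of conversion described above can be illustrated with the Rust standard library alone. The sketch below is not this crate's API (which is not shown in this diff); the function names `byte_to_char`, `byte_to_utf16`, and `byte_to_line` are hypothetical, and each call rescans the prefix of the string, so it is meant only to show what the position styles mean, assuming the byte offset lies on a char boundary.

```rust
/// Hypothetical helper: convert a UTF-8 byte offset into a char
/// (Unicode scalar value) offset.
/// Panics if `byte_pos` is out of range or not on a char boundary.
fn byte_to_char(s: &str, byte_pos: usize) -> usize {
    s[..byte_pos].chars().count()
}

/// Hypothetical helper: convert a UTF-8 byte offset into a UTF-16
/// code unit offset, the position style used by JavaScript strings.
fn byte_to_utf16(s: &str, byte_pos: usize) -> usize {
    s[..byte_pos].encode_utf16().count()
}

/// Hypothetical helper: convert a UTF-8 byte offset into a zero-based
/// line number by counting the `\n` bytes that precede it.
fn byte_to_line(s: &str, byte_pos: usize) -> usize {
    s[..byte_pos].bytes().filter(|&b| b == b'\n').count()
}
```

For the string `"héllo\n🌍 world"`, the word `world` starts at byte offset 12. That is char offset 8 (`é` takes two bytes and `🌍` takes four), UTF-16 offset 9 (`🌍` is a surrogate pair), and line 1 counting from zero.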