docs: Add v0.3.0 documentation #114

Merged
merged 8 commits into from
Mar 6, 2025
2 changes: 1 addition & 1 deletion docs/docs/benchmarks/_category_.json
@@ -1,6 +1,6 @@
{
"label": "Benchmarks",
"position": 5,
"position": 8,
"link": {
"type": "generated-index"
}
22 changes: 22 additions & 0 deletions docs/docs/benchmarks/inference-time.md
@@ -28,6 +28,28 @@ Times presented in the tables are measured as consecutive runs of the model. Ini
| STYLE_TRANSFER_UDNIE | 450 | 600 | 750 | 1650 | 1800 |
| STYLE_TRANSFER_RAIN_PRINCESS | 450 | 600 | 750 | 1650 | 1800 |

## OCR

| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] |
| ----------- | ---------------------------- | -------------------------------- | -------------------------- | --------------------------------- | --------------------------------- |
| CRAFT_800 | 2099 | 2227 | ❌ | 2245 | 7108 |
| CRNN_EN_512 | 70 | 252 | ❌ | 54 | 151 |
| CRNN_EN_256 | 39 | 123 | ❌ | 24 | 78 |
| CRNN_EN_128 | 17 | 83 | ❌ | 14 | 39 |

❌ - Insufficient RAM.

## Vertical OCR

| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] |
| ----------- | ---------------------------- | -------------------------------- | -------------------------- | --------------------------------- | --------------------------------- |
| CRAFT_1280 | 5457 | 5833 | ❌ | 6296 | 14053 |
| CRAFT_320 | 1351 | 1460 | ❌ | 1485 | 3101 |
| CRNN_EN_512 | 39 | 123 | ❌ | 24 | 78 |
| CRNN_EN_64 | 10 | 33 | ❌ | 7 | 18 |

❌ - Insufficient RAM.

## LLMs

| Model | iPhone 16 Pro (XNNPACK) [tokens/s] | iPhone 13 Pro (XNNPACK) [tokens/s] | iPhone SE 3 (XNNPACK) [tokens/s] | Samsung Galaxy S24 (XNNPACK) [tokens/s] | OnePlus 12 (XNNPACK) [tokens/s] |
20 changes: 20 additions & 0 deletions docs/docs/benchmarks/memory-usage.md
@@ -24,6 +24,19 @@ sidebar_position: 2
| STYLE_TRANSFER_UDNIE | 950 | 350 |
| STYLE_TRANSFER_RAIN_PRINCESS | 950 | 350 |

## OCR

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| --------------------------------------------------- | ---------------------- | ------------------ |
| CRAFT_800 + CRNN_EN_512 + CRNN_EN_256 + CRNN_EN_128 | 2100 | 1782 |

## Vertical OCR

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ------------------------------------ | ---------------------- | ------------------ |
| CRAFT_1280 + CRAFT_320 + CRNN_EN_512 | 2770 | 3720 |
| CRAFT_1280 + CRAFT_320 + CRNN_EN_64 | 1770 | 2740 |

## LLMs

| Model | Android (XNNPACK) [GB] | iOS (XNNPACK) [GB] |
@@ -34,3 +47,10 @@ sidebar_position: 2
| LLAMA3_2_3B | 7.1 | 7.3 |
| LLAMA3_2_3B_SPINQUANT | 3.7 | 3.8 |
| LLAMA3_2_3B_QLORA | 4 | 4.1 |

## Speech to text

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| -------------- | ---------------------- | ------------------ |
| WHISPER_TINY | 900 | 600 |
| MOONSHINE_TINY | 650 | 560 |
25 changes: 25 additions & 0 deletions docs/docs/benchmarks/model-size.md
@@ -24,6 +24,24 @@ sidebar_position: 1
| STYLE_TRANSFER_UDNIE | 6.78 | 5.22 |
| STYLE_TRANSFER_RAIN_PRINCESS | 6.78 | 5.22 |

## OCR

| Model | XNNPACK [MB] |
| ----------- | ------------ |
| CRAFT_800 | 83.1 |
| CRNN_EN_512 | 547 |
| CRNN_EN_256 | 277 |
| CRNN_EN_128 | 142 |

## Vertical OCR

| Model | XNNPACK [MB] |
| ----------- | ------------ |
| CRAFT_1280 | 83.1 |
| CRAFT_320 | 83.1 |
| CRNN_EN_512 | 277 |
| CRNN_EN_64 | 74.3 |

## LLMs

| Model | XNNPACK [GB] |
@@ -34,3 +52,10 @@ sidebar_position: 1
| LLAMA3_2_3B | 6.43 |
| LLAMA3_2_3B_SPINQUANT | 2.55 |
| LLAMA3_2_3B_QLORA | 2.65 |

## Speech to text

| Model | XNNPACK [MB] |
| -------------- | ------------ |
| WHISPER_TINY | 231.0 |
| MOONSHINE_TINY | 148.9 |
2 changes: 1 addition & 1 deletion docs/docs/computer-vision/_category_.json
@@ -1,6 +1,6 @@
{
"label": "Computer Vision",
"position": 3,
"position": 4,
"link": {
"type": "generated-index"
}
13 changes: 7 additions & 6 deletions docs/docs/computer-vision/useClassification.md
@@ -38,12 +38,13 @@ A string that specifies the location of the model binary. For more information,

### Returns

| Field | Type | Description |
| -------------- | ------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------- |
| `forward` | `(input: string) => Promise<{ [category: string]: number }>` | Executes the model's forward pass, where `input` can be a fetchable resource or a Base64-encoded string. |
| `error` | <code>string &#124; null</code> | Contains the error message if the model failed to load. |
| `isGenerating` | `boolean` | Indicates whether the model is currently processing an inference. |
| `isReady` | `boolean` | Indicates whether the model has successfully loaded and is ready for inference. |
| Field | Type | Description |
| ------------------ | ------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------- |
| `forward` | `(input: string) => Promise<{ [category: string]: number }>` | Executes the model's forward pass, where `input` can be a fetchable resource or a Base64-encoded string. |
| `error` | <code>string &#124; null</code> | Contains the error message if the model failed to load. |
| `isGenerating` | `boolean` | Indicates whether the model is currently processing an inference. |
| `isReady` | `boolean` | Indicates whether the model has successfully loaded and is ready for inference. |
| `downloadProgress` | `number` | Represents the download progress as a value between 0 and 1. |

## Running the model

193 changes: 193 additions & 0 deletions docs/docs/computer-vision/useOCR.md
@@ -0,0 +1,193 @@
---
title: useOCR
sidebar_position: 4
---

Optical character recognition (OCR) is a computer vision technique that detects and recognizes text within an image. It's commonly used to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data.

:::caution
It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/765305abc289083787eb9613b899d6fcc0e24126/src/constants/modelUrls.ts#L51) shipped with our library.
:::

## Reference

```jsx
import {
useOCR,
CRAFT_800,
RECOGNIZER_EN_CRNN_512,
RECOGNIZER_EN_CRNN_256,
RECOGNIZER_EN_CRNN_128
} from 'react-native-executorch';

function App() {
const model = useOCR({
detectorSource: CRAFT_800,
recognizerSources: {
recognizerLarge: RECOGNIZER_EN_CRNN_512,
recognizerMedium: RECOGNIZER_EN_CRNN_256,
recognizerSmall: RECOGNIZER_EN_CRNN_128
},
language: "en",
});

...
for (const ocrDetection of await model.forward("https://url-to-image.jpg")) {
console.log("Bounding box: ", ocrDetection.bbox);
    console.log("Bounding text: ", ocrDetection.text);
console.log("Bounding score: ", ocrDetection.score);
}
...
}
```

<details>
<summary>Type definitions</summary>

```typescript
interface RecognizerSources {
recognizerLarge: string | number;
recognizerMedium: string | number;
recognizerSmall: string | number;
}

type OCRLanguage = 'en';

interface Point {
x: number;
y: number;
}

interface OCRDetection {
bbox: Point[];
text: string;
score: number;
}
```

</details>

### Arguments

**`detectorSource`** - A string that specifies the location of the detector binary. For more information, see the [loading models](../fundamentals/loading-models.md) section.

**`recognizerSources`** - An object that specifies the locations of the recognizer binary files. Recognition uses three models, each tailored to process images of a different width.

- `recognizerLarge` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 512 pixels.
- `recognizerMedium` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 256 pixels.
- `recognizerSmall` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 128 pixels.

For more information, see the [loading models](../fundamentals/loading-models.md) section.

**`language`** - A parameter that specifies the language of the text to be recognized by the OCR.

### Returns

The hook returns an object with the following properties:

| Field | Type | Description |
| ------------------ | -------------------------------------------- | ------------------------------------------------------------------------------------------- |
| `forward` | `(input: string) => Promise<OCRDetection[]>` | A function that accepts an image (url, b64) and returns an array of `OCRDetection` objects. |
| `error` | <code>string &#124; null</code> | Contains the error message if the model loading failed. |
| `isGenerating` | `boolean` | Indicates whether the model is currently processing an inference. |
| `isReady` | `boolean` | Indicates whether the model has successfully loaded and is ready for inference. |
| `downloadProgress` | `number` | Represents the download progress as a value between 0 and 1. |
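
The `downloadProgress` value can drive a loading indicator while the model files download. A minimal sketch in plain TypeScript (the `formatProgress` helper is illustrative, not part of the library):

```typescript
// Format a 0-1 download progress value as a percentage label.
// Clamps out-of-range values so the UI never shows e.g. "120%".
function formatProgress(downloadProgress: number): string {
  const clamped = Math.min(1, Math.max(0, downloadProgress));
  return `${Math.round(clamped * 100)}%`;
}
```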

## Running the model

To run the model, you can use the `forward` method. It accepts one argument, which is the image. The image can be a remote URL, a local file URI, or a base64-encoded image. The function returns an array of `OCRDetection` objects. Each object contains coordinates of the bounding box, the text recognized within the box, and the confidence score. For more information, please refer to the reference or type definitions.
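
As a sketch of post-processing the returned array (plain TypeScript; the helper names are illustrative, not library APIs), you might filter detections by confidence before using the text:

```typescript
interface Point {
  x: number;
  y: number;
}

interface OCRDetection {
  bbox: Point[];
  text: string;
  score: number;
}

// Keep only detections whose confidence meets a threshold.
function filterByScore(detections: OCRDetection[], minScore: number): OCRDetection[] {
  return detections.filter((d) => d.score >= minScore);
}

// Join the recognized fragments into a single string, e.g. for display or search.
function joinText(detections: OCRDetection[]): string {
  return detections.map((d) => d.text).join(' ');
}
```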

## Detection object

The detection object is specified as follows:

```typescript
interface Point {
x: number;
y: number;
}

interface OCRDetection {
bbox: Point[];
text: string;
score: number;
}
```

The `bbox` property contains the bounding box of the detected text region, represented as four points that form the corners of the box.
The `text` property contains the text recognized within that region, and `score` is the confidence score of the recognized text.
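
Since `bbox` is a list of corner points rather than a rectangle, a small helper (a sketch, not part of the library API) can derive axis-aligned bounds, e.g. for drawing an overlay:

```typescript
interface Point {
  x: number;
  y: number;
}

// Smallest axis-aligned rectangle enclosing the four bbox corners.
function bboxToRect(bbox: Point[]): { left: number; top: number; width: number; height: number } {
  const xs = bbox.map((p) => p.x);
  const ys = bbox.map((p) => p.y);
  const left = Math.min(...xs);
  const top = Math.min(...ys);
  return {
    left,
    top,
    width: Math.max(...xs) - left,
    height: Math.max(...ys) - top,
  };
}
```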

## Example

```tsx
import {
useOCR,
CRAFT_800,
RECOGNIZER_EN_CRNN_512,
RECOGNIZER_EN_CRNN_256,
RECOGNIZER_EN_CRNN_128,
} from 'react-native-executorch';

function App() {
const model = useOCR({
detectorSource: CRAFT_800,
recognizerSources: {
recognizerLarge: RECOGNIZER_EN_CRNN_512,
recognizerMedium: RECOGNIZER_EN_CRNN_256,
recognizerSmall: RECOGNIZER_EN_CRNN_128,
},
language: 'en',
});

const runModel = async () => {
const ocrDetections = await model.forward('https://url-to-image.jpg');

for (const ocrDetection of ocrDetections) {
console.log('Bounding box: ', ocrDetection.bbox);
console.log('Bounding text: ', ocrDetection.text);
console.log('Bounding score: ', ocrDetection.score);
}
};
}
```

## Supported models

| Model | Type |
| ------------------------------------------------------ | ---------- |
| [CRAFT_800](https://github.com/clovaai/CRAFT-pytorch) | Detector |
| [CRNN_EN_512](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |
| [CRNN_EN_256](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |
| [CRNN_EN_128](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |

## Benchmarks

### Model size

| Model | XNNPACK [MB] |
| ----------- | ------------ |
| CRAFT_800 | 83.1 |
| CRNN_EN_512 | 547 |
| CRNN_EN_256 | 277 |
| CRNN_EN_128 | 142 |

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| --------------------------------------------------- | ---------------------- | ------------------ |
| CRAFT_800 + CRNN_EN_512 + CRNN_EN_256 + CRNN_EN_128 | 2100 | 1782 |

### Inference time

:::warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
:::

| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] |
| ----------- | ---------------------------- | -------------------------------- | -------------------------- | --------------------------------- | --------------------------------- |
| CRAFT_800 | 2099 | 2227 | ❌ | 2245 | 7108 |
| CRNN_EN_512 | 70 | 252 | ❌ | 54 | 151 |
| CRNN_EN_256 | 39 | 123 | ❌ | 24 | 78 |
| CRNN_EN_128 | 17 | 83 | ❌ | 14 | 39 |

❌ - Insufficient RAM.
15 changes: 8 additions & 7 deletions docs/docs/computer-vision/useObjectDetection.md
@@ -1,6 +1,6 @@
---
title: useObjectDetection
sidebar_position: 2
sidebar_position: 3
---

Object detection is a computer vision technique that identifies and locates objects within images or video. It’s commonly used in applications like image recognition, video surveillance or autonomous driving.
@@ -61,12 +61,13 @@ For more information on that topic, you can check out the [Loading models](https

The hook returns an object with the following properties:

| Field | Type | Description |
| -------------- | ----------------------------------------- | ---------------------------------------------------------------------------------------- |
| `forward` | `(input: string) => Promise<Detection[]>` | A function that accepts an image (url, b64) and returns an array of `Detection` objects. |
| `error` | <code>string &#124; null</code> | Contains the error message if the model loading failed. |
| `isGenerating` | `boolean` | Indicates whether the model is currently processing an inference. |
| `isReady` | `boolean` | Indicates whether the model has successfully loaded and is ready for inference. |
| Field | Type | Description |
| ------------------ | ----------------------------------------- | ---------------------------------------------------------------------------------------- |
| `forward` | `(input: string) => Promise<Detection[]>` | A function that accepts an image (url, b64) and returns an array of `Detection` objects. |
| `error` | <code>string &#124; null</code> | Contains the error message if the model loading failed. |
| `isGenerating` | `boolean` | Indicates whether the model is currently processing an inference. |
| `isReady` | `boolean` | Indicates whether the model has successfully loaded and is ready for inference. |
| `downloadProgress` | `number` | Represents the download progress as a value between 0 and 1. |

## Running the model
