
Commit e1b1717

Authored by jakmro, chmjkb, Mateusz Kopciński, and NorbertKlockiewicz
docs: Add v0.3.0 documentation (#114)
## Description

Add documentation for v0.3.0 release

### Type of change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [x] Documentation update (improves or adds clarity to existing documentation)

### Checklist

- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings

Co-authored-by: Jakub Chmura <[email protected]>
Co-authored-by: Mateusz Kopciński <[email protected]>
Co-authored-by: Norbert Klockiewicz <[email protected]>
Parent: ca02d17

26 files changed: +1336 −36 lines

docs/docs/benchmarks/_category_.json (+1 −1)

```diff
@@ -1,6 +1,6 @@
 {
   "label": "Benchmarks",
-  "position": 5,
+  "position": 8,
   "link": {
     "type": "generated-index"
   }
```

docs/docs/benchmarks/inference-time.md (+22)

```diff
@@ -28,6 +28,28 @@ Times presented in the tables are measured as consecutive runs of the model. Ini
 | STYLE_TRANSFER_UDNIE | 450 | 600 | 750 | 1650 | 1800 |
 | STYLE_TRANSFER_RAIN_PRINCESS | 450 | 600 | 750 | 1650 | 1800 |
 
+## OCR
+
+| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] |
+| ----------- | ---------------------------- | -------------------------------- | -------------------------- | --------------------------------- | --------------------------------- |
+| CRAFT_800   | 2099 | 2227 | ❌ | 2245 | 7108 |
+| CRNN_EN_512 | 70   | 252  | ❌ | 54   | 151  |
+| CRNN_EN_256 | 39   | 123  | ❌ | 24   | 78   |
+| CRNN_EN_128 | 17   | 83   | ❌ | 14   | 39   |
+
+❌ - Insufficient RAM.
+
+## Vertical OCR
+
+| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] |
+| ----------- | ---------------------------- | -------------------------------- | -------------------------- | --------------------------------- | --------------------------------- |
+| CRAFT_1280  | 5457 | 5833 | ❌ | 6296 | 14053 |
+| CRAFT_320   | 1351 | 1460 | ❌ | 1485 | 3101  |
+| CRNN_EN_512 | 39   | 123  | ❌ | 24   | 78    |
+| CRNN_EN_64  | 10   | 33   | ❌ | 7    | 18    |
+
+❌ - Insufficient RAM.
+
 ## LLMs
 
 | Model | iPhone 16 Pro (XNNPACK) [tokens/s] | iPhone 13 Pro (XNNPACK) [tokens/s] | iPhone SE 3 (XNNPACK) [tokens/s] | Samsung Galaxy S24 (XNNPACK) [tokens/s] | OnePlus 12 (XNNPACK) [tokens/s] |
```

docs/docs/benchmarks/memory-usage.md (+20)

```diff
@@ -24,6 +24,19 @@ sidebar_position: 2
 | STYLE_TRANSFER_UDNIE | 950 | 350 |
 | STYLE_TRANSFER_RAIN_PRINCESS | 950 | 350 |
 
+## OCR
+
+| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
+| --------------------------------------------------- | ---------------------- | ------------------ |
+| CRAFT_800 + CRNN_EN_512 + CRNN_EN_256 + CRNN_EN_128 | 2100 | 1782 |
+
+## Vertical OCR
+
+| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
+| ------------------------------------ | ---------------------- | ------------------ |
+| CRAFT_1280 + CRAFT_320 + CRNN_EN_512 | 2770 | 3720 |
+| CRAFT_1280 + CRAFT_320 + CRNN_EN_64  | 1770 | 2740 |
+
 ## LLMs
 
 | Model | Android (XNNPACK) [GB] | iOS (XNNPACK) [GB] |
@@ -34,3 +47,10 @@ sidebar_position: 2
 | LLAMA3_2_3B           | 7.1 | 7.3 |
 | LLAMA3_2_3B_SPINQUANT | 3.7 | 3.8 |
 | LLAMA3_2_3B_QLORA     | 4   | 4.1 |
+
+## Speech to text
+
+| Model          | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
+| -------------- | ---------------------- | ------------------ |
+| WHISPER_TINY   | 900 | 600 |
+| MOONSHINE_TINY | 650 | 560 |
```

docs/docs/benchmarks/model-size.md (+25)

```diff
@@ -24,6 +24,24 @@ sidebar_position: 1
 | STYLE_TRANSFER_UDNIE | 6.78 | 5.22 |
 | STYLE_TRANSFER_RAIN_PRINCESS | 6.78 | 5.22 |
 
+## OCR
+
+| Model       | XNNPACK [MB] |
+| ----------- | ------------ |
+| CRAFT_800   | 83.1 |
+| CRNN_EN_512 | 547  |
+| CRNN_EN_256 | 277  |
+| CRNN_EN_128 | 142  |
+
+## Vertical OCR
+
+| Model       | XNNPACK [MB] |
+| ----------- | ------------ |
+| CRAFT_1280  | 83.1 |
+| CRAFT_320   | 83.1 |
+| CRNN_EN_512 | 277  |
+| CRNN_EN_64  | 74.3 |
+
 ## LLMs
 
 | Model | XNNPACK [GB] |
@@ -34,3 +52,10 @@ sidebar_position: 1
 | LLAMA3_2_3B           | 6.43 |
 | LLAMA3_2_3B_SPINQUANT | 2.55 |
 | LLAMA3_2_3B_QLORA     | 2.65 |
+
+## Speech to text
+
+| Model          | XNNPACK [MB] |
+| -------------- | ------------ |
+| WHISPER_TINY   | 231.0 |
+| MOONSHINE_TINY | 148.9 |
```

docs/docs/computer-vision/_category_.json (+1 −1)

```diff
@@ -1,6 +1,6 @@
 {
   "label": "Computer Vision",
-  "position": 3,
+  "position": 4,
   "link": {
     "type": "generated-index"
   }
```

docs/docs/computer-vision/useClassification.md (+7 −6)

```diff
@@ -38,12 +38,13 @@ A string that specifies the location of the model binary. For more information,
 
 ### Returns
 
-| Field          | Type                                                         | Description                                                                                              |
-| -------------- | ------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------- |
-| `forward`      | `(input: string) => Promise<{ [category: string]: number }>` | Executes the model's forward pass, where `input` can be a fetchable resource or a Base64-encoded string. |
-| `error`        | <code>string &#124; null</code>                              | Contains the error message if the model failed to load.                                                  |
-| `isGenerating` | `boolean`                                                    | Indicates whether the model is currently processing an inference.                                        |
-| `isReady`      | `boolean`                                                    | Indicates whether the model has successfully loaded and is ready for inference.                          |
+| Field              | Type                                                         | Description                                                                                              |
+| ------------------ | ------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------- |
+| `forward`          | `(input: string) => Promise<{ [category: string]: number }>` | Executes the model's forward pass, where `input` can be a fetchable resource or a Base64-encoded string. |
+| `error`            | <code>string &#124; null</code>                              | Contains the error message if the model failed to load.                                                  |
+| `isGenerating`     | `boolean`                                                    | Indicates whether the model is currently processing an inference.                                        |
+| `isReady`          | `boolean`                                                    | Indicates whether the model has successfully loaded and is ready for inference.                          |
+| `downloadProgress` | `number`                                                     | Represents the download progress as a value between 0 and 1.                                             |
 
 ## Running the model
 
```
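The `downloadProgress` field added in this diff is reported as a fraction between 0 and 1. As a small illustration (the helper below is not part of the library, only a sketch of how the value might be rendered):

```typescript
// Illustrative helper (not a react-native-executorch API): format a 0..1
// download progress fraction as a percentage string for display.
function formatDownloadProgress(progress: number): string {
  // Clamp to [0, 1] so out-of-range values still render sensibly.
  const clamped = Math.min(1, Math.max(0, progress));
  return `${Math.round(clamped * 100)}%`;
}
```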

docs/docs/computer-vision/useOCR.md (new file, +193)

---
title: useOCR
sidebar_position: 4
---

Optical character recognition (OCR) is a computer vision technique that detects and recognizes text within an image. It's commonly used to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data.

:::caution
It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/765305abc289083787eb9613b899d6fcc0e24126/src/constants/modelUrls.ts#L51) shipped with our library.
:::
## Reference

```jsx
import {
  useOCR,
  CRAFT_800,
  RECOGNIZER_EN_CRNN_512,
  RECOGNIZER_EN_CRNN_256,
  RECOGNIZER_EN_CRNN_128,
} from 'react-native-executorch';

function App() {
  const model = useOCR({
    detectorSource: CRAFT_800,
    recognizerSources: {
      recognizerLarge: RECOGNIZER_EN_CRNN_512,
      recognizerMedium: RECOGNIZER_EN_CRNN_256,
      recognizerSmall: RECOGNIZER_EN_CRNN_128,
    },
    language: 'en',
  });

  // ...
  for (const ocrDetection of await model.forward('https://url-to-image.jpg')) {
    console.log('Bounding box: ', ocrDetection.bbox);
    console.log('Bounding text: ', ocrDetection.text);
    console.log('Bounding score: ', ocrDetection.score);
  }
  // ...
}
```
<details>
<summary>Type definitions</summary>

```typescript
interface RecognizerSources {
  recognizerLarge: string | number;
  recognizerMedium: string | number;
  recognizerSmall: string | number;
}

type OCRLanguage = 'en';

interface Point {
  x: number;
  y: number;
}

interface OCRDetection {
  bbox: Point[];
  text: string;
  score: number;
}
```

</details>
### Arguments

**`detectorSource`** - A string that specifies the location of the detector binary. For more information, take a look at the [loading models](../fundamentals/loading-models.md) section.

**`recognizerSources`** - An object that specifies the locations of the recognizer binary files. Each recognizer is composed of three models tailored to process images of varying widths.

- `recognizerLarge` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 512 pixels.
- `recognizerMedium` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 256 pixels.
- `recognizerSmall` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 128 pixels.

For more information, take a look at the [loading models](../fundamentals/loading-models.md) section.

**`language`** - A parameter that specifies the language of the text to be recognized by the OCR.
### Returns

The hook returns an object with the following properties:

| Field              | Type                                         | Description                                                                                 |
| ------------------ | -------------------------------------------- | ------------------------------------------------------------------------------------------- |
| `forward`          | `(input: string) => Promise<OCRDetection[]>` | A function that accepts an image (url, b64) and returns an array of `OCRDetection` objects. |
| `error`            | <code>string &#124; null</code>              | Contains the error message if the model loading failed.                                     |
| `isGenerating`     | `boolean`                                    | Indicates whether the model is currently processing an inference.                           |
| `isReady`          | `boolean`                                    | Indicates whether the model has successfully loaded and is ready for inference.             |
| `downloadProgress` | `number`                                     | Represents the download progress as a value between 0 and 1.                                |
## Running the model

To run the model, use the `forward` method. It accepts a single argument: the image, which can be a remote URL, a local file URI, or a Base64-encoded image. The function returns an array of `OCRDetection` objects. Each object contains the coordinates of the bounding box, the text recognized within that box, and the confidence score. For more information, please refer to the reference or type definitions.
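As a small illustration of consuming the returned array, the sketch below drops low-confidence detections and joins the remaining text. The `extractText` helper and the 0.5 threshold are illustrative, not part of the library; the interfaces mirror the type definitions above.

```typescript
// Shapes mirrored from the library's type definitions.
interface Point {
  x: number;
  y: number;
}

interface OCRDetection {
  bbox: Point[];
  text: string;
  score: number;
}

// Illustrative post-processing (not a library API): keep detections whose
// confidence clears a threshold and join their text in detection order.
function extractText(detections: OCRDetection[], minScore = 0.5): string {
  return detections
    .filter((detection) => detection.score >= minScore)
    .map((detection) => detection.text)
    .join(' ');
}
```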
## Detection object

The detection object is specified as follows:

```typescript
interface Point {
  x: number;
  y: number;
}

interface OCRDetection {
  bbox: Point[];
  text: string;
  score: number;
}
```

The `bbox` property contains information about the bounding box of detected text regions. It is represented as four points, which are the corners of the detected bounding box.
The `text` property contains the text recognized within the detected text region. The `score` represents the confidence score of the recognized text.
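Since `bbox` is four corner points rather than an origin-plus-size rectangle, drawing an overlay usually starts by reducing the corners to an axis-aligned rectangle. A minimal sketch (the helper is illustrative, not a library API):

```typescript
interface Point {
  x: number;
  y: number;
}

// Illustrative helper (not part of the library): collapse the four bbox
// corner points into an axis-aligned rectangle suitable for drawing.
function bboxToRect(bbox: Point[]) {
  const xs = bbox.map((point) => point.x);
  const ys = bbox.map((point) => point.y);
  const x = Math.min(...xs);
  const y = Math.min(...ys);
  return { x, y, width: Math.max(...xs) - x, height: Math.max(...ys) - y };
}
```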
## Example

```tsx
import {
  useOCR,
  CRAFT_800,
  RECOGNIZER_EN_CRNN_512,
  RECOGNIZER_EN_CRNN_256,
  RECOGNIZER_EN_CRNN_128,
} from 'react-native-executorch';

function App() {
  const model = useOCR({
    detectorSource: CRAFT_800,
    recognizerSources: {
      recognizerLarge: RECOGNIZER_EN_CRNN_512,
      recognizerMedium: RECOGNIZER_EN_CRNN_256,
      recognizerSmall: RECOGNIZER_EN_CRNN_128,
    },
    language: 'en',
  });

  const runModel = async () => {
    const ocrDetections = await model.forward('https://url-to-image.jpg');

    for (const ocrDetection of ocrDetections) {
      console.log('Bounding box: ', ocrDetection.bbox);
      console.log('Bounding text: ', ocrDetection.text);
      console.log('Bounding score: ', ocrDetection.score);
    }
  };
}
```
## Supported models

| Model                                                  | Type       |
| ------------------------------------------------------ | ---------- |
| [CRAFT_800](https://github.com/clovaai/CRAFT-pytorch)  | Detector   |
| [CRNN_EN_512](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |
| [CRNN_EN_256](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |
| [CRNN_EN_128](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |
## Benchmarks

### Model size

| Model       | XNNPACK [MB] |
| ----------- | ------------ |
| CRAFT_800   | 83.1 |
| CRNN_EN_512 | 547  |
| CRNN_EN_256 | 277  |
| CRNN_EN_128 | 142  |

### Memory usage

| Model                                               | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| --------------------------------------------------- | ---------------------- | ------------------ |
| CRAFT_800 + CRNN_EN_512 + CRNN_EN_256 + CRNN_EN_128 | 2100                   | 1782               |

### Inference time

:::warning warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
:::

| Model       | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] |
| ----------- | ---------------------------- | -------------------------------- | -------------------------- | --------------------------------- | --------------------------------- |
| CRAFT_800   | 2099 | 2227 | ❌ | 2245 | 7108 |
| CRNN_EN_512 | 70   | 252  | ❌ | 54   | 151  |
| CRNN_EN_256 | 39   | 123  | ❌ | 24   | 78   |
| CRNN_EN_128 | 17   | 83   | ❌ | 14   | 39   |

❌ - Insufficient RAM.

docs/docs/computer-vision/useObjectDetection.md (+8 −7)

```diff
@@ -1,6 +1,6 @@
 ---
 title: useObjectDetection
-sidebar_position: 2
+sidebar_position: 3
 ---
 
 Object detection is a computer vision technique that identifies and locates objects within images or video. It’s commonly used in applications like image recognition, video surveillance or autonomous driving.
@@ -61,12 +61,13 @@ For more information on that topic, you can check out the [Loading models](https
 
 The hook returns an object with the following properties:
 
-| Field          | Type                                      | Description                                                                              |
-| -------------- | ----------------------------------------- | ---------------------------------------------------------------------------------------- |
-| `forward`      | `(input: string) => Promise<Detection[]>` | A function that accepts an image (url, b64) and returns an array of `Detection` objects. |
-| `error`        | <code>string &#124; null</code>           | Contains the error message if the model loading failed.                                  |
-| `isGenerating` | `boolean`                                 | Indicates whether the model is currently processing an inference.                        |
-| `isReady`      | `boolean`                                 | Indicates whether the model has successfully loaded and is ready for inference.          |
+| Field              | Type                                      | Description                                                                              |
+| ------------------ | ----------------------------------------- | ---------------------------------------------------------------------------------------- |
+| `forward`          | `(input: string) => Promise<Detection[]>` | A function that accepts an image (url, b64) and returns an array of `Detection` objects. |
+| `error`            | <code>string &#124; null</code>           | Contains the error message if the model loading failed.                                  |
+| `isGenerating`     | `boolean`                                 | Indicates whether the model is currently processing an inference.                        |
+| `isReady`          | `boolean`                                 | Indicates whether the model has successfully loaded and is ready for inference.          |
+| `downloadProgress` | `number`                                  | Represents the download progress as a value between 0 and 1.                             |
 
 ## Running the model
 
```
