Commit c2eee13
docs: Benchmarks (#92)
## Description

Add model benchmarks (memory usage, inference time, model size).

### Type of change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [x] Documentation update (improves or adds clarity to existing documentation)

### Checklist

- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings
1 parent 2a98ffa commit c2eee13

11 files changed, +274 −21 lines changed

docs/docs/benchmarks/_category_.json

+7
```json
{
  "label": "Benchmarks",
  "position": 5,
  "link": {
    "type": "generated-index"
  }
}
```
+42
```md
---
title: Inference Time
sidebar_position: 3
---

:::warning warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
:::

## Classification

| Model             | iPhone 16 Pro (Core ML) [ms] | iPhone 13 Pro (Core ML) [ms] | iPhone SE 3 (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| ----------------- | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
| EFFICIENTNET_V2_S | 100                          | 120                          | 130                        | 180                               | 170                       |

## Object Detection

| Model                          | iPhone 16 Pro (XNNPACK) [ms] | iPhone 13 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| ------------------------------ | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
| SSDLITE_320_MOBILENET_V3_LARGE | 190                          | 260                          | 280                        | 100                               | 90                        |

## Style Transfer

| Model                        | iPhone 16 Pro (Core ML) [ms] | iPhone 13 Pro (Core ML) [ms] | iPhone SE 3 (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| ---------------------------- | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
| STYLE_TRANSFER_CANDY         | 450                          | 600                          | 750                        | 1650                              | 1800                      |
| STYLE_TRANSFER_MOSAIC        | 450                          | 600                          | 750                        | 1650                              | 1800                      |
| STYLE_TRANSFER_UDNIE         | 450                          | 600                          | 750                        | 1650                              | 1800                      |
| STYLE_TRANSFER_RAIN_PRINCESS | 450                          | 600                          | 750                        | 1650                              | 1800                      |

## LLMs

| Model                 | iPhone 16 Pro (XNNPACK) [tokens/s] | iPhone 13 Pro (XNNPACK) [tokens/s] | iPhone SE 3 (XNNPACK) [tokens/s] | Samsung Galaxy S24 (XNNPACK) [tokens/s] | OnePlus 12 (XNNPACK) [tokens/s] |
| --------------------- | ---------------------------------- | ---------------------------------- | -------------------------------- | --------------------------------------- | ------------------------------- |
| LLAMA3_2_1B           | 16.1                               | 11.4                               | ❌                               | 15.6                                    | 19.3                            |
| LLAMA3_2_1B_SPINQUANT | 40.6                               | 16.7                               | 16.5                             | 40.3                                    | 48.2                            |
| LLAMA3_2_1B_QLORA     | 31.8                               | 11.4                               | 11.2                             | 37.3                                    | 44.4                            |
| LLAMA3_2_3B           | ❌                                 | ❌                                 | ❌                               | ❌                                      | 7.1                             |
| LLAMA3_2_3B_SPINQUANT | 17.2                               | 8.2                                | ❌                               | 16.2                                    | 19.4                            |
| LLAMA3_2_3B_QLORA     | 14.5                               | ❌                                 | ❌                               | 14.8                                    | 18.1                            |

❌ - Insufficient RAM.
```
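The "consecutive runs" note above can be made concrete with a small sketch of how such numbers are typically gathered (this is an illustration, not the project's actual benchmark harness): discard a warm-up run, then average the runs that follow.

```typescript
// Sketch of a consecutive-run benchmark. `runModel` is a stand-in for a
// real inference call. The first call is treated as warm-up (model loading
// and initialization can make it up to ~2x slower) and is excluded from
// the reported mean.
async function meanInferenceMs(
  runModel: () => Promise<void>,
  runs: number = 10,
): Promise<number> {
  await runModel(); // warm-up run, excluded from timing
  const start = Date.now();
  for (let i = 0; i < runs; i++) {
    await runModel(); // consecutive runs - these are what the tables report
  }
  return (Date.now() - start) / runs; // mean ms per consecutive run
}
```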

docs/docs/benchmarks/memory-usage.md

+36
```md
---
title: Memory Usage
sidebar_position: 2
---

## Classification

| Model             | Android (XNNPACK) [MB] | iOS (Core ML) [MB] |
| ----------------- | ---------------------- | ------------------ |
| EFFICIENTNET_V2_S | 130                    | 85                 |

## Object Detection

| Model                          | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ------------------------------ | ---------------------- | ------------------ |
| SSDLITE_320_MOBILENET_V3_LARGE | 90                     | 90                 |

## Style Transfer

| Model                        | Android (XNNPACK) [MB] | iOS (Core ML) [MB] |
| ---------------------------- | ---------------------- | ------------------ |
| STYLE_TRANSFER_CANDY         | 950                    | 350                |
| STYLE_TRANSFER_MOSAIC        | 950                    | 350                |
| STYLE_TRANSFER_UDNIE         | 950                    | 350                |
| STYLE_TRANSFER_RAIN_PRINCESS | 950                    | 350                |

## LLMs

| Model                 | Android (XNNPACK) [GB] | iOS (XNNPACK) [GB] |
| --------------------- | ---------------------- | ------------------ |
| LLAMA3_2_1B           | 3.2                    | 3.1                |
| LLAMA3_2_1B_SPINQUANT | 1.9                    | 2                  |
| LLAMA3_2_1B_QLORA     | 2.2                    | 2.5                |
| LLAMA3_2_3B           | 7.1                    | 7.3                |
| LLAMA3_2_3B_SPINQUANT | 3.7                    | 3.8                |
| LLAMA3_2_3B_QLORA     | 4                      | 4.1                |
```

docs/docs/benchmarks/model-size.md

+36
```md
---
title: Model Size
sidebar_position: 1
---

## Classification

| Model             | XNNPACK [MB] | Core ML [MB] |
| ----------------- | ------------ | ------------ |
| EFFICIENTNET_V2_S | 85.6         | 43.9         |

## Object Detection

| Model                          | XNNPACK [MB] |
| ------------------------------ | ------------ |
| SSDLITE_320_MOBILENET_V3_LARGE | 13.9         |

## Style Transfer

| Model                        | XNNPACK [MB] | Core ML [MB] |
| ---------------------------- | ------------ | ------------ |
| STYLE_TRANSFER_CANDY         | 6.78         | 5.22         |
| STYLE_TRANSFER_MOSAIC        | 6.78         | 5.22         |
| STYLE_TRANSFER_UDNIE         | 6.78         | 5.22         |
| STYLE_TRANSFER_RAIN_PRINCESS | 6.78         | 5.22         |

## LLMs

| Model                 | XNNPACK [GB] |
| --------------------- | ------------ |
| LLAMA3_2_1B           | 2.47         |
| LLAMA3_2_1B_SPINQUANT | 1.14         |
| LLAMA3_2_1B_QLORA     | 1.18         |
| LLAMA3_2_3B           | 6.43         |
| LLAMA3_2_3B_SPINQUANT | 2.55         |
| LLAMA3_2_3B_QLORA     | 2.65         |
```

docs/docs/computer-vision/useClassification.mdx → docs/docs/computer-vision/useClassification.md

+24
```diff
@@ -86,3 +86,27 @@ function App() {
 | Model                                                                                                           | Number of classes | Class list                                                                                                                                                                 |
 | --------------------------------------------------------------------------------------------------------------- | ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | [efficientnet_v2_s](https://pytorch.org/vision/0.20/models/generated/torchvision.models.efficientnet_v2_s.html) | 1000              | [ImageNet1k_v1](https://github.com/software-mansion/react-native-executorch/blob/main/android/src/main/java/com/swmansion/rnexecutorch/models/classification/Constants.kt) |
+
+## Benchmarks
+
+### Model size
+
+| Model             | XNNPACK [MB] | Core ML [MB] |
+| ----------------- | ------------ | ------------ |
+| EFFICIENTNET_V2_S | 85.6         | 43.9         |
+
+### Memory usage
+
+| Model             | Android (XNNPACK) [MB] | iOS (Core ML) [MB] |
+| ----------------- | ---------------------- | ------------------ |
+| EFFICIENTNET_V2_S | 130                    | 85                 |
+
+### Inference time
+
+:::warning warning
+Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
+:::
+
+| Model             | iPhone 16 Pro (Core ML) [ms] | iPhone 13 Pro (Core ML) [ms] | iPhone SE 3 (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
+| ----------------- | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
+| EFFICIENTNET_V2_S | 100                          | 120                          | 130                        | 180                               | 170                       |
```

docs/docs/computer-vision/useObjectDetection.mdx → docs/docs/computer-vision/useObjectDetection.md

+24
```diff
@@ -124,3 +124,27 @@ function App() {
 | Model                                                                                                                                                                                                               | Number of classes | Class list                                                                                                                                          |
 | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
 | [SSDLite320 MobileNetV3 Large](https://pytorch.org/vision/main/models/generated/torchvision.models.detection.ssdlite320_mobilenet_v3_large.html#torchvision.models.detection.SSDLite320_MobileNet_V3_Large_Weights) | 91                | [COCO](https://github.com/software-mansion/react-native-executorch/blob/69802ee1ca161d9df00def1dabe014d36341cfa9/src/types/object_detection.ts#L14) |
+
+## Benchmarks
+
+### Model size
+
+| Model                          | XNNPACK [MB] |
+| ------------------------------ | ------------ |
+| SSDLITE_320_MOBILENET_V3_LARGE | 13.9         |
+
+### Memory usage
+
+| Model                          | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
+| ------------------------------ | ---------------------- | ------------------ |
+| SSDLITE_320_MOBILENET_V3_LARGE | 90                     | 90                 |
+
+### Inference time
+
+:::warning warning
+Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
+:::
+
+| Model                          | iPhone 16 Pro (XNNPACK) [ms] | iPhone 13 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
+| ------------------------------ | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
+| SSDLITE_320_MOBILENET_V3_LARGE | 190                          | 260                          | 280                        | 100                               | 90                        |
```

docs/docs/computer-vision/useStyleTransfer.mdx → docs/docs/computer-vision/useStyleTransfer.md

+33
```diff
@@ -78,3 +78,36 @@ function App(){
 - [Mosaic](https://github.com/pytorch/examples/tree/main/fast_neural_style)
 - [Udnie](https://github.com/pytorch/examples/tree/main/fast_neural_style)
 - [Rain princess](https://github.com/pytorch/examples/tree/main/fast_neural_style)
+
+## Benchmarks
+
+### Model size
+
+| Model                        | XNNPACK [MB] | Core ML [MB] |
+| ---------------------------- | ------------ | ------------ |
+| STYLE_TRANSFER_CANDY         | 6.78         | 5.22         |
+| STYLE_TRANSFER_MOSAIC        | 6.78         | 5.22         |
+| STYLE_TRANSFER_UDNIE         | 6.78         | 5.22         |
+| STYLE_TRANSFER_RAIN_PRINCESS | 6.78         | 5.22         |
+
+### Memory usage
+
+| Model                        | Android (XNNPACK) [MB] | iOS (Core ML) [MB] |
+| ---------------------------- | ---------------------- | ------------------ |
+| STYLE_TRANSFER_CANDY         | 950                    | 350                |
+| STYLE_TRANSFER_MOSAIC        | 950                    | 350                |
+| STYLE_TRANSFER_UDNIE         | 950                    | 350                |
+| STYLE_TRANSFER_RAIN_PRINCESS | 950                    | 350                |
+
+### Inference time
+
+:::warning warning
+Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
+:::
+
+| Model                        | iPhone 16 Pro (Core ML) [ms] | iPhone 13 Pro (Core ML) [ms] | iPhone SE 3 (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
+| ---------------------------- | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
+| STYLE_TRANSFER_CANDY         | 450                          | 600                          | 750                        | 1650                              | 1800                      |
+| STYLE_TRANSFER_MOSAIC        | 450                          | 600                          | 750                        | 1650                              | 1800                      |
+| STYLE_TRANSFER_UDNIE         | 450                          | 600                          | 750                        | 1650                              | 1800                      |
+| STYLE_TRANSFER_RAIN_PRINCESS | 450                          | 600                          | 750                        | 1650                              | 1800                      |
```

docs/docs/fundamentals/getting-started.mdx → docs/docs/fundamentals/getting-started.md

+8-2
````diff
@@ -7,12 +7,15 @@ import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 
 ## What is ExecuTorch?
-ExecuTorch is a novel AI framework developed by Meta, designed to streamline deploying PyTorch models on a variety of devices, including mobile phones and microcontrollers. This framework enables exporting models into standalone binaries, allowing them to run locally without requiring API calls. ExecuTorch achieves state-of-the-art performance through optimizations and delegates such as CoreML and XNNPack. It provides a seamless export process with robust debugging options, making it easier to resolve issues if they arise.
+
+ExecuTorch is a novel AI framework developed by Meta, designed to streamline deploying PyTorch models on a variety of devices, including mobile phones and microcontrollers. This framework enables exporting models into standalone binaries, allowing them to run locally without requiring API calls. ExecuTorch achieves state-of-the-art performance through optimizations and delegates such as Core ML and XNNPACK. It provides a seamless export process with robust debugging options, making it easier to resolve issues if they arise.
 
 ## React Native ExecuTorch
+
 React Native ExecuTorch is our way of bringing ExecuTorch into the React Native world. Our API is built to be simple, declarative, and efficient. Plus, we’ll provide a set of pre-exported models for common use cases, so you won’t have to worry about handling exports yourself. With just a few lines of JavaScript, you’ll be able to run AI models (even LLMs 👀) right on your device—keeping user data private and saving on cloud costs.
 
 ## Installation
+
 Installation is pretty straightforward, just use your favorite package manager.
 
 <Tabs>
@@ -54,12 +57,15 @@ Because we are using ExecuTorch under the hood, you won't be able to build iOS a
 :::
 
 Running the app with the library:
+
 ```bash
 yarn run expo:<ios | android> -d
 ```
 
 ## Good reads
+
 If you want to dive deeper into ExecuTorch or our previous work with the framework, we highly encourage you to check out the following resources:
+
 - [ExecuTorch docs](https://pytorch.org/executorch/stable/index.html)
 - [Native code for iOS](https://medium.com/swmansion/bringing-native-ai-to-your-mobile-apps-with-executorch-part-i-ios-f1562a4556e8?source=user_profile_page---------0-------------250189c98ccf---------------)
 - [Native code for Android](https://medium.com/swmansion/bringing-native-ai-to-your-mobile-apps-with-executorch-part-ii-android-29431b6b9f7f?source=user_profile_page---------2-------------b8e3a5cb1c63---------------)
````

docs/docs/llms/exporting-llama.mdx → docs/docs/llms/exporting-llama.md

+18-9
````diff
@@ -3,32 +3,41 @@ title: Exporting Llama
 sidebar_position: 2
 ---
 
 In order to make the process of export as simple as possible for you, we created a script that runs a Docker container and exports the model.
 
 ## Steps to export Llama
+
 ### 1. Create an account
+
 Get a [HuggingFace](https://huggingface.co/) account. This will allow you to download needed files. You can also use the [official Llama website](https://www.llama.com/llama-downloads/).
 
 ### 2. Select a model
+
 Pick the model that suits your needs. Before you download it, you'll need to accept a license. For best performance, we recommend using Spin-Quant or QLoRA versions of the model:
+
 - [Llama 3.2 3B](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct/tree/main/original)
 - [Llama 3.2 1B](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct/tree/main/original)
 - [Llama 3.2 3B Spin-Quant](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8/tree/main)
 - [Llama 3.2 1B Spin-Quant](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8/tree/main)
 - [Llama 3.2 3B QLoRA](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8/tree/main)
 - [Llama 3.2 1B QLoRA](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8/tree/main)
 
 ### 3. Download files
+
 Download the `consolidated.00.pth`, `params.json` and `tokenizer.model` files. If you can't see them, make sure to check the `original` directory.
 
 ### 4. Rename the tokenizer file
+
 Rename the `tokenizer.model` file to `tokenizer.bin` as required by the library:
+
 ```bash
 mv tokenizer.model tokenizer.bin
 ```
 
 ### 5. Run the export script
+
 Navigate to the `llama_export` directory and run the following command:
+
 ```bash
 ./build_llama_binary.sh --model-path /path/to/consolidated.00.pth --params-path /path/to/params.json
 ```
````
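Steps 3 and 4 of the exporting guide can be sanity-checked with a small Node script. This is a sketch under assumptions: the directory handling is illustrative, and the snippet creates stand-in files in a temp directory purely so it runs self-contained; in practice you would point `modelDir` at your real download directory and drop the stand-in creation.

```typescript
// Sketch: verify the three required files are present, then rename the
// tokenizer, mirroring steps 3-4 of the guide. The stand-in file creation
// exists only so this snippet is runnable as-is.
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Placeholder for your actual download directory.
const modelDir = fs.mkdtempSync(path.join(os.tmpdir(), 'llama-'));
const required = ['consolidated.00.pth', 'params.json', 'tokenizer.model'];
for (const f of required) {
  fs.writeFileSync(path.join(modelDir, f), ''); // stand-ins for real downloads
}

// Step 3 check: all required files must exist (look in `original/` if not).
const missing = required.filter((f) => !fs.existsSync(path.join(modelDir, f)));
if (missing.length > 0) {
  throw new Error(`Missing files: ${missing.join(', ')}`);
}

// Step 4: rename tokenizer.model -> tokenizer.bin as the library expects.
fs.renameSync(
  path.join(modelDir, 'tokenizer.model'),
  path.join(modelDir, 'tokenizer.bin'),
);
console.log(fs.existsSync(path.join(modelDir, 'tokenizer.bin'))); // true
```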
