Commit c2eee13
docs: Benchmarks (#92)
## Description

Add model benchmarks (memory usage, inference time, model size).

### Type of change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [x] Documentation update (improves or adds clarity to existing documentation)

### Checklist

- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings
1 parent 2a98ffa commit c2eee13

11 files changed, +274 −21 lines changed

docs/docs/benchmarks/_category_.json

+7
```json
{
  "label": "Benchmarks",
  "position": 5,
  "link": {
    "type": "generated-index"
  }
}
```
+42
```md
---
title: Inference Time
sidebar_position: 3
---

:::warning warning
Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
:::

## Classification

| Model             | iPhone 16 Pro (Core ML) [ms] | iPhone 13 Pro (Core ML) [ms] | iPhone SE 3 (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| ----------------- | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
| EFFICIENTNET_V2_S | 100                          | 120                          | 130                        | 180                               | 170                       |

## Object Detection

| Model                          | iPhone 16 Pro (XNNPACK) [ms] | iPhone 13 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| ------------------------------ | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
| SSDLITE_320_MOBILENET_V3_LARGE | 190                          | 260                          | 280                        | 100                               | 90                        |

## Style Transfer

| Model                        | iPhone 16 Pro (Core ML) [ms] | iPhone 13 Pro (Core ML) [ms] | iPhone SE 3 (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| ---------------------------- | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
| STYLE_TRANSFER_CANDY         | 450                          | 600                          | 750                        | 1650                              | 1800                      |
| STYLE_TRANSFER_MOSAIC        | 450                          | 600                          | 750                        | 1650                              | 1800                      |
| STYLE_TRANSFER_UDNIE         | 450                          | 600                          | 750                        | 1650                              | 1800                      |
| STYLE_TRANSFER_RAIN_PRINCESS | 450                          | 600                          | 750                        | 1650                              | 1800                      |

## LLMs

| Model                 | iPhone 16 Pro (XNNPACK) [tokens/s] | iPhone 13 Pro (XNNPACK) [tokens/s] | iPhone SE 3 (XNNPACK) [tokens/s] | Samsung Galaxy S24 (XNNPACK) [tokens/s] | OnePlus 12 (XNNPACK) [tokens/s] |
| --------------------- | ---------------------------------- | ---------------------------------- | -------------------------------- | --------------------------------------- | ------------------------------- |
| LLAMA3_2_1B           | 16.1                               | 11.4                               | ❌                               | 15.6                                    | 19.3                            |
| LLAMA3_2_1B_SPINQUANT | 40.6                               | 16.7                               | 16.5                             | 40.3                                    | 48.2                            |
| LLAMA3_2_1B_QLORA     | 31.8                               | 11.4                               | 11.2                             | 37.3                                    | 44.4                            |
| LLAMA3_2_3B           | ❌                                 | ❌                                 | ❌                               | ❌                                      | 7.1                             |
| LLAMA3_2_3B_SPINQUANT | 17.2                               | 8.2                                | ❌                               | 16.2                                    | 19.4                            |
| LLAMA3_2_3B_QLORA     | 14.5                               | ❌                                 | ❌                               | 14.8                                    | 18.1                            |

❌ - Insufficient RAM.
```
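The "consecutive runs" note above can be made concrete with a small sketch of how such numbers are typically gathered (this is an illustration, not the project's actual benchmark harness): discard a warm-up run, then average the runs that follow.

```typescript
// Sketch of a consecutive-run benchmark. `runModel` is a stand-in for a
// real inference call. The first call is treated as warm-up (model loading
// and initialization can make it up to ~2x slower) and is excluded from
// the reported mean.
async function meanInferenceMs(
  runModel: () => Promise<void>,
  runs: number = 10,
): Promise<number> {
  await runModel(); // warm-up run, excluded from timing
  const start = Date.now();
  for (let i = 0; i < runs; i++) {
    await runModel(); // consecutive runs - these are what the tables report
  }
  return (Date.now() - start) / runs; // mean ms per consecutive run
}
```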

docs/docs/benchmarks/memory-usage.md

+36
```md
---
title: Memory Usage
sidebar_position: 2
---

## Classification

| Model             | Android (XNNPACK) [MB] | iOS (Core ML) [MB] |
| ----------------- | ---------------------- | ------------------ |
| EFFICIENTNET_V2_S | 130                    | 85                 |

## Object Detection

| Model                          | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ------------------------------ | ---------------------- | ------------------ |
| SSDLITE_320_MOBILENET_V3_LARGE | 90                     | 90                 |

## Style Transfer

| Model                        | Android (XNNPACK) [MB] | iOS (Core ML) [MB] |
| ---------------------------- | ---------------------- | ------------------ |
| STYLE_TRANSFER_CANDY         | 950                    | 350                |
| STYLE_TRANSFER_MOSAIC        | 950                    | 350                |
| STYLE_TRANSFER_UDNIE         | 950                    | 350                |
| STYLE_TRANSFER_RAIN_PRINCESS | 950                    | 350                |

## LLMs

| Model                 | Android (XNNPACK) [GB] | iOS (XNNPACK) [GB] |
| --------------------- | ---------------------- | ------------------ |
| LLAMA3_2_1B           | 3.2                    | 3.1                |
| LLAMA3_2_1B_SPINQUANT | 1.9                    | 2                  |
| LLAMA3_2_1B_QLORA     | 2.2                    | 2.5                |
| LLAMA3_2_3B           | 7.1                    | 7.3                |
| LLAMA3_2_3B_SPINQUANT | 3.7                    | 3.8                |
| LLAMA3_2_3B_QLORA     | 4                      | 4.1                |
```

docs/docs/benchmarks/model-size.md

+36
```md
---
title: Model Size
sidebar_position: 1
---

## Classification

| Model             | XNNPACK [MB] | Core ML [MB] |
| ----------------- | ------------ | ------------ |
| EFFICIENTNET_V2_S | 85.6         | 43.9         |

## Object Detection

| Model                          | XNNPACK [MB] |
| ------------------------------ | ------------ |
| SSDLITE_320_MOBILENET_V3_LARGE | 13.9         |

## Style Transfer

| Model                        | XNNPACK [MB] | Core ML [MB] |
| ---------------------------- | ------------ | ------------ |
| STYLE_TRANSFER_CANDY         | 6.78         | 5.22         |
| STYLE_TRANSFER_MOSAIC        | 6.78         | 5.22         |
| STYLE_TRANSFER_UDNIE         | 6.78         | 5.22         |
| STYLE_TRANSFER_RAIN_PRINCESS | 6.78         | 5.22         |

## LLMs

| Model                 | XNNPACK [GB] |
| --------------------- | ------------ |
| LLAMA3_2_1B           | 2.47         |
| LLAMA3_2_1B_SPINQUANT | 1.14         |
| LLAMA3_2_1B_QLORA     | 1.18         |
| LLAMA3_2_3B           | 6.43         |
| LLAMA3_2_3B_SPINQUANT | 2.55         |
| LLAMA3_2_3B_QLORA     | 2.65         |
```

docs/docs/computer-vision/useClassification.mdx → docs/docs/computer-vision/useClassification.md

+24
```diff
@@ -86,3 +86,27 @@ function App() {
 | Model                                                                                                           | Number of classes | Class list                                                                                                                                                                 |
 | --------------------------------------------------------------------------------------------------------------- | ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | [efficientnet_v2_s](https://pytorch.org/vision/0.20/models/generated/torchvision.models.efficientnet_v2_s.html) | 1000              | [ImageNet1k_v1](https://github.com/software-mansion/react-native-executorch/blob/main/android/src/main/java/com/swmansion/rnexecutorch/models/classification/Constants.kt) |
+
+## Benchmarks
+
+### Model size
+
+| Model             | XNNPACK [MB] | Core ML [MB] |
+| ----------------- | ------------ | ------------ |
+| EFFICIENTNET_V2_S | 85.6         | 43.9         |
+
+### Memory usage
+
+| Model             | Android (XNNPACK) [MB] | iOS (Core ML) [MB] |
+| ----------------- | ---------------------- | ------------------ |
+| EFFICIENTNET_V2_S | 130                    | 85                 |
+
+### Inference time
+
+:::warning warning
+Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
+:::
+
+| Model             | iPhone 16 Pro (Core ML) [ms] | iPhone 13 Pro (Core ML) [ms] | iPhone SE 3 (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
+| ----------------- | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
+| EFFICIENTNET_V2_S | 100                          | 120                          | 130                        | 180                               | 170                       |
```

docs/docs/computer-vision/useObjectDetection.mdx → docs/docs/computer-vision/useObjectDetection.md

+24
```diff
@@ -124,3 +124,27 @@ function App() {
 | Model                                                                                                                                                                                                               | Number of classes | Class list                                                                                                                                          |
 | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
 | [SSDLite320 MobileNetV3 Large](https://pytorch.org/vision/main/models/generated/torchvision.models.detection.ssdlite320_mobilenet_v3_large.html#torchvision.models.detection.SSDLite320_MobileNet_V3_Large_Weights) | 91                | [COCO](https://github.com/software-mansion/react-native-executorch/blob/69802ee1ca161d9df00def1dabe014d36341cfa9/src/types/object_detection.ts#L14) |
+
+## Benchmarks
+
+### Model size
+
+| Model                          | XNNPACK [MB] |
+| ------------------------------ | ------------ |
+| SSDLITE_320_MOBILENET_V3_LARGE | 13.9         |
+
+### Memory usage
+
+| Model                          | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
+| ------------------------------ | ---------------------- | ------------------ |
+| SSDLITE_320_MOBILENET_V3_LARGE | 90                     | 90                 |
+
+### Inference time
+
+:::warning warning
+Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
+:::
+
+| Model                          | iPhone 16 Pro (XNNPACK) [ms] | iPhone 13 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
+| ------------------------------ | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
+| SSDLITE_320_MOBILENET_V3_LARGE | 190                          | 260                          | 280                        | 100                               | 90                        |
```

docs/docs/computer-vision/useStyleTransfer.mdx → docs/docs/computer-vision/useStyleTransfer.md

+33
```diff
@@ -78,3 +78,36 @@ function App(){
 - [Mosaic](https://github.com/pytorch/examples/tree/main/fast_neural_style)
 - [Udnie](https://github.com/pytorch/examples/tree/main/fast_neural_style)
 - [Rain princess](https://github.com/pytorch/examples/tree/main/fast_neural_style)
+
+## Benchmarks
+
+### Model size
+
+| Model                        | XNNPACK [MB] | Core ML [MB] |
+| ---------------------------- | ------------ | ------------ |
+| STYLE_TRANSFER_CANDY         | 6.78         | 5.22         |
+| STYLE_TRANSFER_MOSAIC        | 6.78         | 5.22         |
+| STYLE_TRANSFER_UDNIE         | 6.78         | 5.22         |
+| STYLE_TRANSFER_RAIN_PRINCESS | 6.78         | 5.22         |
+
+### Memory usage
+
+| Model                        | Android (XNNPACK) [MB] | iOS (Core ML) [MB] |
+| ---------------------------- | ---------------------- | ------------------ |
+| STYLE_TRANSFER_CANDY         | 950                    | 350                |
+| STYLE_TRANSFER_MOSAIC        | 950                    | 350                |
+| STYLE_TRANSFER_UDNIE         | 950                    | 350                |
+| STYLE_TRANSFER_RAIN_PRINCESS | 950                    | 350                |
+
+### Inference time
+
+:::warning warning
+Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
+:::
+
+| Model                        | iPhone 16 Pro (Core ML) [ms] | iPhone 13 Pro (Core ML) [ms] | iPhone SE 3 (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
+| ---------------------------- | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
+| STYLE_TRANSFER_CANDY         | 450                          | 600                          | 750                        | 1650                              | 1800                      |
+| STYLE_TRANSFER_MOSAIC        | 450                          | 600                          | 750                        | 1650                              | 1800                      |
+| STYLE_TRANSFER_UDNIE         | 450                          | 600                          | 750                        | 1650                              | 1800                      |
+| STYLE_TRANSFER_RAIN_PRINCESS | 450                          | 600                          | 750                        | 1650                              | 1800                      |
```

docs/docs/fundamentals/getting-started.mdx → docs/docs/fundamentals/getting-started.md

+8-2
````diff
@@ -7,12 +7,15 @@ import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 
 ## What is ExecuTorch?
-ExecuTorch is a novel AI framework developed by Meta, designed to streamline deploying PyTorch models on a variety of devices, including mobile phones and microcontrollers. This framework enables exporting models into standalone binaries, allowing them to run locally without requiring API calls. ExecuTorch achieves state-of-the-art performance through optimizations and delegates such as CoreML and XNNPack. It provides a seamless export process with robust debugging options, making it easier to resolve issues if they arise.
+
+ExecuTorch is a novel AI framework developed by Meta, designed to streamline deploying PyTorch models on a variety of devices, including mobile phones and microcontrollers. This framework enables exporting models into standalone binaries, allowing them to run locally without requiring API calls. ExecuTorch achieves state-of-the-art performance through optimizations and delegates such as Core ML and XNNPACK. It provides a seamless export process with robust debugging options, making it easier to resolve issues if they arise.
 
 ## React Native ExecuTorch
+
 React Native ExecuTorch is our way of bringing ExecuTorch into the React Native world. Our API is built to be simple, declarative, and efficient. Plus, we’ll provide a set of pre-exported models for common use cases, so you won’t have to worry about handling exports yourself. With just a few lines of JavaScript, you’ll be able to run AI models (even LLMs 👀) right on your device—keeping user data private and saving on cloud costs.
 
 ## Installation
+
 Installation is pretty straightforward, just use your favorite package manager.
 
 <Tabs>
@@ -54,12 +57,15 @@ Because we are using ExecuTorch under the hood, you won't be able to build iOS a
 :::
 
 Running the app with the library:
+
 ```bash
 yarn run expo:<ios | android> -d
 ```
 
 ## Good reads
+
 If you want to dive deeper into ExecuTorch or our previous work with the framework, we highly encourage you to check out the following resources:
+
 - [ExecuTorch docs](https://pytorch.org/executorch/stable/index.html)
 - [Native code for iOS](https://medium.com/swmansion/bringing-native-ai-to-your-mobile-apps-with-executorch-part-i-ios-f1562a4556e8?source=user_profile_page---------0-------------250189c98ccf---------------)
 - [Native code for Android](https://medium.com/swmansion/bringing-native-ai-to-your-mobile-apps-with-executorch-part-ii-android-29431b6b9f7f?source=user_profile_page---------2-------------b8e3a5cb1c63---------------)
````

docs/docs/llms/exporting-llama.mdx → docs/docs/llms/exporting-llama.md

+18-9
````diff
@@ -3,32 +3,41 @@ title: Exporting Llama
 sidebar_position: 2
 ---
 
 In order to make the process of export as simple as possible for you, we created a script that runs a Docker container and exports the model.
 
 ## Steps to export Llama
+
 ### 1. Create an account
+
 Get a [HuggingFace](https://huggingface.co/) account. This will allow you to download needed files. You can also use the [official Llama website](https://www.llama.com/llama-downloads/).
 
 ### 2. Select a model
+
 Pick the model that suits your needs. Before you download it, you'll need to accept a license. For best performance, we recommend using Spin-Quant or QLoRA versions of the model:
+
 - [Llama 3.2 3B](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct/tree/main/original)
 - [Llama 3.2 1B](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct/tree/main/original)
 - [Llama 3.2 3B Spin-Quant](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8/tree/main)
 - [Llama 3.2 1B Spin-Quant](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8/tree/main)
 - [Llama 3.2 3B QLoRA](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8/tree/main)
 - [Llama 3.2 1B QLoRA](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8/tree/main)
 
 ### 3. Download files
+
 Download the `consolidated.00.pth`, `params.json` and `tokenizer.model` files. If you can't see them, make sure to check the `original` directory.
 
 ### 4. Rename the tokenizer file
+
 Rename the `tokenizer.model` file to `tokenizer.bin` as required by the library:
+
 ```bash
 mv tokenizer.model tokenizer.bin
 ```
 
 ### 5. Run the export script
+
 Navigate to the `llama_export` directory and run the following command:
+
 ```bash
 ./build_llama_binary.sh --model-path /path/to/consolidated.00.pth --params-path /path/to/params.json
 ```
````
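Steps 3 and 4 of the exporting guide can be sanity-checked with a small Node script. This is a sketch under assumptions: the directory handling is illustrative, and the snippet creates stand-in files in a temp directory purely so it runs self-contained; in practice you would point `modelDir` at your real download directory and drop the stand-in creation.

```typescript
// Sketch: verify the three required files are present, then rename the
// tokenizer, mirroring steps 3-4 of the guide. The stand-in file creation
// exists only so this snippet is runnable as-is.
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Placeholder for your actual download directory.
const modelDir = fs.mkdtempSync(path.join(os.tmpdir(), 'llama-'));
const required = ['consolidated.00.pth', 'params.json', 'tokenizer.model'];
for (const f of required) {
  fs.writeFileSync(path.join(modelDir, f), ''); // stand-ins for real downloads
}

// Step 3 check: all required files must exist (look in `original/` if not).
const missing = required.filter((f) => !fs.existsSync(path.join(modelDir, f)));
if (missing.length > 0) {
  throw new Error(`Missing files: ${missing.join(', ')}`);
}

// Step 4: rename tokenizer.model -> tokenizer.bin as the library expects.
fs.renameSync(
  path.join(modelDir, 'tokenizer.model'),
  path.join(modelDir, 'tokenizer.bin'),
);
console.log(fs.existsSync(path.join(modelDir, 'tokenizer.bin'))); // true
```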
