Qualcomm AI Engine Direct - Support LPAI Backend #18151
abhinaykukkadapu merged 8 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18151
Note: Links to docs will display an error until the docs builds have been completed. ⏳ No Failures, 7 Pending as of commit 25c600e with merge base 71334aa. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Force-pushed ff3c249 to 550ec45.
Force-pushed eaf3e9b to c7e6b61.
Hi @cccclai, @abhinaykukkadapu, this PR supports the LPAI backend on x86 and Arm (with RPC mode). After the LPAI backend is supported, it should be easy to extend non-RPC support for aDSP. Thanks!
- SXR2330P
- QCM6490
- QCS9100
You will need an adb-connected Android / Linux device running one of the Qualcomm SoCs listed in `QcomChipset`. Please navigate to [qc_schema.py](https://github.com/pytorch/executorch/blob/main/backends/qualcomm/serialization/qc_schema.py).
    ],
    QnnConstants.OpConv2d.op_name,
)
class Conv2d(GeneralOpDef):
Curious: are we going to follow up with validation for the LPAI op def in subsequent PRs?
Yes, once QNN 2.45 is available, validation will be enabled in QNN and we will test using this version.
pl = len(partitions)
if backend == QnnExecuTorchBackendType.kLpaiBackend:
    assert (
        pl != 1
If we are enforcing full partition, shouldn't this be pl == 1?
Whoops, you are right. Let me fix it. Thanks for pointing it out.
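The corrected full-partition check discussed above could look like the sketch below. This is a hedged illustration only: the real code compares against `QnnExecuTorchBackendType.kLpaiBackend`, which is stubbed here as a plain string, and the helper name `check_full_partition` is hypothetical.

```python
# Hypothetical sketch of the corrected check: LPAI requires the whole
# graph to land in exactly one partition.
def check_full_partition(partitions, backend, lpai_backend="kLpaiBackend"):
    """Assert the graph was fully delegated when targeting LPAI."""
    if backend == lpai_backend:
        pl = len(partitions)
        assert pl == 1, f"LPAI backend expects exactly one partition, got {pl}"

# A single partition passes silently; other backends skip the check.
check_full_partition([object()], "kLpaiBackend")
check_full_partition([object(), object()], "kHtpBackend")
```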
@shewu-quic thanks for adding the LPAI backend, can you please rebase?
The compiled model is `./deeplab_v3/dlv3_qnn.pte`.
Note that the model is compiled for a specific backend (e.g., HTP), so you can specify the target backend via `--backend gpu` or `--backend lpai`. If not specified, it defaults to HTP.
Noob question: does the delegate in its current state support multiple QNN backends in a single graph? For example, can we mark some nodes for GPU and others for HTP?
We don't currently have a use case for this, but I think we can run some tests for this scenario. Thanks! May I ask whether you have any use case for it?
Not really, but I wanted to flush out any issues so we can advertise stronger support while staying within the QNN ecosystem. I'm not sure whether we have this, but I remember QnnMem.h was one direction for sharing memory between backends, so I was curious to know whether we support it, in case a use case appears.
This is not supported today but can be in the future once a couple of other in-progress features come in. We can discuss this at some point in the 2nd half of the year.
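The `--backend` selection described in the snippet above could be sketched with argparse as below. The flag name and choices follow the discussion; the actual example scripts in the repo may spell them differently.

```python
import argparse

# Minimal sketch of a backend-selection flag, defaulting to HTP when
# unspecified, as the docs snippet above describes.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--backend",
    choices=["htp", "gpu", "lpai"],
    default="htp",  # defaults to HTP when not specified
    help="Target QNN backend to compile the model for",
)

args = parser.parse_args(["--backend", "lpai"])
print(args.backend)  # → lpai
```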
Hi @abhinaykukkadapu, I was wondering: since you uploaded some files related to Claude, do you have any plans to use Claude for development or maintenance?
Hi @shewu-quic, thanks! These are some test skills we are experimenting with across ExecuTorch. This brings up an important direction that I wanted to discuss in tomorrow's meeting; we would definitely like to have the et-qnn related skills. Some skills I'm thinking of:
Let's chat more tomorrow in the sync.
Sure, I think that would be an interesting topic to include. If you can, please share in tomorrow's meeting how you plan to contribute to and manage the Claude file, and what the current use cases in the executorch repo are. Thanks!
Force-pushed e4a4677 to a404cdd.
@shewu-quic can you fix the lints? I will import and check the internal CI as this is a large diff, just to be on the safe side.
Could you let me know if you recommend having a separate SKILL.md for each skill, or should they all be combined into one SKILL.md?
Fixed. Thanks!
@abhinaykukkadapu has imported this pull request. If you are a Meta employee, you can view this in D99040563.
@shewu-quic there are a few internal build failures; as discussed yesterday, I'm adding the CI to catch buck and cmake drifts like these. Here is the patch for you.
Sure, I have applied it. Thank you.
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

# To support quantize op lowering in AOT
import executorch.extension.pybindings.portable_lib  # noqa[F401]
@shewu-quic I think this doesn't belong here. I'm trying to debug a few other failures for the internal CI after the patch; this is for the Python runtime and not for backends?
If I don't import this, I encounter the following error.
Reproduce command: python3 backends/qualcomm/tests/test_qnn_delegate.py TestQNNQuantizedOperator.test_qnn_backend_linear -b build-android -s {device} -m SM8850 -a unit_test --backend lpai
[INFO 2026-04-07 09:36:21,233 __init__.py:390] Failed converting '<EdgeOpOverload: quantized_decomposed.quantize_per_tensor.default>: schema = quantized_decomposed::quantize_per_tensor(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype) -> Tensor' to its out variant with error: 'SchemaKind.out variant of operator quantized_decomposed::quantize_per_tensor can't be found. We've found the schemas of all the overloads: ['quantized_decomposed::quantize_per_tensor(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype) -> Tensor', 'quantized_decomposed::quantize_per_tensor.tensor(Tensor input, Tensor scale, Tensor zero_point, int quant_min, int quant_max, ScalarType dtype) -> Tensor', 'quantized_decomposed::quantize_per_tensor.tensor2(Tensor input, Tensor scale, Tensor zero_point, Tensor quant_min, Tensor quant_max, ScalarType dtype) -> Tensor']'
[INFO 2026-04-07 09:36:21,233 __init__.py:390] Failed converting '<EdgeOpOverload: quantized_decomposed.dequantize_per_tensor.default>: schema = quantized_decomposed::dequantize_per_tensor(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, ScalarType? out_dtype=None) -> Tensor' to its out variant with error: 'SchemaKind.out variant of operator quantized_decomposed::dequantize_per_tensor can't be found. We've found the schemas of all the overloads: ['quantized_decomposed::dequantize_per_tensor(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, ScalarType? out_dtype=None) -> Tensor', 'quantized_decomposed::dequantize_per_tensor.tensor(Tensor input, Tensor scale, Tensor zero_point, int quant_min, int quant_max, ScalarType dtype, *, ScalarType? out_dtype=None) -> Tensor', 'quantized_decomposed::dequantize_per_tensor.tensor2(Tensor input, Tensor scale, Tensor zero_point, Tensor quant_min, Tensor quant_max, ScalarType dtype, *, ScalarType? out_dtype=None) -> Tensor']'
======================================================================
ERROR: test_qnn_backend_linear (__main__.TestQNNQuantizedOperator) (i=0)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/local/mnt/workspace/shewu/executorch/backends/qualcomm/tests/test_qnn_delegate.py", line 3756, in test_qnn_backend_linear
self.lower_module_and_test_output(module, sample_input)
File "/local/mnt/workspace/shewu/executorch/backends/qualcomm/tests/utils.py", line 592, in lower_module_and_test_output
exec_prog = delegated_program.to_executorch(
File "/local/mnt/workspace/shewu/executorch/exir/program/_program.py", line 116, in wrapper
return func(*args, **kwargs)
File "/local/mnt/workspace/shewu/executorch/exir/program/_program.py", line 1751, in to_executorch
new_gm_res = p(new_gm)
File "/local2/mnt/workspace/miniconda3/envs/executorch/lib/python3.10/site-packages/torch/fx/passes/infra/pass_base.py", line 47, in __call__
res = self.call(graph_module)
File "/local/mnt/workspace/shewu/executorch/exir/passes/__init__.py", line 440, in call
raise RuntimeError(f"Missing out variants: {missing_out_vars}")
RuntimeError: Missing out variants: {'quantized_decomposed::dequantize_per_tensor', 'quantized_decomposed::quantize_per_tensor'}
It looks like the quantized kernel will be missing if I don't import it. Are you thinking about moving this import here?
Or should we wrap it in a try-except block? Something like:
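A hedged sketch of the try-except wrapping suggested above. The `portable_lib` module path comes from the diff; the `optional_import` helper name is hypothetical, and the point is simply that environments lacking the heavy runtime dependency degrade gracefully instead of failing at module import time.

```python
import importlib
import logging

def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        logging.getLogger(__name__).warning("%s not available", module_name)
        return None

# Registers quantized op out-variants when present (module path from the PR diff).
portable_lib = optional_import("executorch.extension.pybindings.portable_lib")
```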
The problem is that the portable_lib dependency is huge and it has its own registration mechanism; if we do it here, many internal tests fail due to double registration.
I think `import executorch.kernels.quantized` is sufficient, as it registers the required AOT quant library: https://github.com/pytorch/executorch/blob/main/kernels/quantized/__init__.py
I think you also need to enable the cmake target -DEXECUTORCH_BUILD_KERNELS_QUANTIZED_AOT=ON
Update the buck target for quantizer from portable_lib to:
deps = [
    "//executorch/backends/transforms:decompose_sdpa",
    "//executorch/kernels/quantized:aot_lib",
    "//executorch/kernels/quantized:quantized_ops_lib",
],
@claude Review this PR and make sure buck files are updated properly; there are no tests in the CI which may signal buck issues.
Claude finished @digantdesai's task in 4m 51s.
Review: Buck/Build File Analysis for LPAI Backend PR
Buck Files Status
| CMake target | LPAI files included | Status |
|---|---|---|
| `qnn_device` | LpaiDevice.h, ${HOST_ARCHITECTURE_LPAI}/LpaiDevice.cpp | ✅ |
| `qnn_context` | LpaiContext.h, LpaiContext.cpp, LpaiContextCustomConfig.h, ${HOST_ARCHITECTURE_LPAI}/LpaiContextCustomConfig.cpp | ✅ |
| `qnn_graph` | LpaiGraph.h, LpaiGraph.cpp | ✅ |
| `qnn_backend` | LpaiBackend.h, LpaiBackend.cpp, LpaiBackendCustomConfig.h, LpaiBackendCustomConfig.cpp | ✅ |
| Headers exported | backends/lpai/*.h | ✅ |
The HOST_ARCHITECTURE_LPAI variable is set to lpai/${platform} where platform is host for x86_64 and target otherwise — matching the buck convention.
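The host/target selection convention described above can be sketched as below. This is an illustration of the stated rule only; the `lpai_source_dir` helper is hypothetical, and the real logic lives in CMake and buck, not Python.

```python
# Sketch of the convention: HOST_ARCHITECTURE_LPAI = lpai/${platform},
# where platform is "host" for x86_64 (AOT) builds and "target" otherwise.
def lpai_source_dir(host_arch: str) -> str:
    """Return the LPAI source subdirectory for a given host architecture."""
    platform = "host" if host_arch == "x86_64" else "target"
    return f"lpai/{platform}"

print(lpai_source_dir("x86_64"))   # → lpai/host
print(lpai_source_dir("aarch64"))  # → lpai/target
```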
Other Issues Found During Review
1. Naming inconsistency in runtime option member variables (QnnExecuTorchBackend.h:69-70): two member variables use `lapi` instead of `lpai`:
   RuntimeOption qnn_runtime_lapi_fps_{false, 0};        // should be lpai
   RuntimeOption qnn_runtime_lapi_ftrt_ratio_{false, 0}; // should be lpai
   while the remaining three use `lpai` correctly. The macro keys in QnnExecuTorch.h all use `LPAI` correctly (QNN_RUNTIME_LPAI_FPS, etc.), so this is just a member variable naming inconsistency. It works correctly because the key strings match, but it's confusing.
2. Typo in runtime option key (QnnExecuTorch.h:25):
   #define QNN_RUNTIME_LPAI_CLIENT_PERF_TYPE "qnn_runtime_lpai_clent_perf_type"
   `clent` should be `client`. This is the actual string key used at runtime, so it would need to match on both the set_option and get_option sides. Since both sides use the same macro, it's functionally correct but will be confusing for anyone using the string directly.
3. quantizer.py imports (flagged by @abhinaykukkadapu): the `executorch.extension.pybindings.portable_lib` and `executorch.kernels.quantized` imports at the top of quantizer.py pull in heavy runtime dependencies. This needs to be resolved: either move them to the call site that actually needs them, or add a clear explanation of why they must be at module level.
Summary
Buck files are properly updated for the LPAI backend. All new source files are covered by glob patterns in runtime/targets.bzl, and the quantizer deps in quantizer/targets.bzl match the new imports (though the imports themselves may need relocation per reviewer feedback). CMake and buck are consistent in their LPAI file coverage. The minor issues flagged above (naming inconsistency, typo, questionable imports) are not buck-build-breaking but should be cleaned up.
Hi @abhinaykukkadapu,
Just imported again, will monitor; meanwhile can you try this: #18151 (comment)
Updated. I tried this comment and it worked. It appears I didn't rebuild quantized_aot_lib, which resulted in it not loading correctly.
Awesome, just imported again, will look at ci in the morning. |
@claude Could you please check if the buck build is working?
Force-pushed 3ee6c1e to c01d384.
@claude Could you please check if the buck build is working?
Claude finished @abhinaykukkadapu's task in 1m 50s.
Buck Build File Analysis
| Pattern | Files matched | Status |
|---|---|---|
| backends/lpai/*.cpp | LpaiBackend.cpp, LpaiBackendCustomConfig.cpp, LpaiContext.cpp, LpaiGraph.cpp | OK |
| backends/lpai/host/*.cpp (AOT) / backends/lpai/target/*.cpp (device) | host/LpaiDevice.cpp, host/LpaiContextCustomConfig.cpp / target/LpaiDevice.cpp, target/LpaiContextCustomConfig.cpp | OK |
| backends/lpai/*.h | All 6 headers (LpaiBackend.h, LpaiBackendCustomConfig.h, LpaiContext.h, LpaiContextCustomConfig.h, LpaiDevice.h, LpaiGraph.h) | OK |
The host/target directory convention matches the existing pattern used by GPU, HTP, and IR backends. All 8 .cpp files and 6 .h files are covered.
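A coverage check like the one summarized above can be sketched with `fnmatch`. The patterns and file names are reproduced from the review table; the script itself is an illustration, not part of the buck build.

```python
import fnmatch

# Buck glob patterns from the review, and a sample of the LPAI sources
# they should cover.
patterns = [
    "backends/lpai/*.cpp",
    "backends/lpai/host/*.cpp",
    "backends/lpai/target/*.cpp",
    "backends/lpai/*.h",
]
files = [
    "backends/lpai/LpaiBackend.cpp",
    "backends/lpai/LpaiGraph.cpp",
    "backends/lpai/host/LpaiDevice.cpp",
    "backends/lpai/target/LpaiContextCustomConfig.cpp",
    "backends/lpai/LpaiBackend.h",
]

# Any file not matched by at least one pattern would be missing from the build.
uncovered = [f for f in files if not any(fnmatch.fnmatch(f, p) for p in patterns)]
assert not uncovered, f"files missing from buck globs: {uncovered}"
```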
quantizer/targets.bzl — Updated per review feedback
Current deps:
deps = [
    "//executorch/backends/transforms:decompose_sdpa",
    "//executorch/kernels/quantized:aot_lib",
    "//executorch/kernels/quantized:quantized_ops_lib",
],
This reflects the latest review feedback from @abhinaykukkadapu: portable_lib was removed and replaced with aot_lib. The quantizer.py import now only uses `import executorch.kernels.quantized` (wrapped in try/except), which aligns with the aot_lib and quantized_ops_lib deps. This looks correct.
The glob ["*.py", "*/*.py"] properly captures the new annotators/lpai_rules.py file.
aot/python/targets.bzl — No issues
The PyQnnManagerAdaptor target already had quantized:aot_lib and quantized:quantized_ops_lib, and it depends on runtime:runtime which now includes LPAI via the runtime targets.bzl glob patterns. No changes needed here.
CMake vs Buck consistency
All LPAI files present in backends/CMakeLists.txt are also covered by the buck globs in runtime/targets.bzl:
| CMake target | LPAI files | Buck coverage |
|---|---|---|
| `qnn_device` | LpaiDevice.h, ${platform}/LpaiDevice.cpp | backends/lpai/*.h + backends/lpai/{host,target}/*.cpp |
| `qnn_context` | LpaiContext.h, LpaiContext.cpp, LpaiContextCustomConfig.h, ${platform}/LpaiContextCustomConfig.cpp | backends/lpai/*.{h,cpp} + backends/lpai/{host,target}/*.cpp |
| `qnn_graph` | LpaiGraph.h, LpaiGraph.cpp | backends/lpai/*.{h,cpp} |
| `qnn_backend` | LpaiBackend.h, LpaiBackend.cpp, LpaiBackendCustomConfig.h, LpaiBackendCustomConfig.cpp | backends/lpai/*.{h,cpp} |
CMake uses HOST_ARCHITECTURE_LPAI = lpai/${platform} where platform = host for x86_64 and target otherwise — this matches the buck include_aot_qnn_lib conditional exactly. Consistent.
Key difference: CMake is more granular, buck is monolithic
Note that CMake splits LPAI files across multiple fine-grained targets (qnn_device, qnn_context, qnn_graph, qnn_backend), while buck collects everything into a single runtime library via broad globs. This is the existing architectural pattern for all backends (HTP, GPU, IR) and is expected.
Summary
The buck build configuration appears correct. All new LPAI source files (8 .cpp, 6 .h) are properly covered by the glob patterns in runtime/targets.bzl. The quantizer/targets.bzl deps have been updated per @abhinaykukkadapu's feedback (using aot_lib instead of portable_lib). CMake and buck are consistent in their LPAI file coverage. No missing files or dependency issues were found in the static analysis.
Note: I can only do static analysis of the build files — I cannot actually execute a buck build. If you need a live buck build test, that would need to be done via the internal CI.
@shewu-quic sorry, and thanks for patiently updating the PR. One last patch and I'm going to merge it and fix forward if there are any more failures, but I believe this should resolve the failures, as I put a dummy forward diff internally and it seems to pass. The problem is
Summary:
- Add LPAI infra
- Because of the accuracy issue with quantize and dequantize operations in LPAI, Q and DQ will fall back to CPU for the LPAI backend.
- Support runtime options for `fps`, `ftrt_ratio`, `clent_perf_type`, `affinity`, and `core_selection`. Refer to the [QNN doc](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend.html#full-json-scheme) for more details on these options.
- Support LPAI in `test_qnn_delegate.py`
- Refactor documentation for LPAI and GPU
- Add a script to sign the skel library
Force-pushed c01d384 to 25c600e.
Thanks. I have applied this patch and rebased.
Summary:
- Support runtime options for `fps`, `ftrt_ratio`, `clent_perf_type`, `affinity`, and `core_selection`. Refer to the QNN doc for more details on these options.
- Support LPAI in `test_qnn_delegate.py`
Test Plan