
Qualcomm AI Engine Direct - Support LPAI Backend#18151

Merged
abhinaykukkadapu merged 8 commits into pytorch:main from
CodeLinaro:dev1/hutton/support_LPAI_backend
Apr 9, 2026

Conversation

@shewu-quic
Collaborator

@shewu-quic shewu-quic commented Mar 13, 2026

Summary:

  • Add LPAI infra
    • Because of accuracy issues with the quantize and dequantize operations in LPAI, the Q and DQ ops fall back to CPU for the LPAI backend. Therefore, only fully delegated graphs are supported for now
    • Support runtime options for fps, ftrt_ratio, client_perf_type, affinity, and core_selection. Refer to the QNN documentation for details on these options.
  • Support LPAI in test_qnn_delegate.py
  • Refactor documentation for LPAI and GPU
  • Add a script to sign the skel library
    • Note that the skel library must be signed and must run with root on the device for LPAI.
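The runtime options listed above can be illustrated as a key/value map. The qnn_runtime_lpai_* key strings follow the naming convention that appears later in this thread; the values and the dict-based shape are assumptions for illustration only, not the delegate's actual interface.

```python
# Hedged sketch of the LPAI runtime option keys. Values are placeholders;
# the dict-based interface is an assumption for illustration only.
lpai_runtime_options = {
    "qnn_runtime_lpai_fps": 30,              # target frames per second
    "qnn_runtime_lpai_ftrt_ratio": 2,        # faster-than-real-time ratio
    "qnn_runtime_lpai_client_perf_type": 0,  # client performance profile
    "qnn_runtime_lpai_affinity": 0,          # core affinity selection
    "qnn_runtime_lpai_core_selection": 0,    # which LPAI core to use
}
```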

Test Plan

python3 backends/qualcomm/tests/test_qnn_delegate.py TestQNNQuantizedOperator.test_qnn_backend_mean  -b build-android  -H ${host} -s ${serial_num} -m SM8850 -r /path/to/executorch -a /path/to/artifacts --backend lpai

@pytorch-bot

pytorch-bot bot commented Mar 13, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18151

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 7 Pending

As of commit 25c600e with merge base 71334aa:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Mar 13, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@shewu-quic shewu-quic force-pushed the dev1/hutton/support_LPAI_backend branch from ff3c249 to 550ec45 Compare March 13, 2026 05:59
@shewu-quic shewu-quic force-pushed the dev1/hutton/support_LPAI_backend branch from eaf3e9b to c7e6b61 Compare March 18, 2026 06:57
@shewu-quic
Collaborator Author

Hi @cccclai, @abhinaykukkadapu,

This PR is to support the LPAI backend on x86 and Arm (with RPC mode). After the LPAI backend is supported, it should be easy to extend non-RPC support for the aDSP.
Could you please help to take a look?

Thanks,
Hutton

- SXR2330P
- QCM6490
- QCS9100
You will need an Android / Linux device connected via adb, running one of the Qualcomm SoCs listed in `QcomChipset`. Please navigate to [qc_schema.py](https://github.com/pytorch/executorch/blob/main/backends/qualcomm/serialization/qc_schema.py).
Contributor

Nice! thanks

],
QnnConstants.OpConv2d.op_name,
)
class Conv2d(GeneralOpDef):
Contributor

Curious, are we going to follow up with validation for the LPAI op def in subsequent PRs?

Collaborator Author

Yes, once QNN 2.45 is available, validation will be enabled in QNN and we will test using this version.

pl = len(partitions)
if backend == QnnExecuTorchBackendType.kLpaiBackend:
assert (
pl != 1
Contributor

If we are enforcing full partition, shouldn't this be pl == 1?

Collaborator Author

Whoops, you are right. Let me fix it. Thanks for pointing it out.
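The check the reviewer is describing can be sketched as follows. This is an illustrative standalone function, not the PR's exact code; the function name and the backend sentinel string are assumptions.

```python
# Hedged sketch: LPAI only supports fully delegated graphs, so after
# partitioning there should be exactly one partition.
def assert_fully_delegated(partitions, backend, lpai_backend="kLpaiBackend"):
    # Other backends (HTP, GPU) may fall back, so only LPAI is checked here.
    if backend == lpai_backend:
        assert len(partitions) == 1, (
            f"LPAI requires full delegation; got {len(partitions)} partitions"
        )

assert_fully_delegated(["partition_0"], "kLpaiBackend")  # one partition: passes
```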

@abhinaykukkadapu
Contributor

@shewu-quic thanks for adding the LPAI backend, can you please rebase?


The compiled model is `./deeplab_v3/dlv3_qnn.pte`.

Note that the model is compiled for a specific backend (e.g., HTP), so you can specify the target backend via `--backend gpu` or `--backend lpai`. If not specified, it defaults to HTP.
Contributor

Noob question: does the delegate in its current state support multiple QNN backends in a single graph? For example, can we mark some nodes for GPU and others for HTP?

Collaborator Author

We don't currently have a use case for this, but I think we can run some tests for this scenario. Thanks!
May I ask whether you have any use case for it?

Contributor

abhinaykukkadapu commented Mar 31, 2026

Not really, but I wanted to flush out any issues so we can advertise stronger support while staying within the QNN ecosystem. I'm not sure if we have this, but I remember QnnMem.h was one direction for sharing memory between backends, so I was curious whether we support it, in case a use case appears.


This is not supported today but can be in the future once a couple of other in progress features come in. We can discuss this at some point in the 2nd half of the year.

@shewu-quic
Collaborator Author

shewu-quic commented Mar 31, 2026

Hi @abhinaykukkadapu, I was wondering: since you uploaded some files related to Claude, do you have any plans to use Claude for development or maintenance?

@abhinaykukkadapu
Contributor

Hi @abhinaykukkadapu, I was wondering—since you uploaded some files related to claude, do you have any plans to use Claude for development or maintenance?

Hi @shewu-quic, thanks. These are some test skills we are experimenting with across ExecuTorch, but this brings up an important direction which I wanted to discuss in tomorrow's meeting: we definitely would like to have the et-qnn related skills. Some skills I'm thinking of:

  1. QNN Profiling and debugger
  2. QNN lowering/ export
  3. QNN Custom kernels
  4. New op development
  5. Model enablement

let's chat more tomorrow in the sync.

@shewu-quic
Collaborator Author


Sure, I think that would be an interesting topic to include. If you can, please share in tomorrow's meeting your experience with how you plan to contribute to and manage the Claude file, and also the current use cases in the executorch repo.

Thanks!

@shewu-quic shewu-quic force-pushed the dev1/hutton/support_LPAI_backend branch from e4a4677 to a404cdd Compare March 31, 2026 04:02
@abhinaykukkadapu
Contributor

@shewu-quic can you fix the lints? I will import and check the internal CI, as this is a large diff, just to be on the safe side.

@shewu-quic
Collaborator Author


Could you let me know whether you recommend having a separate SKILL.md for each skill, or whether they should all be combined into one SKILL.md?
I'm currently creating a folder named qualcomm in .claude/skills. Is this what you expected?

@shewu-quic
Collaborator Author

@shewu-quic can you fix the lints, i will import and check the internal CI as this is a large diff, just to be on the safe side.

Fixed. Thanks!

@meta-codesync
Contributor

meta-codesync bot commented Apr 1, 2026

@abhinaykukkadapu has imported this pull request. If you are a Meta employee, you can view this in D99040563.

@abhinaykukkadapu
Contributor

abhinaykukkadapu commented Apr 1, 2026

@shewu-quic there are a few internal build failures; as discussed yesterday, I'm adding CI to catch buck and cmake drifts like these. Here is the patch for you.

--- a/backends/qualcomm/quantizer/targets.bzl
+++ b/backends/qualcomm/quantizer/targets.bzl
@@ -15,5 +15,7 @@
         visibility = ["PUBLIC"],
         deps = [
             "//executorch/backends/transforms:decompose_sdpa",
+            "//executorch/extension/pybindings:portable_lib",
+            "//executorch/kernels/quantized:quantized_ops_lib",
         ],
     )
--- a/backends/qualcomm/runtime/targets.bzl
+++ b/backends/qualcomm/runtime/targets.bzl
@@ -47,11 +47,17 @@
                     "backends/gpu/*.cpp",
                     "backends/htp/*.cpp",
                     "backends/ir/*.cpp",
+                    "backends/lpai/*.cpp",
                 ] + (["backends/gpu/host/*.cpp"] if include_aot_qnn_lib else ["backends/gpu/target/*.cpp"]) + (
                     ["backends/htp/host/*.cpp"] if include_aot_qnn_lib else ["backends/htp/target/*.cpp"]) + (
-                    ["backends/ir/host/*.cpp"] if include_aot_qnn_lib else ["backends/ir/target/*.cpp"]
+                    ["backends/ir/host/*.cpp"] if include_aot_qnn_lib else ["backends/ir/target/*.cpp"]) + (
+                    ["backends/lpai/x86_64/*.cpp"] if include_aot_qnn_lib else ["backends/lpai/aarch64/*.cpp"]
                 ),
-                exclude = ["Logging.cpp"],
+                exclude = [
+                    "Logging.cpp",
+                    # AfterConfigure is not declared in LpaiGraph - dead code from import
+                    "backends/lpai/aarch64/LpaiGraph.cpp",
+                ],
             ),
             exported_headers = glob(
                 [
@@ -60,6 +66,7 @@
                     "backends/gpu/*.h",
                     "backends/htp/*.h",
                     "backends/ir/*.h",
+                    "backends/lpai/*.h",
                 ],
                 exclude = ["Logging.h"],
             ),

@shewu-quic
Collaborator Author


Sure, I have applied it. Thank you.

from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

# To support quantize op lowering in AOT
import executorch.extension.pybindings.portable_lib # noqa[F401]
Contributor

@shewu-quic I think this doesn't belong here. I'm trying to debug a few other failures in the internal CI after the patch; this is for the Python runtime and not for the backends?

Collaborator Author

shewu-quic commented Apr 7, 2026

If I don't import this, I encounter the error below.
Reproduce command: python3 backends/qualcomm/tests/test_qnn_delegate.py TestQNNQuantizedOperator.test_qnn_backend_linear -b build-android -s {device} -m SM8850 -a unit_test --backend lpai

[INFO 2026-04-07 09:36:21,233 __init__.py:390] Failed converting '<EdgeOpOverload: quantized_decomposed.quantize_per_tensor.default>: schema = quantized_decomposed::quantize_per_tensor(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype) -> Tensor' to its out variant with error: 'SchemaKind.out variant of operator quantized_decomposed::quantize_per_tensor can't be found. We've found the schemas of all the overloads: ['quantized_decomposed::quantize_per_tensor(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype) -> Tensor', 'quantized_decomposed::quantize_per_tensor.tensor(Tensor input, Tensor scale, Tensor zero_point, int quant_min, int quant_max, ScalarType dtype) -> Tensor', 'quantized_decomposed::quantize_per_tensor.tensor2(Tensor input, Tensor scale, Tensor zero_point, Tensor quant_min, Tensor quant_max, ScalarType dtype) -> Tensor']'
[INFO 2026-04-07 09:36:21,233 __init__.py:390] Failed converting '<EdgeOpOverload: quantized_decomposed.dequantize_per_tensor.default>: schema = quantized_decomposed::dequantize_per_tensor(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, ScalarType? out_dtype=None) -> Tensor' to its out variant with error: 'SchemaKind.out variant of operator quantized_decomposed::dequantize_per_tensor can't be found. We've found the schemas of all the overloads: ['quantized_decomposed::dequantize_per_tensor(Tensor input, float scale, int zero_point, int quant_min, int quant_max, ScalarType dtype, *, ScalarType? out_dtype=None) -> Tensor', 'quantized_decomposed::dequantize_per_tensor.tensor(Tensor input, Tensor scale, Tensor zero_point, int quant_min, int quant_max, ScalarType dtype, *, ScalarType? out_dtype=None) -> Tensor', 'quantized_decomposed::dequantize_per_tensor.tensor2(Tensor input, Tensor scale, Tensor zero_point, Tensor quant_min, Tensor quant_max, ScalarType dtype, *, ScalarType? out_dtype=None) -> Tensor']'

======================================================================
ERROR: test_qnn_backend_linear (__main__.TestQNNQuantizedOperator) (i=0)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/local/mnt/workspace/shewu/executorch/backends/qualcomm/tests/test_qnn_delegate.py", line 3756, in test_qnn_backend_linear
    self.lower_module_and_test_output(module, sample_input)
  File "/local/mnt/workspace/shewu/executorch/backends/qualcomm/tests/utils.py", line 592, in lower_module_and_test_output
    exec_prog = delegated_program.to_executorch(
  File "/local/mnt/workspace/shewu/executorch/exir/program/_program.py", line 116, in wrapper
    return func(*args, **kwargs)
  File "/local/mnt/workspace/shewu/executorch/exir/program/_program.py", line 1751, in to_executorch
    new_gm_res = p(new_gm)
  File "/local2/mnt/workspace/miniconda3/envs/executorch/lib/python3.10/site-packages/torch/fx/passes/infra/pass_base.py", line 47, in __call__
    res = self.call(graph_module)
  File "/local/mnt/workspace/shewu/executorch/exir/passes/__init__.py", line 440, in call
    raise RuntimeError(f"Missing out variants: {missing_out_vars}")
RuntimeError: Missing out variants: {'quantized_decomposed::dequantize_per_tensor', 'quantized_decomposed::quantize_per_tensor'}

Collaborator Author

It looks like the quantized kernel will be missing if I don't import it. Are you thinking about moving this import here?

Collaborator Author

shewu-quic commented Apr 7, 2026

Or should we wrap it in a try-except block? For example:

# Load quantized ops library.
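A minimal sketch of what that guarded import could look like. The flag name is illustrative; the import target is the one discussed in this thread, and the ImportError fallback is an assumption about how the guard would behave on hosts without the quantized AOT library.

```python
# Hedged sketch: guard the quantized-ops import so the quantizer module still
# loads in environments where the AOT library is unavailable.
try:
    # Importing this module registers the out variants of the
    # quantized_decomposed ops needed by to_executorch().
    import executorch.kernels.quantized  # noqa: F401
    QUANTIZED_OPS_REGISTERED = True
except ImportError:
    # Flag name is illustrative; callers could warn or skip Q/DQ fallback here.
    QUANTIZED_OPS_REGISTERED = False
```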

Contributor

The problem is that the portable_lib dependency is huge and it has its own registration mechanism; if we do it here, many internal tests fail due to double registration.

Contributor

I think import executorch.kernels.quantized is sufficient, as it registers the required AOT quant library: https://github.com/pytorch/executorch/blob/main/kernels/quantized/__init__.py

I think you also need to enable the CMake option -DEXECUTORCH_BUILD_KERNELS_QUANTIZED_AOT=ON.

Update the buck deps for the quantizer from portable_lib to:

deps = [
    "//executorch/backends/transforms:decompose_sdpa",
    "//executorch/kernels/quantized:aot_lib",
    "//executorch/kernels/quantized:quantized_ops_lib",
],

@digantdesai
Contributor

@claude Review this PR; make sure the buck files are updated properly. There are no tests in the CI that would signal buck issues.

@claude

claude bot commented Apr 2, 2026

Claude finished @digantdesai's task in 4m 51s


Review: Buck/Build File Analysis for LPAI Backend PR

  • Examine buck/targets.bzl files for LPAI additions
  • Check CMakeLists.txt vs buck targets for consistency
  • Verify all new LPAI source files are covered in build targets
  • Review the patch from @abhinaykukkadapu for completeness
  • Check for any remaining buck/build gaps
  • Provide review summary

Buck Files Status

The buck targets.bzl files have been updated with the patch from @abhinaykukkadapu and appear to be in reasonable shape, but there are a few issues worth flagging.

backends/qualcomm/runtime/targets.bzl — Mostly correct

The LPAI backend files are properly included in the buck glob patterns:

  • "backends/lpai/*.cpp" — correctly matches all 4 top-level .cpp files (LpaiBackend.cpp, LpaiBackendCustomConfig.cpp, LpaiContext.cpp, LpaiGraph.cpp)
  • "backends/lpai/host/*.cpp" / "backends/lpai/target/*.cpp" — correctly matches the platform-specific files using the same host/target convention as GPU, HTP, and IR backends
  • "backends/lpai/*.h" — correctly matches all 6 header files

Note on the original patch: The patch @abhinaykukkadapu posted in the comments mentioned x86_64 and aarch64 directories and an exclusion for backends/lpai/aarch64/LpaiGraph.cpp (with the comment "AfterConfigure is not declared"). The actual code uses host/ and target/ directories (no x86_64/aarch64 exist), and LpaiGraph.cpp correctly implements AfterRetrieveGraph and AfterCreateGraph (not AfterConfigure). The applied patch in the repo correctly uses host/target and does NOT include the erroneous LpaiGraph.cpp exclusion — this is correct.

Potential concern with exclude: The current exclude is just ["Logging.cpp"]. Unlike the original patch which expanded the exclude list, the current code relies on the glob patterns not matching unwanted files. This is fine since all .cpp files in the lpai/ directory are intentionally included.

backends/qualcomm/quantizer/targets.bzl — Correct but see review comment

Two new deps were added:

"//executorch/extension/pybindings:portable_lib",
"//executorch/kernels/quantized:quantized_ops_lib",

These correspond to the new imports in quantizer.py (lines 12-13):

import executorch.extension.pybindings.portable_lib  # noqa[F401]
import executorch.kernels.quantized  # noqa[F401]

@abhinaykukkadapu flagged this in the latest review comment — these imports are for python runtime (pybindings/portable_lib, quantized_ops_lib) and may not belong in the quantizer module. This is a valid concern: importing the full portable runtime and quantized kernels library as a side effect import at module load time in quantizer.py (just for "quantize op lowering in AOT") adds heavyweight dependencies to what should be a lightweight quantization annotation tool. This should be addressed.


CMakeLists.txt vs Buck Consistency

The CMakeLists.txt at backends/qualcomm/runtime/backends/CMakeLists.txt covers all LPAI files correctly:

CMake target and LPAI files included:

  • qnn_device: LpaiDevice.h, ${HOST_ARCHITECTURE_LPAI}/LpaiDevice.cpp
  • qnn_context: LpaiContext.h, LpaiContext.cpp, LpaiContextCustomConfig.h, ${HOST_ARCHITECTURE_LPAI}/LpaiContextCustomConfig.cpp
  • qnn_graph: LpaiGraph.h, LpaiGraph.cpp
  • qnn_backend: LpaiBackend.h, LpaiBackend.cpp, LpaiBackendCustomConfig.h, LpaiBackendCustomConfig.cpp
  • Exported headers: backends/lpai/*.h

The HOST_ARCHITECTURE_LPAI variable is set to lpai/${platform} where platform is host for x86_64 and target otherwise — matching the buck convention.

Other Issues Found During Review

  1. Naming inconsistency in runtime option member variables (QnnExecuTorchBackend.h:69-70): Two member variables use lapi instead of lpai:

    RuntimeOption qnn_runtime_lapi_fps_{false, 0};       // should be lpai
    RuntimeOption qnn_runtime_lapi_ftrt_ratio_{false, 0}; // should be lpai

    The remaining three use lpai correctly. The macro keys in QnnExecuTorch.h all use LPAI correctly (QNN_RUNTIME_LPAI_FPS, etc.), so this is just a member-variable naming inconsistency. It works correctly because the key strings match, but it is confusing.


  2. Typo in runtime option key (QnnExecuTorch.h:25):

    #define QNN_RUNTIME_LPAI_CLIENT_PERF_TYPE "qnn_runtime_lpai_clent_perf_type"

    clent should be client. This is the actual string key used at runtime, so it would need to match on both the set_option and get_option sides. Since both sides use the same macro, it's functionally correct but will be confusing for anyone using the string directly.


  3. quantizer.py imports (flagged by @abhinaykukkadapu): The executorch.extension.pybindings.portable_lib and executorch.kernels.quantized imports at the top of quantizer.py pull in heavy runtime dependencies. This needs to be resolved — either move them to the call site that actually needs them, or add a clear explanation of why they must be at module level.

Summary

Buck files are properly updated for the LPAI backend. All new source files are covered by glob patterns in runtime/targets.bzl, and the quantizer deps in quantizer/targets.bzl match the new imports (though the imports themselves may need relocation per reviewer feedback). CMake and buck are consistent in their LPAI file coverage. The minor issues flagged above (naming inconsistency, typo, questionable imports) are not buck-build-breaking but should be cleaned up.


@shewu-quic
Collaborator Author

Hi @abhinaykukkadapu ,
I have fixed some typos and added a try-except block for import executorch.extension.pybindings.portable_lib, which should be required for the LPAI backend.
Could you try your internal CI again?

@abhinaykukkadapu
Contributor

Hi @abhinaykukkadapu , I have fixed some typo and add try-except block for import executorch.extension.pybindings.portable_lib which should be required for LPAI backend. Could you try your internal CI again?

Just imported again, will monitor, meanwhile can you try this: #18151 (comment)

@shewu-quic
Collaborator Author


Updated. I tried the suggestion in that comment and it worked. It appears I didn't rebuild quantized_aot_lib, which resulted in it not loading correctly.

@abhinaykukkadapu
Contributor


Awesome, just imported again, will look at ci in the morning.

@shewu-quic
Collaborator Author

@claude Could you please check if the buck build is working?

@shewu-quic shewu-quic force-pushed the dev1/hutton/support_LPAI_backend branch 4 times, most recently from 3ee6c1e to c01d384 Compare April 8, 2026 14:01
@abhinaykukkadapu
Contributor

@claude Could you please check if the buck build is working?

@claude

claude bot commented Apr 8, 2026

Claude finished @abhinaykukkadapu's task in 1m 50s


Buck Build File Analysis

  • Examine buck targets.bzl files for LPAI additions
  • Cross-reference all new LPAI source files against build targets
  • Check CMake vs buck consistency
  • Verify quantizer/targets.bzl deps match latest review feedback
  • Provide summary of findings

runtime/targets.bzl — Looks correct

The LPAI backend sources are properly included via glob patterns:

Pattern and files matched:

  • backends/lpai/*.cpp: LpaiBackend.cpp, LpaiBackendCustomConfig.cpp, LpaiContext.cpp, LpaiGraph.cpp (OK)
  • backends/lpai/host/*.cpp (AOT) / backends/lpai/target/*.cpp (device): host/LpaiDevice.cpp, host/LpaiContextCustomConfig.cpp / target/LpaiDevice.cpp, target/LpaiContextCustomConfig.cpp (OK)
  • backends/lpai/*.h: all 6 headers (LpaiBackend.h, LpaiBackendCustomConfig.h, LpaiContext.h, LpaiContextCustomConfig.h, LpaiDevice.h, LpaiGraph.h) (OK)

The host/target directory convention matches the existing pattern used by GPU, HTP, and IR backends. All 8 .cpp files and 6 .h files are covered.

quantizer/targets.bzl — Updated per review feedback

Current deps:

deps = [
    "//executorch/backends/transforms:decompose_sdpa",
    "//executorch/kernels/quantized:aot_lib",
    "//executorch/kernels/quantized:quantized_ops_lib",
],

This reflects the latest review feedback from @abhinaykukkadapu: portable_lib was removed and replaced with aot_lib. The quantizer.py import now only uses import executorch.kernels.quantized (wrapped in try/except), which aligns with the aot_lib and quantized_ops_lib deps. This looks correct.

The glob ["*.py", "*/*.py"] properly captures the new annotators/lpai_rules.py file.

aot/python/targets.bzl — No issues

The PyQnnManagerAdaptor target already had quantized:aot_lib and quantized:quantized_ops_lib, and it depends on runtime:runtime which now includes LPAI via the runtime targets.bzl glob patterns. No changes needed here.

CMake vs Buck consistency

All LPAI files present in backends/CMakeLists.txt are also covered by the buck globs in runtime/targets.bzl:

CMake target, LPAI files, and buck coverage:

  • qnn_device: LpaiDevice.h, ${platform}/LpaiDevice.cpp (buck: backends/lpai/*.h + backends/lpai/{host,target}/*.cpp)
  • qnn_context: LpaiContext.h, LpaiContext.cpp, LpaiContextCustomConfig.h, ${platform}/LpaiContextCustomConfig.cpp (buck: backends/lpai/*.{h,cpp} + backends/lpai/{host,target}/*.cpp)
  • qnn_graph: LpaiGraph.h, LpaiGraph.cpp (buck: backends/lpai/*.{h,cpp})
  • qnn_backend: LpaiBackend.h, LpaiBackend.cpp, LpaiBackendCustomConfig.h, LpaiBackendCustomConfig.cpp (buck: backends/lpai/*.{h,cpp})

CMake uses HOST_ARCHITECTURE_LPAI = lpai/${platform} where platform = host for x86_64 and target otherwise — this matches the buck include_aot_qnn_lib conditional exactly. Consistent.

Key difference: CMake is more granular, buck is monolithic

Note that CMake splits LPAI files across multiple fine-grained targets (qnn_device, qnn_context, qnn_graph, qnn_backend), while buck collects everything into a single runtime library via broad globs. This is the existing architectural pattern for all backends (HTP, GPU, IR) and is expected.

Summary

The buck build configuration appears correct. All new LPAI source files (8 .cpp, 6 .h) are properly covered by the glob patterns in runtime/targets.bzl. The quantizer/targets.bzl deps have been updated per @abhinaykukkadapu's feedback (using aot_lib instead of portable_lib). CMake and buck are consistent in their LPAI file coverage. No missing files or dependency issues were found in the static analysis.

Note: I can only do static analysis of the build files — I cannot actually execute a buck build. If you need a live buck build test, that would need to be done via the internal CI.



@abhinaykukkadapu
Contributor

abhinaykukkadapu commented Apr 8, 2026

@shewu-quic sorry, and thanks for patiently updating the PR. One last patch and I'm gonna merge it and fix forward if there are any more failures, but I believe this would resolve the failures, as I put a dummy forward diff internally and it seems to pass. The problem is that `aot_lib` is whole-archiving the library and is causing some issues with transitive dependencies.

```diff
diff --git a/backends/qualcomm/aot/python/targets.bzl b/backends/qualcomm/aot/python/targets.bzl
--- a/backends/qualcomm/aot/python/targets.bzl
+++ b/backends/qualcomm/aot/python/targets.bzl
@@ -26,8 +26,6 @@
             "-DEXECUTORCH_PYTHON_MODULE_NAME={}".format(PYTHON_MODULE_NAME),
         ],
         deps = [
-            "//executorch/kernels/quantized:aot_lib",
-            "//executorch/kernels/quantized:quantized_ops_lib",
             "//executorch/runtime/core:core",
             "//executorch/backends/qualcomm/aot/python:python_lib",
             "//executorch/backends/qualcomm/aot/wrappers:wrappers",
diff --git a/backends/qualcomm/quantizer/targets.bzl b/backends/qualcomm/quantizer/targets.bzl
--- a/backends/qualcomm/quantizer/targets.bzl
+++ b/backends/qualcomm/quantizer/targets.bzl
@@ -15,8 +15,5 @@
         visibility = ["PUBLIC"],
         deps = [
             "//executorch/backends/transforms:decompose_sdpa",
-            "//executorch/kernels/quantized:aot_lib",
-            "//executorch/kernels/quantized:quantized_ops_lib",
-
         ],
     )
```

Summary:

- Add LPAI infra
  - Because of the accuracy issue with quantize and dequantize operations in LPAI, Q and DQ will fall back to CPU for the LPAI backend.
  - Support runtime options for `fps`, `ftrt_ratio`, `clent_perf_type`, `affinity`, and `core_selection`. Refer to the [QNN doc](https://docs.qualcomm.com/doc/80-63442-10/topic/lpai_backend.html#full-json-scheme) for more details on these options.
- Support LPAI in `test_qnn_delegate.py`
- Refactor documentation for LPAI and GPU
- Add a script to sign the skel library
@shewu-quic force-pushed the `dev1/hutton/support_LPAI_backend` branch from c01d384 to 25c600e on April 9, 2026, 00:04
@shewu-quic
Collaborator Author

> @shewu-quic sorry and thanks for patiently updating the PR, one last patch and i'm gonna merge it and fix forward if there

Thanks. I have applied this patch and rebased.

@abhinaykukkadapu merged commit ddb7762 into pytorch:main on Apr 9, 2026 (506 of 549 checks passed).