
Support extra ops modes for LLM Models#18670

Open
Jiseong-oh wants to merge 4 commits into main from extra_ops_modes

Conversation

Collaborator

@Jiseong-oh Jiseong-oh commented Apr 2, 2026

Summary

Add Extra Ops for Exynos Backend

  • Cos, GroupNorm, Index, Log, Pow, RmsNorm, Sigmoid, Sin, SplitWithSizesCopy, SumIntList, Tanh, and TopK
  • All ops are verified
  • These ops will be used to support LLM models

Support Perf mode

  • Perf mode can be enabled as an experimental feature
  • This mode MUST BE used to verify models on the Exynos device farm first, before testing on the phone

cc @SS-JIA @digantdesai @kimishpatel


pytorch-bot bot commented Apr 2, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18670

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Cancelled Job, 5 Unrelated Failures

As of commit de3bae4 with merge base b5ae0b9:

NEW FAILURE - The following job has failed:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Apr 2, 2026
@Jiseong-oh Jiseong-oh added the partner: samsung and module: samsung labels Apr 2, 2026

github-actions bot commented Apr 2, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@Jiseong-oh Jiseong-oh changed the title Extra ops modes Support extra ops modes for LLM Models Apr 3, 2026
@Jiseong-oh Jiseong-oh force-pushed the extra_ops_modes branch 2 times, most recently from 492c8d9 to dba1f4e on April 6, 2026 23:18
@Jiseong-oh Jiseong-oh requested a review from SS-JIA April 7, 2026 02:51
Jiseong-oh and others added 4 commits April 8, 2026 05:16
The previously supported single ops are being migrated to the current
dev branch, as follows:
cos, group_norm, index, log, pow, rms_norm, sigmoid, sin,
split_with_sizes_copy, sum_int_list, tanh, and topk.

Co-authored-by: xz.linghu <xz.linghu@samsung.com>
Signed-off-by: jiseong.oh <jiseong.oh@samsung.com>
LiteCore provides a new API named "graphgen_set_perf_mode".
This commit invokes that API to set performance mode.

Co-authored-by: Jingya Zhang <jingya.zhang@samsung.com>
Signed-off-by: jiseong.oh <jiseong.oh@samsung.com>
Signed-off-by: jiseong.oh <jiseong.oh@samsung.com>
The internal IR removes "input_type" from the Gather op.
This commit removes the "input_type" setting and sets the Gather inputs accordingly.

Co-authored-by: Jingya Zhang <jingya.zhang@samsung.com>
Signed-off-by: jiseong.oh <jiseong.oh@samsung.com>
Contributor

SS-JIA commented Apr 9, 2026

@claude help review this PR


claude bot commented Apr 9, 2026

Claude finished @SS-JIA's task in 4m 41s


Review of PR #18670: Support extra ops modes for LLM Models

  • Gather context and understand the PR changes
  • Review new op builders
  • Review compose_rms_norm pass
  • Review test files
  • Review serialization and compile options changes
  • Review example script changes and perf mode support
  • Post comprehensive review

Overview

This PR adds 12 new op builders for the Samsung Exynos backend (Cos, GroupNorm, Index, Log, Pow, RmsNorm, Sigmoid, Sin, SplitWithSizesCopy, SumIntList, Tanh, TopK), a RecomposeRmsNorm pass, and performance mode support via a new PerformanceMode enum. Each new op has a corresponding unit test. All example scripts are updated to use PerformanceMode.HIGH_PERFORMANCE by default.

Overall the PR is well-structured and follows existing patterns. Below are the issues I found, organized by severity.


Issues

1. Bug: op_pow.py target vs test mismatch — backends/samsung/builders/op_pow.py:20

The builder registers target = "aten.pow.Tensor_Tensor", but the test in test_pow.py:36 checks for torch.ops.aten.pow.Tensor_Scalar (using x**2.0 which produces Tensor_Scalar). Additionally, the partitioner adds exir_ops.edge.aten.pow.Tensor_Scalar to SUPPORTED_OPS (enn_partitioner.py:42).

This means the builder only handles Tensor_Tensor pow, but the test and partitioner expect Tensor_Scalar. The Tensor_Scalar variant would match via the partitioner's SUPPORTED_OPS list and presumably be handled by the ReplaceOpsWithScalar pass, but the builder itself would never directly match Tensor_Scalar ops. If both variants need builder support, the builder's target should be a list including both. If only Tensor_Tensor is intended for the builder and Tensor_Scalar is handled through the scalar replacement pass, the test should be updated to test the Tensor_Tensor case that actually exercises the builder.

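If both variants should reach the builder, the `target` attribute could be a list. A minimal standalone sketch of the registration pattern (the registry, decorator, and class names here are illustrative stand-ins, not the actual ENN backend API):

```python
# Hypothetical multi-target registration: one visitor class handles both
# pow variants, so the partitioner and builder stay in sync.
OP_BUILDERS = {}

def register_op_visitor(cls):
    # Accept either a single target string or a list of targets.
    targets = cls.target if isinstance(cls.target, (list, tuple)) else [cls.target]
    for t in targets:
        OP_BUILDERS[t] = cls
    return cls

@register_op_visitor
class PowVisitor:
    target = ["aten.pow.Tensor_Tensor", "aten.pow.Tensor_Scalar"]
```

With this shape, a graph node matching either overload resolves to `PowVisitor`, and the existing `x**2.0` test would exercise the builder path directly.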

2. compose_rms_norm.py — potential _get_eps_node / _get_gamma_node returning None — backends/samsung/_passes/compose_rms_norm.py:22-33

def _get_eps_node(self, nodes):
    add_node = [n for n in nodes if hasattr(n, "name") and "add" in n.name][0]
    for a in add_node.args:
        if isinstance(a, float) or a.op != "call_function":
            return a

def _get_gamma_node(self, output_node):
    for a in output_node.args:
        if a.op != "call_function":
            return a

If no arg matches the condition, these methods implicitly return None, which would then be passed as the eps/gamma argument to rms_norm. Consider raising an error if the expected node is not found rather than silently passing None.

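A fail-fast variant could look like the following standalone sketch (the helper name and argument shape are simplified stand-ins for the real FX node handling in the pass):

```python
def get_eps_arg(add_node_args):
    """Return the first constant-like arg (the eps value) of the add node.

    Raises instead of implicitly returning None when nothing matches,
    so a malformed partition fails loudly at pass time.
    """
    for a in add_node_args:
        # Floats have no .op attribute; getattr keeps the check safe.
        if isinstance(a, float) or getattr(a, "op", None) != "call_function":
            return a
    raise RuntimeError("RMSNorm recompose: eps argument not found in add node")
```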

3. compose_rms_norm.py — fragile node name matching — backends/samsung/_passes/compose_rms_norm.py:24

add_node = [n for n in nodes if hasattr(n, "name") and "add" in n.name][0]

Matching by "add" in n.name is fragile — node names like "add_padding", "broadcast_add", etc. could be false positives. The Qualcomm version (backends/qualcomm/_passes/recompose_rms_norm.py) matches by checking node.target against known ATen add ops, which is more robust. Consider matching on node.target instead of node.name.

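A target-based match might look like this standalone sketch (the target strings are illustrative placeholders; the real pass would compare against the actual ATen/edge add overloads, e.g. exir_ops.edge.aten.add.Tensor):

```python
# Match the add node by target instead of by name, so nodes like
# "broadcast_add" or "add_padding" cannot be picked up by accident.
ADD_TARGETS = {"aten.add.Tensor", "aten.add.Scalar"}

def find_add_node(nodes):
    matches = [n for n in nodes if getattr(n, "target", None) in ADD_TARGETS]
    if not matches:
        raise RuntimeError("RMSNorm recompose: no add node in partition")
    return matches[0]
```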

4. Hardcoded high-performance mode in all examples — examples/samsung/aot_compiler.py:81

Every example script now unconditionally uses PerformanceMode.HIGH_PERFORMANCE. The PR description warns "This mode MUST BE used to verify model on exynos device farm firstly before testing on the phone," yet none of the scripts expose this as a CLI argument, so users can't opt out. aot_compiler.py (the general-purpose compiler) in particular should probably default to DEFAULT or expose --perf_mode as an argument.
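One way to expose this is a CLI flag; a minimal sketch (the flag name `--perf_mode` and the choice strings are assumptions, not the script's current interface):

```python
import argparse

# Hypothetical wiring for aot_compiler.py: let the user choose the
# perf mode instead of hardcoding HIGH_PERFORMANCE.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--perf_mode",
    choices=["default", "high_performance"],
    default="default",
    help="Exynos backend performance mode (experimental).",
)
args = parser.parse_args([])  # e.g. parse_args(["--perf_mode", "high_performance"])
```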

5. Copyright header — backends/samsung/_passes/compose_rms_norm.py:1

The file has # Copyright (c) Qualcomm Innovation Center, Inc as the first copyright line. This file is derived from backends/qualcomm/_passes/recompose_rms_norm.py (same class name, similar approach), using a get_source_partitions-based implementation. The Qualcomm copyright is appropriate if code was derived from their work, but verify this dual-copyright header is intentional.

6. gen_samsung_backend_compile_spec type hint — backends/samsung/serialization/compile_options.py:76

def gen_samsung_backend_compile_spec(
    chipset: str,
    perf_mode: PerformanceMode = None,
):

The type hint says PerformanceMode but the default is None. Should be Optional[PerformanceMode] = None for correctness.
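The fix in context, as a self-contained sketch (the enum below is a stand-in for the real serialization enum, and the returned dict merely stands in for the actual compile spec):

```python
from enum import IntEnum
from typing import Optional

class PerformanceMode(IntEnum):  # stand-in for the real FlatBuffers-backed enum
    DEFAULT = 0
    HIGH_PERFORMANCE = 1

def gen_samsung_backend_compile_spec(
    chipset: str,
    perf_mode: Optional[PerformanceMode] = None,
):
    # Normalize None to DEFAULT so downstream code always sees an enum member.
    mode = perf_mode if perf_mode is not None else PerformanceMode.DEFAULT
    return {"chipset": chipset, "perf_mode": mode}
```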

7. Inconsistent op name casing in define_op calls

Most existing op builders use ALL_CAPS for the op type string (e.g., "SQRT", "SIGMOID", "TANH", "LOG"), but some new ops use mixed casing:

  • op_sin.py:31: "Sin" (vs "SIN")
  • op_cos.py:31: "Cos" (vs "COS")
  • op_topk.py:74: "TopK" (vs "TOPK")

The existing op_hardsigmoid.py also uses "HardSigmoid" (mixed case), so this may be intentional per the ENN backend's expected op names. Worth confirming these are the exact strings the backend library expects.

8. Minor: op_split_with_sizes_copy.py missing return type annotation — line 26

The define_node method is missing the -> None return type annotation, unlike all other new builders.

9. Minor: op_topk.py shadows built-in sorted — line 70

sorted = cast(bool, node.args[4])

Shadows the Python built-in sorted. Not a functional bug but not great practice.


Positive observations

  • Tests cover multiple configurations (e.g., index on different axes, split with different chunk sizes, sum with/without keepdims, topk with different dims).
  • The RecomposeRmsNorm pass correctly uses get_source_partitions with torch.nn.RMSNorm source matching, which is a clean approach.
  • The FlatBuffers schema change for PerformanceMode is clean and backward-compatible (default = 0).
  • The @experimental decorator on PerformanceMode is a good way to signal that this feature is not yet stable.

Summary

The main issues to address are:

  1. The pow target mismatch (issue #1) — verify whether the builder covers the right variant and ensure the test exercises the builder path
  2. Fragile RMS norm pass (issues #2, #3) — _get_eps_node/_get_gamma_node could return None and name-based matching is fragile
  3. Hardcoded high-performance mode in all examples (issue #4) — consider making this configurable or defaulting to DEFAULT
  4. Type hint correctness (issue #6) — Optional[PerformanceMode]

The rest are minor style/consistency observations.



Labels

CLA Signed · module: samsung · partner: samsung

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants