
Support extra ops modes for LLM Models#18670

Open
Jiseong-oh wants to merge 4 commits into main from extra_ops_modes

Conversation

Collaborator

@Jiseong-oh Jiseong-oh commented Apr 2, 2026

Summary

Add Extra Ops for Exynos Backend

  • Cos, GroupNorm, Index, Log, Pow, RmsNorm, Sigmoid, Sin, SplitWithSizesCopy, SumIntList, Tanh, and TopK
  • All ops are verified
  • These ops will be used to support LLM models

Support Perf mode

  • Perf mode can be enabled as an experimental feature
  • This mode MUST BE used to verify models on the Exynos device farm first, before testing on the phone

cc @SS-JIA @digantdesai @kimishpatel


pytorch-bot bot commented Apr 2, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18670

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Cancelled Job, 5 Unrelated Failures

As of commit de3bae4 with merge base b5ae0b9:

NEW FAILURE - The following job has failed:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Apr 2, 2026
@Jiseong-oh Jiseong-oh added the partner: samsung and module: samsung labels Apr 2, 2026

github-actions bot commented Apr 2, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@Jiseong-oh Jiseong-oh changed the title Extra ops modes Support extra ops modes for LLM Models Apr 3, 2026
@Jiseong-oh Jiseong-oh force-pushed the extra_ops_modes branch 2 times, most recently from 492c8d9 to dba1f4e on April 6, 2026 23:18
@Jiseong-oh Jiseong-oh requested a review from SS-JIA April 7, 2026 02:51
Jiseong-oh and others added 4 commits April 8, 2026 05:16
The previously supported single ops are being migrated to the current
dev branch, as follows:
cos, group_norm, index, log, pow, rms_norm, sigmoid, sin,
split_with_sizes_copy, sum_int_list, tanh, and topk.

Co-authored-by: xz.linghu <xz.linghu@samsung.com>
Signed-off-by: jiseong.oh <jiseong.oh@samsung.com>
LiteCore provides a new API named "graphgen_set_perf_mode".
This commit invokes that API to set performance mode.

Co-authored-by: Jingya Zhang <jingya.zhang@samsung.com>
Signed-off-by: jiseong.oh <jiseong.oh@samsung.com>
Signed-off-by: jiseong.oh <jiseong.oh@samsung.com>
The internal IR removes "input_type" from the Gather op.
This commit removes the "input_type" setting and sets the Gather inputs accordingly.

Co-authored-by: Jingya Zhang <jingya.zhang@samsung.com>
Signed-off-by: jiseong.oh <jiseong.oh@samsung.com>
Contributor

SS-JIA commented Apr 9, 2026

@claude help review this PR


claude bot commented Apr 9, 2026

Claude finished @SS-JIA's task in 4m 41s


Review of PR #18670: Support extra ops modes for LLM Models

  • Gather context and understand the PR changes
  • Review new op builders
  • Review compose_rms_norm pass
  • Review test files
  • Review serialization and compile options changes
  • Review example script changes and perf mode support
  • Post comprehensive review

Overview

This PR adds 12 new op builders for the Samsung Exynos backend (Cos, GroupNorm, Index, Log, Pow, RmsNorm, Sigmoid, Sin, SplitWithSizesCopy, SumIntList, Tanh, TopK), a RecomposeRmsNorm pass, and performance mode support via a new PerformanceMode enum. Each new op has a corresponding unit test. All example scripts are updated to use PerformanceMode.HIGH_PERFORMANCE by default.

Overall the PR is well-structured and follows existing patterns. Below are the issues I found, organized by severity.


Issues

1. Bug: op_pow.py target vs test mismatch — backends/samsung/builders/op_pow.py:20

The builder registers target = "aten.pow.Tensor_Tensor", but the test in test_pow.py:36 checks for torch.ops.aten.pow.Tensor_Scalar (using x**2.0 which produces Tensor_Scalar). Additionally, the partitioner adds exir_ops.edge.aten.pow.Tensor_Scalar to SUPPORTED_OPS (enn_partitioner.py:42).

This means the builder only handles Tensor_Tensor pow, but the test and partitioner expect Tensor_Scalar. The Tensor_Scalar variant would match via the partitioner's SUPPORTED_OPS list and presumably be handled by the ReplaceOpsWithScalar pass, but the builder itself would never directly match Tensor_Scalar ops. If both variants need builder support, the builder's target should be a list including both. If only Tensor_Tensor is intended for the builder and Tensor_Scalar is handled through the scalar replacement pass, the test should be updated to test the Tensor_Tensor case that actually exercises the builder.

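If both variants should reach the builder, the `target` attribute could be a list. A minimal standalone sketch of the registration pattern (the registry, decorator, and class names here are illustrative stand-ins, not the actual ENN backend API):

```python
# Hypothetical multi-target registration: one visitor class handles both
# pow variants, so the partitioner and builder stay in sync.
OP_BUILDERS = {}

def register_op_visitor(cls):
    # Accept either a single target string or a list of targets.
    targets = cls.target if isinstance(cls.target, (list, tuple)) else [cls.target]
    for t in targets:
        OP_BUILDERS[t] = cls
    return cls

@register_op_visitor
class PowVisitor:
    target = ["aten.pow.Tensor_Tensor", "aten.pow.Tensor_Scalar"]
```

With this shape, a graph node matching either overload resolves to `PowVisitor`, and the existing `x**2.0` test would exercise the builder path directly.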

2. compose_rms_norm.py — potential _get_eps_node / _get_gamma_node returning None — backends/samsung/_passes/compose_rms_norm.py:22-33

def _get_eps_node(self, nodes):
    add_node = [n for n in nodes if hasattr(n, "name") and "add" in n.name][0]
    for a in add_node.args:
        if isinstance(a, float) or a.op != "call_function":
            return a

def _get_gamma_node(self, output_node):
    for a in output_node.args:
        if a.op != "call_function":
            return a

If no arg matches the condition, these methods implicitly return None, which would then be passed as the eps/gamma argument to rms_norm. Consider raising an error if the expected node is not found rather than silently passing None.

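A fail-fast variant could look like the following standalone sketch (the helper name and argument shape are simplified stand-ins for the real FX node handling in the pass):

```python
def get_eps_arg(add_node_args):
    """Return the first constant-like arg (the eps value) of the add node.

    Raises instead of implicitly returning None when nothing matches,
    so a malformed partition fails loudly at pass time.
    """
    for a in add_node_args:
        # Floats have no .op attribute; getattr keeps the check safe.
        if isinstance(a, float) or getattr(a, "op", None) != "call_function":
            return a
    raise RuntimeError("RMSNorm recompose: eps argument not found in add node")
```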

3. compose_rms_norm.py — fragile node name matching — backends/samsung/_passes/compose_rms_norm.py:24

add_node = [n for n in nodes if hasattr(n, "name") and "add" in n.name][0]

Matching by "add" in n.name is fragile — node names like "add_padding", "broadcast_add", etc. could be false positives. The Qualcomm version (backends/qualcomm/_passes/recompose_rms_norm.py) matches by checking node.target against known ATen add ops, which is more robust. Consider matching on node.target instead of node.name.

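A target-based match might look like this standalone sketch (the target strings are illustrative placeholders; the real pass would compare against the actual ATen/edge add overloads, e.g. exir_ops.edge.aten.add.Tensor):

```python
# Match the add node by target instead of by name, so nodes like
# "broadcast_add" or "add_padding" cannot be picked up by accident.
ADD_TARGETS = {"aten.add.Tensor", "aten.add.Scalar"}

def find_add_node(nodes):
    matches = [n for n in nodes if getattr(n, "target", None) in ADD_TARGETS]
    if not matches:
        raise RuntimeError("RMSNorm recompose: no add node in partition")
    return matches[0]
```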

4. Hardcoded high-performance mode in all examples — examples/samsung/aot_compiler.py:81

Every example script now unconditionally uses PerformanceMode.HIGH_PERFORMANCE. The PR description warns "This mode MUST BE used to verify model on exynos device farm firstly before testing on the phone," yet none of the scripts expose this as a CLI argument, so users can't opt out. aot_compiler.py (the general-purpose compiler) in particular should probably default to DEFAULT or expose --perf_mode as an argument.
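One way to expose this is a CLI flag; a minimal sketch (the flag name `--perf_mode` and the choice strings are assumptions, not the script's current interface):

```python
import argparse

# Hypothetical wiring for aot_compiler.py: let the user choose the
# perf mode instead of hardcoding HIGH_PERFORMANCE.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--perf_mode",
    choices=["default", "high_performance"],
    default="default",
    help="Exynos backend performance mode (experimental).",
)
args = parser.parse_args([])  # e.g. parse_args(["--perf_mode", "high_performance"])
```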

5. Copyright header — backends/samsung/_passes/compose_rms_norm.py:1

The file has # Copyright (c) Qualcomm Innovation Center, Inc as the first copyright line. This file is derived from backends/qualcomm/_passes/recompose_rms_norm.py (same class name, similar approach), using a get_source_partitions-based implementation. The Qualcomm copyright is appropriate if code was derived from their work, but verify this dual-copyright header is intentional.

6. gen_samsung_backend_compile_spec type hint — backends/samsung/serialization/compile_options.py:76

def gen_samsung_backend_compile_spec(
    chipset: str,
    perf_mode: PerformanceMode = None,
):

The type hint says PerformanceMode but the default is None. Should be Optional[PerformanceMode] = None for correctness.
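The fix in context, as a self-contained sketch (the enum below is a stand-in for the real serialization enum, and the returned dict merely stands in for the actual compile spec):

```python
from enum import IntEnum
from typing import Optional

class PerformanceMode(IntEnum):  # stand-in for the real FlatBuffers-backed enum
    DEFAULT = 0
    HIGH_PERFORMANCE = 1

def gen_samsung_backend_compile_spec(
    chipset: str,
    perf_mode: Optional[PerformanceMode] = None,
):
    # Normalize None to DEFAULT so downstream code always sees an enum member.
    mode = perf_mode if perf_mode is not None else PerformanceMode.DEFAULT
    return {"chipset": chipset, "perf_mode": mode}
```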

7. Inconsistent op name casing in define_op calls

Most existing op builders use ALL_CAPS for the op type string (e.g., "SQRT", "SIGMOID", "TANH", "LOG"), but some new ops use mixed casing:

  • op_sin.py:31: "Sin" (vs "SIN")
  • op_cos.py:31: "Cos" (vs "COS")
  • op_topk.py:74: "TopK" (vs "TOPK")

The existing op_hardsigmoid.py also uses "HardSigmoid" (mixed case), so this may be intentional per the ENN backend's expected op names. Worth confirming these are the exact strings the backend library expects.

8. Minor: op_split_with_sizes_copy.py missing return type annotation — line 26

The define_node method is missing the -> None return type annotation, unlike all other new builders.

9. Minor: op_topk.py shadows built-in sorted — line 70

sorted = cast(bool, node.args[4])

Shadows the Python built-in sorted. Not a functional bug but not great practice.


Positive observations

  • Tests cover multiple configurations (e.g., index on different axes, split with different chunk sizes, sum with/without keepdims, topk with different dims).
  • The RecomposeRmsNorm pass correctly uses get_source_partitions with torch.nn.RMSNorm source matching, which is a clean approach.
  • The FlatBuffers schema change for PerformanceMode is clean and backward-compatible (default = 0).
  • The @experimental decorator on PerformanceMode is a good way to signal that this feature is not yet stable.

Summary

The main issues to address are:

  1. The pow target mismatch (issue #1) — verify whether the builder covers the right variant and ensure the test exercises the builder path
  2. Fragile RMS norm pass (issues #2, #3) — _get_eps_node/_get_gamma_node could return None and name-based matching is fragile
  3. Hardcoded high-performance mode in all examples (issue #4) — consider making this configurable or defaulting to DEFAULT
  4. Type hint correctness (issue #6) — Optional[PerformanceMode]

The rest are minor style/consistency observations.



Labels

CLA Signed · module: samsung · partner: samsung

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants