Expose static llama in OSS #16184
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16184
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 1 Unrelated Failure as of commit 23330e8 with merge base ee236cb.
NEW FAILURES: the following jobs have failed.
UNSTABLE: the following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@metascroy has exported this pull request. If you are a Meta employee, you can view the originating Diff in D88875745.
This PR needs a `release notes:` label.
Summary: This exposes a static llama model for CoreML. We want to unify development behind one static model so that we do not need to apply fixes in multiple places (e.g., iOS 26 fixes).

Reviewed By: billmguo

Differential Revision: D88875745
```python
def forward(self, *args, **kwargs):
    return tuple(
        (
            torch.ops.executorch_utils.graph_break.Tensor(a)
```
Oh we should make a tutorial for this op. I think it could be broadly useful to backends.
We could move it out of this file into executorch/exir?
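For context, a minimal sketch of the pattern in the excerpt above (the `GraphBreakOutputs` wrapper name is hypothetical; the op registration and the `remove_graph_break_` cleanup shown later in this diff are assumed):

```python
import torch


class GraphBreakOutputs(torch.nn.Module):
    # Hypothetical wrapper illustrating the pattern from this PR. Passing
    # each output through the graph_break op marks a boundary that the
    # partitioner will not fuse across, so subgraphs on either side land in
    # separate delegate partitions. remove_graph_break_ (used later in this
    # diff) strips the marker ops before final lowering, so they never
    # reach the runtime.
    def __init__(self, inner: torch.nn.Module):
        super().__init__()
        self.inner = inner

    def forward(self, *args, **kwargs):
        outputs = self.inner(*args, **kwargs)
        return tuple(
            torch.ops.executorch_utils.graph_break.Tensor(a) for a in outputs
        )
```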
```python
# Setup CoreML partitioner
print("\nSetting up CoreML partitioner...")
compile_specs = CoreMLBackend.generate_compile_specs(
```
Do we have a recipe we could use? Could you add one?
Core ML has a recipe. I think the default recipe will likely work here, but I'm so used to using the partitioner.
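For reference, a minimal sketch of the partitioner path under discussion, assuming the standard ExecuTorch Core ML APIs (the compute unit and deployment target values here are illustrative, not taken from this PR):

```python
import coremltools as ct

from executorch.backends.apple.coreml.compiler import CoreMLBackend
from executorch.backends.apple.coreml.partition import CoreMLPartitioner

# Compile specs control how the Core ML delegate compiles the partitioned
# subgraphs; these particular values are illustrative.
compile_specs = CoreMLBackend.generate_compile_specs(
    compute_unit=ct.ComputeUnit.CPU_AND_NE,
    minimum_deployment_target=ct.target.iOS18,
)
partitioner = CoreMLPartitioner(compile_specs=compile_specs)

# edge_manager is an EdgeProgramManager from to_edge(); assumed in scope.
edge_manager = edge_manager.to_backend(partitioner)
```

The recipe alternative mentioned above would replace this boilerplate with a single preconfigured export call.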
```python
executorch_program = edge_manager.to_executorch(
    ExecutorchBackendConfig(
        extract_delegate_segments=True,
        do_quant_fusion_and_const_prop=True,
```
Apple path leaves quant ops in the graph?
No, I guess this doesn't need to be specified.
```python
        memory_planning_pass=MemoryPlanningPass(
            alloc_graph_input=False, alloc_graph_output=False
        ),
        sym_shape_eval_pass=ConstraintBasedSymShapeEvalPass(),
```
This is the default now fwiw
ConstraintBasedSymShapeEvalPass is default now? Or the memory planning one?
```python
remove_graph_break_(edge_manager)
executorch_program = edge_manager.to_executorch(
    ExecutorchBackendConfig(
        extract_delegate_segments=True,
```
this one is also the default
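If both of those are in fact defaults now, the final lowering could presumably shrink to just the memory-planning override. A sketch, assuming `edge_manager` and `remove_graph_break_` from this PR are in scope and that `extract_delegate_segments=True` and `ConstraintBasedSymShapeEvalPass` are the current defaults:

```python
from executorch.exir import ExecutorchBackendConfig
from executorch.exir.passes.memory_planning_pass import MemoryPlanningPass

# Strip the graph_break marker ops now that partitioning is done.
remove_graph_break_(edge_manager)

# alloc_graph_input/alloc_graph_output are disabled because the runner
# manages the I/O buffers (e.g., static KV caches) itself.
executorch_program = edge_manager.to_executorch(
    ExecutorchBackendConfig(
        memory_planning_pass=MemoryPlanningPass(
            alloc_graph_input=False, alloc_graph_output=False
        ),
    )
)
```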