Expose static llama in OSS #16184
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16184
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 1 Unrelated Failure as of commit 23330e8 with merge base ee236cb.
NEW FAILURES: the following jobs have failed.
UNSTABLE: the following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@metascroy has exported this pull request. If you are a Meta employee, you can view the originating Diff in D88875745.
This PR needs a `release notes:` label.
Summary: This exposes a static llama model for CoreML. We want to unify development behind one static model so that we do not need to apply fixes in multiple places (e.g., iOS 26 fixes).

Reviewed By: billmguo

Differential Revision: D88875745
```python
def forward(self, *args, **kwargs):
    return tuple(
        (
            torch.ops.executorch_utils.graph_break.Tensor(a)
```
Oh we should make a tutorial for this op. I think it could be broadly useful to backends.
We could move it out of this file into executorch/exir?
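For context, a minimal sketch of the pattern in the excerpt above (the `GraphBreakOutputs` wrapper name is hypothetical; the op registration and the `remove_graph_break_` cleanup shown later in this diff are assumed):

```python
import torch


class GraphBreakOutputs(torch.nn.Module):
    # Hypothetical wrapper illustrating the pattern from this PR. Passing
    # each output through the graph_break op marks a boundary that the
    # partitioner will not fuse across, so subgraphs on either side land in
    # separate delegate partitions. remove_graph_break_ (used later in this
    # diff) strips the marker ops before final lowering, so they never
    # reach the runtime.
    def __init__(self, inner: torch.nn.Module):
        super().__init__()
        self.inner = inner

    def forward(self, *args, **kwargs):
        outputs = self.inner(*args, **kwargs)
        return tuple(
            torch.ops.executorch_utils.graph_break.Tensor(a) for a in outputs
        )
```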
```python
# Setup CoreML partitioner
print("\nSetting up CoreML partitioner...")
compile_specs = CoreMLBackend.generate_compile_specs(
```
Do we have a recipe we could use? Could you add one?
Core ML has a recipe. I think the default recipe will likely work here, but I'm so used to using the partitioner.
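For reference, a minimal sketch of the partitioner path under discussion, assuming the standard ExecuTorch Core ML APIs (the compute unit and deployment target values here are illustrative, not taken from this PR):

```python
import coremltools as ct

from executorch.backends.apple.coreml.compiler import CoreMLBackend
from executorch.backends.apple.coreml.partition import CoreMLPartitioner

# Compile specs control how the Core ML delegate compiles the partitioned
# subgraphs; these particular values are illustrative.
compile_specs = CoreMLBackend.generate_compile_specs(
    compute_unit=ct.ComputeUnit.CPU_AND_NE,
    minimum_deployment_target=ct.target.iOS18,
)
partitioner = CoreMLPartitioner(compile_specs=compile_specs)

# edge_manager is an EdgeProgramManager from to_edge(); assumed in scope.
edge_manager = edge_manager.to_backend(partitioner)
```

The recipe alternative mentioned above would replace this boilerplate with a single preconfigured export call.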
```python
executorch_program = edge_manager.to_executorch(
    ExecutorchBackendConfig(
        extract_delegate_segments=True,
        do_quant_fusion_and_const_prop=True,
```
Apple path leaves quant ops in the graph?
No, I guess this doesn't need to be specified.
```python
        memory_planning_pass=MemoryPlanningPass(
            alloc_graph_input=False, alloc_graph_output=False
        ),
        sym_shape_eval_pass=ConstraintBasedSymShapeEvalPass(),
```
This is the default now fwiw
ConstraintBasedSymShapeEvalPass is default now? Or the memory planning one?
```python
remove_graph_break_(edge_manager)
executorch_program = edge_manager.to_executorch(
    ExecutorchBackendConfig(
        extract_delegate_segments=True,
```
this one is also the default
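If both of those are in fact defaults now, the final lowering could presumably shrink to just the memory-planning override. A sketch, assuming `edge_manager` and `remove_graph_break_` from this PR are in scope and that `extract_delegate_segments=True` and `ConstraintBasedSymShapeEvalPass` are the current defaults:

```python
from executorch.exir import ExecutorchBackendConfig
from executorch.exir.passes.memory_planning_pass import MemoryPlanningPass

# Strip the graph_break marker ops now that partitioning is done.
remove_graph_break_(edge_manager)

# alloc_graph_input/alloc_graph_output are disabled because the runner
# manages the I/O buffers (e.g., static KV caches) itself.
executorch_program = edge_manager.to_executorch(
    ExecutorchBackendConfig(
        memory_planning_pass=MemoryPlanningPass(
            alloc_graph_input=False, alloc_graph_output=False
        ),
    )
)
```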