Skip to content

Simplify handling of kernelized functions#622

Open
danieldk wants to merge 5 commits into
mainfrom
deprecate-kernel-functions
Open

Simplify handling of kernelized functions#622
danieldk wants to merge 5 commits into
mainfrom
deprecate-kernel-functions

Conversation

@danieldk
Copy link
Copy Markdown
Member

@danieldk danieldk commented Jun 5, 2026

This is a set of related changes based on internal discussions:

  • Deprecate the FuncRepository classes. Kernel-side functions are not a great interface, because one typically has to set attributes to permit torch.compile and backward passes. However for functions, these are detached. It also hides that the functions are converted to nn.Module underneath.
  • Deprecate the use_kernel_from_hub decorator and extend use_kernel_forward_from_hub to provide the same functionality. This makes it clearer that when decorating a function, it gets replaced by a module instance.
  • Adopt use_kernelized_func from transformers. This annotation makes it unnecessary to assign functions as member variables, which cause issues in some applications. Instead, they are registered in a module separate from (sub-)modules. kernelize is updated to pick up these attached functions as well.

danieldk added 4 commits June 5, 2026 06:36
Change `use_kernel_forward_from_hub` to work with functions.
`use_kernel_func_from_hub` is now deprecated and its implementation is
now just a call to `use_kernel_forward_from_hub`.
This function attaches a function that is made kernelizeable to a
module, so that it can be discovered by `kernelize`.
@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 5, 2026

Coverage report — kernels/

Measured on: Python 3.10 / Torch 2.12.0.
Other CI configurations are not included in this number.
Hardware-gated code paths (ROCm/XPU/NPU/Darwin/Windows) are excluded or unreachable on the Linux+CUDA runner.

Total coverage: 83.5% — threshold: 80% — ✅

Per-file breakdown
Name Stmts Miss Cover Missing
src/kernels/__init__.py 10 0 100%
src/kernels/_system.py 6 1 83% 10
src/kernels/_versions.py 63 7 89% 46, 49, 52-53, 56-57, 100
src/kernels/backends.py 194 55 72% 40, 44, 48-51, 68, 90, 108, 117, 121, 125-127, 148, 170, 181, 188-191, 201, 205-225, 233, 256-276
src/kernels/compat.py 8 1 88% 5
src/kernels/deps.py 54 4 93% 58-59, 95, 98
src/kernels/layer/__init__.py 6 0 100%
src/kernels/layer/_interval_tree.py 103 4 96% 23, 52, 147, 150
src/kernels/layer/device.py 48 14 71% 42, 47-49, 91, 96-98, 101, 149, 152, 155-157
src/kernels/layer/func.py 81 7 91% 80, 110, 181, 297, 303, 316, 334
src/kernels/layer/globals.py 5 0 100%
src/kernels/layer/kernelize.py 74 8 89% 255, 281, 289-290, 296, 300, 316-318
src/kernels/layer/layer.py 208 17 92% 167, 210, 216, 229, 336, 394, 410-411, 423, 432, 440, 451, 480, 484, 497, 550, 580
src/kernels/layer/mode.py 14 0 100%
src/kernels/layer/repos.py 130 34 74% 27, 33, 36-41, 61-62, 68, 71-74, 88, 92, 101-102, 108, 111-114, 121-122, 128, 131-134, 141-142, 148, 151-154, 235
src/kernels/lockfile.py 69 44 36% 37-98, 102-125
src/kernels/status.py 49 2 96% 23, 81
src/kernels/utils.py 298 60 80% 59, 71-75, 81-82, 211, 215, 218, 280, 288, 325-326, 364, 393, 398, 432, 609-614, 640, 643, 645, 651, 664-665, 686-695, 699-706, 714, 718-728, 732-739, 777, 781, 800, 802
src/kernels/variants.py 262 19 93% 56, 87, 108, 138, 247-248, 289, 291, 371-378, 384-390, 421-427, 439-445, 534-536
TOTAL 1682 277 84%

Updated by the Test kernels workflow on commit 56c9a71e201f8f54f666c12650e612e838643d34.

@danieldk danieldk marked this pull request as ready for review June 5, 2026 11:32
Copy link
Copy Markdown
Contributor

@vasqu vasqu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial comments, I still need to test this on transformers side but it definitely would make our lives easier ❤️ only thing I'm personally missing is the option to also remove the hidden modules, we kernelize once tbh and should not allow rekernelizing (for now)

Comment thread docs/source/layers.md
that uses it. For example:

```python
@use_kernelized_func(silu_and_mul)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice docs 🫡


kernelize_layer(module, mode=mode, device_type=device_type, use_fallback=use_fallback)
# Kernel functions attached to a module through `use_kernelized_func`.
for kernel_func in getattr(module, "_kernel_funcs", {}).values():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for BC, I would like to keep the same original naming as in transformers _hidden_kernels

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to give it a new name, or if would conflict with the existing implementation in transformers. _kernel_funcs also seems to be a more descriptive name?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yea should be fine as well. I see the point 👍

return layer


def _create_func_module(func: Callable) -> Type["nn.Module"]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know how import it is for you guys but should we keep a BC reference in func.py for now?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only consider things exported from the top-level __init__.py stable API, so we can move things around freely in other places.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's neat

mode=mode,
device_type=device_type,
use_fallback=use_fallback,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should detach them afterwards or at least add this as an optional flag to enable this, wdyt?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detaching sounds like a bad idea, because then you cannot kernelize the same model multiple times, which is a scenario that we want to support. (E.g. for kernelizing with different mappings or kernelizing with a different mode.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to check whether deepspeed still plays well with this. Not sure if it still takes a look at these hidden dicts, it shouldn't but we never know 😓

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants