Simplify handling of kernelized functions#622
Conversation
Change `use_kernel_forward_from_hub` to work with functions. `use_kernel_func_from_hub` is now deprecated and its implementation is now just a call to `use_kernel_forward_from_hub`.
This function attaches a function that is made kernelizeable to a module, so that it can be discovered by `kernelize`.
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Coverage report —
|
| Name | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| src/kernels/__init__.py | 10 | 0 | 100% | |
| src/kernels/_system.py | 6 | 1 | 83% | 10 |
| src/kernels/_versions.py | 63 | 7 | 89% | 46, 49, 52-53, 56-57, 100 |
| src/kernels/backends.py | 194 | 55 | 72% | 40, 44, 48-51, 68, 90, 108, 117, 121, 125-127, 148, 170, 181, 188-191, 201, 205-225, 233, 256-276 |
| src/kernels/compat.py | 8 | 1 | 88% | 5 |
| src/kernels/deps.py | 54 | 4 | 93% | 58-59, 95, 98 |
| src/kernels/layer/__init__.py | 6 | 0 | 100% | |
| src/kernels/layer/_interval_tree.py | 103 | 4 | 96% | 23, 52, 147, 150 |
| src/kernels/layer/device.py | 48 | 14 | 71% | 42, 47-49, 91, 96-98, 101, 149, 152, 155-157 |
| src/kernels/layer/func.py | 81 | 7 | 91% | 80, 110, 181, 297, 303, 316, 334 |
| src/kernels/layer/globals.py | 5 | 0 | 100% | |
| src/kernels/layer/kernelize.py | 74 | 8 | 89% | 255, 281, 289-290, 296, 300, 316-318 |
| src/kernels/layer/layer.py | 208 | 17 | 92% | 167, 210, 216, 229, 336, 394, 410-411, 423, 432, 440, 451, 480, 484, 497, 550, 580 |
| src/kernels/layer/mode.py | 14 | 0 | 100% | |
| src/kernels/layer/repos.py | 130 | 34 | 74% | 27, 33, 36-41, 61-62, 68, 71-74, 88, 92, 101-102, 108, 111-114, 121-122, 128, 131-134, 141-142, 148, 151-154, 235 |
| src/kernels/lockfile.py | 69 | 44 | 36% | 37-98, 102-125 |
| src/kernels/status.py | 49 | 2 | 96% | 23, 81 |
| src/kernels/utils.py | 298 | 60 | 80% | 59, 71-75, 81-82, 211, 215, 218, 280, 288, 325-326, 364, 393, 398, 432, 609-614, 640, 643, 645, 651, 664-665, 686-695, 699-706, 714, 718-728, 732-739, 777, 781, 800, 802 |
| src/kernels/variants.py | 262 | 19 | 93% | 56, 87, 108, 138, 247-248, 289, 291, 371-378, 384-390, 421-427, 439-445, 534-536 |
| TOTAL | 1682 | 277 | 84% |
Updated by the Test kernels workflow on commit 56c9a71e201f8f54f666c12650e612e838643d34.
vasqu
left a comment
There was a problem hiding this comment.
My initial comments, I still need to test this on transformers side but it definitely would make our lives easier ❤️ only thing I'm personally missing is the option to also remove the hidden modules, we kernelize once tbh and should not allow rekernelizing (for now)
| that uses it. For example: | ||
|
|
||
| ```python | ||
| @use_kernelized_func(silu_and_mul) |
|
|
||
| kernelize_layer(module, mode=mode, device_type=device_type, use_fallback=use_fallback) | ||
| # Kernel functions attached to a module through `use_kernelized_func`. | ||
| for kernel_func in getattr(module, "_kernel_funcs", {}).values(): |
There was a problem hiding this comment.
Just for BC, I would like to keep the same original naming as in transformers _hidden_kernels
There was a problem hiding this comment.
I think it's better to give it a new name, or if would conflict with the existing implementation in transformers. _kernel_funcs also seems to be a more descriptive name?
There was a problem hiding this comment.
Hmm, yea should be fine as well. I see the point 👍
| return layer | ||
|
|
||
|
|
||
| def _create_func_module(func: Callable) -> Type["nn.Module"]: |
There was a problem hiding this comment.
Don't know how import it is for you guys but should we keep a BC reference in func.py for now?
There was a problem hiding this comment.
We only consider things exported from the top-level __init__.py stable API, so we can move things around freely in other places.
| mode=mode, | ||
| device_type=device_type, | ||
| use_fallback=use_fallback, | ||
| ) |
There was a problem hiding this comment.
I think we should detach them afterwards or at least add this as an optional flag to enable this, wdyt?
There was a problem hiding this comment.
Detaching sounds like a bad idea, because then you cannot kernelize the same model multiple times, which is a scenario that we want to support. (E.g. for kernelizing with different mappings or kernelizing with a different mode.)
There was a problem hiding this comment.
I have to check whether deepspeed still plays well with this. Not sure if it still takes a look at these hidden dicts, it shouldn't but we never know 😓
This is a set of related changes based on internal discussions:
FuncRepositoryclasses. Kernel-side functions are not a great interface, because one typically has to set attributes to permittorch.compileand backward passes. However for functions, these are detached. It also hides that the functions are converted tonn.Moduleunderneath.use_kernel_from_hubdecorator and extenduse_kernel_forward_from_hubto provide the same functionality. This makes it clearer that when decorating a function, it gets replaced by a module instance.use_kernelized_funcfrom transformers. This annotation makes it unnecessary to assign functions as member variables, which cause issues in some applications. Instead, they are registered in a module separate from (sub-)modules.kernelizeis updated to pick up these attached functions as well.