
Commit 975538f

Merge branch 'main' into xpu-skill
2 parents 399cc59 + 61bcfe8 commit 975538f

35 files changed

Lines changed: 493 additions & 338 deletions


.github/workflows/build_kernel_macos.yaml

Lines changed: 2 additions & 2 deletions
@@ -28,7 +28,7 @@ jobs:
 # For now we only test that there are no regressions in building macOS
 # kernels. Also run tests once we have a macOS runner.
 - name: Build relu kernel
-  run: ( cd examples/kernels/relu && nix build .\#redistributable.torch210-metal-aarch64-darwin -L )
+  run: ( cd examples/kernels/relu && nix build .\#redistributable.torch211-metal-aarch64-darwin -L )
 
 - name: Build relu metal cpp kernel
-  run: ( cd examples/kernels/relu-metal-cpp && nix build .\#redistributable.torch210-metal-aarch64-darwin -L )
+  run: ( cd examples/kernels/relu-metal-cpp && nix build .\#redistributable.torch211-metal-aarch64-darwin -L )

.github/workflows/build_kernel_rocm.yaml

Lines changed: 2 additions & 2 deletions
@@ -34,7 +34,7 @@ jobs:
 # For now we only test that there are no regressions in building ROCm
 # kernels. Also run tests once we have a ROCm runner.
 - name: Build relu kernel
-  run: ( cd examples/kernels/relu && nix build .\#redistributable.torch210-cxx11-rocm70-x86_64-linux -L )
+  run: ( cd examples/kernels/relu && nix build .\#redistributable.torch211-cxx11-rocm71-x86_64-linux -L )
 
 - name: Build relu kernel (compiler flags)
-  run: ( cd examples/kernels/relu-compiler-flags && nix build .\#redistributable.torch210-cxx11-rocm70-x86_64-linux )
+  run: ( cd examples/kernels/relu-compiler-flags && nix build .\#redistributable.torch211-cxx11-rocm71-x86_64-linux )

.github/workflows/build_kernel_xpu.yaml

Lines changed: 3 additions & 3 deletions
@@ -34,13 +34,13 @@ jobs:
 # For now we only test that there are no regressions in building XPU
 # kernels. Also run tests once we have a XPU runner.
 - name: Build relu kernel
-  run: ( cd examples/kernels/relu && nix build .\#redistributable.torch210-cxx11-xpu20253-x86_64-linux -L )
+  run: ( cd examples/kernels/relu && nix build .\#redistributable.torch211-cxx11-xpu20253-x86_64-linux -L )
 
 - name: Build relu tvm-ffi kernel
   run: ( cd examples/kernels/relu-tvm-ffi && nix build .\#redistributable.tvm-ffi01-xpu20253-x86_64-linux -L )
 
 - name: Build relu kernel (compiler flags)
-  run: ( cd examples/kernels/relu-compiler-flags && nix build .\#redistributable.torch210-cxx11-xpu20253-x86_64-linux )
+  run: ( cd examples/kernels/relu-compiler-flags && nix build .\#redistributable.torch211-cxx11-xpu20253-x86_64-linux )
 
 - name: Build cutlass-gemm kernel
-  run: ( cd examples/kernels/cutlass-gemm && nix build .\#redistributable.torch210-cxx11-xpu20253-x86_64-linux -L )
+  run: ( cd examples/kernels/cutlass-gemm && nix build .\#redistributable.torch211-cxx11-xpu20253-x86_64-linux -L )
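The CI steps in these workflows differ only in the kernel directory and the variant name, so bumping the PyTorch version means editing every step, as this commit does. As an illustration only, here is a Python sketch that derives the same `nix build` invocations from one constant. `TORCH_TAG`, `build_argv` and `STEPS` are hypothetical names, not part of the repository.

```python
import shlex

# Hypothetical single place to bump the torch tag (torch210 -> torch211 in this commit).
TORCH_TAG = "torch211"


def build_argv(backend: str, print_logs: bool = True) -> list[str]:
    """argv for building the redistributable variant of one x86_64-linux backend."""
    variant = f"{TORCH_TAG}-cxx11-{backend}-x86_64-linux"
    argv = ["nix", "build", f".#redistributable.{variant}"]
    if print_logs:
        argv.append("-L")  # short form of --print-build-logs
    return argv


# (kernel directory, backend, print logs?) mirroring the XPU steps above.
STEPS = [
    ("examples/kernels/relu", "xpu20253", True),
    ("examples/kernels/relu-compiler-flags", "xpu20253", False),
    ("examples/kernels/cutlass-gemm", "xpu20253", True),
]

for kernel_dir, backend, print_logs in STEPS:
    # shlex.join quotes the argument containing '#', which the workflow escapes as '\#'.
    cmd = shlex.join(build_argv(backend, print_logs))
    print(f"( cd {shlex.quote(kernel_dir)} && {cmd} )")
```

Passing an argv list to a process runner avoids shell quoting entirely; the printed form only reproduces the workflow's `( cd ... && ... )` style.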

.github/workflows/test_kernels.yaml

Lines changed: 2 additions & 2 deletions
@@ -26,8 +26,8 @@ jobs:
 strategy:
   max-parallel: 4
   matrix:
-    python-version: ["3.10", "3.12"]
-    torch-version: ["2.10.0", "2.11.0"]
+    python-version: ["3.10", "3.14"]
+    torch-version: ["2.11.0", "2.12.0"]
 
 env:
   UV_PYTHON_PREFERENCE: only-managed
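GitHub Actions expands a `matrix` into one job per combination of its values. The updated matrix therefore yields 2 × 2 = 4 test jobs, and `max-parallel: 4` permits all four to run at the same time. The Python sketch below models only that cross product. It does not model `include`/`exclude` entries, which this workflow does not show.

```python
from itertools import product

# Matrix values from the updated test_kernels.yaml.
matrix = {
    "python-version": ["3.10", "3.14"],
    "torch-version": ["2.11.0", "2.12.0"],
}

# One job per combination, as for a matrix without include/exclude entries.
jobs = [dict(zip(matrix, combo)) for combo in product(*matrix.values())]

for job in jobs:
    print(job)
print(f"{len(jobs)} jobs, max-parallel 4: all can run concurrently")
```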

docs/source/builder/build-variants.md

Lines changed: 0 additions & 13 deletions
@@ -7,27 +7,21 @@ available. This list will be updated as new PyTorch versions are released.
 
 ## CPU aarch64-darwin
 
-- `torch210-cpu-aarch64-darwin`
 - `torch211-cpu-aarch64-darwin`
 - `torch212-cpu-aarch64-darwin`
 
 ## Metal aarch64-darwin
 
-- `torch210-metal-aarch64-darwin`
 - `torch211-metal-aarch64-darwin`
 - `torch212-metal-aarch64-darwin`
 
 ## CPU aarch64-linux
 
-- `torch210-cxx11-cpu-aarch64-linux`
 - `torch211-cxx11-cpu-aarch64-linux`
 - `torch212-cxx11-cpu-aarch64-linux`
 
 ## CUDA aarch64-linux
 
-- `torch210-cxx11-cu126-aarch64-linux`
-- `torch210-cxx11-cu128-aarch64-linux`
-- `torch210-cxx11-cu130-aarch64-linux`
 - `torch211-cxx11-cu126-aarch64-linux`
 - `torch211-cxx11-cu128-aarch64-linux`
 - `torch211-cxx11-cu130-aarch64-linux`
@@ -37,15 +31,11 @@ available. This list will be updated as new PyTorch versions are released.
 
 ## CPU x86_64-linux
 
-- `torch210-cxx11-cpu-x86_64-linux`
 - `torch211-cxx11-cpu-x86_64-linux`
 - `torch212-cxx11-cpu-x86_64-linux`
 
 ## CUDA x86_64-linux
 
-- `torch210-cxx11-cu126-x86_64-linux`
-- `torch210-cxx11-cu128-x86_64-linux`
-- `torch210-cxx11-cu130-x86_64-linux`
 - `torch211-cxx11-cu126-x86_64-linux`
 - `torch211-cxx11-cu128-x86_64-linux`
 - `torch211-cxx11-cu130-x86_64-linux`
@@ -55,16 +45,13 @@ available. This list will be updated as new PyTorch versions are released.
 
 ## ROCm x86_64-linux
 
-- `torch210-cxx11-rocm70-x86_64-linux`
-- `torch210-cxx11-rocm71-x86_64-linux`
 - `torch211-cxx11-rocm71-x86_64-linux`
 - `torch211-cxx11-rocm72-x86_64-linux`
 - `torch212-cxx11-rocm71-x86_64-linux`
 - `torch212-cxx11-rocm72-x86_64-linux`
 
 ## XPU x86_64-linux
 
-- `torch210-cxx11-xpu20253-x86_64-linux`
 - `torch211-cxx11-xpu20253-x86_64-linux`
 - `torch212-cxx11-xpu20253-x86_64-linux`
 
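The variant names above encode several pieces of information: the framework version, an optional C++ ABI tag (`cxx11`, present only on Linux), the compute backend with its version, and the target architecture and OS. The sketch below parses these names with a regular expression. The pattern is inferred from the names in this commit, including the `tvm-ffi01-xpu20253-x86_64-linux` variant used in the XPU workflow, and is an assumption rather than kernel-builder's own grammar. Reading `211` as PyTorch 2.11 also assumes a one-digit major version.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern inferred from the variant names in this commit;
# kernel-builder may define variants differently.
VARIANT_RE = re.compile(
    r"^(?P<framework>torch|tvm-ffi)(?P<version>\d{2,})"
    r"(?:-(?P<abi>cxx11))?"
    r"-(?P<backend>[a-z]+)(?P<backend_version>\d*)"
    r"-(?P<arch>x86_64|aarch64)-(?P<os>linux|darwin)$"
)


class Variant(NamedTuple):
    framework: str          # "torch" or "tvm-ffi"
    version: str            # "2.11" for torch211 (assumes a one-digit major)
    abi: Optional[str]      # "cxx11" for Linux variants, None for macOS
    backend: str            # "cpu", "metal", "cu", "rocm" or "xpu"
    backend_version: str    # "128" for cu128, "" for cpu/metal
    arch: str               # "x86_64" or "aarch64"
    os: str                 # "linux" or "darwin"


def parse_variant(name: str) -> Variant:
    m = VARIANT_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized build variant: {name!r}")
    digits = m["version"]
    # Assumption: the first digit is the major version, the rest is the minor.
    version = f"{int(digits[0])}.{int(digits[1:])}"
    return Variant(
        m["framework"], version, m["abi"], m["backend"],
        m["backend_version"], m["arch"], m["os"],
    )


for name in (
    "torch211-cxx11-cu128-x86_64-linux",
    "torch212-metal-aarch64-darwin",
    "torch211-cxx11-rocm72-x86_64-linux",
    "tvm-ffi01-xpu20253-x86_64-linux",
):
    print(parse_variant(name))
```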

docs/source/builder/writing-kernels.md

Lines changed: 41 additions & 0 deletions
@@ -351,6 +351,47 @@ def mykernel(x: torch.Tensor, out: Optional[torch.Tensor] = None) -> torch.Tenso
     return out
 ```
 
+## Registering Torch operators
+
+You may want to register Torch ops from your kernel's Python code or
+register fake ops for `torch.compile` support. It is important to register
+such ops in the namespace that kernel-builder creates for your kernel
+build. This is required for compliant kernels to ensure that multiple
+versions of the same kernel can be loaded at the same time without
+namespace conflicts.
+
+You can use the `add_op_namespace_prefix` function to prefix an op name with
+the correct namespace. For instance, replace
+
+```python
+@torch.library.register_fake("relu::relu_fwd")
+def relu_fwd_fake(input: torch.Tensor) -> torch.Tensor:
+    return torch.empty_like(input)
+```
+
+with
+
+```python
+from ._ops import add_op_namespace_prefix
+
+@torch.library.register_fake(add_op_namespace_prefix("relu_fwd"))
+def relu_fwd_fake(input: torch.Tensor) -> torch.Tensor:
+    return torch.empty_like(input)
+```
+
+As mentioned above, the `_ops` module is generated by kernel-builder.
+
+kernel-builder uses a hook to reject incorrect usage of Torch op registration
+functions. However, it can only catch direct use of certain `torch.library`
+decorators. For instance, the hook would not reject the following decorator,
+so it should be seen as a last-resort check in case human review fails:
+
+```python
+@some_indirection_for_register_fake("relu::relu_fwd")
+def relu_fwd_fake(input: torch.Tensor) -> torch.Tensor:
+    return torch.empty_like(input)
+```
+
 ## Kernel tests
 
 Kernel tests are stored in the `tests` directory. Since running all
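The limitation described for the registration hook follows from how a static check works. It sees decorator call expressions in the source, not what a wrapper function does at runtime. kernel-builder's actual hook is not part of this diff. The following is a hypothetical `ast`-based check, written only to show why a direct `torch.library.register_fake("relu::relu_fwd")` can be flagged while `some_indirection_for_register_fake("relu::relu_fwd")` slips through. The set of checked function names is an assumption.

```python
import ast

# Assumed set of torch.library functions whose first argument is an op name.
CHECKED = {"register_fake", "register_kernel", "register_autograd",
           "custom_op", "define", "impl"}


def is_torch_library_attr(func: ast.expr) -> bool:
    """True only for expressions of the exact form torch.library.<checked name>."""
    return (
        isinstance(func, ast.Attribute)
        and func.attr in CHECKED
        and isinstance(func.value, ast.Attribute)
        and func.value.attr == "library"
        and isinstance(func.value.value, ast.Name)
        and func.value.value.id == "torch"
    )


def find_hardcoded_op_names(source: str) -> list[tuple[int, str]]:
    """Return (line, name) for direct torch.library calls given a literal 'ns::op'."""
    hits = []
    # ast.walk also visits decorator expressions, which are ast.Call nodes here.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and node.args and is_torch_library_attr(node.func):
            first = node.args[0]
            if isinstance(first, ast.Constant) and isinstance(first.value, str) and "::" in first.value:
                hits.append((node.lineno, first.value))
    return hits


DIRECT = '''
import torch

@torch.library.register_fake("relu::relu_fwd")
def relu_fwd_fake(input):
    return torch.empty_like(input)
'''

INDIRECT = '''
@some_indirection_for_register_fake("relu::relu_fwd")
def relu_fwd_fake(input):
    return input
'''

PREFIXED = '''
from ._ops import add_op_namespace_prefix
import torch

@torch.library.register_fake(add_op_namespace_prefix("relu_fwd"))
def relu_fwd_fake(input):
    return torch.empty_like(input)
'''

for label, src in [("direct", DIRECT), ("indirect", INDIRECT), ("prefixed", PREFIXED)]:
    print(label, find_hardcoded_op_names(src))
```

Only the direct call is reported. This sketch would also miss `from torch.library import register_fake` followed by `@register_fake("relu::relu_fwd")`, which illustrates why such a check can only be a last resort.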

docs/source/installation.md

Lines changed: 6 additions & 0 deletions
@@ -17,3 +17,9 @@ or if you want the latest version from the `main` branch:
 ```bash
 pip install "kernels[benchmark] @ git+https://github.com/huggingface/kernels#subdirectory=kernels"
 ```
+
+> [!IMPORTANT]
+> We strongly recommend not using a free-threaded Python build yet.
+> These builds are not only experimental but also do not support the stable ABI
+> on Python versions before 3.15. Kernels are compiled with the stable ABI
+> to support a wide range of Python versions.
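Before installing, you can check whether the interpreter is a free-threaded build. On Python 3.13 and later, the `Py_GIL_DISABLED` build configuration variable, which `sysconfig` exposes, is set on free-threaded builds. The following is a minimal standalone check. It is not something the `kernels` package is documented to perform.

```python
import sys
import sysconfig


def is_free_threaded_build() -> bool:
    """True if this interpreter was built with the GIL disabled (PEP 703)."""
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise
    # (None on versions before 3.13, where the variable does not exist).
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


if is_free_threaded_build():
    print(f"Python {sys.version.split()[0]} is a free-threaded build; "
          "consider a regular build for kernels.", file=sys.stderr)
else:
    print("Regular (GIL) Python build detected.")
```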

examples/kernels/flake.nix

Lines changed: 1 addition & 1 deletion
@@ -84,7 +84,7 @@
 drv = sys: out: out.packages.${sys}.default;
 torchVersions = _defaultVersions: [
   {
-    torchVersion = "2.10";
+    torchVersion = "2.11";
     cudaVersion = "12.8";
     systems = [
       "x86_64-linux"

examples/kernels/relu-specific-torch/flake.nix

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 path = ./.;
 torchVersions = defaultVersions: [
   {
-    torchVersion = "2.10";
+    torchVersion = "2.11";
     cudaVersion = "12.8";
     systems = [
       "x86_64-linux"

examples/kernels/relu-torch-bounds/build.toml

Lines changed: 2 additions & 2 deletions
@@ -11,8 +11,8 @@ backends = [
 repo-id = "kernels-test/relu-torch-bounds"
 
 [torch]
-minver = "2.10"
-maxver = "2.10"
+minver = "2.11"
+maxver = "2.11"
 src = [
   "torch-ext/torch_binding.cpp",
   "torch-ext/torch_binding.h",

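The `[torch]` table in `relu-torch-bounds/build.toml` limits which PyTorch versions the kernel is built for. Because this commit sets `minver` and `maxver` to the same value, the bounds are presumably inclusive, since otherwise the range would be empty. The following is a minimal sketch of such a check, assuming inclusive major.minor bounds. `torch_in_bounds` is a hypothetical helper, not kernel-builder code. The table is given as a dict rather than read from `build.toml`; on Python 3.11+, `tomllib.loads` would produce the same structure.

```python
from typing import Optional


def major_minor(version: str) -> tuple[int, int]:
    """'2.11.0' -> (2, 11); also accepts '2.11'. Patch and local parts are ignored."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def torch_in_bounds(torch_version: str, torch_table: dict) -> bool:
    """Check a PyTorch version against inclusive minver/maxver bounds (assumed semantics)."""
    current = major_minor(torch_version)
    lo: Optional[str] = torch_table.get("minver")
    hi: Optional[str] = torch_table.get("maxver")
    if lo is not None and current < major_minor(lo):
        return False
    if hi is not None and current > major_minor(hi):
        return False
    return True


# [torch] bounds from relu-torch-bounds/build.toml after this commit (src omitted).
torch_table = {"minver": "2.11", "maxver": "2.11"}

for v in ["2.10.0", "2.11.0", "2.12.0"]:
    print(v, torch_in_bounds(v, torch_table))
```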