Commit 061da8c: Merge branch 'dev'

2 parents 9ed71cb + 929a122

15 files changed, +374 -43 lines

README.md (+4 -4)

@@ -21,9 +21,8 @@ mathematical operations require intrinsics (e.g., `__hadd2` performs addition fo
 type conversion is awkward (e.g., `__nv_cvt_halfraw2_to_fp8x2` converts float16 to float8),
 and some functionality is missing (e.g., one cannot convert a `__half` to `__nv_bfloat16`).
 
-_Kernel Float_ resolves this by offering a single data type `kernel_float::vec<T, N>`
-that stores `N` elements of type `T`.
-Internally, the data is stored using the most optimal type available, for example, `vec<half, 2>` stores a `__half2` and `vec<fp8_e5m2, 4>` uses a `__nv_fp8x4_e5m2`.
+_Kernel Float_ resolves this by offering a single data type `kernel_float::vec<T, N>` that stores `N` elements of type `T`.
+Internally, the data is stored as a fixed-sized array of elements.
 Operator overloading (like `+`, `*`, `&&`) has been implemented such that the most optimal intrinsic for the available types is selected automatically.
 Many mathematical functions (like `log`, `exp`, `sin`) and common operations (such as `sum`, `range`, `for_each`) are also available.
 

@@ -36,7 +35,8 @@ In a nutshell, _Kernel Float_ offers the following features:
 
 * Single type `vec<T, N>` that unifies all vector types.
 * Operator overloading to simplify programming.
-* Support for half (16 bit) and quarter (8 bit) floating-point precision.
+* Support for half (16 bit) floating-point arithmetic, with a fallback to single precision for unsupported operations.
+* Support for quarter (8 bit) floating-point types.
 * Easy integration as a single header file.
 * Written for C++17.
 * Compatible with NVCC (NVIDIA Compiler) and NVRTC (NVIDIA Runtime Compilation).

docs/conf.py (+2 -2)

@@ -39,8 +39,8 @@
 # ones.
 extensions = [
     "breathe",
-    #"myst_parser",
-    "sphinx_mdinclude",
+    "myst_parser",
+    #"sphinx_mdinclude",
 ]
 
 #source_suffix = {

docs/guides.rst (+8, new file)

@@ -0,0 +1,8 @@
+Guides
+=============
+.. toctree::
+   :maxdepth: 1
+
+   guides/introduction.rst
+   guides/promotion.rst
+   guides/prelude.rst
(+62, new file)

@@ -0,0 +1,62 @@
+def name(x):
+    if x == ("b", 1):
+        return "b"
+
+    return f"{x[0]}{x[1]}"
+
+
+def promote(a, b):
+    x = a[0]
+    y = b[0]
+
+    if x == y:
+        return (x, max(a[1], b[1]))
+
+    if x == "b":
+        return b
+
+    if y == "b":
+        return a
+
+    if x in ("f", "bf") and y in ("i", "u"):
+        return a
+
+    if y in ("f", "bf") and x in ("i", "u"):
+        return b
+
+    if x in ("f", "bf") and y in ("f", "bf"):
+        if a[1] > b[1]:
+            return a
+        elif b[1] > a[1]:
+            return b
+        else:
+            return ("f", a[1] * 2)
+
+    return None
+
+
+if __name__ == "__main__":
+    types = [("b", 1)]
+    types += [("i", n) for n in [8, 16, 32, 64]]
+    types += [("u", n) for n in [8, 16, 32, 64]]
+    types += [("f", n) for n in [8, 16, 32, 64]]
+
+    types.insert(types.index(("f", 32)), ("bf", 16))
+
+    lines = []
+
+    header = [""]
+    for a in types:
+        header.append(f"**{name(a)}**")
+    lines.append(",".join(header))
+
+    for a in types:
+        line = [f"**{name(a)}**"]
+
+        for b in types:
+            c = promote(a, b)
+            line.append(name(c) if c else "x")
+
+        lines.append(",".join(line))
+
+    with open("promotion_table.csv", "w") as f:
+        f.write("\n".join(lines))
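As a quick sanity check, the `promote` helper from the generator script above can be exercised standalone. The function is reproduced here (lightly condensed) so the snippet runs on its own; types are `(kind, bits)` tuples, e.g. `("f", 16)` is float16:

```python
def promote(a, b):
    # Condensed from the table-generator script above.
    x, y = a[0], b[0]
    if x == y:
        return (x, max(a[1], b[1]))
    if x == "b":            # bool promotes to the other type
        return b
    if y == "b":
        return a
    if x in ("f", "bf") and y in ("i", "u"):  # floats win over integers
        return a
    if y in ("f", "bf") and x in ("i", "u"):
        return b
    if x in ("f", "bf") and y in ("f", "bf"):
        if a[1] != b[1]:    # different widths: take the wider float
            return max(a, b, key=lambda t: t[1])
        return ("f", a[1] * 2)  # e.g. half + bfloat16 -> float32
    return None             # mixed signedness is not allowed

# int32 + float16 promotes to float16 (floats win over integers)
assert promote(("i", 32), ("f", 16)) == ("f", 16)
# half + bfloat16 promotes to float32 (same width, different formats)
assert promote(("f", 16), ("bf", 16)) == ("f", 32)
# signed + unsigned integers have no common type
assert promote(("i", 8), ("u", 8)) is None
```

These assertions match the corresponding cells of the generated `promotion_table.csv`.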

docs/guides/introduction.md (+56, new file)

@@ -0,0 +1,56 @@
+Getting started
+===============
+
+Kernel Float is a header-only library that makes it easy to work with vector types and low-precision floating-point types, mainly focusing on CUDA kernel code.
+
+Installation
+------------
+
+The easiest way to use the library is to get the single header file from GitHub:
+
+```bash
+wget https://raw.githubusercontent.com/KernelTuner/kernel_float/main/single_include/kernel_float.h
+```
+
+Next, include this file in your program.
+It is convenient to define a namespace alias `kf` to shorten the full name `kernel_float`.
+
+
+```C++
+#include "kernel_float.h"
+namespace kf = kernel_float;
+```
+
+
+Example C++ code
+----------------
+
+Kernel Float essentially offers a single data type `kernel_float::vec<T, N>` that stores `N` elements of type `T`.
+This type can be initialized normally using list-initialization (e.g., `{a, b, c}`) and elements can be accessed using the `[]` operator.
+Operator overloading is available to perform binary operations (such as `+`, `*`, and `&`), where the optimal intrinsic for the available types is selected automatically.
+
+Many mathematical functions (like `log`, `sin`, `cos`) are also available; see the [API reference](../api) for the full list of functions.
+In some cases, certain operations might not be natively supported by the platform for some floating-point types.
+In these cases, Kernel Float falls back to performing the operations in 32 bit precision.
+
+The code below shows a very simple example of how to use Kernel Float:
+
+```C++
+#include "kernel_float.h"
+namespace kf = kernel_float;
+
+int main() {
+    using Type = float;
+    const int N = 8;
+
+    kf::vec<int, N> i = kf::range<int, N>();
+    kf::vec<Type, N> x = kf::cast<Type>(i);
+    kf::vec<Type, N> y = x * kf::sin(x);
+    Type result = kf::sum(y);
+    printf("result=%f", double(result));
+
+    return EXIT_SUCCESS;
+}
+```

+Notice how easy it would be to change the floating-point type `Type` or the vector length `N` without affecting the rest of the code.
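The 32 bit fallback mentioned in the guide above (for operations a platform cannot perform natively in half precision) can be illustrated in plain Python. This is a sketch of the widen-compute-narrow pattern under assumed round-to-nearest float16 semantics, not the library's actual code:

```python
import math
import struct

def to_half(x: float) -> float:
    # Round x to the nearest IEEE 754 float16 value, using struct's
    # half-precision 'e' format as the rounding mechanism.
    return struct.unpack('e', struct.pack('e', x))[0]

def sin_half_fallback(x: float) -> float:
    # No native half-precision sin: widen the half input to full
    # precision, compute, then narrow the result back to half.
    return to_half(math.sin(to_half(x)))

# The result is a representable float16 value close to sin(1.0).
print(sin_half_fallback(1.0))
```

The narrowing step at the end is what bounds the accuracy: the intermediate computation is exact to full precision, but the result is rounded back to the storage type.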

docs/guides/prelude.md (+42, new file)

@@ -0,0 +1,42 @@
+Using `kernel_float::prelude`
+===
+
+When working with Kernel Float, you'll find that you need to prefix every function and type with `kernel_float::`.
+This can be a bit cumbersome.
+It is strongly discouraged to dump the entire `kernel_float` namespace into the global namespace (with `using namespace kernel_float`) since
+many symbols in Kernel Float may clash with global symbols, causing conflicts and issues.
+
+To work around this, the library provides a handy `kernel_float::prelude` namespace. This namespace contains a variety of useful type and function aliases that won't conflict with global symbols.
+
+To make use of it, use the following code:
+
+
+```C++
+#include "kernel_float.h"
+using namespace kernel_float::prelude;
+
+// You can now use aliases like `kf`, `kvec`, `kint`, etc.
+```
+
+The prelude defines many aliases, including the following:
+
+| Prelude name | Full name |
+|---|---|
+| `kf` | `kernel_float` |
+| `kvec<T, N>` | `kernel_float::vec<T, N>` |
+| `into_kvec(v)` | `kernel_float::into_vec(v)` |
+| `make_kvec(a, b, ...)` | `kernel_float::make_vec(a, b, ...)` |
+| `kvec2<T>`, `kvec3<T>`, ... | `kernel_float::vec<T, 2>`, `kernel_float::vec<T, 3>`, ... |
+| `kint<N>` | `kernel_float::vec<int, N>` |
+| `kint2`, `kint3`, ... | `kernel_float::vec<int, 2>`, `kernel_float::vec<int, 3>`, ... |
+| `klong<N>` | `kernel_float::vec<long, N>` |
+| `klong2`, `klong3`, ... | `kernel_float::vec<long, 2>`, `kernel_float::vec<long, 3>`, ... |
+| `kbfloat16x<N>` | `kernel_float::vec<bfloat16, N>` |
+| `kbfloat16x2`, `kbfloat16x3`, ... | `kernel_float::vec<bfloat16, 2>`, `kernel_float::vec<bfloat16, 3>`, ... |
+| `khalf<N>` | `kernel_float::vec<half, N>` |
+| `khalf2`, `khalf3`, ... | `kernel_float::vec<half, 2>`, `kernel_float::vec<half, 3>`, ... |
+| `kfloat<N>` | `kernel_float::vec<float, N>` |
+| `kfloat2`, `kfloat3`, ... | `kernel_float::vec<float, 2>`, `kernel_float::vec<float, 3>`, ... |
+| `kdouble<N>` | `kernel_float::vec<double, N>` |
+| `kdouble2`, `kdouble3`, ... | `kernel_float::vec<double, 2>`, `kernel_float::vec<double, 3>`, ... |
+| ... | ... |

docs/guides/promotion.rst (+34, new file)

@@ -0,0 +1,34 @@
+Type Promotion
+==============
+
+For operations that involve two input arguments (or more), ``kernel_float`` will first convert the inputs into a common type before applying the operation.
+For example, when adding ``vec<int, N>`` to a ``vec<float, N>``, both arguments must first be converted into a ``vec<float, N>``.
+
+This procedure is called "type promotion" and is implemented as follows.
+First, all arguments are converted into a vector by calling ``into_vec``.
+Next, all arguments must have length ``N`` or length ``1``; vectors of length ``1`` are resized to length ``N``.
+Finally, the vector element types are promoted into a common type.
+
+The rules for element type promotion in ``kernel_float`` are slightly different from those in regular C++.
+In short, for two element types ``T`` and ``U``, the promotion rules can be summarized as follows:
+
+* If one of the types is ``bool``, the result is the other type.
+* If one type is a floating-point type and the other is a signed or unsigned integer, the result is the floating-point type.
+* If both types are floating-point types, the result is the larger of the two. An exception is the combination of ``half`` and ``bfloat16``, which results in ``float``.
+* If both types are integer types of the same signedness, the result is the larger of the two.
+* Combining a signed integer and an unsigned integer type is not allowed.
+
+Overview
+--------
+
+The type promotion rules are shown in the table below.
+The labels are as follows:
+
+* ``b``: boolean
+* ``iN``: signed integer of ``N`` bits (e.g., ``int``, ``long``)
+* ``uN``: unsigned integer of ``N`` bits (e.g., ``unsigned int``, ``size_t``)
+* ``fN``: floating-point type of ``N`` bits (e.g., ``float``, ``double``)
+* ``bf16``: bfloat16 floating-point format
+
+.. csv-table:: Type Promotion Rules.
+   :file: promotion_table.csv

docs/guides/promotion_table.csv (+15, new file)

@@ -0,0 +1,15 @@
+,**b**,**i8**,**i16**,**i32**,**i64**,**u8**,**u16**,**u32**,**u64**,**f8**,**f16**,**bf16**,**f32**,**f64**
+**b**,b,i8,i16,i32,i64,u8,u16,u32,u64,f8,f16,bf16,f32,f64
+**i8**,i8,i8,i16,i32,i64,x,x,x,x,f8,f16,bf16,f32,f64
+**i16**,i16,i16,i16,i32,i64,x,x,x,x,f8,f16,bf16,f32,f64
+**i32**,i32,i32,i32,i32,i64,x,x,x,x,f8,f16,bf16,f32,f64
+**i64**,i64,i64,i64,i64,i64,x,x,x,x,f8,f16,bf16,f32,f64
+**u8**,u8,x,x,x,x,u8,u16,u32,u64,f8,f16,bf16,f32,f64
+**u16**,u16,x,x,x,x,u16,u16,u32,u64,f8,f16,bf16,f32,f64
+**u32**,u32,x,x,x,x,u32,u32,u32,u64,f8,f16,bf16,f32,f64
+**u64**,u64,x,x,x,x,u64,u64,u64,u64,f8,f16,bf16,f32,f64
+**f8**,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f16,bf16,f32,f64
+**f16**,f16,f16,f16,f16,f16,f16,f16,f16,f16,f16,f16,f32,f32,f64
+**bf16**,bf16,bf16,bf16,bf16,bf16,bf16,bf16,bf16,bf16,bf16,f32,bf16,f32,f64
+**f32**,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f32,f64
+**f64**,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64

docs/index.rst (+4 -2)

@@ -4,18 +4,20 @@
    :caption: Contents
 
    Kernel Float <self>
+   guides
    api
    license
    Github repository <https://github.com/KernelTuner/kernel_float>
 
 
-.. mdinclude:: ../README.md
+.. include:: ../README.md
+   :parser: myst_parser.sphinx_
 
 
 
 
 Indices and tables
-============
+==================
 
 * :ref:`genindex`
 * :ref:`modindex`

include/kernel_float/fp8.h (+40, new file)

@@ -0,0 +1,40 @@
+#ifndef KERNEL_FLOAT_FP8_H
+#define KERNEL_FLOAT_FP8_H
+
+#include "macros.h"
+
+#if KERNEL_FLOAT_FP8_AVAILABLE
+#include <cuda_fp8.h>
+
+#include "vector.h"
+
+namespace kernel_float {
+KERNEL_FLOAT_DEFINE_PROMOTED_FLOAT(__nv_fp8_e4m3)
+KERNEL_FLOAT_DEFINE_PROMOTED_TYPE(float, __nv_fp8_e4m3)
+KERNEL_FLOAT_DEFINE_PROMOTED_TYPE(double, __nv_fp8_e4m3)
+
+KERNEL_FLOAT_DEFINE_PROMOTED_FLOAT(__nv_fp8_e5m2)
+KERNEL_FLOAT_DEFINE_PROMOTED_TYPE(float, __nv_fp8_e5m2)
+KERNEL_FLOAT_DEFINE_PROMOTED_TYPE(double, __nv_fp8_e5m2)
+}  // namespace kernel_float
+
+#if KERNEL_FLOAT_FP16_AVAILABLE
+#include "fp16.h"
+
+namespace kernel_float {
+KERNEL_FLOAT_DEFINE_PROMOTED_TYPE(__half, __nv_fp8_e4m3)
+KERNEL_FLOAT_DEFINE_PROMOTED_TYPE(__half, __nv_fp8_e5m2)
+}  // namespace kernel_float
+#endif  // KERNEL_FLOAT_FP16_AVAILABLE
+
+#if KERNEL_FLOAT_BF16_AVAILABLE
+#include "bf16.h"
+
+namespace kernel_float {
+KERNEL_FLOAT_DEFINE_PROMOTED_TYPE(__nv_bfloat16, __nv_fp8_e4m3)
+KERNEL_FLOAT_DEFINE_PROMOTED_TYPE(__nv_bfloat16, __nv_fp8_e5m2)
+}  // namespace kernel_float
+#endif  // KERNEL_FLOAT_BF16_AVAILABLE
+
+#endif  // KERNEL_FLOAT_FP8_AVAILABLE
+#endif  // KERNEL_FLOAT_FP8_H

include/kernel_float/macros.h (+4)

@@ -32,8 +32,12 @@
 #endif
 
 #ifndef KERNEL_FLOAT_FP8_AVAILABLE
+#ifdef __CUDACC_VER_MAJOR__
+#define KERNEL_FLOAT_FP8_AVAILABLE (__CUDACC_VER_MAJOR__ >= 12)
+#else
 #define KERNEL_FLOAT_FP8_AVAILABLE (0)
 #endif
+#endif
 
 #define KERNEL_FLOAT_ASSERT(expr) \
     do { \

include/kernel_float/prelude.h (+15 -3)

@@ -4,6 +4,7 @@
 #include "bf16.h"
 #include "constant.h"
 #include "fp16.h"
+#include "fp8.h"
 #include "vector.h"
 
 namespace kernel_float {

@@ -66,18 +67,29 @@ KERNEL_FLOAT_TYPE_ALIAS(float16x, __half)
 #endif
 
 #if KERNEL_FLOAT_BF16_AVAILABLE
-KERNEL_FLOAT_TYPE_ALIAS(bfloat16, __nv_bfloat16)
-KERNEL_FLOAT_TYPE_ALIAS(bf16, __nv_bfloat16)
+KERNEL_FLOAT_TYPE_ALIAS(bfloat16x, __nv_bfloat16)
+KERNEL_FLOAT_TYPE_ALIAS(bf16x, __nv_bfloat16)
+#endif
+
+#if KERNEL_FLOAT_FP8_AVAILABLE
+KERNEL_FLOAT_TYPE_ALIAS(float8x, __nv_fp8_e4m3)
+KERNEL_FLOAT_TYPE_ALIAS(float8_e4m3x, __nv_fp8_e4m3)
+KERNEL_FLOAT_TYPE_ALIAS(float8_e5m2x, __nv_fp8_e5m2)
 #endif
 
 template<size_t N>
 static constexpr extent<N> kextent = {};
 
 template<typename... Args>
 KERNEL_FLOAT_INLINE kvec<promote_t<Args...>, sizeof...(Args)> make_kvec(Args&&... args) {
-    return make_vec(std::forward<Args>(args)...);
+    return ::kernel_float::make_vec(std::forward<Args>(args)...);
 };
 
+template<typename V>
+KERNEL_FLOAT_INLINE into_vector_type<V> into_kvec(V&& input) {
+    return ::kernel_float::into_vec(std::forward<V>(input));
+}
+
 template<typename T = double>
 using kconstant = constant<T>;
 
include/kernel_float/vector.h (+1 -1)

@@ -279,7 +279,7 @@ struct vector: public S {
  * - For vector-like types (e.g., `int2`, `dim3`), it returns `vec<T, N>`.
  */
 template<typename V>
-KERNEL_FLOAT_INLINE into_vector_type<V> into_vector(V&& input) {
+KERNEL_FLOAT_INLINE into_vector_type<V> into_vec(V&& input) {
     return into_vector_impl<V>::call(std::forward<V>(input));
 }
 
