ILGPU provides an interface for programming GPUs that uses a sane programming language, C#.
ILGPU takes your normal C# code (perhaps with a few small changes) and transforms it into either
OpenCL or PTX (think CUDA assembly). This combines all the power, flexibility, and performance of
CUDA / OpenCL with the ease of use of C#.
# Setting up ILGPU.
This tutorial is a little different now because we are going to be looking at ILGPU 1.0.0.
ILGPU should work on any 64-bit platform that .NET supports. I have even used it on the inexpensive Nvidia Jetson Nano
with pretty decent CUDA performance.

Technically ILGPU supports F#, but I don't use F# enough to really tutorialize it. I will be sticking to C# in these
tutorials.
### High-level setup steps.
If enough people care, I can record a short video of this process, but I expect this will be enough for most programmers.
1. Install the most recent [.Net SDK](https://dotnet.microsoft.com/download/visual-studio-sdks) for your chosen
platform.
2. Create a new C# project.
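For a concrete starting point, the steps above might look like this from the command line. This assumes the `dotnet` CLI that ships with the SDK; the project name `ILGPUTutorial` is just an example, and the package on NuGet is simply called `ILGPU`:

```shell
# Step 1's SDK provides the dotnet CLI; create a console project and add ILGPU.
dotnet new console -o ILGPUTutorial
cd ILGPUTutorial
dotnet add package ILGPU
```

From here, `dotnet run` builds and runs the project as usual.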
The following is my understanding of the performance quirks with GPUs due to memory, cache, and coalesced memory
access.
Just like with Primer 01, if you have a decent understanding of CUDA or OpenCL you can skip this.
Ok, buckle up.
## Memory and bandwidth and threads. Oh my!
### Computers need memory, and memory is slow<sup>0</sup>. (Like, really slow)
Back in the day (I assume; the first computer I remember using had DDR-200) computer memory
was FAST. Most of the time the limiting factor was the CPU, though correctly timing video output was also
a driving force. As an example, the C64 ran the memory at 2x the CPU frequency so the VIC-II
graphics chip could share the CPU memory by stealing half the cycles. In the almost 40 years since the C64, humanity
has gotten much better at making silicon and precious metals do our bidding. Feeding
data into the CPU from memory has become the slow part. Memory is slow.
Why is memory slow? To be honest, it seems to me that it's caused by two things:
1. Physics<br/>
Programmers like to think of computers as an abstract thing, a platonic ideal.
But here in the real world there are no spherical cows, no free lunch. Memory values are ACTUAL
ELECTRONS traveling through silicon and precious metals.
In general, the farther the ACTUAL ELECTRONS are from the thing doing the math, the slower they are
to access.
2. We ~~need~~ want a lot of memory.<br/>
We can make memory that is almost as fast as our processors, but it must literally be made directly into the
processor cores in silicon.
Not only is this very expensive, the more memory in silicon the less room for processor stuff.
### How do processors deal with slow memory?
This leads to an optimization problem. Modern processor designers use a complex system of tiered
memory consisting of several layers of small, fast, on-die memory and large, slow, distant, off-die memory.
A processor can also perform a few tricks to help us deal with the fact that memory is slow.
One example is prefetching. If a program uses memory at location X, it probably will use the
memory at location X+1; therefore the processor *prefetches* a whole chunk of memory and puts it in
the cache, closer to the processor. This way if you do need the memory at X+1 it is already in cache.
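To see the effect of prefetching (or rather, the cost of defeating it), here is a minimal C# sketch, not part of the tutorial's own code, that sums the same array once sequentially and once with a large stride. The class and method names and the array size are my own illustration:

```csharp
using System;
using System.Diagnostics;

public class CacheDemo
{
    const int N = 1 << 24; // ~16M ints, far larger than any CPU cache

    public static long SumWithStride(int[] data, int stride)
    {
        long sum = 0;
        // Walk the array once per starting offset so the same number of
        // elements is touched regardless of stride; only the order changes.
        for (int start = 0; start < stride; start++)
            for (int i = start; i < data.Length; i += stride)
                sum += data[i];
        return sum;
    }

    public static void Main()
    {
        var data = new int[N];
        for (int i = 0; i < N; i++) data[i] = 1;

        foreach (int stride in new[] { 1, 16 })
        {
            var sw = Stopwatch.StartNew();
            long sum = SumWithStride(data, stride);
            sw.Stop();
            Console.WriteLine($"stride {stride,2}: sum={sum} in {sw.ElapsedMilliseconds} ms");
        }
        // Both passes compute the same sum, but the strided pass typically runs
        // noticeably slower: each access lands on a different cache line, so
        // the prefetched neighbors mostly go unused.
    }
}
```

Exact timings depend on your CPU and cache sizes, but the stride-1 pass is the one the prefetcher was built for.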
I am getting off topic. For a more detailed explanation, see this thing I found
on [google](https://formulusblack.com/blog/compute-performance-distance-of-data-as-a-measure-of-latency/).
# What does this mean for ILGPU?
#### GPUs have memory, and memory is slow.
GPUs on paper have TONS of memory bandwidth; my GPU has around 10x the memory bandwidth my CPU does. Right? Yeah...
###### Kinda
If we go back into spherical cow territory and ignore a ton of important details, we can illustrate an
important quirk in GPU design that directly impacts performance.
My CPU, a Ryzen 5 3600 with dual channel DDR4, gets around 34 GB/s of memory bandwidth. The GDDR6 in my GPU, a RTX 2060,
gets around 336 GB/s of memory bandwidth.