Skip to content

Commit 984c4ce

Browse files
committedMay 3, 2022
Merge branch 'development' of github.com:gjbex/Python-for-HPC into development
2 parents 3c16f0f + aec988b commit 984c4ce

13 files changed

+5894
-943
lines changed
 

‎python_for_hpc.pptx

131 KB
Binary file not shown.

‎source-code/README.md

+6
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,7 @@
33
This is source code that is either used in the presentation, or was developed
44
to create it. There is some material not covered in the presentation as well.
55

6+
67
## Requirements
78

89
* Python version: at least 3.6
@@ -18,6 +19,10 @@ to create it. There is some material not covered in the presentation as well.
1819
* jupyter
1920
* ipywidgets
2021

22+
* For the GPU code:
23+
* pycuda
24+
* scikit-cuda
25+
2126

2227
## What is it?
2328

@@ -39,3 +44,4 @@ to create it. There is some material not covered in the presentation as well.
3944
1. `pypy`: code to experiment with the Pypy interpreter.
4045
1. `file-formats`: influcence of file formats on performance.
4146
1. `gpu`: some examples of using GPUs.
47+
1. `performance`: general considerations about performance.

‎source-code/gpu/README.md

+14
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
# GPU
2+
3+
Sample code for performing computations on a GPU.
4+
5+
6+
## What is it?
7+
8+
1. `pycuda.ipynb`: jupyter notebook illustrating pyCUDA.
9+
1. `curand.ipynb`: jupyter notebook illustrating generating random
10+
numbers on a GPU.
11+
1. `scikit_cuda.ipynb`: jupyter notebook illustrating linear algebra
12+
on a GPU device.
13+
1. `numba.ipynb`: jupyter notebook illustrating using numba for
14+
GPU computing.

‎source-code/gpu/curand.ipynb

+199
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,199 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "2a4181f6-f103-4025-a45c-5bd31bbc3d12",
6+
"metadata": {},
7+
"source": [
8+
"# Requirements"
9+
]
10+
},
11+
{
12+
"cell_type": "code",
13+
"execution_count": 1,
14+
"id": "eac06215-e0dd-430b-a157-a5aab4904b65",
15+
"metadata": {},
16+
"outputs": [],
17+
"source": [
18+
"import math\n",
19+
"import numpy as np\n",
20+
"import pycuda.autoinit\n",
21+
"from pycuda import gpuarray\n",
22+
"from pycuda.compiler import SourceModule"
23+
]
24+
},
25+
{
26+
"cell_type": "markdown",
27+
"id": "5832d66d-0446-499c-a17e-ffcac9bdbd0a",
28+
"metadata": {},
29+
"source": [
30+
"We determine $\\pi$ as the ratio between a circle of radius 1 and the square that circumscribes it. The area of the circle will be approximated by the number of randomly selected points that fall into it, compared to the (larger) number of points that fall into the square.\n",
31+
"\n",
32+
"If we choose $x$ and $y$ independently from a uniform distribution $[0, 1[$, then $(x, y)$ represents a point and lies in a circle with radius 1 if $x^2 + y^2 \\le 1$. Since this is only one quarter of the circle and circumscribing square, we get $\\pi$ by dividing the number of points in the circle by the total number of points, and multiplying by 4."
33+
]
34+
},
35+
{
36+
"cell_type": "markdown",
37+
"id": "32ca5892-664f-4317-a41d-b4f7a4d3f156",
38+
"metadata": {},
39+
"source": [
40+
"# Implementation"
41+
]
42+
},
43+
{
44+
"cell_type": "markdown",
45+
"id": "2bc4dbd6-5018-427d-9ee9-347bfe6cd9d9",
46+
"metadata": {},
47+
"source": [
48+
"To generate random numbers for each thread on the GPU, we use the curand library, which is a C++ library. Since we use plain C for our CUDA code, we have to make sure that the header file is read ouside of an extermal C block. The `SourceModule` add this automatically by default, so we make sure this isn't done by specifying the appropriate option. In the source code, we implement our kernel in an external C blcok.\n",
49+
"\n",
50+
"The random number generator is intialized using the `curand_init` function that takes a seed as its first argument. We ensure that it is unique for each thread by adding a thread-specific constant to the clock time.\n",
51+
"\n",
52+
"Random numbers are sampled from a uniform distribution using the `curand_uniform` function."
53+
]
54+
},
55+
{
56+
"cell_type": "code",
57+
"execution_count": 48,
58+
"id": "122a78d1-9d07-4484-8f30-a01ed58a9715",
59+
"metadata": {},
60+
"outputs": [],
61+
"source": [
62+
"source_code = '''\n",
63+
" #include <curand_kernel.h>\n",
64+
" \n",
65+
" typedef unsigned long long cu_long;\n",
66+
" \n",
67+
" extern \"C\" {\n",
68+
" __global__ void estimate_pi(cu_long nr_tries, cu_long *nr_hits) {\n",
69+
" curandState rand_state;\n",
70+
" int thread_id = blockIdx.x*blockDim.x + threadIdx.x;\n",
71+
" curand_init((cu_long) clock() + (cu_long) thread_id,\n",
72+
" (cu_long) 0, (cu_long) 0, &rand_state);\n",
73+
" float x, y;\n",
74+
" for (cu_long i = 0; i < nr_tries; ++i) {\n",
75+
" x = curand_uniform(&rand_state);\n",
76+
" y = curand_uniform(&rand_state);\n",
77+
" if (x*x + y*y < 1.0f) {\n",
78+
" nr_hits[thread_id]++;\n",
79+
" }\n",
80+
" }\n",
81+
" }\n",
82+
" }\n",
83+
"'''\n",
84+
"\n",
85+
"kernels = SourceModule(no_extern_c=True, source=source_code)\n",
86+
"pi_kernel = kernels.get_function('estimate_pi')"
87+
]
88+
},
89+
{
90+
"cell_type": "markdown",
91+
"id": "e54d0d19-9545-46b6-b79f-5a3a110e7a66",
92+
"metadata": {},
93+
"source": [
94+
"Set the number of threads per block and the number of blocks per grid, and create an array of the appropriate size to store the counts for each thread. Also specify the number of points to try"
95+
]
96+
},
97+
{
98+
"cell_type": "code",
99+
"execution_count": 44,
100+
"id": "71134a08-5195-4205-ae1c-51c7298704e7",
101+
"metadata": {},
102+
"outputs": [],
103+
"source": [
104+
"threads_per_block = 32\n",
105+
"blocks_per_grid = 512\n",
106+
"total_threads = threads_per_block*blocks_per_grid\n",
107+
"nr_hits = gpuarray.zeros((total_threads, ), dtype=np.uint64)\n",
108+
"nr_tries = np.uint64(2**24)"
109+
]
110+
},
111+
{
112+
"cell_type": "markdown",
113+
"id": "84c1a177-7a4f-4b27-9a11-4b318601ac4d",
114+
"metadata": {},
115+
"source": [
116+
"Now we can execute the kernel and compute the value of $\\pi$."
117+
]
118+
},
119+
{
120+
"cell_type": "code",
121+
"execution_count": 45,
122+
"id": "a872e28c-9256-4d50-b46c-f8381131eda1",
123+
"metadata": {},
124+
"outputs": [],
125+
"source": [
126+
"pi_kernel(nr_tries, nr_hits, grid=(blocks_per_grid, 1, 1), block=(threads_per_block, 1, 1))"
127+
]
128+
},
129+
{
130+
"cell_type": "code",
131+
"execution_count": 46,
132+
"id": "53ddbb43-0f22-427e-83a6-c6d0264e958e",
133+
"metadata": {},
134+
"outputs": [],
135+
"source": [
136+
"pi_computed = 4.0*np.sum(nr_hits.get())/(nr_tries*total_threads)"
137+
]
138+
},
139+
{
140+
"cell_type": "markdown",
141+
"id": "19bbb167-50b8-4bdc-8c92-1fb2f69d1e1b",
142+
"metadata": {},
143+
"source": [
144+
"Checking the accuracy as comared with $\\pi$'s true value shows that it is correct upto a millionth."
145+
]
146+
},
147+
{
148+
"cell_type": "code",
149+
"execution_count": 47,
150+
"id": "f86f0258-64c8-44a0-b8b3-5a6f5d24e362",
151+
"metadata": {},
152+
"outputs": [
153+
{
154+
"name": "stdout",
155+
"output_type": "stream",
156+
"text": [
157+
"1.0e-01 True\n",
158+
"1.0e-02 True\n",
159+
"1.0e-03 True\n",
160+
"1.0e-04 True\n",
161+
"1.0e-05 True\n",
162+
"1.0e-06 True\n",
163+
"1.0e-07 False\n",
164+
"1.0e-08 False\n",
165+
"1.0e-09 False\n",
166+
"1.0e-10 False\n",
167+
"1.0e-11 False\n",
168+
"1.0e-12 False\n"
169+
]
170+
}
171+
],
172+
"source": [
173+
"for tol in np.logspace(-1, -12, num=12):\n",
174+
" print(f'{tol:.1e} {math.isclose(pi_computed, math.pi, rel_tol=tol)}')"
175+
]
176+
}
177+
],
178+
"metadata": {
179+
"kernelspec": {
180+
"display_name": "Python 3 (ipykernel)",
181+
"language": "python",
182+
"name": "python3"
183+
},
184+
"language_info": {
185+
"codemirror_mode": {
186+
"name": "ipython",
187+
"version": 3
188+
},
189+
"file_extension": ".py",
190+
"mimetype": "text/x-python",
191+
"name": "python",
192+
"nbconvert_exporter": "python",
193+
"pygments_lexer": "ipython3",
194+
"version": "3.9.7"
195+
}
196+
},
197+
"nbformat": 4,
198+
"nbformat_minor": 5
199+
}

‎source-code/gpu/numba.ipynb

+1,583
Large diffs are not rendered by default.

‎source-code/gpu/pycuda.ipynb

100755100644
+1,112-941
Large diffs are not rendered by default.

‎source-code/gpu/scikit_cuda.ipynb

+877
Large diffs are not rendered by default.

‎source-code/mpi4py/halo.py

+2-2
Original file line numberDiff line numberDiff line change
@@ -65,9 +65,9 @@ def halos_to_str(left, upper, right, lower):
6565
print(f'rank {rank}, left = {left}, up = {up}, right = {right}, down = {down}')
6666
# note that the column data is stored non-contiguously, and has to be copied
6767
# before sending it
68-
send_buffer = np.array(matrix[:, 0])
68+
send_buffer = np.array(matrix[:, 0]).copy()
6969
cart_comm.Sendrecv(send_buffer, left, recvbuf=right_halo, source=right)
70-
send_buffer = np.array(matrix[:, -1])
70+
send_buffer = np.array(matrix[:, -1]).copy()
7171
cart_comm.Sendrecv(send_buffer, right, recvbuf=left_halo, source=left)
7272
# row data is contiguous and can be used as a sendbuffer without copying
7373
cart_comm.Sendrecv(matrix[0, :], up, recvbuf=lower_halo, source=down)

‎source-code/numba/README.md

+3
Original file line numberDiff line numberDiff line change
@@ -12,3 +12,6 @@ can be obtained without much effort.
1212
1. `Primes`: code to compute the first n prime numbers comparing a pure Python
1313
implementation with numba JIT and eager JIT.
1414
1. `Ufunc`: defining a numpy ufunc using numba.
15+
1. `numba_parallel.ipynb`: jupyter notebook experimenting with numba's
16+
parallel capabilities.
17+
1. `computing_pi.ipynb`: jupyter notebook illustrating speedup by numba.

‎source-code/numba/computing_pi.ipynb

+176
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,176 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "dad1f0e2-bc13-4d67-b40e-e5f8946e7e6f",
6+
"metadata": {},
7+
"source": [
8+
"# Requirements"
9+
]
10+
},
11+
{
12+
"cell_type": "code",
13+
"execution_count": 1,
14+
"id": "694413b5-380c-4239-8bce-09b90df7fe79",
15+
"metadata": {},
16+
"outputs": [],
17+
"source": [
18+
"from numba import njit\n",
19+
"import numpy as np\n",
20+
"import random"
21+
]
22+
},
23+
{
24+
"cell_type": "markdown",
25+
"id": "aee5369d-7b69-4fb4-8567-52bd8e92571b",
26+
"metadata": {},
27+
"source": [
28+
"# Random $\\pi$"
29+
]
30+
},
31+
{
32+
"cell_type": "markdown",
33+
"id": "70202ae4-ad82-4c91-aea0-a3aeccfb7bdc",
34+
"metadata": {},
35+
"source": [
36+
"Compute $\\pi$ by generating random points in a square and counting how many there are in the circle inscribed in the square."
37+
]
38+
},
39+
{
40+
"cell_type": "code",
41+
"execution_count": 5,
42+
"id": "b5f095c0-58f6-4098-829c-6e696ae2a2bd",
43+
"metadata": {},
44+
"outputs": [],
45+
"source": [
46+
"def compute_pi(nr_tries):\n",
47+
" hits = 0\n",
48+
" for _ in range(nr_tries):\n",
49+
" x = random.random()\n",
50+
" y = random.random()\n",
51+
" if x**2 + y**2 < 1.0:\n",
52+
" hits += 1\n",
53+
" return 4.0*hits/nr_tries"
54+
]
55+
},
56+
{
57+
"cell_type": "code",
58+
"execution_count": 6,
59+
"id": "805f4c9f-5d19-486a-988e-bf103683c37c",
60+
"metadata": {},
61+
"outputs": [],
62+
"source": [
63+
"@njit\n",
64+
"def compute_pi_jit(nr_tries):\n",
65+
" hits = 0\n",
66+
" for _ in range(nr_tries):\n",
67+
" x = random.random()\n",
68+
" y = random.random()\n",
69+
" if x**2 + y**2 < 1.0:\n",
70+
" hits += 1\n",
71+
" return 4.0*hits/nr_tries"
72+
]
73+
},
74+
{
75+
"cell_type": "code",
76+
"execution_count": 32,
77+
"id": "f7a7bb7e-6ad1-4b6d-bb5b-d99ebedf7991",
78+
"metadata": {},
79+
"outputs": [],
80+
"source": [
81+
"@njit(['float64(int64)'])\n",
82+
"def compute_pi_jit_sign(nr_tries):\n",
83+
" hits = 0\n",
84+
" for _ in range(nr_tries):\n",
85+
" x = random.random()\n",
86+
" y = random.random()\n",
87+
" if x**2 + y**2 < 1.0:\n",
88+
" hits += 1\n",
89+
" return 4.0*hits/nr_tries"
90+
]
91+
},
92+
{
93+
"cell_type": "code",
94+
"execution_count": 9,
95+
"id": "13f3c23d-674e-43b2-b503-a83c20cf5075",
96+
"metadata": {},
97+
"outputs": [
98+
{
99+
"name": "stdout",
100+
"output_type": "stream",
101+
"text": [
102+
"27.1 ms ± 277 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
103+
]
104+
}
105+
],
106+
"source": [
107+
"%timeit compute_pi(100_000)"
108+
]
109+
},
110+
{
111+
"cell_type": "code",
112+
"execution_count": 10,
113+
"id": "de965fa5-b3e3-4548-8d41-661baf6abe65",
114+
"metadata": {},
115+
"outputs": [
116+
{
117+
"name": "stdout",
118+
"output_type": "stream",
119+
"text": [
120+
"687 µs ± 9.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
121+
]
122+
}
123+
],
124+
"source": [
125+
"%timeit compute_pi_jit(100_000)"
126+
]
127+
},
128+
{
129+
"cell_type": "code",
130+
"execution_count": 34,
131+
"id": "f240d35d-2fdb-45db-9e59-d392887c9a16",
132+
"metadata": {},
133+
"outputs": [
134+
{
135+
"name": "stdout",
136+
"output_type": "stream",
137+
"text": [
138+
"685 µs ± 8.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
139+
]
140+
}
141+
],
142+
"source": [
143+
"%timeit compute_pi_jit_sign(np.int64(100_000))"
144+
]
145+
},
146+
{
147+
"cell_type": "markdown",
148+
"id": "da96c2f7-afc2-4122-ad80-62d7c57272e9",
149+
"metadata": {},
150+
"source": [
151+
"Using numba's just-in-time compiler significantly speeds up the computations."
152+
]
153+
}
154+
],
155+
"metadata": {
156+
"kernelspec": {
157+
"display_name": "Python 3 (ipykernel)",
158+
"language": "python",
159+
"name": "python3"
160+
},
161+
"language_info": {
162+
"codemirror_mode": {
163+
"name": "ipython",
164+
"version": 3
165+
},
166+
"file_extension": ".py",
167+
"mimetype": "text/x-python",
168+
"name": "python",
169+
"nbconvert_exporter": "python",
170+
"pygments_lexer": "ipython3",
171+
"version": "3.9.7"
172+
}
173+
},
174+
"nbformat": 4,
175+
"nbformat_minor": 5
176+
}

‎source-code/numba/numba_parallel.ipynb

+438
Large diffs are not rendered by default.

‎source-code/performance/README.md

+10
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
# Performance
2+
3+
These are some resources about general performance considerations.
4+
5+
6+
## What is it?
7+
8+
1. `number_puzzle.ipynb`: jupyter notebbook solving a number puzzle
9+
in a variety of ways, shwoing some aspects of performance
10+
optimization.

‎source-code/performance/number_puzzle.ipynb

+1,474
Large diffs are not rendered by default.

0 commit comments

Comments
 (0)
Please sign in to comment.