Commit 34bf54c

slides and videos
1 parent d4f012e commit 34bf54c

22 files changed

+2633
-0
lines changed

slides/2_26/11-Convolution.key

42.4 MB
Binary file not shown.

slides/2_26/11-Convolution.pdf

25.7 MB
Binary file not shown.

slides/2_26/channels.ipynb

+295
@@ -0,0 +1,295 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {
6+
"slideshow": {
7+
"slide_type": "slide"
8+
}
9+
},
10+
"source": [
11+
"# Multiple Input and Output Channels\n",
12+
"\n",
13+
"**Multiple Input Channels**\n",
14+
"\n",
15+
"![Cross-correlation with 2 input channels](http://www.d2l.ai/_images/conv_multi_in.svg)"
16+
]
17+
},
18+
{
19+
"cell_type": "code",
20+
"execution_count": 1,
21+
"metadata": {
22+
"attributes": {
23+
"classes": [],
24+
"id": "",
25+
"n": "1"
26+
}
27+
},
28+
"outputs": [],
29+
"source": [
30+
"import d2l\n",
31+
"from mxnet import nd\n",
32+
"\n",
33+
"def corr2d_multi_in(X, K):\n",
34+
" # First, traverse along the 0th dimension (channel dimension) of X and K. \n",
35+
" # Then, sum the per-channel results with nd.add_n (* unpacks the list into arguments)\n",
36+
" return nd.add_n(*[d2l.corr2d(x, k) for x, k in zip(X, K)])"
37+
]
38+
},
39+
{
40+
"cell_type": "markdown",
41+
"metadata": {
42+
"slideshow": {
43+
"slide_type": "slide"
44+
}
45+
},
46+
"source": [
47+
"We can construct the input array `X` and the kernel array `K` of the above diagram to validate the output of the cross-correlation operation."
48+
]
49+
},
50+
{
51+
"cell_type": "code",
52+
"execution_count": 2,
53+
"metadata": {
54+
"attributes": {
55+
"classes": [],
56+
"id": "",
57+
"n": "2"
58+
}
59+
},
60+
"outputs": [
61+
{
62+
"data": {
63+
"text/plain": [
64+
"\n",
65+
"[[ 56. 72.]\n",
66+
" [104. 120.]]\n",
67+
"<NDArray 2x2 @cpu(0)>"
68+
]
69+
},
70+
"execution_count": 2,
71+
"metadata": {},
72+
"output_type": "execute_result"
73+
}
74+
],
75+
"source": [
76+
"X = nd.array([[[0, 1, 2], [3, 4, 5], [6, 7, 8]],\n",
77+
" [[1, 2, 3], [4, 5, 6], [7, 8, 9]]])\n",
78+
"K = nd.array([[[0, 1], [2, 3]], [[1, 2], [3, 4]]])\n",
79+
"\n",
80+
"corr2d_multi_in(X, K)"
81+
]
82+
},
83+
{
84+
"cell_type": "markdown",
85+
"metadata": {
86+
"slideshow": {
87+
"slide_type": "slide"
88+
}
89+
},
90+
"source": [
91+
"**Multiple Output Channels**\n",
92+
"\n",
93+
"For multiple output channels, we compute one multi-input cross-correlation per output kernel and stack the results along a new leading dimension."
94+
]
95+
},
96+
{
97+
"cell_type": "code",
98+
"execution_count": 3,
99+
"metadata": {
100+
"attributes": {
101+
"classes": [],
102+
"id": "",
103+
"n": "3"
104+
}
105+
},
106+
"outputs": [],
107+
"source": [
108+
"def corr2d_multi_in_out(X, K):\n",
109+
" # Traverse along the 0th dimension of K, and each time, perform cross-correlation \n",
110+
" # operations with input X. All of the results are merged together using the stack function.\n",
111+
" return nd.stack(*[corr2d_multi_in(X, k) for k in K])"
112+
]
113+
},
114+
{
115+
"cell_type": "markdown",
116+
"metadata": {},
117+
"source": [
118+
"We construct a convolution kernel with 3 output channels by stacking the kernel array `K` with `K+1` (plus one for each element in `K`) and `K+2`."
119+
]
120+
},
121+
{
122+
"cell_type": "code",
123+
"execution_count": 4,
124+
"metadata": {
125+
"attributes": {
126+
"classes": [],
127+
"id": "",
128+
"n": "4"
129+
}
130+
},
131+
"outputs": [
132+
{
133+
"data": {
134+
"text/plain": [
135+
"(3, 2, 2, 2)"
136+
]
137+
},
138+
"execution_count": 4,
139+
"metadata": {},
140+
"output_type": "execute_result"
141+
}
142+
],
143+
"source": [
144+
"K = nd.stack(K, K + 1, K + 2)\n",
145+
"K.shape"
146+
]
147+
},
148+
{
149+
"cell_type": "markdown",
150+
"metadata": {
151+
"slideshow": {
152+
"slide_type": "slide"
153+
}
154+
},
155+
"source": [
156+
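"We can now apply cross-correlation to the 2-channel input `X` with the 3-output-channel kernel `K`. The first output channel matches the single-output result above."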
157+
]
158+
},
159+
{
160+
"cell_type": "code",
161+
"execution_count": 5,
162+
"metadata": {
163+
"attributes": {
164+
"classes": [],
165+
"id": "",
166+
"n": "5"
167+
}
168+
},
169+
"outputs": [
170+
{
171+
"name": "stdout",
172+
"output_type": "stream",
173+
"text": [
174+
"(2, 3, 3)\n",
175+
"(3, 2, 2, 2)\n",
176+
"\n",
177+
"[[[ 56. 72.]\n",
178+
" [104. 120.]]\n",
179+
"\n",
180+
" [[ 76. 100.]\n",
181+
" [148. 172.]]\n",
182+
"\n",
183+
" [[ 96. 128.]\n",
184+
" [192. 224.]]]\n",
185+
"<NDArray 3x2x2 @cpu(0)>\n"
186+
]
187+
}
188+
],
189+
"source": [
190+
"print(X.shape)\n",
191+
"print(K.shape)\n",
192+
"print(corr2d_multi_in_out(X, K))"
193+
]
194+
},
195+
{
196+
"cell_type": "markdown",
197+
"metadata": {
198+
"slideshow": {
199+
"slide_type": "slide"
200+
}
201+
},
202+
"source": [
203+
"## $1\\times 1$ Convolutions\n",
204+
"\n",
205+
"![1x1 convolutions](http://www.d2l.ai/_images/conv_1x1.svg)"
206+
]
207+
},
208+
{
209+
"cell_type": "code",
210+
"execution_count": 6,
211+
"metadata": {
212+
"attributes": {
213+
"classes": [],
214+
"id": "",
215+
"n": "6"
216+
}
217+
},
218+
"outputs": [],
219+
"source": [
220+
"def corr2d_multi_in_out_1x1(X, K):\n",
221+
" c_i, h, w = X.shape\n",
222+
" c_o = K.shape[0]\n",
223+
" X = X.reshape((c_i, h * w))\n",
224+
" K = K.reshape((c_o, c_i))\n",
225+
" Y = nd.dot(K, X) # Matrix multiplication, as in a fully connected layer applied at every pixel.\n",
226+
" return Y.reshape((c_o, h, w))"
227+
]
228+
},
229+
{
230+
"cell_type": "markdown",
231+
"metadata": {
232+
"slideshow": {
233+
"slide_type": "slide"
234+
}
235+
},
236+
"source": [
237+
"This is equivalent to the general cross-correlation `corr2d_multi_in_out` with a $1\\times 1$ kernel; the two results agree up to floating-point error."
238+
]
239+
},
240+
{
241+
"cell_type": "code",
242+
"execution_count": 7,
243+
"metadata": {
244+
"attributes": {
245+
"classes": [],
246+
"id": "",
247+
"n": "7"
248+
}
249+
},
250+
"outputs": [
251+
{
252+
"data": {
253+
"text/plain": [
254+
"True"
255+
]
256+
},
257+
"execution_count": 7,
258+
"metadata": {},
259+
"output_type": "execute_result"
260+
}
261+
],
262+
"source": [
263+
"X = nd.random.uniform(shape=(3, 3, 3))\n",
264+
"K = nd.random.uniform(shape=(2, 3, 1, 1))\n",
265+
"\n",
266+
"Y1 = corr2d_multi_in_out_1x1(X, K)\n",
267+
"Y2 = corr2d_multi_in_out(X, K)\n",
268+
"\n",
269+
"(Y1 - Y2).norm().asscalar() < 1e-6"
270+
]
271+
}
272+
],
273+
"metadata": {
274+
"celltoolbar": "Slideshow",
275+
"kernelspec": {
276+
"display_name": "Python 3",
277+
"language": "python",
278+
"name": "python3"
279+
},
280+
"language_info": {
281+
"codemirror_mode": {
282+
"name": "ipython",
283+
"version": 3
284+
},
285+
"file_extension": ".py",
286+
"mimetype": "text/x-python",
287+
"name": "python",
288+
"nbconvert_exporter": "python",
289+
"pygments_lexer": "ipython3",
290+
"version": "3.7.2"
291+
}
292+
},
293+
"nbformat": 4,
294+
"nbformat_minor": 2
295+
}
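
Note on slides/2_26/channels.ipynb: the notebook relies on `d2l.corr2d`, which is imported rather than defined in this diff. The sketch below restates the notebook's three functions with NumPy standing in for MXNet's `nd`, so the logic can be checked without either package. The `corr2d` helper is an assumed plain-loop implementation, not the library's code, and the names `K3`, `X1`, `K1`, `Y_single`, `Y_multi`, and `max_diff` are illustrative only.

```python
import numpy as np

def corr2d(X, K):
    """2-D cross-correlation of a single-channel input X with kernel K (assumed helper)."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Elementwise product of the current window with the kernel, then sum.
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

def corr2d_multi_in(X, K):
    # Cross-correlate each input channel with its kernel slice, then sum over channels.
    return sum(corr2d(x, k) for x, k in zip(X, K))

def corr2d_multi_in_out(X, K):
    # One multi-input cross-correlation per output channel, stacked on a new axis 0.
    return np.stack([corr2d_multi_in(X, k) for k in K])

def corr2d_multi_in_out_1x1(X, K):
    # A 1x1 kernel reduces to a (c_o, c_i) matrix applied independently at every pixel.
    c_i, h, w = X.shape
    c_o = K.shape[0]
    Y = K.reshape(c_o, c_i) @ X.reshape(c_i, h * w)
    return Y.reshape(c_o, h, w)

# Same input and kernel values as the notebook's diagram example.
X = np.array([[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
              [[1, 2, 3], [4, 5, 6], [7, 8, 9]]], dtype=float)
K = np.array([[[0, 1], [2, 3]], [[1, 2], [3, 4]]], dtype=float)

Y_single = corr2d_multi_in(X, K)        # shape (2, 2)
K3 = np.stack([K, K + 1, K + 2])        # shape (3, 2, 2, 2)
Y_multi = corr2d_multi_in_out(X, K3)    # shape (3, 2, 2)

# 1x1 case: the matrix-multiply shortcut should agree with the general routine.
rng = np.random.default_rng(0)
X1 = rng.uniform(size=(3, 3, 3))
K1 = rng.uniform(size=(2, 3, 1, 1))
max_diff = np.abs(corr2d_multi_in_out_1x1(X1, K1) - corr2d_multi_in_out(X1, K1)).max()

print(Y_single)
print(Y_multi.shape, max_diff)
```

With these inputs, `Y_single` reproduces the `[[56, 72], [104, 120]]` result recorded in the notebook's cell output, and `Y_multi[0]` equals it because the first output kernel is the original `K`.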

slides/2_26/channels.pdf

101 KB
Binary file not shown.
