
Commit aa09cb4

Authored Jan 30, 2022

Add section on autodiff / MyGrad (#190)

* add autodiff section
* improve autodiff section

1 parent 13fa40e commit aa09cb4

File tree: 4 files changed, +448 −0 lines changed

Module3_IntroducingNumpy/AutoDiff.md (+444)
---
jupyter:
  jupytext:
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
      jupytext_version: 1.13.6
  kernelspec:
    display_name: Python 3
    language: python
    name: python3
---

<!-- #raw raw_mimetype="text/restructuredtext" -->
.. meta::
   :description: MyGrad is a library that provides drop-in automatic differentiation for NumPy
<!-- #endraw -->

<div class="alert alert-info">

**A Note to the Reader**:

This section requires some basic familiarity with Calculus; the reader will be expected to know what it means to take the derivative of a function, and to have some familiarity with [Leibniz notation](https://en.wikipedia.org/wiki/Leibniz%27s_notation) for representing derivatives.

</div>


# Automatic Differentiation

<!-- #region -->

(Full disclosure: I created MyGrad, which we will be discussing here. Like PLYMI, MyGrad is a completely free and open-source educational resource.)

This section is not about the essentials of NumPy; rather, it is about a third-party library, [MyGrad](https://github.com/rsokl/MyGrad), that adds a new capability to NumPy. It adds automatic differentiation: the ability to algorithmically evaluate derivatives of functions.

Automatic differentiation (a.k.a. autodiff) is an important technology for scientific computing and machine learning; it enables us to measure rates of change (or "cause and effect") through our code via the derivatives of the mathematical functions that our code computes.
Autodiff is proving to be so crucial to advancements in STEM-computing that it ought to be introduced to audiences early in their numerical computing journeys.
This is the motivation for including this section in PLYMI's NumPy module.

An automatic differentiation library provides its users with a suite of mathematical functions and tools that are specially designed: any mathematical computation that you perform with this library can be used to also compute the *derivatives* of that result.
To help paint a picture of this, consider the following pseudocode, where we use an autodiff library to compute $f(x) = \sqrt{x}$ evaluated at $x=1$, as well as the derivative $\mathrm{d}f/\mathrm{d}x = 1/(2\sqrt{x})$, also evaluated at $x=1$.

```python
# pseudocode illustrating autodiff in action
>>> from autodiff_library import sqrt, derivative

>>> x = 1.0
>>> f = sqrt(x)
>>> df_dx = derivative(f, x)

>>> f, df_dx
(1.0, 0.5)
```

See that we did not need to know or derive the fact that $\mathrm{d}f/\mathrm{d}x = 1/(2\sqrt{x})$ -- the autodiff library does this for us!
This is what sets `autodiff_library.sqrt` apart from `math.sqrt` from Python's standard library.


Presently, some of the most popular Python-centric autodiff libraries include [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/), and [JAX](https://jax.readthedocs.io/en/latest/jax.numpy.html). Among these "industrial-grade" autodiff libraries, JAX strives to provide the most NumPy-like experience. [MyGrad](https://github.com/rsokl/MyGrad) takes this one step further, and provides true drop-in automatic differentiation to NumPy.

<!-- #endregion -->


## Introduction to MyGrad

<!-- #region -->

Install MyGrad into your Python environment. Open your terminal, activate your desired Python environment, and run the following command.

```shell
pip install mygrad
```


Let's jump right in with a simple example of using MyGrad to evaluate the derivative of a function at a specific point.
We'll take our function to be $f(x)=x^2$, and compute its instantaneous slope at $x=5$, i.e. $\frac{\mathrm{d}f}{\mathrm{d}x}\big|_{x=5}$.
The derivative of this function is $\frac{\mathrm{d}f}{\mathrm{d}x}=2x$, thus $\frac{\mathrm{d}f}{\mathrm{d}x}\big|_{x=5} = 10$.
Let's reproduce this result via auto-differentiation using MyGrad.

We begin by creating a `mygrad.Tensor`.
This is MyGrad's analog to [numpy's ndarray](https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/IntroducingTheNDarray.html).
[MyGrad's tensor](https://mygrad.readthedocs.io/en/latest/tensor.html) behaves just like NumPy's array in just about every way that you can think of, e.g. it supports multi-dimensional indexing, reshaping, and vectorized operations with broadcasting semantics, but it is also capable of facilitating automatic differentiation.
This tensor will *store the point(s) at which we wish to evaluate our function and its derivative*.

```python
# `mygrad.Tensor` behaves like `numpy.array` but it supports auto-diff
>>> import mygrad as mg
>>> x = mg.tensor(5.0)
>>> x
Tensor(5.0)
```
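
For example, here is a quick sketch of the sort of NumPy-style manipulations that a tensor supports; the tensor `z` below is a throwaway example of ours (so that our `x` from above is left untouched), and is not part of the running walkthrough.

```python
# a small sketch: a tensor supports NumPy-style indexing,
# reshaping, and broadcasting
>>> z = mg.tensor([[1.0, 2.0, 3.0],
...                [4.0, 5.0, 6.0]])
>>> z[0, 2].item()         # multi-dimensional indexing
3.0
>>> z.reshape(3, 2).shape  # reshaping
(3, 2)
>>> (z * mg.tensor([-1.0, 0.0, 1.0])).shape  # a vectorized op with broadcasting
(2, 3)
```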

We can then pass this tensor directly into NumPy's mathematical functions.
In this example, our function is $f(x)=x^2$.
We can compute this just as we would with NumPy: either with `x ** 2` or with `numpy.square(x)`.

```python
# evaluating f(5)
>>> fx = x ** 2
>>> fx
Tensor(25.0)
```

`fx` stores the value of our function -- as a `Tensor` -- at the given evaluation points, which in this case is $f(5)=5^2=25$.

Now we can use MyGrad to evaluate the derivative of $f(x)$ at $x=5$.
Invoking `fx.backward()` instructs MyGrad to evaluate the derivative of `fx` *for each variable that* `fx` *depends on* -- the derivatives of multivariable functions can also be computed.
In this case, $x$ is the only such variable.

```python
# trigger auto-differentiation of `fx` with respect to
# all of the variables that it depends on
>>> fx.backward()
```

The value of $\frac{\mathrm{d}f}{\mathrm{d}x}\big|_{x=5}$ is stored in the attribute `x.grad`.

```python
# accessing df/dx @ x=5
>>> x.grad
array(10.)
```

As expected, MyGrad computes the appropriate value for the evaluated derivative: $\frac{\mathrm{d}f}{\mathrm{d}x}\big|_{x=5}=2 \times 5=10$.
Note that all `Tensor` instances have a `grad` attribute, but prior to invoking `fx.backward()`, `x.grad` would have simply returned `None`.

<!-- #endregion -->
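
For example, the following small check shows `grad` transitioning from `None` to an array once back-propagation has been triggered:

```python
# `grad` is None until `backward()` has been invoked
>>> x = mg.tensor(5.0)
>>> fx = x ** 2
>>> x.grad is None
True
>>> fx.backward()
>>> x.grad
array(10.)
```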

It is important to reiterate that MyGrad *never gives us the actual function* $\frac{\mathrm{d}f}{\mathrm{d}x}$; it only computes the derivative evaluated at a specific input ($x=5$ in our example).


### MyGrad Adds "Drop-In" AutoDiff to NumPy


<!-- #region -->

MyGrad's functions are intentionally designed to mirror NumPy's functions almost exactly.
In fact, for all of the NumPy functions that MyGrad mirrors, we can pass a tensor to a NumPy function and it will be "coerced" into returning a tensor instead of a NumPy array -- thus we can differentiate through NumPy functions!

```python
# showing off "drop-in" autodiff through NumPy functions
>>> import numpy as np

>>> x = mg.tensor(3.0)
>>> y = np.square(x)  # note that we are using a numpy function here!
>>> y  # y is a tensor, not a numpy array
Tensor(9.)

>>> y.backward()  # compute derivatives of y
>>> x.grad  # stores dy/dx @ x=3
array(6.)
```
<!-- #endregion -->

<!-- #region -->
How does this work?
MyGrad's tensor is able to [tell NumPy's function to actually call a MyGrad function](https://numpy.org/neps/nep-0018-array-function-protocol.html).
That is, the expression

```python
np.square(a_mygrad_tensor)
```

*actually* calls

```python
mg.square(a_mygrad_tensor)
```

under the hood.
Not only is this convenient, but it also means that you can take a complex function that is written in terms of numpy functions and pass a tensor through it so that you can differentiate that function!

```python
from some_library import complicated_numpy_function

x = mg.tensor(...)
out_tensor = complicated_numpy_function(x)
out_tensor.backward()  # compute d(complicated_numpy_function) / dx !
```
<!-- #endregion -->
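
To make this concrete, here is a small sketch of that idea in action; the `sigmoid` function below is just an illustrative stand-in of ours for "a function written purely in terms of NumPy operations".

```python
# differentiating a NumPy-only function by feeding it a Tensor
>>> def sigmoid(x):
...     # written with NumPy functions only -- no reference to MyGrad
...     return 1 / (1 + np.exp(-x))

>>> x = mg.tensor(0.0)
>>> out = sigmoid(x)  # returns a Tensor, because `x` is a Tensor
>>> out.backward()
>>> x.grad  # d(sigmoid)/dx @ x=0
array(0.25)
```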


## Vectorized Auto-Differentiation

<!-- #region -->
Like NumPy's array, MyGrad's tensor supports [vectorized operations](https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/VectorizedOperations.html), allowing us to evaluate the derivative of a function at multiple points simultaneously.
Let's again take the function $f(x)=x^2$, which has the derivative $\frac{\mathrm{d}f}{\mathrm{d}x}=2x$.
Now, instead of passing in a single number to `Tensor`, we can pass in a list of values corresponding to all the points at which we want to compute the derivative.
We can then find the instantaneous slope of our function at these points, just as before. First we will pass `x` into our function of interest, namely $f(x)=x^2$.

```python
# using vectorized operations to evaluate a function
# at multiple locations
>>> x = mg.tensor([2.0, -4.0, 1.0, 3.0])
>>> fx = x ** 2
>>> fx
Tensor([ 4., 16.,  1.,  9.])
```

Here MyGrad vectorizes the operation, performing it element-wise:

\begin{equation}
f\big([2,\, -4,\, 1,\, 3]\big) = \big[f(2),\, f(-4),\, f(1),\, f(3)\big].
\end{equation}

We can elegantly exploit this vectorization to find the derivative of $f(x)$ evaluated at each point in `x` by invoking `fx.backward()`.
This will trigger the vectorized computation

\begin{equation}
\bigg[\frac{\mathrm{d}f}{\mathrm{d}x}\bigg|_{x=2},\: \frac{\mathrm{d}f}{\mathrm{d}x}\bigg|_{x=-4},\: \frac{\mathrm{d}f}{\mathrm{d}x}\bigg|_{x=1},\: \frac{\mathrm{d}f}{\mathrm{d}x}\bigg|_{x=3} \bigg],
\end{equation}

which will be stored in `x.grad`.
It is important to recognize that `x.grad[i]` stores the derivative of `fx` evaluated at `x[i]`.

```python
# Trigger vectorized auto-differentiation
# Computes the instantaneous slope of
# f(x) = x ** 2 at 2, -4, 1, and 3
>>> fx.backward()
>>> x.grad  # df/dx @ x = 2, -4, 1, and 3, respectively
array([ 4., -8.,  2.,  6.])
```

As expected, MyGrad finds the appropriate value for the derivative evaluated at each respective element in `x`.
<!-- #endregion -->
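
As a quick sanity check, we can also compare this result against the hand-derived formula $\frac{\mathrm{d}f}{\mathrm{d}x}=2x$ evaluated at each of these points:

```python
# the gradient matches 2 * x at every evaluation point
>>> np.allclose(x.grad, 2 * x.data)
True
```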

<div class="alert alert-info">

## Visualizing the Derivative

The following code block demonstrates how easy it is to visualize a function's derivative by using MyGrad.
Note that MyGrad's `Tensor` stores a NumPy array of its data, which can be accessed via the `.data` attribute.
Any time a library needs to be passed a NumPy array, you can access this array from a tensor through this attribute.

Study the plot displayed below: notice that the derivative is always $0$ when the function has a horizontal slope, and that the derivative takes on a positive value wherever the parent function has a positive slope.

```python
import mygrad as mg
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


def f(x):
    return np.sin(2 * x) * np.cos(x) * np.exp(-x / 3) * 100


def plot_func_and_deriv(x, func):
    fig, ax = plt.subplots()

    x = mg.tensor(x)
    y = func(x)   # compute f(x)
    y.backward()  # compute df/dx

    # plot f(x) vs x
    ax.plot(x.data, y.data, label="f(x)")

    # plot df/dx vs x
    ax.plot(x.data, x.grad, ls="--", label="df/dx")
    ax.grid(True)
    ax.legend()
    return fig, ax


# We will plot f(x) and df/dx on the domain
# [0, 10] using 10,000 evenly-spaced points
x = mg.linspace(0, 10, 10000)

plot_func_and_deriv(x, f);
```

</div>


## Seek and Derive

<!-- #region -->
Computers equipped with automatic differentiation libraries can make short work of derivatives that are well beyond the reach of mere mortals.
Take the pathological function

\begin{equation}
f(x)=e^{\arctan(82x^3+\ln(x))}\sqrt{25x^{\frac{1}{22930}}+39e^{\frac{2}{x}}-\sin(x)},
\end{equation}

the derivative of which would be miserable to do by hand.
Thankfully we can have MyGrad compute the derivative at a collection of points for us, just as we did before.

```python
# Tensor containing the values x = 1, 2, ..., 10
>>> x = mg.arange(1.0, 11.0)

# Evaluated function at points x = 1, 2, ..., 10
>>> fx = np.exp(np.arctan(82 * x ** 3 + np.log(x)))
>>> fx *= np.sqrt(25 * x ** (1 / 22930) + 39 * np.exp(2 / x) - np.sin(x))

>>> fx.backward()

>>> x.grad  # df/dx evaluated at x = 1, 2, ..., 10
array([-7.44764313e+01, -1.09475963e+01, -3.78281290e+00, -1.86451297e+00,
       -1.29207692e+00, -1.07197583e+00, -7.90459238e-01, -3.96212428e-01,
       -8.16203127e-02, -3.17648949e-02])
```
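
If you want extra reassurance that these numbers are sensible, one simple option is to compare them against a crude central-difference estimate of the derivative, computed with plain NumPy arrays; the helper `f_numpy` below is just our own re-statement of the function above, and the two results should agree closely.

```python
# cross-checking the autodiff result against a numerical
# central-difference estimate (no MyGrad involved)
>>> def f_numpy(x):
...     out = np.exp(np.arctan(82 * x ** 3 + np.log(x)))
...     out *= np.sqrt(25 * x ** (1 / 22930) + 39 * np.exp(2 / x) - np.sin(x))
...     return out

>>> xs = np.arange(1.0, 11.0)
>>> h = 1e-6
>>> approx = (f_numpy(xs + h) - f_numpy(xs - h)) / (2 * h)  # ~ df/dx
>>> np.allclose(x.grad, approx, rtol=1e-4)
True
```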

Even though it would be a pain to differentiate $f(x)$ by hand, MyGrad can handle taking the derivative with no problems.
To find the derivative of a complex function, we simply must chain together the relevant functions and sit back -- MyGrad will handle the rest.
It accomplishes this feat by dutifully applying the chain rule over and over, using a simple algorithm called "back-propagation".
The authors of MyGrad had to write down the symbolic derivative for each elementary function (e.g., $e^x$, $\sqrt{x}$, $\arctan(x)$, etc.), but MyGrad's code is responsible for systematically carrying out the chain rule to evaluate derivatives of arbitrarily-complex compositions of these functions.
<!-- #endregion -->
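
To see a tiny piece of what back-propagation is automating, consider a composition simple enough to check by hand: for $g(x) = \sin(x^2)$, the chain rule gives $\frac{\mathrm{d}g}{\mathrm{d}x} = 2x\cos(x^2)$, and the auto-differentiated result matches this hand-derived formula (the values chosen below are arbitrary).

```python
# back-propagation applies the chain rule for us:
# d/dx sin(x ** 2) = 2 * x * cos(x ** 2)
>>> x = mg.tensor([0.5, 1.0, 2.0])
>>> g = np.sin(x ** 2)  # a composition of two elementary functions
>>> g.backward()

>>> by_hand = 2 * x.data * np.cos(x.data ** 2)  # chain rule, worked out manually
>>> np.allclose(x.grad, by_hand)
True
```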

<div class="alert alert-info">

**Reading Comprehension: Auto-differentiation**:

Using MyGrad, compute the derivatives of the following functions.
Have MyGrad evaluate the derivatives on the interval $[-2,4]$ at $30$ evenly spaced points using `mygrad.linspace`.
Additionally, plot these functions and their derivatives on the same domains, but using more densely-spaced points.

* $f(x)=\frac{e^x}{e^x+1}$
* $f(x)=e^{-\frac{(x-1)^2}{10}}$
* $f(x)=\frac{\sin(x)}{x}-x^2$
* $f(x)=9\sqrt{1+\frac{x^2}{9}}-9$

</div>


## Applying Automatic Differentiation: Solving Optimization Problems
<!-- #region -->

We are now familiar with what automatic differentiation is and what it does, but *why* is it so useful?
One of the "killer applications" of autodiff libraries is that they help us solve challenging numerical optimization problems.
These problems often read as: suppose we have some bounded, finite function $f(x)$; find the value of $x$ that *minimizes* $f(x)$.
That is, the "optimum" that we want to find is the value $x_\mathrm{min}$ such that $f(x_\mathrm{min}) \leq f(x)$ for all other $x$.


How does automatic differentiation help us to solve such a problem? The derivative of a function evaluated at some $x_o$ tells us the slope of the function -- whether it is decreasing or increasing -- at $x_o$.
This is certainly useful information for helping us search for $x_\mathrm{min}$: always look in the direction of decreasing slope, until the slope goes to $0$.

We start our search for $x_{\mathrm{min}}$ by picking a random starting value for $x_o$, using the autodiff library to compute $\frac{\mathrm{d}f}{\mathrm{d}x}\big|_{x=x_{o}}$, and then using that information to "step" $x_o$ in the direction that "descends" $f(x)$. We repeat this process until we see that $\frac{\mathrm{d}f}{\mathrm{d}x}\big|_{x=x_{o}} \approx 0$.
It must be noted that this approach towards finding $x_\mathrm{min}$ is highly limited: saddle points can stop us in our tracks, and we will only be able to find *local* minima with this strategy. Nonetheless, it is still very useful!

Let's take a simple example.
We'll choose the function $f(x) = (x-8)^2$ and the starting point $x=-1.5$.
As we search for $x_\mathrm{min}$ we don't want to make our updates to $x_o$
too big, so we will scale our updates by a factor of $3/10$ (a value chosen somewhat haphazardly here).

```python
# Performing gradient descent on f(x) = (x - 8) ** 2
x = mg.Tensor(-1.5)
step_scale = 0.3
num_steps = 10
print(x)

for step_cnt in range(num_steps):
    f = (x - 8.0) ** 2  # evaluate f(xo)
    f.backward()        # compute df/dx @ xo
    x = x - step_scale * x.grad  # update xo in direction opposite of df/dx @ xo
    print(x)
```
```
Tensor(-1.5)
Tensor(4.2)
Tensor(6.48)
Tensor(7.392)
Tensor(7.7568)
Tensor(7.90272)
Tensor(7.961088)
Tensor(7.9844352)
Tensor(7.99377408)
Tensor(7.99750963)
Tensor(7.99900385)
```

Success! Our autodiff-driven optimization algorithm successfully guides us near the minimum $x_\mathrm{min}=8$.

This simple algorithm is known as [gradient descent](https://en.wikipedia.org/wiki/Gradient_descent) (a gradient is a collection of derivatives for a multi-variable function), and it is a powerful technique for finding local minima in differentiable functions.
As we saw in the preceding section, autodiff libraries enable us to search for local optima of *very* complex functions, and we can often work with functions that will depend on *hundreds, thousands, or even many millions of variables*.
In such cases, we have no hope of simply plotting the function and literally looking for the minimum, nor do we have any chance of writing down the function's derivative by hand.
Fortunately, we have autodiff and gradient descent in our toolkit.
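
To give a taste of what this looks like with more than one variable, here is a minimal sketch (with arbitrarily chosen starting values and step size) of gradient descent on $f(w_1, w_2) = w_1^2 + w_2^2$, whose minimum sits at $(0, 0)$; after `backward()`, `w.grad` holds the *gradient* -- one derivative per entry of `w`.

```python
# a minimal sketch of gradient descent on a two-variable function
w = mg.tensor([-4.0, 6.0])  # starting guess for (w1, w2)
step_scale = 0.3

for _ in range(20):
    f = (w ** 2).sum()            # f(w1, w2) = w1**2 + w2**2
    f.backward()                  # w.grad now holds [df/dw1, df/dw2]
    w = w - step_scale * w.grad   # step each variable "downhill"

print(w)  # both entries approach 0, the location of the minimum
```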

For those who have heard of neural networks and deep learning: autodiff libraries, used in conjunction with gradient descent, are how we often "teach" a neural network to perform a task. We use gradient descent to find the optimal parameter values of the neural network; the values are found such that they *minimize the average number of mistakes* the neural network makes when performing training tasks.

This section has just scratched the surface of automatic differentiation. Reading about the different algorithms for performing automatic differentiation ([forward-mode differentiation, back-propagation, and beyond](https://en.wikipedia.org/wiki/Automatic_differentiation#The_chain_rule,_forward_and_reverse_accumulation)), about computing higher-order derivatives, and about the [interesting advances in programming languages' approaches to automatic differentiation](https://fluxml.ai/blog/2019/02/07/what-is-differentiable-programming.html) are all fascinating and worthwhile endeavors.
If you plan to take a course in differential calculus soon, see if you can incorporate autodiff into your coursework!

<!-- #endregion -->

## Reading Comprehension Exercise Solutions


**Auto-differentiation: Solution**

```python
def f(x):
    return mg.exp(x) / (mg.exp(x) + 1)


x = mg.linspace(-2, 4, 30)
fx = f(x)
fx.backward()
x.grad
```

```python
plot_func_and_deriv(mg.linspace(-2, 4, 1000), f);
```

```python
def f(x):
    return mg.exp(-(x - 1) ** 2 / 10)


x = mg.linspace(-2, 4, 30)
fx = f(x)
fx.backward()
x.grad
```

```python
plot_func_and_deriv(mg.linspace(-2, 4, 1000), f);
```

```python
def f(x):
    return mg.sinc(x) - x ** 2


x = mg.linspace(-2, 4, 30)
fx = f(x)
fx.backward()
x.grad
```

```python
plot_func_and_deriv(mg.linspace(-2, 4, 1000), f);
```

```python
def f(x):
    return 9 * mg.sqrt(1 + x ** 2 / 9) - 9


x = mg.linspace(-2, 4, 30)
fx = f(x)
fx.backward()
x.grad
```

```python
plot_func_and_deriv(mg.linspace(-2, 4, 1000), f);
```

Python/module_3.rst (+1)

@@ -21,3 +21,4 @@ This module presents to us the essentials of NumPy. We will first define what th
 Module3_IntroducingNumpy/Broadcasting.md
 Module3_IntroducingNumpy/BasicIndexing.md
 Module3_IntroducingNumpy/AdvancedIndexing.md
+Module3_IntroducingNumpy/AutoDiff.md

README.md (+2)

@@ -45,6 +45,8 @@ Next, we'll use the `conda-forge` package channel to install our dependencies
 
 ```shell
 conda install -c conda-forge sphinx==4.4.0 jupytext==1.13.6 nbsphinx==0.8.8 pandoc==2.1.3 sphinx_rtd_theme==1.0.0 ipykernel==6.7.0 numpy matplotlib
+
+pip install mygrad
 ```
 
 Finally, we will install the `plymi` code base from this repo. Clone the present repository and run:

requirements.txt (+1)

@@ -7,3 +7,4 @@ sphinx_rtd_theme==1.0.0
 ipykernel==6.7.0
 numpy
 matplotlib
+mygrad
