docs/day4/dim_reduction.rst (59 additions, 28 deletions)
Guest Lecture by Professor **Anders Hast**
- We will look at tools for visualising what cannot easily be seen, i.e. dimensionality reduction for high-dimensional data
- Share insights and experience from Anders's own research

Visualisation <--> Science
--------------------------

.. figure:: ../img/varanoi.png
   :width: 300px
   :align: right
   :alt: varanoi_regions

Clustering

.. figure:: ../img/ml_classification.png
   :width: 300px
* We will look at tools for visualising what cannot easily be seen, i.e. dimensionality reduction for high-dimensional data
* We will also see that you can make discoveries in your visualisations!

What is a typical machine learning task?
----------------------------------------

* Differ between different classes of features
* Features usually have more than 3 dimensions, hundreds or even thousands!
* The idea is to find a separating curve in high-dimensional space
* Usually we visualise this in 2D since it is easier to understand!
* We will look at several techniques to do this!
* If we can separate in 2D, it can often be done in high-dimensional space, and vice versa!

**Dimensionality reduction:**

* Project from several dimensions to fewer, often 2D or 3D
* Remember: we get a distorted picture of the high-dimensional space!
* Some techniques:

  * SOM
  * PCA
  * t-SNE
  * UMAP

.. figure:: ../img/dim_reduction_bunny.png
   :width: 500px
   :align: center
   :alt: dim_reduction_bunny
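The pipeline sketched above (project from many dimensions down to 2D, then inspect the clusters) can be illustrated with a small synthetic example. This is an illustrative sketch, not from the lecture: the dataset is generated with ``make_blobs`` and all parameter choices are assumptions.

```python
# Illustrative sketch: clusters that are separable in a high-dimensional
# space often stay separable after projection down to 2D.
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# 500 synthetic points in 50 dimensions, in 3 well-separated clusters
X, y = make_blobs(n_samples=500, n_features=50, centers=3, random_state=0)

# Project from 50D down to 2D for visualisation
X2 = PCA(n_components=2).fit_transform(X)

print(X.shape, "->", X2.shape)  # (500, 50) -> (500, 2)
```

Plotting ``X2`` coloured by ``y`` (e.g. with matplotlib) would show the three clusters still separated in the 2D projection.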
Some Dimensionality Reduction Techniques
----------------------------------------
PCA (on Iris Data)
^^^^^^^^^^^^^^^^^^

* PCA = "find the directions where the data varies the most."
* PCA finds a new coordinate system that fits the data:

  * The first axis (1st principal component) points where the data spreads out the most.
  * The second axis (2nd principal component) is perpendicular to the first and captures the next largest spread.
  * The eigenvectors are the directions of those new axes, i.e. the principal components.
  * The eigenvalues tell you how much variance (spread) each component captures.

* Fisher's iris data consists of measurements on the sepal length, sepal width, petal length, and petal width for 150 iris specimens.
* There are 50 specimens from each of three species.
* One axis per data element (which ones are discriminant?)
* Follow each individual using the lines
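As a minimal sketch of the points above, scikit-learn's ``PCA`` exposes the eigenvectors as ``components_`` and the variance captured by each axis as ``explained_variance_ratio_``; the choice of ``n_components=2`` here is illustrative.

```python
# Minimal PCA-on-iris sketch with scikit-learn; parameter choices are illustrative.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data          # 150 specimens x 4 measurements
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)     # project 4D -> 2D

print(pca.components_.shape)           # (2, 4): one eigenvector (new axis) per component
print(pca.explained_variance_ratio_)   # roughly [0.92, 0.05]: spread captured per axis
```

The first component already captures most of the variance, which is why a 2D plot of the iris data separates the species fairly well.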

.. admonition:: Iris and its PCA
   :class: dropdown
      :align: center
      :alt: iris_pca

   .. raw:: html

      <div style="height: 20px;"></div>

   .. figure:: ../img/iris_lines.png
      :align: center
      :alt: iris_lines

t-SNE
^^^^^

* A dimensionality-reduction method for visualising high-dimensional data in 2D or 3D
* It keeps similar points close together and dissimilar ones far apart
* Works by turning distances between points into probabilities of being neighbours, both in the original space and in the low-dimensional map
* Then it moves points to make those probabilities match (minimising the KL divergence)
* Uses a Student's t-distribution in 2D to keep clusters separated and avoid crowding
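The steps above map directly onto scikit-learn's ``TSNE``; the ``perplexity`` and ``init`` values below are common defaults, shown explicitly as assumptions rather than values prescribed by the lecture.

```python
# t-SNE sketch with scikit-learn: pairwise distances become neighbour
# probabilities, and the 2D layout is optimised to minimise their KL divergence.
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

X = load_iris().data
tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0)
X2 = tsne.fit_transform(X)   # 4D -> 2D embedding

print(X2.shape)              # (150, 2)
# tsne.kl_divergence_ holds the final (non-negative) KL divergence
```

Unlike PCA, the result is stochastic: different ``random_state`` values give different (but similarly clustered) layouts.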

.. admonition:: PCA vs t-SNE vs UMAP
   :class: dropdown

   .. figure:: ../img/pca_example.png
      <div style="height: 20px;"></div>

   .. figure:: ../img/umap.png
      :align: center
      :alt: umap_example

      UMAP

UMAP
^^^^

* A nonlinear dimensionality-reduction method, like t-SNE, used to visualise high-dimensional data in 2D or 3D
* Based on manifold theory: it assumes your data lies on a curved surface within a high-dimensional space
* Builds a graph of local relationships (who is close to whom) in the original space, then finds a low-dimensional layout that preserves those relationships

Face Recognition (FR) Use case
------------------------------
Exercise
--------
Try running the notebook and give the correct dataset path wherever required.

The environment required for this notebook can be installed with ``pip install numpy matplotlib scikit-learn scipy pillow plotly umap-learn``.

Sample examples from the documentation: https://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_iris.html#sphx-glr-auto-examples-decomposition-plot-pca-iris-py and https://plotly.com/python/t-sne-and-umap-projections/