Commit e8c451d

Update ml-pca.md
1 parent ea1b38e commit e8c451d

1 file changed, +6 -6 lines changed

doc/python/ml-pca.md

Lines changed: 6 additions & 6 deletions
@@ -105,16 +105,16 @@ fig.show()
 
 When you will have too many features to visualize, you might be interested in only visualizing the most relevant components. Those components often capture a majority of the [explained variance](https://en.wikipedia.org/wiki/Explained_variation), which is a good way to tell if those components are sufficient for modelling this dataset.
 
-In the example below, our dataset contains 8 features, but we only select the first 2 components.
+In the example below, our dataset contains 10 features, but we only select the first 2 components.
 
 ```python
 import pandas as pd
 import plotly.express as px
 from sklearn.decomposition import PCA
-from sklearn.datasets import fetch_california_housing
+from sklearn.datasets import load_diabetes
 
-housing = fetch_california_housing(as_frame=True)
-df = housing.data
+diabetes = load_diabetes()
+df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
 n_components = 2
 
 pca = PCA(n_components=n_components)
@@ -123,11 +123,11 @@ components = pca.fit_transform(df)
 total_var = pca.explained_variance_ratio_.sum() * 100
 
 labels = {str(i): f"PC {i+1}" for i in range(n_components)}
-labels['color'] = 'Median Price'
+labels['color'] = 'Disease Progression'
 
 fig = px.scatter_matrix(
     components,
-    color=housing.target,
+    color=diabetes.target,
     dimensions=range(n_components),
     labels=labels,
     title=f'Total Explained Variance: {total_var:.2f}%',
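The prose in the first hunk says the leading components often capture most of the explained variance. A minimal sketch for checking that on the new dataset follows; it is not part of this commit. It fits a PCA with all components and reports the cumulative explained variance. The names `pca_full`, `cumulative_var`, `threshold`, and `n_needed`, and the 90% cutoff, are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA

# Same data preparation as the updated example in the diff.
diabetes = load_diabetes()
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)

# Fit PCA without limiting n_components so every component is kept.
pca_full = PCA()
pca_full.fit(df)

# Cumulative share of variance (in percent) explained by the first k components.
cumulative_var = np.cumsum(pca_full.explained_variance_ratio_) * 100

# Hypothetical cutoff: smallest number of components reaching 90% of the variance.
threshold = 90.0
n_needed = int(np.argmax(cumulative_var >= threshold)) + 1

print(f"Features: {df.shape[1]}")
print(f"First 2 components explain {cumulative_var[1]:.2f}% of the variance")
print(f"Components needed for {threshold:.0f}%: {n_needed}")
```

The second printed value should match the `total_var` shown in the scatter matrix title when `n_components = 2`, so it is a quick way to judge whether two components are enough before plotting.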

0 commit comments
