Commit 8826ad3

feat: Implement Principal Component Analysis (PCA) (TheAlgorithms#12596)
- Added PCA implementation with dataset standardization.
- Used Singular Value Decomposition (SVD) for computing principal components.
- Fixed import sorting to comply with PEP 8 (Ruff I001).
- Ensured type hints and docstrings for better readability.
- Added doctests to validate correctness.
- Passed all Ruff checks and automated tests.
1 parent f528ce3 commit 8826ad3
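
The commit message names two steps, standardization and SVD, which the committed code delegates to scikit-learn. As a rough illustration of what those steps compute, here is a minimal NumPy-only sketch. The helper name `pca_via_svd` and the synthetic data are assumptions made for this example and are not part of the commit.

```python
import numpy as np


def pca_via_svd(data_x: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: standardize, then project onto the top principal axes."""
    # Standardize each feature to zero mean and unit variance (population std, ddof=0).
    mean = data_x.mean(axis=0)
    std = data_x.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    scaled = (data_x - mean) / std

    # Thin SVD: scaled = U @ diag(S) @ Vt; the rows of Vt are the principal axes.
    _, singular_values, vt = np.linalg.svd(scaled, full_matrices=False)

    # Project the standardized data onto the first n_components axes.
    transformed = scaled @ vt[:n_components].T

    # Each squared singular value is proportional to the variance along its axis.
    variance_ratio = (singular_values**2 / np.sum(singular_values**2))[:n_components]
    return transformed, variance_ratio


# Synthetic stand-in data (not the Iris dataset): 100 samples, 4 features,
# with the second feature made strongly correlated with the first.
rng = np.random.default_rng(0)
sample = rng.normal(size=(100, 4))
sample[:, 1] = 2.0 * sample[:, 0] + 0.1 * sample[:, 1]

transformed, ratio = pca_via_svd(sample, 2)
print(transformed.shape)  # (100, 2)
print(ratio)
```

The signs of individual components may differ from what scikit-learn returns, since a library can apply its own sign convention to the singular vectors; the explained variance ratios should not be affected by that.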

2 files changed: +87 -0 lines changed

DIRECTORY.md

+2
@@ -395,6 +395,7 @@
 * [Minimum Tickets Cost](dynamic_programming/minimum_tickets_cost.py)
 * [Optimal Binary Search Tree](dynamic_programming/optimal_binary_search_tree.py)
 * [Palindrome Partitioning](dynamic_programming/palindrome_partitioning.py)
+* [Range Sum Query](dynamic_programming/range_sum_query.py)
 * [Regex Match](dynamic_programming/regex_match.py)
 * [Rod Cutting](dynamic_programming/rod_cutting.py)
 * [Smith Waterman](dynamic_programming/smith_waterman.py)
@@ -608,6 +609,7 @@
 * [Mfcc](machine_learning/mfcc.py)
 * [Multilayer Perceptron Classifier](machine_learning/multilayer_perceptron_classifier.py)
 * [Polynomial Regression](machine_learning/polynomial_regression.py)
+* [Principle Component Analysis](machine_learning/principle_component_analysis.py)
 * [Scoring Functions](machine_learning/scoring_functions.py)
 * [Self Organizing Map](machine_learning/self_organizing_map.py)
 * [Sequential Minimum Optimization](machine_learning/sequential_minimum_optimization.py)

machine_learning/principle_component_analysis.py

+85
@@ -0,0 +1,85 @@
+"""
+Principal Component Analysis (PCA) is a dimensionality reduction technique
+used in machine learning. It transforms high-dimensional data into a lower-dimensional
+representation while retaining as much variance as possible.
+
+This implementation follows best practices, including:
+- Standardizing the dataset.
+- Computing principal components using Singular Value Decomposition (SVD).
+- Returning transformed data and explained variance ratio.
+"""
+
+import doctest
+
+import numpy as np
+from sklearn.datasets import load_iris
+from sklearn.decomposition import PCA
+from sklearn.preprocessing import StandardScaler
+
+
+def collect_dataset() -> tuple[np.ndarray, np.ndarray]:
+    """
+    Collects the dataset (Iris dataset) and returns feature matrix and target values.
+
+    :return: Tuple containing feature matrix (X) and target labels (y)
+
+    Example:
+    >>> X, y = collect_dataset()
+    >>> X.shape
+    (150, 4)
+    >>> y.shape
+    (150,)
+    """
+    data = load_iris()
+    return np.array(data.data), np.array(data.target)
+
+
+def apply_pca(data_x: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
+    """
+    Applies Principal Component Analysis (PCA) to reduce dimensionality.
+
+    :param data_x: Original dataset (features)
+    :param n_components: Number of principal components to retain
+    :return: Tuple containing transformed dataset and explained variance ratio
+
+    Example:
+    >>> X, _ = collect_dataset()
+    >>> transformed_X, variance = apply_pca(X, 2)
+    >>> transformed_X.shape
+    (150, 2)
+    >>> len(variance) == 2
+    True
+    """
+    # Standardizing the dataset
+    scaler = StandardScaler()
+    data_x_scaled = scaler.fit_transform(data_x)
+
+    # Applying PCA
+    pca = PCA(n_components=n_components)
+    principal_components = pca.fit_transform(data_x_scaled)
+
+    return principal_components, pca.explained_variance_ratio_
+
+
+def main() -> None:
+    """
+    Driver function to execute PCA and display results.
+    """
+    data_x, data_y = collect_dataset()
+
+    # Number of principal components to retain
+    n_components = 2
+
+    # Apply PCA
+    transformed_data, variance_ratio = apply_pca(data_x, n_components)
+
+    print("Transformed Dataset (First 5 rows):")
+    print(transformed_data[:5])
+
+    print("\nExplained Variance Ratio:")
+    print(variance_ratio)
+
+
+if __name__ == "__main__":
+    doctest.testmod()
+    main()
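
The driver above hard-codes `n_components = 2`. One possible way to use the explained variance ratio that `apply_pca` returns is to pick the smallest number of components whose cumulative ratio reaches a target. The sketch below assumes the full ratio vector is available (for example by calling `apply_pca` with `n_components` equal to the feature count); the helper name `components_for_variance` and the sample ratios are hypothetical and do not come from the commit.

```python
import numpy as np


def components_for_variance(variance_ratio: np.ndarray, threshold: float) -> int:
    """Hypothetical helper: smallest k whose cumulative ratio reaches the threshold."""
    cumulative = np.cumsum(variance_ratio)
    # searchsorted gives the first index at which cumulative >= threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    # If the threshold is never reached, fall back to keeping every component.
    return min(k, len(variance_ratio))


# Illustrative ratios only, not output produced by the commit's code.
ratios = np.array([0.6, 0.25, 0.1, 0.05])
print(components_for_variance(ratios, 0.8))  # prints 2
```

This keeps the selection logic separate from the PCA call itself, so it works with ratios from any implementation as long as they are sorted in descending order.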
