- Homework 1 y 2: This notebook uses PySpark to process football player data. It practices basic PySpark functions and also covers basic operations on RDD objects built from the same dataset.
- Homework 3 y 4: These assignments are a presentation of the work done with PySpark and RDDs on the dataset.
- Homework 5: PySpark's ML library is used to run K-means clustering on the players' in-game stats from the 2024 season. The results are then compared with a second clustering run on the features produced by PCA. Clustering on the PCA features gave better clusters.
- stats-spider-analysis: A detailed analysis of in-game stats from the top 3 European leagues versus representative LATAM leagues. The write-up is published on my Medium blog: https://medium.com/@emnovelo98