EmanuelNovelo/MCD-Big-Data-24

Big-Data-MCD-2024

Notebooks

  • Homework 1 and 2: This notebook uses PySpark to work with a football players dataset. It practices basic PySpark functions and basic operations on RDD objects built from the same dataset.
  • Homework 3 and 4: A presentation of the work done with PySpark and RDDs on the dataset.
  • Homework 5: PySpark's ML library is used to run K-means clustering on the players' in-game stats from the 2024 season. The result is then compared against K-means run on the features obtained from PCA. Clustering on the PCA features gave better clusters.

Additional analysis

Datasets
