This repository contains my solution to the Parma Calcio 1913 Data Scientist technical assignment.
The project is divided into two main tasks, each organized in its own folder:
- Task 1 – xG Model (`task1_xg/`): building and evaluating multiple expected goals (xG) models
- Task 2 – Ballon d'Or 2015/16 (`task2_ballon_dor/`): ranking players from the Big 5 leagues in the 2015/16 season to determine the best player according to the data
All data come from the StatsBomb Open Data repository and are accessed programmatically using `statsbombpy`, so no manual downloads are required.
Clone the repository locally:

```bash
git clone https://github.com/Manuele23/Parma-assignment.git
cd Parma-assignment
```

Then open and run `setup.ipynb`. It installs all required dependencies automatically in your environment (based on `requirements.txt`).
If you prefer an isolated environment:

```bash
# create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate   # on Linux/Mac
venv\Scripts\activate      # on Windows

# install dependencies
pip install -r requirements.txt
```

Once the environment is ready:
- Clone the repository
- Set up the environment (via `setup.ipynb` or a virtual environment)
- Run the notebooks in the provided order
One large file (`shots_df.csv`) is tracked with Git LFS. You may need to install Git LFS to download it directly from the repository, but this is not strictly required, since the file can also be regenerated by running the notebooks.

```bash
git lfs install
git lfs pull
```

```
├── task1_xg/
│   ├── data/                       # created datasets for the task
│   ├── models/                     # trained models (excluded if large)
│   ├── outputs/                    # generated evaluation metrics for each model
│   ├── 01_data_exploration.ipynb
│   ├── 02_shot_analysis.ipynb
│   ├── 03_dataset_building.ipynb
│   ├── 04_linear_regression.ipynb
│   ├── 05_random_forest.ipynb
│   ├── 06_xgboost.ipynb
│   ├── 07_neural_network.ipynb
│   ├── 08_model_comparison.ipynb
│   ├── 09_ds_final.ipynb
│   └── xg_demo.ipynb
│
├── task2_ballon_dor/
│   ├── data/                       # created datasets for the task
│   ├── 01_data_preparation.ipynb
│   ├── 02_data_preprocessing.ipynb
│   └── 03_final_ranking.ipynb
│
├── Assignment.pdf
├── requirements.txt
├── setup.ipynb
├── README.md
└── .gitignore / .gitattributes
```
- Navigate to `task1_xg/`
- For a quick demo, run `xg_demo.ipynb` and go to the last cell to try the demo (note: this step may take a while, as building the demo widgets is slow)
- Otherwise, run the notebooks in order from 01 to 09 (note: this step may take a while, as data retrieval is slow)
- Note: the Random Forest model (`model_rf.pkl`) is not included in the repo because of its size. To regenerate it, run `05_random_forest.ipynb`
- The final evaluation is in `08_model_comparison.ipynb` and `09_ds_final.ipynb`
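The modelling notebooks work from StatsBomb shot coordinates (a 120 × 80 pitch, with the attacked goal centred at (120, 40) and posts at y = 36 and y = 44). Two geometric features common to most xG models — distance to goal and the angle subtended by the goal mouth — can be sketched as follows (helper names are illustrative, not the repo's code):

```python
import math

GOAL_X, GOAL_Y = 120.0, 40.0      # goal centre in StatsBomb coordinates (120 x 80 pitch)
POST_LOW, POST_HIGH = 36.0, 44.0  # y-coordinates of the two posts

def shot_distance(x: float, y: float) -> float:
    """Euclidean distance from the shot location to the goal centre."""
    return math.hypot(GOAL_X - x, GOAL_Y - y)

def shot_angle(x: float, y: float) -> float:
    """Angle (radians) subtended by the goal mouth as seen from the shot location."""
    a = math.hypot(GOAL_X - x, POST_LOW - y)   # distance to one post
    b = math.hypot(GOAL_X - x, POST_HIGH - y)  # distance to the other post
    goal_width = POST_HIGH - POST_LOW
    # law of cosines: cos(theta) = (a^2 + b^2 - w^2) / (2ab)
    cos_theta = (a * a + b * b - goal_width ** 2) / (2 * a * b)
    return math.acos(max(-1.0, min(1.0, cos_theta)))

# A shot from the penalty spot (108, 40): 12 units out, straight in front of goal
print(round(shot_distance(108, 40), 1))                 # → 12.0
print(round(math.degrees(shot_angle(108, 40)), 1))      # → 36.9
```

Larger angles and shorter distances both correlate with higher scoring probability, which is why this pair of features is a common baseline input for xG models.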
- Navigate to `task2_ballon_dor/`
- If you just want to see the final ranking, run only `03_final_ranking.ipynb`
- If you prefer to re-download the datasets and reprocess everything from scratch, run in order:
  1. `01_data_preparation.ipynb` (note: this step may take a while, as data retrieval is slow)
  2. `02_data_preprocessing.ipynb`
  3. `03_final_ranking.ipynb`
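The general shape of a ranking pipeline like this — normalise each per-player metric, then combine them into one score — can be sketched as below. The player names, metrics, and weights are entirely hypothetical toy values; the actual method and data live in `03_final_ranking.ipynb`:

```python
# Hypothetical sketch: combining normalised per-player metrics into one ranking score.
players = {
    "Player A": {"goals": 35, "assists": 11},
    "Player B": {"goals": 26, "assists": 16},
    "Player C": {"goals": 40, "assists": 7},
}
weights = {"goals": 0.6, "assists": 0.4}  # illustrative weights, not the repo's

def min_max(values):
    """Scale a list of numbers to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def rank(players, weights):
    """Rank players by the weighted sum of their min-max-scaled metrics."""
    names = list(players)
    score = {n: 0.0 for n in names}
    for metric, w in weights.items():
        scaled = min_max([players[n][metric] for n in names])
        for n, s in zip(names, scaled):
            score[n] += w * s
    return sorted(names, key=lambda n: score[n], reverse=True)

print(rank(players, weights))  # → ['Player C', 'Player A', 'Player B']
```

Min-max scaling keeps metrics with different magnitudes (goals vs. assists) comparable before they are weighted and summed.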
- Only StatsBomb open data was used, as required by the assignment
- Each notebook includes detailed Markdown cells that explain the rationale behind the methodology, the assumptions made, and the simplifications adopted step by step
- Models and outputs too large for GitHub are excluded, but can be easily regenerated locally by running the corresponding notebooks