Can classical ML predict which portfolios will beat the monthly median, using only Fama-French five-factor returns? That's the question. This repo has the data pipeline, four trained classifiers, and an IEEE conference paper with the results.
Best model: SVM (RBF) — 74.5% accuracy, 0.823 ROC-AUC on held-out test data.
Each month, 25 size/value-sorted portfolios either beat or miss the cross-sectional median return. We frame that as a binary classification problem. Features are the five FF5 factor realizations for that month plus the portfolio's size and value quintile.
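That labeling step can be sketched in a few lines of pandas. Column names here (`date`, `portfolio`, `ret`, `beat_median`) are illustrative; the actual names are set in notebook 1:

```python
# Sketch of the binary label: for each month, a portfolio gets label 1
# if its return beats that month's cross-sectional median across the
# 25 portfolios, else 0.
import pandas as pd

def add_beat_median_label(long_df: pd.DataFrame) -> pd.DataFrame:
    """long_df: one row per (date, portfolio) with a 'ret' column."""
    # Per-month median, broadcast back to every row of that month
    median = long_df.groupby("date")["ret"].transform("median")
    out = long_df.copy()
    out["beat_median"] = (out["ret"] > median).astype(int)
    return out

# Toy example: two months, three portfolios each
toy = pd.DataFrame({
    "date": ["2020-01"] * 3 + ["2020-02"] * 3,
    "portfolio": ["SMALL_LoBM", "ME3_BM3", "BIG_HiBM"] * 2,
    "ret": [1.0, 2.0, 3.0, -1.0, 0.5, -2.0],
})
labeled = add_beat_median_label(toy)
```

With 25 portfolios per month the label is close to balanced by construction, since roughly half beat the median each month.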
Download both files manually from the Kenneth R. French Data Library:
| File | What to download | Save as |
|---|---|---|
| FF5 factors | Fama/French 5 Factors (2x3) [Monthly] | data/raw/ff5-factors-monthly.csv |
| 25 portfolios | 25 Portfolios Formed on Size and Book-to-Market (5x5) [Monthly] | data/raw/portfolio-25-size-value.csv |
`data/raw/` is gitignored; the CSVs are free to download, so they don't need to live in the repo.
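One caveat when loading them: CSVs from the French library ship with a text preamble and trailing annual tables, so a plain `pd.read_csv` usually chokes. A hedged loader sketch — the `skiprows` count is an assumption, so open your downloaded file and count the preamble lines yourself:

```python
# Hedged sketch, not the notebook's exact code: read a French-library
# CSV and keep only the monthly block.
import pandas as pd

def load_french_monthly(path_or_buf, skiprows=3):
    df = pd.read_csv(path_or_buf, skiprows=skiprows, index_col=0)
    # Keep only rows whose index is a YYYYMM month stamp; this drops
    # the annual block and the copyright footer.
    monthly = df[df.index.astype(str).str.strip().str.fullmatch(r"\d{6}")]
    # Remaining cells may still be strings; coerce to numeric
    return monthly.apply(pd.to_numeric, errors="coerce")
```

Note that French-library returns are quoted in percent (e.g. `5.07` means 5.07%); whether to divide by 100 depends on how notebook 1 builds the features.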
```
ff5-portfolio-classification/
├── data/
│   ├── raw/                         # gitignored; download manually
│   └── processed/
│       └── ml-dataset.csv           # generated by notebook 1
├── notebooks/
│   ├── 1-data-preprocessing.ipynb
│   ├── 2-exploratory-analysis.ipynb
│   ├── 3-model-training.ipynb
│   └── 4-results-evaluation.ipynb
├── results/
│   ├── model-comparison.csv
│   ├── roc-curves.png
│   ├── confusion-matrices.png
│   ├── feature-importance.png
│   ├── model-comparison-bar.png
│   └── precision-recall-curves.png
├── paper/
│   ├── paper.tex
│   ├── IEEEtrans.cls
│   └── references.bib
├── requirements.txt
└── README.md
```
```
pip install -r requirements.txt
```

Run the notebooks in order:

1. `1-data-preprocessing.ipynb` — merges factor and portfolio data, creates the binary label, saves `data/processed/ml-dataset.csv`
2. `2-exploratory-analysis.ipynb` — distributions, correlations, win-rate heatmaps
3. `3-model-training.ipynb` — trains LR, SVM, RF, and XGBoost with TimeSeriesSplit CV; saves `results/model-comparison.csv` and `results/model-artifacts.pkl`
4. `4-results-evaluation.ipynb` — generates all figures (ROC curves, confusion matrices, feature importance, etc.)
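The core of the notebook-3 training loop can be sketched as follows. This is an illustrative reconstruction, not the notebook's exact code: `TimeSeriesSplit` keeps each validation fold strictly later than its training months, which avoids look-ahead leakage in monthly data.

```python
# Sketch of TimeSeriesSplit CV for one model (the SVM from the results
# table); hyperparameters are scikit-learn defaults, not the tuned ones.
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def time_series_cv_auc(X, y, n_splits=5):
    # Scaling matters for the RBF kernel; putting the scaler inside the
    # pipeline refits it per fold, so validation months never leak into
    # the scaler's statistics.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    cv = TimeSeriesSplit(n_splits=n_splits)
    return cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
```

The same pipeline-plus-`TimeSeriesSplit` pattern applies to the other three classifiers, with only the final estimator swapped.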
To launch:

```
jupyter notebook
```

| Model | CV ROC-AUC | Test Accuracy | Test ROC-AUC |
|---|---|---|---|
| Logistic Regression | 0.533 | 49.3% | 0.471 |
| SVM (RBF) | 0.807 | 74.5% | 0.823 |
| Random Forest | 0.787 | 72.6% | 0.809 |
| Gradient Boosting (XGBoost) | 0.809 | 73.4% | 0.820 |
Train: 1963-07-31 to 2013-08-31. Test: 2013-08-31 to 2026-02-28.
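That split amounts to a single chronological holdout: fit on every month up to the cutoff, score on the strictly later months, and never shuffle. A sketch under assumed variable names (the notebooks may organize this differently):

```python
# Illustrative chronological train/test evaluation for the RBF SVM;
# `month_index` is any sortable per-row date key, `cutoff` the split point.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def chronological_eval(X, y, month_index, cutoff):
    train = month_index <= cutoff          # e.g. months up to 2013-08
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    model.fit(X[train], y[train])
    preds = model.predict(X[~train])
    scores = model.decision_function(X[~train])  # for ROC-AUC
    return (accuracy_score(y[~train], preds),
            roc_auc_score(y[~train], scores))
```

A shuffled split would let the model train on months that come after its test months, which inflates scores on financial data; the chronological cut is what makes the test numbers in the table meaningful.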
The paper is in `paper/paper.tex` (IEEE conference format). To compile:

```
cd paper
pdflatex paper.tex
bibtex paper
pdflatex paper.tex
pdflatex paper.tex
```