paper_2025_joint_model_importance

Research code for a 2025 paper on joint model importance based on Rashomon sets.

What this project does

Rather than interpreting a single "best" model, this project studies the Rashomon set — the set of all models whose predictive performance is within epsilon of the best model — and investigates how these near-equally good models differ in their behaviour and feature importance, and how they relate to one another.
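
One common way to formalise this set is shown below. It assumes an empirical loss $\hat{L}$ where lower is better and a model class $\mathcal{F}$; this notation is introduced here for illustration and is not taken from the code:

$$
\mathcal{R}(\epsilon) = \{\, f \in \mathcal{F} : \hat{L}(f) \le \hat{L}(f^\ast) + \epsilon \,\}, \qquad f^\ast = \arg\min_{f \in \mathcal{F}} \hat{L}(f)
$$

A relative variant replaces the condition with $\hat{L}(f) \le (1 + \epsilon)\,\hat{L}(f^\ast)$. Which of the two forms get_RS() uses is not stated in this README.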

In concrete terms:

- Trained models from an upstream AutoML run are loaded (data/run_models_merged.rds, data/results_vic.RData, data/design.RData).
- For each task, the Rashomon set is determined from the performance scores (get_RS() in init/functions.R).
- For every model in the Rashomon set, predictions are computed on a canonical validation split (2/3), parallelised via batchtools on a socket cluster (30 CPUs).
- A pairwise distance matrix between models is built from these predictions (Euclidean / Manhattan).
- The relationships between models are visualised and analysed using:
  - hierarchical clustering (hclust),
  - classical (metric) multidimensional scaling (cmdscale),
  - partitioning around medoids (cluster::pam).
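
The distance, clustering, and MDS steps can be sketched with base R's stats package. This is a minimal illustration, not code from main.R: the prediction matrix is simulated as a hypothetical stand-in for the Rashomon-set predictions (one row per model, one column per validation observation), and average linkage with k = 2 clusters is an arbitrary choice made here.

```r
# Hypothetical stand-in for model predictions: one row per model,
# one column per observation of the validation split.
set.seed(42)
n_models <- 6
n_obs <- 50
base_signal <- rnorm(n_obs)
pred <- t(sapply(seq_len(n_models), function(i) {
  # Two simulated groups of models whose predictions differ by a constant shift
  shift <- if (i <= 3) 0 else 0.5
  base_signal + shift + rnorm(n_obs, sd = 0.1)
}))
rownames(pred) <- paste0("model_", seq_len(n_models))

# Pairwise distances between models, computed on their prediction vectors
d_euc <- dist(pred, method = "euclidean")
d_man <- dist(pred, method = "manhattan")

# Hierarchical clustering of the models (average linkage, chosen for illustration)
hc <- hclust(d_euc, method = "average")
groups <- cutree(hc, k = 2)

# Classical MDS: embed the models in 2D, approximately preserving the distances
mds <- cmdscale(d_euc, k = 2)

print(groups)
print(round(mds, 3))
```

The same dist objects can be passed to cluster::pam for a medoid-based partition; Manhattan distances are never smaller than Euclidean ones, so the two metrics can produce different cluster structures.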

Datasets (tasks)

Defined in init/assets/tasks.R:

| Key | Task | Type |
|-----|------|------|
| gc | German Credit | Classification |
| cs | COMPAS | Classification |
| bs | Bike Sharing | Regression |
| st | Synthetic Task (10,000 observations) | Regression |

Learners

Defined in init/assets/learners.R — both regression and classification variants, with tuning spaces provided via mlr3tuning / paradox:

xgb — XGBoost
tree — rpart decision tree
nnet — single-hidden-layer neural network (nnet)
glmnet — elastic net (glmnet)
svm — support vector machine (e1071); kernel is part of the tuning space

Project structure

.
├── main.R                  # main script: load, build Rashomon sets, compute,
│                           # cluster, and visualise
├── init.R                  # sources everything under init/
├── init/
│   ├── source.R            # library imports + subdirectory loading
│   ├── functions.R         # get_performance_and_SVMkernel(), get_RS()
│   ├── zzz_settings.R      # task/learner lists, loads design + VIC
│   ├── assets/
│   │   ├── tasks.R         # mlr3 tasks (incl. synthetic task)
│   │   ├── learners.R      # mlr3 learners with tuning spaces
│   │   └── datasplits.R
│   └── batchtools/
│       └── registry.R      # batchtools registry setup
├── data/                   # pre-computed inputs (AutoML results, VIC, design)
├── renv/                   # renv project library
└── renv.lock               # pinned package versions

Prerequisites

R ≥ 4.3.0 (required for reproducible synthetic data via RNGversion("4.3.0"))
Restore the pinned package versions with renv:
```
renv::restore()
```
External paths: main.R reads individual sample models from /media/external/rashomon/datafiles/<task>/<learner>/... and writes the batchtools registry to /media/external/ewaldf/paper_2025_joint_model_importance. Both paths must be adapted for local execution.
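
One way to make these locations configurable is to read them from environment variables and fall back to the hard-coded paths. This is a hypothetical sketch: the variable names RASHOMON_MODEL_DIR and RASHOMON_REGISTRY_DIR and the helper functions are assumptions for illustration, not part of this repository.

```r
# Hypothetical helpers: resolve external locations from environment variables,
# falling back to the paths currently hard-coded in main.R.
model_root <- function() {
  Sys.getenv("RASHOMON_MODEL_DIR",
             unset = "/media/external/rashomon/datafiles")
}

registry_dir <- function() {
  Sys.getenv("RASHOMON_REGISTRY_DIR",
             unset = "/media/external/ewaldf/paper_2025_joint_model_importance")
}

# Directory holding the sample models of one task/learner combination
model_dir <- function(task, learner) {
  file.path(model_root(), task, learner)
}

model_dir("gc", "xgb")
```

Setting, for example, `Sys.setenv(RASHOMON_MODEL_DIR = "~/rashomon/datafiles")` before sourcing would then redirect all model reads without editing the script.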

Running the analysis

```
source("main.R")
```

main.R is written as an interactive analysis script: cluster setup, job definition, submission, and post-processing are meant to be run step by step, from top to bottom. The central parameters at the top of the script are:

RS_epsilon = 0.01 — tolerance for the Rashomon set (1%)
distance_metrics = c("euclidean", "manhattan")
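
To illustrate how RS_epsilon could enter the selection, here is a hypothetical sketch; rashomon_set_sketch() is not the project's get_RS(). It assumes loss-type scores where lower is better and a relative tolerance, so RS_epsilon = 0.01 keeps every model within 1% of the best loss. The score values are made up.

```r
# Hypothetical sketch of Rashomon-set selection (NOT the project's get_RS()).
# Assumes one loss value per model (lower is better) and a relative tolerance:
# a model is kept if its loss is at most (1 + epsilon) times the best loss.
rashomon_set_sketch <- function(scores, epsilon = 0.01) {
  best <- min(scores)
  names(scores)[scores <= (1 + epsilon) * best]
}

RS_epsilon <- 0.01
scores <- c(m1 = 0.200, m2 = 0.201, m3 = 0.2025, m4 = 0.210)
rashomon_set_sketch(scores, epsilon = RS_epsilon)
# c("m1", "m2")
```

For accuracy-type scores (higher is better) or an absolute tolerance, the comparison would have to be flipped or changed to `scores <= best + epsilon`.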
