RFX-Fuse: Breiman and Cutler's Random Forests as a Forest Unified Learning and Similarity Engine - Extended with Native Explainable Similarity
RFX-Fuse (Random Forests X [X=compression]: Forest Unified Learning and Similarity Engine) delivers Breiman and Cutler's complete vision for Random Forests as a Forest Unified Machine Learning and Similarity Engine, with native GPU and CPU support.
Breiman and Cutler designed Random Forests as more than an ensemble predictor. Their original implementation from the early 2000s included classification, regression, unsupervised learning, proximity-based similarity, outlier detection, missing value imputation, and visualization. Modern libraries, such as scikit-learn's random forest implementation (2010-2011), omitted many of these features.
Together, these capabilities make Random Forests a unified learning and similarity engine. With just one or two model objects, RFX-Fuse achieves accuracy and output comparable to three to five mainstream industry tools. For example, for time series regression with native explainable similarity, a single model produces output comparable to four separate tools. One model means one set of trees, grown once.
| Use Case | RFX-Fuse | Comparable Approach |
|---|---|---|
| Recommender Systems | 1–2 models | 5 tools (FAISS + XGBoost + SHAP + Isolation Forest + custom code) |
| Finance Explainability | 1 model | 3 tools (XGBoost + SHAP + Isolation Forest) |
| Time Series Regression | 1 model | 4 tools (XGBoost + SHAP + Isolation Forest + FAISS) |
| Imputation Validation | 1 model | Time-series-specific methods only (for general tabular data: RFX-Fuse) |
| Anomaly Detection | 1 model | 3 tools (Isolation Forest + SHAP + custom code) |
- Native Explainable Similarity: Breiman and Cutler's original proximity-based similarity scoring achieves retrieval NDCG and hit rate (HR) comparable to FAISS, and proximity importance explains why two samples are similar. Explanations are available in the Zenodo paper.
- Imputation Quality Validation for General Tabular Data: rank imputation methods by how "real" the imputed data looks, without ground truth labels.
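Breiman and Cutler define the proximity of two samples as the fraction of trees in which both land in the same terminal node, and similarity retrieval is simply ranking other samples by that value. The sketch below illustrates this in plain Python. It is not the RFX-Fuse API: `proximity` and `top_k_similar` are hypothetical helpers, and it assumes you already have each sample's terminal-node index in every tree. Some implementations (for example R's randomForest via `oob.prox`) also offer out-of-bag-only proximities, which this sketch omits.

```python
from typing import List, Sequence, Tuple


def proximity(leaves: Sequence[Sequence[int]], i: int, j: int) -> float:
    """Fraction of trees in which samples i and j share a terminal node."""
    n_trees = len(leaves[i])
    shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
    return shared / n_trees


def top_k_similar(leaves: Sequence[Sequence[int]], query: int, k: int) -> List[Tuple[int, float]]:
    """Return the k samples most similar to `query`, highest proximity first."""
    scored = [(j, proximity(leaves, query, j)) for j in range(len(leaves)) if j != query]
    # Sort by descending proximity; break ties by sample index for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Toy leaf indices: leaves[i][t] is the terminal node of sample i in tree t.
leaves = [
    [0, 2, 1, 3],
    [0, 2, 1, 4],  # shares 3 of 4 leaves with sample 0
    [1, 2, 0, 3],  # shares 2 of 4 leaves with sample 0
    [1, 5, 0, 4],  # shares no leaves with sample 0
]
print(top_k_similar(leaves, query=0, k=2))  # [(1, 0.75), (2, 0.5)]
```

Because every tree votes equally, one trained forest yields both predictions and this similarity measure without a separate embedding index.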
| Feature | RFX-Fuse | XGBoost | sklearn RF | FAISS |
|---|---|---|---|---|
| Classification | ✓ | ✓ | ✓ | — |
| Regression | ✓ | ✓ | ✓ | — |
| Unsupervised | ✓ | — | — | — |
| Overall importance | ✓ | ✓ | ✓ | — |
| Local importance (per-sample) | ✓ | SHAP | — | — |
| Proximity/similarity scoring | ✓ | — | — | ✓ |
| Overall proximity importance | ✓ | — | — | — |
| Local proximity importance | ✓ | — | — | — |
| Top-K similar with explanations | ✓ | — | — | — |
| Outlier detection with explanations | ✓ | — | — | — |
| Missing value imputation | ✓ | — | — | — |
pip install rfx-fuse
A CPU-only package (pip install rfx-fuse-cpu) is coming soon.
To install from source:
git clone https://github.com/chriskuchar/RFX-Fuse.git
cd RFX-Fuse
pip install -e .
To build from source without CUDA (CPU only):
git clone https://github.com/chriskuchar/RFX-Fuse.git
cd RFX-Fuse
pip install -e . --config-settings=cmake.args=-DRFX_CPU_ONLY=ON
Requirements:
- CMake 3.12+
- Python 3.8+
- C++ compiler with C++17 support (GCC 7+, Clang 5+)
- OpenMP (usually included with compiler)
- CUDA toolkit 12.8+ (for GPU acceleration)
import RFXFuse as rfx
print(f"RFX-Fuse version: {rfx.__version__}")
print(f"CUDA enabled: {rfx.__cuda_enabled__}")
Each use case has a complete demonstration script in the examples/ folder:
| Use Case | Demo Script | Description |
|---|---|---|
| Recommender Systems | examples/recommender_system/demo_recommender_system.py | MovieLens 25M: similarity retrieval + ranking with explanations |
| Finance Explainability | examples/classification/demo_loan_classification.py | Loan default prediction with four types of explainability |
| Time Series Regression | examples/time_series/demo_time_series.py | Bike sharing: prediction + outlier detection |
| Imputation Validation | examples/data_imputation/demo_imputation.py | Rank imputation methods without ground truth |
| Anomaly Detection | examples/anomaly_detection/demo_anomaly_detection.py | Breiman-Cutler outlier detection |
Run an example:
cd examples/time_series
python demo_time_series.py
Recommender Systems: RFX-Fuse Unsupervised handles retrieval and RFX-Fuse Supervised handles re-ranking on MovieLens 25M.
Explanations are available in the Zenodo paper.
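The two-stage flow (unsupervised retrieval, then supervised re-ranking) and the NDCG and HR retrieval metrics can be sketched as follows. This is a hedged illustration, not the RFX-Fuse API: it assumes stage 1 yields a proximity score per candidate item and stage 2 yields a supervised relevance score, it uses the standard binary-relevance definitions of NDCG@K and HR@K, and all function names and toy values are hypothetical. The paper's exact evaluation protocol may differ.

```python
import math
from typing import Callable, Dict, List, Sequence, Set


def retrieve_then_rerank(similarity: Dict[str, float], score: Callable[[str], float],
                         n_retrieve: int, k: int) -> List[str]:
    """Stage 1: shortlist the n_retrieve most similar items.
    Stage 2: re-order the shortlist by a supervised score and keep the top k."""
    shortlist = sorted(similarity, key=lambda item: -similarity[item])[:n_retrieve]
    return sorted(shortlist, key=lambda item: -score(item))[:k]


def hit_rate_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(pos + 2) for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0


# Toy data: stage-1 proximities to a query user and hypothetical stage-2 scores.
similarity = {"m1": 0.9, "m2": 0.7, "m3": 0.4, "m4": 0.1}
predicted = {"m1": 3.1, "m2": 4.5, "m3": 4.9, "m4": 5.0}
ranked = retrieve_then_rerank(similarity, lambda item: predicted[item], n_retrieve=3, k=2)
print(ranked)                            # ['m3', 'm2']
print(hit_rate_at_k(ranked, {"m2"}, 2))  # 1.0
print(ndcg_at_k(ranked, {"m2"}, 2))      # 1 / log2(3), about 0.631
```

Shortlisting first keeps the supervised stage's work proportional to `n_retrieve` rather than the full catalogue. In the toy data, `m4` has the highest supervised score but never reaches stage 2 because its proximity is too low.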
Finance Explainability: a single classifier provides regulatory-compliant explanations (ECOA, GDPR, Fair Lending).
Explanations are available in the Zenodo paper.
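Breiman and Cutler's overall and local (per-sample) variable importance are permutation-based: scramble one feature's values across samples and measure how much the vote for the correct class drops. The plain-Python sketch below illustrates the idea. It is a simplification, not RFX-Fuse's implementation: it scores every tree on a given set instead of each tree's out-of-bag samples, and `permutation_importance`, its helpers, and the toy trees are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Tree = Callable[[Row], int]


def vote_fraction(trees: Sequence[Tree], row: Row, label: int) -> float:
    """Fraction of trees voting for `label` on `row`."""
    return sum(1 for tree in trees if tree(row) == label) / len(trees)


def permute_column(rows: Sequence[Row], m: int, rng: random.Random) -> List[List[float]]:
    """Copy of `rows` with column m shuffled across samples."""
    values = [row[m] for row in rows]
    rng.shuffle(values)
    return [list(row[:m]) + [v] + list(row[m + 1:]) for row, v in zip(rows, values)]


def permutation_importance(trees: Sequence[Tree], rows: Sequence[Row], labels: Sequence[int],
                           seed: int = 0) -> Tuple[List[float], List[List[float]]]:
    """Return (overall, local): overall[m] is the mean drop in correct-class vote
    fraction after permuting feature m; local[m][i] is that drop for sample i."""
    rng = random.Random(seed)
    base = [vote_fraction(trees, row, y) for row, y in zip(rows, labels)]
    overall, local = [], []
    for m in range(len(rows[0])):
        permuted = permute_column(rows, m, rng)
        drops = [b - vote_fraction(trees, row, y) for b, row, y in zip(base, permuted, labels)]
        local.append(drops)
        overall.append(sum(drops) / len(drops))
    return overall, local


# Toy "forest" of three stumps; feature 2 is never used by any tree.
trees = [lambda r: int(r[0] > 0.5), lambda r: int(r[0] > 0.4), lambda r: int(r[1] > 0.5)]
rows = [[0.9, 0.8, 0.1], [0.1, 0.2, 0.7], [0.7, 0.9, 0.3], [0.2, 0.1, 0.9]]
labels = [1, 0, 1, 0]
overall, local = permutation_importance(trees, rows, labels, seed=42)
print(overall)  # feature 2 always scores 0.0 because no tree reads it
```

A positive local value means that sample's correct-class support fell when the feature was scrambled, so the model relied on that feature for that particular decision.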
Time Series Regression: the RFX-Fuse Regressor on the UCI Bike Sharing dataset, with full explainability.
Explanations are available in the Zenodo paper.
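Breiman and Cutler's outlier measure is built from proximities: a sample with small proximity to every other sample receives a large raw score. The sketch below follows the commonly described recipe (raw score n divided by the sum of squared proximities to the other samples in its group, then centered by the group median and scaled by the median absolute deviation). It is an illustration, not RFX-Fuse's code; variants differ in whether self-proximity is included and how the spread is scaled. Treating all samples as one group, the default here and a natural choice for regression, is an assumption, as are the helper name and toy matrix.

```python
from statistics import median
from typing import Hashable, List, Optional, Sequence


def outlier_scores(prox: Sequence[Sequence[float]],
                   groups: Optional[Sequence[Hashable]] = None) -> List[float]:
    """Breiman-Cutler style outlyingness from a proximity matrix.

    raw[i] = n / sum_j prox[i][j]**2 over other samples j in the same group,
    then (raw - median) / median absolute deviation within each group.
    """
    n = len(prox)
    groups = list(groups) if groups is not None else [0] * n
    raw = []
    for i in range(n):
        total = sum(prox[i][j] ** 2 for j in range(n) if j != i and groups[j] == groups[i])
        raw.append(n / total if total > 0 else float("inf"))

    scores = [0.0] * n
    for g in set(groups):
        members = [i for i in range(n) if groups[i] == g]
        values = [raw[i] for i in members]
        med = median(values)
        mad = median(abs(v - med) for v in values)
        for i in members:
            scores[i] = (raw[i] - med) / mad if mad > 0 else 0.0
    return scores


prox = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.6, 0.0],
    [0.7, 0.6, 1.0, 0.1],
    [0.1, 0.0, 0.1, 1.0],  # weakly connected to every other sample
]
scores = outlier_scores(prox)
print(max(range(len(scores)), key=scores.__getitem__))  # 3
```

Using the median and median absolute deviation rather than mean and standard deviation keeps a single extreme sample from inflating the scale it is measured against.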
Imputation Validation: a novel capability for general tabular data that ranks imputation methods by how "real" the imputed data looks.
Explanations are available in the Zenodo paper.
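Ranking imputation methods by realism reduces to scoring each method's imputed rows with a real-versus-synthetic model and sorting by average P(synthetic), lowest first. In the sketch below the scorer is a plain callable standing in for a trained forest's probability output; the function name, the toy scorer, and the use of a simple mean are assumptions, not the RFX-Fuse API.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Row = Sequence[float]


def rank_imputations(imputed: Dict[str, Sequence[Row]],
                     p_synthetic: Callable[[Row], float]) -> List[Tuple[str, float]]:
    """Rank imputation methods by mean P(synthetic); the most "real" method comes first."""
    scores = {name: mean(p_synthetic(row) for row in rows) for name, rows in imputed.items()}
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))


# Hypothetical stand-in scorer: rows whose two features disagree look synthetic.
def toy_p_synthetic(row: Row) -> float:
    return min(1.0, abs(row[0] - row[1]))


imputed = {
    "mean_fill": [[0.0, 0.9], [1.0, 0.1]],
    "knn_fill": [[0.2, 0.3], [0.8, 0.7]],
}
for name, score in rank_imputations(imputed, toy_p_synthetic):
    print(f"{name}: {score:.2f}")
# knn_fill: 0.10
# mean_fill: 0.90
```

No ground truth is needed: the scorer only has to have learned what real rows look like, so any imputation that breaks the learned feature relationships is pushed down the ranking.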
Anomaly Detection: the Breiman-Cutler method trains on clean data, and anomalies then receive a high P(synthetic).
Explanations are available in the Zenodo paper.
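In Breiman and Cutler's unsupervised mode, the forest learns to separate real data from a synthetic copy built by sampling each feature independently from its own observed values. That copy keeps every feature's marginal distribution but destroys the dependencies between features, so a record that breaks those dependencies looks synthetic. The sketch below builds such a labeled real-versus-synthetic training set; the helper names and the 0 = real / 1 = synthetic labels are assumptions, and training the forest itself is left to RFX-Fuse.

```python
import random
from typing import List, Sequence, Tuple


def make_synthetic(rows: Sequence[Sequence[float]], seed: int = 0) -> List[List[float]]:
    """Sample each column independently (with replacement) from its observed values."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    columns = [[row[m] for row in rows] for m in range(n_features)]
    return [[rng.choice(columns[m]) for m in range(n_features)] for _ in rows]


def real_vs_synthetic(rows: Sequence[Sequence[float]],
                      seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Stack real rows (label 0) and an equal number of synthetic rows (label 1)."""
    synthetic = make_synthetic(rows, seed)
    X = [list(row) for row in rows] + synthetic
    y = [0] * len(rows) + [1] * len(synthetic)
    return X, y


# Toy data where feature 1 is exactly 10x feature 0; the synthetic copy breaks that link.
real = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
X, y = real_vs_synthetic(real, seed=7)
print(len(X), sum(y))  # 8 4
```

A forest trained on `X, y` then outputs P(synthetic) for new records; clean records that respect the learned dependencies score low, while anomalies score high.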
For complete API documentation with all parameters, methods, and examples, see docs/API.md.
Environment: NVIDIA RTX 3060 (12GB), AMD Ryzen 7 5800X, 32GB RAM
| Use Case | Train Size | Features | Trees | Training Time |
|---|---|---|---|---|
| Recommender (Unsup) | 59,047 (×2) | 23 | 1,000 | 1,254s |
| Recommender (Sup) | 47,237 | 21 | 1,000 | 120s |
| Finance Classification | 46,396 | 15 | 500 | 69s |
| Bike Regression | 5,725 | 4 | 1,000 | 24s |
| Imputation Validation | 3,000 | 12 | 100 | 3.6s |
| Anomaly Detection | 15,000 | 8 | 100 | 112s |
Training times include predictions, similarity scoring, proximity importance, local importance, and all explainability features where applicable.
Coming soon.
For detailed methodology, see:
@article{kuchar2026rfxfuse,
author = {Kuchar, Chris},
title = {RFX-Fuse: Breiman and Cutler's Unified ML Engine + Native Explainable Similarity},
year = {2026},
journal = {arXiv preprint arXiv:2511.19493},
url = {https://arxiv.org/html/2603.13234v1}
}
This work aims to implement the full unified learning and similarity engine that Dr. Leo Breiman and Dr. Adele Cutler created in their Fortran/Java implementation in the early 2000s.
Special thanks to Dr. Adele Cutler for generously sharing original Breiman-Cutler Random Forest source materials, which made this faithful restoration and extension possible.
- Multi-class classification support
- This is the successor to https://github.com/chriskuchar/RFX.
MIT License - see LICENSE for details.









