Conversation

david-cortes-intel
Contributor

Description

First attempt at vectorizing a bottleneck loop in random forests for regression. Right now this is implemented only for the case when there are no weights, and the PR as-is would generate conflicts with other PRs in progress that touch either vectorization or random forests.


Checklist:

Completeness and readability

  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with measured data, if performance change is expected.

@david-cortes-intel added the perf (Performance optimization) label on Sep 12, 2025
@david-cortes-intel
Contributor Author

Preliminary testing suggests a speedup of roughly 5-7% on large datasets.

@david-cortes-intel
Contributor Author

/intelci: run

@david-cortes-intel
Contributor Author

/intelci: run

@david-cortes-intel
Contributor Author

/intelci: run

@david-cortes-intel
Contributor Author

So far, the benchmarks show speedups for RandomForestRegressor on some machines but slowdowns on others. The benchmark aggregates have rather large standard errors, though, so I wouldn't consider the numbers trustworthy:
image
image
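
For reference, a minimal sketch of one way to check whether a timing difference stands out from run-to-run noise. It is not the method used by the benchmark tooling in this PR. The function names, the threshold `z=2.0`, and the timing values are all hypothetical and chosen only for illustration.

```python
import math
import statistics


def mean_and_stderr(samples):
    """Return (mean, standard error of the mean) for a list of timings."""
    n = len(samples)
    mean = statistics.fmean(samples)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    stderr = statistics.stdev(samples) / math.sqrt(n)
    return mean, stderr


def speedup_is_distinguishable(baseline, candidate, z=2.0):
    """Rough check: is the gap in mean time larger than z combined standard errors?"""
    mean_b, se_b = mean_and_stderr(baseline)
    mean_c, se_c = mean_and_stderr(candidate)
    combined_se = math.sqrt(se_b ** 2 + se_c ** 2)
    return abs(mean_b - mean_c) > z * combined_se


# Hypothetical timings in seconds, NOT taken from the benchmarks in this PR.
baseline_times = [10.0, 10.4, 9.8, 10.2, 10.1]
candidate_times = [9.5, 10.3, 9.9, 9.4, 10.0]
print(speedup_is_distinguishable(baseline_times, candidate_times))
```

With few repetitions per configuration, a gap of a few percent can easily fall inside the combined standard error, which is why more repetitions may be needed before drawing conclusions.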

@david-cortes-intel
Contributor Author

/intelci: run

@david-cortes-intel
Contributor Author

/intelci: run

@david-cortes-intel
Contributor Author

/intelci: run

@david-cortes-intel
Contributor Author

Updated benchmarks:

  • RandomForestRegressor:
image image
  • ExtraTreesRegressor:
image image

@david-cortes-intel
Contributor Author

/intelci: run

@david-cortes-intel
Contributor Author

/intelci: run

@david-cortes-intel marked this pull request as draft on October 14, 2025, 10:20
@david-cortes-intel
Contributor Author

/intelci: run
