[ENH] Optimize ElasticEnsemble _fit to avoid redundant cross-validation #3109
Reference Issues/PRs
Fixes #2854
What does this implement/fix? Explain your changes.
This PR optimizes the `_fit` method in the `ElasticEnsemble` classifier to eliminate a redundant cross-validation step, speeding up fitting under specific conditions.
Problem
The original implementation first used `GridSearchCV` or `RandomizedSearchCV` to find the best parameters for a given distance measure. Then, to get the accuracy score for weighting the ensemble, it performed a second, separate `cross_val_predict` using those best parameters.
This second `cross_val_predict` step was redundant when the initial parameter search was already performed on the entire training set (`proportion_train_in_param_finding == 1.0`), as it was essentially re-running the same validation.
Solution
I've introduced a new logic path that runs only when `self.proportion_train_in_param_finding == 1.0` and `not self.majority_vote`.
The modification:
- Skips `GridSearchCV`/`RandomizedSearchCV` entirely.
- Builds the candidate parameter list directly (using `self.proportion_of_param_options` to mimic the randomized search behavior).
- Runs `cross_val_predict` once for each parameter set inside a single loop, calculating the accuracy.
This change combines the parameter search and the accuracy-for-weighting calculation into a single loop, removing the redundant N-fold cross-validation pass.
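The single-loop idea can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code in this PR: `cv_predict_1nn` is a hypothetical stand-in for scikit-learn's `cross_val_predict`, `minkowski` stands in for an elastic distance measure, and `search_and_score` and its arguments are names assumed for the example.

```python
import random
from itertools import product


def minkowski(p):
    """Return a Minkowski distance of order p (stand-in for an elastic distance)."""
    def dist(a, b):
        return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)
    return dist


def cv_predict_1nn(X, y, distance, n_folds):
    """Hypothetical stand-in for cross_val_predict: out-of-fold 1-NN predictions."""
    n = len(X)
    preds = [None] * n
    for fold in range(n_folds):
        # Interleaved folds: case i belongs to fold i % n_folds.
        train_idx = [i for i in range(n) if i % n_folds != fold]
        for i in range(fold, n, n_folds):
            nearest = min(train_idx, key=lambda j: distance(X[i], X[j]))
            preds[i] = y[nearest]
    return preds


def search_and_score(X, y, param_grid, proportion_of_param_options=1.0,
                     n_folds=3, seed=0):
    """Pick the best parameter set and its CV accuracy in one pass."""
    names = sorted(param_grid)
    candidates = [dict(zip(names, values))
                  for values in product(*(param_grid[k] for k in names))]
    # Optionally subsample the grid to mimic a randomized search.
    if proportion_of_param_options < 1.0:
        n_keep = max(1, int(len(candidates) * proportion_of_param_options))
        candidates = random.Random(seed).sample(candidates, n_keep)

    best_params, best_acc = None, -1.0
    for params in candidates:
        # One cross-validated prediction pass per candidate; its accuracy is
        # reused as the ensemble weight, so no second pass is needed.
        preds = cv_predict_1nn(X, y, minkowski(**params), n_folds)
        acc = sum(p == t for p, t in zip(preds, y)) / len(y)
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc


# Toy data: two well-separated clusters.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.3]]
y = [0, 0, 0, 1, 1, 1]
best_params, best_acc = search_and_score(X, y, {"p": [1, 2]})
```

The accuracy of the winning candidate falls out of the same loop that selects it, which is why the separate scoring pass becomes unnecessary when the search already uses the full training set.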
The original logic (using `GridSearchCV` + the second `cross_val_predict`) is fully preserved for all other cases (i.e., when subsampling for parameter finding or when `majority_vote` is `True`), ensuring no existing behavior is broken.
Does your contribution introduce a new dependency? If yes, which one?
No.
Any other comments?
PR checklist
For all contributions
`__maintainer__` at the top of relevant files and want to be contacted regarding its maintenance. Unmaintained files may be removed. This is for the full file, and you should not add yourself if you are just making minor changes or do not want to help maintain its contents.
For developers with write access