Description
Hey, I'm working on a research paper focused on building a binary classification model in the biomedical domain. The dataset comprises approximately 800 data points. Suppose I want to feed a heterogeneous pool of classifiers to the dynamic selection methods. Following the examples, I've found two different ways of splitting the dataset and fitting the base classifiers of the pool:
- Split into train/test (e.g., 75/25) and then split the training portion into train/DSEL (e.g., 50/50). In the random forest example, the RF is fitted on the full 75% training portion and the DS methods on the 50% DSEL portion.
- In all the other examples, the 50% training portion is used to fit the base classifiers, and the 50% DSEL portion is used to fit the DS methods.
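To make sure I've understood the two schemes correctly, here is a minimal sketch of both splits using scikit-learn only. The dataset, the `RandomForestClassifier` pool, and the split ratios are placeholders for my actual setup; the commented-out `ds_method.fit` line stands in for whichever DESlib DS method is used:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for my ~800-sample biomedical dataset
X, y = make_classification(n_samples=800, random_state=0)

# 75/25 train/test split, then the 75% portion split 50/50 into train/DSEL
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
X_tr, X_dsel, y_tr, y_dsel = train_test_split(
    X_train, y_train, test_size=0.5, stratify=y_train, random_state=0)

# Scheme 1 (random forest example): pool fitted on the full 75% portion
pool_scheme1 = RandomForestClassifier(n_estimators=10, random_state=0)
pool_scheme1.fit(X_train, y_train)
# ds_method.fit(X_dsel, y_dsel)  # DS method fitted on the DSEL half

# Scheme 2 (other examples): pool fitted only on the 50% train sub-split
pool_scheme2 = RandomForestClassifier(n_estimators=10, random_state=0)
pool_scheme2.fit(X_tr, y_tr)
# ds_method.fit(X_dsel, y_dsel)  # DSEL is the same in both schemes
```

In both schemes the DS method sees the same DSEL; the only difference is how much of the training data the base classifiers are fitted on.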
Furthermore, I wanted to point out this tip from the tutorial:
An important point here is that in case of small datasets or when the base classifier models in the pool are weak estimators such as Decision Stumps or Perceptrons, an overlap between the training data and DSEL may be beneficial for achieving better performance.
That seems to be my case, as my dataset is rather small compared to most datasets in the ML domain. Hence, I was thinking of fitting my base classifiers on the 75% portion and then leveraging some overlap to get better performance (and it really does help: overlapping yields a median AUC of 0.76, whereas non-overlapping gives 0.71).
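Concretely, the overlapping setup I have in mind looks like the sketch below: the pool is fitted on the entire 75% training portion, and DSEL is drawn from that same portion, so the two overlap. The classifier choice and ratios are placeholders, and the commented DS line is only indicative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Pool of weak estimators fitted on the whole 75% training portion
pool = BaggingClassifier(DecisionTreeClassifier(max_depth=2),
                         n_estimators=10, random_state=0)
pool.fit(X_train, y_train)

# Overlap: DSEL is a 50% subset of the SAME data the pool was trained on
_, X_dsel, _, y_dsel = train_test_split(
    X_train, y_train, test_size=0.5, stratify=y_train, random_state=0)
# ds_method.fit(X_dsel, y_dsel)  # DS method then sees data the pool has already seen
```

My worry is whether this overlap biases the competence estimates on DSEL, even though the held-out 25% test set stays untouched.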
What would be the best way of dealing with this?