This Python package allows both automated and customized treatment of missing values in datasets. The treatments that are implemented are:
- Listwise deletion
- Pairwise deletion
- Dropping variables
- Random sample imputation
- Random hot-deck imputation
- Last observation carried forward (LOCF)
- Next observation carried backward (NOCB)
- Most frequent substitution
- Mean and median substitution
- Constant value imputation
- Random value imputation
- Interpolation
- Interpolation with seasonal adjustment
- Linear regression imputation
- Stochastic regression imputation
- Logistic regression imputation
- K-nearest neighbors imputation
- Sequential regression multiple imputation
- Multiple imputation by chained equations
All these treatments can be applied to whole datasets or parts of them and allow for extensive customization. The package can also recommend a treatment for a given dataset, inform about the treatments that are applicable to it, and automatically apply the best treatment.
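To make a few of the listed treatments concrete, here is a minimal pure-Python sketch of listwise deletion, LOCF, NOCB, and mean substitution. It does not use or reproduce this package's API: the function names `listwise_delete`, `locf`, `nocb`, and `mean_substitute` are hypothetical and chosen for illustration, and missing values are assumed to be represented by `None`.

```python
from statistics import mean


def listwise_delete(rows):
    """Drop every row (a list of values) that contains at least one None."""
    return [row for row in rows if all(value is not None for value in row)]


def locf(values):
    """Last observation carried forward: fill a gap with the previous observed value."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)  # stays None until a first observation has been seen
    return filled


def nocb(values):
    """Next observation carried backward: LOCF applied to the reversed series."""
    return locf(values[::-1])[::-1]


def mean_substitute(values):
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing observed, so nothing to impute from
    fill = mean(observed)
    return [fill if v is None else v for v in values]


series = [None, 2.0, None, 4.0, None]
print(locf(series))             # [None, 2.0, 2.0, 4.0, 4.0]
print(nocb(series))             # [2.0, 2.0, 4.0, 4.0, None]
print(mean_substitute(series))  # [3.0, 2.0, 3.0, 4.0, 3.0]
print(listwise_delete([[1, 2], [None, 3], [4, 5]]))  # [[1, 2], [4, 5]]
```

Note that LOCF cannot fill a leading gap and NOCB cannot fill a trailing one, which is one reason a treatment may not be applicable to every dataset.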
To install or update to the most recently published release, run:
pip install imputena
This will fetch the release from PyPI and install it with all dependencies.
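One way to confirm that the installation succeeded is to ask Python for the installed version. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical, and the distribution name is assumed to match the one used in the pip command.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("imputena")
    print(found if found is not None else "imputena is not installed")
```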
To install from source, clone this repository or download and unzip it. Then, at the project root directory, run:
pip install .
The documentation for the latest version is available at imputena.readthedocs.io.
The documentation is generated by Sphinx from the docstrings. To build it, run
either of the following commands in the docs directory:
make html
make latexpdf
The generated documentation will be located in docs/build.
The tests for the implemented functions are located in the test directory and
use the unittest package.
To execute all tests, run the following command at the project root directory:
python -m unittest
To execute only the tests contained in a particular test module, for example
deletion/test_delete_listwise.py, run the following command at the
project root directory:
python -m unittest test.deletion.test_delete_listwise
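For readers unfamiliar with unittest, the following self-contained sketch shows the general shape of a test case and how to run it programmatically. It is not taken from the test directory: `drop_incomplete_rows` is a hypothetical stand-in for a listwise deletion function, and the class and method names are illustrative only.

```python
import unittest


def drop_incomplete_rows(rows):
    """Hypothetical stand-in: keep only the rows that contain no None."""
    return [row for row in rows if None not in row]


class TestDropIncompleteRows(unittest.TestCase):

    def test_rows_with_missing_values_are_removed(self):
        rows = [[1, 2], [None, 3], [4, None]]
        self.assertEqual(drop_incomplete_rows(rows), [[1, 2]])

    def test_complete_data_is_unchanged(self):
        rows = [[1, 2], [3, 4]]
        self.assertEqual(drop_incomplete_rows(rows), rows)


def run_suite():
    """Load the test case above and run it, returning the TestResult."""
    suite = unittest.TestLoader().loadTestsFromTestCase(TestDropIncompleteRows)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_suite()
```

Saved as a module inside a package, a test case like this can also be selected from the command line with the same dotted-path form used above.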