miriamspsantos / heterogeneous-distance-functions

A collection of heterogeneous distance functions handling missing values.

machine-learning missing-data distance-measures knn research-paper distance-functions heterogeneous-data missing-values heom hvdm knn-imputer knn-imputation

Updated Jan 24, 2022
MATLAB
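
For readers unfamiliar with the `heom` tag, the following is a minimal Python sketch of the Heterogeneous Euclidean-Overlap Metric as it is usually defined: a missing value contributes the maximum per-attribute distance of 1, nominal attributes use overlap (0 if equal, 1 otherwise), and numeric attributes use the range-normalized absolute difference. It is not taken from the repository above, which is written in MATLAB; the function name `heom`, the `numeric_ranges` argument, and the use of `None` to mark a missing value are all assumptions made for illustration.

```python
import math

def heom(x, y, numeric_ranges):
    """Illustrative HEOM distance between two records.

    numeric_ranges maps an attribute index to (max - min) of that numeric
    attribute; any index not in the mapping is treated as nominal.
    Missing values are represented by None (an assumption of this sketch).
    """
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if a is None or b is None:
            d = 1.0                       # missing value: maximum distance
        elif i in numeric_ranges:
            r = numeric_ranges[i]
            d = abs(a - b) / r if r else 0.0  # range-normalized difference
        else:
            d = 0.0 if a == b else 1.0    # overlap distance for nominal values
        total += d * d
    return math.sqrt(total)

# Hypothetical records: (age, colour, score), with one missing score.
a = (30, "red", None)
b = (40, "blue", 1.5)
ranges = {0: 50.0, 2: 3.0}
dist = heom(a, b, ranges)
print(round(dist, 4))
```

Because each per-attribute distance lies in [0, 1], numeric and nominal attributes contribute on the same scale, which is what makes the metric usable for nearest-neighbour imputation on mixed data.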

SebastianRokholt / Data-Science-Projects

A repository for various Data Science projects I've worked on, both university-related and in my spare time.

Updated Feb 27, 2024
Jupyter Notebook

SINGHxTUSHAR / Sensor-Fault-Detection

Sensor data from wafers is passed through a machine learning pipeline that determines whether a given wafer is faulty, removing the need for, and thus the cost of, hiring manual labour.

pipelines flask-api classification-algorithm svc-model simple-imputer xgbclassifier knn-imputer gradientboostingclassifier deployment-docs randomforrestclassifier

Updated May 19, 2024
Jupyter Notebook

ZL63388 / data-preparation-codes

This repository is a collection of basic code templates for data preparation. All the code shared here comes from practical exercises I completed in the Data Science Infinity program.

pandas feature-selection outlier-detection feature-scaling onehot-encoding simpleimputer knn-imputer

Updated Jul 27, 2021
Python

TheMrityunjayPathak / Feature-Engineering

Feature Engineering with Python

pipeline data-normalization imbalanced-data outlier-removal zscore iqr label-encoding simple-imputer dummy-variables onehot-encoding ordinal-encoding column-transformer knn-imputer data-standardization modified-zscore

Updated Apr 11, 2025
Jupyter Notebook

MariaDimopoulou / Churn-Prediction-Customer-Segmentation-in-E-Commerce

This project focuses on predicting customer churn in an e-commerce setting using machine learning techniques.

clustering pandas seaborn xgboost pca classification matplotlib roc-curve tsne dbscan kmeans-clustering smote silhouette-score knn-imputer

Updated Nov 23, 2023
Jupyter Notebook

zuhaib1214 / Feature-Engineering

This repository focuses on feature engineering concepts in detail. I hope you'll find it helpful.

standardization feature-engineering principal-component-analysis binarization z-score normalisation onehot-encoding simpleimputer ordinal-encoding labelencoder knn-imputer winsorization iterative-imputer percentile-method discritisation mean-median-imputation frequent-value-imputation

Updated Apr 7, 2023
Jupyter Notebook

mahnoorsheikh16 / Credit-Card-Default-Prediction

This project focuses on predicting whether a customer will default on their credit card payment in the upcoming month. Utilizing historical transaction data and customer demographics, the project employs various machine learning algorithms to distinguish between risky and non-risky customers for better credit risk management.

Updated May 12, 2025
Jupyter Notebook

nf-i / data-imputation-python

Data imputation is used when there are missing values in a dataset. It helps fill in these gaps with estimated values, enabling analysis and modeling. Imputation is crucial for maintaining dataset integrity and ensuring accurate insights from incomplete data.

python sklearn data-imputation simple-imputer knn-imputer sklearn-impute mice-imputer

Updated Oct 25, 2023
Python
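
As a rough illustration of the kind of imputation described above, here is a minimal sketch using scikit-learn's `KNNImputer`, which fills each missing entry with the mean of that feature over the nearest rows that have it. The toy matrix and the choice of `n_neighbors=2` are assumptions for the example and are not taken from the repository.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy data with two missing entries (np.nan); values are illustrative only.
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Each missing value is replaced by the mean of that column over the
# 2 nearest rows, using a Euclidean distance that skips missing coordinates.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)

print(X_filled)
# The missing entry in row 0 becomes 4.0 and the one in row 2 becomes 5.5.
```

In practice the imputer is usually fitted on training data only and then applied to validation or test data with `transform`, so that no information leaks from held-out rows.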

KasiMuthuveerappan / CAB-Ensemble-Learning-CHURN-Prediction

📘 This repository predicts OLA driver churn using ensemble methods—Bagging (Random Forest) and Boosting (XGBoost)—with KNN imputation and SMOTE. It reveals city-wise churn trends and key performance drivers, powering smarter, data-backed retention strategies for the ride-hailing industry.

random-forest exploratory-data-analysis xgboost decision-tree-classifier xgbclassifier bagging-ensemble smote-oversampler boosting-ensemble knn-imputer

Updated Apr 30, 2025
Jupyter Notebook

SamKazan / fraud-detection-ml

Machine learning models for enhanced fraud detection in e-commerce transactions, exploring feature engineering, distance prediction, and clustering analysis.

python data-science clustering scikit-learn eda data-visualization seaborn xgboost matplotlib dbscan kmeans-clustering hierarchical-clustering dataanalytics mlxtend knn-imputer

Updated Jun 27, 2024
Jupyter Notebook

YD5463 / TabularDataProject

We propose a method to fill NaN values using clustering.

python clustering dbscan-clustering knn-imputer

Updated Mar 13, 2023
Jupyter Notebook

bortch / second_hand_UK_car_challenge

Kaggle UK Used Car challenge

machine-learning random-forest kaggle knn-imputer

Updated Oct 30, 2021
Python

dfavenfre / customer_deposit_classifier

Streamlit app developed for bank customer deposit prediction, using a fine-tuned XGBClassifier model.

finance banking smote rfecv xgboost-classifier knn-imputer

Updated Aug 10, 2023
Jupyter Notebook

sayukiusui / Capstone-IDSCP

My Capstone for the HarvardX Course "Introduction to Data Science with Python"

data-science data jupyter-notebook python3 dataset scipy data-analysis logistic-regression knn data-analysis-python data-science-projects matplotlib-pyplot scikit-learn-python covid19 covid-19-data-analysis data-visualization-python knn-imputer

Updated May 16, 2024
Jupyter Notebook

AmbreenMahhoor / What-Is-Complete-Case-Analysis-Or-CCA

cca handling-missing-value knn-imputer mean-median-imputation frequent-value-imputation missing-indicator arbitrary-value-imputation automatically-select-imputer-parameters missing-category-imputation random-sample-imputation complete-case-analysis

Updated Aug 2, 2024
Jupyter Notebook

Allen-Ho-0302 / First-Time-Eligible-Arbitration-Salary-Prediction

Modelling the relationship between a player’s first-time eligible arbitration salary and multiple variables.

python random-forest heatmap-visualization lightgbm-regressor smogn knn-imputer

Updated Sep 23, 2022
Jupyter Notebook

Seghelicious / Cars45

cross-validation pipelines regression standardization preprocessing data-cleaning normalization correlation-coefficient random-forest-regressor model-development extreme-values knn-regressor knn-imputer grid-search-cv standard-scaler log-transform

Updated Mar 14, 2021
Jupyter Notebook

Gui-Sitton / Zyfra

The company develops efficiency solutions for heavy industry. The model should predict the amount of pure gold extracted from gold ore. You have the data on extraction and purification. The model will help optimize production and eliminate unprofitable parameters.

python data-science machine-learning predictive-modeling knn-regression knn-imputer

Updated Sep 4, 2023
Jupyter Notebook

ZG3Z / bts-weather-clustering

clustering geolocation nominatim kmeans-clustering plotly-express dataintegration knn-imputer aglomerative-hierarchical-clustering

Updated Jan 7, 2024
Jupyter Notebook

knn-imputer

Here are 28 public repositories matching this topic...
