scarlett-k-nhs/feature_selection_classification

Diabetes & Prediabetes Feature Selection with XGBoost (Feature Selection Workshop)

Overview

This repository contains code and analyses for building an XGBoost model that classifies individuals with diabetes from healthcare and lifestyle survey data. The primary focus is feature selection: using a variety of methods to reduce the number of features while maximising the F1 score.
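The selection goal above (fewer features, F1 as high as possible) can be framed as a search over feature subsets. The following is a minimal sketch of one such method, greedy backward elimination, and is not necessarily one of the methods used in the workshop notebooks. It assumes a hypothetical `evaluate(features)` callback; in practice that callback would train an XGBoost model on the given columns and return a held-out or cross-validated F1 score. The toy scorer and feature names in the usage section are illustrative only.

```python
from typing import Callable, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def backward_elimination(
    features: list[str],
    evaluate: Callable[[list[str]], float],
    tolerance: float = 0.0,
) -> tuple[list[str], float]:
    """Greedily drop features while the score stays within `tolerance` of the best so far."""
    selected = list(features)
    best = evaluate(selected)
    improved = True
    while improved and len(selected) > 1:
        improved = False
        # Score every one-feature removal and keep the highest-scoring candidate.
        candidates = [(evaluate([f for f in selected if f != drop]), drop) for drop in selected]
        score, drop = max(candidates)
        if score >= best - tolerance:
            selected.remove(drop)
            best = max(best, score)
            improved = True
    return selected, best


# Usage with a hypothetical scorer: only HighBP and BMI carry signal,
# and every extra feature costs 0.01 of F1.
def toy_f1(feats: list[str]) -> float:
    return 0.5 + 0.1 * ("HighBP" in feats) + 0.1 * ("BMI" in feats) - 0.01 * len(feats)


selected, score = backward_elimination(["HighBP", "BMI", "Smoker", "Fruits"], toy_f1)
print(selected)  # → ['HighBP', 'BMI']
```

Raising `tolerance` above zero trades a small F1 loss for a smaller feature set; with the default of zero a feature is only dropped if F1 does not fall.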

Folder structure

+---notebooks
|   +---Feature_Importance_Cheatsheet        <- Example code for computing feature importance for an XGBoost model.
|   +---Feature_Importance_Workshop          <- Trains an XGBoost model and applies feature selection.
|   +---load_diabetes_data                   <- Loads the data from UCI and balances the classes.
|
|   README.md                                <- Quick start guide
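The `load_diabetes_data` notebook is described as balancing the classes, but this README does not say how. A common approach is random undersampling, sketched below under that assumption: every class is reduced to the size of the smallest one, using a fixed seed for reproducibility. The function name `undersample`, the `label` key and the synthetic rows are hypothetical; the notebook may use a different balancing strategy.

```python
import random
from collections import Counter, defaultdict


def undersample(rows: list[dict], label_key: str, seed: int = 42) -> list[dict]:
    """Randomly undersample every class down to the size of the smallest class."""
    by_class: dict = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed so the balanced sample is reproducible
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))  # sample without replacement
    rng.shuffle(balanced)  # avoid rows being grouped by class
    return balanced


# Usage with synthetic, imbalanced data (90 negatives, 10 positives).
rows = [{"label": 0, "x": i} for i in range(90)] + [{"label": 1, "x": i} for i in range(10)]
balanced = undersample(rows, "label")
print(sorted(Counter(r["label"] for r in balanced).items()))  # → [(0, 10), (1, 10)]
```

Undersampling discards majority-class rows, so it is cheap but loses information; oversampling the minority class is the usual alternative when data is scarce.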

Getting Started

The Feature_Importance_Workshop and Feature_Importance_Cheatsheet notebooks run in Google Colab.

Dataset Overview: CDC Diabetes Health Indicators

This dataset comprises healthcare statistics and lifestyle survey information about individuals in the United States. It was collected by the Centers for Disease Control and Prevention (CDC) and is publicly available through the UCI Machine Learning Repository.

Citation: Markelle Kelly, Rachel Longjohn, Kolby Nottingham, The UCI Machine Learning Repository

Useful Resources

The work in this repository has been influenced by a number of helpful articles and tutorials:

License
