I am interested in advanced analytics related to supply chains, and decided to take on a modeling project to gain experience and better understand one particular challenge: predicting whether a product will go on backorder.
A backorder may be the result of strong sales performance (e.g., the product is in such high demand that production cannot keep up with sales). However, backorders can upset customers, lead to canceled orders, and erode customer loyalty. Companies want to avoid backorders, but they also want to avoid overstocking every product, which drives up inventory costs.
I used a dataset from Kaggle with approximately 1.9 million observations of parts over an 8-week period. The original source of the data is unreferenced.
Predictors include current inventory, sales history, forecasted sales, recommended stocking amount, and part risk flags (22 predictors in total). The data presented several challenges:
- Imbalanced outcome: Only 0.7% of parts actually go on backorder (see the quick data check after this list).
- Outliers and skewed predictors: Part quantities (stock, sales, etc.) can be on very different scales.
- Missing data: A few variables have missing values, and the missingness does not appear to be random.
- n>>p: There are many observations (1.9 million) relative to the number of predictors (22).
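As a quick check of the imbalance and missingness noted above, the sketch below profiles the raw file with pandas. The file name (`Kaggle_Training_Dataset_v2.csv`) and the outcome column name (`went_on_backorder`) are assumptions based on the public Kaggle dataset and may differ from your copy.

```python
import pandas as pd

# Load the raw training file; file and column names are assumptions
# based on the public "Can You Predict Product Backorders?" dataset.
df = pd.read_csv("Kaggle_Training_Dataset_v2.csv")

# Imbalanced outcome: the positive class should be roughly 0.7%.
print(df["went_on_backorder"].value_counts(normalize=True))

# Missing data: count missing values per predictor.
print(df.isna().sum().sort_values(ascending=False).head(10))
```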
We made several modeling decisions to address these issues:
- Model choice: We chose tree-based models, which typically perform well with imbalanced data.
- Robustness to outliers and skewed predictors: Because tree-based models partition the predictor space rather than estimate coefficients, outliers and skewness are much less of a concern than they are for many other predictive models.
- Dealing with missing data: The few variables with missing values were mean-imputed, and a binary indicator variable was created to flag whether the observation had missing data, in the hope of accounting for the data not being missing at random (a minimal imputation sketch follows this list).
- Cross-validation: We used 10-fold cross-validation to tune model hyperparameters (the maximum number of variables to try at each split and the minimum leaf size) and to compare model performance (see the tuning sketch below).
- Validation metric: The area under the ROC curve (AUC) was used because the outcome is so imbalanced; with only 0.7% positives, raw accuracy would be misleading, since always predicting "no backorder" is 99.3% accurate. By examining ROC curves after fitting the models, we can choose a classification cutoff rather than naively assuming a threshold of 0.5 (see the threshold sketch below).
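A minimal sketch of the imputation strategy described above, using scikit-learn's `SimpleImputer` with `add_indicator=True` to produce both the mean-imputed column and the binary missingness flag. The toy array stands in for a predictor such as lead time:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy column with missing entries, standing in for a real predictor.
X = np.array([[8.0], [np.nan], [2.0], [np.nan], [12.0]])

# Mean-impute and append a 0/1 indicator of missingness.
imputer = SimpleImputer(strategy="mean", add_indicator=True)
X_out = imputer.fit_transform(X)

# Each row is now [imputed_value, was_missing]; missing entries get the
# column mean (here ~7.33) and the indicator is set to 1.
print(X_out)
```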
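For the tuning step, the sketch below shows one way to run 10-fold cross-validation with AUC scoring over the two hyperparameters named above; in scikit-learn's random forest these correspond to `max_features` (variables tried per split) and `min_samples_leaf` (minimum leaf size). The grid values and the synthetic data are assumptions for illustration, not the project's actual settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic imbalanced data (~0.7% positive) as a stand-in for the
# backorder dataset, with the same number of predictors (22).
X, y = make_classification(n_samples=5000, n_features=22,
                           weights=[0.993], random_state=0)

# 10-fold CV over the two hyperparameters, scored by ROC AUC.
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid={"max_features": [3, 5, 8],
                "min_samples_leaf": [1, 5, 25]},
    scoring="roc_auc",
    cv=10,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```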
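Finally, a sketch of choosing a classification cutoff from the ROC curve rather than defaulting to 0.5. Youden's J statistic (maximizing TPR − FPR) is one common rule; the writeup does not specify which rule was actually used, so this is illustrative only, on synthetic scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
# Synthetic labels (~0.7% positive) and predicted probabilities,
# standing in for the fitted models' output.
y = rng.random(100_000) < 0.007
scores = np.clip(rng.normal(0.2 + 0.4 * y, 0.15), 0, 1)

print("AUC:", roc_auc_score(y, scores))

# Pick the threshold that maximizes Youden's J = TPR - FPR.
fpr, tpr, thresholds = roc_curve(y, scores)
best = np.argmax(tpr - fpr)
print("Chosen threshold:", thresholds[best])
```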