Online News Popularity Prediction
This work helps online news companies predict the popularity of a news article before it is published. Popularity is usually measured by the number of reads, likes or shares. For web news stakeholders, it is very valuable to predict accurately, before publication, how popular an article will be. It is therefore interesting and meaningful to apply machine learning techniques to predicting the popularity of online news articles. In this project, using a dataset of 39,643 news articles from the website Mashable, we attempt to find the simplest classification algorithm that accurately predicts whether a news story will become popular before it is published.
List of Predictive Attributes of the Dataset:
Each instance in the dataset has 61 attributes: 1 target attribute (number of shares), 2 non-predictive features (the URL of the article and the number of days between the article's publication and the dataset acquisition) and 58 predictive features.
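Turning these 61 attributes into a classification problem means dropping the 2 non-predictive columns and converting the share count into a popular/unpopular label. The sketch below assumes the UCI CSV column names `url`, `timedelta` and `shares` (the raw header pads names with a leading space, so they are stripped), and assumes the popularity cut-off is the median share count. The threshold actually used in this project is not stated, so treat it as a parameter. `prepare` is a hypothetical helper name.

```python
import pandas as pd

# Assumed column names from the UCI CSV header (leading spaces removed below).
NON_PREDICTIVE = ["url", "timedelta"]
TARGET = "shares"


def prepare(df: pd.DataFrame, threshold=None):
    """Split a raw frame into predictive features X and a binary label y.

    Assumption: an article is "popular" (label 1) when its share count is
    at or above `threshold`; if no threshold is given, the median of the
    `shares` column is used.
    """
    # The raw CSV header has names like " shares"; strip the whitespace.
    df = df.rename(columns=lambda c: c.strip())
    if threshold is None:
        threshold = df[TARGET].median()
    y = (df[TARGET] >= threshold).astype(int)
    # Keep only the predictive features (58 on the full dataset).
    X = df.drop(columns=NON_PREDICTIVE + [TARGET])
    return X, y, threshold
```

Usage, assuming the file is saved locally as `OnlineNewsPopularity.csv` (hypothetical path): `X, y, t = prepare(pd.read_csv("OnlineNewsPopularity.csv"))`. A median cut-off gives roughly balanced classes, which keeps plain accuracy a meaningful comparison metric.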
Graphs and Visualizations
Popular/unpopular news by day of the week
Popular/unpopular news by article category
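The two breakdowns above can be tabulated from the dataset's one-hot columns before plotting. This is a minimal sketch assuming the UCI column names `weekday_is_<day>` and `data_channel_is_<channel>` and a binary label where 1 means popular; `counts_by_onehot` is a hypothetical helper, not part of any library.

```python
import pandas as pd

# Assumed suffixes of the one-hot columns in the UCI dataset.
DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
CHANNELS = ["lifestyle", "entertainment", "bus", "socmed", "tech", "world"]


def counts_by_onehot(df: pd.DataFrame, prefix: str, names, label: pd.Series) -> pd.DataFrame:
    """Count popular (label 1) and unpopular (label 0) articles per one-hot group.

    Each group `name` is selected by the column `prefix + name` being 1.
    Articles with no active column in the group are simply not counted.
    """
    rows = {}
    for name in names:
        mask = df[prefix + name] == 1
        rows[name] = {
            "popular": int((label[mask] == 1).sum()),
            "unpopular": int((label[mask] == 0).sum()),
        }
    # One row per group, columns "popular" and "unpopular".
    return pd.DataFrame.from_dict(rows, orient="index")
```

Given a feature frame `X` and label `y`, `counts_by_onehot(X, "weekday_is_", DAYS, y).plot.bar()` (requires matplotlib) draws the weekday chart, and the same call with `"data_channel_is_"` and `CHANNELS` draws the category chart. Note that some articles belong to none of the six channels, so the category counts need not sum to the dataset size.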
Before implementing each algorithm, I randomly split the dataset, restricted to that algorithm's own selected features, into a training set (90%) and a testing set (10%). Logistic regression, random forest (RF) and AdaBoost are implemented with the scikit-learn classes LogisticRegression(), RandomForestClassifier() and AdaBoostClassifier(), respectively.
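A minimal sketch of this setup: a random 90/10 split followed by the three scikit-learn classifiers with default hyperparameters, scored by test accuracy. Assumptions: the per-algorithm feature selection is not described in the text, so every model here sees the same `X`; a fixed `random_state` is added only for reproducibility; `compare_default_classifiers` is a hypothetical helper name.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def compare_default_classifiers(X, y, test_size=0.10, seed=0):
    """Fit the three classifiers on a random split and return test accuracies.

    test_size=0.10 reproduces the 90% / 10% split described above.
    Apart from random_state, all models use scikit-learn defaults.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    models = {
        "logistic_regression": LogisticRegression(),
        "random_forest": RandomForestClassifier(random_state=seed),
        "adaboost": AdaBoostClassifier(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Fraction of held-out articles whose popular/unpopular label is correct.
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores
```

Passing `test_size=0.15` gives an 85/15 split instead. On unscaled features, the default `LogisticRegression` may emit a convergence warning; wrapping it in a pipeline with `StandardScaler` is a common remedy, but that would no longer be a pure default-parameter comparison.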
Performance of the three classifiers under default parameter settings:
Final Model Result and Accuracy Scores
The final model was tested with a test-set fraction of 0.15 (15% of the data held out for testing).
After comparing the results of all three classifiers, I concluded that random forest is the most accurate of the three, with an accuracy of 67%.
Dataset link: https://archive.ics.uci.edu/ml/datasets/Online+News+