Twitter has been one of the most popular social media platforms for crypto-related topics since Bitcoin first reached 1,000 USD. This is a sample of training machine learning models on 100k Bitcoin-related tweets to predict the sentiment of tweets.
Part 1 Data Preparation
- Data wrangling
- Standardization of the words in the tweets
- Tokenization of words
- Vectorization of words for Machine Learning
- Dimensionality reduction of features (Singular Value Decomposition)
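
The preparation steps above could look roughly like the following minimal sketch. It assumes scikit-learn is used for vectorization (TF-IDF) and truncated SVD; the toy tweets, the cleaning rules, and the helper names `standardize` and `tokenize` are hypothetical and only stand in for the real 100k-tweet dataset and its actual cleaning logic.

```python
import re

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy tweets standing in for the real 100k-tweet dataset.
tweets = [
    "Bitcoin to the MOON!!! https://example.com #BTC",
    "@trader I sold all my bitcoin, this crash is awful",
    "BTC looks strong today, buying more",
    "Not sure about bitcoin right now... too risky",
]


def standardize(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, and non-letter characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # drop '#', digits, punctuation
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


def tokenize(text: str) -> list[str]:
    """Split a standardized tweet into word tokens on whitespace."""
    return text.split()


cleaned = [standardize(t) for t in tweets]
tokens = [tokenize(t) for t in cleaned]

# Vectorization: TF-IDF turns each tweet into a sparse vector of word weights.
# The custom tokenizer is passed in, so the default token_pattern is disabled.
vectorizer = TfidfVectorizer(tokenizer=tokenize, lowercase=False, token_pattern=None)
X_tfidf = vectorizer.fit_transform(cleaned)

# Dimensionality reduction: truncated SVD works directly on the sparse matrix.
# n_components=2 is only suitable for this toy example.
svd = TruncatedSVD(n_components=2, random_state=42)
X_reduced = svd.fit_transform(X_tfidf)

print(cleaned[0])
print(tokens[0])
print(X_tfidf.shape, "->", X_reduced.shape)
```

On a real dataset the number of SVD components would be much larger than 2 and is a tuning choice; `svd.explained_variance_ratio_` can help judge how much information is kept.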
Part 2 Train machine learning models on the dataset
- Split the dataset into a training set (80%) and a test set (20%)
- Train a Random Forest model on the training set
- Evaluate the model's results with accuracy score, recall score, precision score, F1 score and confusion matrix
- Train a Logistic Regression model on the training set
- Evaluate the model's results with accuracy score, recall score, precision score, F1 score and confusion matrix
- Optimize the Logistic Regression model with different thresholds
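
The training and evaluation steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: it assumes scikit-learn, a binary sentiment label, and synthetic features from `make_classification` in place of the SVD-reduced tweet vectors. The helper `evaluate`, the hyperparameters, and the candidate thresholds (0.3, 0.5, 0.7) are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the reduced tweet features and binary sentiment labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# 80% training set, 20% test set; stratify keeps the class balance in both.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)


def evaluate(y_true, y_pred):
    """Compute accuracy, recall, precision, F1, and the confusion matrix."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }


# Train both models on the training set and score them on the test set.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = evaluate(y_test, model.predict(X_test))
    print(name, results[name])

# Threshold tuning: classify as positive when P(class 1) >= threshold,
# instead of relying on the default cutoff of 0.5.
proba = models["logistic_regression"].predict_proba(X_test)[:, 1]
threshold_results = {}
for threshold in (0.3, 0.5, 0.7):
    y_pred = (proba >= threshold).astype(int)
    threshold_results[threshold] = evaluate(y_test, y_pred)
    print(threshold, threshold_results[threshold])
```

Lowering the threshold labels more tweets as positive, so recall can only stay the same or rise while precision typically falls; which trade-off is preferable depends on how the sentiment predictions will be used.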