Python 3.5 classification of tweets (positive or negative) using NLTK-3 and sklearn.
An analysis of the twitter data set included in the nltk corpus.
- An implementation of `nltk.NaiveBayesClassifier` trained against 1000 tweets. Implemented in `Train_Classifiers.py`.
- Using `sklearn`:
  - Naive Bayes:
    - `MultinomialNB`
    - `BernoulliNB`
  - Linear Model:
    - `LogisticRegression`
    - `SGDClassifier`
  - SVM:
    - `SVC`
    - `LinearSVC`
    - `NuSVC`

  Implemented in `Scikit_Learn_Classifiers.py`.
- Implemented a voting system to choose the best out of all the learning methods. Implemented in `sentiment_mod.py`.
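
The voting idea described above can be sketched as a simple majority vote. This is only an illustration and may differ from what `sentiment_mod.py` actually does: it assumes each classifier exposes a `classify(features)` method (as NLTK classifier objects do), and the names `VoteClassifier` and `KeywordClassifier` are hypothetical stand-ins so that the example runs without any trained models.

```python
from collections import Counter


class VoteClassifier:
    """Combine several classifiers by majority vote (illustrative sketch)."""

    def __init__(self, *classifiers):
        if not classifiers:
            raise ValueError("at least one classifier is required")
        self._classifiers = classifiers

    def _votes(self, features):
        # Ask every wrapped classifier for its label.
        return [c.classify(features) for c in self._classifiers]

    def classify(self, features):
        # most_common(1) returns [(label, count)] for the most frequent label.
        return Counter(self._votes(features)).most_common(1)[0][0]

    def confidence(self, features):
        # Fraction of classifiers that agree with the winning label.
        votes = self._votes(features)
        top_count = Counter(votes).most_common(1)[0][1]
        return top_count / len(votes)


class KeywordClassifier:
    """Hypothetical stand-in for a trained model: 'pos' if the keyword is present."""

    def __init__(self, keyword):
        self.keyword = keyword

    def classify(self, features):
        return "pos" if features.get(self.keyword) else "neg"


voter = VoteClassifier(
    KeywordClassifier("great"),
    KeywordClassifier("love"),
    KeywordClassifier("fun"),
)
sample = {"great": True, "love": True, "fun": False}
label = voter.classify(sample)
agreement = voter.confidence(sample)
print(label, round(agreement, 2))
```

Using an odd number of classifiers avoids ties when there are only two labels; the agreement ratio gives a rough sense of how much to trust a given vote.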
| Classifiers | Accuracy achieved |
|---|---|
| `nltk.NaiveBayesClassifier` | 73.0% |
| ScikitLearn Implementations | |
| `BernoulliNB` | 72.0% |
| `MultinomialNB` | 75.0% |
| `LogisticRegression` | 71.0% |
| `SGDClassifier` | 69.0% |
| `SVC` | 48.0% |
| `LinearSVC` | 74.0% |
| `NuSVC` | 75.0% |
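
For readers unfamiliar with how a percentage like those in the table is obtained, here is a minimal sketch of an accuracy calculation: the share of test samples whose predicted label matches the true label. The function name `accuracy_percent` and the sample labels are made up for illustration; the README does not state how the figures above were measured (NLTK also ships `nltk.classify.accuracy`, which returns the same ratio as a fraction).

```python
def accuracy_percent(predicted, actual):
    """Return the percentage of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    # True counts as 1 and False as 0, so sum() counts the matches.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Illustrative labels only, not taken from the real data set.
actual = ["pos", "neg", "pos", "neg"]
predicted = ["pos", "neg", "neg", "neg"]
score = accuracy_percent(predicted, actual)
print("{:.1f}%".format(score))  # prints 75.0%
```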
The simplest (and suggested) way is to install the required packages and their dependencies using either Anaconda or Miniconda.
After that, you can run:
    $ conda update conda
    $ conda install scikit-learn nltk
The dataset used in this package is bundled along with the nltk package.
Run your Python interpreter:
    >>> import nltk
    >>> nltk.download('stopwords')
    >>> nltk.download('movie_reviews')
NOTE: You can check system-specific installation instructions on the official nltk website.
Check that everything works so far by running your interpreter again and importing these:
    >>> import nltk
    >>> from nltk.corpus import stopwords, movie_reviews
    >>> import sklearn
If these imports work for you, then you are good to go!
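
The import check above can also be scripted, which helps when setting up several machines. The following is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical, and it only checks that the top-level packages can be found, not that the NLTK corpora have been downloaded.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["nltk", "sklearn"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```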
- Clone the repo
      $ git clone https://github.com/aalind0/Movie_Reviews-Sentiment_Analysis
      $ cd Movie_Reviews-Sentiment_Analysis
- Order of running
  - `NLTK_Naive_Bayes.py`
  - `Scikit_Learn_Classifiers.py`
  - `Voting_Algos.py`
- Hack away!
"So what? Well, this is pretty basic!"
Yes, it is, but hey, we all start somewhere, right?
Coming up: I am working on a Twitter Sentiment Analysis project which first trains on a given data set and then takes in live Twitter feeds, analyses them, and plots them for data visualization.
You can follow me on Twitter @singh_aalind to keep tabs on it.
Hacked together by Aalind Singh.