@@ -158,7 +158,7 @@ sns.countplot(df.score)
 plt.xlabel('review score');
 ```

-![png](images/pytorch-07/08_sentiment_analysis_with_bert_16_0.png)
+![png](images/pytorch-07/08.sentiment-analysis-with-bert_15_0.png)

 That's hugely imbalanced, but it's okay. We're going to convert the dataset into negative, neutral and positive sentiment:

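The conversion mentioned in this hunk can be sketched as a small mapping function. This is a minimal sketch, not necessarily the notebook's own code: it assumes scores run from 1 to 5, that 3 stars count as neutral (as stated later in this diff), and that lower scores are negative and higher ones positive. The function name `to_sentiment` is an assumption; the class order follows the classification report further down.

```python
def to_sentiment(score: int) -> int:
    """Map a 1-5 star review score to a class index (assumed thresholds)."""
    if score <= 2:
        return 0  # negative
    if score == 3:
        return 1  # neutral
    return 2      # positive


class_names = ['negative', 'neutral', 'positive']
scores = [1, 2, 3, 4, 5]
labels = [class_names[to_sentiment(s)] for s in scores]
```

Merging five score levels into three classes is what evens out the class counts in the next plot.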
@@ -185,7 +185,7 @@ plt.xlabel('review sentiment')
 ax.set_xticklabels(class_names);
 ```

-![png](images/pytorch-07/08_sentiment_analysis_with_bert_20_0.png)
+![png](images/pytorch-07/08.sentiment-analysis-with-bert_19_0.png)

 The balance was (mostly) restored.

@@ -366,7 +366,7 @@ plt.xlim([0, 256]);
 plt.xlabel('Token count');
 ```

-![png](images/pytorch-07/08_sentiment_analysis_with_bert_50_0.png)
+![png](images/pytorch-07/08.sentiment-analysis-with-bert_49_0.png)

 Most of the reviews seem to contain fewer than 128 tokens, but we'll be on the safe side and choose a maximum length of 160.

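What a fixed maximum length means for each review can be sketched in plain Python. This is an illustration only: `pad_or_truncate` and `PAD_ID` are hypothetical names, the padding id of 0 is an assumption that depends on the tokenizer, and special tokens such as `[CLS]` and `[SEP]` are ignored here. A real tokenizer handles them when it pads and truncates.

```python
MAX_LEN = 160  # maximum sequence length chosen in the text
PAD_ID = 0     # assumed padding token id (depends on the tokenizer)


def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Return (input_ids, attention_mask), both exactly max_len long."""
    ids = list(token_ids)[:max_len]   # cut off anything past max_len
    mask = [1] * len(ids)             # 1 marks a real token
    padding = max_len - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding  # 0 marks padding


short_ids, short_mask = pad_or_truncate(range(1, 11))   # 10 tokens
long_ids, long_mask = pad_or_truncate(range(1, 301))    # 300 tokens
```

Every sequence comes out with the same length, which is why the batches later in the document have shape `batch size x 160`.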
@@ -529,19 +529,19 @@ class SentimentClassifier(nn.Module):
         self.bert = BertModel.from_pretrained(PRE_TRAINED_MODEL_NAME)
         self.drop = nn.Dropout(p=0.3)
         self.out = nn.Linear(self.bert.config.hidden_size, n_classes)
-        self.softmax = nn.Softmax(dim=1)

     def forward(self, input_ids, attention_mask):
         _, pooled_output = self.bert(
             input_ids=input_ids,
             attention_mask=attention_mask
         )
         output = self.drop(pooled_output)
-        output = self.out(output)
-        return self.softmax(output)
+        return self.out(output)
 ```

-Our classifier delegates most of the heavy lifting to the BertModel. We use a dropout layer for some regularization and a fully-connected layer for our output. This should work like any other PyTorch model. Let's create an instance and move it to the GPU:
+Our classifier delegates most of the heavy lifting to the BertModel. We use a dropout layer for some regularization and a fully-connected layer for our output. Note that we're returning the raw output of the last layer since that is required for the cross-entropy loss function in PyTorch to work.
+
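Why the raw scores matter can be sketched without PyTorch. `nn.CrossEntropyLoss` applies a log-softmax to its input internally, so a model that already ends in a softmax layer has softmax applied twice. The plain-Python sketch below mimics that behaviour; the function names and the example scores are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(logits, target):
    """Negative log-probability of the target class; softmax is applied inside."""
    return -math.log(softmax(logits)[target])


logits = [4.0, 0.5, -1.0]  # made-up raw scores for one review
target = 0                 # index of the correct class

# Correct usage: the loss receives raw scores.
loss_raw = cross_entropy_from_logits(logits, target)
# Old behaviour: the model outputs probabilities, so softmax runs twice.
loss_double = cross_entropy_from_logits(softmax(logits), target)
```

With three classes, the double-softmax loss cannot fall below ln(1 + 2/e), roughly 0.55, however confident the model is. That is consistent with the removed 50-epoch log later in this diff, where the training loss stalls at about 0.63.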
+This should work like any other PyTorch model. Let's create an instance and move it to the GPU:

 ```py
 model = SentimentClassifier(len(class_names))
@@ -561,10 +561,10 @@ print(attention_mask.shape) # batch size x seq length
 torch.Size([16, 160])
 torch.Size([16, 160])

-And get predictions from our (untrained) model:
+To get the predicted probabilities from our (still untrained) model, we'll apply the softmax function to the outputs:

 ```py
-model(input_ids, attention_mask)
+F.softmax(model(input_ids, attention_mask), dim=1)
 ```

 tensor([[0.5879, 0.0842, 0.3279],
@@ -589,7 +589,7 @@ model(input_ids, attention_mask)
 To reproduce the training procedure from the BERT paper, we'll use the [AdamW](https://huggingface.co/transformers/main_classes/optimizer_schedules.html#adamw) optimizer provided by Hugging Face. It corrects weight decay, so it's similar to the original paper. We'll also use a linear scheduler with no warmup steps:

 ```py
-EPOCHS = 50
+EPOCHS = 10

 optimizer = AdamW(model.parameters(), lr=2e-5, correct_bias=False)
 total_steps = len(train_data_loader) * EPOCHS
@@ -730,30 +730,58 @@ for epoch in range(EPOCHS):
         best_accuracy = val_acc
 ```

-Epoch 1/50
+Epoch 1/10
+----------
+Train loss 0.7330631300571541 accuracy 0.6653729447463129
+Val loss 0.5767546480894089 accuracy 0.7776365946632783
+
+Epoch 2/10
 ----------
-Train loss 0.9025589391151885 accuracy 0.6324183191023922
-Val loss 0.8391157329082489 accuracy 0.7115628970775095
+Train loss 0.4158683338330777 accuracy 0.8420012701997036
+Val loss 0.5365073362737894 accuracy 0.832274459974587

-Epoch 2/50
+Epoch 3/10
 ----------
-Train loss 0.8013420265765007 accuracy 0.7453955260743773
-Val loss 0.8175631034374237 accuracy 0.7357052096569251
+Train loss 0.24015077009679367 accuracy 0.922023851527768
+Val loss 0.5074492372572422 accuracy 0.8716645489199493

-.....
+Epoch 4/10
+----------
+Train loss 0.16012676668187295 accuracy 0.9546962105708843
+Val loss 0.6009970247745514 accuracy 0.8703939008894537

-Epoch 49/50
+Epoch 5/10
 ----------
-Train loss 0.6315805039475788 accuracy 0.9197657187213323
-Val loss 0.7163282692432403 accuracy 0.8424396442185516
+Train loss 0.11209654617575301 accuracy 0.9675393409074872
+Val loss 0.7367783848941326 accuracy 0.8742058449809403

-Epoch 50/50
+Epoch 6/10
 ----------
-Train loss 0.631561377785814 accuracy 0.9199068520217346
-Val loss 0.7175787663459778 accuracy 0.841168996188056
+Train loss 0.08572274737026433 accuracy 0.9764307388328276
+Val loss 0.7251267762482166 accuracy 0.8843710292249047

-CPU times: user 2h 27min 31s, sys: 1h 7min, total: 3h 34min 32s
-Wall time: 3h 35min 51s
+Epoch 7/10
+----------
+Train loss 0.06132202987342602 accuracy 0.9833462705525369
+Val loss 0.7083295831084251 accuracy 0.889453621346887
+
+Epoch 8/10
+----------
+Train loss 0.050604159273123096 accuracy 0.9849693035071626
+Val loss 0.753860274553299 accuracy 0.8907242693773825
+
+Epoch 9/10
+----------
+Train loss 0.04373276197092931 accuracy 0.9862395032107826
+Val loss 0.7506809896230697 accuracy 0.8919949174078781
+
+Epoch 10/10
+----------
+Train loss 0.03768671146314381 accuracy 0.9880036694658105
+Val loss 0.7431786182522774 accuracy 0.8932655654383737
+
+CPU times: user 29min 54s, sys: 13min 28s, total: 43min 23s
+Wall time: 43min 43s

 Note that we're storing the state of the best model, indicated by the highest validation accuracy.

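The "keep the best model" rule can be replayed on the validation accuracies from the 10-epoch log in this hunk. This is a sketch of the selection logic only, with the accuracies rounded to four places. `best_accuracy` appears in the source; `best_epoch` and `saved_at` are hypothetical names added for illustration, and no model is actually saved here.

```python
# Validation accuracies copied from the 10-epoch log (rounded to 4 places).
val_accs = [0.7776, 0.8323, 0.8717, 0.8704, 0.8742,
            0.8844, 0.8895, 0.8907, 0.8920, 0.8933]

best_accuracy = 0.0
best_epoch = None
saved_at = []  # epochs at which a checkpoint would have been written

for epoch, val_acc in enumerate(val_accs, start=1):
    if val_acc > best_accuracy:
        # In the real loop, this is where the model state would be stored.
        best_accuracy = val_acc
        best_epoch = epoch
        saved_at.append(epoch)
```

Under this rule, only epoch 4 fails to produce a new checkpoint, and the final checkpoint comes from epoch 10. The validation loss is lowest at epoch 3 in the same log, so selecting by loss instead of accuracy would pick a different checkpoint.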
@@ -770,12 +798,14 @@ plt.legend()
 plt.ylim([0, 1]);
 ```

-![png](images/pytorch-07/08_sentiment_analysis_with_bert_94_0.png)
+
+
+The training accuracy starts to approach 100% after 10 epochs or so. You might try to fine-tune the parameters a bit more, but this will be good enough for us.

 Don't want to wait? Uncomment the next cell to download my pre-trained model:

 ```py
-# !gdown --id 1ZZFaHiJjsftT2fc4vUZbXZfVkYVDV5-y
+# !gdown --id 1V8itWtowCYnb2Bc9KlK9SxGff9WwmogA

 # model = SentimentClassifier(len(class_names))
 # model.load_state_dict(torch.load('best_model_state.bin'))
@@ -798,9 +828,9 @@ test_acc, _ = eval_model(
 test_acc.item()
 ```

-0.8223350253807106
+0.883248730964467

-The accuracy is about 2% lower on the test set. Our model seems to generalize well.
+The accuracy is about 1% lower on the test set. Our model seems to generalize well.

 We'll define a helper function to get the predictions from our model:

@@ -855,13 +885,13 @@ print(classification_report(y_test, y_pred, target_names=class_names))

               precision    recall  f1-score   support

-    negative       0.81      0.81      0.81       245
-     neutral       0.78      0.75      0.77       254
-    positive       0.87      0.89      0.88       289
+    negative       0.89      0.87      0.88       245
+     neutral       0.83      0.85      0.84       254
+    positive       0.92      0.93      0.92       289

-    accuracy                           0.82       788
-   macro avg       0.82      0.82      0.82       788
-weighted avg       0.82      0.82      0.82       788
+    accuracy                           0.88       788
+   macro avg       0.88      0.88      0.88       788
+weighted avg       0.88      0.88      0.88       788

 Looks like it is really hard to classify neutral (3 stars) reviews. And I can tell you from experience, looking at many reviews, those are hard to classify.

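For reference, the `macro avg` and `weighted avg` rows can be re-derived from the per-class rows of the updated report. The sketch below uses the rounded f1-scores and supports shown in this hunk, so it reproduces the averages only to two decimals. The variable names are illustrative.

```python
# (f1-score, support) per class, copied from the updated report.
report = {
    'negative': (0.88, 245),
    'neutral':  (0.84, 254),
    'positive': (0.92, 289),
}

total = sum(support for _, support in report.values())

# Macro average: every class counts equally.
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)

# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(f1 * support for f1, support in report.values()) / total
```

Because the three classes have similar support (245, 254 and 289 examples), the two averages nearly coincide, and the neutral class is the one pulling both down.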
@@ -880,7 +910,7 @@ df_cm = pd.DataFrame(cm, index=class_names, columns=class_names)
 show_confusion_matrix(df_cm)
 ```

-![png](images/pytorch-07/08_sentiment_analysis_with_bert_106_0.png)
+

 This confirms that our model is having difficulty classifying neutral reviews. It mistakes those for negative and positive at a roughly equal frequency.

@@ -923,7 +953,7 @@ plt.xlabel('probability')
 plt.xlim([0, 1]);
 ```

-![png](images/pytorch-07/08_sentiment_analysis_with_bert_111_0.png)
+

 ### Predicting on Raw Text
