Commit 0b9be6d

Update notebook
1 parent 8295017 commit 0b9be6d

11 files changed, +195 -1258 lines changed

08.sentiment-analysis-with-bert.ipynb

+128 -1,221
Large diffs are not rendered by default.

manuscript/08.sentiment-analysis-with-bert.md

+67 -37
@@ -158,7 +158,7 @@ sns.countplot(df.score)
 plt.xlabel('review score');
 ```

-![png](images/pytorch-07/08_sentiment_analysis_with_bert_16_0.png)
+![png](images/pytorch-07/08.sentiment-analysis-with-bert_15_0.png)

 That's hugely imbalanced, but it's okay. We're going to convert the dataset into negative, neutral and positive sentiment:

@@ -185,7 +185,7 @@ plt.xlabel('review sentiment')
 ax.set_xticklabels(class_names);
 ```

-![png](images/pytorch-07/08_sentiment_analysis_with_bert_20_0.png)
+![png](images/pytorch-07/08.sentiment-analysis-with-bert_19_0.png)

 The balance was (mostly) restored.

@@ -366,7 +366,7 @@ plt.xlim([0, 256]);
 plt.xlabel('Token count');
 ```

-![png](images/pytorch-07/08_sentiment_analysis_with_bert_50_0.png)
+![png](images/pytorch-07/08.sentiment-analysis-with-bert_49_0.png)

 Most of the reviews seem to contain less than 128 tokens, but we'll be on the safe side and choose a maximum length of 160.

@@ -529,19 +529,19 @@ class SentimentClassifier(nn.Module):
     self.bert = BertModel.from_pretrained(PRE_TRAINED_MODEL_NAME)
     self.drop = nn.Dropout(p=0.3)
     self.out = nn.Linear(self.bert.config.hidden_size, n_classes)
-    self.softmax = nn.Softmax(dim=1)

   def forward(self, input_ids, attention_mask):
     _, pooled_output = self.bert(
       input_ids=input_ids,
       attention_mask=attention_mask
     )
     output = self.drop(pooled_output)
-    output = self.out(output)
-    return self.softmax(output)
+    return self.out(output)
 ```

-Our classifier delegates most of the heavy lifting to the BertModel. We use a dropout layer for some regularization and a fully-connected layer for our output. This should work like any other PyTorch model. Let's create an instance and move it to the GPU:
+Our classifier delegates most of the heavy lifting to the BertModel. We use a dropout layer for some regularization and a fully-connected layer for our output. Note that we return the raw outputs (logits) of the last layer, because PyTorch's cross-entropy loss applies softmax internally and expects unnormalized scores.
+
+This should work like any other PyTorch model. Let's create an instance and move it to the GPU:

 ```py
 model = SentimentClassifier(len(class_names))
@@ -561,10 +561,10 @@ print(attention_mask.shape) # batch size x seq length
 torch.Size([16, 160])
 torch.Size([16, 160])

-And get predictions from our (untrained) model:
+To get the predicted probabilities from our (still untrained) model, we'll apply the softmax function to the outputs:

 ```py
-model(input_ids, attention_mask)
+F.softmax(model(input_ids, attention_mask), dim=1)
 ```

 tensor([[0.5879, 0.0842, 0.3279],
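
The two hunks above move softmax out of `forward` and apply it only when probabilities are wanted. Below is a minimal pure-Python sketch of why that matters, assuming the loss applies log-softmax to whatever scores it receives (as PyTorch's `nn.CrossEntropyLoss` does). The helper names `softmax` and `cross_entropy` and the example logits are hypothetical and do not come from the notebook.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target):
    """Negative log-softmax of the target class, computed from raw scores."""
    return -math.log(softmax(scores)[target])

# Hypothetical logits for one review over (negative, neutral, positive).
logits = [2.0, -1.0, 0.5]
target = 0  # true class: negative

probs = softmax(logits)
loss_from_logits = cross_entropy(logits, target)

# Feeding probabilities into the loss applies softmax a second time.
loss_from_probs = cross_entropy(probs, target)

# Even a perfect one-hot "probability" vector cannot push the doubled
# softmax loss below ln(1 + 2/e), roughly 0.55, for three classes.
floor = cross_entropy([1.0, 0.0, 0.0], target)

print(round(loss_from_logits, 4), round(loss_from_probs, 4), round(floor, 4))
```

If the old model was trained with a loss of this kind on softmax outputs, a floor near 0.55 would be consistent with the removed 50-epoch log, where the training loss stalls around 0.63 despite roughly 92% training accuracy.
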
@@ -589,7 +589,7 @@ model(input_ids, attention_mask)
 To reproduce the training procedure from the BERT paper, we'll use the [AdamW](https://huggingface.co/transformers/main_classes/optimizer_schedules.html#adamw) optimizer provided by Hugging Face. It corrects weight decay, so it's similar to the original paper. We'll also use a linear scheduler with no warmup steps:

 ```py
-EPOCHS = 50
+EPOCHS = 10

 optimizer = AdamW(model.parameters(), lr=2e-5, correct_bias=False)
 total_steps = len(train_data_loader) * EPOCHS
@@ -730,30 +730,58 @@ for epoch in range(EPOCHS):
     best_accuracy = val_acc
 ```

-Epoch 1/50
+Epoch 1/10
+----------
+Train loss 0.7330631300571541 accuracy 0.6653729447463129
+Val loss 0.5767546480894089 accuracy 0.7776365946632783
+
+Epoch 2/10
 ----------
-Train loss 0.9025589391151885 accuracy 0.6324183191023922
-Val loss 0.8391157329082489 accuracy 0.7115628970775095
+Train loss 0.4158683338330777 accuracy 0.8420012701997036
+Val loss 0.5365073362737894 accuracy 0.832274459974587

-Epoch 2/50
+Epoch 3/10
 ----------
-Train loss 0.8013420265765007 accuracy 0.7453955260743773
-Val loss 0.8175631034374237 accuracy 0.7357052096569251
+Train loss 0.24015077009679367 accuracy 0.922023851527768
+Val loss 0.5074492372572422 accuracy 0.8716645489199493

-.....
+Epoch 4/10
+----------
+Train loss 0.16012676668187295 accuracy 0.9546962105708843
+Val loss 0.6009970247745514 accuracy 0.8703939008894537

-Epoch 49/50
+Epoch 5/10
 ----------
-Train loss 0.6315805039475788 accuracy 0.9197657187213323
-Val loss 0.7163282692432403 accuracy 0.8424396442185516
+Train loss 0.11209654617575301 accuracy 0.9675393409074872
+Val loss 0.7367783848941326 accuracy 0.8742058449809403

-Epoch 50/50
+Epoch 6/10
 ----------
-Train loss 0.631561377785814 accuracy 0.9199068520217346
-Val loss 0.7175787663459778 accuracy 0.841168996188056
+Train loss 0.08572274737026433 accuracy 0.9764307388328276
+Val loss 0.7251267762482166 accuracy 0.8843710292249047

-CPU times: user 2h 27min 31s, sys: 1h 7min, total: 3h 34min 32s
-Wall time: 3h 35min 51s
+Epoch 7/10
+----------
+Train loss 0.06132202987342602 accuracy 0.9833462705525369
+Val loss 0.7083295831084251 accuracy 0.889453621346887
+
+Epoch 8/10
+----------
+Train loss 0.050604159273123096 accuracy 0.9849693035071626
+Val loss 0.753860274553299 accuracy 0.8907242693773825
+
+Epoch 9/10
+----------
+Train loss 0.04373276197092931 accuracy 0.9862395032107826
+Val loss 0.7506809896230697 accuracy 0.8919949174078781
+
+Epoch 10/10
+----------
+Train loss 0.03768671146314381 accuracy 0.9880036694658105
+Val loss 0.7431786182522774 accuracy 0.8932655654383737
+
+CPU times: user 29min 54s, sys: 13min 28s, total: 43min 23s
+Wall time: 43min 43s

 Note that we're storing the state of the best model, indicated by the highest validation accuracy.
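
The new 10-epoch log makes it possible to check which epoch the "best model" rule actually selects. The sketch below copies the validation numbers from that log (rounded to four decimals) and compares selection by highest validation accuracy with selection by lowest validation loss. The strict `>` comparison and all variable names other than `best_accuracy` are assumptions, since the hunk shows only the assignment inside the training loop.

```python
# Validation metrics per epoch, rounded from the 10-epoch log above.
val_loss = [0.5768, 0.5365, 0.5074, 0.6010, 0.7368,
            0.7251, 0.7083, 0.7539, 0.7507, 0.7432]
val_acc = [0.7776, 0.8323, 0.8717, 0.8704, 0.8742,
           0.8844, 0.8895, 0.8907, 0.8920, 0.8933]

best_accuracy = 0.0
best_epoch_by_acc = None
for epoch, acc in enumerate(val_acc, start=1):
    # Assumed rule: keep the checkpoint whenever accuracy strictly improves.
    if acc > best_accuracy:
        best_accuracy = acc
        best_epoch_by_acc = epoch

# Alternative criterion: the epoch with the lowest validation loss.
best_epoch_by_loss = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

print(best_epoch_by_acc, best_accuracy)  # 10 0.8933
print(best_epoch_by_loss)                # 3
```

Validation loss bottoms out at epoch 3 while validation accuracy keeps creeping up until epoch 10, so the two criteria would keep different checkpoints. Which one is preferable likely depends on whether well-calibrated probabilities matter for the use case.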

@@ -770,12 +798,14 @@ plt.legend()
 plt.ylim([0, 1]);
 ```

-![png](images/pytorch-07/08_sentiment_analysis_with_bert_94_0.png)
+![png](images/pytorch-07/08.sentiment-analysis-with-bert_93_0.png)
+
+The training accuracy starts to approach 100% after 10 epochs or so. You might try to fine-tune the parameters a bit more, but this will be good enough for us.

 Don't want to wait? Uncomment the next cell to download my pre-trained model:

 ```py
-# !gdown --id 1ZZFaHiJjsftT2fc4vUZbXZfVkYVDV5-y
+# !gdown --id 1V8itWtowCYnb2Bc9KlK9SxGff9WwmogA

 # model = SentimentClassifier(len(class_names))
 # model.load_state_dict(torch.load('best_model_state.bin'))
@@ -798,9 +828,9 @@ test_acc, _ = eval_model(
 test_acc.item()
 ```

-0.8223350253807106
+0.883248730964467

-The accuracy is about 2% lower on the test set. Our model seems to generalize well.
+The accuracy on the test set is about 1 percentage point lower than the best validation accuracy. Our model seems to generalize well.

 We'll define a helper function to get the predictions from our model:

@@ -855,13 +885,13 @@ print(classification_report(y_test, y_pred, target_names=class_names))

              precision    recall  f1-score   support

-    negative       0.81      0.81      0.81       245
-     neutral       0.78      0.75      0.77       254
-    positive       0.87      0.89      0.88       289
+    negative       0.89      0.87      0.88       245
+     neutral       0.83      0.85      0.84       254
+    positive       0.92      0.93      0.92       289

-    accuracy                           0.82       788
-   macro avg       0.82      0.82      0.82       788
-weighted avg       0.82      0.82      0.82       788
+    accuracy                           0.88       788
+   macro avg       0.88      0.88      0.88       788
+weighted avg       0.88      0.88      0.88       788

 Looks like it is really hard to classify neutral (3 stars) reviews. And I can tell you from experience, looking at many reviews, those are hard to classify.
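
To make the updated report easier to read, here is a small sketch that recomputes F1 as the harmonic mean of precision and recall, plus the macro and support-weighted averages, from the values in the new report. Because the inputs are already rounded to two decimals, the results can only be expected to match the report to two decimals. The variable and function names are illustrative.

```python
# (precision, recall, support) per class, copied from the updated report.
report = {
    "negative": (0.89, 0.87, 245),
    "neutral": (0.83, 0.85, 254),
    "positive": (0.92, 0.93, 289),
}

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1 = {name: f1_score(p, r) for name, (p, r, _) in report.items()}
total_support = sum(s for _, _, s in report.values())

# Macro average: every class counts equally.
macro_f1 = sum(f1.values()) / len(f1)
# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(f1[name] * s for name, (_, _, s) in report.items()) / total_support

for name, value in f1.items():
    print(f"{name:>8}: {value:.2f}")
print(f"macro avg: {macro_f1:.2f}, weighted avg: {weighted_f1:.2f}")
```

The neutral class has the lowest F1 of the three, which matches the observation that 3-star reviews are the hardest to classify.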

@@ -880,7 +910,7 @@ df_cm = pd.DataFrame(cm, index=class_names, columns=class_names)
 show_confusion_matrix(df_cm)
 ```

-![png](images/pytorch-07/08_sentiment_analysis_with_bert_106_0.png)
+![png](images/pytorch-07/08.sentiment-analysis-with-bert_105_0.png)

 This confirms that our model is having difficulty classifying neutral reviews. It mistakes those for negative and positive at a roughly equal frequency.

@@ -923,7 +953,7 @@ plt.xlabel('probability')
 plt.xlim([0, 1]);
 ```

-![png](images/pytorch-07/08_sentiment_analysis_with_bert_111_0.png)
+![png](images/pytorch-07/08.sentiment-analysis-with-bert_110_0.png)

 ### Predicting on Raw Text

Binary file not shown.
Binary file not shown.
Binary file not shown.
