Regarding https://github.com/IBM/pytorch-seq2seq/blob/f146087a9a271e9b50f46561e090324764b081fb/seq2seq/models/TopKDecoder.py#L83: I think teacher forcing should not be an option in beam decoding, since the ground-truth target tokens are not available at inference time.
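To illustrate the point, here is a minimal pure-Python sketch of beam search. It is not TopKDecoder's implementation. `beam_search`, `toy_step`, and the token ids are hypothetical, and `toy_step` stands in for the decoder's next-token log-probabilities. Each step extends a hypothesis only with the model's own candidate tokens, and no target sequence is passed in, so there is nothing that teacher forcing could feed back.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical step function: maps a token prefix to log-probabilities over
# the next token. In a real seq2seq model this would be the decoder.
StepFn = Callable[[Tuple[int, ...]], Dict[int, float]]


def beam_search(step_fn: StepFn, sos: int, eos: int, beam_size: int,
                max_len: int) -> List[Tuple[Tuple[int, ...], float]]:
    """Return hypotheses sorted by cumulative log-probability.

    Note that there is no ground-truth `targets` argument: every step
    conditions on the hypothesis's own previously chosen tokens.
    """
    beams = [((sos,), 0.0)]  # (token prefix, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            # Expand each live hypothesis with every next-token candidate.
            for tok, logp in step_fn(prefix).items():
                candidates.append((prefix + (tok,), score + logp))
        # Keep the top-k extensions by cumulative log-probability.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            # Hypotheses that emitted EOS are complete; the rest stay live.
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    finished.extend(beams)  # include unfinished hypotheses if max_len hit
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


def toy_step(prefix: Tuple[int, ...]) -> Dict[int, float]:
    # Hypothetical fixed distribution (0 = SOS, 1 = EOS, 2 and 3 = words):
    # after two generated words, strongly prefer EOS.
    if len(prefix) >= 3:
        return {1: math.log(0.9), 2: math.log(0.1)}
    return {2: math.log(0.6), 3: math.log(0.4)}


results = beam_search(toy_step, sos=0, eos=1, beam_size=2, max_len=5)
print(results[0][0])  # best hypothesis: (0, 2, 2, 1)
```

With teacher forcing, the token appended at each step would come from a reference sequence instead of from `step_fn`, which only makes sense during training when that reference exists.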