There are several research fields in which the edit distance is chosen as the objective function. For example, in Automatic Speech Recognition (ASR) the main metric of model quality is the Word Error Rate (WER).
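As a concrete reference point, WER is the word-level Levenshtein (edit) distance between a hypothesis and its reference transcript, normalized by the reference length. A minimal sketch in Python (the helper names `edit_distance` and `wer` are illustrative, not taken from any particular toolkit):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (dynamic programming)."""
    # dp[i][j] = minimum number of edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[len(ref)][len(hyp)]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance normalized by reference length."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)


print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 edits / 6 words ≈ 0.33
```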
Unfortunately, optimizing the edit-distance function directly is difficult. Therefore, most approaches rely on a proxy function such as cross-entropy. However, in the context of sequence learning this leads to several problems [1]:
- Exposure Bias: the model is never exposed to its own errors during training, so the histories inferred at test time do not resemble the gold histories seen during training.
- Loss Evaluation Mismatch: training uses a word-level loss, while at test time the goal is to improve sequence-level evaluation metrics such as WER.
- Label Bias: word probabilities at each time-step are locally normalized, guaranteeing that successors of incorrect histories receive the same probability mass as successors of the true history.
The following table summarizes works that attempt to solve these problems. There are more detailed overviews, for example [2], but this list includes only works that use the edit distance explicitly or implicitly. Moreover, most of these works formalize sequence prediction as an action-taking problem in Reinforcement Learning.
Year | Task | Reward level | Algorithms, Models | Affiliation | Authors, Link |
---|---|---|---|---|---|
2020 | ASR | Sentence | MWER, RNN-T | Amazon | Guo et al. |
2020 | MT | Sentence | MGS, parameter search | NYU | Welleck, Cho |
2020 | ASR | Sentence | Proper Noun, Phonetic Fuzzing, MWER, RNN-T, LAS | Google | Peyser, Sainath, Pundak |
2019 | NLP | Sentence | GPT-2, PPO, Human labeling | OpenAI | Ziegler, Stiennon et al. |
2019 | ASR | Sentence | Neural Architecture Search, REINFORCE, CTC | KPMG Nigeria, OAU | Baruwa et al. |
2019 | ASR | Sentence | Normalized MWER | Amazon | Gandhe, Rastrow |
2019 | ASR | Token | MBR, RNN-T | Tencent, USA | Weng et al. |
2019 | ASR | Token | ECTC-DOCD | China | Yi, Wang, Xu |
2019 | ASR | Sentence | MWER, RNN-T, LAS | Google | Sainath, Pang et al. |
2019 | MT | Token | Reinforce-NAT, Non-Autoregressive Transformer | China, Tencent | Shao, Feng et al. |
2019 | MT, TS, APE | Token | Levenshtein Transformer, imitation learning | Facebook, New York | Gu, Wang, Zhao |
2018 | ASR | Token | MBR, softmax margin, PAPB, S2S | Brno, JHU, MERL | Baskar et al. |
2018 | ASR | Token | OCD, S2S | Google Brain | Sabour, Chan, Norouzi |
2018 | ASR | Token | REINFORCE, S2S | Nara, RIKEN | Tjandra et al. |
2018 | TS | Sentence | Alternating Actor-Critic | Hong Kong, Tencent | Li, Bing, Lam |
2018 | ASR | Sentence | REINFORCE, PPO, Reward shaping | Tokyo | Peng, Shibata, Shinozaki |
2017 | ASR | Sentence | REINFORCE, Self-critic | Salesforce | Zhou, Xiong, Socher |
2017 | ASR | Sentence | MWER, LAS, Sampling, N-best | Google | Prabhavalkar et al. |
2017 | ASR | Sentence | Expected Loss, RNA | Google | Sak et al. |
2017 | MT | Sentence | Actor-Critic, Critic-aware | Hong Kong, New York | Gu, Cho, Li |
2016 | ASR | Sentence | Reward Augmented ML | Google Brain | Norouzi et al. |
2016 | MT | Token | Actor-Critic | Montreal, McGill | Bahdanau et al. |
2015 | MT | Sentence | MIXER | Facebook | Ranzato et al. |
2015 | ASR | Token | Task Loss Estimation | Montreal, Wrocław | Bahdanau et al. |
2014 | ASR | Sentence | Expected Loss, CTC | DeepMind, Toronto | Graves, Jaitly |
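Many of the entries above (Expected Loss, MWER, REINFORCE, Self-critic) share the same core recipe: sample hypotheses from the model, score each with a sentence-level reward such as negative edit distance, and scale the sequence log-probability by that reward. A minimal single-sample REINFORCE sketch in PyTorch (the function names, shapes, and the optional baseline are illustrative assumptions, not the exact formulation of any listed paper):

```python
import torch


def edit_distance(ref, hyp):
    """Levenshtein distance between two token-id sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution / match
    return dp[-1]


def reinforce_edit_distance_loss(logits, sampled, reference, baseline=0.0):
    """
    Single-sample REINFORCE loss with negative edit distance as the sentence-level reward.

    logits:    (T, V) decoder scores along one sampled hypothesis
    sampled:   (T,)   token ids drawn from softmax(logits)
    reference: list of ground-truth token ids
    baseline:  scalar reward baseline (e.g. reward of a greedy decode) for variance reduction
    """
    log_probs = torch.log_softmax(logits, dim=-1)
    seq_log_prob = log_probs.gather(1, sampled.unsqueeze(1)).sum()   # log p(sampled)
    reward = -float(edit_distance(reference, sampled.tolist()))      # higher is better
    # REINFORCE: minimize -(reward - baseline) * log p(sampled sequence)
    return -(reward - baseline) * seq_log_prob


# Toy usage: a 4-step hypothesis over a 6-token vocabulary.
logits = torch.randn(4, 6, requires_grad=True)
sampled = torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1).squeeze(1)
loss = reinforce_edit_distance_loss(logits, sampled, reference=[1, 2, 3])
loss.backward()  # pushes probability mass toward hypotheses with lower edit distance
```

In the MWER-style variants listed above, several hypotheses per utterance (an N-best list or samples) are typically scored together, and each hypothesis' word-error count is centered by the average over the list, which plays the role of the baseline in this sketch.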