Hey! I'm currently researching BERT and I'm a bit confused about its position embeddings. I've come across many articles, websites, and blogs, and they all seem to say different things: some claim that BERT uses learnable position embeddings, while others suggest it uses the sin/cosine functions from the original Transformer, and a few mention that there are multiple ways to construct positional embeddings. Does anyone have a clear explanation of which one BERT actually uses? And if it is learnable positional embeddings, can anyone recommend some useful articles on the topic? I've had trouble finding solid references myself.
Thanks in advance!
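For what it's worth, the BERT paper (Devlin et al., 2018) uses learned position embeddings, i.e. a trainable lookup table, rather than the fixed sin/cos table from the original Transformer. Below is a minimal sketch contrasting the two approaches people usually mean; the names (`max_len`, `d_model`) and module structure are just illustrative, not taken from any particular codebase:

```python
import math
import torch
import torch.nn as nn


class LearnedPositionEmbedding(nn.Module):
    """BERT-style: one trainable vector per position, learned like any other weight."""

    def __init__(self, max_len=512, d_model=768):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, d_model)

    def forward(self, token_embeddings):
        # token_embeddings: (batch, seq_len, d_model)
        seq_len = token_embeddings.size(1)
        positions = torch.arange(seq_len, device=token_embeddings.device)
        # Broadcast the (seq_len, d_model) position vectors over the batch dimension.
        return token_embeddings + self.pos_emb(positions)


def sinusoidal_table(max_len=512, d_model=768):
    """Original-Transformer-style: a fixed sin/cos table with no trainable parameters."""
    positions = torch.arange(max_len, dtype=torch.float).unsqueeze(1)   # (max_len, 1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float) * (-math.log(10000.0) / d_model)
    )
    table = torch.zeros(max_len, d_model)
    table[:, 0::2] = torch.sin(positions * div_term)   # even dimensions
    table[:, 1::2] = torch.cos(positions * div_term)   # odd dimensions
    return table
```

The learned variant is what BERT checkpoints contain (which is also why BERT is limited to the maximum sequence length it was trained with), whereas the sinusoidal table is fixed at construction time and can in principle extrapolate to longer sequences.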