Hey! I'm currently researching BERT and I'm a bit confused about its position embeddings. I've come across many articles, websites, and blogs, and they all seem to say different things: some claim that BERT uses learnable position embeddings, while others suggest it uses the sin/cosine functions from the original Transformer, and a few mention that there are multiple ways to construct positional embeddings. Does anyone have a clear explanation of which one BERT actually uses? And if it is learnable positional embeddings, can anyone recommend some useful articles on the topic? I've had trouble finding solid references myself.
Thanks in advance!
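For what it's worth, the BERT paper (Devlin et al., 2018) uses learned position embeddings, i.e. a trainable lookup table, rather than the fixed sin/cos table from the original Transformer. Below is a minimal sketch contrasting the two approaches people usually mean; the names (`max_len`, `d_model`) and module structure are just illustrative, not taken from any particular codebase:

```python
import math
import torch
import torch.nn as nn


class LearnedPositionEmbedding(nn.Module):
    """BERT-style: one trainable vector per position, learned like any other weight."""

    def __init__(self, max_len=512, d_model=768):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, d_model)

    def forward(self, token_embeddings):
        # token_embeddings: (batch, seq_len, d_model)
        seq_len = token_embeddings.size(1)
        positions = torch.arange(seq_len, device=token_embeddings.device)
        # Broadcast the (seq_len, d_model) position vectors over the batch dimension.
        return token_embeddings + self.pos_emb(positions)


def sinusoidal_table(max_len=512, d_model=768):
    """Original-Transformer-style: a fixed sin/cos table with no trainable parameters."""
    positions = torch.arange(max_len, dtype=torch.float).unsqueeze(1)   # (max_len, 1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float) * (-math.log(10000.0) / d_model)
    )
    table = torch.zeros(max_len, d_model)
    table[:, 0::2] = torch.sin(positions * div_term)   # even dimensions
    table[:, 1::2] = torch.cos(positions * div_term)   # odd dimensions
    return table
```

The learned variant is what BERT checkpoints contain (which is also why BERT is limited to the maximum sequence length it was trained with), whereas the sinusoidal table is fixed at construction time and can in principle extrapolate to longer sequences.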