Return true initial batch size information#1190

Merged
jlarson4 merged 2 commits into dev-3.x-canary from bug/pos-embed-hook-firing
Mar 3, 2026
Conversation

@jlarson4 (Collaborator) commented Mar 3, 2026

Description

HuggingFace's GPT-2 generates position_ids with shape [1, seq_len] regardless of batch size, relying on PyTorch broadcasting when adding to the token embeddings. TransformerBridge's PosEmbedBridge wraps the HF wpe module, so the hook fires on wpe's output before broadcasting occurs, capturing [1, seq, d_model] instead of [batch, seq, d_model].
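A minimal sketch of the shape mismatch described above, using a small `nn.Embedding` as a stand-in for GPT-2's `wpe` (the dimensions here are illustrative, not the real GPT-2 config):

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumed, not GPT-2's actual config).
batch, seq_len, d_model = 4, 10, 16

wpe = nn.Embedding(1024, d_model)              # positional embedding table
token_embeds = torch.randn(batch, seq_len, d_model)

# HF GPT-2 builds position_ids with a leading dim of 1, regardless of batch size.
position_ids = torch.arange(seq_len).unsqueeze(0)   # shape [1, seq_len]

# A hook on wpe's output sees the tensor *before* broadcasting:
pos_embeds = wpe(position_ids)                 # shape [1, seq_len, d_model]
assert pos_embeds.shape == (1, seq_len, d_model)

# Broadcasting only happens at the addition, so [batch, seq, d_model]
# never appears as wpe's output.
hidden = token_embeds + pos_embeds
assert hidden.shape == (batch, seq_len, d_model)
```

This is why a hook placed directly on the wrapped `wpe` module reports a batch dimension of 1 rather than the true batch size.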

Type of change


  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@jlarson4 jlarson4 merged commit 6662189 into dev-3.x-canary Mar 3, 2026
15 checks passed
