Add OLMo2 models (1B, 7B, 13B) support to fairseq2 #1410
What does this PR do? Please describe:
Adds support for the OLMo2 model architecture (1B, 7B, and 13B variants) to fairseq2.
The key architecture changes include:
OLMo2 is similar to the LLaMA architecture, with the following differences (per the OLMo 2 paper):
- QK-norm: RMSNorm is applied to the projected queries and keys inside the attention projections (sketched below).
- Norm placement: RMSNorm is applied to the output of each attention and feed-forward block, inside the residual connection, rather than to its input.
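The QK-norm difference can be sketched in a few lines of plain PyTorch. This is a minimal illustration only (the module and attribute names are made up), not the code added by this PR:

```python
import torch
from torch import nn


class QKNormProjection(nn.Module):
    """Minimal illustration of OLMo2-style QK-norm: normalize the projected
    queries/keys, rather than using plain LLaMA-style projections."""

    def __init__(self, model_dim: int) -> None:
        super().__init__()
        self.q_proj = nn.Linear(model_dim, model_dim, bias=False)
        self.k_proj = nn.Linear(model_dim, model_dim, bias=False)
        # RMSNorm over the full projection width, applied after the projection.
        self.q_norm = nn.RMSNorm(model_dim)
        self.k_norm = nn.RMSNorm(model_dim)

    def forward(self, seqs: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # Project first, then normalize -- the kind of ordering change that
        # OLMO2MultiheadAttention makes in its projection hooks.
        q = self.q_norm(self.q_proj(seqs))
        k = self.k_norm(self.k_proj(seqs))
        return q, k
```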
An integration test is added to verify that model outputs are consistent with HF Transformers; it passes for the 1B model.
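For reference, a parity check of this shape might look like the sketch below. The HF side uses the standard `transformers` API; `load_fairseq2_olmo2` and the checkpoint name are placeholders for illustration, not the actual test added in this PR:

```python
# Hedged sketch of an HF-parity check; `load_fairseq2_olmo2` is a placeholder,
# not fairseq2's real API, and the checkpoint name is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

HF_MODEL = "allenai/OLMo-2-0425-1B"  # illustrative checkpoint name


def load_fairseq2_olmo2(name: str) -> torch.nn.Module:
    """Placeholder for however the fairseq2 model is built in the real test."""
    raise NotImplementedError


@torch.inference_mode()
def test_olmo2_matches_hf() -> None:
    tokenizer = AutoTokenizer.from_pretrained(HF_MODEL)
    hf_model = AutoModelForCausalLM.from_pretrained(HF_MODEL, torch_dtype=torch.float32)
    fs2_model = load_fairseq2_olmo2(HF_MODEL)

    ids = tokenizer("OLMo2 parity check", return_tensors="pt").input_ids

    hf_logits = hf_model(ids).logits
    fs2_logits = fs2_model(ids)  # assumes the fairseq2 wrapper returns logits

    torch.testing.assert_close(fs2_logits, hf_logits, atol=1e-4, rtol=1e-4)
```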
Note:
`OLMO2MultiheadAttention` inherits from `StandardMultiheadAttention` (marked `@final`) because the only difference is the order of normalization in `_project_q()` and `_project_kv()`. Reimplementing the entire class would duplicate ~150 lines of boilerplate code. Right now, the type checker warning is suppressed.
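The trade-off looks roughly like the sketch below. Everything here is a stand-in (the base class, the hook signature, and the specific norm placement are illustrative, not fairseq2's real `StandardMultiheadAttention`); the point is that overriding one hook and silencing mypy's "cannot inherit from final class" error avoids copying the whole base class:

```python
# Illustrative pattern only: the base class, method signature, and norm
# placement are stand-ins, not fairseq2's actual StandardMultiheadAttention.
from typing import final

from torch import Tensor, nn


@final
class FinalAttentionBase(nn.Module):
    """Stand-in for a base attention class that is marked @final."""

    def __init__(self, model_dim: int) -> None:
        super().__init__()
        self.q_proj = nn.Linear(model_dim, model_dim, bias=False)
        self.q_norm = nn.RMSNorm(model_dim)

    def _project_q(self, seqs: Tensor) -> Tensor:
        # Hypothetical base ordering of normalization and projection.
        return self.q_proj(self.q_norm(seqs))


# mypy reports inheriting from a @final class under error code "misc";
# suppressing it here is the alternative to duplicating ~150 lines of the base.
class OLMO2AttentionSketch(FinalAttentionBase):  # type: ignore[misc]
    def _project_q(self, seqs: Tensor) -> Tensor:
        # Only the ordering inside this hook changes.
        return self.q_norm(self.q_proj(seqs))
```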
Fixes #1402
Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.
Check list: