
Conversation


@YunchaoYang YunchaoYang commented Nov 5, 2025

What does this PR do? Please describe:

  • Adds OLMo2 model architecture support (1B, 7B, 13B) to fairseq2

  • The key architectural changes include the following.
    OLMo2 is similar to the LLaMA architecture, with these differences:

    • Olmo2RMSNorm: In OLMo2, the order of operations for RMSNorm is normalize -> multiply by weight -> cast back to the original dtype (see the first sketch after this list).
    • OLMO2TransformerLMDecoderLayer: OLMo2 uses Post-Norm in the decoder layer, ordered Attention/FFN -> Norm -> Add Residual, which differs from both the existing Pre-Norm and the classic Post-Norm orders (see the decoder-layer sketch after this list).
    • OLMO2MultiheadAttention:
      • OLMo2 adds Q/K Norm in the attention layers; the Q/K path differs slightly in the order of normalization and reshaping: Project → Normalize → Reshape → RoPE (see the projection sketch after this list).
      • OLMo2 1B/7B/13B use MHA instead of GQA; only the OLMo2-32B model uses GQA.
    • OLMO2RotaryEmbedding: The RoPE module reuses the existing ReferenceRotaryEncoder module.
  • An integration test is added to ensure the output is consistent with HF Transformers; it passes for the 1B model (a parity-check sketch follows the architecture sketches below).
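
The sketches below illustrate the three differences called out above. First, the RMSNorm ordering: a minimal PyTorch sketch of the normalize -> multiply by weight -> cast-back order, not the actual fairseq2 Olmo2RMSNorm code.

```python
import torch
from torch import nn


class Olmo2StyleRMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        input_dtype = x.dtype
        x = x.to(torch.float32)
        x = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        # OLMo2 order: multiply by the weight while still in float32, then
        # cast back. LLaMA-style RMSNorm casts back to the input dtype first
        # and multiplies by the weight afterwards.
        return (self.weight * x).to(input_dtype)
```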
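
Second, the residual ordering in the decoder layer. This minimal sketch contrasts the OLMo2 order with pre-norm and classic post-norm; the attention and FFN modules are generic stand-ins, not fairseq2 code, and nn.RMSNorm assumes PyTorch >= 2.4.

```python
import torch
from torch import nn


class Olmo2StyleDecoderLayer(nn.Module):
    def __init__(self, dim: int, num_heads: int) -> None:
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim)
        )
        self.attn_norm = nn.RMSNorm(dim)  # requires PyTorch >= 2.4
        self.ffn_norm = nn.RMSNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-norm:          x = x + attn(norm(x))
        # Classic post-norm: x = norm(x + attn(x))
        # OLMo2:             x = x + norm(attn(x))
        attn_out, _ = self.self_attn(x, x, x, need_weights=False)
        x = x + self.attn_norm(attn_out)
        x = x + self.ffn_norm(self.ffn(x))
        return x
```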
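
Third, the Q/K handling order in attention: a minimal sketch of Project → Normalize → Reshape → RoPE. Here rope_fn is a hypothetical callable standing in for the rotary encoder, not the ReferenceRotaryEncoder API.

```python
import torch
from torch import nn


def project_q_olmo2_style(
    x: torch.Tensor,    # (batch, seq, model_dim)
    q_proj: nn.Linear,  # model_dim -> num_heads * head_dim
    q_norm: nn.Module,  # norm over the full num_heads * head_dim axis
    num_heads: int,
    rope_fn,            # hypothetical callable applying RoPE per head
) -> torch.Tensor:
    batch, seq, _ = x.shape
    q = q_proj(x)                                          # 1. project
    q = q_norm(q)                                          # 2. normalize over the full projected dim
    q = q.view(batch, seq, num_heads, -1).transpose(1, 2)  # 3. reshape to heads
    return rope_fn(q)                                      # 4. apply RoPE
```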
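
Finally, a hedged sketch of the kind of parity check the integration test performs. The HF checkpoint id is assumed for illustration, and the fairseq2 side is left as a placeholder rather than guessing its loader API.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-0425-1B"  # assumed HF checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
hf_model = AutoModelForCausalLM.from_pretrained(model_id)

ids = tokenizer("Hello, world!", return_tensors="pt").input_ids
with torch.inference_mode():
    hf_logits = hf_model(ids).logits

# Obtain logits from the fairseq2 OLMo2 model here (loader API omitted),
# then compare:
# torch.testing.assert_close(fs2_logits, hf_logits, atol=1e-4, rtol=1e-4)
```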

Note:
OLMO2MultiheadAttention inherits from StandardMultiheadAttention (marked @final) because the only difference is the order of normalization in _project_q() and _project_kv(). Reimplementing the entire class would duplicate ~150 lines of boilerplate code, so the resulting type checker warning is suppressed for now; a schematic of the pattern follows.
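
A generic sketch of that pattern: subclass a class marked @final and override only the projection hook, suppressing the checker at the subclass site. The stand-in base class and placeholder bodies below are illustrative assumptions, not fairseq2's actual API.

```python
from typing import final

import torch
from torch import nn


@final
class StandardAttentionBase(nn.Module):
    """Stand-in for StandardMultiheadAttention; marked @final like the original."""

    def _project_q(self, x: torch.Tensor) -> torch.Tensor:
        # Placeholder for the base order: project -> reshape -> normalize.
        return x


class Olmo2StyleAttention(StandardAttentionBase):  # type: ignore[misc]
    """Overrides only the projection hook; everything else is inherited."""

    def _project_q(self, x: torch.Tensor) -> torch.Tensor:
        # Placeholder for the OLMo2 order: project -> normalize -> reshape.
        return x
```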

Fixes #1402

Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.

Check list:

  • Was the content of this PR discussed and approved via a GitHub issue? (no need for typos or documentation improvements)
  • Did you read the contributor guideline?
  • Did you make sure that your PR does only one thing instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (no need for typos, documentation, or minor internal changes)

@meta-cla meta-cla bot added the CLA Signed label Nov 5, 2025