
Loss masking for distillation #250


Merged
jlamypoirier merged 74 commits into main from distillation_loss_mask on May 7, 2025

Conversation

jlamypoirier
Collaborator

✨ Description

Pass a loss mask through kwargs so it can be used for the distillation loss, i.e. cross-entropy from logits.
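
For context, a minimal sketch of what a loss-masked distillation cross-entropy over teacher logits could look like (illustrative only; the function name and tensor shapes are assumptions, not the code added in this PR):

```python
import torch

def masked_distillation_loss(student_logits, teacher_logits, loss_mask):
    # Soft cross-entropy between the teacher and student token distributions.
    teacher_probs = torch.softmax(teacher_logits, dim=-1)
    student_log_probs = torch.log_softmax(student_logits, dim=-1)
    per_token_loss = -(teacher_probs * student_log_probs).sum(dim=-1)
    # Zero out masked positions and average over the unmasked tokens only.
    per_token_loss = per_token_loss * loss_mask
    return per_token_loss.sum() / loss_mask.sum().clamp(min=1)
```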

πŸ” Type of change

Select all that apply:

  • πŸ› Bug fix (non-breaking change that addresses a specific issue)
  • πŸš€ New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • πŸ“ˆ Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • πŸ› οΈ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • πŸ“¦ Dependency bump (updates dependencies, including Dockerfile or package changes)
  • πŸ“ Documentation change (updates documentation, including new content or typo fixes)
  • πŸ”§ Infrastructure/Build change (affects build process, CI/CD, or dependencies)

@oleksost
Contributor

oleksost commented May 2, 2025

In the documentation, here, the parameter use_loss_masking_spans should be under batch and not under sampling, right?

@jlamypoirier
Collaborator Author

@oleksost Yes, it looks like we forgot to update the documentation.

```diff
 if group:
     Assert.eq(implementation, CrossEntropyImpl.fused)
-    return fused_cross_entropy_forward_backward(
+    return _fused_cross_entropy_forward_backward(
         logits, target, grad_output, logits_scale_factor, target_format, group
     )
 else:
```
Contributor

@oleksost oleksost May 2, 2025


In return _CROSS_ENTROPY_IMPLEMENTATIONS[implementation] we also need to pass the loss mask?
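
(For illustration, forwarding the mask through the dispatch might look roughly like the sketch below; the loss_mask keyword and the exact call signature are assumptions, not the repository's actual API.)

```python
# Hypothetical: forward the optional loss mask to whichever implementation was selected.
return _CROSS_ENTROPY_IMPLEMENTATIONS[implementation](
    logits,
    target,
    grad_output,
    logits_scale_factor,
    target_format,
    loss_mask=loss_mask,  # assumed new keyword argument; None would mean "no masking"
)
```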

```diff
 if group:
     Assert.eq(implementation, CrossEntropyImpl.fused)
-    return fused_cross_entropy_forward_backward(
+    return _fused_cross_entropy_forward_backward(
```
Contributor


Need to pass loss_mask?
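
For reference, a minimal sketch of how a loss mask can be applied inside a fused cross-entropy forward/backward pass (this is not the repository's _fused_cross_entropy_forward_backward; shapes, dtypes, and the grad_output convention are assumptions):

```python
import torch

def masked_cross_entropy_fwd_bwd(logits, target, loss_mask, grad_output):
    # Forward: per-token cross-entropy with hard targets, zeroed where loss_mask is 0.
    log_probs = torch.log_softmax(logits.float(), dim=-1)
    per_token = -log_probs.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    per_token = per_token * loss_mask
    denominator = loss_mask.sum().clamp(min=1)
    loss = per_token.sum() / denominator

    # Backward: d(loss)/d(logits) = softmax(logits) - one_hot(target), masked and scaled.
    grad = torch.softmax(logits.float(), dim=-1)
    grad.scatter_add_(
        -1, target.unsqueeze(-1), -torch.ones_like(target, dtype=grad.dtype).unsqueeze(-1)
    )
    grad = grad * loss_mask.unsqueeze(-1) * (grad_output / denominator)
    return loss, grad.to(logits.dtype)
```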

Base automatically changed from reference_model_preprocessing to main May 2, 2025 17:42
@jlamypoirier jlamypoirier marked this pull request as ready for review May 2, 2025 22:00
@jlamypoirier jlamypoirier requested review from oleksost and tscholak May 2, 2025 22:00
@oleksost
Contributor

oleksost commented May 5, 2025

This assert should be removed, no?

Contributor

@oleksost oleksost left a comment


Works well for me.

@jlamypoirier jlamypoirier merged commit f08ac90 into main May 7, 2025
2 checks passed
@jlamypoirier jlamypoirier deleted the distillation_loss_mask branch May 7, 2025 20:12