DPO #223

Open · tobyzl2 wants to merge 55 commits into main

Conversation


@tobyzl2 tobyzl2 commented Apr 3, 2025

Description

This PR implements Direct Preference Optimization (DPO) training in Fast-LLM. DPO fine-tunes the model directly on preference data (chosen vs. rejected responses), optimizing it to better align with the desired output behavior.

Closes #209
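
For context, here is a minimal sketch of the pairwise DPO objective this kind of training optimizes (PyTorch). The function name and tensor layout are illustrative rather than the Fast-LLM API; the "reference free" variant benchmarked below presumably corresponds to dropping the reference-model terms, and the exact simplification used is defined by this PR's compute_simplified_dpo_loss.

```python
import torch
import torch.nn.functional as F


def pairwise_dpo_loss(
    chosen_logps: torch.Tensor,      # summed log-probs over the chosen response, shape (batch,)
    rejected_logps: torch.Tensor,    # summed log-probs over the rejected response, shape (batch,)
    ref_chosen_logps: torch.Tensor,  # same quantities under a frozen reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,               # the beta parameter this PR makes configurable
) -> torch.Tensor:
    """-log sigmoid(beta * ((logp_c - logp_r) - (ref_c - ref_r))), averaged over the batch."""
    policy_logratio = chosen_logps - rejected_logps
    reference_logratio = ref_chosen_logps - ref_rejected_logps
    # softplus(-x) is a numerically stable form of -log(sigmoid(x)).
    return F.softplus(-beta * (policy_logratio - reference_logratio)).mean()
```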

Throughput Numbers (reference free - 04/14/25)

| Model | Tok/Sec/GPU | Seq Len | Num Steps | # GPUs |
|---|---|---|---|---|
| Mistral 7b | 10168.015 | 8192 | 400 | 32 |
| Apriel 5b SFT | 14722.243 | 8192 | 6000 | 32 |

Throughput Numbers (with reference model - 04/29/25)

| Model | Tok/Sec/GPU | Seq Len | Num Steps | # GPUs |
|---|---|---|---|---|
| Mistral 7b | 7837.249 | 8192 | 400 | 32 |
| Apriel 5b SFT | 11331.242 | 8192 | 6000 | 32 |

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

📝 Changes

List the key changes introduced in this PR:

  1. Introduced DPO training with a simplified DPO loss.
  2. Allows users to configure DPO parameters (the beta value); see the illustrative sketch after this list.
  3. Allows packing to be turned off (DPO is currently implemented without packing).
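
Purely as illustration of item 2, a sketch of the new options; the dataclass stand-in, field names, and defaults are assumptions, while the real code uses Fast-LLM's Field(...)/FieldHint descriptors visible in the diffs further down:

```python
import dataclasses
import enum


class LossFunctionType(str, enum.Enum):
    # Mirrors the enum added in this PR.
    cross_entropy = "cross_entropy"
    dpo = "dpo"


@dataclasses.dataclass
class DPOConfigSketch:
    """Hypothetical stand-in for the DPO-related config this PR exposes."""

    loss_function: LossFunctionType = LossFunctionType.cross_entropy
    # Scales the margin between the chosen and rejected log-ratios; the default is assumed.
    beta: float = 0.1
```

With something like this, enabling DPO amounts to setting the loss function to dpo and picking a beta value.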

✅ Checklist

Make sure the following tasks are completed before submitting the PR:

General

  • 📜 I have read and followed the contributing guidelines.
  • 🏷️ I am using a clear and descriptive PR title that summarizes the key change or feature introduced.
  • 🎉 The functionality is complete, and I have tested the changes.
  • 📝 I have updated the documentation if needed.
  • ⚠️ The change does not introduce any new issues (e.g., runtime warnings, type checker errors, linting problems, unhandled edge cases).
  • 🧩 I have commented my code, especially in hard-to-understand areas.

Dependencies and Configuration

  • 🐋 I have updated the Docker configuration or dependencies, if applicable.
  • 🔄 I have ensured compatibility with the existing setup after dependency changes.

Testing

  • 🧪 I have added or updated tests to cover my changes.
  • ✔️ New and existing tests pass locally with my changes.
  • 🚦 I have tested these changes on GPUs and verified training stability.
  • 🏋️ I have tested the changes on realistic training workloads, if applicable.

Performance Impact

  • 📊 I have run benchmarks where applicable to evaluate the performance impact.
  • ✅ The benchmarks show no performance regression.
  • 🚀 The benchmarks indicate a potential performance improvement.
  • ⚠️ The benchmarks indicate a potential performance degradation.
  • 📈 I have provided benchmark results and detailed any performance impact below, if applicable.

📊 Performance Impact Details

If there is any impact on performance, describe it and provide benchmark results, if applicable:


🗒️ Additional Notes

Include any additional context, information, or considerations here, such as known issues, follow-up tasks, or backward compatibility concerns.

@tobyzl2 tobyzl2 changed the title Toby/dpo DPO Apr 3, 2025
@tobyzl2 tobyzl2 requested a review from sohamparikh April 3, 2025 22:55

@tscholak (Collaborator) left a comment

This looks great already; a few functional tests would be good.
Maybe extend:

def get_test_dataset(

@tobyzl2 tobyzl2 marked this pull request as ready for review April 9, 2025 01:20

@jlamypoirier (Collaborator) left a comment

I'm worried that too many unrelated things need to be set correctly for this to work (dataset format, sampling config, loss function), and if they aren't, things will crash too late with a cryptic error. Let's try to simplify this a bit.

desc="Read preference loss masking spans from the dataset.",
hint=FieldHint.feature,
)
enable_packing: bool | None = Field(

Collaborator:

What's packing?

Author:

Made some changes to remove this flag. Packing means placing multiple documents in the same sequence, as in pretraining; for DPO I wanted a way to pad to the end of the sequence instead (toy illustration below).
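
A toy illustration of the difference (not the actual sampling code):

```python
# Packing (pretraining): concatenate documents so one sequence holds several of them.
docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
seq_len, pad_token = 8, 0

packed = [tok for doc in docs for tok in doc][:seq_len]
print(packed)  # [1, 2, 3, 4, 5, 6, 7, 8] -> documents share one sequence

# No packing (as for DPO here): one document per sequence, padded out to seq_len.
padded = docs[0] + [pad_token] * (seq_len - len(docs[0]))
print(padded)  # [1, 2, 3, 0, 0, 0, 0, 0]
```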

@@ -57,6 +57,16 @@ class GPTSamplingConfig(SamplingConfig):
desc="Read loss masking spans from the dataset.",
hint=FieldHint.feature,
)
use_preference_loss_masking_spans: bool | None = Field(

Collaborator:

Does it make sense to have both normal and preference loss masking spans?

Collaborator:

Also, this might be better suited for GPTSamplingData. That way, the trainer could set this value based on the training objective and avoid the complicated relationship between the parameters.

Author:

Just made some changes to remove this extra flag; it's now read automatically from the flag stored in the saved memmap dataset.


class LossFunctionType(str, enum.Enum):
    cross_entropy = "cross_entropy"
    dpo = "dpo"

Collaborator:

Missing newline (please make sure to enable pre-commit)

loss, grad = compute_simplified_dpo_loss(
    logits.flatten(0, -2),
    labels,
    kwargs[LanguageModelKwargs.chosen_spans],

Collaborator:

How do we ensure it's there? Seems like this will crash unless:

  • All datasets have both chosen and rejected spans.
  • The sampling config for all datasets is set to use these spans.

It's a bad idea to wait this late for a crash; we should aim to do the check sooner.

Author:

Yeah I agree, let me see where I can add these checks so that we can detect it earlier.

@tobyzl2 (Author) commented Apr 14, 2025

> I'm worried that too many unrelated things need to be set correctly for this to work (dataset format, sampling config, loss function), and if they aren't, things will crash too late with a cryptic error. Let's try to simplify this a bit.

Yeah, this makes sense. Let me try to do a bit of refactoring so that, from the user's perspective, only one configuration needs to be specified (something like training objective = dpo) and the other flags are set automatically instead of having to be specified one by one.
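
A rough sketch of what that could look like (training_objective is a hypothetical attribute name; use_dpo_loss and use_preference_loss_masking_spans are the flags that appear in the diff below):

```python
def _validate(self) -> None:
    # Hypothetical: a single training-objective switch drives the dependent flags,
    # so users don't have to set dataset/sampling/loss options one by one.
    if self.training_objective == "dpo":
        self.model.base_model.use_dpo_loss = True
        self.batch.use_preference_loss_masking_spans = True
    # Fail fast at config-validation time instead of deep inside the loss computation.
    if self.model.base_model.use_dpo_loss:
        assert self.batch.use_preference_loss_masking_spans, (
            "DPO training requires datasets sampled with chosen/rejected preference spans."
        )
```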

@tobyzl2 tobyzl2 closed this Apr 14, 2025
@tobyzl2 tobyzl2 reopened this Apr 14, 2025

@jlamypoirier (Collaborator) left a comment

@tobyzl2 are you planning on adding distillation here? I suggest merging this one first since it's almost ready.

@@ -196,6 +201,9 @@ def _validate(self) -> None:
if self.model.base_model.distillation_model is not None:
# TODO: Support loss masking for distillation?
assert not self.batch.use_loss_masking_spans
assert self.model.base_model.use_dpo_loss == self.batch.use_preference_loss_masking_spans

Collaborator:

Seems to make the two parameters redundant

if target is not None:
if self._config.distillation_model is None:
if self._config.distillation_model is None or self._use_dpo_loss:

Collaborator:

Is the check for self._use_dpo_loss redundant, since it doesn't support distillation?

@tscholak tscholak mentioned this pull request May 9, 2025
@jlamypoirier (Collaborator) commented:

Are we planning on merging this?

@tobyzl2 (Author) commented May 12, 2025

> Are we planning on merging this?

Yes, just checking whether we need to wait for #255 to merge first, @tscholak?

Successfully merging this pull request may close: Direct Preference Optimization (DPO) support.