KEP-2170: Create LLM training runtime for Llama 3.1 8B #2212

andreyvelich · 2024-08-14T15:31:40Z

Related: #2170

Once we implement storage initializers, trainers, and controllers, we should add the LLM training runtimes.
We can start with a runtime for Llama 3.1 8B.

https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct

/area runtime

Electronic-Waste · 2024-11-18T04:56:10Z

/assign

I can help with this. Please let me know if you have different plans, @kubeflow/wg-training-leads.

andreyvelich · 2024-11-18T11:31:43Z

Thank you, Shao!
However, we need to work on the LLM Trainer before we add the post-training runtimes: #2321

Electronic-Waste · 2024-11-18T11:51:49Z

Thanks for pointing this out, Andrey!

Shall I unassign myself since this issue is related to #2321?

andreyvelich · 2024-11-18T12:54:43Z

If you could also help us with #2321, that would be great!
We have a few ideas with @saileshd1402, but we are still investigating how to build that Trainer so it supports different LLMs and datasets.

Electronic-Waste · 2024-11-18T13:44:46Z

Sure, I'm glad to hear that I can help with #2321!

github-actions · 2025-02-16T15:05:51Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Electronic-Waste · 2025-02-17T03:33:51Z

/remove-label lifecycle/stale

google-oss-prow · 2025-02-17T03:33:54Z

@Electronic-Waste: The label(s) /remove-label lifecycle/stale cannot be applied. These labels are supported: tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, lifecycle/needs-triage. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/remove-label lifecycle/stale

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Electronic-Waste · 2025-02-17T03:34:11Z

/remove-lifecycle stale

andreyvelich added this to KEP-2170: Kubeflow Training V2 API Aug 14, 2024

google-oss-prow bot added the area/runtime label Aug 14, 2024

andreyvelich mentioned this issue Aug 28, 2024: KEP-2170: Kubeflow Trainer V2 API #2170 (Open, 19 tasks)

andreyvelich changed the title ~~KEP-2170: Create LLM training runtime for Llama 2 7b~~ KEP-2170: Create LLM training runtime for Llama 3.1 8B Oct 26, 2024

google-oss-prow bot assigned Electronic-Waste Nov 18, 2024

Electronic-Waste moved this from Todo to In Progress in KEP-2170: Kubeflow Training V2 API Nov 18, 2024

Electronic-Waste mentioned this issue Jan 23, 2025: KEP-2401: Kubeflow LLM Trainer V2 #2401 (Open)

github-actions bot added the lifecycle/stale label Feb 16, 2025

google-oss-prow bot removed the lifecycle/stale label Feb 17, 2025
