
How to fine-tune PLM with DeepSpeed or other methods to reduce GPU memory usage #62


Description

@idejie

I see that apps.plm.train already applies FSDP to PLM. Even so, the model cannot be fine-tuned on two 80 GB GPUs (OOM error) with tp_size=2 and batch_size=1. Can the code be trained with DeepSpeed instead?
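For reference, this is the kind of setup I mean by "trained with DeepSpeed": ZeRO stage 3 sharding with CPU offload to cut per-GPU memory. This is only a generic sketch with a placeholder model and placeholder hyperparameters, not anything supported by apps.plm.train today:

```python
# Hypothetical sketch: wrap an arbitrary nn.Module with DeepSpeed ZeRO stage 3
# plus CPU offload. The model and hyperparameters below are placeholders,
# not taken from apps.plm.train. Launch with the `deepspeed` launcher, e.g.:
#   deepspeed --num_gpus=2 this_script.py
import torch
import deepspeed
from torch import nn

model = nn.Transformer(d_model=512, nhead=8)  # placeholder for the PLM model

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "zero_optimization": {
        "stage": 3,                              # shard params, grads, optimizer states
        "offload_param": {"device": "cpu"},      # keep sharded params in host RAM
        "offload_optimizer": {"device": "cpu"},  # keep optimizer states in host RAM
    },
}

# DeepSpeed builds the optimizer from ds_config and returns a training engine.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Dummy training step with placeholder data, just to show the engine API.
src = torch.randn(10, 1, 512).to(engine.device)
tgt = torch.randn(10, 1, 512).to(engine.device)
loss = engine(src, tgt).pow(2).mean()
engine.backward(loss)
engine.step()
```

Whether something equivalent (parameter/optimizer offload, smaller shards) can be achieved within the existing FSDP setup in apps.plm.train would also answer my question.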
