
How to fine-tune PLM with DeepSpeed or other methods to reduce GPU memory usage #62


Description

@idejie

I see that apps.plm.train already applies FSDP to PLM. Even so, the model cannot be fine-tuned on two 80 GB GPUs (OOM error) with tp_size=2 and batch_size=1. Can the code be trained with DeepSpeed instead?
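For reference, this is the kind of setup I mean by "trained with DeepSpeed": ZeRO stage 3 sharding with CPU offload to cut per-GPU memory. This is only a generic sketch with a placeholder model and placeholder hyperparameters, not anything supported by apps.plm.train today:

```python
# Hypothetical sketch: wrap an arbitrary nn.Module with DeepSpeed ZeRO stage 3
# plus CPU offload. The model and hyperparameters below are placeholders,
# not taken from apps.plm.train. Launch with the `deepspeed` launcher, e.g.:
#   deepspeed --num_gpus=2 this_script.py
import torch
import deepspeed
from torch import nn

model = nn.Transformer(d_model=512, nhead=8)  # placeholder for the PLM model

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "zero_optimization": {
        "stage": 3,                              # shard params, grads, optimizer states
        "offload_param": {"device": "cpu"},      # keep sharded params in host RAM
        "offload_optimizer": {"device": "cpu"},  # keep optimizer states in host RAM
    },
}

# DeepSpeed builds the optimizer from ds_config and returns a training engine.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Dummy training step with placeholder data, just to show the engine API.
src = torch.randn(10, 1, 512).to(engine.device)
tgt = torch.randn(10, 1, 512).to(engine.device)
loss = engine(src, tgt).pow(2).mean()
engine.backward(loss)
engine.step()
```

Whether something equivalent (parameter/optimizer offload, smaller shards) can be achieved within the existing FSDP setup in apps.plm.train would also answer my question.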
