
[BOUNTY - $100] Parallelise Model Loading #202

Open
AlexCheema opened this issue Sep 5, 2024 · 3 comments

Comments

@AlexCheema (Contributor)

  • Model downloads are already parallelised, which works great and speeds things up a lot.
  • However, loading the model into memory can also be slow for large models, and right now exo does this sequentially, one node at a time.
  • In the same way as model downloads, we should parallelise loading the model into memory.
  • This should probably expose some functionality of ensure_shard in the abstract InferenceEngine, maybe something like preload_model.
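A minimal sketch of what this could look like, assuming asyncio-based concurrency; `InferenceEngine`, `preload_model`, and the shard names here are illustrative stand-ins, not exo's actual API:

```python
import asyncio
import time

class InferenceEngine:
    """Hypothetical stand-in for exo's abstract InferenceEngine."""

    async def preload_model(self, shard: str) -> str:
        # Simulate the slow part: loading a model shard into memory.
        await asyncio.sleep(0.1)
        return f"loaded {shard}"

async def preload_all(engines, shards):
    # Kick off every node's load at once instead of one node at a time,
    # mirroring how downloads are already parallelised.
    return await asyncio.gather(
        *(engine.preload_model(shard) for engine, shard in zip(engines, shards))
    )

engines = [InferenceEngine() for _ in range(4)]
shards = [f"shard-{i}" for i in range(4)]

start = time.perf_counter()
results = asyncio.run(preload_all(engines, shards))
elapsed = time.perf_counter() - start
print(results)
```

With four nodes each taking ~0.1 s, the concurrent version finishes in roughly 0.1 s total rather than the ~0.4 s a sequential loop would take.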
@AlexCheema AlexCheema changed the title [Bounty - $100] Parallelise Model Loading [BOUNTY - $100] Parallelise Model Loading Sep 5, 2024
@aybanda

aybanda commented Sep 9, 2024

@AlexCheema I submitted PR #211.
I hope it resolves your issue.

If you feel like supporting me:

https://buymeacoffee.com/aybanda

@vovw

vovw commented Oct 17, 2024

I took a shot at this in #360.
Let me know if any changes are required.

@minhdv82

Is this bounty still open?
