
[BOUNTY - $100] Parallelise Model Loading #202

Open
AlexCheema opened this issue Sep 5, 2024 · 3 comments

Comments

@AlexCheema (Contributor)

  • Model downloads are already parallelised, which works great and speeds things up a lot.
  • However, loading the model into memory can also be slow for large models, and right now exo does this sequentially, one node at a time.
  • In the same way as model downloads, we should parallelise loading the model into memory.
  • This should probably expose some functionality of ensure_shard in the abstract InferenceEngine, maybe something like preload_model.
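A minimal sketch of what this could look like, assuming asyncio-based concurrency; `InferenceEngine`, `preload_model`, and the shard names here are illustrative stand-ins, not exo's actual API:

```python
import asyncio
import time

class InferenceEngine:
    """Hypothetical stand-in for exo's abstract InferenceEngine."""

    async def preload_model(self, shard: str) -> str:
        # Simulate the slow part: loading a model shard into memory.
        await asyncio.sleep(0.1)
        return f"loaded {shard}"

async def preload_all(engines, shards):
    # Kick off every node's load at once instead of one node at a time,
    # mirroring how downloads are already parallelised.
    return await asyncio.gather(
        *(engine.preload_model(shard) for engine, shard in zip(engines, shards))
    )

engines = [InferenceEngine() for _ in range(4)]
shards = [f"shard-{i}" for i in range(4)]

start = time.perf_counter()
results = asyncio.run(preload_all(engines, shards))
elapsed = time.perf_counter() - start
print(results)
```

With four nodes each taking ~0.1 s, the concurrent version finishes in roughly 0.1 s total rather than the ~0.4 s a sequential loop would take.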
@AlexCheema AlexCheema changed the title [Bounty - $100] Parallelise Model Loading [BOUNTY - $100] Parallelise Model Loading Sep 5, 2024
@aybanda

aybanda commented Sep 9, 2024

@AlexCheema I submitted PR #211.
I hope it resolves your issue.

If you feel like supporting me:

https://buymeacoffee.com/aybanda

@vovw

vovw commented Oct 17, 2024

I took a shot at this in #360.
Let me know if any changes are required.

@minhdv82

Is this bounty still open?
