-
Notifications
You must be signed in to change notification settings - Fork 475
fix: Allow externally managed contexts with LLamaEmbedder #1263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
fix: Allow externally managed contexts with LLamaEmbedder #1263
Conversation
Fixes SciSharp#1259 and potentially SciSharp#1247 with changes to how the caller manages the LLamaEmbedder.
|
@martindevans need some help with re-instating the old code that used to reset the kv_cache values. https://github.com/bmazzarol-bunnings/LLamaSharp/blob/test/context-cost-24/LLama/LLamaEmbedder.cs#L73-L74 Or if it is not required then all good. It looked important. |
|
Sorry for the delay on reviewing this. The only issue I see with the current approach is that the embedder uses |
|
@martindevans I have attempted something, make it clear if this is too far outside what you expected. I have a simple sequence id manager that ensures that a single embedding process against the embedding instance will have exclusive use of a sequence id within the range of max sequence id. I then clear the underlying memory associated with the sequence id and return it back for re-use once the embedding operation is complete. Questions,
|
|
Wow that's more than I expected, in a good way though! Should the sequence manager perhaps be moved into the LLamaContext? That way LLamaContext can expose |
|
@martindevans I have integrated the sequence manager into the LLamaContext. I integrated the return into a dispose implementation. Let me know what you think. I still need to test out the sequence memory usage scenarios to ensure non-sequential sequence id values can be used without issue. |
Fixes #1259 and potentially #1247 with changes to how the caller manages the LLamaEmbedder.