
Conversation

@bmazzarol-bunnings

Fixes #1259 and potentially #1247 with changes to how the caller manages the LLamaEmbedder.

@bmazzarol-bunnings
Author

@martindevans I need some help with reinstating the old code that used to reset the kv_cache values: https://github.com/bmazzarol-bunnings/LLamaSharp/blob/test/context-cost-24/LLama/LLamaEmbedder.cs#L73-L74

Or if it is not required then all good, it just looked important.
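
For context, the sort of reset I mean looks roughly like this. It's only a sketch: I'm assuming a KvCacheRemove-style wrapper on the native handle, so the exact call may differ:

```csharp
using LLama;
using LLama.Native;

static class KvCacheReset
{
    // Sketch: drop all cached KV entries for the sequence the embedder
    // used. In the llama.cpp API, positions (-1, -1) conventionally mean
    // "the entire sequence". The wrapper name here is an assumption.
    public static void ResetSequence(LLamaContext context, LLamaSeqId seq)
    {
        context.NativeHandle.KvCacheRemove(seq, -1, -1);
    }
}
```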

@martindevans
Member

Sorry for the delay on reviewing this.

The only issue I see with the current approach is that the embedder uses LLamaSeqId.Zero for all its work. If sequence zero is being used for something else (e.g. another embedder sharing the same context) that would be bad!

@bmazzarol-bunnings
Author

@martindevans I have attempted something; let me know if it is too far outside what you expected.

I have a simple sequence id manager that ensures each embedding operation against the embedder instance has exclusive use of a sequence id within the maximum sequence id range. I then clear the underlying memory associated with that sequence id and return it for re-use once the embedding operation is complete.
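
Roughly, the idea is something like this (a simplified sketch, not the exact PR code):

```csharp
using System;
using System.Collections.Concurrent;

// Simplified sketch of the sequence id manager: hands out ids up to the
// context's maximum number of sequences and recycles them after use.
internal sealed class SequenceIdManager
{
    private readonly ConcurrentBag<int> _available = new();

    public SequenceIdManager(int maxSequences)
    {
        for (var i = 0; i < maxSequences; i++)
            _available.Add(i);
    }

    // Rent an id for exclusive use by one embedding operation.
    public int Rent() =>
        _available.TryTake(out var id)
            ? id
            : throw new InvalidOperationException("No free sequence ids");

    // Return the id once the sequence's memory has been cleared.
    public void Return(int id) => _available.Add(id);
}
```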

Questions:

  1. Is the clearing code always required? The embedding model did not care, but the generative one did.
  2. Do the sequence ids always have to be sequential? I don't think my implementation guarantees that, since ids are handed out from a concurrent bag (see the sketch above).
  3. Does the clearing remove more than is required? The old code looked targeted at just the KV cache.
  4. I assume that once the global inference lock goes, embedding can be done in parallel against the LLamaEmbedder?
@martindevans
Member

Wow, that's more than I expected, in a good way though!

Should the sequence manager perhaps be moved into the LLamaContext? That way LLamaContext could expose GetSequence/ReturnSequence methods, so that any service using the LLamaContext can request an available sequence.
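
Something with this kind of shape (illustrative only, names up for debate):

```csharp
using LLama.Native;

// Illustrative shape only: LLamaContext would implement something like
// this, so that any service sharing the context can request a sequence.
public interface ISequenceSource
{
    // Request exclusive use of a free sequence id; fails (or waits)
    // when every slot up to the context's n_seq_max is taken.
    LLamaSeqId GetSequence();

    // Clear the sequence's memory and make the id available again.
    void ReturnSequence(LLamaSeqId sequence);
}
```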

@bmazzarol-bunnings
Author

@martindevans I have integrated the sequence manager into the LLamaContext.

I integrated the return into a Dispose implementation, roughly the shape sketched below. Let me know what you think.
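
Something along these lines (an illustrative sketch of the pattern, not the exact code):

```csharp
using System;
using LLama;
using LLama.Native;

// Illustrative sketch: a handle that returns its sequence id to the
// context on disposal, so callers cannot forget to give it back.
public readonly struct SequenceHandle : IDisposable
{
    private readonly LLamaContext _context;
    public LLamaSeqId Id { get; }

    internal SequenceHandle(LLamaContext context, LLamaSeqId id)
    {
        _context = context;
        Id = id;
    }

    // Dispose clears the sequence's memory (via the context) and makes
    // the id available for re-use. Method name is an assumption.
    public void Dispose() => _context.ReturnSequence(Id);
}

// Usage: the id is returned even if the embedding operation throws.
// using var sequence = context.GetSequence();
// ... decode / embed against sequence.Id ...
```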

I still need to test out the sequence memory usage scenarios to ensure non-sequential sequence id values can be used without issue.


Development

Successfully merging this pull request may close these issues:

[BUG]: GenerateAsync via the IEmbeddingGenerator interface throws ObjectDisposedException on LLama.Native.SafeLLamaContextHandle
