Serializing the running model context and Cache-Augmented Generation (CAG) #15425
RomanKryvolapov started this conversation in General
Replies: 0 comments
Hi,
I am working on an Android application that supports several backends, one of which is llama.cpp:
https://github.com/RomanKryvolapov/Local_AI_Launcher
I would like to add Cache-Augmented Generation (CAG) to the application, but I have not found a reliable way to serialize the running model context.
Simply copying the state through a buffer takes a lot of time.
Is there a ready-made solution that works from Java through JNI?
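To make the question concrete, here is a minimal sketch of the Java-side pattern I have in mind: the serialized context is written to disk once, keyed by a hash of the preloaded text, and later memory-mapped instead of being copied through a heap array. Everything in it is an assumption for illustration: `ContextCache`, `StateSource`, and `buildState` are hypothetical names, and `StateSource` only stands in for whatever the native layer would expose. It is not llama.cpp's API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: disk-backed cache of serialized model state. */
public class ContextCache {

    /** Hypothetical stand-in for the JNI call that serializes the context. */
    public interface StateSource {
        byte[] buildState(String knowledgeText);
    }

    private final Path dir;

    public ContextCache(Path dir) throws IOException {
        // java.nio.file is available on Android from API level 26.
        this.dir = Files.createDirectories(dir);
    }

    /** Hex-encoded SHA-256 of the preloaded text, used as the cache file name. */
    static String key(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /**
     * Returns the cached state for the given text, building and storing it
     * on the first call. The result is a read-only memory-mapped buffer.
     */
    public ByteBuffer getOrBuild(String knowledgeText, StateSource source)
            throws IOException {
        Path file = dir.resolve(key(knowledgeText) + ".state");
        if (!Files.exists(file)) {
            // Expensive path: runs only once per distinct text.
            Files.write(file, source.buildState(knowledgeText));
        }
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }
}
```

The mapped buffer is a direct buffer, so native code can read it through JNI's `GetDirectBufferAddress` without first copying it into a Java array. What I am missing is the native half: a fast way to produce and restore the state itself.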
Are there any plans to support NPUs in the future?
Google is currently working on such a project:
https://ai.google.dev/edge/litert/next/npu
https://github.com/google-ai-edge/LiteRT-LM