Serializing the running model context and Cache-Augmented Generation (CAG) #15425

RomanKryvolapov · 2025-08-19T14:54:17Z

Hi,
I am working on an Android application that uses several backends, one of which is llama.cpp.

I would like to add Cache-Augmented Generation (CAG) to the application, but I have not found a reliable way to serialize and restore the running model context.
Simply copying it through a buffer takes a lot of time.
Are there any ready-made solutions that are compatible with Java JNI?
Also, is NPU support planned for the future?
Google is currently working on such a project.
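
To make the buffer-copy concern concrete, here is a minimal sketch of the Java side only. It assumes the native layer can write the serialized context straight into a direct `ByteBuffer` (for example by resolving its address with JNI's `GetDirectBufferAddress` and passing it to something like llama.cpp's `llama_state_get_data`). The class `StateCacheSketch` and the method `fillStateStub` are hypothetical stand-ins, not part of llama.cpp or any existing binding, and the stub only writes a test pattern. Whether this is actually faster than the current copy would need to be measured on the device.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class StateCacheSketch {
    // Hypothetical stand-in for a JNI call that would write the serialized
    // context into the direct buffer. Here it only fills a test pattern.
    static void fillStateStub(ByteBuffer dst) {
        for (int i = 0; i < dst.capacity(); i++) {
            dst.put(i, (byte) (i % 251));
        }
    }

    // Writes the whole buffer to a file without an intermediate byte[].
    static void save(Path file, ByteBuffer state) throws IOException {
        ByteBuffer src = state.duplicate(); // leave the caller's position untouched
        src.clear();                        // position = 0, limit = capacity
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            while (src.hasRemaining()) {
                ch.write(src);
            }
        }
    }

    // Maps the file read-only. The mapped buffer is direct, so native code
    // could read from it without another copy on the Java side.
    static ByteBuffer load(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }
}
```

In this sketch, `save` would be called once after the shared context has been evaluated, and `load` before each new request. Note that `FileChannel.open(Path, ...)` needs Android API level 26 or higher.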