Weird issue with Completion + CancelRequests #336

psocolovsky · 2025-05-04T21:52:44Z


I am not sure if there is something I am not doing correctly.

Basically, it seems that Completion and CancelRequests must be strictly serialized to work around an issue in the current version (LLMUnity v2.5.0). I am issuing commands from the main thread, which should already serialize the operations.

When I call cancel, the current completion blocks inside the native method:

    await Task.Run(() =>
        llmlib.LLM_Completion(LLMObject, json, streamWrapper.GetStringWrapper())
    );

The issue disappears if I change the code to this (which I know is not how it is supposed to work, but it touches the hotspot):

    await Task.Run(() =>
    {
        lock (staticLock)
        {
            llmlib.LLM_Completion(LLMObject, json, streamWrapper.GetStringWrapper());
        }
    });
    [...]
    public void CancelRequest(int id_slot)
    {
        lock (staticLock)
        {
            AssertStarted();
            llmlib?.LLM_Cancel(LLMObject, id_slot);
            CheckLLMStatus();
        }
    }

Without enforcing strict serialization, it seems that Task.Run executes on another thread, overlapping the execution of LLM_Cancel.

Without this guard, the editor hangs forever after stopping, which calls Destroy. The Destroy operation completes correctly, but the Completion task never returns. This leads to the classic stuck "Reloading Domain" issue in the Unity editor.

I noticed that the Cancel implementation on the native side is trivial, whereas completion is not and may take several frames to complete, which explains the intention behind await Task.Run().
The implementation suggests that it should be fully reentrant, and maybe at some point it was, but the current version is not, so it feels like a regression.

Using lock (staticLock) as I did for testing makes no sense: when you call cancel, you want the completion to terminate asynchronously. Otherwise a cancel will cause a stutter, which is not acceptable.
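
To illustrate the behavior described above (a cancel that returns immediately while the completion winds down on its own thread), here is a minimal sketch. FakeNativeLlm, CancelDemo, and all of their members are hypothetical stand-ins invented for this example; they are not LLMUnity or LlamaLib code, and the real native completion may stop for different reasons.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the native library (assumption, not LlamaLib).
public sealed class FakeNativeLlm
{
    private int cancelRequested; // 0 = keep running, 1 = stop

    // Simulates a long blocking completion that polls the cancel flag.
    // Returns the number of "tokens" produced before it stopped.
    public int Completion(int maxTokens)
    {
        int produced = 0;
        while (produced < maxTokens && Volatile.Read(ref cancelRequested) == 0)
        {
            Thread.Sleep(1); // pretend to decode one token
            produced++;
        }
        return produced;
    }

    // Lock-free: sets the flag and returns without waiting for Completion.
    public void Cancel() => Interlocked.Exchange(ref cancelRequested, 1);
}

public static class CancelDemo
{
    // Starts a completion on a worker thread, cancels it after a delay,
    // and returns how many tokens were produced before it stopped.
    public static async Task<int> RunAndCancelAsync(int maxTokens, int cancelAfterMs)
    {
        var llm = new FakeNativeLlm();
        Task<int> completion = Task.Run(() => llm.Completion(maxTokens));
        await Task.Delay(cancelAfterMs);
        llm.Cancel();            // does not block the caller
        return await completion; // completes shortly after the flag is seen
    }
}
```

Because Cancel never takes a lock that Completion holds, the caller is not stalled for the duration of the completion, which is the stutter the lock-based test introduces.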

What do you think? Am I the only one seeing this? Am I using the wrong version?

Side note: I am using multiple LLMs, but I ruled this out as a factor; it also happens with a single LLM object. Let me know what information would help reproduce this.

Also: the code that I am using seems to work much better. Besides, the exception handler on the native side uses a global
sigjmp_buf point;

which would require strict static-mutex serialization like staticLock in my test. I believe sigjmp_buf point should be a member of the LLM class in undreamai.h, so that calls do not share this symbol when concurrent slots or concurrent LLM objects fail simultaneously or a catastrophic error occurs. Just my 2 cents; I know it is a rare condition.

I can run some tests in an empty project to build a test case.


psocolovsky · 2025-05-05T10:02:58Z

Author

Sorry for bringing up sigjmp_buf point; I only mentioned it for the record. I see that a correct, fully reentrant implementation is far from trivial. I am currently interested in understanding the first issue: why LLM_Completion does not return after cancel is called.

0 replies

amakropoulos · 2025-05-05T10:09:28Z

Maintainer

Thank you, wow, that's a really deep investigation ⭐ !
I'm updating llama.cpp and actually identified this issue only on Thursday.
I'm looking into it as we speak. It is quite tricky because there are so many low-level things happening, with Unity on top of it.
It is low severity because it happens only when stopping and only in the Editor, but it still needs a fix.
I'll keep you updated on the progress.

8 replies

psocolovsky May 5, 2025
Author

Well, I did not investigate the C++ side because I currently cannot compile it; I may try later. handle_cancel_action and stop_service cancel tasks in different ways, which is something I don't understand. Anyway, it would take me days to understand the llama.cpp details; I have a vague clue, but I see there are too many things there.

Also, besides cancel:
Starting and setting up the LLM seems complete, but destruction and completion look less convincing in terms of atomicity (if this is a word), since you want it asynchronous and also want to avoid a mutex:

await Task.Run makes it fully asynchronous: calling destroy and completion in any order from C# may execute the functions in a different order in the native lib, so adding a mutex in C# would be either useless or overkill.
Because destroy and completion can be called at the same time, and destroy waits for completion but not the other way around, there is a chance, 1 in a million, that a late completion starts.
You can mark the LLM as being destroyed, so that completion refuses to start if destruction is in progress. This test must happen after the completion has made its state change visible to destroy (so they interlock atomically); since we work fully asynchronously, you may have to use the atomic library.
Starting the LLM will clear the beingDestroyed mark. I am not sure whether ctx_server.queue_tasks.running can work as a guard; it seems it won't be atomic (it would catch 99% of the cases, but that is the problem with race conditions, as Murphy's law teaches the hard way).
Fixing this will likely take a dozen lines of code, but it might be time-consuming, so I understand why it has never been a priority.
If I manage to compile this, I may try to contribute.
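
The interlock described above (completion publishes its state first, then tests the being-destroyed mark, while destroy sets the mark first and then waits) could be sketched in C# roughly as follows. This is only an illustration of the ordering, written on the managed side with System.Threading primitives; LifetimeGate and its members are hypothetical names, and the suggestion in this comment targets the native library, where the equivalent would use C++ atomics.

```csharp
using System;
using System.Threading;

// Hypothetical gate that interlocks "completion starts" with "destroy begins".
public sealed class LifetimeGate
{
    private int destroying; // 0 = alive, 1 = destruction in progress
    private int active;     // number of completions currently running

    // Called by a completion before touching the native object.
    public bool TryEnter()
    {
        // Publish our presence first (Interlocked is a full fence)...
        Interlocked.Increment(ref active);
        // ...and only then test the mark, so destroy cannot miss us.
        if (Volatile.Read(ref destroying) != 0)
        {
            Interlocked.Decrement(ref active);
            return false; // refuse to start: destruction is in progress
        }
        return true;
    }

    // Called by a completion when it has finished.
    public void Exit() => Interlocked.Decrement(ref active);

    // Called by destroy: set the mark, then wait for running completions.
    public void BeginDestroy()
    {
        Interlocked.Exchange(ref destroying, 1);
        var spin = new SpinWait();
        while (Volatile.Read(ref active) != 0) spin.SpinOnce();
    }

    // Called when the LLM is started again: clears the mark.
    public void Reset() => Interlocked.Exchange(ref destroying, 0);
}
```

With this ordering, either the completion observes the mark and backs out, or destroy observes the completion's counter and waits for it; a late completion cannot slip in between the two checks.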

amakropoulos May 5, 2025
Maintainer

I just fixed it and released a new version.
The solution is exactly what you suggested, using the same slot release process for both handle_cancel_action and stop_service (link)!

Yes please, I would love such contributions because these things get a bit out of my depth to be honest.
Again I really appreciate this issue and your investigation!

Answer selected by psocolovsky

amakropoulos May 5, 2025
Maintainer

To compile LlamaLib you can have a look at the build_library.yaml GitHub Action, or copy the commands from the completed Action (e.g. the latest release can be found here)

psocolovsky May 5, 2025
Author

Ah man! You know what we can do about Destroy and Completion (and any other similar call)?

    public void Destroy()
    {
        lock (staticLock)
        lock (startLock)
        {
            try
            {
                if (llmlib != null)
                {
                    if (LLMObject != IntPtr.Zero)
                    {
                        var handle = LLMObject;
                        LLMObject = IntPtr.Zero;    // invalidate the object so that concurrent calls will be rejected during destruction

                        llmlib.LLM_Stop(handle);
                        if (remote) llmlib.LLM_StopServer(handle);
                        StopLogging();
                        llmThread?.Join();
                        llmlib.LLM_Delete(handle);
                    }
                    llmlib.Destroy();
                    llmlib = null;
                }
                started = false;
                failed = false;
            }
            catch (Exception e)
            {
                LLMUnitySetup.LogError(e.Message);
            }
        }
    }

Other calls that happen at the same time will pass zero, and LlamaLib should detect this easily (I just checked: it does not currently test for zero/null). It could be a valid convention to mitigate this (very rare) race condition with a very simple approach rather than hard-to-maintain code. What do you think?

Although this does not guarantee atomic operation, it will mitigate the potential issue and increase stability.
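
As a rough sketch of the zero-handle convention on the calling side, the handle could be taken with a single atomic exchange, and every other call could check for zero before going native. HandleOwner and its members are hypothetical names for this example, not part of LLMUnity; as noted above, this narrows the race window but does not close it, because a caller that read the handle just before it was taken can still use it.

```csharp
using System;
using System.Threading;

// Hypothetical owner of a native handle; IntPtr.Zero means "destroyed".
public sealed class HandleOwner
{
    private IntPtr handle;

    public HandleOwner(IntPtr nativeHandle)
    {
        handle = nativeHandle;
    }

    // Used by ordinary calls: returns false once the handle was invalidated.
    public bool TryGet(out IntPtr current)
    {
        current = Volatile.Read(ref handle);
        return current != IntPtr.Zero;
    }

    // Used by destroy: atomically swaps in zero and returns the old value,
    // so only one caller ever receives the live handle for deletion.
    public IntPtr Take()
    {
        return Interlocked.Exchange(ref handle, IntPtr.Zero);
    }
}
```

A completion would call TryGet and skip the native call when it returns false, while destroy would call Take and release the native object only if the returned value is non-zero.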

psocolovsky May 5, 2025
Author

> To compile LlamaLib you can have a look at the build_library.yaml GitHub Action, or copy the commands from the completed Action (e.g. the latest release can be found here)

will do, thanks

amakropoulos May 5, 2025
Maintainer

good idea, I'll try it out!
