
Conversation

@Rose22 Rose22 commented Nov 21, 2025

As we've discussed in Discord,

this change alters the way koboldcpp determines which tool to use in some pretty drastic ways that vastly improve its accuracy, especially with small LLMs. Instead of making a single request to the LLM asking whether a tool should be used, forced down to 5 tokens with a grammar that only allows a simple yes/no answer, it now gives the LLM full freedom to write out its decision and the reasoning behind it, with the final decision always stated at the end of the response. Then we take that response and apply the yes/no grammar to it instead!
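The two-pass flow described above could be sketched roughly like this. Note that `should_use_tool`, the `generate` callable, and the prompts are all hypothetical stand-ins for illustration, not koboldcpp's actual API or the prompts in this PR:

```python
YESNO_GRAMMAR = 'root ::= ("yes" | "no")'  # GBNF-style grammar permitting only yes/no

def should_use_tool(conversation, tool_list, generate):
    """Two-pass tool decision.

    `generate(prompt, max_tokens, grammar=None)` is a stand-in for a call to
    the koboldcpp generation endpoint; it is NOT the project's real API.
    """
    # Pass 1: let the model reason freely and end with a stated decision.
    reasoning = generate(
        f"{tool_list}\n{conversation}\n"
        "Decide whether a tool should be used. Explain your reasoning, "
        "then state your final decision at the end.",
        max_tokens=256,
    )
    # Pass 2: re-read that free-form answer and force a one-word verdict.
    verdict = generate(
        f"{reasoning}\nBased on the text above, was the final decision "
        "to use a tool? Answer yes or no.",
        max_tokens=3,
        grammar=YESNO_GRAMMAR,
    )
    return verdict.strip().lower().startswith("yes")
```

The key point is that the grammar constraint only applies to the second, trivial classification call, so the model's actual decision-making is never token-starved.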

(this is a redo of the pull request because, due to inexperience with git and GitHub, I messed up my branch with too many merges from upstream)

@LostRuins
Owner

Alright, as requested I moved the tool list into memory so it won't be hurt by context shifting.

Did a bit of tidying of the prompts, but no real functional changes except for removing "If there was no final decision stated, default to no.", which I don't think is necessary.

If everything works well for you we can merge this.

@Rose22
Author

Rose22 commented Nov 22, 2025

> did a bit of tidying of the prompts but no real functional changes except for removing If there was no final decision stated, default to no. which I don't think is necessary.

The reason I had that line is that the LLM would sometimes output something that wasn't reasoning but basically just a standard answer, as if it were inside a conversation. So I decided to mitigate that using JSON. An added benefit is that we can skip most of the other calls to the LLM and get it down to just one!

It will need further testing again, but I believe this is better than before.
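The single-pass JSON approach might look roughly like this. The schema, the `decide_tool_call` helper, and the `generate` callable are illustrative assumptions, not necessarily what was merged:

```python
import json

def decide_tool_call(conversation, tool_list, generate):
    """Single-pass tool decision via a JSON-constrained response.

    `generate` stands in for a JSON-grammar-constrained call to the
    koboldcpp generation endpoint; the schema here is hypothetical.
    """
    raw = generate(
        f"{tool_list}\n{conversation}\n"
        'Reply ONLY with a JSON object of the form '
        '{"reasoning": "...", "tool_needed": true|false, "tool_name": "..."|null}',
        max_tokens=256,
    )
    # With grammar enforcement the output is guaranteed to be valid JSON,
    # so there is no need for a non-JSON fallback path.
    decision = json.loads(raw)
    return bool(decision["tool_needed"]), decision.get("tool_name")
```

Because the reasoning, the yes/no verdict, and the chosen tool all arrive in one structured response, the earlier second pass (and the stray "conversational" answers it had to guard against) disappears entirely.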

@LostRuins
Owner

I gave your new method a try and I think it does work better than before. I have added the JSON enforcement and removed the non-JSON fallback, as it's no longer triggerable.
From my tests it's working just as well as before, if not better; single-pass is also faster.
Do take a look and see if you are happy with this version or if something is still lacking.

I used your prompts but added one more for "required" mode instead of "auto" (as per the OpenAI spec).
I also compacted the text down to a single line, as it was kinda messy (visual change only, same prompt).

Secondary issue: I noticed some problems with the "always send tools at the start" approach you swapped to previously. This is not an issue with tool calls per se, but it's affecting the quality of the no-tool output.

Previously: when tools are not called, they are excluded from the context.

Now: when tools are not called, they are still included at the top.

Result: the AI talks about tools in response to simple questions. For example, if I have a tool called "get_menu" and I ask "what is the meaning of life", it tends to reply "I cannot answer that question as I only have access to the food menu, which does not include the philosophy of life".
If a tool call is NOT needed, we might have to remove tools from the context like before, in order to avoid poisoning the AI's normal replies.
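The suggested fix amounts to making the tool list conditional on the decision. A minimal sketch, assuming a hypothetical `build_prompt` helper (koboldcpp's real prompt assembly, including the tools-in-memory change from this PR, differs):

```python
def build_prompt(conversation, tool_list, tool_call_needed):
    """Prepend the tool list only when a tool call was actually decided on.

    Hypothetical helper for illustration: when no tool is needed, the tool
    descriptions are left out entirely, so they can't leak into (or
    "poison") the model's ordinary conversational replies.
    """
    if tool_call_needed:
        return f"{tool_list}\n{conversation}"
    return conversation
```

The trade-off is that the no-tool path no longer shares a prompt prefix with the tool path, which can cost prompt-cache reuse, but it keeps tool chatter out of answers to unrelated questions like "what is the meaning of life".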

@LostRuins LostRuins left a comment

merging
merging

@LostRuins LostRuins merged commit eeb7363 into LostRuins:concedo_experimental Nov 23, 2025