Draft: autocompletion rework and granite completion fixes #5543
Conversation
If the user focuses away from the window, the current inline completion is cleared. When the user comes back the completion is not regenerated. This commit addresses that problem by explicitly triggering inline completions on focus and tab changes.
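As a rough illustration of that approach, a focus/tab listener in a VS Code extension might look something like the sketch below (the choice of listeners and the use of the built-in `editor.action.inlineSuggest.trigger` command are assumptions, not necessarily what this commit does):

```typescript
import * as vscode from "vscode";

// Re-request inline completions whenever focus returns to the window
// or the user switches to a different editor tab.
export function registerRefocusTriggers(context: vscode.ExtensionContext) {
  const trigger = () => {
    // Built-in VS Code command that re-runs the inline completion providers.
    void vscode.commands.executeCommand("editor.action.inlineSuggest.trigger");
  };

  context.subscriptions.push(
    vscode.window.onDidChangeWindowState((state) => {
      if (state.focused) {
        trigger();
      }
    }),
    vscode.window.onDidChangeActiveTextEditor((editor) => {
      if (editor) {
        trigger();
      }
    }),
  );
}
```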
We currently only try to provide an inline completion while the suggestion widget is active if at least 4 characters match the popup selection. The problem is that the suggestion widget doesn't have a lot of smarts to figure out what to suggest; it mainly just picks from an alphabetically sorted list, so assuming its suggestion is what the user wants is dubious behavior unless the user has somehow indicated that the suggestion widget selection is useful. The suggestion widget, however, can be influenced by the inline completion hint, which comes from a smart LLM, so rather than letting the suggestion widget influence the inline completion, we should really be letting the inline completion influence the suggestion widget. As a first step toward that goal, this commit reverts the 4-character logic that was added in commit 304056d. A subsequent commit will work on reversing the dependency between inline completions and the suggestion widget.
Managing and integrating chunks of source code into the editor window is easy to get wrong, especially if there is overlap between what's there and what's getting integrated. To help facilitate that kind of work, this commit adds a new SourceFragment class. It encapsulates a chunk of code and provides ways to access it from the top and the bottom, and ways to split it into an array of lines, optionally trimming whitespace and blank lines in the process. SourceFragment also comes with a getRemainingCompletion method that determines if the bottom of the fragment matches the top of the passed in fragment. It then returns what's not yet integrated. There is also a getAsTruncatedFragment function that will chop off the specified tail of the fragment, optionally ignoring whitespace.
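Based on the description above and the way SourceFragment is used later in this PR, its shape is roughly along these lines (the exact signatures are guesses, not copied from the code):

```typescript
// Rough sketch of the SourceFragment API described above; signatures are assumptions.
declare class SourceFragment {
  constructor(text: string);

  // Access lines from the top or bottom of the fragment.
  head(lineCount: number, options?: { ignoreWhitespace?: boolean }): string[];
  tail(lineCount: number, options?: { ignoreWhitespace?: boolean }): string[];

  // Split into an array of lines, optionally trimming whitespace and blank lines.
  toLines(options?: { ignoreWhitespace?: boolean; dropBlankLines?: boolean }): string[];

  // If the bottom of this fragment matches the top of `other`, return the
  // part of `other` that has not been integrated yet.
  getRemainingCompletion(other: SourceFragment): SourceFragment;

  // Chop off the specified tail (e.g. a suffix the model repeated),
  // optionally ignoring whitespace while matching.
  getAsTruncatedFragment(options: {
    suffix: SourceFragment;
    ignoreWhitespace?: boolean;
  }): SourceFragment;
}
```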
When the user hits tab to accept an inline completion the completion provider that provided the completion gets notified via its accept method. There are other parts of the code that would ideally like to know when the completion provider accept method is called. This commit adds a VSCode-like onAccept function to register disposable accept listeners.
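A minimal sketch of what such a VSCode-like onAccept registration could look like (the names and types here are illustrative, not the PR's actual implementation):

```typescript
// Disposable listener registry for completion acceptance, in the style of VS Code events.
type Disposable = { dispose(): void };
type AcceptListener = (completionId: string) => void;

class CompletionProvider {
  private acceptListeners = new Set<AcceptListener>();

  // Register a listener that fires whenever accept() is called.
  onAccept(listener: AcceptListener): Disposable {
    this.acceptListeners.add(listener);
    return { dispose: () => this.acceptListeners.delete(listener) };
  }

  // Called when the user hits tab to accept an inline completion.
  accept(completionId: string): void {
    for (const listener of this.acceptListeners) {
      listener(completionId);
    }
  }
}
```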
Now that the core completion provider has disposable resources, we need to explicitly clean those up when the extension completion provider is disposed. This commit adds a call to do that.
VSCode currently has no way to prevent the intellisense suggestion widget from popping up. We need tight control over when it can and can't show, so it doesn't badly influence the autocomplete output. This commit adds 3 new commands:
- continue.disableSuggestionWidget
- continue.enableSuggestionWidget
- continue.isSuggestionWidgetDisabled

These provide the ability to inhibit the suggestion widget when necessary. Under the hood, they're implemented by swapping out the default type handler for a simpler one that merely moves text into the editor buffer, bypassing the completion provider logic.
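A hedged sketch of how those three commands might be wired up by swapping the "type" handler (the state handling and registration details here are assumptions, not the commit's actual code):

```typescript
import * as vscode from "vscode";

// Illustrative module-level state; the real implementation may differ.
let typeInterceptor: vscode.Disposable | undefined;

export function registerSuggestionWidgetCommands(context: vscode.ExtensionContext) {
  context.subscriptions.push(
    vscode.commands.registerCommand("continue.disableSuggestionWidget", () => {
      // Overriding the built-in "type" command bypasses the machinery that
      // would normally cause the suggestion widget to appear.
      typeInterceptor ??= vscode.commands.registerCommand(
        "type",
        async (args: { text: string }) => {
          const editor = vscode.window.activeTextEditor;
          if (editor) {
            await editor.edit((editBuilder) => {
              editor.selections.forEach((selection) => {
                editBuilder.insert(selection.active, args.text);
              });
            });
          }
        },
      );
    }),
    vscode.commands.registerCommand("continue.enableSuggestionWidget", () => {
      // Restore the default type handler.
      typeInterceptor?.dispose();
      typeInterceptor = undefined;
    }),
    vscode.commands.registerCommand(
      "continue.isSuggestionWidgetDisabled",
      () => typeInterceptor !== undefined,
    ),
  );
}
```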
VSCode really wants inline completion providers to generate completion items faster than we can realistically provide them, given the LLM may take on the order of seconds to come up with a reply. In some cases VSCode expects a response in less than 80ms. In order to cope, this commit makes the inline completion provider return null immediately while working on the results in the background. Once they come in, it tells VSCode to try again. This commit also adds an onAccept listener so we know when to clean up the inline completion after the user has green-lighted it.
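The "return null now, retry later" pattern could look roughly like this (the cache keyed by document and position, and the generateCompletion helper, are illustrative assumptions):

```typescript
import * as vscode from "vscode";

// Placeholder for the real LLM-backed generation, which may take seconds.
declare function generateCompletion(
  document: vscode.TextDocument,
  position: vscode.Position,
): Promise<vscode.InlineCompletionItem[]>;

// Results that finished after we already answered VS Code with null.
const pendingCompletions = new Map<string, vscode.InlineCompletionItem[]>();

export const provider: vscode.InlineCompletionItemProvider = {
  provideInlineCompletionItems(document, position, _context, _token) {
    const key = `${document.uri.toString()}:${position.line}:${position.character}`;

    // If a background request already finished for this spot, return it now.
    const ready = pendingCompletions.get(key);
    if (ready) {
      pendingCompletions.delete(key);
      return ready;
    }

    // Otherwise kick off the slow LLM request in the background...
    void generateCompletion(document, position).then((items) => {
      pendingCompletions.set(key, items);
      // ...and ask VS Code to call us again once the result is cached.
      void vscode.commands.executeCommand("editor.action.inlineSuggest.trigger");
    });

    // Return nothing immediately so VS Code is not kept waiting.
    return null;
  },
};
```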
Right now the suggestion widget shows up before the inline completion is processed. This causes a couple of problems:
1. It is not informed by the results of an LLM, so its suggestions are often off base.
2. It affects the editor buffer, so it can mislead the LLM about what sort of inline completions to provide.

This commit flips things on their head: we now inhibit the suggestion widget until the inline completion is generated, and even then we only uninhibit the suggestion widget if it has at least one suggestion that is compatible with the inline suggestion. The overarching idea is that the LLM is the "smart" output, so it should be what drives completion, not the context-naïve, alphabetically ordered list of completions provided by intellisense.
We want the inline completion to inform the suggestion widget, not the other way around, so we shouldn't use the suggestion widget's suggestion when figuring out the inline completion. This commit:
- Stops putting the suggestion widget suggestion in the prefix fed to the LLM.
- Stops including the suggestion widget suggestion in the AutocompleteInput.
- Stops including the suggestion widget suggestion in the range calculations for the inline completion item.
- Drops the willDisplay function that checks whether an inline completion matches the current suggestion and ditches the inline completion when there is a mismatch.
We're no longer using the suggestion widget output as part of our inline completion generation, so we no longer need to replace a range of text the user already typed with something different. This commit changes the ranges to be zero-width, which means "insert exactly at this position".
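For illustration, a zero-width range in VS Code is simply a range whose start and end are the same position (the helper name below is made up):

```typescript
import * as vscode from "vscode";

// A zero-width range: start === end, so the completion text is inserted at the
// cursor position rather than replacing anything the user already typed.
function makeInsertOnlyItem(text: string, position: vscode.Position) {
  return new vscode.InlineCompletionItem(
    text,
    new vscode.Range(position, position),
  );
}
```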
The granite models sometimes fail to completely follow instructions in two ways:
1. They'll often repeat the bit of the prefix on the same line as the completion.
2. They'll sometimes repeat the suffix, and then go even further and hallucinate code after the suffix.

The code already has a mitigation in place for 1., but nothing to handle 2. Furthermore, the mitigation for 1. is open-coded, since it predates SourceFragment. This commit changes the code over to use SourceFragment and, at the same time, addresses 2. by chopping off the spurious completion text at the suffix start.
Force-pushed from 5b5c08a to 6b9a76e
So the code seems to work fine with qwen2.5-coder and claude-sonnet-3.7. codestral has some whitespace issues, but I think they may be specific to codestral.
Some LLM models (e.g., granite 3.3) run past the suffix in their completion output when using fill-in-the-middle support. This commit adds a way for an AutocompleteTemplate to generate its completion options (e.g., its stop token sequences) dynamically. Now completionOptions can be a function or a static value, in the same way template can be a function or a string.
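An illustrative sketch of that shape (the type and the example template below are made up for this illustration, not copied from the repo):

```typescript
// Illustrative shapes only; the real AutocompleteTemplate type may differ.
interface CompletionOptions {
  stop?: string[];
}

interface AutocompleteTemplate {
  template: string | ((prefix: string, suffix: string) => string);
  // completionOptions can now be computed dynamically, e.g. deriving a stop
  // token sequence from the start of the suffix.
  completionOptions?:
    | CompletionOptions
    | ((prefix: string, suffix: string) => CompletionOptions);
}

// Example: a fill-in-the-middle template whose stop sequences depend on the suffix.
const exampleFimTemplate: AutocompleteTemplate = {
  template: (prefix, suffix) =>
    `<fim_prefix>${prefix}<fim_suffix>${suffix}<fim_middle>`,
  completionOptions: (_prefix, suffix) => {
    // Use the first non-blank line of the suffix as a stop sequence.
    const firstSuffixLine = suffix
      .split("\n")
      .map((line) => line.trim())
      .find((line) => line.length > 0);
    return { stop: firstSuffixLine ? [firstSuffixLine] : [] };
  },
};
```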
Force-pushed from 6b9a76e to 304687e
const truncatedCompletion = completionFragment.getAsTruncatedFragment({
  suffix: suffixFragment,
  ignoreWhitespace: true
});
actually, digging through the code with @panyamkeerthana today, i don't think this should be necessary because the same thing should be happening at a different layer. the StreamTransformPipeline does this:
charGenerator = stopAtStartOf(charGenerator, suffix);
So I may just be able to drop this commit entirely, or potentially drop this commit and tweak the implementation of stopAtStartOf if it's not engaging for some reason. Will do some testing.
// The granite models don't always stop at the suffix and instead just keep going, so extract the first non-whitespace
// line of the suffix and add it as a stop token sequence
const suffixFragment = new SourceFragment(suffix);
const [firstLineOfSuffix = undefined] = suffixFragment.head(1, {
  ignoreWhitespace: true,
});

const stopTokenSequences = [
  ...staticStopTokenSequences,
  firstLineOfSuffix,
].filter((sequence) => sequence !== undefined);
now that I know the design is to stop at the suffix for all LLMs, i should probably drop this and instead put this commit back on the branch:
commit 1cb51eaf08252c83f41da1760ed1d1591cce0a86
Author: Ray Strode <[email protected]>
Date: Mon Apr 28 15:28:46 2025 -0400
autocomplete: Use suffix as stop token sequence
Some LLM models (e.g., granite 3.3) run past the suffix in
their completion output when using fill in the middle support.
This commit adds the suffix as a stop token sequence to prevent
that from happening.
diff --git a/core/autocomplete/templating/index.ts b/core/autocomplete/templating/index.ts
index 9722ee328..27e91227f 100644
--- a/core/autocomplete/templating/index.ts
+++ b/core/autocomplete/templating/index.ts
@@ -5,4 +5,5 @@ import { AutocompleteLanguageInfo } from "../constants/AutocompleteLanguageInfo"
import { HelperVars } from "../util/HelperVars";
+import { SourceFragment } from "../../util/SourceFragment";
import { SnippetPayload } from "../snippets";
import {
@@ -111,9 +112,14 @@ export function renderPrompt({
);
- const stopTokens = getStopTokens(
+ // Some models don't stop at the suffix and just keep going, so extract the first line
+ // of the suffix and add it as a stop token sequence
+ const suffixFragment = new SourceFragment(suffix);
+ const [suffixStart = ""] = suffixFragment.head(1, { ignoreWhitespace: true });
+
+ const stopTokens = [...getStopTokens(
completionOptions,
helper.lang,
helper.modelName,
- );
+ ), suffixStart];
return {
That way LLMs aren't generating content that we just then throw away.
  endLineLimit: bestMatchStartOffset,
};

// Truncate everything before the overlap.
this should say "Truncate to the non-overlapping section" or something
}

// If the source fragment is empty, we should just return the entire reference fragment
// If true,matchingRange at this point covers the entire reference fragment, but no
missing space after comma. should probably mention that not specifying the range when it is the entire fragment hits a fast path.
granite 3.3 has fill in the middle support, so we shouldn't use hole filler with it.
getRepoName seems to sometimes take more than 10 seconds when it gets called. This defeats autocomplete. For now, until I can investigate, just get rid of the call.
Force-pushed from 304687e to c78adbc
"type", | ||
async (args) => { | ||
const editor = vscode.window.activeTextEditor; | ||
if (editor) { | ||
await editor.edit((editBuilder) => { | ||
editor.selections.forEach((selection) => { | ||
editBuilder.insert(selection.active, args.text); | ||
}); | ||
}); | ||
} |
@owtaylor hammered on this PR last week and noticed an odd issue: if the user is typing code and hits enter, VS Code no longer autoindents the next line. I suspect the cause is this commit. We're no longer inserting the text into the editor from a "keyboard" source, I believe, so VS Code turns off its auto-indentation logic. We either need to figure out how to get that logic back on, or drop this hack and come up with a different way of disabling the suggestion widget (change user settings? ugh).
So aside from autoindent there's a bigger problem: despite LLMs generally being smarter than intellisense, there are points when there's not enough "there, there" for the LLM to anticipate what the user wants. In those cases, intellisense still outshines the LLM because it puts what to do next in the user's hands. So I have a new idea which I plan to experiment with soon: rather than inhibit the suggestion widget until the LLM is ready, allow it to show at any point, but add a placeholder item to the top of it that represents the LLM's (pending) response. This way we shouldn't need hacks that break autoindent, we don't hide the suggestion widget in cases where the user actually would prefer it, and we still provide a way for the user to take control while they wait. I'll see if I can whip this branch into something that meshes with that workflow. See also some small quick wins Owen and Keerthana did in Granite-Code#42. When I update this branch I'll make sure it's on top of those changes.
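A very rough sketch of that placeholder idea, assuming a plain CompletionItemProvider (nothing here is in the PR; the provider and item details are purely illustrative):

```typescript
import * as vscode from "vscode";

// Contribute a non-inserting placeholder entry that sits at the top of the
// suggestion widget while the LLM response is still pending.
export const placeholderProvider: vscode.CompletionItemProvider = {
  provideCompletionItems(_document, _position) {
    const item = new vscode.CompletionItem(
      "Generating suggestion…",
      vscode.CompletionItemKind.Event,
    );
    // Sort before everything intellisense produces so it shows first.
    item.sortText = "\u0000";
    // Don't insert anything if the user picks it before the LLM is done.
    item.insertText = "";
    return [item];
  },
};
```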
This is a draft pull request that reworks how autocomplete behaves with regard to the intellisense suggestion widget.
The core idea is that the LLM is the smart thing, so it should be informing the suggestion widget, not the other way around. The top item in the suggestion widget is frequently something the user doesn't want, so including it in autocomplete can make the autocomplete text go off in the wrong direction.
If we defer showing the suggestion widget until the LLM has come back, then we can get a good inline completion and still show the suggestion popup, but with the LLM-suggested menu item preselected.
Doing this also lets us simplify the completion item range calculation, because we're only ever inserting new code, not replacing existing code.
This is a draft pull request because I've only tested it with granite models at the moment, and there are some hacks for granite models at the end of the patchset.
Once I do some more testing (with e.g. qwen2.5-coder or some other non-granite model), and drop some of the bottom commits, I'll drop draft status.