
Draft: autocompletion rework and granite completion fixes #5543


Draft · halfline wants to merge 14 commits into main from granite-completion-fixes

Conversation


@halfline halfline commented May 6, 2025

This is a draft pull request that reworks how autocomplete behaves with regard to the intellisense suggestion widget.

The core idea is that the LLM is the smart thing, so it should be informing the suggestion widget, not the other way around. The top item in the suggestion widget is frequently something the user doesn't want, so including it in autocomplete can make the autocomplete text go off in the wrong direction.

If we defer showing the suggestion widget until the LLM has come back, then we can get a good inline completion and still show the suggestion popup, but with the LLM-suggested menu item preselected.

By doing this, we can also simplify the completion item range calculation, because we're only ever inserting new code, not replacing existing code.

This is a draft pull request because I've only tested it with granite models at the moment, and there are some hacks for granite models at the end of the patchset.

Once I do some more testing (with e.g. qwen2.5-coder or some other non-granite model) and drop some of the bottom commits, I'll drop the draft status.

If the user focuses away from the window, the current
inline completion is cleared. When the user comes back
the completion is not regenerated.

This commit addresses that problem by explicitly
triggering inline completions on focus and tab
changes.
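
A minimal sketch of that kind of retriggering in a VS Code extension (illustrative only; the listener wiring and function name below are not taken from this commit):

import * as vscode from "vscode";

// Re-request inline completions when focus returns to the window or the
// user switches tabs, since VS Code drops the previous inline completion.
export function registerRetriggerListeners(context: vscode.ExtensionContext) {
  const retrigger = () => {
    void vscode.commands.executeCommand("editor.action.inlineSuggest.trigger");
  };

  context.subscriptions.push(
    vscode.window.onDidChangeWindowState((state) => {
      if (state.focused) {
        retrigger();
      }
    }),
    vscode.window.onDidChangeActiveTextEditor((editor) => {
      if (editor) {
        retrigger();
      }
    }),
  );
}
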
@halfline halfline requested a review from a team as a code owner May 6, 2025 15:20
@halfline halfline requested review from tomasz-stefaniak and removed request for a team May 6, 2025 15:20
@halfline halfline marked this pull request as draft May 6, 2025 15:20

netlify bot commented May 6, 2025

Deploy Preview for continuedev ready!

🔨 Latest commit: c78adbc
🔍 Latest deploy log: https://app.netlify.com/sites/continuedev/deploys/681bb1ab9ac86c0008b34128
😎 Deploy Preview: https://deploy-preview-5543--continuedev.netlify.app

halfline added 10 commits May 6, 2025 11:54
While the suggestion widget is active, we currently only try to
provide inline completion if at least 4 characters match the popup.

The problem is that the suggestion widget doesn't have a lot of smarts
to figure out what to suggest; it mainly just picks from an
alphabetically sorted list, so assuming its suggestion is what the user
wants is dubious unless the user has somehow made some motions to
indicate the suggestion widget selection is useful.

But, actually, the suggestion widget can be influenced by the inline
completion hint, which comes from a smart LLM, so rather than influencing
the inline completion from the suggestion widget, we should really
be influencing the suggestion widget from the inline completion.

As a first step toward achieving that goal, this commit reverts the
4 character logic that was added in

commit 304056d

A subsequent commit will work on reversing the dependency between
inline completions and the suggestion widget.
Managing and integrating chunks of source code into the
editor window is easy to get wrong, especially if there is
overlap between what's there and what's getting integrated.

To help facilitate that kind of work, this commit adds a new
SourceFragment class. It encapsulates a chunk of code and provides
ways to access it from the top and the bottom, and ways to split
it into an array of lines, optionally trimming whitespace and
blank lines in the process.

SourceFragment also comes with a getRemainingCompletion
method that determines whether the bottom of the fragment matches
the top of the passed-in fragment, and returns what's not
yet integrated.

There is also a getAsTruncatedFragment function that will chop
off the specified tail of the fragment, optionally ignoring
whitespace.
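
A rough sketch of the shape such a class could take, inferred from the description above (the method names come from this commit message, but the signatures are guesses, not the actual implementation):

// Hypothetical interface for SourceFragment; the real class may differ.
interface LineOptions {
  ignoreWhitespace?: boolean; // skip blank/whitespace-only lines
}

interface SourceFragment {
  // Access the first / last n lines of the fragment.
  head(n: number, options?: LineOptions): string[];
  tail(n: number, options?: LineOptions): string[];

  // Split the whole fragment into lines, optionally trimming whitespace
  // and dropping blank lines (hypothetical helper).
  toLines(options?: LineOptions): string[];

  // If the bottom of this fragment matches the top of the passed-in
  // fragment, return only the part that isn't integrated yet.
  getRemainingCompletion(completion: SourceFragment): SourceFragment;

  // Chop off the specified tail (e.g. a suffix the model repeated),
  // optionally ignoring whitespace while matching.
  getAsTruncatedFragment(options: {
    suffix: SourceFragment;
    ignoreWhitespace?: boolean;
  }): SourceFragment;
}
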
When the user hits Tab to accept an inline completion, the completion
provider that provided the completion gets notified via its accept
method.

There are other parts of the code that would ideally like to know
when the completion provider accept method is called.

This commit adds a VSCode-like onAccept function to register disposable
accept listeners.
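
A small sketch of what such a disposable onAccept registration could look like, without depending on the vscode module (the shape here is an assumption, not the PR's actual code):

interface Disposable {
  dispose(): void;
}

type AcceptListener = (completionId: string) => void;

class CompletionProvider {
  private acceptListeners = new Set<AcceptListener>();

  // Register a listener and hand back a Disposable that unregisters it.
  onAccept(listener: AcceptListener): Disposable {
    this.acceptListeners.add(listener);
    return { dispose: () => this.acceptListeners.delete(listener) };
  }

  // Called when the user hits Tab to accept an inline completion.
  accept(completionId: string): void {
    for (const listener of this.acceptListeners) {
      listener(completionId);
    }
  }
}
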
Now that the core completion provider has disposable
resources, we need to explicitly clean those up when
the extension completion provider is disposed.

This commit adds a call to do that.
VSCode currently has no way to disable the intellisense suggestion
widget from popping up. We need tight control over when it can show
and can't show, so it doesn't badly influence the autocomplete output.

This commit adds 3 new commands:

continue.disableSuggestionWidget
continue.enableSuggestionWidget
continue.isSuggestionWidgetDisabled

That provide the ability to inhibit the suggestion widget when
necessary.

Under the hood, they're implemented by swapping out the default type
handler for a simpler one that merely moves text to the editor buffer,
bypassing the completion provider logic.
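
A hedged sketch of what that could look like (the command names are from this commit and the type handler mirrors the diff reviewed later in this PR; the surrounding wiring is illustrative):

import * as vscode from "vscode";

let suggestionWidgetDisabled = false;
let typeOverride: vscode.Disposable | undefined;

export function registerSuggestionWidgetCommands(context: vscode.ExtensionContext) {
  context.subscriptions.push(
    vscode.commands.registerCommand("continue.disableSuggestionWidget", () => {
      if (typeOverride) {
        return;
      }
      suggestionWidgetDisabled = true;
      // Replace the built-in "type" handler with one that inserts text
      // directly, bypassing the completion machinery (and the widget).
      typeOverride = vscode.commands.registerCommand(
        "type",
        async (args: { text: string }) => {
          const editor = vscode.window.activeTextEditor;
          if (editor) {
            await editor.edit((editBuilder) => {
              editor.selections.forEach((selection) => {
                editBuilder.insert(selection.active, args.text);
              });
            });
          }
        },
      );
    }),
    vscode.commands.registerCommand("continue.enableSuggestionWidget", () => {
      suggestionWidgetDisabled = false;
      typeOverride?.dispose();
      typeOverride = undefined;
    }),
    vscode.commands.registerCommand(
      "continue.isSuggestionWidgetDisabled",
      () => suggestionWidgetDisabled,
    ),
  );
}
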
VSCode really wants inline completion providers to generate completion
items faster than we can realistically provide them, given that the LLM
may take on the order of seconds to come up with a reply. In some cases
VSCode expects a response in less than 80ms.

In order to cope, this commit makes the inline completion provider
return null immediately, while working on the results in the background.

Once they come in, it tells VSCode to try again.

This commit also adds an onAccept listener to know when to clean up the
inline completion after the user has green lighted it.
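
A simplified sketch of that return-null-and-retrigger pattern (the cache and the generateCompletion helper are hypothetical stand-ins for the extension's real machinery):

import * as vscode from "vscode";

// Hypothetical: wraps the actual (slow) LLM completion request.
declare function generateCompletion(
  document: vscode.TextDocument,
  position: vscode.Position,
): Promise<string | undefined>;

const pendingCompletions = new Map<string, string>();

export class DeferredInlineCompletionProvider
  implements vscode.InlineCompletionItemProvider
{
  async provideInlineCompletionItems(
    document: vscode.TextDocument,
    position: vscode.Position,
  ): Promise<vscode.InlineCompletionItem[] | null> {
    const key = `${document.uri.toString()}:${position.line}:${position.character}`;

    // If a background request already finished for this spot, serve it now.
    const cached = pendingCompletions.get(key);
    if (cached !== undefined) {
      pendingCompletions.delete(key);
      return [
        new vscode.InlineCompletionItem(cached, new vscode.Range(position, position)),
      ];
    }

    // Kick off the slow request, then ask VS Code to call us again.
    void generateCompletion(document, position).then((completion) => {
      if (completion) {
        pendingCompletions.set(key, completion);
        void vscode.commands.executeCommand("editor.action.inlineSuggest.trigger");
      }
    });

    // Answer immediately so VS Code doesn't time the provider out.
    return null;
  }
}
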
Right now the suggestion widget will show up before the inline
completion is processed. This causes a couple of problems:

1. It is not informed by the results of an LLM so its
   suggestions are often off base.

2. It affects the editor buffer so can mislead the LLM on what
   sort of inline completions to provide.

This commit flips things on their head: we now inhibit the
suggestion widget until the inline completion is generated and,
even then, only uninhibit the suggestion widget if it has
at least one suggestion that is compatible with the inline
suggestion.

The overarching idea is that the LLM is the "smart" output, so
it should be what drives completion, not the context-naïve
alphabetically ordered list of completions provided by
intellisense.
We want the inline completion to inform the suggestion widget, not
the other way around, so we shouldn't use the suggestion widget
suggestion when figuring out the inline completion.

This commit:

- Stops putting the suggestion widget suggestion in the prefix fed
  to the LLM

- Stops including the suggestion widget suggestion in the
  AutocompleteInput

- Stops including the suggestion widget suggestion in the range
  calculations for the inline completion item.

- Drops the willDisplay function that checks if an inline completion
  matches the current suggestion and ditches the inline completion
  when there is a mismatch.
We're no longer using the suggestion widget output as part of
our inline completion generation, so we no longer need to
replace a range of text the user already typed with something
different.

This commit changes the ranges to be zero width, which means
"insert at position exactly".
The granite models sometimes fail to completely follow instructions
in two ways:

1. They'll often repeat the bit of the prefix on the same line as
the completion.

2. They'll sometimes repeat the suffix and then even go further and
hallucinate code after the suffix.

The code already has a mitigation in place for 1., but nothing to
handle 2. Furthermore, the mitigation for 1. is open-coded, since it
predates SourceFragment.

This commit changes the code over to use SourceFragment and, at the
same time, addresses 2. by chopping off the spurious completion text
at the suffix start.
@halfline halfline force-pushed the granite-completion-fixes branch from 5b5c08a to 6b9a76e Compare May 6, 2025 15:55

halfline commented May 6, 2025

So the code seems to work fine with qwen2.5-coder and claude-sonnet-3.7. codestral has some whitespace issues, but I think they may be specific to codestral.

Some LLM models (e.g., granite 3.3) run past the suffix in
their completion output when using fill in the middle support.

This commit adds a way for an AutocompleteTemplate to generate
its completion options (e.g., its stop token sequences) dynamically.

Now completionOptions can be a function or string, in the same way
template can be a function or string.
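
A speculative sketch of that type change (the commit message says "function or string"; the shape below assumes an options object with stop sequences, so treat the exact types as guesses rather than the real AutocompleteTemplate definition):

interface CompletionOptions {
  stop?: string[]; // stop token sequences
}

type CompletionOptionsSource =
  | CompletionOptions
  | ((prefix: string, suffix: string) => CompletionOptions);

interface AutocompleteTemplate {
  template: string | ((prefix: string, suffix: string) => string);
  completionOptions?: CompletionOptionsSource;
}

// Resolve the options at prompt-render time so a template can compute its
// stop sequences (e.g. the first line of the suffix) dynamically.
function resolveCompletionOptions(
  template: AutocompleteTemplate,
  prefix: string,
  suffix: string,
): CompletionOptions | undefined {
  return typeof template.completionOptions === "function"
    ? template.completionOptions(prefix, suffix)
    : template.completionOptions;
}
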
@halfline halfline force-pushed the granite-completion-fixes branch from 6b9a76e to 304687e Compare May 6, 2025 19:27
Comment on lines +116 to +119
const truncatedCompletion = completionFragment.getAsTruncatedFragment({
  suffix: suffixFragment,
  ignoreWhitespace: true
});
Contributor Author

@halfline halfline May 7, 2025

Actually, digging through the code with @panyamkeerthana today, I don't think this should be necessary because the same thing should be happening at a different layer. The StreamTransformPipeline does this:

    charGenerator = stopAtStartOf(charGenerator, suffix);


So I may just be able to drop this commit entirely, or drop it and tweak the implementation of stopAtStartOf if it's not engaging for some reason. Will do some testing.

Comment on lines +128 to +139
// The granite models don't always stop at the suffix and instead just keep going, so extract the first non-whitespace
// line of the suffix and add it as a stop token sequence
const suffixFragment = new SourceFragment(suffix);
const [firstLineOfSuffix = undefined] = suffixFragment.head(1, {
  ignoreWhitespace: true,
});

const stopTokenSequences = [
  ...staticStopTokenSequences,
  firstLineOfSuffix,
].filter((sequence) => sequence !== undefined);

Contributor Author

Now that I know the design is to stop at the suffix for all LLMs, I should probably drop this and instead put this commit back on the branch:

commit 1cb51eaf08252c83f41da1760ed1d1591cce0a86
Author: Ray Strode <[email protected]>
Date:   Mon Apr 28 15:28:46 2025 -0400

    autocomplete: Use suffix as stop token sequence
    
    Some LLM models (e.g., granite 3.3) run past the suffix in
    their completion output when using fill in the middle support.
    
    This commit adds the suffix as a stop token sequence to prevent
    that from happening.

diff --git a/core/autocomplete/templating/index.ts b/core/autocomplete/templating/index.ts
index 9722ee328..27e91227f 100644
--- a/core/autocomplete/templating/index.ts
+++ b/core/autocomplete/templating/index.ts
@@ -5,4 +5,5 @@ import { AutocompleteLanguageInfo } from "../constants/AutocompleteLanguageInfo"
 import { HelperVars } from "../util/HelperVars";
 
+import { SourceFragment } from "../../util/SourceFragment";
 import { SnippetPayload } from "../snippets";
 import {
@@ -111,9 +112,14 @@ export function renderPrompt({
         );
 
-  const stopTokens = getStopTokens(
+  // Some models don't stop at the suffix and just keep going, so extract the first line
+  // of the suffix and add it as a stop token sequence
+  const suffixFragment = new SourceFragment(suffix);
+  const [suffixStart = ""] = suffixFragment.head(1, { ignoreWhitespace: true });
+
+  const stopTokens = [...getStopTokens(
     completionOptions,
     helper.lang,
     helper.modelName,
-  );
+  ), suffixStart];
 
   return {

That way LLMs aren't generating content that we just then throw away.

endLineLimit: bestMatchStartOffset,
};

// Truncate everything before the overlap.
Contributor Author

This should say "Truncate to the non-overlapping section" or something.

}

// If the source fragment is empty, we should just return the entire reference fragment
// If true,matchingRange at this point covers the entire reference fragment, but no
Contributor Author

@halfline halfline May 7, 2025

Missing space after comma. Should probably mention that not specifying the range when it is the entire fragment hits a fast path.

halfline added 2 commits May 7, 2025 15:16
granite 3.3 has fill-in-the-middle support, so we shouldn't use the
hole filler template with it.
getRepoName seems to sometimes take more than 10 seconds when it gets
called. This defeats autocomplete.

For now, until I can investigate, just get rid of the call.
@halfline halfline force-pushed the granite-completion-fixes branch from 304687e to c78adbc Compare May 7, 2025 19:16
Comment on lines +809 to +818
"type",
async (args) => {
  const editor = vscode.window.activeTextEditor;
  if (editor) {
    await editor.edit((editBuilder) => {
      editor.selections.forEach((selection) => {
        editBuilder.insert(selection.active, args.text);
      });
    });
  }
Contributor Author

@halfline halfline May 19, 2025

@owtaylor hammered on this PR last week and noticed an odd issue... If the user is typing code and hits Enter, VS Code no longer autoindents the next line. I suspect the cause is this commit. We're no longer inserting the text into the editor from a "keyboard" source, I believe, so it turns off the auto-indentation logic. We either need to figure out how to get that logic back on, or drop this hack and come up with a different way of disabling the suggestion widget (change user settings? ugh).


halfline commented May 27, 2025

So aside from autoindent there's a bigger problem: despite LLMs generally being smarter than intellisense, there are points when there's not enough "there, there" for the LLM to anticipate what the user wants. In those cases, intellisense still outshines the LLM because it puts what to do next in the user's hands.

So I have a new idea which I plan to experiment with soon: rather than inhibiting the suggestion widget until the LLM is ready, allow it to show at any point, but add a placeholder item to the top of it that represents the LLM's (pending) response.

In this way we shouldn't need hacks that break autoindent, we don't hide the suggestion widget in cases where the user actually would prefer it, and we still provide a way for the user to take control while they wait.
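
One completely speculative sketch of that placeholder idea, for illustration only (the label text, sortText trick, and provider registration below are made up and not part of this PR):

import * as vscode from "vscode";

// A completion item provider that contributes one pending-LLM placeholder
// item which sorts to the top of the suggestion widget.
class PendingLlmItemProvider implements vscode.CompletionItemProvider {
  provideCompletionItems(): vscode.CompletionItem[] {
    const item = new vscode.CompletionItem(
      "Continue: generating suggestion...",
      vscode.CompletionItemKind.Event,
    );
    item.sortText = "\u0000"; // sort ahead of the alphabetical entries
    item.insertText = ""; // selecting it before the LLM replies inserts nothing
    return [item];
  }
}

// Usage sketch:
// vscode.languages.registerCompletionItemProvider({ pattern: "**" }, new PendingLlmItemProvider());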

I'll see if I can whip this branch into something that meshes with that workflow. See also some small quick wins Owen and Keerthana did in Granite-Code#42. When I update this branch I'll make sure it's on top of those changes.
