
Draft: autocompletion rework and granite completion fixes #5543


Draft · halfline wants to merge 14 commits into main from granite-completion-fixes

Conversation


@halfline halfline commented May 6, 2025

This is a draft pull request that reworks how autocomplete behaves with regard to the intellisense suggestion widget.

The core idea is that the LLM is the smart thing, so it should be informing the suggestion widget, not the other way around. The top item in the suggestion widget is frequently something the user doesn't want, so including it in autocomplete can make the autocomplete text go off in the wrong direction.

If we defer showing the suggestion widget until the LLM has come back, then we can get a good inline completion and still show the suggestion popup, but with the LLM-suggested menu item preselected.

By doing this, we can also simplify the completion item range calculation, because we're only ever inserting new code, not replacing existing code.

This is a draft pull request because I've only tested it with granite models at the moment, and there are some hacks for granite models at the end of the patchset.

Once I do some more testing (with e.g. qwen2.5-coder or some other non-granite model) and drop some of the bottom commits, I'll drop the draft status.

If the user focuses away from the window, the current
inline completion is cleared. When the user comes back
the completion is not regenerated.

This commit addresses that problem by explicitly
triggering inline completions on focus and tab
changes.
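
A minimal sketch of that kind of retriggering in a VS Code extension (illustrative only; the listener wiring and function name below are not taken from this commit):

import * as vscode from "vscode";

// Re-request inline completions when focus returns to the window or the
// user switches tabs, since VS Code drops the previous inline completion.
export function registerRetriggerListeners(context: vscode.ExtensionContext) {
  const retrigger = () => {
    void vscode.commands.executeCommand("editor.action.inlineSuggest.trigger");
  };

  context.subscriptions.push(
    vscode.window.onDidChangeWindowState((state) => {
      if (state.focused) {
        retrigger();
      }
    }),
    vscode.window.onDidChangeActiveTextEditor((editor) => {
      if (editor) {
        retrigger();
      }
    }),
  );
}
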
@halfline halfline requested a review from a team as a code owner May 6, 2025 15:20
@halfline halfline requested review from tomasz-stefaniak and removed request for a team May 6, 2025 15:20
@halfline halfline marked this pull request as draft May 6, 2025 15:20

netlify bot commented May 6, 2025

Deploy Preview for continuedev ready!

🔨 Latest commit: c78adbc
🔍 Latest deploy log: https://app.netlify.com/sites/continuedev/deploys/681bb1ab9ac86c0008b34128
😎 Deploy Preview: https://deploy-preview-5543--continuedev.netlify.app

halfline added 10 commits May 6, 2025 11:54
While the suggestion widget is active, we currently only try to
provide inline completion if at least 4 characters match the popup.

The problem is that the suggestion widget doesn't have a lot of smarts
to figure out what to suggest; it mainly just picks from an
alphabetically sorted list, so assuming its suggestion is what the user
wants is dubious unless the user has somehow made some motions to
indicate the suggestion widget selection is useful.

But, actually, the suggestion widget can be influenced by the inline
completion hint, which comes from a smart LLM, so rather than influencing
the inline completion from the suggestion widget, we should really
be influencing the suggestion widget from the inline completion.

As a first step toward achieving that goal, this commit reverts the
4 character logic that was added in

commit 304056d

A subsequent commit will work on reversing the dependency between
inline completions and the suggestion widget.
Managing and integrating chunks of source code into the
editor window is easy to get wrong, especially if there is
overlap between what's there and what's getting integrated.

To help facilitate that kind of work, this commit adds a new
SourceFragment class. It encapsulates a chunk of code and provides
ways to access it from the top and the bottom, and ways to split
it into an array of lines, optionally trimming whitespace and
blank lines in the process.

SourceFragment also comes with a getRemainingCompletion
method that determines whether the bottom of the fragment matches
the top of the passed-in fragment, and returns what's not
yet integrated.

There is also a getAsTruncatedFragment function that will chop
off the specified tail of the fragment, optionally ignoring
whitespace.
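
A rough sketch of the shape such a class could take, inferred from the description above (the method names come from this commit message, but the signatures are guesses, not the actual implementation):

// Hypothetical interface for SourceFragment; the real class may differ.
interface LineOptions {
  ignoreWhitespace?: boolean; // skip blank/whitespace-only lines
}

interface SourceFragment {
  // Access the first / last n lines of the fragment.
  head(n: number, options?: LineOptions): string[];
  tail(n: number, options?: LineOptions): string[];

  // Split the whole fragment into lines, optionally trimming whitespace
  // and dropping blank lines (hypothetical helper).
  toLines(options?: LineOptions): string[];

  // If the bottom of this fragment matches the top of the passed-in
  // fragment, return only the part that isn't integrated yet.
  getRemainingCompletion(completion: SourceFragment): SourceFragment;

  // Chop off the specified tail (e.g. a suffix the model repeated),
  // optionally ignoring whitespace while matching.
  getAsTruncatedFragment(options: {
    suffix: SourceFragment;
    ignoreWhitespace?: boolean;
  }): SourceFragment;
}
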
When the user hits Tab to accept an inline completion, the completion
provider that provided the completion gets notified via its accept
method.

There are other parts of the code that would ideally like to know
when the completion provider accept method is called.

This commit adds a VSCode-like onAccept function to register disposable
accept listeners.
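
A small sketch of what such a disposable onAccept registration could look like, without depending on the vscode module (the shape here is an assumption, not the PR's actual code):

interface Disposable {
  dispose(): void;
}

type AcceptListener = (completionId: string) => void;

class CompletionProvider {
  private acceptListeners = new Set<AcceptListener>();

  // Register a listener and hand back a Disposable that unregisters it.
  onAccept(listener: AcceptListener): Disposable {
    this.acceptListeners.add(listener);
    return { dispose: () => this.acceptListeners.delete(listener) };
  }

  // Called when the user hits Tab to accept an inline completion.
  accept(completionId: string): void {
    for (const listener of this.acceptListeners) {
      listener(completionId);
    }
  }
}
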
Now that the core completion provider has disposable
resources, we need to explicitly clean those up when
the extension completion provider is disposed.

This commit adds a call to do that.
VSCode currently has no way to disable the intellisense suggestion
widget from popping up. We need tight control over when it can show
and can't show, so it doesn't badly influence the autocomplete output.

This commit adds 3 new commands:

continue.disableSuggestionWidget
continue.enableSuggestionWidget
continue.isSuggestionWidgetDisabled

That provide the ability to inhibit the suggestion widget when
necessary.

Under the hood, they're implemented by swapping out the default type
handler for a simpler one that merely moves text to the editor buffer,
bypassing the completion provider logic.
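
A hedged sketch of what that could look like (the command names are from this commit and the type handler mirrors the diff reviewed later in this PR; the surrounding wiring is illustrative):

import * as vscode from "vscode";

let suggestionWidgetDisabled = false;
let typeOverride: vscode.Disposable | undefined;

export function registerSuggestionWidgetCommands(context: vscode.ExtensionContext) {
  context.subscriptions.push(
    vscode.commands.registerCommand("continue.disableSuggestionWidget", () => {
      if (typeOverride) {
        return;
      }
      suggestionWidgetDisabled = true;
      // Replace the built-in "type" handler with one that inserts text
      // directly, bypassing the completion machinery (and the widget).
      typeOverride = vscode.commands.registerCommand(
        "type",
        async (args: { text: string }) => {
          const editor = vscode.window.activeTextEditor;
          if (editor) {
            await editor.edit((editBuilder) => {
              editor.selections.forEach((selection) => {
                editBuilder.insert(selection.active, args.text);
              });
            });
          }
        },
      );
    }),
    vscode.commands.registerCommand("continue.enableSuggestionWidget", () => {
      suggestionWidgetDisabled = false;
      typeOverride?.dispose();
      typeOverride = undefined;
    }),
    vscode.commands.registerCommand(
      "continue.isSuggestionWidgetDisabled",
      () => suggestionWidgetDisabled,
    ),
  );
}
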
VSCode really wants inline completion providers to generate completion
items faster than we can realistically provide them, given that the LLM
may take on the order of seconds to come up with a reply. In some cases
VSCode expects a response in less than 80ms.

In order to cope, this commit makes the inline completion provider
return null immediately, while working on the results in the background.

Once they come in, it tells VSCode to try again.

This commit also adds an onAccept listener to know when to clean up the
inline completion after the user has green lighted it.
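
A simplified sketch of that return-null-and-retrigger pattern (the cache and the generateCompletion helper are hypothetical stand-ins for the extension's real machinery):

import * as vscode from "vscode";

// Hypothetical: wraps the actual (slow) LLM completion request.
declare function generateCompletion(
  document: vscode.TextDocument,
  position: vscode.Position,
): Promise<string | undefined>;

const pendingCompletions = new Map<string, string>();

export class DeferredInlineCompletionProvider
  implements vscode.InlineCompletionItemProvider
{
  async provideInlineCompletionItems(
    document: vscode.TextDocument,
    position: vscode.Position,
  ): Promise<vscode.InlineCompletionItem[] | null> {
    const key = `${document.uri.toString()}:${position.line}:${position.character}`;

    // If a background request already finished for this spot, serve it now.
    const cached = pendingCompletions.get(key);
    if (cached !== undefined) {
      pendingCompletions.delete(key);
      return [
        new vscode.InlineCompletionItem(cached, new vscode.Range(position, position)),
      ];
    }

    // Kick off the slow request, then ask VS Code to call us again.
    void generateCompletion(document, position).then((completion) => {
      if (completion) {
        pendingCompletions.set(key, completion);
        void vscode.commands.executeCommand("editor.action.inlineSuggest.trigger");
      }
    });

    // Answer immediately so VS Code doesn't time the provider out.
    return null;
  }
}
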
Right now the suggestion widget will show up before the inline
completion is processed. This causes a couple of problems:

1. It is not informed by the results of an LLM so its
   suggestions are often off base.

2. It affects the editor buffer so can mislead the LLM on what
   sort of inline completions to provide.

This commit flips things on their head: we now inhibit the
suggestion widget until the inline completion is generated and,
even then, only uninhibit the suggestion widget if it has
at least one suggestion that is compatible with the inline
suggestion.

The overarching idea is that the LLM is the "smart" output, so
it should be what drives completion, not the context-naïve
alphabetically ordered list of completions provided by
intellisense.
We want the inline completion to inform the suggestion widget, not
the other way around, so we shouldn't use the suggestion widget
suggestion when figuring out the inline completion.

This commit:

- Stops putting the suggestion widget suggestion in the prefix fed
  to the LLM

- Stops including the suggestion widget suggestion in the
  AutocompleteInput

- Stops including the suggestion widget suggestion in the range
  calculations for the inline completion item.

- Drops the willDisplay function that checks if an inline completion
  matches the current suggestion and ditches the inline completion
  when there is a mismatch.
We're no longer using the suggestion widget output as part of
our inline completion generation, so we no longer need to
replace a range of text the user already typed with something
different.

This commit changes the ranges to be zero width, which means
"insert at position exactly".
The granite models sometimes fail to completely follow instructions
in two ways:

1. They'll often repeat the bit of the prefix on the same line as
the completion.

2. They'll sometimes repeat the suffix and then even go further and
hallucinate code after the suffix.

The code already has a mitigation in place for 1., but nothing to
handle 2. Furthermore, the mitigation for 1. is open-coded, since it
predates SourceFragment.

This commit changes the code over to use SourceFragment and, at the
same time, addresses 2. by chopping off the spurious completion text
at the suffix start.
@halfline halfline force-pushed the granite-completion-fixes branch from 5b5c08a to 6b9a76e Compare May 6, 2025 15:55

halfline commented May 6, 2025

So the code seems to work fine with qwen2.5-coder and claude-sonnet-3.7. codestral has some whitespace issues, but I think they may be specific to codestral.

Some LLM models (e.g., granite 3.3) run past the suffix in
their completion output when using fill in the middle support.

This commit adds a way for an AutocompleteTemplate to generate
its completion options (e.g., its stop token sequences) dynamically.

Now completionOptions can be a function or string, in the same way
template can be a function or string.
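
A speculative sketch of that type change (the commit message says "function or string"; the shape below assumes an options object with stop sequences, so treat the exact types as guesses rather than the real AutocompleteTemplate definition):

interface CompletionOptions {
  stop?: string[]; // stop token sequences
}

type CompletionOptionsSource =
  | CompletionOptions
  | ((prefix: string, suffix: string) => CompletionOptions);

interface AutocompleteTemplate {
  template: string | ((prefix: string, suffix: string) => string);
  completionOptions?: CompletionOptionsSource;
}

// Resolve the options at prompt-render time so a template can compute its
// stop sequences (e.g. the first line of the suffix) dynamically.
function resolveCompletionOptions(
  template: AutocompleteTemplate,
  prefix: string,
  suffix: string,
): CompletionOptions | undefined {
  return typeof template.completionOptions === "function"
    ? template.completionOptions(prefix, suffix)
    : template.completionOptions;
}
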
@halfline halfline force-pushed the granite-completion-fixes branch from 6b9a76e to 304687e Compare May 6, 2025 19:27
Comment on lines +116 to +119
const truncatedCompletion = completionFragment.getAsTruncatedFragment({
  suffix: suffixFragment,
  ignoreWhitespace: true
});
Contributor Author

@halfline halfline May 7, 2025

Actually, digging through the code with @panyamkeerthana today, I don't think this should be necessary because the same thing should be happening at a different layer. The StreamTransformPipeline does this:

    charGenerator = stopAtStartOf(charGenerator, suffix);


So I may just be able to drop this commit entirely, or drop it and tweak the implementation of stopAtStartOf if it's not engaging for some reason. Will do some testing.

Comment on lines +128 to +139
// The granite models don't always stop at the suffix and instead just keep going, so extract the first non-whitespace
// line of the suffix and add it as a stop token sequence
const suffixFragment = new SourceFragment(suffix);
const [firstLineOfSuffix = undefined] = suffixFragment.head(1, {
  ignoreWhitespace: true,
});

const stopTokenSequences = [
  ...staticStopTokenSequences,
  firstLineOfSuffix,
].filter((sequence) => sequence !== undefined);

Contributor Author

Now that I know the design is to stop at the suffix for all LLMs, I should probably drop this and instead put this commit back on the branch:

commit 1cb51eaf08252c83f41da1760ed1d1591cce0a86
Author: Ray Strode <[email protected]>
Date:   Mon Apr 28 15:28:46 2025 -0400

    autocomplete: Use suffix as stop token sequence
    
    Some LLM models (e.g., granite 3.3) run past the suffix in
    their completion output when using fill in the middle support.
    
    This commit adds the suffix as a stop token sequence to prevent
    that from happening.

diff --git a/core/autocomplete/templating/index.ts b/core/autocomplete/templating/index.ts
index 9722ee328..27e91227f 100644
--- a/core/autocomplete/templating/index.ts
+++ b/core/autocomplete/templating/index.ts
@@ -5,4 +5,5 @@ import { AutocompleteLanguageInfo } from "../constants/AutocompleteLanguageInfo"
 import { HelperVars } from "../util/HelperVars";
 
+import { SourceFragment } from "../../util/SourceFragment";
 import { SnippetPayload } from "../snippets";
 import {
@@ -111,9 +112,14 @@ export function renderPrompt({
         );
 
-  const stopTokens = getStopTokens(
+  // Some models don't stop at the suffix and just keep going, so extract the first line
+  // of the suffix and add it as a stop token sequence
+  const suffixFragment = new SourceFragment(suffix);
+  const [suffixStart = ""] = suffixFragment.head(1, { ignoreWhitespace: true });
+
+  const stopTokens = [...getStopTokens(
     completionOptions,
     helper.lang,
     helper.modelName,
-  );
+  ), suffixStart];
 
   return {

That way LLMs aren't generating content that we just then throw away.

endLineLimit: bestMatchStartOffset,
};

// Truncate everything before the overlap.
Contributor Author

This should say "Truncate to the non-overlapping section" or something.

}

// If the source fragment is empty, we should just return the entire reference fragment
// If true,matchingRange at this point covers the entire reference fragment, but no
Contributor Author

@halfline halfline May 7, 2025

Missing space after comma. Should probably mention that not specifying the range when it is the entire fragment hits a fast path.

halfline added 2 commits May 7, 2025 15:16
granite 3.3 has fill-in-the-middle support, so we shouldn't use the
hole filler template with it.
getRepoName seems to sometimes take more than 10 seconds when it gets
called. This defeats autocomplete.

For now, until I can investigate, just get rid of the call.
@halfline halfline force-pushed the granite-completion-fixes branch from 304687e to c78adbc Compare May 7, 2025 19:16
Comment on lines +809 to +818
"type",
async (args) => {
  const editor = vscode.window.activeTextEditor;
  if (editor) {
    await editor.edit((editBuilder) => {
      editor.selections.forEach((selection) => {
        editBuilder.insert(selection.active, args.text);
      });
    });
  }
Contributor Author

@halfline halfline May 19, 2025

@owtaylor hammered on this PR last week and noticed an odd issue... If the user is typing code and hits Enter, VS Code no longer autoindents the next line. I suspect the cause is this commit. We're no longer inserting the text into the editor from a "keyboard" source, I believe, so it turns off the auto-indentation logic. We either need to figure out how to get that logic back on, or drop this hack and come up with a different way of disabling the suggestion widget (change user settings? ugh).


halfline commented May 27, 2025

So aside from autoindent there's a bigger problem: despite LLMs generally being smarter than intellisense, there are points when there's not enough "there, there" for the LLM to anticipate what the user wants. In those cases, intellisense still outshines the LLM because it puts what to do next in the user's hands.

So I have a new idea which I plan to experiment with soon: rather than inhibiting the suggestion widget until the LLM is ready, allow it to show at any point, but add a placeholder item to the top of it that represents the LLM's (pending) response.

In this way we shouldn't need hacks that break autoindent, we don't hide the suggestion widget in cases where the user actually would prefer it, and we still provide a way for the user to take control while they wait.
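
One completely speculative sketch of that placeholder idea, for illustration only (the label text, sortText trick, and provider registration below are made up and not part of this PR):

import * as vscode from "vscode";

// A completion item provider that contributes one pending-LLM placeholder
// item which sorts to the top of the suggestion widget.
class PendingLlmItemProvider implements vscode.CompletionItemProvider {
  provideCompletionItems(): vscode.CompletionItem[] {
    const item = new vscode.CompletionItem(
      "Continue: generating suggestion...",
      vscode.CompletionItemKind.Event,
    );
    item.sortText = "\u0000"; // sort ahead of the alphabetical entries
    item.insertText = ""; // selecting it before the LLM replies inserts nothing
    return [item];
  }
}

// Usage sketch:
// vscode.languages.registerCompletionItemProvider({ pattern: "**" }, new PendingLlmItemProvider());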

I'll see if I can whip this branch into something that meshes with that workflow. See also some small quick wins Owen and Keerthana did in Granite-Code#42. When I update this branch I'll make sure it's on top of those changes.
