This is a workaround for autocomplete with vLLM (I don't use Ollama).
I also found that the Instruct model works for autocomplete, so I only have to load one model for both tasks (chat and autocomplete).

```shell
vllm serve Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ --quantization awq --disable-log-requests --disable-log-stats
```
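Before wiring the server into Continue, you can check that it answers. The Python sketch below uses only the standard library. It assumes vLLM's OpenAI-compatible `/v1/completions` route on the default port 8000 and reuses the address and model name from the config.json below. `build_completion_request` is a hypothetical helper, not part of vLLM or Continue.

```python
import json
import urllib.request

# Assumed values, copied from the config.json in this comment; adjust to your setup.
API_BASE = "http://192.168.1.19:8000/v1"
MODEL = "Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ"


def build_completion_request(api_base, model, prompt, max_tokens=32, stop=None):
    """Build a POST request for the OpenAI-compatible /completions route (hypothetical helper)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0,  # deterministic output makes a smoke test easier to read
    }
    if stop:
        payload["stop"] = stop
    return urllib.request.Request(
        api_base.rstrip("/") + "/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Requires the `vllm serve` process above to be running and reachable.
    req = build_completion_request(API_BASE, MODEL, "def fibonacci(n):", stop=["\n\n"])
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    print(body["choices"][0]["text"])
```

If this prints a plausible continuation, the endpoint and model name in the config are correct.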

Continue config.json:

```json
{
  "models": [
    {
      "title": "Qwen2.5-Coder-7B-Instruct",
      "provider": "openai",
      "apiBase": "http://192.168.1.19:8000/v1/",
      "model": "Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen2.5-Coder-7B-Instruct",
    "provider": "openai",
    "apiKey": "None",
    "completionOptions": {
      "stop": [
        "<|endoftext|>",
        "\n"
      ]
    },
    "apiBase": "http://192.168.1.19:8000/v1/",
    "model": "Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ"
  },
  "tabAutocompleteOptions": {
    "multilineCompletions": "never",
    "template": "You are a helpful assistant.<|fim_prefix|>{{{ prefix }}}<|fim_suffix|>{{{ suffix }}}<|fim_middle|>"
  },
  "customCommands": [],
  "allowAnonymousTelemetry": false,
  "docs": []
}
```
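To make the autocomplete settings concrete, here is a rough Python sketch of what they amount to. The `{{{ prefix }}}` and `{{{ suffix }}}` placeholders are filled with the code before and after the cursor, giving a Qwen fill-in-the-middle (FIM) prompt. The `stop` list then cuts the generated text at the first `<|endoftext|>` or newline, which matches `"multilineCompletions": "never"`. This is a simplification under assumptions: plain string replacement stands in for Continue's own template rendering, and `render_fim_prompt` and `truncate_at_stop` are hypothetical helpers.

```python
# Template copied from "tabAutocompleteOptions.template" in the config above.
TEMPLATE = (
    "You are a helpful assistant."
    "<|fim_prefix|>{{{ prefix }}}<|fim_suffix|>{{{ suffix }}}<|fim_middle|>"
)
# Stop sequences copied from "completionOptions.stop".
STOP = ["<|endoftext|>", "\n"]


def render_fim_prompt(template, prefix, suffix):
    """Substitute the {{{ prefix }}} / {{{ suffix }}} placeholders.

    Plain string replacement is a simplification, not Continue's renderer.
    """
    return template.replace("{{{ prefix }}}", prefix).replace("{{{ suffix }}}", suffix)


def truncate_at_stop(text, stops):
    """Cut text at the earliest stop sequence, mimicking a server-side `stop` list."""
    cut = len(text)
    for s in stops:
        idx = text.find(s)
        if idx != -1 and idx < cut:
            cut = idx
    return text[:cut]


if __name__ == "__main__":
    prompt = render_fim_prompt(TEMPLATE, "def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
    print(prompt)
    # A made-up raw model output, to show the effect of the "\n" stop sequence.
    print(truncate_at_stop("a + b\n\ndef sub(a, b):", STOP))
```

Because `"\n"` is a stop sequence, each suggestion is at most one line, even if the model would have continued.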

continuedev/continue#2388

Edit: corrected config.json.

How to use this in continue with Ollama? #94