[ES|QL][Completion] Inference Constant folding optimization for COMPLETION #136863

@afoucret

Problem

Users often run the COMPLETION command on constant input text for tasks such as spellchecking, text validation, and content generation. These use cases rely on static prompts that do not vary from row to row.

Example use case: spellchecking queries

FROM movie METADATA _score
| COMPLETION spellcheck_query=CONCAT("Spellcheck the following query: ", ?query)
| WHERE title: spellcheck_query
| SORT _score DESC

The current ES|QL implementation evaluates COMPLETION once per row, sending an identical inference request for every row even when the prompt is constant. This results in:

  • Unnecessary load on inference endpoints
  • Increased API costs
  • Higher latency due to repeated identical requests
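To make the cost concrete, here is a minimal sketch contrasting per-row evaluation with evaluating a constant prompt once. `CountingEndpoint`, `evaluate_per_row`, and `evaluate_once` are hypothetical names invented for illustration; the endpoint only counts requests and applies a toy "spellcheck" instead of calling a real model.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CountingEndpoint:
    """Hypothetical stand-in for an inference endpoint; it only counts requests."""

    calls: int = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        # Toy "spellcheck": a real endpoint would send the prompt to a model.
        return prompt.split(":", 1)[1].strip().replace("wrs", "wars")


def evaluate_per_row(rows: Iterable[dict], prompt: str, endpoint: CountingEndpoint) -> List[str]:
    # Current behaviour: one request per row, although the prompt never changes.
    return [endpoint.complete(prompt) for _ in rows]


def evaluate_once(rows: Iterable[dict], prompt: str, endpoint: CountingEndpoint) -> List[str]:
    # Desired behaviour: evaluate the constant prompt once and reuse the result.
    result = endpoint.complete(prompt)
    return [result for _ in rows]
```

With 1,000 rows and the same prompt, `evaluate_per_row` issues 1,000 identical requests while `evaluate_once` issues exactly one, and both return the same values.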

Additionally, the result of the inference evaluation is not considered foldable, which prevents:

  • Usage in WHERE clauses or full-text query predicates
  • The ES|QL optimizer from applying further optimizations to foldable expressions
  • Proper query planning and push-down optimizations
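The reason foldability matters for full-text predicates can be sketched as a planning-time check: a predicate such as `title: spellcheck_query` can only be translated into a Query DSL `match` query, and pushed down, when its query text is known before execution. The classes `Literal` and `CompletionOutput` and the function `to_match_query` below are hypothetical simplifications, not the actual ES|QL planner classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: str

    def foldable(self) -> bool:
        return True


@dataclass(frozen=True)
class CompletionOutput:
    """Reference to a column produced by COMPLETION, evaluated per row today."""

    name: str

    def foldable(self) -> bool:
        return False


def to_match_query(field: str, query) -> dict:
    # A full-text predicate can only become a Query DSL match query (and be
    # pushed down to the shards) if its query text is a planning-time constant.
    if not query.foldable():
        raise ValueError(f"query text for [{field}] must be a constant")
    return {"match": {field: {"query": query.value}}}
```

In this sketch, `to_match_query("title", Literal("star wars"))` yields a `match` query, whereas passing a per-row `CompletionOutput` is rejected, which mirrors why the non-foldable COMPLETION result cannot feed the WHERE clause in the example above.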

Acceptance criteria

  • Inference commands must detect when their input is foldable (constant) during query planning.
  • When the input is foldable, execute the inference once during the optimization phase rather than per row.
  • The result of a constant COMPLETION operation must be treated as a foldable expression that can be:
    • Used in WHERE clauses with full-text queries (match, query_string, etc.)
    • Subject to further optimizations (query push-down, etc.)
  • Non-constant prompts must continue to work as before and be evaluated per row.
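The acceptance criteria describe a constant-folding rule. The sketch below is a simplified model under stated assumptions: a tiny expression tree (`Expr`, `Literal`, `FieldRef`, `Concat`, `Completion`, all hypothetical names) where `Completion.infer` is a plain Python callable standing in for the inference endpoint, and the `?query` parameter is assumed to be bound to a literal before planning. The hypothetical `fold_constant_completion` rule replaces a COMPLETION whose prompt is foldable with a `Literal` holding the single inference result and leaves per-row prompts untouched.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


class Expr:
    """Minimal expression node; by default not foldable."""

    def foldable(self) -> bool:
        return False

    def fold(self) -> object:
        raise ValueError(f"{self!r} is not foldable")

    def evaluate(self, row: Dict[str, object]) -> object:
        raise NotImplementedError


@dataclass(frozen=True)
class Literal(Expr):
    value: object

    def foldable(self) -> bool:
        return True

    def fold(self) -> object:
        return self.value

    def evaluate(self, row: Dict[str, object]) -> object:
        return self.value


@dataclass(frozen=True)
class FieldRef(Expr):
    name: str

    def evaluate(self, row: Dict[str, object]) -> object:
        return row[self.name]


@dataclass(frozen=True)
class Concat(Expr):
    parts: Tuple[Expr, ...]

    def foldable(self) -> bool:
        # CONCAT is foldable only if every argument is.
        return all(p.foldable() for p in self.parts)

    def fold(self) -> object:
        return "".join(str(p.fold()) for p in self.parts)

    def evaluate(self, row: Dict[str, object]) -> object:
        return "".join(str(p.evaluate(row)) for p in self.parts)


@dataclass(frozen=True)
class Completion(Expr):
    prompt: Expr
    infer: Callable[[str], str]  # stand-in for the inference endpoint call

    def evaluate(self, row: Dict[str, object]) -> object:
        # Non-constant path: one inference call per row, as today.
        return self.infer(str(self.prompt.evaluate(row)))


def fold_constant_completion(expr: Expr) -> Expr:
    """Replace COMPLETION over a foldable prompt with a Literal holding the
    single inference result; leave every other expression untouched."""
    if isinstance(expr, Completion) and expr.prompt.foldable():
        return Literal(expr.infer(str(expr.prompt.fold())))
    return expr
```

Here the inference runs exactly once, at planning time, and the resulting `Literal` is itself foldable, so it could flow into a full-text predicate or other rules. In Elasticsearch the call would presumably go through the inference service asynchronously rather than through a synchronous Python callable.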

Metadata

Labels

  • :Search Relevance/ES|QL (Search functionality in ES|QL)
  • :SearchOrg/Relevance (Label for the Search (solution/org) Relevance team)
  • Team:Search - Relevance (The Search organization Search Relevance team)
  • Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
