[BUG] Explain API not compatible with k-NN queries #875

Open
SeyedAlirezaFatemi opened this issue Apr 20, 2023 · 7 comments · May be fixed by #2403
Labels
backlog Enhancements Increases software capabilities beyond original client specifications

@SeyedAlirezaFatemi

What is the bug?
The Explain API is not compatible with k-NN queries and assumes they have a score of 1. There may not be much to explain for a single k-NN query, but when you have multiple k-NN queries, or k-NN queries combined with text match queries, knowing the score of each k-NN part individually can be helpful.

How can one reproduce the bug?
Run any k-NN query with the explain flag set to true.
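For reference, here is a minimal Python sketch of such a reproduction request that uses only the standard library. The host, index name (knn-index), vector field (my_vector), query vector, and k are assumptions chosen to resemble the current-state output quoted later in this thread, and build_knn_explain_request is a hypothetical helper. The sketch only builds the request. Passing it to urllib.request.urlopen would send it to a running cluster.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_knn_explain_request(host, index, field, vector, k):
    """Build (without sending) a k-NN search request that asks for explanations."""
    # explain=true makes OpenSearch attach an _explanation object to every hit.
    params = urlencode({"explain": "true", "pretty": "true"})
    url = f"{host}/{index}/_search?{params}"
    body = {"query": {"knn": {field: {"vector": vector, "k": k}}}}
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",  # mirrors the curl -XGET call used later in this thread
    )


# Hypothetical index, field, and query vector, for illustration only.
req = build_knn_explain_request(
    "http://localhost:9200", "knn-index", "my_vector", [2.0, 3.0], k=2
)
print(req.get_method(), req.full_url)
```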

What is the expected behavior?
The value field in the explanation should contain the correct score of the k-NN part.

Do you have any screenshots?

SeyedAlirezaFatemi added the bug (Something isn't working) and untriaged labels on Apr 20, 2023
@navneet1v
Collaborator

@SeyedAlirezaFatemi could you please let us know what you are looking for in the explain API?

@SeyedAlirezaFatemi
Author

I don't think the explain API has anything significant to show for kNN queries other than the value. I just need to check the value when I combine a kNN query with other queries.

@neetikasinghal

neetikasinghal commented Jan 24, 2025

[Feature]: Support Explain API for KNN

Overview

The explain API in OpenSearch helps customers understand how the relevance score is calculated for each result returned by a query.
The Explanation object provides a human-readable breakdown of the score calculation. It includes:

  • Score: The final computed score for the document.
  • Details: A list of contributing factors to the score (like term frequency, field length normalization, and IDF).
  • Description: A textual explanation of each factor.
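Because details holds further explanation objects, the breakdown nests recursively. The following minimal Python sketch walks that structure. It assumes an _explanation object has already been parsed from a JSON response into a dict. render_explanation is a hypothetical helper, and the example values are invented for illustration.

```python
def render_explanation(node, depth=0):
    """Flatten an explanation tree into indented "value: description" lines."""
    lines = [f"{'  ' * depth}{node['value']}: {node['description']}"]
    # Each entry in details is itself an explanation node, so recurse into it.
    for child in node.get("details", []):
        lines.extend(render_explanation(child, depth + 1))
    return lines


# Hypothetical explanation: a bool query summing a k-NN clause and a term clause.
example = {
    "value": 84.7,
    "description": "sum of:",
    "details": [
        {"value": 83.7, "description": "k-NN clause", "details": []},
        {"value": 1.0, "description": "term clause", "details": []},
    ],
}
print("\n".join(render_explanation(example)))
```

Printing the example yields one line per node, indented by depth, which makes the contribution of each clause easy to spot.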

Use cases where the explain API is helpful:

  • Debugging: To see why some documents rank higher than others.
  • Fine-tuning Queries: Adjusting boosts or weights for better relevance.
  • Understanding Scoring: An educational tool to understand how scores are computed.

KNN supports different types of search, such as approximate nearest neighbor search, exact search, disk-based search, and radial search.
The types of KNN search that exist today are shown in the figure below:

[Figure: types of KNN search that exist today (image not preserved)]

Which type of search is executed depends on several factors derived from the customer's query. Customers find it difficult to understand why a particular type of KNN search was executed and would benefit from explain API support for KNN queries.

Current State
For KNN queries, the explain API currently returns only the score, with a minimal description and no details on how the score was calculated:

"hits": [
  {
    "_shard": "[knn-index][0]",
    "_node": "jtbYM-ZbSNGyuzlzgZcFjg",
    "_index": "knn-index",
    "_id": "1",
    "_score": 0.6666667,
    "_source": {
      "my_vector": [
        1.5,
        2.5
      ],
      "price": 12.2
    },
    "_explanation": {
      "value": 0.6666667,
      "description": "within top 2",
      "details": []
    }
  },

GitHub issues requesting this feature
Issue link 1: #875
Issue link 2: opensearch-project/neural-search#698
Issue link 3: opensearch-project/neural-search#658

Feature aim
The aim is to support the explain API for KNN queries so that its response is self-explanatory and helps customers understand the results when debugging or troubleshooting.

Scope for the proposal
The scope of this document is to lay out the proposed user experience for the explain API for KNN queries.

Proposed User Experience

Explanation in two layers

The first layer states whether the KNN search executed was Approximate-NN, Disk-based, or Radial.
The second layer states the type of KNN search executed at the leaf, along with details such as the space type, vector data type, query vector, and vector dimension. This information helps explain why a particular type of search was executed.

"_explanation": {
    "value": 84.7,
    "description": "the type of knn search executed was Approximate-NN",
    "details": [
        {
            "value": 84.7,
            "description": "the type of knn search executed at leaf was Approximate-NN with vectorDataType = FLOAT, spaceType = innerproduct where score is computed as `-rawScore + 1` from:",
            "details": [
                {
                    "value": -83.7,
                    "description": "rawScore, returned from FAISS library",
                    "details": []
                }
            ]
        }
    ]
}
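The leaf value can be checked by hand from the rawScore. The following minimal Python sketch implements the innerproduct translation quoted in the leaf description. The other branch (1 / (1 + rawScore) when rawScore is non-negative) is an assumption taken from the k-NN plugin's documented FAISS inner-product scoring, not something this proposal states. faiss_innerproduct_score is a hypothetical name.

```python
def faiss_innerproduct_score(raw_score):
    """Translate a FAISS inner-product raw score into a k-NN score."""
    if raw_score >= 0:
        # Assumed branch (not shown in the proposal): inner product <= 0.
        return 1.0 / (1.0 + raw_score)
    # Branch quoted in the leaf description: score = -rawScore + 1.
    return -raw_score + 1.0


# rawScore from the leaf node above; the result should match the 84.7 in both layers.
print(faiss_innerproduct_score(-83.7))
```

Both branches return 1 when rawScore is 0, so the translated score stays continuous and preserves the ordering of the raw scores (a lower rawScore gives a higher score).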

Alternatives considered

  • Multiple levels of explain: breaking the data down into multiple levels makes it less concise, and the value of 0 on the intermediate nodes does not convey a meaningful score.
"explanation" : {
    "value" : 3.5671005,
    "description" : "the type of knn search executed was Disk-based/Approximate-NN/Exact",
    "details" : [
      {
        "value" : 3.5671005,
        "description" : "the type of knn search executed was Approximate-NN/Exact",
        "details" : [
          {
            "value" : 0,
            "description" : "queryVector = [2,3]",
            "details" : [ ]
          },
          {
            "value" : 0,
            "description" : "spaceType = l2",
            "details" : []
          },
          {
            "value" : 0,
            "description" : "score translation = `-rawScore + 1`",
            "details" : []
          }
        ]
      }
    ]
  }
  • One level of explain: this leaves little flexibility to add details to the first layer of search. For example, disk-based search also reports extra details in the first layer. Packing everything into one layer also overloads it and makes it less clear.
"_explanation": {
    "value": 84.7,
    "description": "the type of knn search executed was Approximate-NN and search at leaf was exact with spaceType = INNER_PRODUCT, vectorDataType = FLOAT",
    "details": []
}

Release Plan for different phases

  • [Phase 1 - 3.0 (Release)] - Add explain support for Exact/ANN/Radial/Disk/Filtering search
  • [Phase 2 - 3.0 (Release)] - Add details about the score calculation for exact and disk-based search
  • [Phase 2 - 3.0 (Release)] - Add explain support for nested queries

Future improvements

  • Add explain support for the Lucene engine as well

@neetikasinghal

neetikasinghal commented Jan 24, 2025

Proposed output for different KNN searches with explain

  1. Approximate nearest neighbor search
{
    "took": 216038,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 5,
            "relation": "eq"
        },
        "max_score": 88.4,
        "hits": [
            {
                "_shard": "[my-knn-index-1][0]",
                "_node": "VHcyav6OTsmXdpsttX2Yug",
                "_index": "my-knn-index-1",
                "_id": "5",
                "_score": 88.4,
                "_source": {
                    "my_vector1": [
                        2.5,
                        3.5,
                        5.5,
                        7.4
                    ],
                    "price": 8.9
                },
                "_explanation": {
                    "value": 88.4,
                    "description": "the type of knn search executed was Approximate-NN",
                    "details": [
                        {
                            "value": 88.4,
                            "description": "the type of knn search executed at leaf was Approximate-NN with vectorDataType = FLOAT, spaceType = innerproduct where score is computed as `-rawScore + 1` from:",
                            "details": [
                                {
                                    "value": -87.4,
                                    "description": "rawScore, returned from FAISS library",
                                    "details": []
                                }
                            ]
                        }
                    ]
                }
            },
            {
                "_shard": "[my-knn-index-1][0]",
                "_node": "VHcyav6OTsmXdpsttX2Yug",
                "_index": "my-knn-index-1",
                "_id": "2",
                "_score": 84.7,
                "_source": {
                    "my_vector1": [
                        2.5,
                        3.5,
                        5.6,
                        6.7
                    ],
                    "price": 5.5
                },
                "_explanation": {
                    "value": 84.7,
                    "description": "the type of knn search executed was Approximate-NN",
                    "details": [
                        {
                            "value": 84.7,
                            "description": "the type of knn search executed at leaf was Approximate-NN with vectorDataType = FLOAT, spaceType = innerproduct where score is computed as `-rawScore + 1` from:",
                            "details": [
                                {
                                    "value": -83.7,
                                    "description": "rawScore, returned from FAISS library",
                                    "details": []
                                }
                            ]
                        }
                    ]
                }
            }
        ]
    }
}
  2. ANN with exact search
{
  "took": 87,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 84.7,
    "hits": [
      {
        "_shard": "[my-knn-index-1][0]",
        "_node": "MQVux8dZRWeznuEYKhMq0Q",
        "_index": "my-knn-index-1",
        "_id": "7",
        "_score": 84.7,
        "_source": {
          "my_vector2": [
            2.5,
            3.5,
            5.6,
            6.7
          ],
          "price": 5.5
        },
        "_explanation": {
          "value": 84.7,
          "description": "the type of knn search executed was Approximate-NN",
          "details": [
            {
              "value": 84.7,
              "description": "the type of knn search executed at leaf was Exact with spaceType = INNER_PRODUCT, vectorDataType = FLOAT, queryVector = [2.0, 3.0, 5.0, 6.0]",
              "details": []
            }
          ]
        }
      },
      {
        "_shard": "[my-knn-index-1][0]",
        "_node": "MQVux8dZRWeznuEYKhMq0Q",
        "_index": "my-knn-index-1",
        "_id": "8",
        "_score": 82.2,
        "_source": {
          "my_vector2": [
            4.5,
            5.5,
            6.7,
            3.7
          ],
          "price": 4.4
        },
        "_explanation": {
          "value": 82.2,
          "description": "the type of knn search executed was Approximate-NN",
          "details": [
            {
              "value": 82.2,
              "description": "the type of knn search executed at leaf was Exact with spaceType = INNER_PRODUCT, vectorDataType = FLOAT, queryVector = [2.0, 3.0, 5.0, 6.0]",
              "details": []
            }
          ]
        }
      }
    ]
  }
}
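Exact search scores every candidate vector directly instead of walking an index graph. The following minimal Python sketch shows that idea for the inner-product space. It reuses the vectors of documents 7 and 8 and the queryVector [2.0, 3.0, 5.0, 6.0] shown above. It is a plain illustration, not the k-NN plugin's implementation. The non-positive inner-product branch is an assumption, and exact_innerproduct_search is a hypothetical helper. The sketch should reproduce the scores 84.7 and 82.2 reported for the two hits.

```python
def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def exact_innerproduct_search(query, docs, k):
    """Brute-force k-NN: score every document vector, then keep the top k."""
    scored = []
    for doc_id, vector in docs.items():
        ip = inner_product(query, vector)
        # Equivalent to the translation in the explanations, with rawScore = -ip:
        # -rawScore + 1 when rawScore < 0, else 1 / (1 + rawScore) (assumed branch).
        score = ip + 1.0 if ip > 0 else 1.0 / (1.0 - ip)
        scored.append((doc_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Vectors of documents 7 and 8 and the queryVector from the explanations above.
docs = {"7": [2.5, 3.5, 5.6, 6.7], "8": [4.5, 5.5, 6.7, 3.7]}
for doc_id, score in exact_innerproduct_search([2.0, 3.0, 5.0, 6.0], docs, k=2):
    print(doc_id, round(score, 4))
```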
  3. Disk-based search
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 381.0,
    "hits" : [
      {
        "_shard" : "[my-vector-index][0]",
        "_node" : "pLaiqZftTX-MVSKdQSu7ow",
        "_index" : "my-vector-index",
        "_id" : "9",
        "_score" : 381.0,
        "_source" : {
          "my_vector_field" : [
            9.5,
            9.5,
            9.5,
            9.5,
            9.5,
            9.5,
            9.5,
            9.5
          ],
          "price" : 8.9
        },
        "_explanation" : {
          "value" : 381.0,
          "description" : "the type of knn search executed was Disk-based and the first pass k was 100 with vector dimension of 8, over sampling factor of 5.0, shard level rescoring enabled",
          "details" : [
            {
              "value" : 381.0,
              "description" : "the type of knn search executed at leaf was Approximate-NN with spaceType = HAMMING, vectorDataType = FLOAT, queryVector = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]",
              "details" : [ ]
            }
          ]
        }
      }
    ]
  }
}
  4. Efficient filtering
{
  "took" : 51,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.8620689,
    "hits" : [
      {
        "_shard" : "[products-shirts][0]",
        "_node" : "9epk8WoFT8yvnUI0tAaJgQ",
        "_index" : "products-shirts",
        "_id" : "8",
        "_score" : 0.8620689,
        "_source" : {
          "item_vector" : [
            2.4,
            4.0,
            3.0
          ],
          "size" : "small",
          "rating" : 8
        },
        "_explanation" : {
          "value" : 0.8620689,
          "description" : "the type of knn search executed was Approximate-NN",
          "details" : [
            {
              "value" : 0.8620689,
              "description" : "the type of knn search executed at leaf was Exact since filteredIds = 2 is less than or equal to K = 10 with spaceType = L2, vectorDataType = FLOAT, queryVector = [2.0, 4.0, 3.0]",
              "details" : [ ]
            }
          ]
        }
      },
      {
        "_shard" : "[products-shirts][0]",
        "_node" : "9epk8WoFT8yvnUI0tAaJgQ",
        "_index" : "products-shirts",
        "_id" : "6",
        "_score" : 0.029691212,
        "_source" : {
          "item_vector" : [
            6.4,
            3.4,
            6.6
          ],
          "size" : "small",
          "rating" : 9
        },
        "_explanation" : {
          "value" : 0.029691212,
          "description" : "the type of knn search executed was Approximate-NN",
          "details" : [
            {
              "value" : 0.029691212,
              "description" : "the type of knn search executed at leaf was Exact since filteredIds = 2 is less than or equal to K = 10 with spaceType = L2, vectorDataType = FLOAT, queryVector = [2.0, 4.0, 3.0]",
              "details" : [ ]
            }
          ]
        }
      }
    ]
  }
}
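The leaf description explains the switch to exact search: the filter left only 2 candidate documents, which is no more than k = 10. The following simplified Python sketch implements that rule plus the L2 scoring. The real plugin may take additional settings into account, so leaf_search_type is a hypothetical stand-in for its decision logic. Treating rawScore as the squared Euclidean distance is an inference from the numbers: for the queryVector [2.0, 4.0, 3.0], it reproduces both reported scores (0.8620689 and 0.029691212).

```python
def leaf_search_type(filtered_ids, k):
    """Simplified decision quoted in the explanation: exact search when the
    filter leaves no more candidate documents than k."""
    return "Exact" if filtered_ids <= k else "Approximate-NN"


def l2_score(query, vector):
    # rawScore is taken as the squared Euclidean distance; score = 1 / (1 + rawScore).
    raw_score = sum((q - v) ** 2 for q, v in zip(query, vector))
    return 1.0 / (1.0 + raw_score)


query = [2.0, 4.0, 3.0]
print(leaf_search_type(filtered_ids=2, k=10))
for doc_id, vector in {"8": [2.4, 4.0, 3.0], "6": [6.4, 3.4, 6.6]}.items():
    print(doc_id, l2_score(query, vector))
```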
  5. Radial Search
{
  "took" : 376529,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 0.98039204,
    "hits" : [
      {
        "_shard" : "[knn-index-test][0]",
        "_node" : "c9b4aPe4QGO8eOtb8P5D3g",
        "_index" : "knn-index-test",
        "_id" : "1",
        "_score" : 0.98039204,
        "_source" : {
          "my_vector" : [
            7.0,
            8.2
          ],
          "price" : 4.4
        },
        "_explanation" : {
          "value" : 0.98039204,
          "description" : "the type of knn search executed was Radial with the radius of 2.0",
          "details" : [
            {
              "value" : 0.98039204,
              "description" : "the type of knn search executed at leaf was Approximate-NN with vectorDataType = FLOAT, spaceType = l2 where score is computed as `1 / (1 + rawScore)` from:",
              "details" : [
                {
                  "value" : 0.020000057,
                  "description" : "rawScore, returned from FAISS library",
                  "details" : [ ]
                }
              ]
            }
          ]
        }
      },
      {
        "_shard" : "[knn-index-test][0]",
        "_node" : "c9b4aPe4QGO8eOtb8P5D3g",
        "_index" : "knn-index-test",
        "_id" : "3",
        "_score" : 0.9615384,
        "_source" : {
          "my_vector" : [
            7.3,
            8.3
          ],
          "price" : 19.1
        },
        "_explanation" : {
          "value" : 0.9615384,
          "description" : "the type of knn search executed was Radial with the radius of 2.0",
          "details" : [
            {
              "value" : 0.9615384,
              "description" : "the type of knn search executed at leaf was Approximate-NN with vectorDataType = FLOAT, spaceType = l2 where score is computed as `1 / (1 + rawScore)` from:",
              "details" : [
                {
                  "value" : 0.040000115,
                  "description" : "rawScore, returned from FAISS library",
                  "details" : [ ]
                }
              ]
            }
          ]
        }
      },
      {
        "_shard" : "[knn-index-test][0]",
        "_node" : "c9b4aPe4QGO8eOtb8P5D3g",
        "_index" : "knn-index-test",
        "_id" : "4",
        "_score" : 0.62111807,
        "_source" : {
          "my_vector" : [
            6.5,
            8.8
          ],
          "price" : 1.2
        },
        "_explanation" : {
          "value" : 0.62111807,
          "description" : "the type of knn search executed was Radial with the radius of 2.0",
          "details" : [
            {
              "value" : 0.62111807,
              "description" : "the type of knn search executed at leaf was Approximate-NN with vectorDataType = FLOAT, spaceType = l2 where score is computed as `1 / (1 + rawScore)` from:",
              "details" : [
                {
                  "value" : 0.6099999,
                  "description" : "rawScore, returned from FAISS library",
                  "details" : [ ]
                }
              ]
            }
          ]
        }
      },
      {
        "_shard" : "[knn-index-test][0]",
        "_node" : "c9b4aPe4QGO8eOtb8P5D3g",
        "_index" : "knn-index-test",
        "_id" : "2",
        "_score" : 0.5524861,
        "_source" : {
          "my_vector" : [
            7.1,
            7.4
          ],
          "price" : 14.2
        },
        "_explanation" : {
          "value" : 0.5524861,
          "description" : "the type of knn search executed was Radial with the radius of 2.0",
          "details" : [
            {
              "value" : 0.5524861,
              "description" : "the type of knn search executed at leaf was Approximate-NN with vectorDataType = FLOAT, spaceType = l2 where score is computed as `1 / (1 + rawScore)` from:",
              "details" : [
                {
                  "value" : 0.8100002,
                  "description" : "rawScore, returned from FAISS library",
                  "details" : [ ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
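Each radial hit's score follows from its rawScore through the 1 / (1 + rawScore) translation quoted in the leaf descriptions. The following minimal Python sketch recomputes the four scores above from their raw scores. l2_score_from_raw is a hypothetical helper, and the 1e-6 tolerance is an assumption meant to absorb 32-bit float rounding in the reported values.

```python
def l2_score_from_raw(raw_score):
    # Translation quoted in the leaf descriptions: score = 1 / (1 + rawScore).
    return 1.0 / (1.0 + raw_score)


# rawScore and _score values copied from the four radial-search hits above.
hits = [
    ("1", 0.020000057, 0.98039204),
    ("3", 0.040000115, 0.9615384),
    ("4", 0.6099999, 0.62111807),
    ("2", 0.8100002, 0.5524861),
]
for doc_id, raw_score, reported in hits:
    score = l2_score_from_raw(raw_score)
    # Reported scores are 32-bit floats, so compare with a small tolerance.
    print(doc_id, round(score, 6), abs(score - reported) < 1e-6)
```

Because the translation decreases as rawScore grows, the closest vector (document 1) ranks first, and the hits come back in the order 1, 3, 4, 2.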

  6. KNN search with term query
curl -XGET "http://localhost:9200/my-knn-index-1/_search?explain=true&pretty" -H 'Content-Type: application/json' -d'                           
 {
    "query": {
        "bool": {
            "should": [
                {
                    "knn": {
                      "my_vector2": { // vector field name
                        "vector": [2, 3, 5, 6],
                        "k": 2
                      }
                    }
                },
                {
                    "term": {
                        "price": "4.4"
                    }
                }
            ]
        }
    }
}
'
{
  "took" : 51,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 84.7,
    "hits" : [
      {
        "_shard" : "[my-knn-index-1][0]",
        "_node" : "c9b4aPe4QGO8eOtb8P5D3g",
        "_index" : "my-knn-index-1",
        "_id" : "7",
        "_score" : 84.7,
        "_source" : {
          "my_vector2" : [
            2.5,
            3.5,
            5.6,
            6.7
          ],
          "price" : 5.5
        },
        "_explanation" : {
          "value" : 84.7,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 84.7,
              "description" : "the type of knn search executed was Approximate-NN",
              "details" : [
                {
                  "value" : 84.7,
                  "description" : "the type of knn search executed at leaf was Approximate-NN with vectorDataType = FLOAT, spaceType = innerproduct where score is computed as `-rawScore + 1` from:",
                  "details" : [
                    {
                      "value" : -83.7,
                      "description" : "rawScore, returned from FAISS library",
                      "details" : [ ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      },
      {
        "_shard" : "[my-knn-index-1][0]",
        "_node" : "c9b4aPe4QGO8eOtb8P5D3g",
        "_index" : "my-knn-index-1",
        "_id" : "8",
        "_score" : 83.2,
        "_source" : {
          "my_vector2" : [
            4.5,
            5.5,
            6.7,
            3.7
          ],
          "price" : 4.4
        },
        "_explanation" : {
          "value" : 83.2,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 82.2,
              "description" : "the type of knn search executed was Approximate-NN",
              "details" : [
                {
                  "value" : 82.2,
                  "description" : "the type of knn search executed at leaf was Approximate-NN with vectorDataType = FLOAT, spaceType = innerproduct where score is computed as `-rawScore + 1` from:",
                  "details" : [
                    {
                      "value" : -81.2,
                      "description" : "rawScore, returned from FAISS library",
                      "details" : [ ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 1.0,
              "description" : "price:[1082969293 TO 1082969293]",
              "details" : [ ]
            }
          ]
        }
      }
    ]
  }
}
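This is the use case raised at the top of the issue: reading the k-NN clause's share of a combined bool score. The following minimal Python sketch works over a trimmed copy of document 8's explanation. clause_contributions is a hypothetical helper. Matching on the "the type of knn search" prefix relies on the description wording proposed here, which may change before release.

```python
def clause_contributions(explanation):
    """Return (description, value) for each clause of a bool "sum of:" explanation."""
    if explanation["description"] != "sum of:":
        # Not a bool sum: treat the whole node as a single clause.
        return [(explanation["description"], explanation["value"])]
    return [(child["description"], child["value"]) for child in explanation["details"]]


# Trimmed copy of the _explanation for document "8" above (leaf details dropped).
doc8 = {
    "value": 83.2,
    "description": "sum of:",
    "details": [
        {"value": 82.2, "description": "the type of knn search executed was Approximate-NN", "details": []},
        {"value": 1.0, "description": "price:[1082969293 TO 1082969293]", "details": []},
    ],
}
parts = clause_contributions(doc8)
# Identify the k-NN clause by the description prefix proposed in this issue.
knn_part = sum(value for desc, value in parts if desc.startswith("the type of knn search"))
print(f"k-NN: {knn_part}, total: {sum(value for _, value in parts)}")
```

For document 8, this separates the 82.2 contributed by the k-NN clause from the 1.0 contributed by the term query on price, which together give the 83.2 total.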

@neetikasinghal

I will raise a PR for the explain API changes next.

@Dharin-shah

Dharin-shah commented Feb 27, 2025

Hello, can we also support naming the knn query? That way, the scores can be matched to the query itself. @neetikasinghal

For example:

"knn": {
    "_name": "my_knn",
    "my_vector_field": {
        "vector": [
            ....
        ],
        "k": 1000
    }
}

It would also help to attach the cosine similarity score to each hit that matched this vector.

@neetikasinghal

@Dharin-shah This is a different requirement; the scope of this issue is limited to the explain API. Please open a new feature request for it. Thanks.

Projects
Status: Backlog (Hot)

5 participants