FullTextScore() in ORDER BY clause does not take effect in Cosmos DB NoSQL full-text queries with partition key #42241

@Wizmann

Description

  • Package Name:
    azure-cosmos
  • Package Version:
    4.9.0
  • Operating System:
    Ubuntu 22.04
  • Python Version:
    3.10.12

Describe the bug

When using Cosmos DB for NoSQL with Full-Text Search enabled (via fullTextPolicy and fullTextIndexes), the ORDER BY RANK FullTextScore(c.text, 'keyword') clause appears to have no effect on query results when a partition key is specified in the query. The result order does not change even when the documents clearly differ in relevance, and changing the keyword produces the same order of results.

To Reproduce
Steps to reproduce the behavior:

  1. Cosmos DB account with Full Text Search enabled (on CentralUS)
  2. Container created with:
full_text_policy = {
    "defaultLanguage": "en-US",
    "fullTextPaths": [{"path": "/text", "language": "en-US"}]
}
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/\"_etag\"/?"}],
    "fullTextIndexes": [{"path": "/text"}]
}

and partition key: /userid

  3. Query used:
query = f"""
SELECT TOP 5 * FROM c
ORDER BY RANK FullTextScore(c.text, '{keyword}')
"""

results = list(container.query_items(query=query, partition_key=userid, enable_cross_partition_query=False))

Expected behavior
The query:

SELECT TOP 5 * FROM c 
ORDER BY RANK FullTextScore(c.text, 'fence')

should return the documents where c.text is most relevant to 'fence' first. In this example, the sentence:

"The cat jumped over the fence."

should be ranked highest.
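
To make the expectation concrete, here is a rough local check that needs no Cosmos DB connection. It is only a sketch: `term_count` and `naive_rank` are hypothetical helpers that count whole-word matches, a crude stand-in for relevance and not the scoring the service applies. The sentences are the five 'cat' items from the sample data.

```python
import re

# The five sentences stored under the 'cat' partition in the sample data.
cat_sentences = [
    "The cat is sleeping on the couch.",
    "A black cat crossed the road.",
    "Cats are curious animals.",
    "I have a cat named Whiskers.",
    "The cat jumped over the fence.",
]


def term_count(text, keyword):
    """Count case-insensitive whole-word occurrences of keyword in text."""
    pattern = rf"\b{re.escape(keyword)}\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def naive_rank(sentences, keyword):
    """Order sentences by descending match count (ties keep input order)."""
    return sorted(sentences, key=lambda s: term_count(s, keyword), reverse=True)


ranked = naive_rank(cat_sentences, "fence")
print(ranked[0])  # The cat jumped over the fence.
```

Only one of the five sentences contains 'fence', so any relevance-based ordering should place it first. In the failing case it is not returned first.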

Additional context

Sample code to reproduce the problem:

from azure.cosmos import CosmosClient, PartitionKey, exceptions
import uuid
import time

# Cosmos DB configuration
COSMOS_ENDPOINT = 'https://<COSMOS_ENDPOINT>.documents.azure.com:443/'
COSMOS_KEY = "<COSMOS_KEY>"
DATABASE_NAME = "<DATABASE_NAME>"
CONTAINER_NAME = "dogcat"

# Initialize Cosmos DB client
client = CosmosClient(COSMOS_ENDPOINT, COSMOS_KEY)
database = client.create_database_if_not_exists(id=DATABASE_NAME)

# Define full-text search policy and indexing policy
full_text_policy = {
    # Default language for full-text indexing
    "defaultLanguage": "en-US",
    # Apply full-text indexing to /text path
    "fullTextPaths": [
        {"path": "/text", "language": "en-US"}
    ]
}

indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/\"_etag\"/?"}],
    # Enable full-text index on /text field
    "fullTextIndexes": [
        {"path": "/text"}
    ]
}

# Create the container with full-text indexing and policy (or retrieve if it already exists)
container = database.create_container_if_not_exists(
    id=CONTAINER_NAME,
    partition_key=PartitionKey(path="/userid"),
    indexing_policy=indexing_policy,
    full_text_policy=full_text_policy
)

# Remove all existing items from the container
def clear_container(container):
    query = "SELECT c.id, c.userid FROM c"
    for item in container.query_items(query=query, enable_cross_partition_query=True):
        container.delete_item(item=item["id"], partition_key=item["userid"])

clear_container(container)

# Insert test data under two partition keys: 'cat' and 'dog'
def insert_data(container):
    cat_sentences = [
        "The cat is sleeping on the couch.",
        "A black cat crossed the road.",
        "Cats are curious animals.",
        "I have a cat named Whiskers.",
        "The cat jumped over the fence."
    ]

    dog_sentences = [
        "The dog barked loudly last night.",
        "Dogs love going for walks.",
        "A golden retriever is a friendly dog.",
        "My dog plays fetch every day.",
        "The dog chased the ball into the yard."
    ]

    for text in cat_sentences:
        container.create_item({
            "id": str(uuid.uuid4()),
            "userid": "cat",
            "text": text
        })

    for text in dog_sentences:
        container.create_item({
            "id": str(uuid.uuid4()),
            "userid": "dog",
            "text": text
        })

insert_data(container)

# Wait for indexing to complete (typically immediate, but added for safety)
time.sleep(2)

# Perform full-text search ordered by relevance score
def full_text_search(container, userid, keyword):
    query = f"SELECT TOP 5 * FROM c ORDER BY RANK FullTextScore(c.text, '{keyword}')"
    results = list(container.query_items(query=query, partition_key=userid, enable_cross_partition_query=False))
    print(f"\nTop 5 results for userid='{userid}' with keyword='{keyword}':")
    for r in results:
        print(f"- [{r['userid']}] {r['text']}")

# Execute full-text search on both partitions
full_text_search(container, "cat", "fence")
full_text_search(container, "dog", "ball")
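
A possible way to narrow this down is to run the same ranking query twice: once scoped with partition_key as above, and once with the partition expressed as a WHERE filter and cross-partition querying enabled. This is a diagnostic sketch, not a confirmed workaround. `compare_rank_orders` is a hypothetical helper, and whether the filtered form ranks correctly is an assumption to verify against a real container.

```python
def compare_rank_orders(container, userid, keyword):
    """Return (scoped_texts, filtered_texts) for the same ranking query.

    scoped:   partition passed through the partition_key argument (the failing case)
    filtered: partition expressed as a WHERE filter, cross-partition query enabled
    """
    # Same query text as in the reproduction above.
    scoped_query = (
        f"SELECT TOP 5 * FROM c ORDER BY RANK FullTextScore(c.text, '{keyword}')"
    )
    scoped = list(container.query_items(
        query=scoped_query,
        partition_key=userid,
        enable_cross_partition_query=False,
    ))

    # Assumption: filtering on c.userid selects the same documents as partition_key.
    filtered_query = (
        "SELECT TOP 5 * FROM c WHERE c.userid = @userid "
        f"ORDER BY RANK FullTextScore(c.text, '{keyword}')"
    )
    filtered = list(container.query_items(
        query=filtered_query,
        parameters=[{"name": "@userid", "value": userid}],
        enable_cross_partition_query=True,
    ))

    return [r["text"] for r in scoped], [r["text"] for r in filtered]
```

Calling `compare_rank_orders(container, "cat", "fence")` and printing both lists shows whether the two orders differ. If they do, that would suggest the ranking is lost specifically on the partition_key code path.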

Labels

  • Client: This issue points to a problem in the data-plane of the library.
  • Cosmos
  • Service Attention: Workflow: This issue is responsible by Azure service team.
  • customer-reported: Issues that are reported by GitHub users external to the Azure organization.
  • needs-author-feedback: Workflow: More information is needed from author to address the issue.
  • no-recent-activity: There has been no recent activity on this issue.
  • question: The issue doesn't require a change to the product in order to be resolved. Most issues start as that
