@zvi-code zvi-code commented Nov 5, 2025

Fixes #449

Summary

Fix tag OR syntax in hybrid queries and add comprehensive filter expression tests.

Changes

Tag Filter Parsing Fix

  • Use '|' separator for tag search queries instead of the index-defined separator
  • Allows consistent @field:{tag1|tag2|tag3} syntax in all queries regardless of index configuration
  • Previously used the separator from index creation, causing inconsistencies
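
A minimal sketch of the behavior change, written in Python rather than the project's C++. The helper `parse_search_tags` is hypothetical: it only models splitting and trimming and is not the actual `indexes::Tag::ParseSearchTags` implementation. The ',' index separator is an assumed example.

```python
def parse_search_tags(tag_string: str, separator: str) -> set[str]:
    """Hypothetical model: split a query tag list and trim whitespace."""
    parts = (part.strip() for part in tag_string.split(separator))
    return {part for part in parts if part}


# Assumed scenario: the index was created with ',' as its tag separator.
index_separator = ","
query_tags = "USA|GBR|CAN"

# Before the fix: the query is split on the index separator, so the whole
# string is treated as one tag and matches nothing useful.
before = parse_search_tags(query_tags, index_separator)

# After the fix: queries always split on '|', whatever the index uses.
after = parse_search_tags(query_tags, "|")
```

With the fixed separator, the same query text yields the same three tags no matter how the index was configured.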

New Test Coverage

Add test_filter_expressions.py with 23 test cases covering:

  • Tag filters with OR syntax (@field:{tag1|tag2|tag3})
  • All 9 numeric range variants (inclusive/exclusive bounds, infinity, equality)
  • Logical negation (-@field:{value})
  • Operator precedence (negation > AND > OR)
  • Hybrid queries (filters + vector search)
  • Complex multi-filter queries

This ensures the tag OR syntax works correctly across all query patterns documented in COMMANDS.md.
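
To illustrate the numeric range cases, here is a small sketch that evaluates a RediSearch-style range expression (`[min max]`, with `(` marking an exclusive bound and `-inf`/`+inf` for open ends). The function names are hypothetical, and the expressions used are examples of the listed categories, not necessarily the nine variants in test_filter_expressions.py.

```python
def parse_bound(token: str) -> tuple[float, bool]:
    """Return (value, exclusive) for one bound such as '10', '(10' or '+inf'."""
    exclusive = token.startswith("(")
    if exclusive:
        token = token[1:]
    # float() accepts 'inf', '+inf' and '-inf'.
    return float(token), exclusive


def in_range(value: float, expr: str) -> bool:
    """Check a value against a range expression such as '[(10 +inf]'."""
    low_token, high_token = expr.strip()[1:-1].split()
    low, low_exclusive = parse_bound(low_token)
    high, high_exclusive = parse_bound(high_token)
    above = value > low if low_exclusive else value >= low
    below = value < high if high_exclusive else value <= high
    return above and below
```

Equality is expressed as a range whose two bounds are the same inclusive value, for example `[15 15]`.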

@zvi-code zvi-code requested review from allenss-amazon, Copilot and yairgott and removed request for Copilot November 5, 2025 21:07
    - Add comprehensive test suite (test_filter_expressions.py) covering:
      * Tag filters with OR syntax (@field:{tag1|tag2|tag3})
      * Numeric ranges (9 variants: inclusive/exclusive bounds, infinity)
      * Logical negation, operator precedence, hybrid queries
      * 23 test cases validating parser correctness
    - Fix tag OR separator: use '|' instead of index separator

Signed-off-by: Zvi Schneider <[email protected]>
@zvi-code zvi-code force-pushed the add-filter-expression-tests branch from f4ede02 to af91639 November 5, 2025 21:25
Comment on lines +222 to +224
// separator used when the index was created. This allows users to specify
// multiple tags using the syntax: @field:{tag1|tag2|tag3}
return indexes::Tag::ParseSearchTags(tag_string, '|');

Member

While this is correct, it leaves open the case of parsing a tag that itself contains a pipe character. I think the tag parsing code should be updated to handle escaping of the separator character.

Collaborator Author

@allenss-amazon, do we have a definition somewhere of valid/invalid characters for tags? Can "@country:{USA|GBR|CAN}=>[KNN 5 @embedding $vec]" be a tag?

Collaborator

We're trying to stay aligned with RediSearch. Browsing their documentation, I found the following.

Comment on lines +101 to +117
query_vec = struct.pack('3f', 0.9, 0.1, 0.0)

# Test hybrid query with tag OR syntax: @country:{USA|GBR|CAN}=>[KNN 5 @embedding $vec]
result = client.execute_command(
    "FT.SEARCH", "hybrid_idx",
    "@country:{USA|GBR|CAN}=>[KNN 5 @embedding $vec]",
    "PARAMS", "2", "vec", query_vec,
    "RETURN", "1", "country"
)

# Should return up to 3 results (filtered by country tag)
assert result[0] >= 1 and result[0] <= 3

# Verify all results match the country filter
for i in range(1, len(result), 2):
    key = result[i].decode('utf-8')
    assert key in ["doc:1", "doc:2", "doc:3"], f"Unexpected key {key}"
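
The indexing in the loop above relies on the flat reply layout `[count, key1, fields1, key2, fields2, ...]`. A small sketch of a helper that makes this explicit is shown here; `parse_search_reply` is a hypothetical name, and the sample reply is hand-written for illustration rather than captured from a server.

```python
import struct


def parse_search_reply(reply: list) -> dict:
    """Turn [count, key1, fields1, key2, fields2, ...] into {key: {field: value}}."""
    docs = {}
    for i in range(1, len(reply), 2):
        key = reply[i].decode("utf-8")
        flat = reply[i + 1]  # [field, value, field, value, ...]
        docs[key] = {
            flat[j].decode("utf-8"): flat[j + 1].decode("utf-8")
            for j in range(0, len(flat), 2)
        }
    return docs


# Hand-written sample reply, assuming byte strings as in the test above.
sample_reply = [2, b"doc:1", [b"country", b"USA"], b"doc:2", [b"country", b"GBR"]]
docs = parse_search_reply(sample_reply)

# Three native float32 values pack into 12 bytes for the PARAMS blob.
query_vec = struct.pack("3f", 0.9, 0.1, 0.0)
```

With the reply in dictionary form, a test could also assert on the returned `country` value of each document, not only on the key.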

Member

Yair pointed out that there are two paths for hybrid queries. I believe we can force the logic to go one way or the other by configuring the decision threshold (and there are counters to validate which way it went). Can we update the tests that have hybrid queries to force both paths?

Member

It would also be good to repeat the vector tests with both HNSW and FLAT, since those are, again, different code paths.
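
One way to cover these combinations is to build a small test matrix, sketched below. Everything specific here is an assumption: the path labels are placeholders (the thread does not name the two hybrid paths, the threshold setting, or the counters), and the schema (3-dimensional FLOAT32 vectors, COSINE distance, a `country` tag field) is inferred from the test excerpt rather than taken from the actual index definition.

```python
from itertools import product

ALGORITHMS = ("HNSW", "FLAT")
# Placeholder labels for the two hybrid execution paths; the real names and
# the configuration that selects them are not given in this thread.
HYBRID_PATHS = ("path_a", "path_b")


def create_index_args(index_name: str, algorithm: str) -> list[str]:
    """Build FT.CREATE arguments for a vector field plus a tag field."""
    return [
        "FT.CREATE", index_name, "SCHEMA",
        "embedding", "VECTOR", algorithm, "6",
        "TYPE", "FLOAT32", "DIM", "3", "DISTANCE_METRIC", "COSINE",
        "country", "TAG",
    ]


# One entry per (algorithm, path) combination: four test configurations.
test_matrix = [
    {
        "algorithm": algorithm,
        "hybrid_path": path,
        "create_args": create_index_args(
            f"hybrid_idx_{algorithm.lower()}_{path}", algorithm
        ),
    }
    for algorithm, path in product(ALGORITHMS, HYBRID_PATHS)
]
```

Each entry could then drive one run of the hybrid query tests, with the path forced through the threshold setting and confirmed through the counters mentioned above.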

Member

Another issue that's been reported is that the order of the results is unexpected; it's possible that we're getting a different ordering of the results for vector queries.

Collaborator Author

What do you mean? I have these two:

  1. "@country:{USA|GBR|CAN}=>[KNN 5 @embedding $vec]"
  2. "@country:{USA} | @country:{GBR} | @country:{CAN}=>[KNN 5 @embedding $vec]"

Anything else?

My assumption was that the higher levels of parsing are done correctly and that this change does not affect them. I can extend the testing to cover logically composed queries such as query1 | query2 | query3, where each query-i is a valid filter by itself. Is that your intent?

Member

I'm trying to get alignment in capabilities between the ingestion and query subsystems. Assuming that this PR is merged more or less as-is, I believe we're in a state where a tag is forbidden from containing two characters: | and }. I note that this isn't an ingestion problem, because we allow the tag separator character to be redefined. As long as your tags don't collectively use all 256 byte values, you can pick a separator and ingest all of them. However, you can't query for a tag that contains a pipe or a right brace, because the parser has no way to treat these as part of a tag; they are hard-coded as delimiters.

I'm going to open an enhancement issue to specify that the query tag parsing logic be updated to support escaping of these two characters. We might roll the fix for this into the future Unicoding of the query logic.
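
As a rough illustration of what such escaping could look like, the sketch below splits a tag list on unescaped separators only. The backslash escape convention and the `split_unescaped` name are assumptions; the thread does not specify the escape syntax, and this only handles a tag list that has already been extracted from between the braces.

```python
def split_unescaped(tag_string: str, separator: str = "|", escape: str = "\\") -> list[str]:
    """Split on separator unless it is escaped; drop the escape character."""
    tags, current, i = [], [], 0
    while i < len(tag_string):
        char = tag_string[i]
        if char == escape and i + 1 < len(tag_string):
            # Keep the escaped character literally, whatever it is.
            current.append(tag_string[i + 1])
            i += 2
            continue
        if char == separator:
            tags.append("".join(current))
            current = []
        else:
            current.append(char)
        i += 1
    tags.append("".join(current))
    return tags
```

Under this assumed convention, `a\|b|c` would be read as the two tags `a|b` and `c`, and `x\}y` as the single tag `x}y`.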

Collaborator Author

@allenss-amazon, our documentation explicitly specifies the query syntax operators, and these have nothing to do with the tag delimiter. I agree we need a complete query syntax, but I don't think we should connect the two. This is a fix for a bug with respect to the currently declared usage.

Member

I agree that we can split this. I've opened a separate issue for the future enhancement to do the escape parsing.

Member

This is captured in issue #454.

Copilot AI review requested due to automatic review settings November 7, 2025 15:03
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Development

Successfully merging this pull request may close these issues.

[BUG] Unable to use {USA|GBR|CAN} syntax for tag filter in KNN hybrid query
