Fix tag OR syntax in hybrid queries and add filter expression tests #452

zvi-code · 2025-11-05T21:07:27Z

Fixes #449

Summary

Fix tag OR syntax in hybrid queries and add comprehensive filter expression tests.

Changes

Tag Filter Parsing Fix

Use '|' separator for tag search queries instead of the index-defined separator
Allows consistent @field:{tag1|tag2|tag3} syntax in all queries regardless of index configuration
Previously used the separator from index creation, causing inconsistencies

New Test Coverage

Add test_filter_expressions.py with 23 test cases covering:

Tag filters with OR syntax (@field:{tag1|tag2|tag3})
All 9 numeric range variants (inclusive/exclusive bounds, infinity, equality)
Logical negation (-@field:{value})
Operator precedence (negation > AND > OR)
Hybrid queries (filters + vector search)
Complex multi-filter queries

This ensures the tag OR syntax works correctly across all query patterns documented in COMMANDS.md.

- Add comprehensive test suite (test_filter_expressions.py) covering: * Tag filters with OR syntax (@field:{tag1|tag2|tag3}) * Numeric ranges (9 variants: inclusive/exclusive bounds, infinity) * Logical negation, operator precedence, hybrid queries * 23 test cases validating parser correctness - Fix tag OR separator: use ‘|' instead of index separator Signed-off-by: Zvi Schneider <[email protected]>

allenss-amazon · 2025-11-05T23:45:09Z

src/commands/filter_parser.cc

+  // separator used when the index was created. This allows users to specify
+  // multiple tags using the syntax: @field:{tag1|tag2|tag3}
+  return indexes::Tag::ParseSearchTags(tag_string, '|');


While this is correct. It leaves open the situation of parsing a tag that itself contains a pipe character. I think the tag parsing code should be updated to handle escaping of the separator character.

@allenss-amazon , Do we have somewhere definition of valid\invalid characters for tags? Can "@country:{USA|GBR|CAN}=>[KNN 5 @embedding $vec]" be a tag?

We're trying to stay aligned with RediSearch. Browsing their documentation, I found the following.

allenss-amazon · 2025-11-05T23:56:51Z

integration/test_filter_expressions.py

+        query_vec = struct.pack('3f', 0.9, 0.1, 0.0)
+
+        # Test hybrid query with tag OR syntax: @country:{USA|GBR|CAN}=>[KNN 5 @embedding $vec]
+        result = client.execute_command(
+            "FT.SEARCH", "hybrid_idx",
+            "@country:{USA|GBR|CAN}=>[KNN 5 @embedding $vec]",
+            "PARAMS", "2", "vec", query_vec,
+            "RETURN", "1", "country"
+        )
+
+        # Should return up to 3 results (filtered by country tag)
+        assert result[0] >= 1 and result[0] <= 3
+
+        # Verify all results match the country filter
+        for i in range(1, len(result), 2):
+            key = result[i].decode('utf-8')
+            assert key in ["doc:1", "doc:2", "doc:3"], f"Unexpected key {key}"


Yair pointed out that there are two paths for hybrid queries. I believe we can force the logic to use one way or the other configuring the decision threshold (and there are counters to validate which way it went). Can we update the tests that have hybrid queries to force both paths?

It would also be good to repeat the vector tests with both HNSW and FLAT, again different code paths.

Another issue that's been reported is that the order of the results is unexpected, it's possible that we're getting different ordering of the results for vector queries.

what do you mean? I have these 2:

"@Country:{USA|GBR|CAN}=>[KNN 5 @Embedding $vec]"

"@Country:{USA} | @Country:{GBR} | @Country:{CAN}=>[KNN 5 @Embedding $vec]"
Anything else?

My assumption was that the higher levels of parsing is done correctly, and the change does not affect those, but if I can extend the testing to check logically composed queries, like query1 | query2 | query3, where query-i is a valid filter by itself, is this your intent?

I'm trying to get alignment in capabilities between the ingestion and query subsystems. Assuming that this PR is merged more or less as-is, I believe we're in a state where a tag is forbidden from containing two characters: | and }. I note that this isn't an ingestion problem as we allow the redefinition of the tag separator character. This means that as long as your tags don't contain all 256 bytes, you can ingest all of them. However, you can't query for a tag that contains a pipe or a right brace -- because the parser has no way to treat these as part of a tag, they are hard-coded as delimiters.

I'm going to open an enhancement issue to specify that the query tag parsing logic be updated to support escaping of these two characters. We might roll the fix for this into the future Unicoding of the query logic.

@allenss-amazon, Our documentation explicitly specifies the syntax query operator and these have nothing todo with the tag delimiter. I agree we need to have complete query syntax, I don't think we should connect the 2. This is a fix for a bug with respect to current declared usage.

I agree that we can split this. I've opened a separate issue for the future enhancement to do the escape parsing.

This is captured in the following issue #454

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

zvi-code requested review from allenss-amazon, Copilot and yairgott and removed request for Copilot November 5, 2025 21:07

zvi-code force-pushed the add-filter-expression-tests branch from f4ede02 to af91639 Compare November 5, 2025 21:25

allenss-amazon reviewed Nov 5, 2025

View reviewed changes

Merge branch 'main' into add-filter-expression-tests

5c9c4d7

Copilot AI review requested due to automatic review settings November 7, 2025 15:03

Copilot AI reviewed Nov 7, 2025

View reviewed changes

Merge branch 'main' into add-filter-expression-tests

b36165c

Fix tag OR syntax in hybrid queries and add filter expression tests #452

Are you sure you want to change the base?

Fix tag OR syntax in hybrid queries and add filter expression tests #452

Uh oh!

Conversation

zvi-code commented Nov 5, 2025

Summary

Changes

Tag Filter Parsing Fix

New Test Coverage

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants