Skip to content

Fix for converting SQL wildcard to Lucene syntax#21739

Open
nssuresh2007 wants to merge 2 commits into
opensearch-project:mainfrom
nssuresh2007:sql_wildcard_support
Open

Fix for converting SQL wildcard to Lucene syntax#21739
nssuresh2007 wants to merge 2 commits into
opensearch-project:mainfrom
nssuresh2007:sql_wildcard_support

Conversation

@nssuresh2007
Copy link
Copy Markdown
Contributor

Fix: SQL wildcard characters not converted to Lucene wildcards in WildcardQuerySerializer

Problem:

The WildcardQuerySerializer was passing SQL wildcard patterns directly to Lucene's WildcardQueryBuilder without conversion. SQL uses % (match any sequence) and _ (match
single character), but Lucene expects * and ? respectively. This caused the following integration tests to fail:

  • test_escaping_wildcard_percent_in_text
  • test_wildcard_query_sql_wildcard_underscore_conversion
  • test_wildcard_query_sql_wildcard_percent_conversion

Root Cause:

createQueryBuilder() passed the raw query pattern to WildcardQueryBuilder without translating SQL wildcard syntax to Lucene wildcard syntax.

Fix:

Added convertSqlWildcardToLucene() which translates the pattern before query construction:

  • % → *
  • _ → ?
  • % → literal % (escaped, not a wildcard)
  • _ → literal _ (escaped, not a wildcard)

Testing:

Added 5 unit tests covering basic conversion, escape handling, mixed escaped/unescaped patterns, and passthrough of native Lucene wildcards.

Related Issues

Resolves #21738

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Adding convertSqlWildcardToLucene method that converts
SQL wildcards (% → *, _ → ?) with escape handling.

Signed-off-by: Suresh N S <nssuresh@amazon.com>
@nssuresh2007 nssuresh2007 requested a review from a team as a code owner May 19, 2026 15:02
@github-actions github-actions Bot added bug Something isn't working labels May 19, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

PR Reviewer Guide 🔍

(Review updated until commit 221ef48)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Incorrect Escape Handling

When a backslash precedes a non-wildcard character (e.g., '\n'), the code preserves both the backslash and the character. However, Lucene's WildcardQuery interprets backslashes as escape characters. Passing '\n' to Lucene will cause it to treat 'n' as a literal, not as a backslash followed by 'n'. This breaks queries containing escaped non-wildcard characters. For example, a user searching for 'test\n' (literal backslash-n) will incorrectly match 'testn'.

default:
    result.append(ESCAPE);
    result.append(c);
    break;

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

PR Code Suggestions ✨

Latest suggestions up to 221ef48

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Add null check for input parameter

Add null check for the text parameter to prevent NullPointerException. If text is
null, the method will fail when calling text.toCharArray(). Consider returning an
empty string or throwing an IllegalArgumentException with a descriptive message.

sandbox/plugins/analytics-backend-lucene/src/main/java/org/opensearch/be/lucene/serializers/WildcardQuerySerializer.java [39-77]

 private static String convertSqlWildcardToLucene(String text) {
+    if (text == null) {
+        throw new IllegalArgumentException("Input text cannot be null");
+    }
     final char ESCAPE = '\\';
     StringBuilder result = new StringBuilder(text.length());
     boolean escaped = false;
 
     for (char c : text.toCharArray()) {
         if (escaped) {
             switch (c) {
                 case '%':
                     result.append('%');
                     break;
                 case '_':
                     result.append('_');
                     break;
                 case ESCAPE:
                     result.append(ESCAPE);
                     break;
                 default:
                     result.append(ESCAPE);
                     result.append(c);
                     break;
             }
             escaped = false;
         } else if (c == ESCAPE) {
             escaped = true;
         } else if (c == '%') {
             result.append('*');
         } else if (c == '_') {
             result.append('?');
         } else {
             result.append(c);
         }
     }
     // Trailing backslash with nothing to escape — preserve it
     if (escaped) {
         result.append(ESCAPE);
     }
     return result.toString();
 }
Suggestion importance[1-10]: 7

__

Why: Adding a null check prevents potential NullPointerException when text is null. However, since operands.query() is the caller and likely validates input, this is a defensive programming improvement rather than a critical bug fix.

Medium

Previous suggestions

Suggestions up to commit 221ef48
CategorySuggestion                                                                                                                                    Impact
Possible issue
Add null check for input parameter

Add null check for the text parameter to prevent NullPointerException. If text is
null, the method will fail when calling text.toCharArray(). Consider returning an
empty string or throwing an IllegalArgumentException with a descriptive message.

sandbox/plugins/analytics-backend-lucene/src/main/java/org/opensearch/be/lucene/serializers/WildcardQuerySerializer.java [39-77]

 private static String convertSqlWildcardToLucene(String text) {
+    if (text == null) {
+        throw new IllegalArgumentException("Input text cannot be null");
+    }
     final char ESCAPE = '\\';
     StringBuilder result = new StringBuilder(text.length());
     boolean escaped = false;
 
     for (char c : text.toCharArray()) {
         if (escaped) {
             switch (c) {
                 case '%':
                     result.append('%');
                     break;
                 case '_':
                     result.append('_');
                     break;
                 case ESCAPE:
                     result.append(ESCAPE);
                     break;
                 default:
                     result.append(ESCAPE);
                     result.append(c);
                     break;
             }
             escaped = false;
         } else if (c == ESCAPE) {
             escaped = true;
         } else if (c == '%') {
             result.append('*');
         } else if (c == '_') {
             result.append('?');
         } else {
             result.append(c);
         }
     }
     // Trailing backslash with nothing to escape — preserve it
     if (escaped) {
         result.append(ESCAPE);
     }
     return result.toString();
 }
Suggestion importance[1-10]: 7

__

Why: Adding a null check prevents potential NullPointerException when text.toCharArray() is called. However, since this is a private method called from createQueryBuilder with operands.query(), the likelihood of null depends on the upstream validation. The suggestion improves defensive programming but may not be critical if the caller guarantees non-null values.

Medium
Suggestions up to commit b8da55a
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix escape sequence handling logic

The escaped flag is not reset when a backslash is followed by a non-wildcard
character. This causes subsequent wildcards to be incorrectly treated as escaped.
For example, \a% would incorrectly convert % to % instead of *.

sandbox/plugins/analytics-backend-lucene/src/main/java/org/opensearch/be/lucene/serializers/WildcardQuerySerializer.java [43-71]

 for (char c : text.toCharArray()) {
     switch (c) {
         case ESCAPE:
-            escaped = true;
-            result.append(c);
+            if (escaped) {
+                result.append(c);
+                escaped = false;
+            } else {
+                escaped = true;
+            }
             break;
         case '%':
             if (escaped) {
-                result.deleteCharAt(result.length() - 1);
                 result.append('%');
+                escaped = false;
             } else {
                 result.append('*');
             }
-            escaped = false;
             break;
         case '_':
             if (escaped) {
-                result.deleteCharAt(result.length() - 1);
                 result.append('_');
+                escaped = false;
             } else {
                 result.append('?');
             }
-            escaped = false;
             break;
         default:
+            if (escaped) {
+                result.append(ESCAPE);
+                escaped = false;
+            }
             result.append(c);
-            escaped = false;
     }
 }
Suggestion importance[1-10]: 9

__

Why: The current implementation has a critical bug where escaped flag is not properly reset when a backslash precedes a non-wildcard character. This causes incorrect wildcard conversion. The suggested fix properly handles escape sequences including escaped backslashes and preserves backslashes before non-wildcard characters.

High
Handle backslash character correctly

Appending the backslash character immediately causes issues when it should be
removed for escaped wildcards. The backslash should only be appended when it's not
escaping a wildcard character, or when it's itself escaped.

sandbox/plugins/analytics-backend-lucene/src/main/java/org/opensearch/be/lucene/serializers/WildcardQuerySerializer.java [45-48]

 case ESCAPE:
-    escaped = true;
-    result.append(c);
+    if (escaped) {
+        result.append(c);
+        escaped = false;
+    } else {
+        escaped = true;
+    }
     break;
Suggestion importance[1-10]: 9

__

Why: The current code incorrectly appends the backslash immediately, which causes it to be deleted later when processing escaped wildcards. The suggested fix properly handles both escaped backslashes (\\) and backslashes used for escaping wildcards, ensuring correct behavior.

High

Signed-off-by: Suresh N S <nssuresh@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 221ef48

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 221ef48: SUCCESS

@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.34%. Comparing base (debf6ed) to head (221ef48).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #21739      +/-   ##
============================================
+ Coverage     73.33%   73.34%   +0.01%     
+ Complexity    74996    74973      -23     
============================================
  Files          6012     6012              
  Lines        340934   340934              
  Branches      49076    49076              
============================================
+ Hits         250021   250075      +54     
+ Misses        71005    70969      -36     
+ Partials      19908    19890      -18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 221ef48

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 221ef48: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] SQL wildcard characters not converted to Lucene wildcards in WildcardQuerySerializer

1 participant