Normalize SQL IN(?, ?, ...) statements to "in(?)" to reduce cardinality of db.statement attribute #10564
@@ -5,6 +5,8 @@
 
 package io.opentelemetry.instrumentation.api.incubator.semconv.db;
 
+import java.util.regex.Pattern;
+
 %%
 
 %final
@@ -52,6 +54,9 @@ WHITESPACE = [ \t\r\n]+
 // max length of the sanitized statement - SQLs longer than this will be trimmed
 static final int LIMIT = 32 * 1024;
 
+private static final Pattern IN_STATEMENT_PATTERN = Pattern.compile("\\sin\\s*\\(\\s*\\?[\\s?,]*?\\)", Pattern.CASE_INSENSITIVE);
+private static final String IN_STATEMENT_NORMALIZED = " in(?)";
Sanitizer does not change case or remove whitespace from the original query. Let's keep [...]
Got it, I updated it to preserve case and whitespace, and updated the test cases to check for that. I didn't add a test case for more than one space between [...]
Also switched to a non-capturing group for matching the part between the brackets, as a small optimization.
+
 private final StringBuilder builder = new StringBuilder();
 
 private void appendCurrentFragment() {
@@ -278,7 +283,8 @@ WHITESPACE = [ \t\r\n]+
 builder.delete(LIMIT, builder.length());
 }
 String fullStatement = builder.toString();
-return operation.getResult(fullStatement);
+String normalizedStatement = IN_STATEMENT_PATTERN.matcher(fullStatement).replaceAll(IN_STATEMENT_NORMALIZED);
+return operation.getResult(normalizedStatement);
 }
 
 %}
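As a rough illustration of what the change in this diff does to span attributes, the sketch below applies the same pattern and replacement string to a few already-sanitized statements that differ only in the number of placeholders. The class name `InListCardinalityDemo`, the `normalize` helper, and the sample queries are hypothetical and not part of the PR; only the regular expression and the `" in(?)"` replacement are taken from the diff as shown here.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical demo class; only the two constants are copied from the diff.
final class InListCardinalityDemo {
  private static final Pattern IN_STATEMENT_PATTERN =
      Pattern.compile("\\sin\\s*\\(\\s*\\?[\\s?,]*?\\)", Pattern.CASE_INSENSITIVE);
  private static final String IN_STATEMENT_NORMALIZED = " in(?)";

  // Collapses every "IN (?, ?, ...)" list in an already-sanitized statement.
  static String normalize(String sanitizedStatement) {
    return IN_STATEMENT_PATTERN.matcher(sanitizedStatement).replaceAll(IN_STATEMENT_NORMALIZED);
  }

  public static void main(String[] args) {
    List<String> statements = Arrays.asList(
        "SELECT * FROM users WHERE id IN (?)",
        "SELECT * FROM users WHERE id IN (?, ?)",
        "SELECT * FROM users WHERE id IN (?, ?, ?, ?)");

    // Three distinct inputs collapse to a single db.statement value.
    Set<String> distinct = new LinkedHashSet<>();
    for (String statement : statements) {
      distinct.add(normalize(statement));
    }
    System.out.println(distinct); // [SELECT * FROM users WHERE id in(?)]
  }
}
```

Note that this version rewrites `IN (` to ` in(`, which is the case and whitespace change raised in the review comment on these lines.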
I think this also matches inputs like in (?,,,???). Perhaps using "(\\sin\\s*)\\(\\s*\\?\\s*(,\\s*\\?\\s*)*\\)" and replacing with "$1(?)" would be better. These regular expressions are hard to parse; maybe we should document them to make them easier to understand?
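The difference between the two patterns in this comment can be checked with a small sketch. The class and method names below are hypothetical, the sample queries are made up, and the `CASE_INSENSITIVE` flag on the suggested pattern is an assumption carried over from the diff; the two regular expressions and replacement strings are quoted from the PR.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of the pattern in the diff with the reviewer's suggestion.
final class InListStrictnessDemo {
  // From the diff: after the first '?', any mix of whitespace, '?' and ',' is accepted.
  static final Pattern LENIENT =
      Pattern.compile("\\sin\\s*\\(\\s*\\?[\\s?,]*?\\)", Pattern.CASE_INSENSITIVE);

  // From the review comment: only a comma-separated list of '?' is accepted,
  // and group 1 keeps the original " IN " text so case and spacing survive.
  static final Pattern STRICT =
      Pattern.compile("(\\sin\\s*)\\(\\s*\\?\\s*(,\\s*\\?\\s*)*\\)", Pattern.CASE_INSENSITIVE);

  static String normalizeLenient(String statement) {
    return LENIENT.matcher(statement).replaceAll(" in(?)");
  }

  static String normalizeStrict(String statement) {
    return STRICT.matcher(statement).replaceAll("$1(?)");
  }

  public static void main(String[] args) {
    String malformed = "SELECT * FROM t WHERE id in (?,,,???)";
    String wellFormed = "SELECT * FROM t WHERE id IN  (?, ?, ?)";

    System.out.println(normalizeLenient(malformed));  // SELECT * FROM t WHERE id in(?)
    System.out.println(normalizeStrict(malformed));   // unchanged: SELECT * FROM t WHERE id in (?,,,???)
    System.out.println(normalizeLenient(wellFormed)); // SELECT * FROM t WHERE id in(?)
    System.out.println(normalizeStrict(wellFormed));  // SELECT * FROM t WHERE id IN  (?)
  }
}
```

The strict variant leaves malformed lists untouched, so they stay visible in the attribute, at the cost of a repeated group in the pattern.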
Unfortunately, the (,\\s*\\?\\s*)* part of that pattern causes a stack overflow for IN statements with many values.
Let me know if matching invalid syntax like in (?,,,???) is OK, since it would hide information helpful for debugging bad queries. I'd guess that information is available in most SQL library stack traces, though.
Also, added some documentation.
Using a possessive quantifier should fix this. Try "(\\sin\\s*)\\(\\s*\\?\\s*(,\\s*\\?\\s*)*+\\)"
Thanks, that solves the problem.
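A minimal sketch of the possessive-quantifier variant from this thread, run against a long placeholder list. The class name, helper methods, and the choice of 10,000 placeholders are hypothetical, and the `CASE_INSENSITIVE` flag is assumed from the earlier diff; the pattern string and the `"$1(?)"` replacement are the ones quoted in the comments. The `*+` quantifier does not keep backtracking state for each repetition of the group, which is why it avoids the deep recursion that the plain `*` version can trigger in `java.util.regex` on long lists.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the possessive pattern suggested in the review thread.
final class PossessiveInListDemo {
  // "(...)*+" repeats the ", ?" group possessively: once consumed, repetitions are never given back.
  static final Pattern IN_LIST =
      Pattern.compile("(\\sin\\s*)\\(\\s*\\?\\s*(,\\s*\\?\\s*)*+\\)", Pattern.CASE_INSENSITIVE);

  static String normalize(String statement) {
    // "$1" re-emits the original " IN " text, preserving case and whitespace.
    return IN_LIST.matcher(statement).replaceAll("$1(?)");
  }

  // Builds "SELECT * FROM t WHERE id IN (?, ?, ..., ?)" with the given number of placeholders.
  static String statementWithPlaceholders(int count) {
    StringBuilder sb = new StringBuilder("SELECT * FROM t WHERE id IN (?");
    for (int i = 1; i < count; i++) {
      sb.append(", ?");
    }
    return sb.append(")").toString();
  }

  public static void main(String[] args) {
    String longStatement = statementWithPlaceholders(10_000);
    System.out.println(normalize(longStatement)); // SELECT * FROM t WHERE id IN (?)
  }
}
```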