[CALCITE-6640] RelMdUniqueKeys grows exponentially when key columns are repeated in projections #4013
…re repeated in projections Avoid generating non-minimal/redundant key sets when computing the unique keys for columns that are repeated in the output.
Overall LGTM
resultBuilder.addAll(Util.transform(product, ImmutableBitSet::union));
// select key1, key1, val1, val2, key2 from ...
// the resulting unique keys would be {{0},{4}}, {{1},{4}}
Maybe this part of the javadoc can be improved a little bit. When I first read this part of the javadoc, I didn’t understand what {0},{4} meant.
for example,
// Select fields key1, key1, val1, val2, and key2
// The query results will return records with unique key combinations
// Example of unique key combinations:
// {{0}, {4}} indicates that the key1 value of the first record is 0, and the key2 value is 4
// {{1}, {4}} indicates that the key1 value of the second record is 1, and the key2 value is 4
I am afraid that the above suggestion is incorrect. The UniqueKeys metadata does not return record/row values but column ordinals.
To understand these comments it is important to have read the Javadoc of the UniqueKeys interface first.
- {0} refers to the column at position zero, i.e., the first occurrence of column key1 in the query.
- {1} refers to the column at position one, i.e., the second occurrence of column key1 in the query.
- ...
- {4} refers to the column at position four, i.e., column key2 in the query.
If this is not clear from the Javadoc of the UniqueKeys interface then we should improve that part of the documentation and not these internal low-level comments. The part here assumes that the developer understands what the result of this method is.
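To make the ordinal reading concrete, here is a minimal sketch that does not use Calcite at all. It uses plain `java.util.BitSet`, and the class `UniqueKeyOrdinals` and its methods `of` and `projectKey` are hypothetical names invented for illustration. It assumes that the input relation has a single unique key made of key1 and key2, which the comment under review does not state explicitly. The sketch only illustrates how picking one output position per key column yields the key sets mentioned in the comment; it is not the RelMdUniqueKeys implementation.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class UniqueKeyOrdinals {
  /** Builds a BitSet containing the given column ordinals. */
  static BitSet of(int... ordinals) {
    BitSet set = new BitSet();
    for (int ordinal : ordinals) {
      set.set(ordinal);
    }
    return set;
  }

  /**
   * Each array lists the output ordinals at which one key column appears.
   * Returns every way of choosing one output ordinal per key column,
   * i.e. the cartesian product with each combination unioned into one set.
   */
  static List<BitSet> projectKey(List<int[]> positionsPerKeyColumn) {
    List<BitSet> result = new ArrayList<>();
    result.add(new BitSet());
    for (int[] positions : positionsPerKeyColumn) {
      List<BitSet> next = new ArrayList<>();
      for (BitSet partial : result) {
        for (int position : positions) {
          BitSet extended = (BitSet) partial.clone();
          extended.set(position);
          next.add(extended);
        }
      }
      result = next;
    }
    return result;
  }

  public static void main(String[] args) {
    // select key1, key1, val1, val2, key2 from ...
    // Assumption: key1 is projected at ordinals 0 and 1, key2 at ordinal 4.
    List<BitSet> keys = projectKey(List.of(new int[] {0, 1}, new int[] {4}));
    System.out.println(keys); // prints [{0, 4}, {1, 4}]
  }
}
```

Each printed set is a set of output column positions, not a set of row values: `{0, 4}` reads as "the first occurrence of key1 together with key2 is unique".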
Thank you very much for your answer, I agree with you
+1 on the request to improve documentation. Otherwise, this LGTM!
@soumyakanti3578 Can you please elaborate a bit more about the doc improvements that you would like to see? Thanks.
@zabetak It felt to me that it is a bit difficult to understand the documentation explaining the unique key combinations. But I agree with your comment above that this is not the right place to add more detailed documentation. So please ignore my comments regarding doc above. Thanks!
You've added a test for minimality but I would go further - make sure every value returned by any RelMdUniqueKeys provider is minimal. I think the execution cost will be (pardon the pun) minimal.
Consider adding a method static ImmutableBitSet areMinimal(Iterable<ImmutableBitSet>) (or List<ImmutableBitSet> if it's more efficient). I have a feeling that the current implementation makes N^2 tests but an implementation could make N * (N - 1) / 2 tests (less than half as many).
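A rough sketch of the pairwise idea, again on plain `java.util.BitSet` rather than Calcite's ImmutableBitSet. The class `MinimalKeys` and the helpers `of`, `isSubset` and `areMinimal` are hypothetical; in particular, this `areMinimal` returns a boolean, which is an assumption and differs from the signature written in the comment above. The loop visits each unordered pair once, so it looks at N * (N - 1) / 2 pairs, although each pair still needs a containment check in both directions. Equal sets are treated as non-minimal here, which is also an assumption.

```java
import java.util.BitSet;
import java.util.List;

final class MinimalKeys {
  /** Builds a BitSet containing the given column ordinals. */
  static BitSet of(int... ordinals) {
    BitSet set = new BitSet();
    for (int ordinal : ordinals) {
      set.set(ordinal);
    }
    return set;
  }

  /** Returns true if every bit set in a is also set in b. */
  static boolean isSubset(BitSet a, BitSet b) {
    BitSet rest = (BitSet) a.clone();
    rest.andNot(b);
    return rest.isEmpty();
  }

  /**
   * Returns true if no key in the list contains (or equals) another key.
   * Visits each unordered pair (i, j) with i < j exactly once.
   */
  static boolean areMinimal(List<BitSet> keys) {
    for (int i = 0; i < keys.size(); i++) {
      for (int j = i + 1; j < keys.size(); j++) {
        BitSet a = keys.get(i);
        BitSet b = keys.get(j);
        if (isSubset(a, b) || isSubset(b, a)) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // {0,4} and {1,4}: neither contains the other.
    System.out.println(areMinimal(List.of(of(0, 4), of(1, 4)))); // prints true
    // {0} is contained in {0,4}, so {0,4} is redundant.
    System.out.println(areMinimal(List.of(of(0), of(0, 4)))); // prints false
  }
}
```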
@@ -978,6 +978,18 @@ public static boolean allContain(Collection<ImmutableBitSet> bitSets, int bit) {
return true;
}

/**
 * Returns whether this is a minimal set with respect to the specified collection of bitSets.
 */
Can you give an example in the javadoc?
@julianhyde Putting minimality logic on every return of the Moreover, if the handler goes rogue and starts to generate non-minimal keys at some place then chances are that we are going to fail before even arriving at the minimality check/filter. I don't mind adding the checks/filters if you feel strongly about it but I see more cons than pros in this approach. For the record, we already have RelMdUniqueKeys#filterSupersets that is currently used by the