[CALCITE-6640] RelMdUniqueKeys grows exponentially when key columns are repeated in projections #4013

zabetak · 2024-10-23T10:07:01Z

Avoid generating non-minimal/redundant key sets when computing the unique keys for columns that are repeated in the output.


caicancai

Overall LGTM

caicancai · 2024-10-23T14:31:48Z

core/src/main/java/org/apache/calcite/rel/metadata/RelMdUniqueKeys.java

-
-      resultBuilder.addAll(Util.transform(product, ImmutableBitSet::union));
+      // select key1, key1, val1, val2, key2 from ...
+      // the resulting unique keys would be {{0},{4}}, {{1},{4}}


Maybe this part of the javadoc can be improved a little bit. When I first read this part of the javadoc, I didn’t understand what {0},{4} meant.

For example:

// Select fields key1, key1, val1, val2, and key2
// The query results will return records with unique key combinations
// Example of unique key combinations:
// {{0}, {4}} indicates that the key1 value of the first record is 0, and the key2 value is 4
// {{1}, {4}} indicates that the key1 value of the second record is 1, and the key2 value is 4

I am afraid that the above suggestion is incorrect. The UniqueKeys metadata does not return record/row values but column ordinals.

To understand these comments, it is important to have read the Javadoc of the UniqueKeys interface first.
{0} refers to the column at position zero, i.e., the first occurrence of column key1 in the query.
{1} refers to the column at position one, i.e., the second occurrence of column key1 in the query.
...
{4} refers to the column at position four, i.e., column key2 in the query.

If this is not clear from the Javadoc of the UniqueKeys interface, then we should improve that part of the documentation and not these internal, low-level comments. This part assumes that the developer already understands what the result of this method is.

Thank you very much for your answer; I agree with you.
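To make the ordinal notation concrete, here is a minimal sketch of mapping an input unique key through a projection that repeats a key column. It is not Calcite's implementation: it uses plain java.util.BitSet instead of ImmutableBitSet, and the class ProjectUniqueKeysSketch, the method mapKey, and the int[] encoding of the projection (output position -> input ordinal) are hypothetical. It assumes the input's unique key is the composite {key1, key2} and reads the diff comment as: each combination such as {{0},{4}} (one output position per input key column, taken from the cartesian product) is unioned into the output key {0, 4}. Because every key picks exactly one occurrence of each input column, a redundant superset such as {0, 1, 4} is never produced; it would add nothing, since any superset of a unique key is itself unique.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class ProjectUniqueKeysSketch {

  /**
   * Hypothetical helper: for each column of the input key, collects the output
   * positions where that column is projected, takes the cartesian product of
   * those positions, and unions each combination into one output key.
   */
  static List<BitSet> mapKey(BitSet inputKey, int[] projection) {
    List<BitSet> result = new ArrayList<>();
    result.add(new BitSet()); // start with the empty combination
    for (int in = inputKey.nextSetBit(0); in >= 0; in = inputKey.nextSetBit(in + 1)) {
      List<BitSet> next = new ArrayList<>();
      for (BitSet partial : result) {
        for (int out = 0; out < projection.length; out++) {
          if (projection[out] == in) {
            // Extend the partial key with exactly one occurrence of column 'in'.
            BitSet extended = (BitSet) partial.clone();
            extended.set(out);
            next.add(extended);
          }
        }
      }
      result = next; // becomes empty if some key column is not projected
    }
    return result;
  }

  public static void main(String[] args) {
    // Assumed input: composite unique key {key1, key2} at input ordinals {0, 1};
    // val1 and val2 are at input ordinals 2 and 3.
    BitSet inputKey = new BitSet();
    inputKey.set(0);
    inputKey.set(1);
    // select key1, key1, val1, val2, key2 (output position -> input ordinal)
    int[] projection = {0, 0, 2, 3, 1};
    System.out.println(mapKey(inputKey, projection)); // prints [{0, 4}, {1, 4}]
  }
}
```

With this scheme the number of keys is the product of how many times each key column is repeated, rather than growing with every subset of the repeated positions.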

soumyakanti3578 · 2024-10-23T17:48:19Z

+1 on the request to improve documentation. Otherwise, this LGTM!


zabetak · 2024-10-29T14:24:39Z

@soumyakanti3578 Can you please elaborate a bit more about the doc improvements that you would like to see? Thanks.

sonarcloud · 2024-10-29T15:02:00Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

soumyakanti3578 · 2024-10-29T16:30:49Z

@zabetak It felt to me that the documentation explaining the unique key combinations is a bit difficult to understand. But I agree with your comment above that this is not the right place to add more detailed documentation. So please ignore my comments regarding the docs above. Thanks!

julianhyde

You've added a test for minimality but I would go further - make sure every value returned by any RelMdUniqueKeys provider is minimal. I think the execution cost will be (pardon the pun) minimal.

Consider adding a method static ImmutableBitSet areMinimal(Iterable<ImmutableBitSet>) (or List<ImmutableBitSet> if it's more efficient). I have a feeling that the current implementation makes N^2 tests but an implementation could make N * (N - 1) / 2 tests (less than half as many).
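The pairwise idea can be sketched as follows. This is a hypothetical helper, not an existing Calcite API: it uses java.util.BitSet, returns boolean (a predicate appears to be the intent behind the name areMinimal), and treats a list as minimal when no element contains another one, so duplicates count as non-minimal. Visiting only the pairs (i, j) with i < j gives n * (n - 1) / 2 comparisons, and each comparison needs a single intersection, because a ⊆ b exactly when a ∩ b equals a.

```java
import java.util.BitSet;
import java.util.List;

class MinimalitySketch {

  /** Builds a BitSet with the given bits set. */
  static BitSet of(int... bits) {
    BitSet set = new BitSet();
    for (int bit : bits) {
      set.set(bit);
    }
    return set;
  }

  /**
   * Hypothetical areMinimal: returns true if no set in the list is a subset of
   * another one. Each unordered pair is compared once, using one intersection.
   */
  static boolean areMinimal(List<BitSet> sets) {
    for (int i = 0; i < sets.size(); i++) {
      for (int j = i + 1; j < sets.size(); j++) {
        BitSet a = sets.get(i);
        BitSet b = sets.get(j);
        BitSet common = (BitSet) a.clone();
        common.and(b); // common = a ∩ b
        // a ⊆ b iff a ∩ b == a; b ⊆ a iff a ∩ b == b
        if (common.equals(a) || common.equals(b)) {
          return false;
        }
      }
    }
    return true;
  }
}
```

With this definition, [{0, 4}, {1, 4}] is minimal, while [{0, 4}, {0, 1, 4}] is not.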

julianhyde · 2024-10-29T17:38:42Z

core/src/main/java/org/apache/calcite/util/ImmutableBitSet.java

@@ -978,6 +978,18 @@ public static boolean allContain(Collection<ImmutableBitSet> bitSets, int bit) {
    return true;
  }

+  /**
+   * Returns whether this is a minimal set with respect to the specified collection of bitSets.
+   */


Can you give an example in the javadoc?

zabetak · 2024-10-30T14:52:50Z

@julianhyde Putting minimality logic on every return of the RelMdUniqueKeys handler doesn't feel right to me. Even if the overhead is minimal, why add seemingly redundant code?

Moreover, if the handler goes rogue and starts generating non-minimal keys somewhere, chances are that we will fail before even reaching the minimality check/filter.

I don't mind adding the checks/filters if you feel strongly about it, but I see more cons than pros in this approach.

For the record, we already have RelMdUniqueKeys#filterSupersets, which is currently used by the Aggregate handler to ensure that keys are minimal. If we decide to apply the filter in every other handler, then I guess we don't need another method in ImmutableBitSet and probably don't need the minimality check in the tests either.
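For comparison with a check, a filter that drops redundant keys could look like the sketch below. It only illustrates the idea: it is not the code of RelMdUniqueKeys#filterSupersets (whose exact signature and behavior are not shown in this thread), and the class FilterSupersetsSketch and its methods are hypothetical. It removes exact duplicates, then drops every key that is a proper superset of another key, so {0, 4}, {1, 4}, {0, 1, 4} becomes {0, 4}, {1, 4}.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.List;

class FilterSupersetsSketch {

  /** Builds a BitSet with the given bits set. */
  static BitSet of(int... bits) {
    BitSet set = new BitSet();
    for (int bit : bits) {
      set.set(bit);
    }
    return set;
  }

  /** Returns whether {@code sub} is a proper subset of {@code sup}. */
  static boolean isProperSubset(BitSet sub, BitSet sup) {
    BitSet common = (BitSet) sub.clone();
    common.and(sup);
    return common.equals(sub) && !sub.equals(sup);
  }

  /** Hypothetical filter: keeps only keys that do not strictly contain another key. */
  static List<BitSet> filterSupersets(List<BitSet> keys) {
    // Drop exact duplicates first, keeping first-seen order.
    List<BitSet> distinct = new ArrayList<>(new LinkedHashSet<>(keys));
    List<BitSet> result = new ArrayList<>();
    for (BitSet candidate : distinct) {
      boolean redundant = false;
      for (BitSet other : distinct) {
        if (isProperSubset(other, candidate)) {
          redundant = true; // a smaller key already guarantees uniqueness
          break;
        }
      }
      if (!redundant) {
        result.add(candidate);
      }
    }
    return result;
  }
}
```

The nested loop is quadratic in the number of distinct keys; a filter changes the handler's output, whereas a check such as the one in the tests only asserts a property of it.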

Commit b5e2fd2: [CALCITE-6640] RelMdUniqueKeys grows exponentially when key columns are repeated in projections

zabetak force-pushed the CALCITE-6640 branch from f0de345 to b5e2fd2 on October 23, 2024 12:44

caicancai approved these changes Oct 23, 2024


NobiGo approved these changes Oct 25, 2024


Commit 66ab17c: Add new ImmutableBitSet#isMinimal API and check unique keys minimality in tests

julianhyde requested changes Oct 29, 2024



zabetak commented Oct 23, 2024

caicancai left a comment

caicancai Oct 23, 2024

caicancai Oct 23, 2024

zabetak Oct 29, 2024

caicancai Oct 29, 2024

soumyakanti3578 commented Oct 23, 2024

zabetak commented Oct 29, 2024

sonarcloud bot commented Oct 29, 2024

soumyakanti3578 commented Oct 29, 2024

julianhyde left a comment

julianhyde Oct 29, 2024

zabetak commented Oct 30, 2024
