Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-7866][CH] Fallback for unsupported regex in re2 #7867

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

exmy
Copy link
Contributor

@exmy exmy commented Nov 8, 2024

What changes were proposed in this pull request?

(Fixes: #7866)

How was this patch tested?

add ut

Copy link

github-actions bot commented Nov 8, 2024

#7866

Copy link

github-actions bot commented Nov 8, 2024

Run Gluten Clickhouse CI on x86

Copy link

github-actions bot commented Nov 8, 2024

Run Gluten Clickhouse CI on x86

@PHILO-HE
Copy link
Contributor

PHILO-HE commented Nov 8, 2024

@exmy, I'm not familiar with CH backend code. In velox backend, we have a validation function on native side (see below link). In its implementation, RE2 tries to compile literal regex pattern, and then some known unsupported or incompatible cases are filter out even though compiling can pass. Can this handling be a reference to you to fix your issues?

https://github.com/apache/incubator-gluten/blob/main/cpp/velox/utils/Common.cc#L55

@exmy
Copy link
Contributor Author

exmy commented Nov 8, 2024

@exmy, I'm not familiar with CH backend code. In velox backend, we have a validation function on native side (see below link). In its implementation, RE2 tries to compile literal regex pattern, and then some known unsupported or incompatible cases are filter out even though compiling can pass. Can this handling be a reference to you to fix your issues?

https://github.com/apache/incubator-gluten/blob/main/cpp/velox/utils/Common.cc#L55

Thanks for advice! It's indeed a better way to validate on native side, but ch backend currently does not support it. Maybe this way wil be adopted later as well.

BTW, I noticed it seems missing validate for split expression in https://github.com/apache/incubator-gluten/blob/main/cpp/velox/substrait/SubstraitToVeloxPlanValidator.cc#L56. The second parameter for split is also a regex string.

@PHILO-HE
Copy link
Contributor

PHILO-HE commented Nov 8, 2024

BTW, I noticed it seems missing validate for split expression in https://github.com/apache/incubator-gluten/blob/main/cpp/velox/substrait/SubstraitToVeloxPlanValidator.cc#L56. The second parameter for split is also a regex string.

@exmy, thanks for letting me know this! Currently, this function's implementation is not fully ready in Velox. I have created an issue to track this: #7872.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CH] fallback for unsupported regex in re2
2 participants