[CH] fallback for unsupported regex in re2 #7866

Open
exmy opened this issue Nov 8, 2024 · 4 comments · May be fixed by #7867
Labels
bug Something isn't working triage

Comments

@exmy
Contributor

exmy commented Nov 8, 2024

Backend

CH (ClickHouse)

Bug description

OptimizedRegularExpression: cannot compile re2: d(?!d), error: invalid perl operator: (?!. Look at https://github.com/google/re2/wiki/Syntax for reference.

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

@exmy exmy added bug Something isn't working triage labels Nov 8, 2024
@exmy exmy linked a pull request Nov 8, 2024 that will close this issue
@FelixYBW
Contributor

FelixYBW commented Nov 8, 2024

Does CH also use re2?

@PHILO-HE we may reuse the preprocessing script as well as the skip list.

@PHILO-HE
Contributor

> @PHILO-HE we may reuse the preprocessing script as well as the skip list.

@FelixYBW, we currently have no script, but there is a pre-validation function on the Gluten native side. It validates every listed regex function by letting RE2 try to compile the pattern. If compilation fails, the expression definitely needs to fall back. This pre-validation can be reused by both backends.
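
For readers unfamiliar with the idea, the compile-then-fall-back check described above can be sketched roughly as follows. This is only an illustration in Python, not Gluten's native code: `re2_like_compile` and `should_fall_back` are hypothetical names, and because RE2 is not in the Python standard library the stand-in simply rejects lookahead, lookbehind, and backreferences (constructs the RE2 syntax page lists as unsupported) before delegating to Python's `re`.

```python
import re

# Constructs that RE2 documents as unsupported: lookahead, lookbehind,
# and backreferences. Illustrative only, not an exhaustive list.
_UNSUPPORTED = re.compile(r"\(\?=|\(\?!|\(\?<=|\(\?<!|\\[1-9]")


def re2_like_compile(pattern: str):
    """Hypothetical stand-in for an RE2 compile call.

    Raises ValueError for the constructs above; otherwise compiles with
    Python's `re`, which is NOT RE2 and accepts a different syntax.
    The textual scan is naive (it ignores escaping and character classes),
    so it can produce false positives, which only cause extra fallbacks.
    """
    match = _UNSUPPORTED.search(pattern)
    if match:
        raise ValueError(f"unsupported construct: {match.group(0)}")
    return re.compile(pattern)


def should_fall_back(pattern: str, compile_fn=re2_like_compile) -> bool:
    """Return True when the pattern does not compile, i.e. the expression
    should not be offloaded and must fall back."""
    try:
        compile_fn(pattern)
    except (ValueError, re.error):
        return True
    return False
```

With this sketch, the pattern from the bug report, `d(?!d)`, is flagged for fallback, while a plain pattern such as `\d+` is not. In a real integration `compile_fn` would be the actual RE2 compile call.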

@FelixYBW
Contributor

> @FelixYBW, we currently have no script, but there is a pre-validation function on the Gluten native side. It validates every listed regex function by letting RE2 try to compile the pattern. If compilation fails, the expression definitely needs to fall back. This pre-validation can be reused by both backends.

I remember we had some preprocessing of the pattern before sending it to re2, which makes more patterns workable on re2. Is that code still there?

@PHILO-HE
Contributor

> I remember we had some preprocessing of the pattern before sending it to re2, which makes more patterns workable on re2. Is that code still there?

@FelixYBW, I just found that Velox has such preprocessing for Presto: see code. Meituan has also proposed a PR to reuse this code for Spark, with some improvements. See facebookincubator/velox#10981.
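
The thread does not spell out what the preprocessing rewrites. As one illustration of the general idea, and purely as an assumption, a pre-pass could translate Java-style named groups `(?<name>...)` into the `(?P<name>...)` form that RE2 accepts. The sketch below is Python, and `to_re2_named_groups` is a hypothetical helper, not the Velox function.

```python
import re

# Matches "(?<name>" but not the lookbehind forms "(?<=" and "(?<!",
# because the group name must start with a letter.
_JAVA_NAMED_GROUP = re.compile(r"\(\?<([A-Za-z][A-Za-z0-9]*)>")


def to_re2_named_groups(pattern: str) -> str:
    """Rewrite Java-style named groups to the (?P<name>...) spelling.

    Naive textual rewrite: it does not track escaping or character
    classes, so an escaped "\\(?<name>" would also be rewritten.
    """
    return _JAVA_NAMED_GROUP.sub(r"(?P<\1>", pattern)


rewritten = to_re2_named_groups(r"(?<year>\d{4})-(?<month>\d{2})")
```

Here `rewritten` holds `(?P<year>\d{4})-(?P<month>\d{2})`. Lookbehind patterns such as `(?<=a)b` pass through unchanged, so they would still need the compile check to trigger a fallback.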


3 participants