-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Type Coercion for UDF Arguments #14268
Changes from 17 commits
bcc0620
a8539c1
4e4cb02
eb61d49
a6f62a0
5afbfc0
36a23b7
d2eadea
944e0a3
2d77206
1a27626
e714ba1
93d75b1
5067223
4d395e2
ec8ccd1
d78877a
041e4ac
437b83d
8fd9fb3
62b97c5
46350c9
3f0c870
46cda71
e7d474d
1f826cb
5d258b2
17df1bc
97c7db0
d0f3f9a
bb773d5
08d9b4a
09ef47c
f519d92
1f806b2
25bcb58
1d94e5c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,9 +20,12 @@ use arrow::datatypes::DataType; | |
use std::any::Any; | ||
|
||
use crate::utils::utf8_to_int_type; | ||
use datafusion_common::types::logical_string; | ||
use datafusion_common::{exec_err, Result, ScalarValue}; | ||
use datafusion_expr::{ColumnarValue, Documentation, Volatility}; | ||
use datafusion_expr::{ScalarUDFImpl, Signature}; | ||
use datafusion_expr::{ | ||
ColumnarValue, Documentation, ScalarUDFImpl, Signature, TypeSignatureClass, | ||
Volatility, | ||
}; | ||
use datafusion_macros::user_doc; | ||
|
||
#[user_doc( | ||
|
@@ -55,7 +58,10 @@ impl Default for BitLengthFunc { | |
impl BitLengthFunc { | ||
pub fn new() -> Self { | ||
Self { | ||
signature: Signature::string(1, Volatility::Immutable), | ||
signature: Signature::coercible( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. as a minor comment this new signature is quite a bit of a mouthful compared to It isn't clear to me from reading the code / comments when one would prefer to use one over the other (not related to this PR) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See here: But yes I agree, the documentation should be clearer. I had to dig into the code to get a solid understanding of the behavior. |
||
vec![TypeSignatureClass::Native(logical_string())], | ||
Volatility::Immutable, | ||
), | ||
} | ||
} | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,10 +21,12 @@ use arrow::compute::contains as arrow_contains; | |
use arrow::datatypes::DataType; | ||
use arrow::datatypes::DataType::{Boolean, LargeUtf8, Utf8, Utf8View}; | ||
use datafusion_common::exec_err; | ||
use datafusion_common::types::logical_string; | ||
use datafusion_common::DataFusionError; | ||
use datafusion_common::Result; | ||
use datafusion_expr::{ | ||
ColumnarValue, Documentation, ScalarUDFImpl, Signature, Volatility, | ||
ColumnarValue, Documentation, ScalarUDFImpl, Signature, TypeSignature, | ||
TypeSignatureClass, Volatility, | ||
}; | ||
use datafusion_macros::user_doc; | ||
use std::any::Any; | ||
|
@@ -59,7 +61,16 @@ impl Default for ContainsFunc { | |
impl ContainsFunc { | ||
pub fn new() -> Self { | ||
Self { | ||
signature: Signature::string(2, Volatility::Immutable), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jayzhan211 / @notfilippo should we deprecate There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually I'm quite confuse why we need to change to coercible(string, string) in this PR, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For |
||
signature: Signature::one_of( | ||
vec![ | ||
TypeSignature::String(2), | ||
TypeSignature::Coercible(vec![ | ||
TypeSignatureClass::Native(logical_string()), | ||
TypeSignatureClass::Native(logical_string()), | ||
]), | ||
], | ||
Volatility::Immutable, | ||
), | ||
} | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean others function that used Coercible String now also cast integer to string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's
TypeSignature::Coercible
with aTypeSignatureClass::Native(logical_string())
, then yes. Any function that specifiesTypeSignature::Coercible
with aTypeSignatureClass::Native
should coerce according to the behavior implemented in thedefault_cast_for
function forNativeType
in order to be consistent.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TypeSignature::Coercible
is designed to cast between the same logical type, but now it casts to any kind of type, I don't think this is ideal.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, in this PR,
SELECT bit_length(12);
now return 16 instead of error, but I think we should error. Any unexpected type is valid which doesn't seem correctThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what the doc comments say:
Link:
datafusion/datafusion/expr-common/src/signature.rs
Lines 127 to 134 in a4917d4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shehabgamin
I suggest with revert to the previous change and add binary->string casting given most of the string function allow this kind of casting, but not int->string like
ascii(2)
For Signature::String, we can make it a convenient wrapper of
Coercible(string, string)
or deprecate it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a rule of thumb, stick to either Postgres or DuckDB for functions, and avoid overly broad type casting that allows any type to convert to any other. This can turn invalid queries into valid ones, making errors harder to catch, especially when tests don’t cover all cases.
Based on that,
repeat
)