Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Type Coercion for UDF Arguments #14268

Closed
wants to merge 37 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
bcc0620
Fix DF 43 regression coerce ascii input to string
shehabgamin Jan 22, 2025
a8539c1
datafusion-testing submodule has new commits
shehabgamin Jan 22, 2025
4e4cb02
Merge branch 'main' of github.com:lakehq/datafusion into sail-df-43-r…
shehabgamin Jan 23, 2025
eb61d49
implicit cast int to string
shehabgamin Jan 24, 2025
a6f62a0
Merge branch 'main' of github.com:lakehq/datafusion into sail-df-43-r…
shehabgamin Jan 24, 2025
5afbfc0
fix can_coerce_to and add tests
shehabgamin Jan 24, 2025
36a23b7
update deprecation message for values exec
shehabgamin Jan 24, 2025
d2eadea
Merge branch 'main' of github.com:lakehq/datafusion into sail-df-43-r…
shehabgamin Jan 24, 2025
944e0a3
lint
shehabgamin Jan 24, 2025
2d77206
coerce to string
shehabgamin Jan 24, 2025
1a27626
Adjust type signature
shehabgamin Jan 24, 2025
e714ba1
fix comment
shehabgamin Jan 24, 2025
93d75b1
clean up clippy warnings
shehabgamin Jan 25, 2025
5067223
type signature coercible
shehabgamin Jan 25, 2025
4d395e2
bump pyo3
shehabgamin Jan 25, 2025
ec8ccd1
udf type coercion
shehabgamin Jan 25, 2025
d78877a
moving testing out of functions crate due to circular dependencies
shehabgamin Jan 25, 2025
041e4ac
Merge branch 'main' of github.com:lakehq/datafusion into sail-df-43-r…
shehabgamin Jan 26, 2025
437b83d
find the common string type for TypeSignature::Coercible
shehabgamin Jan 26, 2025
8fd9fb3
update coercible string
shehabgamin Jan 26, 2025
62b97c5
fix error msg and add tests for udfs
shehabgamin Jan 26, 2025
46350c9
update docs to note that args are coercible string
shehabgamin Jan 26, 2025
3f0c870
update expr test
shehabgamin Jan 26, 2025
46cda71
undo
shehabgamin Jan 27, 2025
e7d474d
Merge branch 'main' of github.com:lakehq/datafusion into sail-df-43-r…
shehabgamin Jan 27, 2025
1f826cb
remove test since already covered in slt
shehabgamin Jan 27, 2025
5d258b2
add dictionary to base_yupe
shehabgamin Jan 27, 2025
17df1bc
add dictionary to base_type
shehabgamin Jan 27, 2025
97c7db0
fix lint issues
shehabgamin Jan 28, 2025
d0f3f9a
Merge branch 'main' of github.com:lakehq/datafusion into sail-df-43-r…
shehabgamin Feb 1, 2025
bb773d5
implement TypeSignatureClass::Integer
shehabgamin Feb 1, 2025
08d9b4a
fix slt test
shehabgamin Feb 1, 2025
09ef47c
fix slt
shehabgamin Feb 1, 2025
f519d92
Merge branch 'branch-45' of github.com:lakehq/datafusion into sail-df…
shehabgamin Feb 4, 2025
1f806b2
revert native to original and add anynative
shehabgamin Feb 4, 2025
25bcb58
use anynative
shehabgamin Feb 4, 2025
1d94e5c
use user defined rules instead
shehabgamin Feb 4, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion datafusion/common/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@ log = { workspace = true }
object_store = { workspace = true, optional = true }
parquet = { workspace = true, optional = true, default-features = true }
paste = "1.0.15"
pyo3 = { version = "0.23.3", optional = true }
pyo3 = { version = "0.23.4", optional = true }
recursive = { workspace = true, optional = true }
sqlparser = { workspace = true }
tokio = { workspace = true }
Expand Down
6 changes: 6 additions & 0 deletions datafusion/common/src/types/native.rs
Original file line number Diff line number Diff line change
Expand Up @@ -436,6 +436,12 @@ impl NativeType {
)
}

#[inline]
pub fn is_binary(&self) -> bool {
use NativeType::*;
matches!(self, Binary | FixedSizeBinary(_))
}

#[inline]
pub fn is_timestamp(&self) -> bool {
matches!(self, NativeType::Timestamp(_, _))
Expand Down
1 change: 1 addition & 0 deletions datafusion/common/src/utils/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -588,6 +588,7 @@ pub fn base_type(data_type: &DataType) -> DataType {
DataType::List(field)
| DataType::LargeList(field)
| DataType::FixedSizeList(field, _) => base_type(field.data_type()),
DataType::Dictionary(_, value_type) => base_type(value_type),
_ => data_type.to_owned(),
}
}
Expand Down
25 changes: 17 additions & 8 deletions datafusion/expr-common/src/signature.rs
Original file line number Diff line number Diff line change
Expand Up @@ -126,9 +126,17 @@ pub enum TypeSignature {
Exact(Vec<DataType>),
/// One or more arguments belonging to the [`TypeSignatureClass`], in order.
///
/// For example, `Coercible(vec![logical_float64()])` accepts
/// arguments like `vec![Int32]` or `vec![Float32]`
/// since i32 and f32 can be cast to f64
/// `Coercible(vec![TypeSignatureClass::Native(...)])` is designed to cast between the same
/// logical type.
///
/// For example, `Coercible(vec![TypeSignatureClass::Native(logical_int64())])` accepts
/// arguments like `vec![Int32]` since i32 is the same logical type as i64.
///
/// Coercible(vec![TypeSignatureClass::Integer(...)])` accepts any
/// integer type (`NativeType::is_integer`) castable to the target integer `NativeType`.
///
/// For example, `Coercible(vec![TypeSignatureClass::Integer(logical_int64())])` accepts
/// arguments like `vec![Int32]` since i32 can be cast to i64.
///
/// For functions that take no arguments (e.g. `random()`) see [`TypeSignature::Nullary`].
Coercible(Vec<TypeSignatureClass>),
Expand Down Expand Up @@ -213,9 +221,8 @@ pub enum TypeSignatureClass {
Interval,
Duration,
Native(LogicalTypeRef),
// TODO:
// Numeric
// Integer
Numeric(LogicalTypeRef),
Integer(LogicalTypeRef),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shehabgamin I found that we might not need LogicalTypeRef.
This is designed to accept all the Integer, so if the given type is integer, we keep it as it is.

If we want specific integer type, then we should use Native instead. Does this makes sense to you?

Numeric is the same

}

impl Display for TypeSignatureClass {
Expand Down Expand Up @@ -376,7 +383,9 @@ impl TypeSignature {
TypeSignature::Coercible(types) => types
.iter()
.map(|logical_type| match logical_type {
TypeSignatureClass::Native(l) => get_data_types(l.native()),
TypeSignatureClass::Native(l)
| TypeSignatureClass::Numeric(l)
| TypeSignatureClass::Integer(l) => get_data_types(l.native()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_data_types is used only in get_possible_types

TypeSignatureClass::Timestamp => {
vec![
DataType::Timestamp(TimeUnit::Nanosecond, None),
Expand Down Expand Up @@ -498,7 +507,7 @@ impl Signature {
}
}

/// A specified number of numeric arguments
/// A specified number of string arguments
pub fn string(arg_count: usize, volatility: Volatility) -> Self {
Self {
type_signature: TypeSignature::String(arg_count),
Expand Down
Loading