-
@Yohahaha I didn't know Spark allows this. Could you share a pointer to the documentation and some example queries? Assuming ignore-nulls is such a parameter, I agree with you that Velox needs to provide two separate function implementations.
-
Makes sense to me. Do we have an example of an aggregate function that needs access to session config?
-
I see Presto has similar |
-
Background Info
While adding the Spark first/last aggregate functions, @mbasmanova pointed out that scalar functions and aggregate functions are inconsistent in how they support configurable implementations; see #4578 (comment).
The Spark size function has the spark.sql.legacy.sizeOfNull config to control its behavior, and Velox provides a QueryConfig for it: https://facebookincubator.github.io/velox/configs.html#spark-legacy-size-of-null. See velox/velox/functions/sparksql/Size.cpp, lines 30 to 35 in e1305d4.
The Spark first aggregate function takes an ignoreNull parameter in SQL:
select first(expr, ignoreNull) from tmp
However, unlike scalar functions, Velox aggregate functions do not receive any config class to initialize themselves.
Solutions
For function parameters in SQL
The parser attaches parameters to each function call as that function's own members; they are not shared at the session or query level. Such functions should provide a separate implementation for each behavior. For example, given
select first(expr) as not_ignore_null_expr, first(expr, true) as ignore_null_expr from tmp
a downstream engine that uses Velox for acceleration should invoke first() and first_ignore_null() respectively.
For functions that depend on config
Some functions depend on static or session config to exhibit different behavior, so we should pass a config class to their initialize method. For example, the Spark size function depends on spark.sql.legacy.sizeOfNull, and the Presto date_add function depends on the session timezone.
However, we lack the ability to pass session config to aggregate functions. If someone wants to add a new aggregate function that depends on the session timezone, how could we implement it?
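To make the two cases concrete, here is a minimal standalone sketch, not Velox code. It assumes a factory-style registry in which every aggregate factory receives the session config. All names in it (SessionConfig, Aggregate, LegacySumAggregate, the example.legacy_null_as_minus_one key, makeRegistry, evaluate) are hypothetical and exist only for illustration; the real Velox registration API may look quite different.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// All names below are hypothetical; none of them are Velox APIs.
using SessionConfig = std::map<std::string, std::string>;
using Value = std::optional<int64_t>; // std::nullopt models SQL NULL

class Aggregate {
 public:
  virtual ~Aggregate() = default;
  virtual void add(const Value& value) = 0;
  virtual Value result() const = 0;
};

// Case 1: behavior is fixed by a SQL parameter, so each variant is
// registered under its own name (first vs. first_ignore_null).
class FirstAggregate : public Aggregate {
 public:
  explicit FirstAggregate(bool ignoreNull) : ignoreNull_(ignoreNull) {}
  void add(const Value& value) override {
    if (found_ || (ignoreNull_ && !value.has_value())) {
      return;
    }
    found_ = true;
    first_ = value;
  }
  Value result() const override { return first_; }

 private:
  const bool ignoreNull_;
  bool found_ = false;
  Value first_;
};

// Case 2: behavior is read from session config when the aggregate is
// created. The config key is invented for illustration.
const std::string kNullAsMinusOne = "example.legacy_null_as_minus_one";

class LegacySumAggregate : public Aggregate {
 public:
  explicit LegacySumAggregate(const SessionConfig& config) {
    auto it = config.find(kNullAsMinusOne);
    nullAsMinusOne_ = it != config.end() && it->second == "true";
  }
  void add(const Value& value) override {
    if (value.has_value()) {
      sum_ += *value;
    } else if (nullAsMinusOne_) {
      sum_ -= 1; // legacy mode: a NULL input counts as -1
    }
  }
  Value result() const override { return sum_; }

 private:
  bool nullAsMinusOne_ = false;
  int64_t sum_ = 0;
};

// Every factory receives the session config, even if it ignores it.
using AggregateFactory =
    std::function<std::unique_ptr<Aggregate>(const SessionConfig&)>;
using Registry = std::map<std::string, AggregateFactory>;

Registry makeRegistry() {
  Registry registry;
  registry["first"] = [](const SessionConfig&) {
    return std::make_unique<FirstAggregate>(false);
  };
  registry["first_ignore_null"] = [](const SessionConfig&) {
    return std::make_unique<FirstAggregate>(true);
  };
  registry["legacy_sum"] = [](const SessionConfig& config) {
    return std::make_unique<LegacySumAggregate>(config);
  };
  return registry;
}

// Creates the named aggregate with the given config and feeds it all rows.
Value evaluate(
    const Registry& registry,
    const std::string& name,
    const SessionConfig& config,
    const std::vector<Value>& rows) {
  auto it = registry.find(name);
  if (it == registry.end()) {
    throw std::invalid_argument("unknown aggregate: " + name);
  }
  auto aggregate = it->second(config);
  for (const auto& row : rows) {
    aggregate->add(row);
  }
  return aggregate->result();
}
```

With the rows NULL, 7, 9, this sketch returns NULL for first and 7 for first_ignore_null, while legacy_sum returns 16 with an empty config and 15 when the hypothetical key is set to "true". The point of the design is that the SQL parameter is resolved by the caller picking a function name, whereas the config-dependent behavior is resolved once, at construction time, from whatever config the engine passes in.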
CC @mbasmanova