Support OpenSearch ip field type #1044
base: main
Conversation
Force-pushed from 58ef81b to 0174ca5.
Just wondering: are there any benefits to the UDT approach if we build functions on the IP type in the future?
flint-spark-integration/src/main/scala/org/apache/spark/sql/flint/datatype/FlintDataType.scala
If we expect very intensive comparisons on the ip address field, a UDT could perform better, since it can store the IP address data in a more optimized way (like a byte array) and doesn't need to parse the text every time.
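To make that idea concrete, here is a rough sketch of what such a UDT could look like. This is not code from the PR: the class, object, and package names are only illustrative, and it follows the same pattern as Spark's ExamplePointUDT mentioned later in this thread.

```scala
package org.apache.spark.sql.flint.datatype // UserDefinedType is private[spark], so a UDT
                                            // has to live under an org.apache.spark package
                                            // (as the Flint integration code already does)

import java.net.InetAddress

import org.apache.spark.sql.types.{BinaryType, DataType, UserDefinedType}

// Hypothetical user-facing class: the address is parsed once and kept as bytes
// (4 for IPv4, 16 for IPv6), so comparisons work on byte arrays instead of text.
class IPAddress(val bytes: Array[Byte]) extends Serializable

object IPAddress {
  // Parse once, e.g. while converting the OpenSearch document into a Spark row.
  def fromString(s: String): IPAddress =
    new IPAddress(InetAddress.getByName(s).getAddress)
}

// Hypothetical UDT backed by BinaryType; registering it with Spark
// (e.g. via UDTRegistration) is omitted here.
class IPAddressUDT extends UserDefinedType[IPAddress] {
  override def sqlType: DataType = BinaryType
  override def serialize(ip: IPAddress): Array[Byte] = ip.bytes
  override def deserialize(datum: Any): IPAddress = datum match {
    case b: Array[Byte] => new IPAddress(b)
  }
  override def userClass: Class[IPAddress] = classOf[IPAddress]
}
```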
val ip4 = "2001:db8::1234:5678"
checkAnswer(df, Seq(Row(ip0, ip2), Row(ip1, ip2), Row(ip3, ip4)))

df = spark.sql(
Does `=` push down to the DSL?
Yes. It is pushed down.
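To illustrate what that means in practice (the table and column names are made up, and the exact DSL shape is my assumption rather than something taken from this PR):

```scala
// Assumes an active SparkSession `spark` and an OpenSearch-backed table `my_index`
// whose `client_ip` column maps to an OpenSearch ip field (names are illustrative).
val filtered = spark.sql("SELECT client_ip FROM my_index WHERE client_ip = '192.168.1.1'")

// With the predicate pushed down, OpenSearch is expected to receive something like
//   {"query": {"term": {"client_ip": {"value": "192.168.1.1"}}}}
// instead of Spark scanning every document and filtering on its side.
```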
Further to this conversation, #3145 added support for an IP address data type in OpenSearch SQL (corresponding pull request). In OpenSearch SQL, the IP address data type:
For consistency between OpenSearch SQL and Spark, it probably makes sense to have the same behaviour in OpenSearch Spark, or at least to work towards it. I'm not particularly familiar with how UDTs work in Spark, but I assume this would be necessary to implement the above functionality? Note that there is also an existing …
As mentioned in my comment above, there is an existing …
Thank you very much for the extra context.
I did some more investigation into UDT and tried to see whether we can use it to realize functionality similar to the SQL plugin's. Refer to this commit for the implementation.
I'll check whether implicit type coercion is possible, but let me know your thoughts.
I love this UDT implementation. We will try UDT in Calcite too.
I found the implementation of the equality check, and we cannot override the behavior unless we make changes to the Spark repository. Equality is checked by generated code that is executed in the executors. The code generation is defined in CodeGenerator.genEqual.
If we look at …, I think it is better to store it as a String and provide matching functions for ip addresses if users need them. Could you give your opinion?
@ykmr1224 I see a UDT example like this: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/test/ExamplePointUDT.scala. Can the custom …
I initially referred to that example and thought we could override the equals method, but as I mentioned above, it didn't work, and I was able to identify the actual implementation where the override is not considered.
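For reference, my reading of that code path, paraphrased rather than quoted (the real method lives inside Spark's catalyst codegen, so check the Spark source for the exact branches), is that equality on a UDT column is delegated to the UDT's underlying sqlType, which is why a Scala-side equals override never runs:

```scala
// Paraphrased sketch of CodeGenerator.genEqual inside Spark (illustrative only,
// not code you would compile in the Flint project).
def genEqual(dataType: DataType, c1: String, c2: String): String = dataType match {
  case BinaryType              => s"java.util.Arrays.equals($c1, $c2)"
  case udt: UserDefinedType[_] => genEqual(udt.sqlType, c1, c2) // the user class's equals() is never consulted
  case _                       => s"$c1.equals($c2)"            // remaining cases elided/simplified
}
```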
Recap of the comparison between storing the ip field as a string and as a UDT: I prefer StringType, considering the simpler notation for conditions and for predicate push down. Query examples for the two approaches are sketched below.
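The concrete query examples from the original comment did not survive the formatting here, so the following is only my guess at the kind of difference being weighed; the table, column, and helper function names are invented for illustration.

```scala
// StringType: the predicate is ordinary SQL and can be pushed down as-is.
spark.sql("SELECT * FROM my_index WHERE client_ip = '192.168.1.1'")

// UDT: the string literal first has to be lifted into the UDT (shown here with a
// hypothetical cast_ip helper), and the data source needs extra logic to push it down.
spark.sql("SELECT * FROM my_index WHERE client_ip = cast_ip('192.168.1.1')")
```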
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Force-pushed from 0174ca5 to 9e46a80.
A couple of questions.
I suppose it is possible (I haven't been able to realize it yet), but we would need special logic to push this down.
The literal value would be handled in the function, and the function could ignore an invalid IP address or raise an error, depending on the requirements.
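As a rough sketch of what handling the literal inside a function could look like (the function name and error behaviour are placeholders, not something from this PR):

```scala
import java.net.{InetAddress, UnknownHostException}

// Hypothetical matching function over a StringType ip column: the literal is parsed
// inside the UDF, so an invalid address can be treated as a non-match or turned into
// an error. Simplified: no null handling, and getByName may try DNS for non-IP input.
spark.udf.register("ip_equals", (value: String, literal: String) => {
  try InetAddress.getByName(value) == InetAddress.getByName(literal)
  catch { case _: UnknownHostException => false } // or rethrow to fail the query
})

// spark.sql("SELECT * FROM my_index WHERE ip_equals(client_ip, '2001:db8::1')")
```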
I found that Spark doesn't support UDT in the CREATE TABLE statement.
Description
- Support OpenSearch ip field type.
- ip field to be stored as StringType.

Comparison of the possible approaches
String Type (StringType)
User Defined Type (UDT)
Binary Type (BinaryType)

Related Issues
#1034
Check List
Commits are signed per the DCO using --signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.