add json functions #3559

Open
wants to merge 55 commits into base: main

Conversation

Contributor

xinyual commented Apr 17, 2025

Description

This PR adds JSON-related functions to align with Spark.

The functions, their arguments, and their behavior are listed below.

| Function | Signature | Description | Return type | Implementation |
|---|---|---|---|---|
| json | `json(value: ANY)` | Returns the value if it is valid JSON; otherwise returns null. E.g. `json("123") = "123"`, `json(json_object("1", "2")) = {"1": "2"}`, `json(1) = null`. | The type of the first argument | Implemented by ourselves |
| json_valid | `json_valid(value: ANY)` | Returns true if the value is valid JSON; otherwise returns false. E.g. `json_valid("123") = true`, `json_valid(json_object("1", "2")) = true`, `json_valid(1) = false`. | boolean | Calcite `SqlStdOperatorTable.IS_JSON_VALUE` |
| json_object | `json_object(key1: STRING, value1: ANY, key2: STRING, value2: ANY, ...)` | Creates an object from key-value pairs. Keys must be strings. | string | Calcite `SqlStdOperatorTable.JSON_OBJECT` |
| json_array | `json_array(value1: ANY, value2: ANY, ...)` | Creates an array from the values. | string | Calcite `SqlStdOperatorTable.JSON_ARRAY` |
| json_array_length (aligned with Spark, since Splunk does not have this function) | `json_array_length(value: STRING)` | Parses the string as a JSON array and returns its size. Returns null if the string cannot be converted. | integer | Implemented by ourselves |
| json_extract | `json_extract(target: ANY, path1: STRING, path2: STRING, ...)` | Converts the target to JSON, then extracts values using the paths. With one path, returns the value; with several paths, returns the list of values. If a path matches nothing, null is returned as the result for that path. Paths use `{<index>}` for array indexes, e.g. for `demo = {"a": [{"b": 1}, {"b": 2}]}`, `json_extract(demo, "a{0}.b") = 1`. `{}` means `{*}`. | string for one path; otherwise a list of strings | Calcite `JsonFunctions::json_value`, with the path modified to align with Splunk |
| json_delete | `json_delete(target: ANY, path1: STRING, path2: STRING, ...)` | Converts the target to JSON, then deletes the values at the paths and returns the resulting object. A path that matches nothing is skipped. | string; aligned with Splunk | Calcite `JsonFunctions::json_remove`, with the path modified to align with Splunk |
| json_set | `json_set(target: ANY, path1: STRING, value1: ANY, path2: STRING, value2: ANY, ...)` | Converts the target to JSON, then sets each value at its path. A value is set only when the parent node of the target path is an object; otherwise the pair is skipped. E.g. for `demo = {"a": [{"b": 1}, {"b": 2}]}`, `json_set(demo, "a{0}.b", 3) = {"a": [{"b": 3}, {"b": 2}]}` and `json_set(demo, "a{0}.b.d", 3) = {"a": [{"b": 1}, {"b": 2}]}` (unchanged, because `a{0}.b` is not an object). | string; aligned with Splunk | Calcite `JsonFunctions::json_set`, with the path modified to align with Splunk |
| json_append | `json_append(target: ANY, path1: STRING, value1: ANY, path2: STRING, value2: ANY, ...)` | Converts the target to JSON, then appends each value at its path if the path points to an array; otherwise the pair is skipped. E.g. for `demo = {"a": [{"b": 1}, {"b": 2}]}`, `json_append(demo, "a", 3) = {"a": [{"b": 1}, {"b": 2}, 3]}` and `json_append(demo, "a{0}.b", 3) = {"a": [{"b": 1}, {"b": 2}]}`. | string; aligned with Splunk | Calcite `JsonFunctions::json_insert`, with the path modified to align with Splunk |
| json_extend | `json_extend(target: ANY, path1: STRING, value1: ANY, path2: STRING, value2: ANY, ...)` | Converts the target to JSON, then extends the array at each path with the value if the path points to an array; otherwise the pair is skipped. E.g. for `demo = {"a": [{"b": 1}, {"b": 2}]}`, `json_extend(demo, "a", json_array(1, 2)) = {"a": [{"b": 1}, {"b": 2}, 1, 2]}` and `json_extend(demo, "a{0}.b", 3) = {"a": [{"b": 1}, {"b": 2}]}`. | string; aligned with Splunk | Calcite `JsonFunctions::json_insert`, with the path modified to align with Splunk |
| json_keys | `json_keys(target: ANY)` | Converts the target to a JSON object and returns its keys; returns null if the target is not an object. | string | Calcite `JsonFunctions.jsonKeys` |
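
To make the skip rules for json_set, json_append, and json_extend concrete, here is a minimal Java sketch. It is not the PR's implementation: it models already-parsed JSON with `java.util.Map` (object) and `java.util.List` (array), and it operates on a node that has already been resolved from a path. The class and method names (`JsonMutationSketch`, `set`, `append`, `extend`) are hypothetical. How json_extend treats a non-array value is not stated in the description, so the sketch assumes it is added as a single element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: parsed JSON is modeled as Map (object) and List (array).
final class JsonMutationSketch {

    // json_set rule: write the value only when the parent node is an object.
    @SuppressWarnings("unchecked")
    static boolean set(Object parent, String key, Object value) {
        if (!(parent instanceof Map)) {
            return false; // parent is not an object: skip this pair
        }
        ((Map<String, Object>) parent).put(key, value);
        return true;
    }

    // json_append rule: add the value only when the node is an array.
    @SuppressWarnings("unchecked")
    static boolean append(Object node, Object value) {
        if (!(node instanceof List)) {
            return false; // path does not point to an array: skip this pair
        }
        ((List<Object>) node).add(value);
        return true;
    }

    // json_extend rule: add every element of an array value to the target array.
    @SuppressWarnings("unchecked")
    static boolean extend(Object node, Object value) {
        if (!(node instanceof List)) {
            return false; // path does not point to an array: skip this pair
        }
        List<Object> target = (List<Object>) node;
        if (value instanceof List) {
            target.addAll((List<Object>) value);
        } else {
            target.add(value); // assumption: a non-array value is added as one element
        }
        return true;
    }

    public static void main(String[] args) {
        List<Object> a = new ArrayList<Object>(Arrays.asList(1, 2));
        append(a, 3);
        extend(a, Arrays.asList(4, 5));
        System.out.println(a);

        Map<String, Object> first = new LinkedHashMap<String, Object>();
        first.put("b", 1);
        System.out.println(set(first, "b", 3));           // parent is an object
        System.out.println(set(first.get("b"), "d", 3));  // parent is a number, so skipped
    }
}
```

The boolean return value exists only so the sketch can show when a pair is skipped; the functions in the table return the resulting JSON string instead.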

Related Issues

Resolves #3565

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

xinyual added 5 commits April 17, 2025 15:02
Signed-off-by: xinyual <[email protected]>
Member

LantaoJin commented Apr 22, 2025

Why didn't we leverage Calcite's built-in JSON functions, for instance SqlStdOperatorTable.JSON_ARRAY?

Contributor Author

xinyual commented Apr 22, 2025

> Why didn't we leverage Calcite's built-in JSON functions, for instance SqlStdOperatorTable.JSON_ARRAY?

I have listed the functions above and tried to reuse Calcite code.

Signed-off-by: xinyual <[email protected]>
import org.opensearch.sql.ppl.JsonFunctionsIT;

@Ignore
Collaborator

Why is this IT ignored?

Contributor Author

The previous V2 json function returned the undefined type, whereas we now return a string.


Usage: `json_array_length(value)` parse the string to json array and return size, if can't be parsed, return null

Argument type: value: STRING
Collaborator

Argument type: A JSON array

Contributor Author

Already updated.

Description
>>>>>>>>>>>

Usage: `json_array_length(value)` parse the string to json array and return size, if can't be parsed, return null
Collaborator

NULL is returned for any other valid JSON string, for NULL, and for invalid JSON.

Contributor Author

Already updated.

Collaborator

Update the doc:

`json_array_length(value)` parses the string as a JSON array and returns its size; null is returned for any other valid JSON string, for null, or for invalid JSON.

Comment on lines +707 to +711
| JSON_KEYS
| JSON_SET
| JSON_DELETE
| JSON_APPEND
| JSON_EXTEND
Collaborator

Add docs for these functions.

Contributor Author

Already updated.

Description
>>>>>>>>>>>

Usage: `json_extract(json_string, path1, path2, ...)` it first transfer json_string to json, then extract value using paths. If only one path, return the value, otherwise, return the list of values. If one path cannot find value, return null as the result for this path. The path use "{<index>}" to represent index for array, "{}" means "{*}".
Collaborator

Current text: "it first transfer json_string to json, then extract value using paths. If only one path, return the value, otherwise, return the list of values."

Suggested text: "Extracts values using the specified JSON paths. If only one path is provided, it returns a single value. If multiple paths are provided, it returns a JSON array in the order of the paths."

Contributor Author

Already updated.
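
The return-shape rule discussed in this thread (one path gives a single value, several paths give a list in path order, with null for a path that matches nothing) can be sketched in Java as follows. This is an illustration, not the PR's code: path resolution is replaced by a plain lookup function, and the names `JsonExtractShapeSketch` and `extract` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: shows the result shape, not how paths are evaluated.
final class JsonExtractShapeSketch {

    // lookup stands in for path evaluation and returns null when a path matches nothing.
    static Object extract(Function<String, Object> lookup, String... paths) {
        if (paths.length == 1) {
            return lookup.apply(paths[0]); // single path: return the value itself
        }
        List<Object> results = new ArrayList<Object>();
        for (String path : paths) {
            results.add(lookup.apply(path)); // a null entry marks a path with no match
        }
        return results; // several paths: one entry per path, in path order
    }

    public static void main(String[] args) {
        // Pre-resolved values for demo = {"a": [{"b": 1}, {"b": 2}]}
        Map<String, Object> resolved = new HashMap<String, Object>();
        resolved.put("a{0}.b", 1);
        resolved.put("a{1}.b", 2);

        System.out.println(extract(resolved::get, "a{0}.b"));
        System.out.println(extract(resolved::get, "a{0}.b", "missing", "a{1}.b"));
    }
}
```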


Example::

> source=json_test | eval extract = json_extract('{\"a\": [{\"b\": 1}, {\"b\": 2}]}', 'a{}.b') | head 1 | fields extract
Collaborator

The \" escape is not necessary?

Contributor Author

Yes, I updated it.

+---------------------------------+
| test_json_array |
|---------------------------------|
| [[1,2],[{\"b\": 1}, {\"b\": 2}]]|
Collaborator

[[1,2],[{\"b\": 1}, {\"b\": 2}]] is not a valid JSON array.

Contributor Author

I updated it. Please check the doc.

Signed-off-by: xinyual <[email protected]>
Description
>>>>>>>>>>>

Usage: `json_extract(json_string, path1, path2, ...)` Extracts values using the specified JSON paths. If only one path is provided, it returns a single value. If multiple paths are provided, it returns a JSON Array in the order of the paths. If one path cannot find value, return null as the result for this path. The path use "{<index>}" to represent index for array, "{}" means "{*}".
Collaborator

Is this JSON path syntax standard? Could you add a section to explain it?

Contributor Author

It is not standard JSON path syntax. I will add a section to this doc to describe it.

Contributor Author

I have added the section at the top of the doc. Please review it.
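
For readers comparing the two notations, here is a hedged Java sketch of how a Splunk-style path such as `a{0}.b` could be rewritten into conventional JSONPath notation (`$.a[0].b`). The PR only says the path is modified to align with Splunk and does not show the conversion, so the class `SplunkPathSketch`, the method `toJsonPath`, and the exact target notation are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: rewrites {<index>}, {} and {*} into bracket notation.
final class SplunkPathSketch {

    // Matches "{}", "{*}", or "{<digits>}".
    private static final Pattern INDEX = Pattern.compile("\\{(\\*|\\d*)\\}");

    static String toJsonPath(String path) {
        Matcher matcher = INDEX.matcher(path);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String index = matcher.group(1);
            // "{}" means "{*}", so an empty index becomes the wildcard.
            String bracket = "[" + (index.isEmpty() ? "*" : index) + "]";
            matcher.appendReplacement(out, Matcher.quoteReplacement(bracket));
        }
        matcher.appendTail(out);
        String body = out.toString();
        if (body.isEmpty()) {
            return "$"; // empty path: the root itself
        }
        // A path that starts with an index attaches directly to the root.
        return body.startsWith("[") ? "$" + body : "$." + body;
    }

    public static void main(String[] args) {
        System.out.println(toJsonPath("a{0}.b")); // $.a[0].b
        System.out.println(toJsonPath("a{}.b"));  // $.a[*].b
    }
}
```

Keys that themselves contain braces or dots are not handled; the sketch only covers the path forms that appear in this conversation.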


Example::

> source=json_test | eval delete = json_delete('{"a": [{"b": 1}, {"b": 2}]}', 'a{0}.b') | head 1 | fields delete
Collaborator

Shouldn't json_delete('{"a": [{"b": 1}, {"b": 2}]}', 'a{0}.b') delete {"b": 1} and keep {"b": 2}?

Contributor Author

Yes, I wrote the wrong result. It is fixed now.

| {"a": [{"b": 3}]} |
+-------------------------+

> source=json_test | eval jsonSet = json_set('{"a": [{"b": 1}, {"b": 2}]}', 'a{0}.b', 3, 'a{1}.b', 4) | head 1 | fields jsonSet
Collaborator

'a{1}.b', 4 should change {"b": 2} to {"b": 4}.

Contributor Author

I wrote the wrong result. It is fixed now.

| {"a": [{"b": 1}, 3]} |
+-------------------------+

> source=json_test | eval jsonAppend = json_append('{"a": [{"b": 1}, {"b": 2}]}', 'a{0}.b', 3, 'a{1}.b', 4) | head 1 | fields jsonAppend
Collaborator

a{0}.b is not an array, so shouldn't the append be skipped?

Contributor Author

Yes, I wrote the wrong query. It is fixed now.

| {"a": [{"b": 1}, 3]} |
+-------------------------+

> source=json_test | eval jsonExtend = json_extend('{"a": [{"b": 1}, {"b": 2}]}', 'a{0}.b', 3, 'a{1}.b', 4) | head 1 | fields jsonExtend
Collaborator

a{0}.b is not an array, so shouldn't the append be skipped?

Contributor Author

Yes, I wrote the wrong query. It is fixed now.


Example::

> source=json_test | eval jsonKeys = json_keys('{"a": 1, "b": 2}') | head 1 | fields jsonKeys
Collaborator

What if the JSON object is nested, e.g. {"a": {"b": 1}, "b": 1}?

Contributor Author

Only the top-level keys are returned, so the result is still ["a", "b"] in this case. I updated the doc and added this case.
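
The top-level-only behavior described in this reply can be illustrated with a small Java sketch. It is not the PR's implementation (the description says json_keys uses Calcite's `JsonFunctions.jsonKeys`); it models parsed JSON with `java.util.Map` and `java.util.List`, and the names `JsonKeysSketch` and `topLevelKeys` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: parsed JSON is modeled as Map (object) and List (array).
final class JsonKeysSketch {

    // Returns the keys of the outermost object, or null when the input is not an object.
    static List<String> topLevelKeys(Object parsed) {
        if (!(parsed instanceof Map)) {
            return null; // arrays, scalars, and null have no keys
        }
        List<String> keys = new ArrayList<String>();
        for (Object key : ((Map<?, ?>) parsed).keySet()) {
            keys.add(String.valueOf(key)); // nested objects are not descended into
        }
        return keys;
    }

    public static void main(String[] args) {
        // Models {"a": {"b": 1}, "b": 1}
        Map<String, Object> inner = new LinkedHashMap<String, Object>();
        inner.put("b", 1);
        Map<String, Object> outer = new LinkedHashMap<String, Object>();
        outer.put("a", inner);
        outer.put("b", 1);

        System.out.println(topLevelKeys(outer)); // [a, b]
    }
}
```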

Description
>>>>>>>>>>>

Usage: `json_keys(json_string)` Return the key list of the json_string as a string if it's an object json string. Otherwise, return null.
Collaborator

"Return the key list of the json_string as a string if it's an object json string." -> "Return the key list of the JSON object as a JSON array."

Contributor Author

Done.

xinyual added 10 commits May 28, 2025 15:08
Signed-off-by: xinyual <[email protected]>
----------

Description
>>>>>>>>>>>
Member

Please add Version and Limitation sections for all newly added functions.
Limitation: Only works when plugins.calcite.enabled=true

Contributor Author

Done.

penghuo previously approved these changes Jun 4, 2025
LantaoJin
Member

The CI failure CalcitePPLAppendcolTest > ... FAILED is not related to this change.

xinyual force-pushed the addJsonFunctions branch from 03931e1 to 2a0bad8 on June 6, 2025 02:03
Contributor Author

xinyual commented Jun 6, 2025

Force-pushed to revert unneeded changes and resolve DCO problems.

xinyual added 3 commits June 6, 2025 10:05
Signed-off-by: xinyual <[email protected]>
Labels
calcite (calcite migration related)
3 participants