[FEATURE] Machine readable PPL function documentation. #1065
Comments
@LantaoJin can you please take a look at this?
We can leverage an LLM to create an initial version based on the current markdown docs, but how do we ensure all docs can be auto-converted to JSON format in the future? Asking developers to add a JSON doc manually seems very tricky. How about well-formatting our existing markdown docs and leveraging a markdown-to-JSON tool (for example https://github.com/njvack/markdown-to-json) as a long-term solution?
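To make the markdown-to-JSON idea concrete, here is a minimal stdlib-only sketch of that kind of extraction. The section headings ("Description", "Syntax") and the overall markdown layout are assumptions for illustration, not the repo's actual doc format:

```python
import json
import re

def parse_function_doc(markdown: str) -> dict:
    """Extract a rough function-doc structure from a markdown section.

    A regex-based sketch; a real pipeline would use a proper markdown
    parser and the project's agreed-upon doc template.
    """
    doc = {"name": None, "description": "", "syntax": ""}
    # Treat the first level-2 heading as the function name.
    m = re.search(r"^## (\w+)", markdown, re.M)
    if m:
        doc["name"] = m.group(1)
    # Grab the text under each assumed section heading.
    for key, heading in (("description", "Description"), ("syntax", "Syntax")):
        m = re.search(rf"^### {heading}\n+(.+?)(?=\n###|\Z)",
                      markdown, re.M | re.S)
        if m:
            doc[key] = m.group(1).strip()
    return doc

sample = """## abs

### Description
Returns the absolute value of a number.

### Syntax
abs(NUM)
"""
print(json.dumps(parse_function_doc(sample), indent=2))
```

The point is that the conversion only stays automatable if every doc follows one strict template, which is exactly the constraint being discussed here.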
However, I'm also open to providing just a one-time conversion, because in the short term there probably won't be many new commands and functions added to this repository. But without automated conversion, our developers would need to be particularly careful to manually update the JSON files whenever they change any existing commands and functions.
We need to define what format the function docs should take. Here are some examples:
Is the Apache Spark documentation machine-readable? If yes, then I am leaning toward using the Apache Spark format to have consistent documentation for both SQL and PPL.
The Spark functions doc is generated from the Scaladoc in this file: https://github.com/apache/spark/blob/0184c5bf6670e5bde0f79b2ce64319fce813704f/sql/api/src/main/scala/org/apache/spark/sql/functions.scala Should we follow the same approach?
I do not think the Apache Spark docs have the accurate information we need. We are currently working on autosuggest for all PPL functions. We just need some format from which we can extract these details. What's required to create the expected JSON format? Otherwise, please suggest a better way to do it.
Since our existing documentation is rST, I'm inclined toward a system that can already handle rST well. Sphinx is widely used and I've worked with it before. (Spark's docs linked above also use Sphinx, for the record.) It most commonly outputs HTML, but we can configure it to output doctree XML instead, which should be machine-readable. If we need a specific output format, we can also implement a custom Builder. This should limit the migration work from our existing system.
For a more long-term solution, we should move our function documentation into Javadoc comments. This is easier to maintain since the documentation lives in the same place as its function. Javadoc already has a tag system for input and return parameters, and the input and return data types are attached to the method signatures, which guarantees they stay up to date.
In either case, with Sphinx we can separately produce documentation for the OpenSearch site, for autosuggestion, and for any other consumers, just by changing the output configuration. Sphinx can also validate its input, so we can check the format directly in CI.
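For a sense of what consuming the Sphinx route looks like: `sphinx-build -b xml` serializes each rST file as doctree XML, which downstream tooling can walk with any XML parser. The element names below (`section`, `title`, `paragraph`) are standard docutils nodes, but the exact document shape is an assumption for this sketch:

```python
import xml.etree.ElementTree as ET

# A trimmed doctree-XML fragment of the kind `sphinx-build -b xml`
# emits for an rST source file; real output carries many more
# attributes and nodes.
doctree = """<document source="functions.rst">
  <section ids="abs">
    <title>abs</title>
    <paragraph>Returns the absolute value of a number.</paragraph>
  </section>
</document>"""

root = ET.fromstring(doctree)
# Map each documented function (section title) to its first paragraph.
functions = {
    sec.findtext("title"): sec.findtext("paragraph")
    for sec in root.iter("section")
}
print(functions)
```

A custom Sphinx Builder could emit the target JSON directly instead, skipping this intermediate XML step.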
This would be the expected JSON:
That format relies heavily on copying unprocessed strings. For a more type-safe format that we can validate at build time, a custom builder would probably help more.
So, would it be acceptable if we use the doctree output to produce this JSON? If we are settled on it, can you please share the expected doctree format, so we can mock it up and implement the automation to generate this JSON?
Is there any update on providing us the sample format, so we can start working on it?
Why are we sticking to this JSON format?
The JSON format is needed to provide code intelligence features in the query editor - specifically signature help, which shows function documentation, parameter information, and return types while users are typing queries. This helps users understand how to use each function correctly. We need that specific JSON format because we are using the SignatureInformation interface of the Monaco Editor API to display the description, label, parameters, and the different signatures a particular function allows, which is in the format of
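As a sketch of what one entry in that payload might look like, here is a dict shaped after Monaco's SignatureInformation fields (`label`, `documentation`, `parameters`); the `abs` content is illustrative, not taken from the repo's actual docs:

```python
import json

# Hypothetical function entry shaped to feed Monaco's
# SignatureInformation; the actual schema used by the editor
# integration may differ.
signature = {
    "label": "abs(NUM)",
    "documentation": "Returns the absolute value of a number.",
    "parameters": [
        {"label": "NUM", "documentation": "A numeric expression."}
    ],
}
print(json.dumps(signature, indent=2))
```

Monaco highlights the active parameter by matching `parameters[i].label` against the `label` string, which is why the parameter labels must appear verbatim in the signature label.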
For consistency, we should maintain the same structure as the existing
This format provides a good developer experience through signature-help tooltips in the editor while maintaining consistency across both the PPL and SQL query languages. As of now we are targeting PPL for this Q1; we will figure out a plan for SQL as well.
I think Scaladoc would be ideal since it includes the documentation as well as the function signatures. @LantaoJin
Thanks, that would be great! @LantaoJin, we need your help on this. Also, we need all the commands in one place as well, which would help us automate syntax highlighting too.
@ykmr1224 No, we don't maintain all the function definitions in this repo. We just transform PPL functions to Spark functions by
But the full functions list is included in a class (line 246 at commit b9a0dc5).
Unlike the SQL Plugin project, spark-ppl only converts PPL syntax into Spark plans. Except for the last type, which is implemented through ScalaUDF in our project, all other function definitions live in the Spark project. Therefore, the most complete function definitions currently are the manually compiled ones in the existing markdown doc files.
We had an offline discussion. We will come up with a template and replace the existing docs based on the template.
Is your feature request related to a problem?
To provide context help for OpenSearch SQL/PPL while users are editing their queries, I want better-structured documentation from which descriptions, parameters, and their types can be automatically read and extracted.
What solution would you like?
Ideally it would be documented as JSON or a similar format that can be easily parsed.
What alternatives have you considered?
Extract that info from the current markdown docs.
Do you have any additional context?
n/a