-
Notifications
You must be signed in to change notification settings - Fork 906
feat(serverless-spark): Add serverless-spark source with list_batches tool #1690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
2e450ef
to
fad52b7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @dborowitz, thanks for the PR! The source and tool implementations overall look good, but we need more thorough integration test to make sure things are working correctly in the long term.
// or failed Serverless Spark batches, of any age. | ||
func runListBatchesTest(t *testing.T) { | ||
requestBody := bytes.NewBuffer([]byte(`{"pageSize": 2, "filter": "state = SUCCEEDED OR state = FAILED"}`)) | ||
req, err := http.NewRequest(http.MethodPost, "http://127.0.0.1:5000/api/tool/list-batches/invoke", requestBody) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a more comprehensive golang test table here to make sure all the edge cases are tested? Ideally we want to test for every parameter including the pageToken
. We should also be testing edge cases like empty results etc. The generic authentication feature testing should also be included. Feel free to refer to the BigQuery tests as examples. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
To keep it compatible with arbitrary test projects, I took the approach of listing results with the underlying API directly, then comparing a possibly-paginated version of the results from the tool with the underlying results.
Note this is inherently racy, but the hope is that we won't be frequently creating new batches in the integration test project. If it turns out to be an issue, we can always filter to batches created >30m ago or something.
6976c99
to
c2a6087
Compare
I've converted to draft to ensure we don't merge before confirming the documentation/branding story with my PM. Other than that, I've resolved the one comment from @duwenxin99 about integration tests, and it's ready for review. |
… tool Built as a thin wrapper over the official Google Cloud Dataproc Go client library, with support for filtering and pagination.
c2a6087
to
b9aaaae
Compare
Updated the docs after consulting with PM, and added docs for the prebuilt toolset, which I had missed before. |
@dborowitz Thanks for the test update! Could you add auth test cases to every tool like BQ does here to make sure the auth feature is working correctly. An |
Description
Add a new source
serverless-spark
for connecting to Google Cloud Serverless for Apache Spark, along with a single simple toolserverless-spark-list-batches
.One outstanding though possibly trivial question is how to address naming/branding/source splitting. The official product name is "Google Cloud Serverless for Apache Spark", though this is a relatively recent rebranding of "Dataproc Serverless". I figured for the source and tool names we could use the short version "serverless-spark", but I've tried to retain the officially correct name in documentation. Granted, it's quite a mouthful and sticks out compared to the other names in Toolbox. On the other hand, I don't want to imply that this works with any Spark hosting environment besides Google's. I'll double-check the branding guidelines with our PMs, but I appreciate Toolbox maintainers' feedback as well.
PR Checklist
CONTRIBUTING.md
bug/issue
before writing your code! That way we can discuss the change, evaluate
designs, and agree on the general idea
!
if this involve a breaking change🛠️ Part of #1689