feat(alloydb): Added generate batch embeddings sample #12721
base: main
Branch updated from fdcbe83 to 93aab38.
"id": "D3FUBaXIUquR"
},
"source": [
"This runs the complete embeddings workflow:\n",
There's a bit of a disconnect between the setup and running the workflow. In the "Create the embeddings workflow" section, I would add some context explaining that you are setting up the functions you will use later. We also need to prepare the user more for the idea of generating embeddings for multiple columns.
I have added more information under the "Building an Embeddings Workflow" heading, as well as more detail on the dataset about which columns will be embedded.
Let me know what you think!
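One way to prepare readers for embedding several columns is to show that each row contributes one embedding input per column, and that those inputs are then grouped into batches. A minimal sketch, assuming rows arrive as dicts with an `id` key; `iter_embedding_inputs` and `batch` are hypothetical helpers for illustration, not the notebook's functions:

```python
from typing import Any, Iterator

# Assumed row shape: a dict with an "id" key plus one key per text column.
Row = dict[str, Any]


def iter_embedding_inputs(
    rows: list[Row], cols_to_embed: list[str]
) -> Iterator[tuple[Any, str, str]]:
    """Yield one (row_id, column, text) item per non-empty cell to embed."""
    for row in rows:
        for col in cols_to_embed:
            text = row.get(col)
            if text:  # skip NULL or empty cells; there is nothing to embed
                yield row["id"], col, text


def batch(items: list, batch_size: int) -> list[list]:
    """Split items into consecutive chunks of at most batch_size items."""
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]
```

With two columns to embed, a row with both cells filled yields two inputs, so the number of embedding calls grows with the number of columns, not just the number of rows.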
")\n", | ||
"\n", | ||
"# Update the database with the generated embeddings concurrently\n", | ||
"await batch_update_rows_concurrently(\n", |
As a user, I would expect to run just one method to generate the embeddings, rather than having to fetch and batch the source data, then generate my embeddings, and then update the database. I also have to pass the variable cols_to_embed around a lot. We might be able to simplify this developer experience further.
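The quoted cell ends with a call to `batch_update_rows_concurrently`. The embed-then-update step could look roughly like the following minimal sketch, which bounds concurrency with `asyncio.Semaphore` and runs batches with `asyncio.gather`. It assumes an in-memory dict in place of AlloyDB and a hypothetical `fake_embed` in place of a real embedding model; `update_batch` and `update_batches_concurrently` are illustrative names, not the notebook's functions:

```python
import asyncio
from typing import Any


# Hypothetical stand-in for an embedding model call; a real workflow would
# call an embedding service here instead.
async def fake_embed(texts: list[str]) -> list[list[float]]:
    await asyncio.sleep(0)  # simulate an awaitable network call
    return [[float(len(t))] for t in texts]


async def update_batch(store: dict, batch: list[tuple[Any, str, str]]) -> None:
    """Embed one batch and write each vector into the in-memory store."""
    vectors = await fake_embed([text for _, _, text in batch])
    for (row_id, col, _), vec in zip(batch, vectors):
        store[(row_id, f"{col}_embedding")] = vec


async def update_batches_concurrently(
    store: dict, batches: list, max_concurrency: int = 4
) -> None:
    """Run update_batch for every batch, with at most max_concurrency in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(b):
        async with sem:
            await update_batch(store, b)

    await asyncio.gather(*(guarded(b) for b in batches))
```

The semaphore caps how many batches hit the model and database at once, which is the usual reason to batch rather than fire one request per row.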
I have changed the code so that users declare cols_to_embed (and other variables) once and then call run_embeddings_workflow directly.
Another alternative is to let each function read the global cols_to_embed variable and drop its argument, but that could make the code harder to maintain and understand.
What do you think?
+1 to keeping it as an argument
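A minimal sketch of the single-entry-point design agreed on here, with cols_to_embed passed explicitly rather than read from a global. It assumes a list of dicts stands in for the database table and that every cell in `cols_to_embed` holds non-empty text. `run_embeddings_workflow` reuses the name from the discussion, but its signature and the helper functions below are hypothetical:

```python
import asyncio
from typing import Any

Row = dict[str, Any]


# Hypothetical step functions; each receives cols_to_embed as an argument.
def fetch_source_rows(table: list[Row], cols_to_embed: list[str]) -> list[Row]:
    """Select only the id and the columns to embed from each row."""
    return [{k: r[k] for k in ["id", *cols_to_embed]} for r in table]


async def embed_rows(rows: list[Row], cols_to_embed: list[str]) -> list[Row]:
    """Produce a toy one-dimensional 'embedding' per column (text length)."""
    await asyncio.sleep(0)  # stands in for an awaitable model call
    return [
        {"id": r["id"], **{f"{c}_embedding": [float(len(r[c]))] for c in cols_to_embed}}
        for r in rows
    ]


def write_embeddings(table: list[Row], embedded: list[Row]) -> None:
    """Merge the embedding columns back into the matching table rows."""
    by_id = {e["id"]: e for e in embedded}
    for r in table:
        r.update({k: v for k, v in by_id[r["id"]].items() if k != "id"})


async def run_embeddings_workflow(table: list[Row], cols_to_embed: list[str]) -> None:
    """Single entry point: fetch, embed, and write back in one call."""
    rows = fetch_source_rows(table, cols_to_embed)
    embedded = await embed_rows(rows, cols_to_embed)
    write_embeddings(table, embedded)
```

The user makes one call, while each step still takes cols_to_embed explicitly, so the helpers stay testable on their own without depending on notebook globals.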
Branch updated from 93aab38 to 4cc34e4.
The tests added in this PR are not currently running: the CI checks "succeed", but they don't detect any tests. I did some debugging in #12762, and the changes I think you need are in this commit range.
After these changes, the tests run but fail because the configs are missing (b/378136679). This PR also introduces testing that would help #12588 (I originally thought this might be a duplicate PR, but they touch different files).
Additionally, we need to confirm whether the codeowners added in #12583 are correct; they might belong to a different product team.
I can confirm that's the right folder: the AlloyDB team is maintaining these samples long term.
Also, this PR supersedes that PR.
Description
Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.
Checklist
nox -s py-3.9 (see Test Environment Setup)
nox -s lint (see Test Environment Setup)