Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(alloydb): Added generate batch embeddings sample #12721

Open
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

twishabansal
Copy link

@twishabansal twishabansal commented Oct 23, 2024

Description

Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.

Checklist

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@product-auto-label product-auto-label bot added samples Issues that are directly related to samples. api: notebooks Issues related to the Vertex AI Workbench API. labels Oct 23, 2024
alloydb/notebooks/generate_batch_embeddings.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/generate_batch_embeddings.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/generate_batch_embeddings.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/generate_batch_embeddings.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/generate_batch_embeddings.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/generate_batch_embeddings.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/generate_batch_embeddings.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/generate_batch_embeddings.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/generate_batch_embeddings.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/generate_batch_embeddings.ipynb Outdated Show resolved Hide resolved
@twishabansal twishabansal force-pushed the generate_batch_embeddings branch 2 times, most recently from fdcbe83 to 93aab38 Compare October 24, 2024 18:05
alloydb/notebooks/batch_embeddings_update.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/batch_embeddings_update.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/batch_embeddings_update.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/batch_embeddings_update.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/batch_embeddings_update.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/batch_embeddings_update.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/batch_embeddings_update.ipynb Outdated Show resolved Hide resolved
"id": "D3FUBaXIUquR"
},
"source": [
"This runs the complete embeddings workflow:\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theres a bit of a disconnect between the set up and running. In the "Create the embeddings workflow" section I would add some context on that you are setting up the functions you will be using. We also need to prepare the user more for the idea of generating embeddings for multiple columns.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added some more information under the Building an Embeddings Workflow heading as well as added more information on the dataset about what columns are to be embedded.

Let me know what you think!

alloydb/notebooks/batch_embeddings_update.ipynb Outdated Show resolved Hide resolved
")\n",
"\n",
"# Update the database with the generated embeddings concurrently\n",
"await batch_update_rows_concurrently(\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a user I would kind of expect just to run 1 method to generate the embeddings, compared to having to get and batch the source data then generate my embeddings, then updating the database. I have to copy around the variable cols_to_embed a lot. We might be able to simplify this devex more.

Copy link
Author

@twishabansal twishabansal Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made some changes in the code which would enhance the user experience by letting users declare cols to embed (and other variables) and using the run_embeddings_workflow directly.

Another alternative is to let each function use the global cols_to_embed variable and eliminate it's argument. It could make the code harder to maintain and understand.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to keeping it as an argument

alloydb/notebooks/batch_embeddings_update.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/batch_embeddings_update.ipynb Outdated Show resolved Hide resolved
@twishabansal twishabansal marked this pull request as ready for review November 6, 2024 16:53
@twishabansal twishabansal requested review from a team as code owners November 6, 2024 16:53
@twishabansal twishabansal requested a review from a team as a code owner November 8, 2024 16:58
@glasnt glasnt changed the title docs: Added generate batch embeddings sample feat(alloydb): Added generate batch embeddings sample Nov 11, 2024
glasnt added a commit that referenced this pull request Nov 11, 2024
@averikitsch averikitsch self-requested a review November 11, 2024 22:20
@averikitsch averikitsch dismissed their stale review November 11, 2024 22:21

Rereviewed by kvg@

alloydb/notebooks/embeddings_batch_processing.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/embeddings_batch_processing.ipynb Outdated Show resolved Hide resolved
alloydb/notebooks/embeddings_batch_processing.ipynb Outdated Show resolved Hide resolved
@glasnt
Copy link
Contributor

glasnt commented Nov 11, 2024

The testing added in this PR is not currently running (the CI tests "succeed", but they're not detecting any tests).

I did some debugging in #12762, and the changes I think you need are in this commit range

  • Move test files
    • conftest.py to alloydb/ (allowing it to be shared)
    • e2e_test.py to notebooks/ folder (common practice)
  • add an empty requirements.txt file (I think this is the major change, no tests were running without it)

After these changes, the tests run and fail due to the lack of configs (b/378136679)

This PR introduces testing that would also help #12588 (I originally thought this might be a duplicate PR, but they're different files)

@glasnt
Copy link
Contributor

glasnt commented Nov 12, 2024

Additionally: we need to confirm if the codeowners added in #12583 are correct; this might be for a different product related team.

@kurtisvg
Copy link
Contributor

Additionally: we need to confirm if the codeowners added in #12583 are correct; this might be for a different product related team.

I can confirm that's the right folder -- the AlloyDB team is maintaining these samples long term.

@kurtisvg
Copy link
Contributor

This PR introduces testing that would also help #12588 (I originally thought this might be a duplicate PR, but they're different files)

Also, this PR is superseding that PR.

@glasnt glasnt assigned glasnt and unassigned engelke Nov 14, 2024
@glasnt glasnt added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 14, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: notebooks Issues related to the Vertex AI Workbench API. samples Issues that are directly related to samples.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants