[FEATURE] Indexing with offline batch inference #1235

Open

heemin32 opened this issue Mar 19, 2025 · 0 comments
@heemin32 (Collaborator)

Is your feature request related to a problem?

In neural search, users are interested in minimizing the cost and time needed for embedding generation on large datasets. OSI (OpenSearch Ingestion) addressed this by implementing an offline batch inference solution, which leverages batch processing to optimize both cost and performance. In this process, OSI handles file creation, uploading to S3, invoking the ML Commons API, monitoring inference completion, retrieving results from S3, and finally parsing and ingesting the data into OpenSearch. See opensearch-project/ml-commons#2891.
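
For context, ML Commons exposes this as an asynchronous batch predict API. Below is a minimal, non-authoritative sketch in Python of how that call could look; the `_batch_predict` endpoint follows the work in opensearch-project/ml-commons#2891, while the model id, bucket, and parameter names are placeholders whose exact shape depends on the model's connector blueprint:

```python
# Non-authoritative sketch of triggering ML Commons offline batch inference.
# MODEL_ID, the bucket, and the parameter names below are placeholders; the
# actual request body depends on the remote model's connector blueprint.
import requests

OPENSEARCH = "https://localhost:9200"
MODEL_ID = "my-remote-model-id"  # assumed: a deployed remote model with batch support

resp = requests.post(
    f"{OPENSEARCH}/_plugins/_ml/models/{MODEL_ID}/_batch_predict",
    json={
        "parameters": {
            # assumed connector parameters: where the input file was uploaded
            # and where the provider should write the results
            "input_location": "s3://my-bucket/batch-input/docs.jsonl",
            "output_location": "s3://my-bucket/batch-output/",
        }
    },
    auth=("admin", "admin"),
    verify=False,
)
print(resp.json())  # returns a task id for the asynchronous inference job
```

The call returns a task id that can be polled via `GET /_plugins/_ml/tasks/<task_id>` until the inference job completes.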

However, many customers do not use OSI during the ingestion process. To accommodate them, we can offer an offline batch inference option that does not require OSI.

I created a similar issue in ml-commons: opensearch-project/ml-commons#3428.
I'm creating another one here for better visibility and to gather community feedback.

What solution would you like?

For example, customers could create an index and ingest plain-text data, and we would provide an API that generates embeddings using the offline batch inference component. The process would work as follows:

  1. The customer creates an index.
  2. The customer ingests documents containing plain text.
  3. The customer triggers an API to populate the embedding field, either as a one-time process or on a scheduled interval (a rough sketch of these steps follows the list):
    1. Retrieve documents from the target index that lack embeddings or have outdated embeddings (possibly detected via a timestamp?).
    2. Create a file and upload it to S3.
    3. Call ML Commons to perform offline batch inference.
    4. Retrieve the processed file from S3.
    5. Populate the index with the generated embeddings.
  4. The customer sees the embeddings successfully populated in the index.
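
To make the proposed flow concrete, here is a rough client-side sketch of step 3 using opensearch-py and boto3. Everything in it is illustrative: the index, field, bucket, and model names are placeholders, the batch predict body mirrors the sketch above, and the output file format (one `{"_id": ..., "embedding": [...]}` object per line) is an assumption about what the connector would write:

```python
# Illustrative prototype of step 3; all names are placeholders, and error
# handling, batching limits, and task polling are elided.
import json

import boto3
from opensearchpy import OpenSearch, helpers

client = OpenSearch("https://localhost:9200", http_auth=("admin", "admin"), verify_certs=False)
s3 = boto3.client("s3")

INDEX = "my-text-index"
BUCKET = "my-batch-bucket"
MODEL_ID = "my-remote-model-id"

# 3.1 Retrieve documents that lack an embedding. (Outdated embeddings could be
#     found similarly with a range query on a last-updated timestamp field.)
query = {"query": {"bool": {"must_not": [{"exists": {"field": "text_embedding"}}]}}}
docs = list(helpers.scan(client, index=INDEX, query=query, _source=["text"]))

# 3.2 Create an input file and upload it to S3.
with open("/tmp/batch-input.jsonl", "w") as f:
    for d in docs:
        f.write(json.dumps({"_id": d["_id"], "text": d["_source"]["text"]}) + "\n")
s3.upload_file("/tmp/batch-input.jsonl", BUCKET, "input/batch-input.jsonl")

# 3.3 Call ML Commons to perform offline batch inference (see sketch above),
#     then wait for the returned task to complete before moving on.
client.transport.perform_request(
    "POST",
    f"/_plugins/_ml/models/{MODEL_ID}/_batch_predict",
    body={
        "parameters": {
            "input_location": f"s3://{BUCKET}/input/batch-input.jsonl",
            "output_location": f"s3://{BUCKET}/output/",
        }
    },
)

# 3.4 Retrieve the processed file from S3 (assumed output format: one
#     {"_id": ..., "embedding": [...]} object per line).
s3.download_file(BUCKET, "output/batch-output.jsonl", "/tmp/batch-output.jsonl")

# 3.5 Populate the index with the generated embeddings via a bulk update.
actions = []
with open("/tmp/batch-output.jsonl") as f:
    for line in f:
        rec = json.loads(line)
        actions.append({
            "_op_type": "update",
            "_index": INDEX,
            "_id": rec["_id"],
            "doc": {"text_embedding": rec["embedding"]},
        })
helpers.bulk(client, actions)
```

A built-in implementation would run these same five sub-steps server-side, optionally on a schedule, so that the customer only observes step 4: embeddings appearing in the index.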

What alternatives have you considered?

Utilizing OSI

Do you have any additional context?

N/A
