Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added benchmarking for pinecone #17

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

var77
Copy link
Contributor

@var77 var77 commented Sep 29, 2023

Added benchmarking for Pinecone
Currently this only benchmarks index creation latency. (create index + upsert data in batches)
The core/utils/pinecone_async_index.py is just wrapper for Pinecone Index class, so async requests will be supported when querying index. Referance

SSH_PORT_LANTERN=2222
PINECONE_API_KEY='YOUR_PINECONE_API_KEY_HERE'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming from JS background so following some conventions from there. What do you think about

  1. putting this in .env.local
  2. putting .env.local in .gitignore
  3. adding a check that these two variables are defined when testing Pinecone
    I'm worry about accidentally committing these variables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think moving .env to gitignore and adding .env.example may be an option. Also having checks before using these variables will help to provide better user facing error message


def get_cloud_provider(provider_name):
if provider_name == Cloud.PINECONE:
return Pinecone(os.environ['PINECONE_API_KEY'], os.environ['PINECONE_ENV'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do the "variables exist" check here

@@ -33,6 +36,7 @@ class Extension(Enum):
Extension.PGVECTOR_HNSW: {'m': 32, 'ef_construction': 128, 'ef': 10},
Extension.LANTERN: {'m': 32, 'ef_construction': 128, 'ef': 10},
Extension.NEON: {'m': 32, 'ef_construction': 128, 'ef': 10},
Cloud.PINECONE: { 'name': '', 'metric': 'cosine', 'pods': 1, 'replicas': 1, 'pod_type': 'p2' },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all the other indices are using L2 by default, and sift uses the L2 metric for ground truth.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was an option of euclidean distance which seems to be l2-norm, but I think in our index it is l2 squared. Will it work as expected?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants