Skip to content

Conversation

@var77
Copy link
Contributor

@var77 var77 commented Sep 29, 2023

Added benchmarking for Pinecone
Currently this only benchmarks index creation latency. (create index + upsert data in batches)
The core/utils/pinecone_async_index.py is just wrapper for Pinecone Index class, so async requests will be supported when querying index. Referance

POSTGRES_PORT_LANTERN=5436
SSH_PORT_LANTERN=2222
SSH_PORT_LANTERN=2222
PINECONE_API_KEY='YOUR_PINECONE_API_KEY_HERE'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming from JS background so following some conventions from there. What do you think about

  1. putting this in .env.local
  2. putting .env.local in .gitignore
  3. adding a check that these two variables are defined when testing Pinecone
    I'm worry about accidentally committing these variables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think moving .env to gitignore and adding .env.example may be an option. Also having checks before using these variables will help to provide better user facing error message


def get_cloud_provider(provider_name):
if provider_name == Cloud.PINECONE:
return Pinecone(os.environ['PINECONE_API_KEY'], os.environ['PINECONE_ENV'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do the "variables exist" check here

Extension.PGVECTOR_HNSW: {'m': 32, 'ef_construction': 128, 'ef': 10},
Extension.LANTERN: {'m': 32, 'ef_construction': 128, 'ef': 10},
Extension.NEON: {'m': 32, 'ef_construction': 128, 'ef': 10},
Cloud.PINECONE: { 'name': '', 'metric': 'cosine', 'pods': 1, 'replicas': 1, 'pod_type': 'p2' },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all the other indices are using L2 by default, and sift uses the L2 metric for ground truth.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was an option of euclidean distance which seems to be l2-norm, but I think in our index it is l2 squared. Will it work as expected?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants