The setup script will:

- Download and process the PyPI dataset and store the results in the `data` directory.
- Create vector embeddings for the PyPI dataset.
- If the `STORAGE_BACKEND` environment variable is set to `BLOB`: upload the datasets to blob storage.
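Blob storage upload is optional. If you want to enable it, the relevant setting lives in the `.env` file that is also passed to the Docker commands below via `--env-file .env`. The sketch here shows only `STORAGE_BACKEND`, which is the one variable documented above; any credentials your blob storage requires are additional and depend on your own configuration.

```
# Sketch of a .env file. Only STORAGE_BACKEND is documented in this README;
# set it to BLOB to upload the processed datasets to blob storage,
# or leave it unset to skip the upload and keep the data local.
STORAGE_BACKEND=BLOB
```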
There are three ways to run the setup script:
You can run the setup script in a virtual environment managed by Poetry. This method will automatically use your GPU for the vector embeddings if one is detected.
- Install dependencies and set up the virtual environment:

  ```bash
  poetry install
  ```

- Run the setup script:

  ```bash
  poetry run python pypi_scout/scripts/setup.py
  ```
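Before running the script, you can optionally check whether a GPU will actually be picked up inside the Poetry environment. The snippet below assumes the embedding model is PyTorch-based, which is an assumption rather than something stated in this README; if it prints `False`, the setup will simply run on CPU.

```bash
# Assumption: embeddings are computed with a PyTorch-based model.
# Prints True if a CUDA-capable GPU is visible inside the Poetry environment.
poetry run python -c "import torch; print(torch.cuda.is_available())"
```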
If you have an NVIDIA GPU and the NVIDIA Container Toolkit installed, follow these steps:
- Build the Docker image:

  ```bash
  docker build -t pypi-scout .
  ```

- Run the setup script in a Docker container with GPU support:

  ```bash
  docker run --rm \
    --gpus all \
    --env-file .env \
    -v $(pwd)/data:/code/data \
    --entrypoint "/bin/bash" \
    pypi-scout \
    -c "python /code/pypi_scout/scripts/setup.py"
  ```
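If this command fails with a GPU-related error, it is usually the NVIDIA Container Toolkit rather than the image that needs attention. The command below is a generic toolkit smoke test, independent of pypi-scout; it should print the same GPU table you get from running `nvidia-smi` on the host.

```bash
# Generic sanity check for the NVIDIA Container Toolkit (not specific to this project).
docker run --rm --gpus all ubuntu nvidia-smi
```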
If you do not have an NVIDIA GPU or the NVIDIA Container Toolkit installed, follow these steps:
- Build the Docker image:

  ```bash
  docker build -f DockerfileCPU -t pypi-scout .
  ```

- Run the setup script in a Docker container without GPU support:

  ```bash
  docker run --rm \
    --env-file .env \
    -v $(pwd)/data:/code/data \
    --entrypoint "/bin/bash" \
    pypi-scout \
    -c "python /code/pypi_scout/scripts/setup.py"
  ```
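Whichever of the three methods you use, the results end up in the local `data` directory (mounted as `/code/data` in the Docker variants), so a quick way to confirm the setup succeeded is to check that this directory has been populated. The exact file names are an implementation detail of the setup script and may vary.

```bash
# The processed dataset and embeddings should appear here after setup.
ls -lh data/
```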
After setting up the dataset, start the application using Docker Compose:
```bash
docker-compose up
```
After a short while, your application will be live at http://localhost:3000.
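If you prefer to verify from the command line that the frontend is up before opening a browser, any HTTP client will do; `curl` is used here purely as an illustration.

```bash
# Prints the HTTP status code once the app is reachable (expect 200).
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000
```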