From 7cd199532ea0c52eda363d347cb711543d6c8dfb Mon Sep 17 00:00:00 2001 From: Rajvaibhav Rahane Date: Wed, 2 Apr 2025 10:04:59 -0700 Subject: [PATCH 1/4] Add Developer guide Signed-off-by: Rajvaibhav Rahane --- DEVELOPER_GUIDE.md | 163 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 163 insertions(+) create mode 100644 DEVELOPER_GUIDE.md diff --git a/DEVELOPER_GUIDE.md b/DEVELOPER_GUIDE.md new file mode 100644 index 0000000..fd72739 --- /dev/null +++ b/DEVELOPER_GUIDE.md @@ -0,0 +1,163 @@ +[Developer Guide](#developer-guide) +- [Getting Started](#getting-started) + - [Fork OpenSearch Remote Vector Index Builder Repo](#fork-remote-vector-index-builder-repo) + - [Install Prerequisites](#install-prerequisites) + - [Python Dependencies](#python-dependencies) +- [Python Guide](#python-guide) + - [Language Formatting Guide](#language-formatting-guide) + - [Testing Guide](#testing-guide) +- [Building Docker Images](#building-docker-images) + - [Faiss Base Image](#faiss-base-image) + - [Core Image](#core-image) +- [Memory Profiling](#memory-profiling) + - [GPU Memory Profiling with NVIDIA SMI](#gpu-memory-profiling-with-nvidia-smi) + - [CPU Memory Profiling with memory_profiler](#cpu-memory-profiling-with-memory_profiler) + +# Developer Guide + +So you want to contribute code to OpenSearch Remote Vector Index Builder? Excellent! We're glad you're here. Here's what you need to do. + +## Getting Started + +### Fork Remote Vector Index Builder Repo + +Fork [opensearch-project/OpenSearch Remote Vector Index Builder](https://github.com/opensearch-project/remote-vector-index-builder) and clone locally. + +Example: +``` +git clone https://github.com/[username]/remote-vector-index-builder.git +``` + +### Install Prerequisites + +#### Python Dependencies +Core Dependencies: +``` +pip install -r remote_vector_index_builder/core/requirements.txt +``` + +Test Dependencies: +``` +pip install -r test_remote_vector_index_builder/requirements.txt +``` + +## Python Guide +### Language Formatting Guide +Run the following commands from the root folder. Configuration of below tools can be found in [`setup.cfg`](setup.cfg). + +The code lint check can be run with: +``` +flake8 remote_vector_index_builder/ test_remote_vector_index_builder/ +``` + +The formatting check can be run with: +``` +black --check remote_vector_index_builder/ test_remote_vector_index_builder/ +``` + +The code can be formatted with: +``` +black remote_vector_index_builder/ test_remote_vector_index_builder/ +``` + +### Testing Guide +The static type checking can be done with: +``` +mypy remote_vector_index_builder/ test_remote_vector_index_builder/ +``` +The python tests can be run with: +``` +pytest test_remote_vector_index_builder/ +``` + +## Building Docker Images +The Github CIs automatically publish snapshot images to Dockerhub at [opensearchstaging/remote-vector-index-builder](https://hub.docker.com/r/opensearchstaging/remote-vector-index-builder). + +The following are the commands to build the images locally: + +### Faiss Base Image +The [Faiss repository](https://github.com/facebookresearch/faiss/) is added as a submodule in this repository. Run the below command to initialize the submodule first. +``` +git submodule update --init +``` +The Faiss Base Image can only be created on an NVIDIA GPU powered machine with CUDA Toolkit installed. + +Please see the section [Provisioning an instance for development](#provisioning-an-instance-for-development) to provision an instance for development. + +Run the below command to create the Faiss base image: +``` +docker build -f ./base_image/build_scripts/Dockerfile . -t opensearchstaging/remote-vector-index-builder:faiss-base-latest +``` + +### Core Image +The path [`/remote-vector-index-builder/core`](/remote_vector_index_builder/core/) contains the code for core index build functionalities: +1. Building an Index +2. Object Store I/O + +Build an image with the above core functionalities: +``` +docker build -f ./remote_vector_index_builder/core/Dockerfile . -t opensearchstaging/remote-vector-index-builder:core-latest +``` + +## Provisioning an instance for development + +A NVIDIA GPU Powered machine with CUDA Toolkit installed is required to build a Faiss Base Image, and to run the docker images to build an index. + +Typically an [EC2 G5](https://aws.amazon.com/ec2/instance-types/g5/) 2xlarge instance running a Deep Learning OSS Nvidia Driver AMI, with Docker CLI installed is recommended for development use. + +[Setup an EC2 Instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html) + +## Memory Profiling + +Simple memory profiling can be done during development to get memory usage statistics during the Index Build process. + +### GPU Memory Profiling with NVIDIA SMI + +1. Install [py3nvml](https://pypi.org/project/py3nvml/): + +In [`/remote_vector_index_builder/core/requirements.txt`](/remote_vector_index_builder/core/requirements.txt) add py3nvml on a newline. + +2. Run the following command: +``` +pip install --no-cache-dir --upgrade -r /remote_vector_index_builder/core/requirements.txt +``` + +3. Add import statement and initialize method in the file containing the driver code. +``` +from py3nvml import nvidia_smi +nvidia_smi.nvmlInit() +``` + +4. Define and call the below method wherever necessary. + +e.g. before and after calling the GPU index cleanup method. +``` +from py3nvml import nvidia_smi + +def get_gpu_memory(): + handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0) # GPU Device ID + info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle) + print(f"Total: {info.total/1024**2:.2f}MB") + print(f"Free: {info.free/1024**2:.2f}MB") + print(f"Used: {info.used/1024**2:.2f}MB") + +``` + +### CPU Memory Profiling with memory_profiler + +1. Add the below command in [`/remote_vector_index_builder/core/Dockerfile`](/remote_vector_index_builder/core/Dockerfile) to install [memory_profiler](https://pypi.org/project/memory-profiler/). +``` +RUN conda install -c conda-forge memory_profiler -y +``` + +2. In the file that contains the function that needs to be profiled, add the import and an `@profile` annotation on the function. +``` +from memory_profiler import profile + +@profile +def my_func(): + a = [1] * (10 ** 6) + b = [2] * (2 * 10 ** 7) + del b + return a +``` From 25e55882453c87a8233bbe000b1e8e9ba1e4f4c0 Mon Sep 17 00:00:00 2001 From: Rajvaibhav Rahane Date: Wed, 9 Apr 2025 11:56:59 -0700 Subject: [PATCH 2/4] Fix nits Signed-off-by: Rajvaibhav Rahane --- DEVELOPER_GUIDE.md | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/DEVELOPER_GUIDE.md b/DEVELOPER_GUIDE.md index fd72739..7e20490 100644 --- a/DEVELOPER_GUIDE.md +++ b/DEVELOPER_GUIDE.md @@ -9,6 +9,7 @@ - [Building Docker Images](#building-docker-images) - [Faiss Base Image](#faiss-base-image) - [Core Image](#core-image) +- [Provisioning an Instance for Development](#provisioning-an-instance-for-development) - [Memory Profiling](#memory-profiling) - [GPU Memory Profiling with NVIDIA SMI](#gpu-memory-profiling-with-nvidia-smi) - [CPU Memory Profiling with memory_profiler](#cpu-memory-profiling-with-memory_profiler) @@ -31,6 +32,10 @@ git clone https://github.com/[username]/remote-vector-index-builder.git ### Install Prerequisites #### Python Dependencies + +The following are commands to install dependencies during local development and testing. +The required dependencies are installed during Docker image creation onto the image. + Core Dependencies: ``` pip install -r remote_vector_index_builder/core/requirements.txt @@ -65,7 +70,7 @@ The static type checking can be done with: ``` mypy remote_vector_index_builder/ test_remote_vector_index_builder/ ``` -The python tests can be run with: +The Python tests can be run with: ``` pytest test_remote_vector_index_builder/ ``` @@ -101,7 +106,7 @@ docker build -f ./remote_vector_index_builder/core/Dockerfile . -t opensearchst ## Provisioning an instance for development -A NVIDIA GPU Powered machine with CUDA Toolkit installed is required to build a Faiss Base Image, and to run the docker images to build an index. +A NVIDIA GPU Powered machine with CUDA Toolkit installed is required to build a Faiss Base Image, and to run the Docker images to build an index. Typically an [EC2 G5](https://aws.amazon.com/ec2/instance-types/g5/) 2xlarge instance running a Deep Learning OSS Nvidia Driver AMI, with Docker CLI installed is recommended for development use. @@ -113,24 +118,15 @@ Simple memory profiling can be done during development to get memory usage stati ### GPU Memory Profiling with NVIDIA SMI -1. Install [py3nvml](https://pypi.org/project/py3nvml/): +1. Install [py3nvml](https://pypi.org/project/py3nvml/): In [`/remote_vector_index_builder/core/requirements.txt`](/remote_vector_index_builder/core/requirements.txt) add py3nvml on a newline. -In [`/remote_vector_index_builder/core/requirements.txt`](/remote_vector_index_builder/core/requirements.txt) add py3nvml on a newline. - -2. Run the following command: -``` -pip install --no-cache-dir --upgrade -r /remote_vector_index_builder/core/requirements.txt -``` - -3. Add import statement and initialize method in the file containing the driver code. +2. Add import statement and initialize method in the file containing the driver code. ``` from py3nvml import nvidia_smi nvidia_smi.nvmlInit() ``` -4. Define and call the below method wherever necessary. - -e.g. before and after calling the GPU index cleanup method. +3. Define and call the below method wherever necessary. e.g. before and after calling the GPU index cleanup method. ``` from py3nvml import nvidia_smi From 3207c569056ab96e2bac21ed9a43e38f30fc4e1f Mon Sep 17 00:00:00 2001 From: Rajvaibhav Rahane Date: Wed, 9 Apr 2025 14:43:15 -0700 Subject: [PATCH 3/4] fix nits Signed-off-by: Rajvaibhav Rahane --- DEVELOPER_GUIDE.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/DEVELOPER_GUIDE.md b/DEVELOPER_GUIDE.md index 7e20490..1bcb62f 100644 --- a/DEVELOPER_GUIDE.md +++ b/DEVELOPER_GUIDE.md @@ -34,7 +34,7 @@ git clone https://github.com/[username]/remote-vector-index-builder.git #### Python Dependencies The following are commands to install dependencies during local development and testing. -The required dependencies are installed during Docker image creation onto the image. +The required dependencies are installed onto the Docker image during creation. Core Dependencies: ``` @@ -85,7 +85,7 @@ The [Faiss repository](https://github.com/facebookresearch/faiss/) is added as a ``` git submodule update --init ``` -The Faiss Base Image can only be created on an NVIDIA GPU powered machine with CUDA Toolkit installed. +The Faiss base image can only be created on an NVIDIA GPU powered machine with CUDA Toolkit installed. Please see the section [Provisioning an instance for development](#provisioning-an-instance-for-development) to provision an instance for development. @@ -106,7 +106,7 @@ docker build -f ./remote_vector_index_builder/core/Dockerfile . -t opensearchst ## Provisioning an instance for development -A NVIDIA GPU Powered machine with CUDA Toolkit installed is required to build a Faiss Base Image, and to run the Docker images to build an index. +A NVIDIA GPU powered machine with CUDA Toolkit installed is required to build a Faiss base Image, and to run the Docker images to build an index. Typically an [EC2 G5](https://aws.amazon.com/ec2/instance-types/g5/) 2xlarge instance running a Deep Learning OSS Nvidia Driver AMI, with Docker CLI installed is recommended for development use. From 2e2f966ce2a130fc3b2e0c5e44e4c1b463a7588a Mon Sep 17 00:00:00 2001 From: Rajvaibhav Rahane Date: Thu, 10 Apr 2025 16:44:59 -0700 Subject: [PATCH 4/4] fix nit Signed-off-by: Rajvaibhav Rahane --- DEVELOPER_GUIDE.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/DEVELOPER_GUIDE.md b/DEVELOPER_GUIDE.md index a29a145..e325b5a 100644 --- a/DEVELOPER_GUIDE.md +++ b/DEVELOPER_GUIDE.md @@ -106,9 +106,9 @@ docker build -f ./remote_vector_index_builder/core/Dockerfile . -t opensearchst ## Provisioning an instance for development -A NVIDIA GPU powered machine with CUDA Toolkit installed is required to build a Faiss base Image and to run the Docker images to build an index. +A NVIDIA GPU powered machine with CUDA Toolkit installed is required to build a Faiss base image and to run the Docker images to build an index. -Typically an [EC2 G5](https://aws.amazon.com/ec2/instance-types/g5/) 2xlarge instance running a Deep Learning OSS Nvidia Driver AMI, with Docker CLI installed is recommended for development use. +Typically an [EC2 G5](https://aws.amazon.com/ec2/instance-types/g5/) 2xlarge instance running a Deep Learning OSS Nvidia Driver AMI with Docker CLI installed is recommended for development use. [Setup an EC2 Instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html) @@ -118,7 +118,7 @@ Simple memory profiling can be done during development to get memory usage stati ### GPU Memory Profiling with NVIDIA SMI -1. Install [py3nvml](https://pypi.org/project/py3nvml/): In [`/remote_vector_index_builder/core/requirements.txt`](/remote_vector_index_builder/core/requirements.txt) add py3nvml on a newline. +1. Install [py3nvml](https://pypi.org/project/py3nvml/): In [`/remote_vector_index_builder/core/requirements.txt`](/remote_vector_index_builder/core/requirements.txt) add `py3nvml` on a newline. 2. Add import statement and initialize method in the file containing the driver code. ```