[Developer Guide](#developer-guide)
- [Getting Started](#getting-started)
  - [Fork OpenSearch Remote Vector Index Builder Repo](#fork-remote-vector-index-builder-repo)
  - [Install Prerequisites](#install-prerequisites)
    - [Python Dependencies](#python-dependencies)
- [Python Guide](#python-guide)
  - [Language Formatting Guide](#language-formatting-guide)
  - [Testing Guide](#testing-guide)
- [Building Docker Images](#building-docker-images)
  - [Faiss Base Image](#faiss-base-image)
  - [Core Image](#core-image)
- [Provisioning an instance for development](#provisioning-an-instance-for-development)
- [Memory Profiling](#memory-profiling)
  - [GPU Memory Profiling with NVIDIA SMI](#gpu-memory-profiling-with-nvidia-smi)
  - [CPU Memory Profiling with memory_profiler](#cpu-memory-profiling-with-memory_profiler)

# Developer Guide

So you want to contribute code to OpenSearch Remote Vector Index Builder? Excellent! We're glad you're here. Here's what you need to do.

## Getting Started

### Fork Remote Vector Index Builder Repo

Fork [opensearch-project/OpenSearch Remote Vector Index Builder](https://github.com/opensearch-project/remote-vector-index-builder) and clone it locally.

Example:
```
git clone https://github.com/[username]/remote-vector-index-builder.git
```

### Install Prerequisites

#### Python Dependencies
Core dependencies:
```
pip install -r remote_vector_index_builder/core/requirements.txt
```

Test dependencies:
```
pip install -r test_remote_vector_index_builder/requirements.txt
```

## Python Guide
### Language Formatting Guide
Run the following commands from the root folder. Configuration for the tools below can be found in [`setup.cfg`](setup.cfg).

The lint check can be run with:
```
flake8 remote_vector_index_builder/ test_remote_vector_index_builder/
```

The formatting check can be run with:
```
black --check remote_vector_index_builder/ test_remote_vector_index_builder/
```

The code can be formatted with:
```
black remote_vector_index_builder/ test_remote_vector_index_builder/
```

### Testing Guide
Static type checking can be run with:
```
mypy remote_vector_index_builder/ test_remote_vector_index_builder/
```
The Python tests can be run with:
```
pytest test_remote_vector_index_builder/
```

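Tests follow standard pytest conventions: files named `test_*.py` containing `test_*` functions with plain assertions, which pytest discovers and runs automatically. A minimal sketch (the helper and test names here are hypothetical, not taken from the repository):

```python
# test_example.py -- a minimal pytest-style test (hypothetical names).
# pytest collects files named test_*.py and runs every test_* function;
# a failing bare assert marks the test as failed.

def normalize(vector):
    # Hypothetical helper under test: scale a vector to unit length.
    norm = sum(x * x for x in vector) ** 0.5
    return [x / norm for x in vector]

def test_normalize_produces_unit_length():
    result = normalize([3.0, 4.0])
    length = sum(x * x for x in result) ** 0.5
    assert abs(length - 1.0) < 1e-9

def test_normalize_preserves_direction():
    assert normalize([3.0, 4.0]) == [0.6, 0.8]
```

Running `pytest` on a single file or directory narrows the run while iterating, e.g. `pytest test_remote_vector_index_builder/core/`.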
## Building Docker Images
The GitHub CI automatically publishes snapshot images to Docker Hub at [opensearchstaging/remote-vector-index-builder](https://hub.docker.com/r/opensearchstaging/remote-vector-index-builder).

The images can be built locally with the following commands:

### Faiss Base Image
The [Faiss repository](https://github.com/facebookresearch/faiss/) is included as a submodule in this repository. Initialize the submodule first:
```
git submodule update --init
```
The Faiss base image can only be built on an NVIDIA GPU powered machine with the CUDA Toolkit installed. See [Provisioning an instance for development](#provisioning-an-instance-for-development).

Build the Faiss base image:
```
docker build -f ./base_image/build_scripts/Dockerfile . -t opensearchstaging/remote-vector-index-builder:faiss-base-latest
```

### Core Image
The path [`/remote_vector_index_builder/core`](/remote_vector_index_builder/core/) contains the code for the core index build functionality:
1. Building an index
2. Object store I/O

Build an image with the above core functionality:
```
docker build -f ./remote_vector_index_builder/core/Dockerfile . -t opensearchstaging/remote-vector-index-builder:core-latest
```

## Provisioning an instance for development

An NVIDIA GPU powered machine with the CUDA Toolkit installed is required to build the Faiss base image and to run the Docker images that build an index.

An [EC2 G5](https://aws.amazon.com/ec2/instance-types/g5/) `g5.2xlarge` instance running the Deep Learning OSS Nvidia Driver AMI, with the Docker CLI installed, is recommended for development.

[Set up an EC2 instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html)

## Memory Profiling

Simple memory profiling can be done during development to get memory usage statistics during the index build process.

### GPU Memory Profiling with NVIDIA SMI

1. Install [py3nvml](https://pypi.org/project/py3nvml/): add `py3nvml` on a new line in [`/remote_vector_index_builder/core/requirements.txt`](/remote_vector_index_builder/core/requirements.txt).

2. Run the following command:
```
pip install --no-cache-dir --upgrade -r /remote_vector_index_builder/core/requirements.txt
```

3. Add the import statement and the initialization call to the file containing the driver code:
```
from py3nvml import nvidia_smi

nvidia_smi.nvmlInit()
```

4. Define the method below and call it wherever needed, e.g. before and after calling the GPU index cleanup method:
```
from py3nvml import nvidia_smi

def get_gpu_memory():
    handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)  # GPU device ID
    info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
    print(f"Total: {info.total/1024**2:.2f}MB")
    print(f"Free: {info.free/1024**2:.2f}MB")
    print(f"Used: {info.used/1024**2:.2f}MB")
```
| 145 | + |
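NVML reports memory in bytes, so the conversion above divides by `1024**2` to print MiB. A small pure-Python helper (hypothetical, not part of the repository) that applies the same formatting can be sanity-checked without a GPU:

```python
def format_mb(num_bytes):
    # Mirror the formatting used in get_gpu_memory():
    # bytes -> MiB with two decimal places (1 MiB = 1024**2 bytes).
    return f"{num_bytes / 1024**2:.2f}MB"

# Example: a reading of 3 GiB of used memory.
print(format_mb(3 * 1024**3))  # -> 3072.00MB
```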
### CPU Memory Profiling with memory_profiler

1. Add the command below to [`/remote_vector_index_builder/core/Dockerfile`](/remote_vector_index_builder/core/Dockerfile) to install [memory_profiler](https://pypi.org/project/memory-profiler/):
```
RUN conda install -c conda-forge memory_profiler -y
```

2. In the file that contains the function to be profiled, add the import and an `@profile` annotation on the function:
```
from memory_profiler import profile

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a
```
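
Where installing memory_profiler inside the image is not convenient, the standard library's `tracemalloc` gives a coarser but dependency-free peak-memory check. A minimal sketch using the same example function (a stdlib alternative, not the project's documented workflow):

```python
import tracemalloc

def my_func():
    a = [1] * (10 ** 6)          # ~8 MB list, returned to the caller
    b = [2] * (2 * 10 ** 7)      # ~160 MB temporary, freed before return
    del b
    return a

tracemalloc.start()
result = my_func()
current, peak = tracemalloc.get_traced_memory()  # both values in bytes
tracemalloc.stop()

# Peak captures the short-lived allocation of b; current reflects only
# the surviving list a.
print(f"current: {current/1024**2:.2f}MB, peak: {peak/1024**2:.2f}MB")
```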