
Commit a922f0a

Add Developer guide (#46)
Signed-off-by: Rajvaibhav Rahane <[email protected]>
1 parent 1e89f62 commit a922f0a

1 file changed (+159, −0)

DEVELOPER_GUIDE.md

[Developer Guide](#developer-guide)
- [Getting Started](#getting-started)
  - [Fork OpenSearch Remote Vector Index Builder Repo](#fork-remote-vector-index-builder-repo)
  - [Install Prerequisites](#install-prerequisites)
    - [Python Dependencies](#python-dependencies)
- [Python Guide](#python-guide)
  - [Language Formatting Guide](#language-formatting-guide)
  - [Testing Guide](#testing-guide)
- [Building Docker Images](#building-docker-images)
  - [Faiss Base Image](#faiss-base-image)
  - [Core Image](#core-image)
- [Provisioning an Instance for Development](#provisioning-an-instance-for-development)
- [Memory Profiling](#memory-profiling)
  - [GPU Memory Profiling with NVIDIA SMI](#gpu-memory-profiling-with-nvidia-smi)
  - [CPU Memory Profiling with memory_profiler](#cpu-memory-profiling-with-memory_profiler)

# Developer Guide
So you want to contribute code to OpenSearch Remote Vector Index Builder? Excellent! We're glad you're here. Here's what you need to do.
20+
21+
## Getting Started
22+
23+
### Fork Remote Vector Index Builder Repo
Fork [opensearch-project/OpenSearch Remote Vector Index Builder](https://github.com/opensearch-project/remote-vector-index-builder) and clone it locally.

Example:
```
git clone https://github.com/[username]/remote-vector-index-builder.git
```

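It can also help to add the upstream repository as a second remote so your fork stays in sync; a minimal sketch (the remote name `upstream` is just a convention):
```
git remote add upstream https://github.com/opensearch-project/remote-vector-index-builder.git
git fetch upstream
```
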
### Install Prerequisites
#### Python Dependencies

The following commands install dependencies for local development and testing.
The required dependencies are installed onto the Docker image when it is built.

Core Dependencies:
```
pip install -r remote_vector_index_builder/core/requirements.txt
```

Test Dependencies:
```
pip install -r test_remote_vector_index_builder/requirements.txt
```

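To keep these packages isolated from your system Python, you can run the installs inside a virtual environment; a minimal sketch (the `.venv` directory name is just an example):
```
python -m venv .venv
source .venv/bin/activate
pip install -r remote_vector_index_builder/core/requirements.txt
pip install -r test_remote_vector_index_builder/requirements.txt
```
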
## Python Guide
### Language Formatting Guide
Run the following commands from the root folder. Configuration for the tools below can be found in [`setup.cfg`](setup.cfg).

The code lint check can be run with:
```
flake8 remote_vector_index_builder/ test_remote_vector_index_builder/
```

The formatting check can be run with:
```
black --check remote_vector_index_builder/ test_remote_vector_index_builder/
```

The code can be formatted with:
```
black remote_vector_index_builder/ test_remote_vector_index_builder/
```

### Testing Guide
The static type checking can be done with:
```
mypy remote_vector_index_builder/ test_remote_vector_index_builder/
```
The Python tests can be run with:
```
pytest test_remote_vector_index_builder/
```

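Before opening a pull request, it can be convenient to run all of the checks above in one pass; a simple sequence using the commands from this guide:
```
flake8 remote_vector_index_builder/ test_remote_vector_index_builder/
black --check remote_vector_index_builder/ test_remote_vector_index_builder/
mypy remote_vector_index_builder/ test_remote_vector_index_builder/
pytest test_remote_vector_index_builder/
```
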
## Building Docker Images
The GitHub CI workflows automatically publish snapshot images to Docker Hub at [opensearchstaging/remote-vector-index-builder](https://hub.docker.com/r/opensearchstaging/remote-vector-index-builder).

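If you only need the published snapshots rather than a local build, you can pull them directly; the tags shown below mirror the ones produced by the local build commands in this guide, so check Docker Hub for the tags actually published:
```
docker pull opensearchstaging/remote-vector-index-builder:faiss-base-latest
docker pull opensearchstaging/remote-vector-index-builder:core-latest
```
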
The following commands build the images locally:

### Faiss Base Image
The [Faiss repository](https://github.com/facebookresearch/faiss/) is added as a submodule in this repository. Run the command below to initialize the submodule first:
```
git submodule update --init
```
The Faiss base image can only be built on an NVIDIA GPU powered machine with the CUDA Toolkit installed.

See [Provisioning an Instance for Development](#provisioning-an-instance-for-development) for instructions on setting one up.

Run the command below to build the Faiss base image:
```
docker build -f ./base_image/build_scripts/Dockerfile . -t opensearchstaging/remote-vector-index-builder:faiss-base-latest
```

### Core Image
The path [`/remote_vector_index_builder/core`](/remote_vector_index_builder/core/) contains the code for the core index build functionalities:
1. Building an Index
2. Object Store I/O

Build an image with the above core functionalities:
```
docker build -f ./remote_vector_index_builder/core/Dockerfile . -t opensearchstaging/remote-vector-index-builder:core-latest
```

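After the builds complete, you can confirm that both tags exist locally; a quick check:
```
docker images opensearchstaging/remote-vector-index-builder
```
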
## Provisioning an Instance for Development

An NVIDIA GPU powered machine with the CUDA Toolkit installed is required to build the Faiss base image and to run the Docker images that build an index.

Typically, an [EC2 G5](https://aws.amazon.com/ec2/instance-types/g5/) 2xlarge instance running a Deep Learning OSS NVIDIA Driver AMI with the Docker CLI installed is recommended for development use.

[Set up an EC2 Instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html)

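Once the instance is running, it is worth confirming that the GPU driver and Docker are available before building images; a quick sanity check, assuming the Deep Learning OSS NVIDIA Driver AMI mentioned above:
```
# Verify the NVIDIA driver and the visible GPUs
nvidia-smi

# Verify the Docker CLI is installed
docker --version
```
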
## Memory Profiling

Simple memory profiling can be done during development to get memory usage statistics during the Index Build process.

### GPU Memory Profiling with NVIDIA SMI

1. Install [py3nvml](https://pypi.org/project/py3nvml/): In [`/remote_vector_index_builder/core/requirements.txt`](/remote_vector_index_builder/core/requirements.txt) add `py3nvml` on a new line.

2. Add the import statement and initialization call in the file containing the driver code:
```
from py3nvml import nvidia_smi
nvidia_smi.nvmlInit()
```

3. Define and call the method below wherever necessary, e.g. before and after calling the GPU index cleanup method:
```
from py3nvml import nvidia_smi

def get_gpu_memory():
    handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)  # GPU device ID
    info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
    print(f"Total: {info.total/1024**2:.2f}MB")
    print(f"Free: {info.free/1024**2:.2f}MB")
    print(f"Used: {info.used/1024**2:.2f}MB")
```

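If you prefer to watch GPU memory from outside the Python process, `nvidia-smi` itself reports the same counters; one way to monitor them from a separate terminal while the index build runs:
```
# Refresh GPU memory usage every second
watch -n 1 nvidia-smi --query-gpu=memory.used,memory.free,memory.total --format=csv
```
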
### CPU Memory Profiling with memory_profiler

1. Add the command below to [`/remote_vector_index_builder/core/Dockerfile`](/remote_vector_index_builder/core/Dockerfile) to install [memory_profiler](https://pypi.org/project/memory-profiler/):
```
RUN conda install -c conda-forge memory_profiler -y
```

2. In the file that contains the function to be profiled, add the import and an `@profile` decorator on the function:
```
from memory_profiler import profile

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a
```
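
With the decorator in place, memory_profiler prints a line-by-line memory report to stdout whenever the decorated function runs. For a standalone script outside the Docker image, the profiler can also be invoked directly; a minimal example (the script name is just a placeholder):
```
python -m memory_profiler my_script.py
```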
