
Commit 7cd1995 (parent 4b54439): Add Developer guide

Signed-off-by: Rajvaibhav Rahane <[email protected]>

1 file changed: +163 -0 lines
DEVELOPER_GUIDE.md

- [Developer Guide](#developer-guide)
  - [Getting Started](#getting-started)
    - [Fork Remote Vector Index Builder Repo](#fork-remote-vector-index-builder-repo)
    - [Install Prerequisites](#install-prerequisites)
      - [Python Dependencies](#python-dependencies)
  - [Python Guide](#python-guide)
    - [Language Formatting Guide](#language-formatting-guide)
    - [Testing Guide](#testing-guide)
  - [Building Docker Images](#building-docker-images)
    - [Faiss Base Image](#faiss-base-image)
    - [Core Image](#core-image)
  - [Provisioning an instance for development](#provisioning-an-instance-for-development)
  - [Memory Profiling](#memory-profiling)
    - [GPU Memory Profiling with NVIDIA SMI](#gpu-memory-profiling-with-nvidia-smi)
    - [CPU Memory Profiling with memory_profiler](#cpu-memory-profiling-with-memory_profiler)

# Developer Guide

So you want to contribute code to OpenSearch Remote Vector Index Builder? Excellent! We're glad you're here. Here's what you need to do.

## Getting Started

### Fork Remote Vector Index Builder Repo

Fork [opensearch-project/remote-vector-index-builder](https://github.com/opensearch-project/remote-vector-index-builder) and clone it locally.

Example:
```
git clone https://github.com/[username]/remote-vector-index-builder.git
```

### Install Prerequisites

#### Python Dependencies

Core dependencies:
```
pip install -r remote_vector_index_builder/core/requirements.txt
```

Test dependencies:
```
pip install -r test_remote_vector_index_builder/requirements.txt
```

## Python Guide

### Language Formatting Guide

Run the following commands from the root folder. The configuration for the tools below can be found in [`setup.cfg`](setup.cfg).

The lint check can be run with:
```
flake8 remote_vector_index_builder/ test_remote_vector_index_builder/
```

The formatting check can be run with:
```
black --check remote_vector_index_builder/ test_remote_vector_index_builder/
```

The code can be formatted with:
```
black remote_vector_index_builder/ test_remote_vector_index_builder/
```

### Testing Guide

Static type checking can be run with:
```
mypy remote_vector_index_builder/ test_remote_vector_index_builder/
```

The Python tests can be run with:
```
pytest test_remote_vector_index_builder/
```

## Building Docker Images

The GitHub CI automatically publishes snapshot images to Docker Hub at [opensearchstaging/remote-vector-index-builder](https://hub.docker.com/r/opensearchstaging/remote-vector-index-builder).

The following commands build the images locally:

### Faiss Base Image

The [Faiss repository](https://github.com/facebookresearch/faiss/) is included as a submodule of this repository. Initialize the submodule first:
```
git submodule update --init
```

The Faiss base image can only be built on an NVIDIA GPU-powered machine with the CUDA Toolkit installed.

Please see the section [Provisioning an instance for development](#provisioning-an-instance-for-development) to provision an instance for development.

Build the Faiss base image with:
```
docker build -f ./base_image/build_scripts/Dockerfile . -t opensearchstaging/remote-vector-index-builder:faiss-base-latest
```

### Core Image

The path [`/remote_vector_index_builder/core`](/remote_vector_index_builder/core/) contains the code for the core index build functionality:

1. Building an index
2. Object store I/O

Build an image with the above core functionality:
```
docker build -f ./remote_vector_index_builder/core/Dockerfile . -t opensearchstaging/remote-vector-index-builder:core-latest
```

## Provisioning an instance for development

An NVIDIA GPU-powered machine with the CUDA Toolkit installed is required to build the Faiss base image and to run the Docker images that build an index.

An [EC2 G5](https://aws.amazon.com/ec2/instance-types/g5/) `g5.2xlarge` instance running a Deep Learning OSS NVIDIA Driver AMI, with the Docker CLI installed, is recommended for development use.

[Set up an EC2 instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html)

## Memory Profiling

Simple memory profiling can be done during development to collect memory usage statistics during the index build process.

### GPU Memory Profiling with NVIDIA SMI

1. Install [py3nvml](https://pypi.org/project/py3nvml/): add `py3nvml` on a new line in [`/remote_vector_index_builder/core/requirements.txt`](/remote_vector_index_builder/core/requirements.txt).

2. Run the following command:
```
pip install --no-cache-dir --upgrade -r /remote_vector_index_builder/core/requirements.txt
```

3. Add the import statement and the initialization call to the file containing the driver code:
```
from py3nvml import nvidia_smi

nvidia_smi.nvmlInit()
```

4. Define the method below and call it wherever necessary, e.g. before and after calling the GPU index cleanup method:
```
from py3nvml import nvidia_smi

def get_gpu_memory():
    handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)  # GPU device ID
    info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
    print(f"Total: {info.total/1024**2:.2f}MB")
    print(f"Free: {info.free/1024**2:.2f}MB")
    print(f"Used: {info.used/1024**2:.2f}MB")
```

### CPU Memory Profiling with memory_profiler

1. Add the command below to [`/remote_vector_index_builder/core/Dockerfile`](/remote_vector_index_builder/core/Dockerfile) to install [memory_profiler](https://pypi.org/project/memory-profiler/):
```
RUN conda install -c conda-forge memory_profiler -y
```

2. In the file containing the function to be profiled, add the import and an `@profile` decorator on the function:
```
from memory_profiler import profile

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a
```
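
If rebuilding the image just to install memory_profiler is inconvenient, Python's built-in `tracemalloc` module offers a quick dependency-free alternative for spot checks. The `snapshot_peak` helper below is a hypothetical sketch, not part of this repository; it reports current and peak heap usage for a single call:

```python
import tracemalloc


def snapshot_peak(func, *args, **kwargs):
    """Run func and report current/peak traced heap usage via tracemalloc."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        current, peak = tracemalloc.get_traced_memory()  # both in bytes
    finally:
        tracemalloc.stop()
    print(f"current: {current / 1024**2:.2f}MB, peak: {peak / 1024**2:.2f}MB")
    return result


def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a


result = snapshot_peak(my_func)
```

Unlike memory_profiler, this reports only Python heap allocations (not total process RSS), so treat the numbers as relative indicators rather than absolute memory footprints.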
