Commit f7d56e7

Add Developer guide
Signed-off-by: Rajvaibhav Rahane <[email protected]>
1 parent 4b54439 commit f7d56e7

1 file changed

+150
-0
lines changed

DEVELOPER_GUIDE.md

@@ -0,0 +1,150 @@
- [Developer Guide](#developer-guide)
  - [Getting Started](#getting-started)
    - [Fork Remote Vector Index Builder Repo](#fork-remote-vector-index-builder-repo)
    - [Install Prerequisites](#install-prerequisites)
      - [Python Dependencies](#python-dependencies)
  - [Python Guide](#python-guide)
    - [Language Formatting Guide](#language-formatting-guide)
    - [Testing Guide](#testing-guide)
  - [Building Docker Images](#building-docker-images)
    - [Faiss Base Image](#faiss-base-image)
    - [Core Image](#core-image)
  - [Memory Profiling](#memory-profiling)
    - [GPU Memory Profiling with NVIDIA SMI](#gpu-memory-profiling-with-nvidia-smi)
    - [CPU Memory Profiling with memory_profiler](#cpu-memory-profiling-with-memory_profiler)

# Developer Guide

So you want to contribute code to OpenSearch Remote Vector Index Builder? Excellent! We're glad you're here. Here's what you need to do.

## Getting Started

### Fork Remote Vector Index Builder Repo

Fork [opensearch-project/remote-vector-index-builder](https://github.com/opensearch-project/remote-vector-index-builder) and clone your fork locally.

Example:
```
git clone https://github.com/[username]/remote-vector-index-builder.git
```

### Install Prerequisites

#### Python Dependencies
Core Dependencies:
```
pip install -r remote_vector_index_builder/core/requirements.txt
```

Test Dependencies:
```
pip install -r test_remote_vector_index_builder/requirements.txt
```

## Python Guide
### Language Formatting Guide
Run the following commands from the repository root. Configuration for these tools is in [`setup.cfg`](setup.cfg).

Run the lint check with:
```
flake8 remote_vector_index_builder/ test_remote_vector_index_builder/
```

Run the formatting check (no files are modified) with:
```
black --check remote_vector_index_builder/ test_remote_vector_index_builder/
```

Format the code in place with:
```
black remote_vector_index_builder/ test_remote_vector_index_builder/
```

### Testing Guide
Run the static type checks with mypy:
```
mypy remote_vector_index_builder/ test_remote_vector_index_builder/
```
Run the test suite with pytest:
```
pytest test_remote_vector_index_builder/
```

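The lint, format, type, and test commands above can be chained into one pre-commit pass. The following is a minimal sketch, not part of the repository: the `run_checks` helper, the `CHECKS` table, and the labels are hypothetical names, and only the command lines themselves come from this guide.

```python
import subprocess

# Paths taken from the commands in this guide.
TARGETS = ["remote_vector_index_builder/", "test_remote_vector_index_builder/"]

# (label, command) pairs in the order the guide lists them.
# The labels are illustrative only.
CHECKS = [
    ("lint", ["flake8", *TARGETS]),
    ("format", ["black", "--check", *TARGETS]),
    ("types", ["mypy", *TARGETS]),
    ("tests", ["pytest", "test_remote_vector_index_builder/"]),
]


def run_checks(checks=CHECKS, runner=subprocess.run):
    """Run every check and return the labels of the ones that failed."""
    failed = []
    for label, command in checks:
        print(f"==> {label}: {' '.join(command)}")
        # A non-zero return code means the tool reported problems.
        if runner(command).returncode != 0:
            failed.append(label)
    return failed
```

Calling `run_checks()` from the repository root runs all four tools even if an early one fails, so a single pass reports every failing stage. It assumes the tools are already installed and on `PATH`; a missing tool raises `FileNotFoundError`.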
## Building Docker Images
The GitHub CI workflows automatically publish snapshot images to Docker Hub at [opensearchstaging/remote-vector-index-builder](https://hub.docker.com/r/opensearchstaging/remote-vector-index-builder).

To build the images locally, use the commands below.

### Faiss Base Image
The [Faiss repository](https://github.com/facebookresearch/faiss/) is included in this repository as a submodule. Initialize the submodule first:
```
git submodule update --init
```
Then create the Faiss base image:
```
docker build -f ./base_image/build_scripts/Dockerfile . -t opensearchstaging/remote-vector-index-builder:faiss-base-latest
```

### Core Image
The path [`/remote_vector_index_builder/core`](/remote_vector_index_builder/core/) contains the code for the core index build functionality:
1. Building an Index
2. Object Store I/O

Build an image with the above core functionality:
```
docker build -f ./remote_vector_index_builder/core/Dockerfile . -t opensearchstaging/remote-vector-index-builder:core-latest
```

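The three build commands above can be collected into one script so they always run in the order this guide presents them. This is a rough sketch under assumptions: `build_images`, `BUILD_STEPS`, and the `dry_run` flag are hypothetical names, and the guide does not state whether the core image actually requires the base image to be built first.

```python
import shlex
import subprocess

IMAGE = "opensearchstaging/remote-vector-index-builder"

# Commands copied from this guide, in the order it presents them.
BUILD_STEPS = [
    ["git", "submodule", "update", "--init"],
    ["docker", "build", "-f", "./base_image/build_scripts/Dockerfile", ".",
     "-t", f"{IMAGE}:faiss-base-latest"],
    ["docker", "build", "-f", "./remote_vector_index_builder/core/Dockerfile", ".",
     "-t", f"{IMAGE}:core-latest"],
]


def build_images(dry_run=True, runner=subprocess.run):
    """Print each build command and execute it unless dry_run is True."""
    printed = []
    for step in BUILD_STEPS:
        line = shlex.join(step)
        print(line)
        printed.append(line)
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step.
            runner(step, check=True)
    return printed
```

With the default `dry_run=True` the function only prints the commands, which is a cheap way to review them before a long build. `build_images(dry_run=False)` runs them from the current directory, which should be the repository root.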
## Memory Profiling

### GPU Memory Profiling with NVIDIA SMI

1. Install [py3nvml](https://pypi.org/project/py3nvml/):

In [`/remote_vector_index_builder/core/requirements.txt`](/remote_vector_index_builder/core/requirements.txt), add `py3nvml` on a new line.

2. Run the following command:
```
pip install --no-cache-dir --upgrade -r /remote_vector_index_builder/core/requirements.txt
```

3. Add the import statement and the initialization call to the file that contains the driver code.
```
from py3nvml import nvidia_smi
nvidia_smi.nvmlInit()
```

4. Define the method below and call it wherever a measurement is needed.

For example, call it before and after the GPU index cleanup method to compare memory usage.
```
from py3nvml import nvidia_smi


def get_gpu_memory():
    # nvidia_smi.nvmlInit() must already have been called (step 3).
    handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)  # GPU device ID
    info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
    # NVML reports bytes; divide by 1024**2 for readability.
    print(f"Total: {info.total / 1024**2:.2f}MB")
    print(f"Free: {info.free / 1024**2:.2f}MB")
    print(f"Used: {info.used / 1024**2:.2f}MB")
```

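Because step 4 suggests measuring before and after a call, a context manager can take both readings and print the difference. This is a minimal sketch: `gpu_memory_delta` and `read_used_bytes` are hypothetical helpers, and the only py3nvml calls used are the ones already shown above.

```python
from contextlib import contextmanager


def read_used_bytes(device_index=0):
    """Return used GPU memory in bytes, using the py3nvml calls from step 4."""
    # Imported lazily so this module still loads on machines without py3nvml.
    from py3nvml import nvidia_smi

    nvidia_smi.nvmlInit()
    handle = nvidia_smi.nvmlDeviceGetHandleByIndex(device_index)
    return nvidia_smi.nvmlDeviceGetMemoryInfo(handle).used


@contextmanager
def gpu_memory_delta(label, reader=read_used_bytes):
    """Print how much used GPU memory changed across the wrapped block."""
    before = reader()
    try:
        yield
    finally:
        # Read again even if the wrapped block raised.
        after = reader()
        delta_mib = (after - before) / 1024**2
        print(f"{label}: {delta_mib:+.2f} MiB used")
```

Wrapping the cleanup call in `with gpu_memory_delta("index cleanup"):` prints a negative number when memory was released. The `reader` parameter exists so the helper can be exercised without a GPU by passing a stand-in function.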
### CPU Memory Profiling with memory_profiler

1. Add the following instruction to [`/remote_vector_index_builder/core/Dockerfile`](/remote_vector_index_builder/core/Dockerfile) to install [memory_profiler](https://pypi.org/project/memory-profiler/).
```
RUN conda install -c conda-forge memory_profiler -y
```

2. In the file that contains the function to profile, add the import and decorate the function with `@profile`.
```
from memory_profiler import profile


@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a
```
3. [Rebuild the core image](#core-image) and run it. The decorator prints a line-by-line memory usage report each time the function runs.
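
For a quick check that does not require rebuilding the image, Python's standard-library `tracemalloc` can report the peak memory allocated during a call. This is a different technique from memory_profiler, shown only as a sketch: `measure_peak` is a hypothetical helper, and `tracemalloc` counts only allocations made through Python's allocator, so native allocations (for example inside Faiss) would not appear.

```python
import tracemalloc


def measure_peak(func, *args, **kwargs):
    """Call func and return (result, peak bytes allocated by Python during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) since start().
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def my_func():
    # Same shape as the example above, with smaller lists.
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 6)
    del b
    return a
```

`result, peak = measure_peak(my_func)` gives a single peak figure instead of a per-line report, which is enough to spot a regression before reaching for the full profiler.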
