Commit d91a42c

Added API documentation
Signed-off-by: Rohan Chitale <[email protected]>
1 parent a922f0a commit d91a42c

1 file changed

+83
-6
lines changed

DEVELOPER_GUIDE.md

+83-6
@@ -41,6 +41,11 @@ Core Dependencies:
4141
pip install -r remote_vector_index_builder/core/requirements.txt
4242
```
4343

44+
API Dependencies:
45+
```
46+
pip install -r remote_vector_index_builder/app/requirements.txt
47+
```
48+
4449
Test Dependencies:
4550
```
4651
pip install -r test_remote_vector_index_builder/requirements.txt
@@ -76,6 +81,21 @@ pytest test_remote_vector_index_builder/
7681
```
7782

7883
## Building Docker Images
84+
85+
This repository vends three images:
86+
- `faiss-base`
87+
- Uses NVIDIA CUDA image as a base image
88+
- Adds the dependencies needed to install `faiss`, and builds `faiss` from source
89+
- `core`
90+
- Uses `faiss-base` as a base image
91+
- Adds the code for the core index build functionalities: building an index and remote store I/O
92+
- `api`
93+
- Uses `core` as a base image
94+
- Adds the code for a `FastAPI` server with `_build` and `_status` APIs
95+
- The `_build` API triggers an index build workflow and returns a job id to the caller
96+
- The `_status` API gets the status of the workflow, given a job id
97+
- The index build workflow is executed in the background, using the `core` library functions
98+
7999
The GitHub CI workflows automatically publish snapshot images to Docker Hub at [opensearchstaging/remote-vector-index-builder](https://hub.docker.com/r/opensearchstaging/remote-vector-index-builder).
80100

81101
The following are the commands to build the images locally:
@@ -85,25 +105,38 @@ The [Faiss repository](https://github.com/facebookresearch/faiss/) is added as a
85105
```
86106
git submodule update --init
87107
```
88-
The Faiss base image can only be created on an NVIDIA GPU powered machine with CUDA Toolkit installed.
108+
The `faiss-base` image can only be created on an NVIDIA GPU-powered machine with the CUDA Toolkit installed.
89109

90110
Please see the section [Provisioning an instance for development](#provisioning-an-instance-for-development) to provision an instance for development.
91111

92-
Run the below command to create the Faiss base image:
112+
Run the command below to create the `faiss-base` image:
93113
```
94114
docker build -f ./base_image/build_scripts/Dockerfile . -t opensearchstaging/remote-vector-index-builder:faiss-base-latest
95115
```
96116

97117
### Core Image
98-
The path [`/remote-vector-index-builder/core`](/remote_vector_index_builder/core/) contains the code for core index build functionalities:
99-
1. Building an Index
100-
2. Object Store I/O
118+
The path [`/remote_vector_index_builder/core`](/remote_vector_index_builder/core/) contains the code for the `core` image.
119+
Run the command below to create the `core` image:
101120

102-
Build an image with the above core functionalities:
103121
```
104122
docker build -f ./remote_vector_index_builder/core/Dockerfile . -t opensearchstaging/remote-vector-index-builder:core-latest
105123
```
106124

125+
The image can be built on any type of machine with `docker` installed.
126+
127+
### API Image
128+
The path [`/remote_vector_index_builder/app`](/remote_vector_index_builder/app/) contains the code for the `api` image.
129+
Run the command below to create the `api` image:
130+
131+
```
132+
docker build -f ./remote_vector_index_builder/app/Dockerfile . -t opensearchstaging/remote-vector-index-builder:api-latest
133+
```
134+
135+
The image can be built on any type of machine with `docker` installed.
136+
137+
Note that any Docker image tag can be used when building locally. If you plan to push the image to your own Docker Hub account,
138+
tag it with your personal Docker Hub name instead.
139+
107140
## Provisioning an instance for development
108141

109142
An NVIDIA GPU-powered machine with the CUDA Toolkit installed is required to build a Faiss base image and to run the Docker images to build an index.
@@ -112,6 +145,49 @@ Typically an [EC2 G5](https://aws.amazon.com/ec2/instance-types/g5/) 2xlarge ins
112145

113146
[Setup an EC2 Instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html)
114147

148+
## Running Docker Images
149+
150+
### Run the API Image as a standalone container
151+
The API image implements the [Remote Vector Service API Contract](/API.md). It provides out-of-the-box APIs
152+
that directly integrate with the OpenSearch k-NN Remote Vector Service Client. The default image
153+
configuration allows it to be used to run integration tests for the OpenSearch k-NN Remote Vector component.
154+
This configuration maintains the job state in an in-memory dictionary with a TTL, and uses a fixed size
155+
thread pool to execute index build workflows.
156+
157+
158+
The dictionary size and TTL are configurable via the Docker container settings. For example, you can set the TTL
159+
to `None`, to ensure requests never get deleted. You are also free to implement a separate, custom API image,
160+
as long as it conforms to the Remote Vector Service API Contract and provides endpoints for the Remote Vector Service Client.
161+
This custom API image can still use the `core` image libraries to execute the index build workflow.
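
The job-state handling described above can be illustrated with a minimal sketch. This is not the repository's implementation: the `JobStore` class, the `submit_build` helper, and the `RUNNING`/`COMPLETED`/`FAILED` status names are all hypothetical, chosen only to show how a size-capped dictionary with an optional TTL can sit next to a fixed-size thread pool.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Optional, Tuple


class JobStore:
    """Hypothetical in-memory job-state store with a size cap and optional TTL."""

    def __init__(self, max_size: int, ttl_seconds: Optional[float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_size = max_size
        self._ttl = ttl_seconds  # None means entries are never deleted
        self._clock = clock
        self._lock = threading.Lock()
        self._jobs: Dict[str, Tuple[str, float]] = {}  # job_id -> (status, created_at)

    def _evict_expired(self) -> None:
        # Caller must hold the lock.
        if self._ttl is None:
            return
        now = self._clock()
        expired = [job_id for job_id, (_, created) in self._jobs.items()
                   if now - created >= self._ttl]
        for job_id in expired:
            del self._jobs[job_id]

    def add(self, job_id: str, status: str) -> bool:
        """Store a new job; return False if the store is full."""
        with self._lock:
            self._evict_expired()
            if job_id not in self._jobs and len(self._jobs) >= self._max_size:
                return False
            self._jobs[job_id] = (status, self._clock())
            return True

    def update(self, job_id: str, status: str) -> None:
        """Change the status of a known job, keeping its creation time."""
        with self._lock:
            if job_id in self._jobs:
                _, created = self._jobs[job_id]
                self._jobs[job_id] = (status, created)

    def get(self, job_id: str) -> Optional[str]:
        """Return the job status, or None if unknown or expired."""
        with self._lock:
            self._evict_expired()
            entry = self._jobs.get(job_id)
            return entry[0] if entry else None


def submit_build(store: JobStore, pool: ThreadPoolExecutor, job_id: str,
                 workflow: Callable[[], None]) -> Optional[Future]:
    """Register the job, then run the workflow in the background pool."""
    if not store.add(job_id, "RUNNING"):
        return None  # store is full; a real server would reject the request

    def run() -> None:
        try:
            workflow()
            store.update(job_id, "COMPLETED")
        except Exception:
            store.update(job_id, "FAILED")

    return pool.submit(run)
```

In this sketch, passing `ttl_seconds=None` mirrors the "never get deleted" setting, and `ThreadPoolExecutor(max_workers=n)` plays the role of the fixed-size thread pool. A custom API image could swap the dictionary for an external store without changing the calling code.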
162+
163+
Follow the steps below to run the API image locally. Note that S3 is currently the only supported remote store repository.
164+
1. [Provision an instance for development](#provisioning-an-instance-for-development)
165+
2. Create an S3 bucket and upload the vector and doc id binaries to it
166+
3. Ensure the instance has AWS credentials to connect to the S3 bucket
167+
- Use any of options 3-11 from the boto3 credentials guide: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#configuring-credentials
168+
4. Clone the repository to the instance: `git clone https://github.com/opensearch-project/remote-vector-index-builder.git`
169+
5. Change into the top-level directory: `cd remote-vector-index-builder`
170+
6. Build the docker image: `docker build -f ./remote_vector_index_builder/app/Dockerfile . -t opensearchstaging/remote-vector-index-builder:api-latest`
171+
- Note that any image tag can be used, not just `opensearchstaging/remote-vector-index-builder:api-latest`
172+
7. Run the docker image: `docker run --gpus all -p 80:1025 opensearchstaging/remote-vector-index-builder:api-latest`
173+
8. In a separate terminal, issue a build request:
174+
```
175+
curl -XPOST "http://0.0.0.0:80/_build" \
176+
-H 'Content-Type: application/json' \
177+
-d '
178+
{
179+
"repository_type": "s3",
180+
"container_name": "<your_s3_bucket>",
181+
"vector_path": "<vector_path_in_s3_bucket>",
182+
"doc_id_path": "<doc_id_path_in_s3_bucket>",
183+
"dimension": "<vector dimension>",
184+
"doc_count": "<number of vectors>"
185+
}
186+
'
187+
```
188+
If the build request is submitted successfully, the response contains a job id.
189+
9. Check the status of your build request: `curl -XGET "http://0.0.0.0:80/_status/<job_id>"`
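
Steps 8 and 9 can also be scripted. The sketch below uses only the Python standard library and makes several assumptions that are not confirmed by this guide: the response key `job_id`, the status key `task_status`, the in-progress value `RUNNING_INDEX_BUILD`, and sending `dimension` and `doc_count` as integers. Check the [Remote Vector Service API Contract](/API.md) for the actual field names before relying on it.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

BASE_URL = "http://localhost:80"  # host port from the `-p 80:1025` mapping in step 7

# Assumed response fields; verify against the API contract.
JOB_ID_KEY = "job_id"
STATUS_KEY = "task_status"
RUNNING_VALUE = "RUNNING_INDEX_BUILD"


def http_json(method: str, url: str, body: Optional[dict] = None) -> Dict[str, Any]:
    """Send a JSON request and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def build_payload(bucket: str, vector_path: str, doc_id_path: str,
                  dimension: int, doc_count: int) -> Dict[str, Any]:
    """Assemble the request body shown in step 8."""
    return {
        "repository_type": "s3",
        "container_name": bucket,
        "vector_path": vector_path,
        "doc_id_path": doc_id_path,
        "dimension": dimension,
        "doc_count": doc_count,
    }


def submit_and_wait(payload: Dict[str, Any],
                    send: Callable[..., Dict[str, Any]] = http_json,
                    poll_seconds: float = 5.0,
                    max_polls: int = 120,
                    sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """POST to _build, then poll _status until the job leaves the running state."""
    job_id = send("POST", f"{BASE_URL}/_build", payload)[JOB_ID_KEY]
    for _ in range(max_polls):
        status = send("GET", f"{BASE_URL}/_status/{job_id}")
        if status.get(STATUS_KEY) != RUNNING_VALUE:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

A call such as `submit_and_wait(build_payload("<your_s3_bucket>", "<vector_path_in_s3_bucket>", "<doc_id_path_in_s3_bucket>", 768, 1000))` would then return the final status document; the dimension and count here are placeholder values. The `send` and `sleep` parameters exist so the polling logic can be exercised without a running container.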
190+
115191
## Memory Profiling
116192
117193
Simple memory profiling can be done during development to get memory usage statistics during the Index Build process.
@@ -157,3 +233,4 @@ def my_func():
157233
del b
158234
return a
159235
```
236+
