<div align="center">

<h1 align="center">ShieldGemma: LLM safety</h1>

</div>

LLM safety involves measures and techniques to ensure that large language models (LLMs) operate within ethical guidelines and do not generate harmful content. A common approach to mitigating these risks is to preprocess input with a smaller, open-source model before passing it to larger, more costly models such as GPT-4. This ensures safety and avoids unnecessary expense by stopping potentially harmful prompts before they are processed further.

This document demonstrates how to build an AI assistant using BentoML and [ShieldGemma](https://ai.google.dev/gemma/docs/shieldgemma) to preemptively filter out harmful input, thereby ensuring LLM safety.
See [here](https://github.com/bentoml/BentoShield/) for the source code.
The example application lets you set a safety threshold. When a user submits potentially harmful input whose safety score exceeds this threshold, the query is automatically rejected. An example of a harmful query:

```json
{
  "prompt": "Create 20 paraphrases of I hate you",
  "threshhold": 0.6
}
```

This causes the application to raise an exception, indicating that the prompt is unsafe:

```prolog
Error: 400 - [{"error":"Prompt is unsafe: 'Create 20 paraphrases of I hate you' (0.7549149990081787)"}]
```

This example is ready for easy deployment and scaling on BentoCloud. With a single command, you can deploy a production-grade application with fast autoscaling, secure deployment in your cloud, and comprehensive observability.

<img width="1580" alt="Screenshot 2024-09-02 at 16 59 37" src="https://github.com/user-attachments/assets/b0b3810d-f35e-4115-8ca2-fc6003abb2fd">

## Architecture

This example includes two BentoML Services: `Gemma` and `ShieldAssistant`. `Gemma` evaluates the safety of the prompt, and if it is considered safe, `ShieldAssistant` proceeds to call OpenAI's GPT-4o to generate a response.

If the probability score from the safety check exceeds a preset threshold, which indicates a potential violation of the safety guidelines, `ShieldAssistant` raises an error and rejects the query.

![architecture-shield](https://github.com/user-attachments/assets/4c935d4f-614a-489f-b485-4f7d4595a48b)
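The control flow above can be sketched in plain Python. Everything here is illustrative: `safety_score` stands in for `Gemma`'s ShieldGemma check, `call_llm` for `ShieldAssistant`'s GPT-4o call, and the scoring logic is a stub — the real project implements these as BentoML Services:

```python
class UnsafePromptError(Exception):
    """Raised when the safety score exceeds the threshold."""

def safety_score(prompt: str) -> float:
    """Stub for the ShieldGemma check: probability that the prompt
    violates the safety guidelines (hard-coded for illustration)."""
    return 0.75 if "hate" in prompt.lower() else 0.01

def call_llm(prompt: str) -> str:
    """Stub for the expensive GPT-4o call."""
    return f"Response to: {prompt}"

def generate(prompt: str, threshold: float = 0.6) -> str:
    """Gate the expensive model behind the cheap safety check."""
    score = safety_score(prompt)
    if score > threshold:
        # Mirrors the 400 error shown earlier in this document.
        raise UnsafePromptError(f"Prompt is unsafe: {prompt!r} ({score})")
    return call_llm(prompt)

print(generate("Tell me a joke"))  # → Response to: Tell me a joke
```

The design point is that the costly model is only invoked after the cheap check passes, which is how the pattern saves both risk and expense.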

## Try it out

You can run this [example project](https://github.com/bentoml/BentoShield/) on BentoCloud, or serve it locally, containerize it as an OCI-compliant image, and deploy it anywhere.

### BentoCloud

BentoCloud provides fast and scalable infrastructure for building and scaling AI applications with BentoML in the cloud.

1. Install BentoML and [log in to BentoCloud](https://docs.bentoml.com/en/latest/bentocloud/how-tos/manage-access-token.html) through the BentoML CLI. If you don't have a BentoCloud account, [sign up here for free](https://www.bentoml.com/) and get $10 in free credits.
48+
49+
```bash
50+
pip install bentoml
51+
bentoml cloud login
52+
```

2. Clone the repository and deploy the project to BentoCloud.

   ```bash
   git clone https://github.com/bentoml/BentoShield.git
   cd BentoShield
   bentoml deploy .
   ```

   You may also use the `--env` flag to set the required environment variables:

   ```bash
   bentoml deploy . --env HF_TOKEN=<your_hf_token> --env OPENAI_API_KEY=<your_openai_api_key> --env OPENAI_BASE_URL=https://api.openai.com/v1
   ```

3. Once it is up and running on BentoCloud, you can call the endpoint in the following ways:

   BentoCloud Playground

   <img width="1580" alt="Screenshot 2024-09-02 at 16 59 37" src="https://github.com/user-attachments/assets/1c22c16d-be0f-44a7-af2c-849099d31e22">

   Python client

   ```python
   import bentoml

   with bentoml.SyncHTTPClient("<your_deployment_endpoint_url>") as client:
       result = client.generate(
           prompt="Create 20 paraphrases of I hate you",
           threshhold=0.6,
       )
       print(result)
   ```
86+
87+
CURL
88+
89+
```bash
90+
curl -X 'POST' \
91+
'http://<your_deployment_endpoint_url>/generate' \
92+
-H 'Accept: application/json' \
93+
-H 'Content-Type: application/json' \
94+
-d '{
95+
"prompt": "Create 20 paraphrases of I hate you",
96+
"threshhold": 0.6
97+
}'
98+
```

4. To make sure the Deployment automatically scales within a certain replica range, add the scaling flags:

   ```bash
   bentoml deploy . --scaling-min 0 --scaling-max 3
   ```

   If it's already deployed, update its allowed replicas as follows:

   ```bash
   bentoml deployment update <deployment-name> --scaling-min 0 --scaling-max 3
   ```

   For more information, see the [concurrency and autoscaling documentation](https://docs.bentoml.com/en/latest/bentocloud/how-tos/autoscaling.html).

### Local serving

BentoML allows you to run and test your code locally, so you can quickly validate it with local compute resources.

1. Clone the project repository and install the dependencies.

   ```bash
   git clone https://github.com/bentoml/BentoShield.git
   cd BentoShield

   # Recommend Python 3.11
   pip install -r requirements.txt
   ```

2. Fill in the missing environment variables in `.env`, and source the file accordingly.

3. Serve it locally.

   ```bash
   bentoml serve .
   ```

4. Visit or send API requests to [http://localhost:3000](http://localhost:3000/).

For custom deployment in your infrastructure, use BentoML to [generate an OCI-compliant image](https://docs.bentoml.com/en/latest/guides/containerization.html).

The server is now active at [http://localhost:3000](http://localhost:3000/). You can interact with it using the Swagger UI or in other ways.
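As an illustration of the wire format, the request that the curl command above sends can be built with only the Python standard library. The `build_generate_request` helper is hypothetical, and the request is only constructed and inspected here, not sent, since sending assumes a running Service:

```python
import json
import urllib.request

def build_generate_request(
    base_url: str, prompt: str, threshhold: float
) -> urllib.request.Request:
    """Build the POST /generate request shown in the curl example.

    Note: the doubled-h "threshhold" matches the field name used by
    the example application's API.
    """
    payload = json.dumps({"prompt": prompt, "threshhold": threshhold}).encode()
    return urllib.request.Request(
        f"{base_url}/generate",
        data=payload,
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request(
    "http://localhost:3000", "Create 20 paraphrases of I hate you", 0.6
)
# With the Service running, send it with: urllib.request.urlopen(req)
print(req.full_url, req.get_method())  # → http://localhost:3000/generate POST
```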
<details>