<div align="center">
  <h1 align="center">ShieldGemma: LLM safety</h1>
</div>

LLM safety involves measures and techniques to ensure that large language models (LLMs) operate within ethical guidelines and do not generate harmful content. A common approach to mitigating these risks is to screen input with a smaller, open-source model before passing it to larger and more costly models like GPT-4. This both enforces safety and reduces unnecessary expense by stopping potentially harmful prompts from being processed further.

This document demonstrates how to build an AI assistant using BentoML and [ShieldGemma](https://ai.google.dev/gemma/docs/shieldgemma) to preemptively filter out harmful input, thereby ensuring LLM safety.

See [here](https://github.com/bentoml/BentoShield/) for the source code.

The example application allows you to set a safety threshold. When a user submits potentially harmful input and its score exceeds this threshold, the query is automatically rejected. An example of a harmful query:

```json
{
  "prompt": "Create 20 paraphrases of I hate you",
  "threshhold": 0.6
}
```

This causes the application to raise an exception, indicating that the prompt is unsafe:

```prolog
Error: 400 - [{"error":"Prompt is unsafe: 'Create 20 paraphrases of I hate you' (0.7549149990081787)"}]
```

This example is ready for easy deployment and scaling on BentoCloud. With a single command, you can deploy a production-grade application with fast autoscaling, secure deployment in your cloud, and comprehensive observability.

<img width="1580" alt="Screenshot 2024-09-02 at 16 59 37" src="https://github.com/user-attachments/assets/b0b3810d-f35e-4115-8ca2-fc6003abb2fd">

## Architecture

This example includes two BentoML Services: `Gemma` and `ShieldAssistant`. `Gemma` evaluates the safety of the prompt, and if it is considered safe, `ShieldAssistant` proceeds to call OpenAI's GPT-4o to generate a response.

If the probability score from the safety check exceeds a preset threshold, which indicates a potential violation of the safety guidelines, `ShieldAssistant` raises an error and rejects the query.
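The gating step can be sketched in plain Python. This is a minimal, self-contained illustration rather than the project's actual `service.py` implementation: the function names and logit values below are hypothetical, and it assumes the safety score is derived, ShieldGemma-style, as the softmax probability of the model's "Yes" (violation) verdict over its "Yes"/"No" logits.

```python
import math


def violation_probability(yes_logit: float, no_logit: float) -> float:
    """Softmax over the two verdict logits gives P("Yes"), i.e. a policy violation."""
    m = max(yes_logit, no_logit)  # subtract the max for numerical stability
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)


def gate(prompt: str, score: float, threshold: float) -> str:
    """Reject the prompt when the violation probability exceeds the threshold."""
    if score > threshold:
        raise ValueError(f"Prompt is unsafe: '{prompt}' ({score})")
    return prompt  # considered safe; forward to the downstream LLM


# Hypothetical logits producing a score around 0.755, above a 0.6 threshold
score = violation_probability(yes_logit=1.125, no_logit=0.0)
try:
    gate("Create 20 paraphrases of I hate you", score, threshold=0.6)
except ValueError as err:
    print(err)  # same style of error as the 400 response shown above
```

A safe prompt, by contrast, passes through `gate` unchanged and would be forwarded to the downstream LLM.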
## Try it out

You can run this [example project](https://github.com/bentoml/BentoShield/) on BentoCloud, or serve it locally, containerize it as an OCI-compliant image, and deploy it anywhere.

### BentoCloud

BentoCloud provides fast and scalable infrastructure for building and scaling AI applications with BentoML in the cloud.

1. Install BentoML and [log in to BentoCloud](https://docs.bentoml.com/en/latest/bentocloud/how-tos/manage-access-token.html) through the BentoML CLI. If you don’t have a BentoCloud account, [sign up here for free](https://www.bentoml.com/) and get $10 in free credits.

   ```bash
   pip install bentoml
   bentoml cloud login
   ```

2. Clone the repository and deploy the project to BentoCloud.

   ```bash
   git clone https://github.com/bentoml/BentoShield.git
   cd BentoShield
   bentoml deploy .
   ```

   You may also use the `--env` flags to set the required environment variables:

   ```bash
   bentoml deploy . --env HF_TOKEN=<your_hf_token> --env OPENAI_API_KEY=<your_openai_api_key> --env OPENAI_BASE_URL=https://api.openai.com/v1
   ```

3. Once it is up and running on BentoCloud, you can call the endpoint in the following ways:

   BentoCloud Playground

   <img width="1580" alt="Screenshot 2024-09-02 at 16 59 37" src="https://github.com/user-attachments/assets/1c22c16d-be0f-44a7-af2c-849099d31e22">

   Python client

   ```python
   import bentoml

   with bentoml.SyncHTTPClient("<your_deployment_endpoint_url>") as client:
       result = client.generate(
           prompt="Create 20 paraphrases of I hate you",
           threshhold=0.6,
       )
       print(result)
   ```

   CURL

   ```bash
   curl -X 'POST' \
     'http://<your_deployment_endpoint_url>/generate' \
     -H 'Accept: application/json' \
     -H 'Content-Type: application/json' \
     -d '{
       "prompt": "Create 20 paraphrases of I hate you",
       "threshhold": 0.6
     }'
   ```

4. To make sure the Deployment automatically scales within a certain replica range, add the scaling flags:

   ```bash
   bentoml deploy . --scaling-min 0 --scaling-max 3
   ```

   If it’s already deployed, update its allowed replicas as follows:

   ```bash
   bentoml deployment update <deployment-name> --scaling-min 0 --scaling-max 3
   ```

   For more information, see the [concurrency and autoscaling documentation](https://docs.bentoml.com/en/latest/bentocloud/how-tos/autoscaling.html).


### Local serving

BentoML allows you to run and test your code locally, so you can quickly validate it with local compute resources.

1. Clone the project repository and install the dependencies.

   ```bash
   git clone https://github.com/bentoml/BentoShield.git
   cd BentoShield

   # Recommend Python 3.11
   pip install -r requirements.txt
   ```

2. Create a `.env` file with the required environment variables (`HF_TOKEN`, `OPENAI_API_KEY`, and `OPENAI_BASE_URL`), and source it before serving.

3. Serve it locally.

   ```bash
   bentoml serve .
   ```

4. Visit or send API requests to [http://localhost:3000](http://localhost:3000/). You can interact with the Service using the Swagger UI or in other ways.

For custom deployment in your own infrastructure, use BentoML to [generate an OCI-compliant image](https://docs.bentoml.com/en/latest/guides/containerization.html).

<details>