
Commit aa21e52

v1.1 Updated Doco (#182)
* Updated Doco for 1.1
* Delete Q&A
1 parent deb84e4 commit aa21e52

27 files changed: +226 -63 lines

.github/workflows/pytest.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -13,6 +13,7 @@ on:

 jobs:
   check:
+    if: github.event.pull_request.draft == false
     runs-on: ubuntu-latest
     services:
       docker:
```

docs/content/client/chatbot/_index.md

Lines changed: 19 additions & 5 deletions
```diff
@@ -43,14 +43,28 @@ Once you've selected a model, you can change the different model parameters to h

 For more details on the parameters, ask the Chatbot or review [Concepts for Generative AI](https://docs.oracle.com/en-us/iaas/Content/generative-ai/concepts.htm).

-## Retrieval Augmented Generation (RAG)
+## Toolkit

-Once you've created embeddings using [Split/Embed](../tools/split_embed), the option to enable and disable RAG will be available. Once you've enabled RAG, if you have more than one [Vector Store](#vector-store) you will need select the one you want to work with.
+The {{< short_app_ref >}} provides tools to augment Large Language Models with your proprietary data using Retrieval Augmented Generation (**RAG**), including:
+
+* [Vector Search](#vector-search) for Unstructured Data
+* [SelectAI](#selectai) for Structured Data

-![Chatbot RAG](images/chatbot_rag.png)
+
+## Vector Search
+
+Once you've created embeddings using [Split/Embed](../tools/split_embed), the option to use Vector Search will be available. After selecting Vector Search, if you have more than one [Vector Store](#vector-store) you will need to select the one you want to work with.
+
+![Chatbot Vector Search](images/chatbot_vs.png)

 Choose the type of Search you want performed and the additional parameters associated with that search.

-## Vector Store
+### Vector Store
+
+With Vector Search selected, if you have more than one Vector Store, you can select which one will be used for searching; otherwise it will default to the only one available. To choose a different Vector Store, click the "Reset" button to open up the available options.
```
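To relate the Vector Search option above to what happens in the database, here is a minimal sketch of a similarity search against an Oracle Database 23ai vector store using python-oracledb. It is illustrative only, not the application's own code: the `my_vector_store` table, its `text`/`embedding` columns, and the pre-computed query embedding are assumptions.

```python
# Minimal, illustrative sketch: retrieve the chunks nearest to a question from
# an Oracle Database 23ai vector store. Table and column names are hypothetical.
import json

import oracledb


def vector_search(conn: oracledb.Connection, query_vec: list[float], top_k: int = 4) -> list[str]:
    """Return the top_k chunk texts closest to the query embedding."""
    sql = """
        SELECT text
          FROM my_vector_store
         ORDER BY VECTOR_DISTANCE(embedding, TO_VECTOR(:qv), COSINE)
         FETCH FIRST :k ROWS ONLY"""
    with conn.cursor() as cur:
        # TO_VECTOR() accepts the textual form of the embedding, e.g. "[0.12, -0.03, ...]"
        cur.execute(sql, qv=json.dumps(query_vec), k=top_k)
        return [row[0] for row in cur.fetchall()]
```

The retrieved chunks are what ends up in the prompt's context; the search type and its additional parameters tune this retrieval step.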
```diff
+## SelectAI
+
+Once you've [configured SelectAI](https://docs.oracle.com/en-us/iaas/autonomous-database-serverless/doc/select-ai-get-started.html#GUID-E9872607-42A6-43FA-9851-7B60430C21B7), the option to use SelectAI will be available. After selecting the SelectAI toolkit, a profile and the default narrate option will automatically be selected. If you have more than one profile, you can choose which one to use. You can also select different SelectAI actions.

-With RAG enabled, if you have more than one Vector Store, you can select which one will be used for searching, otherwise it will default to the only one available. To choose a different Vector Store, click the "Reset" button to open up the available options.
+![Chatbot SelectAI](images/chatbot_selectai.png)
```
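For structured data, SelectAI delegates the work to the Autonomous Database. As a rough sketch (an assumption about usage, not the application's implementation), a narrate-style request can be issued through the `DBMS_CLOUD_AI.GENERATE` function; the profile name and prompt below are placeholders.

```python
# Rough, illustrative sketch: ask SelectAI to answer in natural language
# ("narrate" action) using a previously configured AI profile.
# The profile name and prompt are placeholders.
import oracledb


def select_ai_narrate(conn: oracledb.Connection, prompt: str, profile: str = "MY_PROFILE") -> str:
    with conn.cursor() as cur:
        cur.execute(
            """SELECT DBMS_CLOUD_AI.GENERATE(
                          prompt       => :prompt,
                          profile_name => :profile,
                          action       => 'narrate')
                 FROM dual""",
            prompt=prompt,
            profile=profile,
        )
        (answer,) = cur.fetchone()
        # The function returns a CLOB; read it into a Python string.
        return answer.read() if hasattr(answer, "read") else str(answer)
```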
Binary image files changed (not shown).

docs/content/client/testbed/_index.md

Lines changed: 81 additions & 4 deletions
```diff
@@ -3,10 +3,87 @@ title = '🧪 Testbed'
 weight = 30
 +++
 <!--
-Copyright (c) 2024, 2025, Oracle and/or its affiliates.
+Copyright (c) 2023, 2024, Oracle and/or its affiliates.
 Licensed under the Universal Permissive License v1.0 as shown at http://oss.oracle.com/licenses/upl.
 -->
+Generating a Test Dataset of Q&A pairs with an external LLM accelerates the testing phase. The {{< full_app_ref >}} integrates with [Giskard](https://www.giskard.ai/), a framework designed for this purpose. Giskard analyzes the documents to identify high-level topics, relates them to the generated Q&A pairs, and includes them in the Test Dataset. All Test Sets and Evaluations are stored in the database for future evaluations and reviews.

-{{% notice style="code" title="10-Sept-2024: Documentation In-Progress..." icon="pen" %}}
-Thank you for your patience as we work on updating the documentation. Please check back soon for the latest updates.
-{{% /notice %}}
+![Generation](images/generation.png)
+
+This generation phase is optional but often recommended to reduce the cost of a proof-of-concept, as manually creating test data requires significant human effort.
+
+After generation, the questions are sent to the configured agent. Each answer is collected and compared to the expected answer by an LLM acting as a judge. The judge classifies the responses and provides a justification for each decision, as shown in the following diagram.
+
+![Test](images/test.png)
```
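The judging step described in the hunk above is conceptually simple: for each question, the agent's answer and the reference answer are handed to a judge LLM, which returns a verdict and a justification. The sketch below is a generic illustration of that pattern, assuming a hypothetical `judge_llm(prompt) -> str` callable that returns JSON; the application itself delegates the real evaluation to Giskard.

```python
# Illustrative sketch of the LLM-as-judge comparison. judge_llm is a
# hypothetical callable returning JSON such as {"correct": true, "reason": "..."}.
import json
from typing import Callable

JUDGE_PROMPT = """You are grading a RAG application.
Question: {question}
Reference answer: {reference_answer}
Agent answer: {agent_answer}
Reply with JSON: {{"correct": true|false, "reason": "<short justification>"}}"""


def judge_answer(qa: dict, agent_answer: str, judge_llm: Callable[[str], str]) -> dict:
    """Classify one agent answer against its reference answer."""
    verdict = json.loads(judge_llm(JUDGE_PROMPT.format(
        question=qa["question"],
        reference_answer=qa["reference_answer"],
        agent_answer=agent_answer,
    )))
    return {
        "agent_answer": agent_answer,
        "correctness": bool(verdict["correct"]),
        "correctness_reason": verdict.get("reason", ""),
    }
```

The three fields returned here mirror the columns that the evaluation report later adds to each Q&A (see the Evaluation section further below).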
```diff
+## Generation
+From the Testbed page, switch to **Generate Q&A Test Set** and upload as many documents as you want. These documents will be embedded and analyzed by the selected Q&A Language/Embedding Models to generate a defined number of Q&A pairs:
+
+![GenerateNew](images/generate.png)
+
+You can choose any of the available models to perform the Q&A generation. You may want to use a high-profile, expensive model for the crucial dataset generation used to evaluate the RAG application, while putting a cheaper LLM into production.
+
+This phase not only generates the number of Q&A pairs you need; it also analyzes the provided documents to extract a set of topics that help classify the generated questions and identify the areas to improve.
+
+When the generation is over (it could take time):
+
+![Generate](images/qa_dataset.png)
+
+you can:
+
+* delete a Q&A: clicking **Delete Q&A** drops the question from the final dataset if you consider it not meaningful;
+* modify the text of the **Question** and the **Reference answer**: if you disagree with the generated text, you can update it, keeping it consistent with the **Reference context**, which is fixed, like the **Metadata**.
+
+Your updates will automatically be stored in the database, and you can also download the dataset.
+
+The generation process is optional. If you have prepared a JSONL file with your Q&A pairs according to this schema:
+
+```text
+[
+  {
+    "id": "<an alphanumeric unique id, e.g. 2f6d5ec5-4111-4ba3-9569-86a7bec8f971>",
+    "question": "<Question?>",
+    "reference_answer": "<An example of an answer considered right>",
+    "reference_context": "<The piece of document from which the question has been extracted>",
+    "conversation_history": [],
+    "metadata": {
+      "question_type": "[simple|complex]",
+      "seed_document_id": "<numeric>",
+      "topic": "<topics>"
+    }
+  }
+]
+```
```
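If you prefer to hand-craft such a file, the snippet below is a purely illustrative way to produce a one-entry dataset matching the schema above; every value is a placeholder.

```python
# Purely illustrative: build a one-entry Test Dataset following the schema
# above and write it to disk. All values are placeholders.
import json
import uuid

test_set = [
    {
        "id": str(uuid.uuid4()),
        "question": "What license covers this project?",
        "reference_answer": "The Universal Permissive License v1.0.",
        "reference_context": "Licensed under the Universal Permissive License v1.0 ...",
        "conversation_history": [],
        "metadata": {
            "question_type": "simple",
            "seed_document_id": "1",
            "topic": "Licensing",
        },
    }
]

with open("my_test_set.json", "w", encoding="utf-8") as f:
    json.dump(test_set, f, indent=2)
```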
```diff
+You can upload it:
+
+![Upload](images/upload.png)
+
+If you need an example, generate just one Q&A, download it, and then add it to your own Q&A Test Dataset.
+
+## Evaluation
+At this point, whether you have generated a Test Dataset or are using an existing one, you can run an evaluation using the configuration parameters on the left-hand side.
+
+![Evaluation](images/evaluation.png)
+
+The top part relates to the LLM that will be used for chat generation and includes the most relevant hyper-parameters for the call. The lower part relates to the Vector Store: apart from the **Embedding Model**, **Chunk Size**, **Chunk Overlap** and **Distance Strategy**, which are fixed and come from the **Split/Embed** process you have to perform beforehand, you can modify:
+
+* **Top K**: how many of the chunks nearest to the question should be included in the prompt's context;
+* **Search Type**: either Similarity or Maximal Marginal Relevance. The first is the one commonly used; the second relies on an Oracle Database 23ai feature that excludes near-duplicate chunks from the Top K, giving space in the list to different chunks that provide more relevant information (see the sketch below).
```
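To make the **Search Type** choice concrete, the sketch below contrasts plain similarity ranking with Maximal Marginal Relevance using nothing but numpy. It is a conceptual illustration, independent of the database feature actually used: MMR iteratively picks chunks that are relevant to the query yet dissimilar from the chunks already selected, so near-duplicates do not crowd out other useful context.

```python
# Conceptual sketch: plain similarity vs. Maximal Marginal Relevance (MMR)
# re-ranking over chunk embeddings. Illustrative only.
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def top_k_similarity(query: np.ndarray, chunks: np.ndarray, k: int) -> list[int]:
    """Plain similarity: the k chunks closest to the query."""
    scores = [cosine(query, c) for c in chunks]
    return sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:k]


def top_k_mmr(query: np.ndarray, chunks: np.ndarray, k: int, lam: float = 0.5) -> list[int]:
    """MMR: trade off relevance to the query against redundancy with chunks already picked."""
    selected: list[int] = []
    candidates = list(range(len(chunks)))
    while candidates and len(selected) < k:
        def mmr_score(i: int) -> float:
            relevance = cosine(query, chunks[i])
            redundancy = max((cosine(chunks[i], chunks[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam=1.0` this reduces to plain similarity ranking; lower values push harder for diversity in the Top K.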
```diff
+At the end of the evaluation, an **Overall Correctness Score** is provided. It is simply the percentage of correct answers over the total number of questions submitted (for example, 42 correct answers out of 50 questions gives a score of 84%):
+
+![Correctness](images/evaluation_report.png)
+
+In addition, a correctness percentage by topic, the list of failures, and the full list of evaluated Q&As are reported. Each Q&A included in the test dataset is extended with:
+
+* **agent_answer**: the actual answer provided by the RAG app;
+* **correctness**: a true/false flag indicating whether the agent_answer matches the reference_answer;
+* **correctness_reason**: the reason why an answer has been evaluated as wrong by the judge LLM.
+
+The list of **Failures**, the **Correctness of each Q&A**, as well as a **Report**, can be downloaded and stored for future audit activities.
+
+*In this way you can perform several tests using the same curated test dataset, generated or self-made, looking for the best-performing RAG configuration*.
```
Binary image files changed (not shown).
