`docs/content/client/chatbot/_index.md` (+19 -5)
For more details on the parameters, ask the Chatbot or review [Concepts for Generative AI](https://docs.oracle.com/en-us/iaas/Content/generative-ai/concepts.htm).
## Toolkit
The {{< short_app_ref >}} provides tools to augment Large Language Models with your proprietary data using Retrieval Augmented Generation (**RAG**), including:
* [Vector Search](#vector-search) for Unstructured Data
* [SelectAI](#selectai) for Structured Data
## Vector Search
Once you've created embeddings using [Split/Embed](../tools/split_embed), the option to use Vector Search will be available. After selecting Vector Search, if you have more than one [Vector Store](#vector-store), you will need to select the one you want to work with.

Choose the type of Search you want performed and the additional parameters associated with that search.
### Vector Store
With Vector Search selected, if you have more than one Vector Store, you can select which one will be used for searching; otherwise it will default to the only one available. To choose a different Vector Store, click the "Reset" button to open the available options.
## SelectAI
Once you've [configured SelectAI](https://docs.oracle.com/en-us/iaas/autonomous-database-serverless/doc/select-ai-get-started.html#GUID-E9872607-42A6-43FA-9851-7B60430C21B7), the option to use SelectAI will be available. After selecting the SelectAI toolkit, a profile and the default narrate option will automatically be selected. If you have more than one profile, you can choose which one to use. You can also select different SelectAI actions.
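
For reference, what the SelectAI toolkit does can also be reproduced directly against the database with the documented `DBMS_CLOUD_AI.GENERATE` function. The following is only a hedged sketch: the connection details, the `MY_AI_PROFILE` profile name, and the question are placeholder assumptions, not values used by the {{< short_app_ref >}}.

```python
import oracledb

# Return CLOB results as plain Python strings.
oracledb.defaults.fetch_lobs = False

# Placeholder connection details -- substitute your Autonomous Database credentials.
connection = oracledb.connect(user="analyst", password="<password>", dsn="myadb_low")

with connection.cursor() as cursor:
    # Ask a natural-language question through an existing SelectAI profile.
    # "MY_AI_PROFILE" is an assumed name; use the profile you configured.
    cursor.execute(
        """
        SELECT DBMS_CLOUD_AI.GENERATE(
                 prompt       => :prompt,
                 profile_name => 'MY_AI_PROFILE',
                 action       => 'narrate')
          FROM dual
        """,
        prompt="How many orders were shipped last month?",
    )
    print(cursor.fetchone()[0])
```

The `action` parameter corresponds to the SelectAI actions selectable in the Chatbot, such as `narrate`, `showsql`, or `runsql`.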
`docs/content/client/testbed/_index.md` (+81 -4)
title = '🧪 Testbed'
weight = 30
+++
<!--
Copyright (c) 2023, 2024, Oracle and/or its affiliates.
Licensed under the Universal Permissive License v1.0 as shown at http://oss.oracle.com/licenses/upl.
-->
Generating a Test Dataset of Q&A pairs using an external LLM accelerates the testing phase. The {{< full_app_ref >}} integrates with a framework called [Giskard](https://www.giskard.ai/), designed for this purpose. Giskard analyzes documents to identify high-level topics related to the generated Q&A pairs and includes them in the Test Dataset. All Test Sets and Evaluations are stored in the database for future evaluations and reviews.

This generation phase is optional but often recommended to reduce the cost of proof-of-concepts, as manually creating test data requires significant human effort.
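
If you prefer to script this step outside the interface, a Test Set of this kind can also be produced with Giskard directly. The sketch below is only an illustration based on Giskard's RAG toolkit: the sample document text, the output file name, and the number of questions are assumptions, and parameter names may vary between Giskard versions.

```python
import pandas as pd
from giskard.rag import KnowledgeBase, generate_testset

# Assumed input: one chunk of your documentation per row.
documents = pd.DataFrame({"text": [
    "Oracle Database 23ai adds native vector search over embeddings ...",
    "A vector index accelerates similarity queries on large tables ...",
]})

# Build the knowledge base Giskard will sample questions from.
knowledge_base = KnowledgeBase.from_pandas(documents, columns=["text"])

# Generate Q&A pairs; Giskard also tags each question with a detected topic.
testset = generate_testset(
    knowledge_base,
    num_questions=10,
    agent_description="A chatbot answering questions about Oracle vector search",
)
testset.save("test_dataset.jsonl")  # hypothetical output file name
```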
After generation, the questions are sent to the configured agent. Each answer is collected and compared to the expected answer using an LLM acting as a judge. The judge classifies the responses and provides justifications for each decision, as shown in the following diagram.

## Generation
From the Testbed page, switch to **Generate Q&A Test Set** and upload as many documents as you want. These documents will be embedded and analyzed by the selected Q&A Language/Embedding Models to generate a defined number of Q&A pairs:

You can choose any of the available models to perform the Q&A generation process. You may be interested in using a high-profile, expensive model for the crucial task of generating the dataset used to evaluate the RAG application, while putting a cheaper LLM into production.
This phase not only generates the number of Q&A pairs you need, but also analyzes the provided documents, extracting a set of topics that help classify the generated questions and identify the areas to be improved.
When the generation is over (it could take some time):

you can:
* delete a Q&A: clicking **Delete Q&A** drops the question from the final dataset if you consider it not meaningful;
* modify the text of the **Question** and the **Reference answer**: if you don't agree with them, you can update the generated raw text, according to the **Reference context**, which is fixed, like the **Metadata**.
Your updates will automatically be stored in the database and you can also download the dataset.
The generation process is optional. If you have prepared a JSONL file with your own Q&A pairs according to this schema:
```text
[
  {
    "id": <an alphanumeric unique id like "2f6d5ec5-4111-4ba3-9569-86a7bec8f971">,
    "question": "<Question?>",
    "reference_answer": "<An example of an answer considered correct>",
    "reference_context": "<The piece of document from which the question has been extracted>",
    "conversation_history": [],
    "metadata": {
      "question_type": "[simple|complex]",
      "seed_document_id": "<numeric>",
      "topic": "<topics>"
    }
  }
]
```
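
As a purely illustrative sketch of preparing such a file (the question, answer, and output file name below are invented placeholders):

```python
import json
import uuid

# One hand-written Q&A record following the schema above; every value is an
# invented example, not real content.
record = {
    "id": str(uuid.uuid4()),
    "question": "Which license covers the project?",
    "reference_answer": "The Universal Permissive License v1.0.",
    "reference_context": "Licensed under the Universal Permissive License v1.0 ...",
    "conversation_history": [],
    "metadata": {
        "question_type": "simple",
        "seed_document_id": "1",
        "topic": "Licensing",
    },
}

# The records are wrapped in a single JSON array, matching the schema above.
with open("my_test_dataset.json", "w", encoding="utf-8") as f:
    json.dump([record], f, indent=2)
```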
You can upload it:

If you need an example, generate just one Q&A and download it, then add it to your own Q&A Test Dataset.
## Evaluation
At this point, if you have generated or are using an existing Test Dataset, you can run an evaluation using the configuration parameters on the left-hand side.

The top part relates to the LLM that is going to be used for chat generation, and it includes the most relevant hyper-parameters to use in the call. The lower part relates to the Vector Store used; apart from the **Embedding Model**, **Chunk Size**, **Chunk Overlap**, and **Distance Strategy**, which are fixed and come from the **Split/Embed** process you have to perform beforehand, you can modify:
* **Top K**: how many of the chunks nearest to the question should be included in the prompt's context;
* **Search Type**: either Similarity or Maximal Marginal Relevance. The first is the one commonly used; the second relies on an Oracle Database 23ai feature that excludes overly similar chunks from the Top K, making room in the list for different chunks that provide more relevant information (see the sketch below).
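
To make the two search types concrete, here is a hedged sketch using LangChain's `OracleVS` integration. The connection details, table name, and embedding model are assumptions that would have to match what your **Split/Embed** run actually created; the {{< short_app_ref >}} drives the equivalent search through the **Top K** and **Search Type** settings.

```python
import oracledb
from langchain_community.vectorstores.oraclevs import OracleVS
from langchain_community.vectorstores.utils import DistanceStrategy
from langchain_huggingface import HuggingFaceEmbeddings

# Assumed connection and vector-store table created earlier by Split/Embed.
connection = oracledb.connect(user="vector", password="<password>", dsn="myadb_low")
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = OracleVS(
    client=connection,
    embedding_function=embeddings,
    table_name="DOC_CHUNKS",  # assumed table name
    distance_strategy=DistanceStrategy.COSINE,
)

question = "How many vector indexes can a table have?"

# Similarity: simply the Top K chunks nearest to the question.
similar_chunks = vector_store.similarity_search(question, k=4)

# Maximal Marginal Relevance: fetch a larger candidate set, then re-rank it so
# near-duplicate chunks are dropped in favour of more diverse ones.
diverse_chunks = vector_store.max_marginal_relevance_search(question, k=4, fetch_k=20)
```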
At the end of the evaluation, an **Overall Correctness Score** is provided, which is simply the percentage of correct answers over the total number of questions submitted:

Moreover, a percentage by topic, the list of failures, and the full list of evaluated Q&As are provided. To each Q&A included in the test dataset, the following fields are added (a short scoring sketch follows the list):
* **agent_answer**: the actual answer provided by the RAG app;
* **correctness**: a true/false flag that indicates whether the agent_answer matches the reference_answer;
* **correctness_reason**: the reason why an answer has been evaluated as wrong by the judge LLM.
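
As a small, hedged illustration of how these figures can be derived from the evaluated records (the field names follow the schema above; the input file name is hypothetical):

```python
import json
from collections import defaultdict

# Load the evaluated records; "evaluated_testset.json" is a placeholder name.
with open("evaluated_testset.json", encoding="utf-8") as f:
    records = json.load(f)

# Overall Correctness Score: percentage of correct answers over all questions.
overall = 100 * sum(r["correctness"] for r in records) / len(records)
print(f"Overall Correctness Score: {overall:.1f}%")

# Correctness percentage broken down by the topics detected at generation time.
by_topic = defaultdict(list)
for r in records:
    by_topic[r["metadata"]["topic"]].append(r["correctness"])
for topic, flags in sorted(by_topic.items()):
    print(f"{topic}: {100 * sum(flags) / len(flags):.1f}%")
```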
The list of **Failures**, the **Correctness of each Q&A**, as well as a **Report**, can be downloaded and stored for future audit activities.
*In this way you can perform several tests using the same curated test dataset, generated or self-made, looking for the best-performing RAG configuration*.