Commit b037808

Merge branch 'main' into aqua_apiserver
2 parents 5bc7330 + 747aead

3 files changed: +144 -26 lines changed
@@ -0,0 +1,131 @@
AI Quick Actions HTTP Client
****************************

.. versionadded:: 2.13.0

The AI Quick Actions client is a centralized, reusable component for interacting with the OCI Model Deployment service.

**Implementation Highlights:**

- Offers both a synchronous (``Client``) and an asynchronous (``AsyncClient``) interface
- Integrates with OCI authentication patterns

Authentication
==============

The AI Quick Actions client supports the same authentication methods as other OCI services, including API key, session token, instance principal, and resource principal. For additional details, refer to the `authentication guide <https://accelerated-data-science.readthedocs.io/en/latest/user_guide/cli/authentication.html>`_. Ensure you have the necessary `access policies <https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-policies-auth.htm>`_ to connect to the OCI Data Science Model Deployment endpoint.
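For example, API key authentication reads credentials from an OCI configuration file, by default ``~/.oci/config``. A minimal profile might look like the following sketch (every value below is a placeholder, not a working credential):

```ini
[DEFAULT]
user=ocid1.user.oc1..<unique_user_ID>
fingerprint=<api_key_fingerprint>
key_file=~/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..<unique_tenancy_ID>
region=us-ashburn-1
```

With such a profile in place, ``ads.set_auth(auth="api_key", profile="DEFAULT")`` selects it. The examples below use ``auth="security_token"`` instead, which relies on a session token created beforehand with ``oci session authenticate``.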
Usage
=====

Sync Usage
----------

**Text Completion**

.. code-block:: python3

    import ads
    from ads.aqua import Client

    ads.set_auth(auth="security_token", profile="<replace-with-your-profile>")

    client = Client(endpoint="https://<MD_OCID>/predict")
    response = client.generate(
        prompt="Tell me a joke",
        payload={"model": "odsc-llm"},
        stream=False,
    )
    print(response)

**Chat Completion**

.. code-block:: python3

    import ads
    from ads.aqua import Client

    ads.set_auth(auth="security_token", profile="<replace-with-your-profile>")

    client = Client(endpoint="https://<MD_OCID>/predict")
    response = client.chat(
        messages=[{"role": "user", "content": "Tell me a joke."}],
        payload={"model": "odsc-llm"},
        stream=False,
    )
    print(response)

**Streaming**

.. code-block:: python3

    import ads
    from ads.aqua import Client

    ads.set_auth(auth="security_token", profile="<replace-with-your-profile>")

    client = Client(endpoint="https://<MD_OCID>/predict")
    response = client.chat(
        messages=[{"role": "user", "content": "Tell me a joke."}],
        payload={"model": "odsc-llm"},
        stream=True,
    )

    for chunk in response:
        print(chunk)
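With ``stream=True`` the call yields incremental chunks rather than one final reply, so a consumer typically accumulates them. A minimal sketch of that pattern, using a stand-in generator in place of a live deployment and assuming (hypothetically) that each chunk carries its text in a ``text`` field:

```python
# Stand-in for a streaming response; the {"text": ...} chunk shape is an
# illustrative assumption, not the documented wire format.
def fake_stream():
    for piece in ["Why did ", "the chicken ", "cross the road?"]:
        yield {"text": piece}

def collect(stream):
    """Concatenate the text of every streamed chunk into one string."""
    return "".join(chunk.get("text", "") for chunk in stream)

reply = collect(fake_stream())
print(reply)  # -> Why did the chicken cross the road?
```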
**Embedding**

.. code-block:: python3

    import ads
    from ads.aqua import Client

    ads.set_auth(auth="security_token", profile="<replace-with-your-profile>")

    client = Client(endpoint="https://<MD_OCID>/predict")
    response = client.embeddings(
        input=["one", "two"]
    )
    print(response)

Async Usage
-----------

The following examples demonstrate how to perform the same operations using the asynchronous client with Python's async/await syntax.

**Text Completion**

.. code-block:: python3

    import ads
    from ads.aqua import AsyncClient

    ads.set_auth(auth="security_token", profile="<replace-with-your-profile>")

    client = AsyncClient(endpoint="https://<MD_OCID>/predict")
    response = await client.generate(
        prompt="Tell me a joke",
        payload={"model": "odsc-llm"},
        stream=False,
    )
    print(response)

**Streaming**

.. code-block:: python3

    import ads
    from ads.aqua import AsyncClient

    ads.set_auth(auth="security_token", profile="<replace-with-your-profile>")

    client = AsyncClient(endpoint="https://<MD_OCID>/predict")
    async for chunk in await client.generate(
        prompt="Tell me a joke",
        payload={"model": "odsc-llm"},
        stream=True,
    ):
        print(chunk)

**Embedding**

.. code-block:: python3

    import ads
    from ads.aqua import AsyncClient

    ads.set_auth(auth="security_token", profile="<replace-with-your-profile>")

    client = AsyncClient(endpoint="https://<MD_OCID>/predict")
    response = await client.embeddings(
        input=["one", "two"]
    )
    print(response)
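The ``await`` expressions above assume an already-running event loop (as in a Jupyter notebook). From a plain script, wrap the calls with ``asyncio.run``; the sketch below uses a stand-in coroutine in place of a real ``AsyncClient`` call:

```python
import asyncio

# Stand-in for an awaitable client call such as
# client.generate(prompt="Tell me a joke", payload={"model": "odsc-llm"}, stream=False)
async def fake_generate():
    await asyncio.sleep(0)  # simulate network I/O
    return "a joke"

async def main():
    return await fake_generate()

result = asyncio.run(main())
print(result)  # -> a joke
```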

docs/source/user_guide/large_language_model/index.rst

+5-25
@@ -4,39 +4,19 @@
  Large Language Model
  ####################

- Oracle Cloud Infrastructure (OCI) provides fully managed infrastructure to work with Large Language Model (LLM).
+ Oracle Cloud Infrastructure (OCI) `Data Science <https://www.oracle.com/artificial-intelligence/data-science>`_ is a fully managed, serverless platform that empowers data science teams to build, train, and manage machine learning models on Oracle Cloud Infrastructure.

- Train and Deploy LLM
- ********************
- You can train LLM at scale with multi-node and multi-GPU using `Data Science Jobs (Jobs) <https://docs.oracle.com/en-us/iaas/data-science/using/jobs-about.htm>`_, and deploy it with `Data Science Model Deployment (Model Deployments) <https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-about.htm>`_. The following blog posts show examples training and deploying Llama2 models:
+ The platform features `AI Quick Actions <https://docs.oracle.com/en-us/iaas/data-science/using/ai-quick-actions.htm>`_, which enable you to deploy, evaluate, and fine-tune foundation models directly within OCI Data Science. Designed for users eager to quickly harness AI capabilities, these actions provide a streamlined, code-free, and efficient environment for working with foundation models. You can access AI Quick Actions directly from the Data Science Notebook.

- * `Multi-GPU multinode fine-tuning Llama2 on OCI Data Science <https://blogs.oracle.com/ai-and-datascience/post/multi-gpu-multi-node-finetuning-llama2-oci>`_
- * `Deploy Llama 2 in OCI Data Science <https://blogs.oracle.com/ai-and-datascience/post/llama2-oci-data-science-cloud-platform>`_
- * `Quantize and deploy Llama 2 70B on cost-effective NVIDIA A10 Tensor Core GPUs in OCI Data Science <https://blogs.oracle.com/ai-and-datascience/post/quantize-deploy-llama2-70b-costeffective-a10s-oci>`_
-
-
- Integration with LangChain
- **************************
- ADS is designed to work with LangChain, enabling developers to incorporate various LangChain components and models deployed on OCI seamlessly into their applications. Additionally, ADS can package LangChain applications and deploy it as a REST API endpoint using OCI Data Science Model Deployment.
-
- * `Bridging cloud and conversational AI: LangChain and OCI Data Science platform <https://blogs.oracle.com/ai-and-datascience/post/cloud-conversational-ai-langchain-oci-data-science>`_
- * `Deploy LangChain applications as OCI model deployments <https://blogs.oracle.com/ai-and-datascience/post/deploy-langchain-application-as-model-deployment>`_
-
-
- .. admonition:: Installation
-    :class: note
-
-    Install ADS and other dependencies for LLM integrations.
-
-    .. code-block:: bash
-
-       $ python3 -m pip install "oracle-ads[llm]"
+ Detailed documentation on deploying LLM models in OCI Data Science using AI Quick Actions is available `here <https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/ai-quick-actions/model-deployment-tips.md>`_ and `here <https://docs.oracle.com/en-us/iaas/data-science/using/ai-quick-actions-model-deploy.htm>`_.

+ This section provides comprehensive information on integrating OCI with **LangChain, Autogen, LlamaIndex**, and other third-party **LLM frameworks**.


  .. toctree::
     :maxdepth: 2

+    aqua_client
     training_llm
     langchain_models
     autogen_integration

docs/source/user_guide/large_language_model/training_llm.rst

+8-1
@@ -1,6 +1,14 @@
  Training Large Language Model
  *****************************

+ .. admonition:: Note
+    :class: note
+
+    The example below is obsolete. Instead, use `AI Quick Actions <https://docs.oracle.com/en-us/iaas/data-science/using/ai-quick-actions.htm>`_ to deploy, evaluate, and fine-tune foundation models in OCI Data Science.
+
+
  .. versionadded:: 2.8.8

  Oracle Cloud Infrastructure (OCI) `Data Science Jobs (Jobs) <https://docs.oracle.com/en-us/iaas/data-science/using/jobs-about.htm>`_

@@ -55,4 +63,3 @@ The same training script also support Parameter-Efficient Fine-Tuning (PEFT). Yo
      torchrun llama_finetuning.py --enable_fsdp --use_peft --peft_method lora \
          --pure_bf16 --batch_size_training 1 \
          --model_name meta-llama/Llama-2-7b-hf --output_dir /home/datascience/outputs
-
