diff --git a/README.md b/README.md
index 72f5d80..bc2c7f8 100644
--- a/README.md
+++ b/README.md
@@ -101,3 +101,4 @@ Disclaimer: Examples contributed by the community and partners do not represent
 | [Build a bank support agent with Pydantic AI and Mistral AI](third_party/PydanticAI/pydantic_bank_support_agent.ipynb)| Agent | Pydantic |
 | [Mistral and MLflow Tracing](third_party/MLflow/mistral-mlflow-tracing.ipynb) | Tracing, Observability | MLflow |
 | [Mistral OCR with Gradio](third_party/gradio/MistralOCR.md) | OCR | Gradio |
+| [Optimizing prompts without any supervision](third_party/metagpt/prompt_optimization.ipynb) | Prompting | MetaGPT |
diff --git a/third_party/metagpt/prompt_optimization.ipynb b/third_party/metagpt/prompt_optimization.ipynb
new file mode 100644
index 0000000..75f941d
--- /dev/null
+++ b/third_party/metagpt/prompt_optimization.ipynb
@@ -0,0 +1,613 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "Y_jLAryGIaAB",
+   "metadata": {
+    "id": "Y_jLAryGIaAB"
+   },
+   "source": [
+    "# Prompt optimization\n",
+    "\n",
+    "- ❌ Prompt engineering... sucks. It's an ad-hoc process, heavily reliant on trial and error and difficult to standardize\n",
+    "- 🀩 Luckily, we can automate it using ✨prompt optimization✨, investigated in recent works such as [_Self-Supervised Prompt Optimization_](https://arxiv.org/pdf/2502.06855)\n",
+    "- 🎯 In essence, Prompt Optimization (PO) consists of taking a prompt aimed at a certain task and iteratively refining it so that it performs better on the specific problem tackled.\n",
+    "- βœ… This notebook gives an overview of how to use PO with Mistral models\n",
+    "\n",
+    "<div style=\"text-align: center;\">
\n", + " \"promptopt\"\n", + "
\n", + "\n", + "# Problem setting\n", + "\n", + "- You have put up a form, and collected many more answers than the ones you can read.\n", + "- Your survey got popular---very popular, πŸ˜…---and need to sift through the answers. To keep things accessibly, we allowed (and will continue to!) responses using plain text.\n", + "- Filtering is therefore _impossible_. Still, you need some strategies to sift through the applications received to identify the most promising profiles.\n", + "- Let's define a few prompts to process answers and output answers we can filter on effectively." + ] + }, + { + "cell_type": "markdown", + "id": "fy8aF06wOoBU", + "metadata": { + "id": "fy8aF06wOoBU" + }, + "source": [ + "### Task prompts\n", + "\n", + "- Let's define a few prompts to process answers\n", + "- These prompts are purposely not optimized, and rather serve as an example of something quick and dirty we wish to work with.\n", + "- For this example, we will consider answers collected as part of the applications for our [Ambassadorship Program](https://docs.mistral.ai/guides/contribute/ambassador/)" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "zhIOJ8HKn31b", + "metadata": { + "id": "zhIOJ8HKn31b" + }, + "outputs": [], + "source": [ + "# overarching prompt, giving context\n", + "context = (\n", + " \"I am working on recruiting people to advocate about the products of an AI company. \"\n", + " \"The position in in close contact with the DevRel team, and we are looking at having people \"\n", + " \"share on their own personal social media more about the company and its products. \"\n", + " \"The company I work at produces Large Language Models and is very followed, \"\n", + " \"therefore I got a sheer amount of applications that I need to process \"\n", + " \"very soon. I won't be able to process them by hand, and there is little structure in the \"\n", + " \"form that we sent out to applicants. Therefore, I am expecting you to assist me into processing the \"\n", + " \"information these people gave to make it much more structured. This means that you do read \"\n", + " \"what applicants declared and extract key information based on the context of the question asked.\"\n", + ")\n", + "\n", + "# classifying job titles\n", + "job_prompt = lambda job_title: (\n", + " \"Your task is to provide me with a direct classification of the person's job title into one of 4 categories. \"\n", + " \"The categories you can decide are always: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. \"\n", + " \"There is no possibility for mixed assignments. You always assign one and one only category to each subject. \"\n", + " \"When in doubt, assign to 'OTHER'. You must strictly adhere to the categories I have mentioned, and nothing more. \"\n", + " \"This means that you cannot use any other output apart from 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER', 'OTHER'. \"\n", + " \"Keep your answer very, very concise. Don't give context on your answer. As a matter of fact, only answer with one word \"\n", + " \"based on the category you deem the most appropriate. Absolutely don't change this. You will be penalized if \"\n", + " \"(1) you use a category outside of the ones I have mentioned and (2) you use more than 1 word in your output. \"\n", + " f\"# INPUT declared title: the person job title is {job_title}\"\n", + ")\n", + "\n", + "# getting the location in an easy way\n", + "location_prompt = lambda location: (\n", + " \"Your task is basic. 
+    "    \"Your task is basic. Your task is to disambiguate the respondent's answer in terms of the location used. \"\n",
+    "    \"Your output is always CITY, COUNTRY. Use always the English name of a city. Also, always use the international \"\n",
+    "    \"country code. Nothing else. For instance, if a user answered with 'Rome', you would output 'Rome, IT'. \"\n",
+    "    \"In the rare case when someone puts down multiple locations, make sure you always select the first one. Nothing more\"\n",
+    "    f\" #INPUT declared location: the respondent declared being located in {location}\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ZAT4fuHlOxlL",
+   "metadata": {
+    "id": "ZAT4fuHlOxlL"
+   },
+   "source": [
+    "### Installing dependencies\n",
+    "\n",
+    "To use SPO via MetaGPT you need to clone the repository and move this notebook inside of it. The dependencies are not easily installable on their own, but hacking around this is fairly straightforward πŸ˜‰ \n",
+    "\n",
+    "Just run:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "89c9bc38",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Cloning into 'MetaGPT'...\n",
+      "remote: Enumerating objects: 48797, done.\u001b[K\n",
+      "remote: Counting objects: 100% (287/287), done.\u001b[K\n",
+      "remote: Compressing objects: 100% (136/136), done.\u001b[K\n",
+      "remote: Total 48797 (delta 195), reused 151 (delta 151), pack-reused 48510 (from 3)\u001b[K\n",
+      "Receiving objects: 100% (48797/48797), 179.81 MiB | 45.07 MiB/s, done.\n",
+      "Resolving deltas: 100% (36800/36800), done.\n",
+      "/Users/francescocapuano/Desktop/prompt-optimization/third_party/MetaGPT/MetaGPT\n"
+     ]
+    }
+   ],
+   "source": [
+    "# clone the repo\n",
+    "!git clone https://github.com/geekan/MetaGPT\n",
+    "\n",
+    "# install dependencies\n",
+    "!pip install -qUr MetaGPT/requirements.txt\n",
+    "\n",
+    "# move inside the directory, kernel-wise\n",
+    "%cd MetaGPT"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "hLsS-Glveybr",
+   "metadata": {
+    "id": "hLsS-Glveybr"
+   },
+   "source": [
+    "## Create instruction files\n",
+    "\n",
+    "After having installed `metagpt`, we can perform prompt optimization by creating a yaml file specifying the task to tackle.\n",
+    "\n",
+    "From the `metagpt` [documentation](https://github.com/geekan/MetaGPT/tree/main/examples/spo), this yaml file needs the following structure:\n",
+    "\n",
+    "```yaml\n",
+    "prompt: |\n",
+    "  Please solve the following problem.\n",
+    "\n",
+    "requirements: |\n",
+    "  ...\n",
+    "\n",
+    "count: None\n",
+    "\n",
+    "qa:\n",
+    "  - question: |\n",
+    "      ...\n",
+    "    answer: |\n",
+    "      ...\n",
+    "\n",
+    "  - question: |\n",
+    "      ...\n",
+    "    answer: |\n",
+    "      ...\n",
+    "```\n",
+    "\n",
+    "We will need to generate one of these template files **for each** of the prompts we are seeking to optimize. Luckily, we can do so automatically. \n",
+    "\n",
+    "Also, as the tasks we're dealing with are fairly straightforward, we can spare ourselves providing few-shot examples in the form of Q&As 🀩\n",
+    "\n",
+    "Still, these template files offer a very straightforward way to provide real-world few-shot examples, so they are definitely worth looking into.\n",
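+    "\n",
+    "For reference, after running the generation code below, the resulting `job.yaml` should look roughly like this (an illustrative rendering; `yaml.dump` may order keys and wrap the long prompt string differently):\n",
+    "\n",
+    "```yaml\n",
+    "count: null\n",
+    "prompt: Your task is to provide me with a direct classification of the person's\n",
+    "  job title into one of 4 categories. [...]\n",
+    "qa:\n",
+    "- answer: ''\n",
+    "  question: ''\n",
+    "requirements: The job title, categorized\n",
+    "```"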
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "yPpTc7XuexPF", + "metadata": { + "id": "yPpTc7XuexPF" + }, + "outputs": [], + "source": [ + "from typing import Optional\n", + "\n", + "def prompt_to_dict(\n", + " prompt: str,\n", + " requirements: Optional[str],\n", + " questions: list[str],\n", + " answers: list[str],\n", + " count: Optional[int] = None,\n", + ")->dict:\n", + " return {\n", + " \"prompt\": prompt if isinstance(prompt, str) else prompt(\"\"),\n", + " \"requirements\": requirements,\n", + " \"count\": count,\n", + " \"qa\": [\n", + " {\n", + " \"question\": question,\n", + " \"answer\": answer\n", + " } for question, answer in zip(questions, answers)\n", + " ]\n", + " }" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "Plv2A2FcglAm", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Plv2A2FcglAm", + "outputId": "0beba67e-46d5-4cf6-fbf4-dfccd069f8d1" + }, + "outputs": [], + "source": [ + "import yaml\n", + "\n", + "prompts = {\n", + " \"job\": job_prompt,\n", + " \"location\": location_prompt\n", + "}\n", + "\n", + "requirements = [\n", + " \"The job title, categorized\",\n", + " \"The location, disambiguated\"\n", + "]\n", + "path = \"metagpt/ext/spo/settings\" # this is the path where the template files needs to be saved\n", + "\n", + "for (name, prompt), requirement in zip(prompts.items(), requirements):\n", + " # creating template files for each prompt\n", + " with open(f\"{path}/{name}.yaml\", \"w\") as f:\n", + " yaml.dump(\n", + " prompt_to_dict(\n", + " prompt, \n", + " requirement,\n", + " [\"\"], \n", + " [\"\"]\n", + " ),\n", + " f,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "866a4c2f", + "metadata": {}, + "source": [ + "## Creating model files\n", + "\n", + "Once you created template files for the different prompts, you need to specify which models you need to use as (1) executors (2) evaluators and (3) optimizers for the different prompts.\n", + "\n", + "metagpt's SPO requires you to provide these models within a specific `.yaml` file---you can use the following snippet to create these files using your own Mistral API key ([get one!](https://console.mistral.ai/api-keys))." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "LUZalCD-yhlC", + "metadata": { + "id": "LUZalCD-yhlC" + }, + "outputs": [], + "source": [ + "def models_dict(\n", + " mistral_api_key: str\n", + " )->dict:\n", + " return {\n", + " \"llm\": {\n", + " \"api_type\": \"openai\",\n", + " \"model\": \"mistral-small-latest\",\n", + " \"base_url\": \"https://api.mistral.ai/v1/\",\n", + " \"api_key\": mistral_api_key,\n", + " \"temperature\": 0\n", + " },\n", + " \"models\": {\n", + " \"mistral-small-latest\": {\n", + " \"api_type\": \"openai\",\n", + " \"base_url\": \"https://api.mistral.ai/v1/\",\n", + " \"api_key\": mistral_api_key,\n", + " \"temperature\": 0\n", + " },\n", + " \"mistral-large-latest\": {\n", + " \"api_type\": \"openai\",\n", + " \"base_url\": \"https://api.mistral.ai/v1/\",\n", + " \"api_key\": mistral_api_key,\n", + " \"temperature\": 0\n", + " }\n", + " }\n", + " }" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "4401cb21", + "metadata": {}, + "outputs": [], + "source": [ + "path = \"config/config2.yaml\" # saving the models file here\n", + "\n", + "MISTRAL_API_KEY = \"adG4AjK52MSdfmOy2sBNtS71vJeXGF97\" # your api key\n", + "\n", + "with open(path, \"w\") as f:\n", + " yaml.dump(models_dict(MISTRAL_API_KEY), f)" + ] + }, + { + "cell_type": "markdown", + "id": "e22cc339", + "metadata": {}, + "source": [ + "**We're good! πŸŽ‰** \n", + "\n", + "Once you have (1) template files for your candidate prompts and (2) a `models.yaml` file to identify the different models you wish to use, we can get start running rounds and optimizing the prompts 😊\n", + "\n", + "### A little hack: jupyter notebooks don't really work with `asyncio` 🫠\n", + "\n", + "...if only jupyter notebooks worked well with `asyncio` πŸ˜‚ The little hack here is to export the code you need to run prompt optimization to a `.py` file and then run that one using CLI-like instructions.\n", + "\n", + "Here we are only creating one file for the job title extraction prompt. Exporting these prompt optimization processes to different files also allows for parallel execution (πŸ’¨, right?). For the sake of demonstration, we are only showing how to optimize one prompt (job extraction), but you can easily switch this to other prompts yourself." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "JpgfHposxLPZ", + "metadata": { + "id": "JpgfHposxLPZ" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Overwriting spo.py\n" + ] + } + ], + "source": [ + "%%writefile spo.py\n", + "\n", + "from metagpt.ext.spo.components.optimizer import PromptOptimizer\n", + "from metagpt.ext.spo.utils.llm_client import SPO_LLM\n", + "\n", + "# Initialize LLM settings\n", + "SPO_LLM.initialize(\n", + " # same temperature settings as metagpt's default!\n", + " optimize_kwargs={\n", + " \"model\": \"mistral-large-latest\", \n", + " \"temperature\": 0.6\n", + " },\n", + " evaluate_kwargs={\n", + " \"model\": \"mistral-small-latest\", \n", + " \"temperature\": 0.3\n", + " },\n", + " execute_kwargs={\n", + " \"model\": \"mistral-small-latest\", \n", + " \"temperature\": 0\n", + " }\n", + ")\n", + "\n", + "template_name = \"job.yaml\" # change this for each prompt!\n", + "\n", + "# Create and run optimizer\n", + "optimizer = PromptOptimizer(\n", + " optimized_path=\"workspace\", # Output directory\n", + " initial_round=1, # Starting round\n", + " max_rounds=5, # Maximum optimization rounds\n", + " template=template_name, # Template file - Change this for each prompt!\n", + " name=\"Mistral-Prompt-Opt\", # Project name\n", + ")\n", + "\n", + "optimizer.optimize()" + ] + }, + { + "cell_type": "markdown", + "id": "0fb9af9b", + "metadata": {}, + "source": [ + "Now, let's run prompt optimization β˜€οΈ" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "e211a622", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[32m2025-04-19 15:33:24.300\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.const\u001b[0m:\u001b[36mget_metagpt_package_root\u001b[0m:\u001b[36m15\u001b[0m - \u001b[1mPackage root set to /Users/francescocapuano/Desktop/prompt-optimization/third_party/MetaGPT/MetaGPT\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:24.300\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.const\u001b[0m:\u001b[36mget_metagpt_package_root\u001b[0m:\u001b[36m15\u001b[0m - \u001b[1mPackage root set to /Users/francescocapuano/Desktop/prompt-optimization/third_party/MetaGPT/MetaGPT\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:25.337\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_handle_first_round\u001b[0m:\u001b[36m80\u001b[0m - \u001b[1m\n", + "⚑ RUNNING Round 1 PROMPT ⚑\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:43.216\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.000 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 226, completion_tokens: 2\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:43.370\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m97\u001b[0m - \u001b[1m\n", + "πŸš€Round 2 OPTIMIZATION STARTING πŸš€\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:43.370\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m98\u001b[0m - \u001b[1m\n", + "Selecting prompt for round 1 and advancing to the iteration phase\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:49.760\u001b[0m | \u001b[1mINFO \u001b[0m | 
\u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.012 | Max budget: $10.000 | Current cost: $0.012, prompt_tokens: 587, completion_tokens: 321\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:49.761\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m116\u001b[0m - \u001b[1mModification of 2 round: Streamline the instructions and clarify the input format to reduce confusion and improve robustness against ambiguous job titles.\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:49.761\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_optimize_prompt\u001b[0m:\u001b[36m71\u001b[0m - \u001b[1m\n", + "Round 2 Prompt: Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. Provide your answer using one word only. Do not include any additional context or explanations.\n", + "\n", + " # INPUT declared title: the person's job title is\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:49.762\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_evaluate_new_prompt\u001b[0m:\u001b[36m122\u001b[0m - \u001b[1m\n", + "⚑ RUNNING OPTIMIZED PROMPT ⚑\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:52.430\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.001 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 96, completion_tokens: 2\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:52.430\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_evaluate_new_prompt\u001b[0m:\u001b[36m125\u001b[0m - \u001b[1m\n", + "πŸ“Š EVALUATING OPTIMIZED PROMPT πŸ“Š\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:34:27.452\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.002 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 548, completion_tokens: 175\u001b[0m\n", + "\u001b[32m2025-04-19 15:34:33.397\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.005 | Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 548, completion_tokens: 237\u001b[0m\n", + "\u001b[32m2025-04-19 15:34:52.530\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.007 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 548, completion_tokens: 156\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:02.464\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.009 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 548, completion_tokens: 182\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:02.465\u001b[0m | \u001b[1mINFO \u001b[0m | 
\u001b[36mmetagpt.ext.spo.utils.evaluation_utils\u001b[0m:\u001b[36mevaluate_prompt\u001b[0m:\u001b[36m63\u001b[0m - \u001b[1mEvaluation Results [True, True, True, True]\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:02.467\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_log_optimization_result\u001b[0m:\u001b[36m135\u001b[0m - \u001b[1m\n", + "🎯 OPTIMIZATION RESULT 🎯\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:02.467\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_log_optimization_result\u001b[0m:\u001b[36m136\u001b[0m - \u001b[1m\n", + "Round 2 Optimization: βœ… SUCCESS\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:02.473\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m97\u001b[0m - \u001b[1m\n", + "πŸš€Round 3 OPTIMIZATION STARTING πŸš€\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:02.473\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m98\u001b[0m - \u001b[1m\n", + "Selecting prompt for round 2 and advancing to the iteration phase\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:16.753\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.022 | Max budget: $10.000 | Current cost: $0.010, prompt_tokens: 408, completion_tokens: 260\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:16.754\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m116\u001b[0m - \u001b[1mModification of 3 round: Include instructions for handling ambiguous job titles and ensure consistent formatting.\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:16.754\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_optimize_prompt\u001b[0m:\u001b[36m71\u001b[0m - \u001b[1m\n", + "Round 3 Prompt: Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. If a job title is ambiguous or could fit into multiple categories, choose the most relevant category based on common industry standards. 
Provide your answer using one word only, in all uppercase letters without any additional context or explanations.\n", + "\n", + " # INPUT declared title: the person's job title is\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:16.754\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_evaluate_new_prompt\u001b[0m:\u001b[36m122\u001b[0m - \u001b[1m\n", + "⚑ RUNNING OPTIMIZED PROMPT ⚑\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:17.382\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.001 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 122, completion_tokens: 2\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:17.383\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_evaluate_new_prompt\u001b[0m:\u001b[36m125\u001b[0m - \u001b[1m\n", + "πŸ“Š EVALUATING OPTIMIZED PROMPT πŸ“Š\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:34.698\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.011 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 396, completion_tokens: 185\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:39.916\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.013 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 396, completion_tokens: 243\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:57.867\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.015 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 396, completion_tokens: 227\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:16.285\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.018 | Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 396, completion_tokens: 349\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:16.286\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.utils.evaluation_utils\u001b[0m:\u001b[36mevaluate_prompt\u001b[0m:\u001b[36m63\u001b[0m - \u001b[1mEvaluation Results [True, True, True, True]\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:16.287\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_log_optimization_result\u001b[0m:\u001b[36m135\u001b[0m - \u001b[1m\n", + "🎯 OPTIMIZATION RESULT 🎯\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:16.287\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_log_optimization_result\u001b[0m:\u001b[36m136\u001b[0m - \u001b[1m\n", + "Round 3 Optimization: βœ… SUCCESS\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:16.291\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m97\u001b[0m - \u001b[1m\n", + "πŸš€Round 4 OPTIMIZATION STARTING πŸš€\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:16.291\u001b[0m | \u001b[1mINFO \u001b[0m | 
\u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m98\u001b[0m - \u001b[1m\n", + "Selecting prompt for round 3 and advancing to the iteration phase\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:25.535\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.035 | Max budget: $10.000 | Current cost: $0.013, prompt_tokens: 436, completion_tokens: 398\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:25.535\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m116\u001b[0m - \u001b[1mModification of 4 round: Include explicit guidelines for handling ambiguity, provide contextual examples, define a clear input format, and reinforce the output format with an example.\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:25.535\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_optimize_prompt\u001b[0m:\u001b[36m71\u001b[0m - \u001b[1m\n", + "Round 4 Prompt: Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. If a job title is ambiguous or could fit into multiple categories, choose the most relevant category based on common industry standards. For example, 'Data Scientist' could fit into both 'RESEARCH' and 'ENGINEERING', but is typically classified as 'RESEARCH'. Provide your answer using one word only, in all uppercase letters without any additional context or explanations.\n", + "\n", + " # INPUT: The person's job title is: [Job Title]\n", + "\n", + " # Example:\n", + " # INPUT: The person's job title is: Software Developer\n", + " # OUTPUT: ENGINEERING\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:25.535\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_evaluate_new_prompt\u001b[0m:\u001b[36m122\u001b[0m - \u001b[1m\n", + "⚑ RUNNING OPTIMIZED PROMPT ⚑\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:56.956\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.001 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 185, completion_tokens: 21\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:56.957\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_evaluate_new_prompt\u001b[0m:\u001b[36m125\u001b[0m - \u001b[1m\n", + "πŸ“Š EVALUATING OPTIMIZED PROMPT πŸ“Š\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:10.473\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.020 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 507, completion_tokens: 230\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:10.476\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.023 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 507, completion_tokens: 236\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:14.260\u001b[0m | \u001b[1mINFO \u001b[0m | 
\u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.025 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 507, completion_tokens: 210\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:34.786\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.028 | Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 507, completion_tokens: 250\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:34.787\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.utils.evaluation_utils\u001b[0m:\u001b[36mevaluate_prompt\u001b[0m:\u001b[36m63\u001b[0m - \u001b[1mEvaluation Results [True, True, True, True]\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:34.789\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_log_optimization_result\u001b[0m:\u001b[36m135\u001b[0m - \u001b[1m\n", + "🎯 OPTIMIZATION RESULT 🎯\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:34.789\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_log_optimization_result\u001b[0m:\u001b[36m136\u001b[0m - \u001b[1m\n", + "Round 4 Optimization: βœ… SUCCESS\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:34.795\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m97\u001b[0m - \u001b[1m\n", + "πŸš€Round 5 OPTIMIZATION STARTING πŸš€\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:34.795\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m98\u001b[0m - \u001b[1m\n", + "Selecting prompt for round 4 and advancing to the iteration phase\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:42.527\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.048 | Max budget: $10.000 | Current cost: $0.013, prompt_tokens: 529, completion_tokens: 383\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:42.527\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m116\u001b[0m - \u001b[1mModification of 5 round: Include explicit criteria and additional examples for less common job titles to improve classification accuracy.\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:42.528\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_optimize_prompt\u001b[0m:\u001b[36m71\u001b[0m - \u001b[1m\n", + "Round 5 Prompt: Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. If a job title is ambiguous or could fit into multiple categories, choose the most relevant category based on common industry standards. For example, 'Data Scientist' could fit into both 'RESEARCH' and 'ENGINEERING', but is typically classified as 'RESEARCH'. Similarly, 'Data Analyst' is typically classified as 'BUSINESS'. 
Provide your answer using one word only, in all uppercase letters without any additional context or explanations.\n", + "\n", + " # INPUT: The person's job title is: [Job Title]\n", + "\n", + " # Example:\n", + " # INPUT: The person's job title is: Software Developer\n", + " # OUTPUT: ENGINEERING\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:42.530\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_evaluate_new_prompt\u001b[0m:\u001b[36m122\u001b[0m - \u001b[1m\n", + "⚑ RUNNING OPTIMIZED PROMPT ⚑\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:49:40.335\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.002 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 202, completion_tokens: 12\u001b[0m\n", + "\u001b[32m2025-04-19 15:49:40.335\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_evaluate_new_prompt\u001b[0m:\u001b[36m125\u001b[0m - \u001b[1m\n", + "πŸ“Š EVALUATING OPTIMIZED PROMPT πŸ“Š\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:49:58.080\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.030 | Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 600, completion_tokens: 246\u001b[0m\n", + "\u001b[32m2025-04-19 15:50:31.634\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.033 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 600, completion_tokens: 215\u001b[0m\n", + "\u001b[32m2025-04-19 15:50:41.015\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.035 | Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 600, completion_tokens: 232\u001b[0m\n", + "\u001b[32m2025-04-19 15:51:11.878\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.038 | Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 600, completion_tokens: 231\u001b[0m\n", + "\u001b[32m2025-04-19 15:51:11.879\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.utils.evaluation_utils\u001b[0m:\u001b[36mevaluate_prompt\u001b[0m:\u001b[36m63\u001b[0m - \u001b[1mEvaluation Results [False, True, True, True]\u001b[0m\n", + "\u001b[32m2025-04-19 15:51:11.881\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_log_optimization_result\u001b[0m:\u001b[36m135\u001b[0m - \u001b[1m\n", + "🎯 OPTIMIZATION RESULT 🎯\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:51:11.882\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_log_optimization_result\u001b[0m:\u001b[36m136\u001b[0m - \u001b[1m\n", + "Round 5 Optimization: βœ… SUCCESS\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:51:11.883\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36mshow_final_result\u001b[0m:\u001b[36m52\u001b[0m - \u001b[1m\n", + "==================================================\u001b[0m\n", + "\u001b[32m2025-04-19 15:51:11.884\u001b[0m | \u001b[1mINFO \u001b[0m | 
\u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36mshow_final_result\u001b[0m:\u001b[36m53\u001b[0m - \u001b[1m\n",
+    "πŸ† OPTIMIZATION COMPLETED - FINAL RESULTS πŸ†\n",
+    "\u001b[0m\n",
+    "\u001b[32m2025-04-19 15:51:11.884\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36mshow_final_result\u001b[0m:\u001b[36m54\u001b[0m - \u001b[1m\n",
+    "πŸ“Œ Best Performing Round: 5\u001b[0m\n",
+    "\u001b[32m2025-04-19 15:51:11.884\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36mshow_final_result\u001b[0m:\u001b[36m55\u001b[0m - \u001b[1m\n",
+    "🎯 Final Optimized Prompt:\n",
+    "Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. If a job title is ambiguous or could fit into multiple categories, choose the most relevant category based on common industry standards. For example, 'Data Scientist' could fit into both 'RESEARCH' and 'ENGINEERING', but is typically classified as 'RESEARCH'. Similarly, 'Data Analyst' is typically classified as 'BUSINESS'. Provide your answer using one word only, in all uppercase letters without any additional context or explanations.\n",
+    "\n",
+    "    # INPUT: The person's job title is: [Job Title]\n",
+    "\n",
+    "    # Example:\n",
+    "    # INPUT: The person's job title is: Software Developer\n",
+    "    # OUTPUT: ENGINEERING\u001b[0m\n",
+    "\u001b[32m2025-04-19 15:51:11.884\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36mshow_final_result\u001b[0m:\u001b[36m56\u001b[0m - \u001b[1m\n",
+    "==================================================\n",
+    "\u001b[0m\n"
+     ]
+    }
+   ],
+   "source": [
+    "!python spo.py"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ca302560",
+   "metadata": {},
+   "source": [
+    "## Assessing the results"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "| Original Prompt | Optimized Prompt |\n",
+    "|-----------------|------------------|\n",
+    "| Your task is to provide me with a direct classification of the person's job title into one of 4 categories. The categories you can decide are always: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. There is no possibility for mixed assignments. You always assign one and one only category to each subject. When in doubt, assign to 'OTHER'. You must strictly adhere to the categories I have mentioned, and nothing more. This means that you cannot use any other output apart from 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER', 'OTHER'. Keep your answer very, very concise. Don't give context on your answer. As a matter of fact, only answer with one word based on the category you deem the most appropriate. Absolutely don't change this. You will be penalized if (1) you use a category outside of the ones I have mentioned and (2) you use more than 1 word in your output. # INPUT declared title: the person job title is {job_title} | Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. If a job title is ambiguous or could fit into multiple categories, choose the most relevant category based on common industry standards. 
For example, 'Data Scientist' could fit into both 'RESEARCH' and 'ENGINEERING', but is typically classified as 'RESEARCH'. Similarly, 'Data Analyst' is typically classified as 'BUSINESS'. Provide your answer using one word only, in all uppercase letters without any additional context or explanations.

# INPUT: The person's job title is: {job_title}

# Example:
# INPUT: The person's job title is: Software Developer
# OUTPUT: ENGINEERING |"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ae491800",
+   "metadata": {},
+   "source": [
+    "Results indicate the original prompt is revised according to typical best practices, such as providing examples to guide the LLM (**few-shot prompting**) and adding tag-like elements to direct the model's attention towards particular parts of the input prompt.\n",
+    "\n",
+    "This revised prompt was obtained using only 5 optimization \"rounds\" and can be optimized further (although deciding when performance is finally satisfactory is, of course, a heuristic call in the context of black-box optimization)."
+   ]
+  }
+ ],
+ "metadata": {
+  "colab": {
+   "provenance": []
+  },
+  "kernelspec": {
+   "display_name": "graphsenv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.16"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/third_party/metagpt/prompt_optimization.md b/third_party/metagpt/prompt_optimization.md
new file mode 100644
index 0000000..3b8e676
--- /dev/null
+++ b/third_party/metagpt/prompt_optimization.md
@@ -0,0 +1,131 @@
+# An Overview of Prompt Optimization
+
+Early successes with prompt-based learning demonstrated that the phrasing of a task can drastically affect a large language model's (LLM) performance. In their seminal work, Brown et al. (2020) showed that carefully crafted text prompts can induce LLMs to perform new tasks without any parameter update. Together with empirical findings on the effectiveness of LLMs across a wide variety of tasks, this spurred interest in prompt engineering. Yet, designing prompts by hand proved to be a cumbersome and error-prone effort, requiring expertise and extensive trial-and-error (Liu et al., 2023).
+
+One line of early work focused on automated search for prompts directly in token space. In particular, Shin et al. (2020) introduced *AutoPrompt*, a gradient-guided search algorithm to find input trigger phrases that coax models into desired behaviors.
+
+Later, Ma et al. (2023) followed with a discrete approach to prompt optimization, automatically generating prompts by mining corpora and paraphrasing existing prompts.
+
+These approaches demonstrated that automatic prompt discovery is feasible and effective. Gradient-based prompt tuning methods have also emerged, including techniques such as prefix-tuning (Li et al., 2021) and prompt tuning (Lester et al., 2021). Both approaches learn continuous representations (prompt embeddings) prepended to inputs, updating only a small number of parameters and leaving the main model unchanged at inference time.
+
+A gradient-free approach which does not require access to the model internalsβ€”it only operates on the input tokens given to the modelβ€”is RLPrompt (Deng et al., 2022), which formulates prompt design as a reinforcement learning (RL) problem.
+
+These techniques are suited to scenarios where model internals are inaccessible, and often result in unintuitive yet effective prompt strings.
+
+# A Taxonomy of Prompt Optimization
+
+## Retrieval-Augmented Prompting
+
+Retrieval-augmented prompting incorporates external information (e.g., documents, examples) into the prompt. This includes Retrieval-Augmented Generation (RAG), which appends retrieved context to the prompt (Lewis et al., 2020), and example-based prompting, which selects demonstrations from a dataset (Rubin et al., 2021). While effective, these methods depend on retrieval infrastructure as well as correctly labeled data, hindering adoption in cases where either of these assumptions is violated.
+
+## Continuous Prompt Tuning
+
+Continuous prompt tuning methods represent prompts as trainable vectors. Clearly, this assumes access to the model internals, as these trainable vectors are then fed to the transformer layers for conditional generation. The trainable vectors can either be injected at every layer (*prefix tuning*) or only at the input embedding layer (*prompt tuning*).
+Prefix-tuning (Li et al., 2021) and prompt tuning (Lester et al., 2021) perform prompt optimization by first embedding the candidate prompt into a vector and then updating it, enabling efficient adaptation. These methods are gradient-based and assume access to model internals, making them inapplicable to API-only models.
+
+## Black-Box Discrete Optimization
+
+Black-box techniques, such as manual search, evolutionary strategies, and reinforcement learning, work without gradient access and are thus equally usable for both open- and closed-weight models.
+In particular, RLPrompt (Deng et al., 2022) uses a policy network trained via reinforcement learning (RL) to generate prompts based on reward signals. Though effective, these methods are often sample-inefficient and computationally expensive.
+
+### Diving into Self-Supervised Prompt Optimization (SPO)
+
+In their work, Xiang et al. (2025) propose Self-Supervised Prompt Optimization (SPO), which *(i)* eliminates the need for ground truth by using the LLM to evaluate its own outputs and *(ii)* combines larger and smaller models in a self-loop for greater sample efficiency and speed, optimizing prompts using as few as three examples.
+
+SPO runs an iterative loop where model(s) act as a prompt executor (carrying out the task given a prompt \( p_k \), producing an output \( \phi(p_k) \in \mathcal{T} \)), an evaluator (mapping outputs to a scalar performance metric, \( e: \mathcal{T} \to \mathbb{R} \)), and an optimizer (proposing a refined prompt \( p_{k+1} \) such that \( e(\phi(p_{k+1})) \geq e(\phi(p_k)) \)).
+Through pairwise comparisons between successive prompts and iterative refinement, SPO improves prompts \( p_k \) without external supervision.
+SPO proved highly efficient and applicable to both open- and closed-ended tasks. Unlike gradient-based or RL methods, it does not require labeled data or model internals. It is well-suited for black-box settings and maintains interpretability since it works in natural language space. Further, SPO occupies a promising spot in the PO landscape: combining the strengths of black-box and self-supervised learning while minimizing their limitations, it points towards enabling models to autonomously optimize their own prompts using internal feedback loops.
+
+# Insights on Prompt Optimization
+
+**Sample Efficiency** Gradient-based methods tend to be more sample-efficient than black-box optimization, though recent work like SPO (Xiang et al., 2025) demonstrates efficient optimization using only a few samples per iteration. Sample efficiency is crucial for many applications, especially with larger models or slower inference settings; a minimal sketch of such a loop follows below.
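+
+To make the executor/evaluator/optimizer cycle from the SPO section concrete, here is a minimal sketch of such a loop. It is illustrative only: `llm` is an assumed stand-in for any chat-completion call, and none of the helper names come from MetaGPT's implementation.
+
+```python
+from typing import Callable, List
+
+def spo_loop(
+    llm: Callable[[str], str],   # assumed wrapper around any chat model
+    initial_prompt: str,
+    samples: List[str],          # a handful of unlabeled task inputs
+    rounds: int = 5,
+) -> str:
+    best_prompt = initial_prompt
+    best_outputs = [llm(f"{best_prompt}\n{s}") for s in samples]
+    for _ in range(rounds):
+        # optimizer: ask the LLM to propose a refined prompt
+        candidate = llm(
+            "Improve the following prompt so it solves its task better. "
+            f"Return only the new prompt.\n\n{best_prompt}"
+        )
+        # executor: run the candidate prompt on the same small sample set
+        cand_outputs = [llm(f"{candidate}\n{s}") for s in samples]
+        # evaluator: pairwise self-judgment, no ground-truth labels needed
+        verdict = llm(
+            "Which output set solves the task better? Answer A or B.\n"
+            f"A: {best_outputs}\nB: {cand_outputs}"
+        )
+        if verdict.strip().upper().startswith("B"):
+            best_prompt, best_outputs = candidate, cand_outputs
+    return best_prompt
+```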
+ +**Dependence on Labeled Data** Continuous tuning and RL often require labeled data, while methods like SPO avoid this by relying on model self-evaluation. In this, SPO effectively allows for prompt optimization without relying on previously collected data, and is thus preferable when limited information is available to optimize prompts differently. + +**Generalization and Transfer** Most prompt optimization techniques tend to be task-specific in practice. Some prompts (e.g., β€œLet's think step by step”) generalize better. RL-discovered prompts sometimes transfer across models (Deng et al., 2022). + +**Robustness** Prompt performance can be brittle with respect to phrasing. Continuous prompts are less interpretable, and discrete ones may be sensitive to small changes. Methods like prompt ensembling help mitigate this. + +**Computational Cost** Manual prompt engineering is computationally cheap, but proves brittle and labor-intensive. Gradient-based tuning proves to be more compute-efficient, while RL and search-based methods are costly due to the number of required queries of the method to iterate and improve on the prompt. SPO-like approaches seem promising due to *(i)* limited number of queries and *(ii)* disuse of model internals at test-time. + +### References +```bash +@article{brown2020language, + title={Language models are few-shot learners}, + author={Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and others}, + journal={Advances in neural information processing systems}, + volume={33}, + pages={1877--1901}, + year={2020} +} + +@article{liu2023pre, + title={Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing}, + author={Liu, Pengfei and Yuan, Weizhe and Fu, Jinlan and Jiang, Zhengbao and Hayashi, Hiroaki and Neubig, Graham}, + journal={ACM computing surveys}, + volume={55}, + number={9}, + pages={1--35}, + year={2023}, + publisher={ACM New York, NY} +} + +@article{ma2023prompt, + title={Is prompt-based finetuning always better than vanilla finetuning? 
insights from cross-lingual language understanding}, + author={Ma, Bolei and Nie, Ercong and Schmid, Helmut and Sch{\"u}tze, Hinrich}, + journal={arXiv preprint arXiv:2307.07880}, + year={2023} +} + +@article{shin2020autoprompt, + title={Autoprompt: Eliciting knowledge from language models with automatically generated prompts}, + author={Shin, Taylor and Razeghi, Yasaman and Logan IV, Robert L and Wallace, Eric and Singh, Sameer}, + journal={arXiv preprint arXiv:2010.15980}, + year={2020} +} + +@article{li2021prefix, + title={Prefix-tuning: Optimizing continuous prompts for generation}, + author={Li, Xiang Lisa and Liang, Percy}, + journal={arXiv preprint arXiv:2101.00190}, + year={2021} +} + +@article{lester2021power, + title={The power of scale for parameter-efficient prompt tuning}, + author={Lester, Brian and Al-Rfou, Rami and Constant, Noah}, + journal={arXiv preprint arXiv:2104.08691}, + year={2021} +} + +@article{deng2022rlprompt, + title={Rlprompt: Optimizing discrete text prompts with reinforcement learning}, + author={Deng, Mingkai and Wang, Jianyu and Hsieh, Cheng-Ping and Wang, Yihan and Guo, Han and Shu, Tianmin and Song, Meng and Xing, Eric P and Hu, Zhiting}, + journal={arXiv preprint arXiv:2205.12548}, + year={2022} +} + +@article{rubin2021learning, + title={Learning to retrieve prompts for in-context learning}, + author={Rubin, Ohad and Herzig, Jonathan and Berant, Jonathan}, + journal={arXiv preprint arXiv:2112.08633}, + year={2021} +} + +@article{lewis2020retrieval, + title={Retrieval-augmented generation for knowledge-intensive nlp tasks}, + author={Lewis, Patrick and Perez, Ethan and Piktus, Aleksandra and Petroni, Fabio and Karpukhin, Vladimir and Goyal, Naman and K{\"u}ttler, Heinrich and Lewis, Mike and Yih, Wen-tau and Rockt{\"a}schel, Tim and others}, + journal={Advances in neural information processing systems}, + volume={33}, + pages={9459--9474}, + year={2020} +} + +@article{xiang2025self, + title={Self-Supervised Prompt Optimization}, + author={Xiang, Jinyu and Zhang, Jiayi and Yu, Zhaoyang and Teng, Fengwei and Tu, Jinhao and Liang, Xinbing and Hong, Sirui and Wu, Chenglin and Luo, Yuyu}, + journal={arXiv preprint arXiv:2502.06855}, + year={2025} +} + +``` \ No newline at end of file