diff --git a/README.md b/README.md index 72f5d80..bc2c7f8 100644 --- a/README.md +++ b/README.md @@ -101,3 +101,4 @@ Disclaimer: Examples contributed by the community and partners do not represent | [Build a bank support agent with Pydantic AI and Mistral AI](third_party/PydanticAI/pydantic_bank_support_agent.ipynb)| Agent | Pydantic | | [Mistral and MLflow Tracing](third_party/MLflow/mistral-mlflow-tracing.ipynb) | Tracing, Observability | MLflow | | [Mistral OCR with Gradio](third_party/gradio/MistralOCR.md) | OCR | Gradio | +| [Optimizing prompts without any supervision](third_party/metagpt/prompt_optimization.ipynb) | Prompting | MetaGPT | diff --git a/third_party/metagpt/prompt_optimization.ipynb b/third_party/metagpt/prompt_optimization.ipynb new file mode 100644 index 0000000..75f941d --- /dev/null +++ b/third_party/metagpt/prompt_optimization.ipynb @@ -0,0 +1,613 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "Y_jLAryGIaAB", + "metadata": { + "id": "Y_jLAryGIaAB" + }, + "source": [ + "# Prompt optimization\n", + "\n", + "- Prompt engineering... sucks. It's a non-standard process that relies heavily on trial and error.\n", + "- 🤩 Luckily, we can automate it using ✨prompt optimization✨, investigated in recent works such as [_Self-Supervised Prompt Optimization_](https://arxiv.org/pdf/2502.06855)\n", + "- 🎯 In essence, Prompt Optimization (PO) takes a prompt written for a certain task and iteratively refines it to make it better for the specific problem tackled.\n", + "- This notebook gives an overview of how to use PO with Mistral models\n", + "\n", + "
\n", + "\n", + "# Problem setting\n", + "\n", + "- You have put up a form and collected many more answers than you can read.\n", + "- Your survey got popular---very popular---and you need to sift through the answers. To keep things accessible, we allowed (and will continue to allow!) plain-text responses.\n", + "- Filtering on the raw answers is therefore _impossible_. Still, you need some strategy to sift through the applications received and identify the most promising profiles.\n", + "- Let's define a few prompts that process the answers and produce outputs we can filter on effectively." + ] + }, + { + "cell_type": "markdown", + "id": "fy8aF06wOoBU", + "metadata": { + "id": "fy8aF06wOoBU" + }, + "source": [ + "### Task prompts\n", + "\n", + "- Let's define a few prompts to process answers\n", + "- These prompts are purposely not optimized, and rather serve as an example of something quick and dirty we wish to work with.\n", + "- For this example, we will consider answers collected as part of the applications for our [Ambassadorship Program](https://docs.mistral.ai/guides/contribute/ambassador/)" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "zhIOJ8HKn31b", + "metadata": { + "id": "zhIOJ8HKn31b" + }, + "outputs": [], + "source": [ + "# overarching prompt, giving context\n", + "context = (\n", + " \"I am working on recruiting people to advocate about the products of an AI company. \"\n", + " \"The position in in close contact with the DevRel team, and we are looking at having people \"\n", + " \"share on their own personal social media more about the company and its products. \"\n", + " \"The company I work at produces Large Language Models and is very followed, \"\n", + " \"therefore I got a sheer amount of applications that I need to process \"\n", + " \"very soon. I won't be able to process them by hand, and there is little structure in the \"\n", + " \"form that we sent out to applicants. 
Therefore, I am expecting you to assist me into processing the \"\n", + " \"information these people gave to make it much more structured. This means that you do read \"\n", + " \"what applicants declared and extract key information based on the context of the question asked.\"\n", + ")\n", + "\n", + "# classifying job titles\n", + "job_prompt = lambda job_title: (\n", + " \"Your task is to provide me with a direct classification of the person's job title into one of 4 categories. \"\n", + " \"The categories you can decide are always: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. \"\n", + " \"There is no possibility for mixed assignments. You always assign one and one only category to each subject. \"\n", + " \"When in doubt, assign to 'OTHER'. You must strictly adhere to the categories I have mentioned, and nothing more. \"\n", + " \"This means that you cannot use any other output apart from 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER', 'OTHER'. \"\n", + " \"Keep your answer very, very concise. Don't give context on your answer. As a matter of fact, only answer with one word \"\n", + " \"based on the category you deem the most appropriate. Absolutely don't change this. You will be penalized if \"\n", + " \"(1) you use a category outside of the ones I have mentioned and (2) you use more than 1 word in your output. \"\n", + " f\"# INPUT declared title: the person job title is {job_title}\"\n", + ")\n", + "\n", + "# getting the location in an easy way\n", + "location_prompt = lambda location: (\n", + " \"Your task is basic. Your task is to disambiguate the respondent's answer in terms of the location used. \"\n", + " \"Your output is always CITY, COUNTRY. Use always the English name of a city. Also, always use the international \"\n", + " \"country code. Nothing else. For instance, if a user answered with 'Rome', you would output 'Rome, IT'. \"\n", + " \"In the rare case when someone puts down multiple locations, make sure you always select the first one. 
Nothing more\"\n", + " f\" #INPUT declared location: the respondent declared being located in {location}\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "ZAT4fuHlOxlL", + "metadata": { + "id": "ZAT4fuHlOxlL" + }, + "source": [ + "### Installing dependencies\n", + "\n", + "To use SPO via MetaGPT you need to clone the repository and run this notebook from inside it. The dependencies are not easy to set up, but hacking around that is fairly straightforward.\n", + "\n", + "Just run:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "89c9bc38", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cloning into 'MetaGPT'...\n", + "remote: Enumerating objects: 48797, done.\u001b[K\n", + "remote: Counting objects: 100% (287/287), done.\u001b[K\n", + "remote: Compressing objects: 100% (136/136), done.\u001b[K\n", + "remote: Total 48797 (delta 195), reused 151 (delta 151), pack-reused 48510 (from 3)\u001b[K\n", + "Receiving objects: 100% (48797/48797), 179.81 MiB | 45.07 MiB/s, done.\n", + "Resolving deltas: 100% (36800/36800), done.\n", + "/Users/francescocapuano/Desktop/prompt-optimization/third_party/MetaGPT/MetaGPT\n" + ] + } + ], + "source": [ + "# clone the repo\n", + "!git clone https://github.com/geekan/MetaGPT\n", + "\n", + "# install dependencies\n", + "!pip install -qUr MetaGPT/requirements.txt\n", + "\n", + "# move inside the directory, kernel-wise\n", + "%cd MetaGPT" + ] + }, + { + "cell_type": "markdown", + "id": "hLsS-Glveybr", + "metadata": { + "id": "hLsS-Glveybr" + }, + "source": [ + "## Create instruction files\n", + "\n", + "After installing `metagpt`, we can set up prompt optimization by creating a YAML file that specifies the task at hand.\n", + "\n", + "According to the `metagpt` [documentation](https://github.com/geekan/MetaGPT/tree/main/examples/spo), this YAML file needs the following structure:\n", + "\n", + "```yaml\n", + "prompt: |\n", + "  Please solve the following problem.\n", + "\n", + 
"requirements: |\n", + "  ...\n", + "\n", + "count: None\n", + "\n", + "qa:\n", + "  - question: |\n", + "      ...\n", + "    answer: |\n", + "      ...\n", + "\n", + "  - question: |\n", + "      ...\n", + "    answer: |\n", + "      ...\n", + "```\n", + "\n", + "We will need to generate one of these template files **for each** of the prompts we are seeking to optimize. Luckily, we can do so automatically. \n", + "\n", + "Also, as the tasks we're dealing with are fairly straightforward, we can spare ourselves from providing few-shot examples in the form of Q&As 🤩\n", + "\n", + "Still, these template files offer a very straightforward way to provide real-world few-shot examples, so they are definitely worth looking into." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "yPpTc7XuexPF", + "metadata": { + "id": "yPpTc7XuexPF" + }, + "outputs": [], + "source": [ + "from typing import Callable, Optional, Union\n", + "\n", + "def prompt_to_dict(\n", + "    prompt: Union[str, Callable[[str], str]],\n", + "    requirements: Optional[str],\n", + "    questions: list[str],\n", + "    answers: list[str],\n", + "    count: Optional[int] = None,\n", + ") -> dict:\n", + "    return {\n", + "        \"prompt\": prompt if isinstance(prompt, str) else prompt(\"\"),  # lambda prompts are rendered with an empty input\n", + "        \"requirements\": requirements,\n", + "        \"count\": count,\n", + "        \"qa\": [\n", + "            {\n", + "                \"question\": question,\n", + "                \"answer\": answer\n", + "            } for question, answer in zip(questions, answers)\n", + "        ]\n", + "    }" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "Plv2A2FcglAm", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Plv2A2FcglAm", + "outputId": "0beba67e-46d5-4cf6-fbf4-dfccd069f8d1" + }, + "outputs": [], + "source": [ + "import yaml\n", + "\n", + "prompts = {\n", + "    \"job\": job_prompt,\n", + "    \"location\": location_prompt\n", + "}\n", + "\n", + "requirements = [\n", + "    \"The job title, categorized\",\n", + "    \"The location, disambiguated\"\n", + "]\n", + "path = \"metagpt/ext/spo/settings\" # this is the path where the template files 
need to be saved\n", + "\n", + "for (name, prompt), requirement in zip(prompts.items(), requirements):\n", + "    # creating template files for each prompt\n", + "    with open(f\"{path}/{name}.yaml\", \"w\") as f:\n", + "        yaml.dump(\n", + "            prompt_to_dict(\n", + "                prompt,\n", + "                requirement,\n", + "                [\"\"],\n", + "                [\"\"]\n", + "            ),\n", + "            f,\n", + "        )" + ] + }, + { + "cell_type": "markdown", + "id": "866a4c2f", + "metadata": {}, + "source": [ + "## Creating model files\n", + "\n", + "Once you have created the template files for the different prompts, you need to specify which models to use as (1) executors, (2) evaluators and (3) optimizers for the different prompts.\n", + "\n", + "metagpt's SPO requires you to provide these models within a specific `.yaml` file---you can use the following snippet to create this file using your own Mistral API key ([get one!](https://console.mistral.ai/api-keys))." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "LUZalCD-yhlC", + "metadata": { + "id": "LUZalCD-yhlC" + }, + "outputs": [], + "source": [ + "def models_dict(\n", + "    mistral_api_key: str\n", + ") -> dict:\n", + "    return {\n", + "        \"llm\": {\n", + "            \"api_type\": \"openai\",\n", + "            \"model\": \"mistral-small-latest\",\n", + "            \"base_url\": \"https://api.mistral.ai/v1/\",\n", + "            \"api_key\": mistral_api_key,\n", + "            \"temperature\": 0\n", + "        },\n", + "        \"models\": {\n", + "            \"mistral-small-latest\": {\n", + "                \"api_type\": \"openai\",\n", + "                \"base_url\": \"https://api.mistral.ai/v1/\",\n", + "                \"api_key\": mistral_api_key,\n", + "                \"temperature\": 0\n", + "            },\n", + "            \"mistral-large-latest\": {\n", + "                \"api_type\": \"openai\",\n", + "                \"base_url\": \"https://api.mistral.ai/v1/\",\n", + "                \"api_key\": mistral_api_key,\n", + "                \"temperature\": 0\n", + "            }\n", + "        }\n", + "    }" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "4401cb21", + "metadata": {}, + "outputs": [], + "source": [ + "path = \"config/config2.yaml\" # saving 
the models file here\n", + "\n", + "MISTRAL_API_KEY = \"<your-mistral-api-key>\" # replace with your own API key; never commit a real key\n", + "\n", + "with open(path, \"w\") as f:\n", + "    yaml.dump(models_dict(MISTRAL_API_KEY), f)" + ] + }, + { + "cell_type": "markdown", + "id": "e22cc339", + "metadata": {}, + "source": [ + "**We're good!** \n", + "\n", + "Once you have (1) template files for your candidate prompts and (2) a `config/config2.yaml` file identifying the different models you wish to use, we can start running rounds and optimizing the prompts.\n", + "\n", + "### A little hack: jupyter notebooks don't really work with `asyncio` 🫠\n", + "\n", + "...if only jupyter notebooks worked well with `asyncio`! The little hack here is to export the code you need to run prompt optimization to a `.py` file and then run that file using CLI-like instructions.\n", + "\n", + "Here we create a single file, for the job title classification prompt. Exporting each prompt optimization process to its own file also allows for parallel execution (💨, right?). You can easily switch this to the other prompts yourself." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "JpgfHposxLPZ", + "metadata": { + "id": "JpgfHposxLPZ" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Overwriting spo.py\n" + ] + } + ], + "source": [ + "%%writefile spo.py\n", + "\n", + "from metagpt.ext.spo.components.optimizer import PromptOptimizer\n", + "from metagpt.ext.spo.utils.llm_client import SPO_LLM\n", + "\n", + "# Initialize LLM settings\n", + "SPO_LLM.initialize(\n", + " # same temperature settings as metagpt's default!\n", + " optimize_kwargs={\n", + " \"model\": \"mistral-large-latest\", \n", + " \"temperature\": 0.6\n", + " },\n", + " evaluate_kwargs={\n", + " \"model\": \"mistral-small-latest\", \n", + " \"temperature\": 0.3\n", + " },\n", + " execute_kwargs={\n", + " \"model\": \"mistral-small-latest\", \n", + " \"temperature\": 0\n", + " }\n", + ")\n", + "\n", + "template_name = \"job.yaml\" # change this for each prompt!\n", + "\n", + "# Create and run optimizer\n", + "optimizer = PromptOptimizer(\n", + " optimized_path=\"workspace\", # Output directory\n", + " initial_round=1, # Starting round\n", + " max_rounds=5, # Maximum optimization rounds\n", + " template=template_name, # Template file - Change this for each prompt!\n", + " name=\"Mistral-Prompt-Opt\", # Project name\n", + ")\n", + "\n", + "optimizer.optimize()" + ] + }, + { + "cell_type": "markdown", + "id": "0fb9af9b", + "metadata": {}, + "source": [ + "Now, let's run prompt optimization βοΈ" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "e211a622", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[32m2025-04-19 15:33:24.300\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.const\u001b[0m:\u001b[36mget_metagpt_package_root\u001b[0m:\u001b[36m15\u001b[0m - \u001b[1mPackage root set to /Users/francescocapuano/Desktop/prompt-optimization/third_party/MetaGPT/MetaGPT\u001b[0m\n", + 
"\u001b[32m2025-04-19 15:33:24.300\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.const\u001b[0m:\u001b[36mget_metagpt_package_root\u001b[0m:\u001b[36m15\u001b[0m - \u001b[1mPackage root set to /Users/francescocapuano/Desktop/prompt-optimization/third_party/MetaGPT/MetaGPT\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:25.337\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_handle_first_round\u001b[0m:\u001b[36m80\u001b[0m - \u001b[1m\n", + "β‘ RUNNING Round 1 PROMPT β‘\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:43.216\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.000 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 226, completion_tokens: 2\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:43.370\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m97\u001b[0m - \u001b[1m\n", + "πRound 2 OPTIMIZATION STARTING π\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:43.370\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m98\u001b[0m - \u001b[1m\n", + "Selecting prompt for round 1 and advancing to the iteration phase\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:49.760\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.012 | Max budget: $10.000 | Current cost: $0.012, prompt_tokens: 587, completion_tokens: 321\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:49.761\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m116\u001b[0m - \u001b[1mModification of 2 round: Streamline 
the instructions and clarify the input format to reduce confusion and improve robustness against ambiguous job titles.\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:49.761\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_optimize_prompt\u001b[0m:\u001b[36m71\u001b[0m - \u001b[1m\n", + "Round 2 Prompt: Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. Provide your answer using one word only. Do not include any additional context or explanations.\n", + "\n", + " # INPUT declared title: the person's job title is\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:49.762\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_evaluate_new_prompt\u001b[0m:\u001b[36m122\u001b[0m - \u001b[1m\n", + "β‘ RUNNING OPTIMIZED PROMPT β‘\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:52.430\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.001 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 96, completion_tokens: 2\u001b[0m\n", + "\u001b[32m2025-04-19 15:33:52.430\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_evaluate_new_prompt\u001b[0m:\u001b[36m125\u001b[0m - \u001b[1m\n", + "π EVALUATING OPTIMIZED PROMPT π\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:34:27.452\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.002 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 548, completion_tokens: 175\u001b[0m\n", + "\u001b[32m2025-04-19 15:34:33.397\u001b[0m | 
\u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.005 | Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 548, completion_tokens: 237\u001b[0m\n", + "\u001b[32m2025-04-19 15:34:52.530\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.007 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 548, completion_tokens: 156\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:02.464\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.009 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 548, completion_tokens: 182\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:02.465\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.utils.evaluation_utils\u001b[0m:\u001b[36mevaluate_prompt\u001b[0m:\u001b[36m63\u001b[0m - \u001b[1mEvaluation Results [True, True, True, True]\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:02.467\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_log_optimization_result\u001b[0m:\u001b[36m135\u001b[0m - \u001b[1m\n", + "π― OPTIMIZATION RESULT π―\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:02.467\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_log_optimization_result\u001b[0m:\u001b[36m136\u001b[0m - \u001b[1m\n", + "Round 2 Optimization: β SUCCESS\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:02.473\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m97\u001b[0m - \u001b[1m\n", + "πRound 3 OPTIMIZATION STARTING π\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:02.473\u001b[0m | 
\u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m98\u001b[0m - \u001b[1m\n", + "Selecting prompt for round 2 and advancing to the iteration phase\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:16.753\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.022 | Max budget: $10.000 | Current cost: $0.010, prompt_tokens: 408, completion_tokens: 260\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:16.754\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m116\u001b[0m - \u001b[1mModification of 3 round: Include instructions for handling ambiguous job titles and ensure consistent formatting.\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:16.754\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_optimize_prompt\u001b[0m:\u001b[36m71\u001b[0m - \u001b[1m\n", + "Round 3 Prompt: Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. If a job title is ambiguous or could fit into multiple categories, choose the most relevant category based on common industry standards. 
Provide your answer using one word only, in all uppercase letters without any additional context or explanations.\n", + "\n", + " # INPUT declared title: the person's job title is\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:16.754\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_evaluate_new_prompt\u001b[0m:\u001b[36m122\u001b[0m - \u001b[1m\n", + "β‘ RUNNING OPTIMIZED PROMPT β‘\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:17.382\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.001 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 122, completion_tokens: 2\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:17.383\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_evaluate_new_prompt\u001b[0m:\u001b[36m125\u001b[0m - \u001b[1m\n", + "π EVALUATING OPTIMIZED PROMPT π\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:34.698\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.011 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 396, completion_tokens: 185\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:39.916\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.013 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 396, completion_tokens: 243\u001b[0m\n", + "\u001b[32m2025-04-19 15:35:57.867\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.015 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 396, completion_tokens: 227\u001b[0m\n", + 
"\u001b[32m2025-04-19 15:37:16.285\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.018 | Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 396, completion_tokens: 349\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:16.286\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.utils.evaluation_utils\u001b[0m:\u001b[36mevaluate_prompt\u001b[0m:\u001b[36m63\u001b[0m - \u001b[1mEvaluation Results [True, True, True, True]\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:16.287\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_log_optimization_result\u001b[0m:\u001b[36m135\u001b[0m - \u001b[1m\n", + "π― OPTIMIZATION RESULT π―\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:16.287\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_log_optimization_result\u001b[0m:\u001b[36m136\u001b[0m - \u001b[1m\n", + "Round 3 Optimization: β SUCCESS\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:16.291\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m97\u001b[0m - \u001b[1m\n", + "πRound 4 OPTIMIZATION STARTING π\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:16.291\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m98\u001b[0m - \u001b[1m\n", + "Selecting prompt for round 3 and advancing to the iteration phase\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:25.535\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.035 | Max budget: $10.000 | Current cost: $0.013, prompt_tokens: 436, completion_tokens: 398\u001b[0m\n", + 
"\u001b[32m2025-04-19 15:37:25.535\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m116\u001b[0m - \u001b[1mModification of 4 round: Include explicit guidelines for handling ambiguity, provide contextual examples, define a clear input format, and reinforce the output format with an example.\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:25.535\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_optimize_prompt\u001b[0m:\u001b[36m71\u001b[0m - \u001b[1m\n", + "Round 4 Prompt: Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. If a job title is ambiguous or could fit into multiple categories, choose the most relevant category based on common industry standards. For example, 'Data Scientist' could fit into both 'RESEARCH' and 'ENGINEERING', but is typically classified as 'RESEARCH'. 
Provide your answer using one word only, in all uppercase letters without any additional context or explanations.\n", + "\n", + " # INPUT: The person's job title is: [Job Title]\n", + "\n", + " # Example:\n", + " # INPUT: The person's job title is: Software Developer\n", + " # OUTPUT: ENGINEERING\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:25.535\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_evaluate_new_prompt\u001b[0m:\u001b[36m122\u001b[0m - \u001b[1m\n", + "β‘ RUNNING OPTIMIZED PROMPT β‘\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:56.956\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.001 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 185, completion_tokens: 21\u001b[0m\n", + "\u001b[32m2025-04-19 15:37:56.957\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_evaluate_new_prompt\u001b[0m:\u001b[36m125\u001b[0m - \u001b[1m\n", + "π EVALUATING OPTIMIZED PROMPT π\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:10.473\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.020 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 507, completion_tokens: 230\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:10.476\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.023 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 507, completion_tokens: 236\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:14.260\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.025 | 
Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 507, completion_tokens: 210\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:34.786\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.028 | Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 507, completion_tokens: 250\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:34.787\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.utils.evaluation_utils\u001b[0m:\u001b[36mevaluate_prompt\u001b[0m:\u001b[36m63\u001b[0m - \u001b[1mEvaluation Results [True, True, True, True]\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:34.789\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_log_optimization_result\u001b[0m:\u001b[36m135\u001b[0m - \u001b[1m\n", + "π― OPTIMIZATION RESULT π―\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:34.789\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_log_optimization_result\u001b[0m:\u001b[36m136\u001b[0m - \u001b[1m\n", + "Round 4 Optimization: β SUCCESS\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:34.795\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m97\u001b[0m - \u001b[1m\n", + "πRound 5 OPTIMIZATION STARTING π\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:34.795\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m98\u001b[0m - \u001b[1m\n", + "Selecting prompt for round 4 and advancing to the iteration phase\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:42.527\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.048 | Max 
budget: $10.000 | Current cost: $0.013, prompt_tokens: 529, completion_tokens: 383\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:42.527\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_generate_optimized_prompt\u001b[0m:\u001b[36m116\u001b[0m - \u001b[1mModification of 5 round: Include explicit criteria and additional examples for less common job titles to improve classification accuracy.\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:42.528\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_optimize_prompt\u001b[0m:\u001b[36m71\u001b[0m - \u001b[1m\n", + "Round 5 Prompt: Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. If a job title is ambiguous or could fit into multiple categories, choose the most relevant category based on common industry standards. For example, 'Data Scientist' could fit into both 'RESEARCH' and 'ENGINEERING', but is typically classified as 'RESEARCH'. Similarly, 'Data Analyst' is typically classified as 'BUSINESS'. 
Provide your answer using one word only, in all uppercase letters without any additional context or explanations.\n", + "\n", + " # INPUT: The person's job title is: [Job Title]\n", + "\n", + " # Example:\n", + " # INPUT: The person's job title is: Software Developer\n", + " # OUTPUT: ENGINEERING\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:38:42.530\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_evaluate_new_prompt\u001b[0m:\u001b[36m122\u001b[0m - \u001b[1m\n", + "⚡ RUNNING OPTIMIZED PROMPT ⚡\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:49:40.335\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.002 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 202, completion_tokens: 12\u001b[0m\n", + "\u001b[32m2025-04-19 15:49:40.335\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_evaluate_new_prompt\u001b[0m:\u001b[36m125\u001b[0m - \u001b[1m\n", + "EVALUATING OPTIMIZED PROMPT\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:49:58.080\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.030 | Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 600, completion_tokens: 246\u001b[0m\n", + "\u001b[32m2025-04-19 15:50:31.634\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.033 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 600, completion_tokens: 215\u001b[0m\n", + "\u001b[32m2025-04-19 15:50:41.015\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.035 | 
Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 600, completion_tokens: 232\u001b[0m\n", + "\u001b[32m2025-04-19 15:51:11.878\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.utils.cost_manager\u001b[0m:\u001b[36mupdate_cost\u001b[0m:\u001b[36m57\u001b[0m - \u001b[1mTotal running cost: $0.038 | Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 600, completion_tokens: 231\u001b[0m\n", + "\u001b[32m2025-04-19 15:51:11.879\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.utils.evaluation_utils\u001b[0m:\u001b[36mevaluate_prompt\u001b[0m:\u001b[36m63\u001b[0m - \u001b[1mEvaluation Results [False, True, True, True]\u001b[0m\n", + "\u001b[32m2025-04-19 15:51:11.881\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_log_optimization_result\u001b[0m:\u001b[36m135\u001b[0m - \u001b[1m\n", + "🎯 OPTIMIZATION RESULT 🎯\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:51:11.882\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36m_log_optimization_result\u001b[0m:\u001b[36m136\u001b[0m - \u001b[1m\n", + "Round 5 Optimization: ✅ SUCCESS\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:51:11.883\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36mshow_final_result\u001b[0m:\u001b[36m52\u001b[0m - \u001b[1m\n", + "==================================================\u001b[0m\n", + "\u001b[32m2025-04-19 15:51:11.884\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36mshow_final_result\u001b[0m:\u001b[36m53\u001b[0m - \u001b[1m\n", + "OPTIMIZATION COMPLETED - FINAL RESULTS\n", + "\u001b[0m\n", + "\u001b[32m2025-04-19 15:51:11.884\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36mshow_final_result\u001b[0m:\u001b[36m54\u001b[0m - \u001b[1m\n", + "Best Performing Round: 5\u001b[0m\n", + 
"\u001b[32m2025-04-19 15:51:11.884\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36mshow_final_result\u001b[0m:\u001b[36m55\u001b[0m - \u001b[1m\n", + "π― Final Optimized Prompt:\n", + "Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. If a job title is ambiguous or could fit into multiple categories, choose the most relevant category based on common industry standards. For example, 'Data Scientist' could fit into both 'RESEARCH' and 'ENGINEERING', but is typically classified as 'RESEARCH'. Similarly, 'Data Analyst' is typically classified as 'BUSINESS'. Provide your answer using one word only, in all uppercase letters without any additional context or explanations.\n", + "\n", + " # INPUT: The person's job title is: [Job Title]\n", + "\n", + " # Example:\n", + " # INPUT: The person's job title is: Software Developer\n", + " # OUTPUT: ENGINEERING\u001b[0m\n", + "\u001b[32m2025-04-19 15:51:11.884\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mmetagpt.ext.spo.components.optimizer\u001b[0m:\u001b[36mshow_final_result\u001b[0m:\u001b[36m56\u001b[0m - \u001b[1m\n", + "==================================================\n", + "\u001b[0m\n" + ] + } + ], + "source": [ + "!python spo.py" + ] + }, + { + "cell_type": "markdown", + "id": "ca302560", + "metadata": {}, + "source": [ + "## Asessing the results" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "| Original Prompt | Optimized Prompt |\n", + "|-----------------|------------------|\n", + "| Your task is to provide me with a direct classification of the person's job title into one of 4 categories. The categories you can decide are always: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. There is no possibility for mixed assignments. 
You always assign one and one only category to each subject. When in doubt, assign to 'OTHER'. You must strictly adhere to the categories I have mentioned, and nothing more. This means that you cannot use any other output apart from 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER', 'OTHER'. Keep your answer very, very concise. Don't give context on your answer. As a matter of fact, only answer with one word based on the category you deem the most appropriate. Absolutely don't change this. You will be penalized if (1) you use a category outside of the ones I have mentioned and (2) you use more than 1 word in your output. # INPUT declared title: the person job title is {job_title} | Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. If a job title is ambiguous or could fit into multiple categories, choose the most relevant category based on common industry standards. For example, 'Data Scientist' could fit into both 'RESEARCH' and 'ENGINEERING', but is typically classified as 'RESEARCH'. Similarly, 'Data Analyst' is typically classified as 'BUSINESS'. Provide your answer using one word only, in all uppercase letters without any additional context or explanations.