refactor(env): modernize model configuration environment variables (#1375)
* refactor(env): modernize model configuration environment variables
This PR refactors the model configuration system with improved naming conventions
and better type safety while maintaining backward compatibility.
Key Changes:
1. Environment Variable Naming Convention Updates:
- Renamed OPENAI_* → MODEL_* for public API variables
* OPENAI_API_KEY → MODEL_API_KEY (deprecated, backward compatible)
* OPENAI_BASE_URL → MODEL_BASE_URL (deprecated, backward compatible)
- Renamed MIDSCENE_*_VL_MODE → MIDSCENE_*_LOCATOR_MODE across all intents
* MIDSCENE_VL_MODE → MIDSCENE_LOCATOR_MODE
* MIDSCENE_VQA_VL_MODE → MIDSCENE_VQA_LOCATOR_MODE
* MIDSCENE_PLANNING_VL_MODE → MIDSCENE_PLANNING_LOCATOR_MODE
* MIDSCENE_GROUNDING_VL_MODE → MIDSCENE_GROUNDING_LOCATOR_MODE
- Updated all internal MIDSCENE_*_OPENAI_* → MIDSCENE_*_MODEL_*
* MIDSCENE_VQA_OPENAI_API_KEY → MIDSCENE_VQA_MODEL_API_KEY
* MIDSCENE_PLANNING_OPENAI_API_KEY → MIDSCENE_PLANNING_MODEL_API_KEY
* MIDSCENE_GROUNDING_OPENAI_API_KEY → MIDSCENE_GROUNDING_MODEL_API_KEY
* (and corresponding BASE_URL variables)
2. Type System Improvements:
- Split TModelConfigFn into public and internal types
- Public API (TModelConfigFn) no longer exposes 'intent' parameter
- Internal type (TModelConfigFnInternal) maintains intent parameter
- Users can still optionally use intent parameter via type casting
3. Backward Compatibility:
- Maintained compatibility for documented public variables (OPENAI_API_KEY, OPENAI_BASE_URL)
- New variables take precedence; legacy names are used as a fallback when the new ones are not set (see the sketch after the file list below)
- Only the public documented variables are deprecated; internal variables are renamed directly
4. Updated Files:
- packages/shared/src/env/types.ts - Type definitions and constants
- packages/shared/src/env/constants.ts - Config key mappings
- packages/shared/src/env/decide-model-config.ts - Compatibility logic
- packages/shared/src/env/model-config-manager.ts - Type casting implementation
- packages/shared/src/env/init-debug.ts - Debug variable updates
- All test files updated to use new variable names
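Below is a minimal sketch of the type split and the variable-precedence rule described above, assuming simplified shapes; the real definitions live in `packages/shared/src/env/types.ts` and `decide-model-config.ts` and may differ:
```typescript
// Sketch only: IModelConfig is simplified here, and resolveApiKey/resolveBaseURL
// are illustrative helpers, not the actual implementation.
type TIntent = 'VQA' | 'planning' | 'grounding' | 'default';

interface IModelConfig {
  modelName: string;
  openaiApiKey?: string;
  openaiBaseURL?: string;
}

// Public API: the `intent` parameter is no longer exposed.
type TModelConfigFn = () => IModelConfig;

// Internal type: keeps the `intent` parameter; users can still opt in via type casting.
type TModelConfigFnInternal = (params: { intent: TIntent }) => IModelConfig;

// Precedence: new MODEL_* variables win; legacy OPENAI_* names remain as a fallback.
const resolveApiKey = (env: Record<string, string | undefined>) =>
  env.MODEL_API_KEY ?? env.OPENAI_API_KEY;

const resolveBaseURL = (env: Record<string, string | undefined>) =>
  env.MODEL_BASE_URL ?? env.OPENAI_BASE_URL;
```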
Testing:
- All 24 model-config-manager tests passing
- Overall test suite: 241 tests passing
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
* Update packages/shared/src/env/constants.ts
Co-authored-by: Copilot <[email protected]>
* test(env): add comprehensive backward compatibility tests for OPENAI_* variables
- Added test suite to verify MODEL_API_KEY/MODEL_BASE_URL take precedence
- Added test to ensure OPENAI_API_KEY/OPENAI_BASE_URL still work as a fallback
- Fixed compatibility logic to prioritize new variables over legacy ones
- All 13 tests passing, including 5 new backward compatibility tests (a sketch of one such test follows the coverage list below)
Test coverage:
✓ Using only legacy variables (OPENAI_API_KEY)
✓ Using only new variables (MODEL_API_KEY)
✓ Mixing new and legacy variables (new takes precedence)
✓ Individual precedence for API_KEY and BASE_URL
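A minimal sketch of one of these precedence tests, assuming vitest and an illustrative resolver; the actual test file, helpers, and assertions may differ:
```typescript
import { beforeEach, describe, expect, it } from 'vitest';

// Illustrative resolver; the real compatibility logic lives in
// packages/shared/src/env/decide-model-config.ts.
const resolveApiKey = (env: Record<string, string | undefined>) =>
  env.MODEL_API_KEY ?? env.OPENAI_API_KEY;

describe('OPENAI_* backward compatibility', () => {
  let env: Record<string, string | undefined>;

  beforeEach(() => {
    // Start each test from a clean environment object.
    env = {};
  });

  it('falls back to OPENAI_API_KEY when MODEL_API_KEY is unset', () => {
    env.OPENAI_API_KEY = 'legacy-key';
    expect(resolveApiKey(env)).toBe('legacy-key');
  });

  it('prefers MODEL_API_KEY when both are set', () => {
    env.MODEL_API_KEY = 'new-key';
    env.OPENAI_API_KEY = 'legacy-key';
    expect(resolveApiKey(env)).toBe('new-key');
  });
});
```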
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
* fix(test): reset MIDSCENE_CACHE in beforeEach to avoid .env interference
The test 'should return the correct value from override' was failing because the
.env file sets MIDSCENE_CACHE=1. This polluted the test environment and caused
the test to expect false but receive true.
Fixed by explicitly resetting MIDSCENE_CACHE to an empty string in beforeEach.
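A minimal sketch of that reset, assuming vitest; the surrounding test file is not shown here:
```typescript
import { beforeEach } from 'vitest';

beforeEach(() => {
  // The repo's .env sets MIDSCENE_CACHE=1, which leaks into the test
  // environment; reset it explicitly so the override test sees a clean value.
  process.env.MIDSCENE_CACHE = '';
});
```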
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
* docs(site): update environment variable names and add advanced configuration examples for agents
---------
Co-authored-by: Claude <[email protected]>
Co-authored-by: Copilot <[email protected]>
apps/site/docs/en/api.mdx (56 additions, 2 deletions)
@@ -25,6 +25,58 @@ In Playwright and Puppeteer, there are some common parameters:
 - `forceSameTabNavigation: boolean`: If true, page navigation is restricted to the current tab. (Default: true)
 - `waitForNavigationTimeout: number`: The timeout for waiting for navigation finished. (Default: 5000ms, set to 0 means not waiting for navigation finished)
 
+These Agents also support the following advanced configuration parameters:
+
+- `modelConfig: () => IModelConfig`: Optional. Custom model configuration function. Allows you to dynamically configure different models through code instead of environment variables. This is particularly useful when you need to use different models for different AI tasks (such as VQA, planning, grounding, etc.).
+- `createOpenAIClient: (config) => OpenAI`: Optional. Custom OpenAI client factory function. Allows you to create custom OpenAI client instances for integrating observability tools (such as LangSmith, LangFuse) or using custom OpenAI-compatible clients.
+
+**Parameter Description:**
+- `config.modelName: string` - Model name
+- `config.openaiApiKey?: string` - API key
+- `config.openaiBaseURL?: string` - API endpoint URL
+- `config.intent: string` - AI task type ('VQA' | 'planning' | 'grounding' | 'default')
+- `config.vlMode?: string` - Visual language model mode
+- Other configuration parameters...
+
+**Example (LangSmith Integration):**
+```typescript
+import OpenAI from 'openai';
+import { wrapOpenAI } from 'langsmith/wrappers';
+
+const agent = new PuppeteerAgent(page, {
+  createOpenAIClient: (config) => {
+    const openai = new OpenAI({
+      apiKey: config.openaiApiKey,
+      baseURL: config.openaiBaseURL,
+    });
+
+    // Wrap with LangSmith for planning tasks
+    if (config.intent === 'planning') {
+      return wrapOpenAI(openai, {
+        metadata: { task: 'planning' }
+      });
+    }
+
+    return openai;
+  }
+});
+```
+
+**Note:** `createOpenAIClient` overrides the behavior of the `MIDSCENE_LANGSMITH_DEBUG` environment variable. If you provide a custom client factory function, you need to handle the integration of LangSmith or other observability tools yourself.
+
 In Puppeteer, there is also a parameter:
 
 - `waitForNetworkIdleTimeout: number`: The timeout for waiting for network idle between each action. (Default: 2000ms, set to 0 means not waiting for network idle)
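For reference, a hedged sketch of how the `modelConfig` option documented in this hunk might be used; the option name and signature come from the diff, but the returned fields, the import path, and the `page` setup are assumptions:
```typescript
// Hypothetical usage sketch, not part of the diff above.
// import { PuppeteerAgent } from '@midscene/web/puppeteer'; // assumed import path

const agent = new PuppeteerAgent(page, {
  // Return a model configuration from code instead of environment variables.
  modelConfig: () => ({
    modelName: 'gpt-4o-2024-11-20',
    openaiApiKey: process.env.MODEL_API_KEY,
    openaiBaseURL: process.env.MODEL_BASE_URL,
  }),
});
```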
@@ -854,9 +906,11 @@ You can override environment variables at runtime by calling the `overrideAIConf
 import { overrideAIConfig } from '@midscene/web/puppeteer'; // or another Agent
 
 overrideAIConfig({
-  OPENAI_BASE_URL: '...',
-  OPENAI_API_KEY: '...',
   MIDSCENE_MODEL_NAME: '...',
+  MODEL_BASE_URL: '...', // recommended, use new variable name
+  MODEL_API_KEY: '...', // recommended, use new variable name
+  // OPENAI_BASE_URL: '...', // deprecated but still compatible
+  // OPENAI_API_KEY: '...', // deprecated but still compatible
apps/site/docs/en/choose-a-model.mdx (31 additions, 15 deletions)
@@ -4,6 +4,22 @@ import TroubleshootingLLMConnectivity from './common/troubleshooting-llm-connect
 
 Choose one of the following models, obtain the API key, complete the configuration, and you are ready to go. Choose the model that is easiest to obtain if you are a beginner.
 
+## Environment Variable Configuration
+
+Starting from version 1.0, Midscene.js recommends using the following new environment variable names:
+
+- `MODEL_API_KEY` - API key (recommended)
+- `MODEL_BASE_URL` - API endpoint URL (recommended)
+
+For backward compatibility, the following legacy variable names are still supported:
+
+- `OPENAI_API_KEY` - API key (deprecated but still compatible)
+- `OPENAI_BASE_URL` - API endpoint URL (deprecated but still compatible)
+
+When both new and old variables are set, the new variables (`MODEL_*`) will take precedence.
+
+In the configuration examples throughout this document, we will use the new variable names. If you are currently using the old variable names, there's no need to change them immediately - they will continue to work.
+
 ## Adapted models for using Midscene.js
 
 Midscene.js supports two types of models, visual-language models and LLM models.
@@ -46,8 +62,8 @@ We recommend the Qwen3-VL series, which clearly outperforms Qwen2.5-VL. Qwen3-VL
 Using the Alibaba Cloud `qwen3-vl-plus` model as an example:
 MIDSCENE_MODEL_NAME="ep-2025..." # Inference endpoint ID or model name from Volcano Engine
 MIDSCENE_USE_VLM_UI_TARS=DOUBAO
 ```
@@ -164,8 +180,8 @@ The token cost of GPT-4o is relatively high because Midscene sends DOM informati
 **Config**
 
 ```bash
-OPENAI_API_KEY="......"
-OPENAI_BASE_URL="https://custom-endpoint.com/compatible-mode/v1" # Optional, if you want an endpoint other than the default OpenAI one.
+MODEL_API_KEY="......"
+MODEL_BASE_URL="https://custom-endpoint.com/compatible-mode/v1" # Optional, if you want an endpoint other than the default OpenAI one.
 MIDSCENE_MODEL_NAME="gpt-4o-2024-11-20" # Optional. The default is "gpt-4o".
 ```
 
@@ -176,7 +192,7 @@ Other models are also supported by Midscene.js. Midscene will use the same promp
 
 1. A multimodal model is required, which means it must support image input.
 1. The larger the model, the better it works. However, it needs more GPU or money.
-1. Find out how to call it with an OpenAI SDK compatible endpoint. Usually you should set the `OPENAI_BASE_URL`, `OPENAI_API_KEY` and `MIDSCENE_MODEL_NAME`. Configs are described in [Config Model and Provider](./model-provider).
+1. Find out how to call it with an OpenAI SDK compatible endpoint. Usually you should set the `MODEL_BASE_URL`, `MODEL_API_KEY` and `MIDSCENE_MODEL_NAME`. Configs are described in [Config Model and Provider](./model-provider).
 1. If you find it not working well after changing the model, you can try using some short and clear prompt, or roll back to the previous model. See more details in [Prompting Tips](./prompting-tips).
 1. Remember to follow the terms of use of each model and provider.
 1. Don't include the `MIDSCENE_USE_VLM_UI_TARS` and `MIDSCENE_USE_QWEN_VL` config unless you know what you are doing.
@@ -185,8 +201,8 @@ Other models are also supported by Midscene.js. Midscene will use the same promp
 
 ```bash
 MIDSCENE_MODEL_NAME="....."
-OPENAI_BASE_URL="......"
-OPENAI_API_KEY="......"
+MODEL_BASE_URL="......"
+MODEL_API_KEY="......"
 ```
 
 For more details and sample config, see [Config Model and Provider](./model-provider).
apps/site/docs/en/model-provider.mdx (19 additions, 15 deletions)
@@ -9,12 +9,14 @@ In this article, we will show you how to config AI service provider and how to c
 ## Configs
 
 ### Common configs
-These are the most common configs, in which `OPENAI_API_KEY` is required.
+These are the most common configs, in which `MODEL_API_KEY` or `OPENAI_API_KEY` is required.
 
 | Name | Description |
 |------|-------------|
-| `OPENAI_API_KEY` | Required. Your OpenAI API key (e.g. "sk-abcdefghijklmnopqrstuvwxyz") |
-| `OPENAI_BASE_URL` | Optional. Custom endpoint URL for API endpoint. Use it to switch to a provider other than OpenAI (e.g. "https://some_service_name.com/v1") |
+| `MODEL_API_KEY` | Required (recommended). Your API key (e.g. "sk-abcdefghijklmnopqrstuvwxyz") |
+| `MODEL_BASE_URL` | Optional (recommended). Custom endpoint URL for API endpoint. Use it to switch to a provider other than OpenAI (e.g. "https://some_service_name.com/v1") |
+| `OPENAI_API_KEY` | Deprecated but still compatible. Recommended to use `MODEL_API_KEY` |
+| `OPENAI_BASE_URL` | Deprecated but still compatible. Recommended to use `MODEL_BASE_URL` |
 | `MIDSCENE_MODEL_NAME` | Optional. Specify a different model name other than `gpt-4o` |
 
 Extra configs to use `Qwen 2.5 VL` model:

@@ -69,7 +71,7 @@ Pick one of the following ways to config environment variables.