Commit b150f2f

quanru, yuyutaotao, claude, github-actions[bot], and Copilot authored
Merge main (#1371)
* docs(awesome): add midscene java sdk (#1324)

* fix(core): support number type for aiInput value field (#1339)

* fix(core): support number type for aiInput value field

This change allows aiInput.value to accept both string and number types, addressing scenarios where:
1. AI models return numeric values instead of strings
2. YAML files contain unquoted numbers that parse as number type

Changes:
- Updated type definitions to accept string | number
- Added Zod schema transformation to convert numbers to strings
- Updated runtime validation to accept both types
- Added explicit conversion in YAML player as fallback

All conversions happen internally and are transparent to users.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(core): update aiInput type signatures to accept number values

Update the TypeScript method signatures for aiInput to accept string | number for the value parameter, matching the runtime implementation.

Changes:
- New signature: opt parameter now accepts { value: string | number }
- Legacy signature: first parameter now accepts string | number
- Implementation signature: locatePromptOrValue now accepts TUserPrompt | string | number
- Type assertion updated from `as string` to `as string | number`

This ensures type safety and allows users to pass number values directly without TypeScript errors, while maintaining backward compatibility with existing string-based usage. Fixes type errors in test cases that use number values.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>

* fix(report): prevent sidebar jitter when expanding case selector (#1344)

Fixed sidebar shifting 1-2 pixels when clicking to expand the playwright case selector. The issue was caused by adding a border only in the expanded state, causing a sudden height change.

Solution: Added transparent border to the collapsed state, ensuring consistent height across both states.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>

* refactor(core): unify cache config parameters (#1346)

Simplified `processCacheConfig` function signature from 3 to 2 parameters. Unified `fallbackId` and `cacheId` into single `cacheId` parameter.

BREAKING CHANGE: processCacheConfig signature changed

Changed from: processCacheConfig(cache, fallbackId, cacheId?)
To: processCacheConfig(cache, cacheId)

The cacheId parameter now serves dual purpose:
1. Fallback ID when cache is true or cache object lacks ID
2. Legacy cacheId when cache is undefined (requires MIDSCENE_CACHE env)

Updated call sites:
- packages/core/src/agent/agent.ts
- packages/web-integration/src/playwright/ai-fixture.ts
- packages/cli/src/create-yaml-player.ts (4 locations)

Added comprehensive test coverage for legacy compatibility mode:
- process-cache-config.test.ts: 18 tests passing
- create-yaml-player.test.ts: 13 tests passing (6 new)
- playwright-ai-fixture-cache.test.ts: 8 tests passing (3 new)

Benefits:
- Simpler API with fewer parameters
- Unified semantics for new and legacy use cases
- Full backward compatibility maintained
- Better test coverage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>

* release: v0.30.5

* docs(site): optimize v0.30 changelog with user-focused improvements (#1352)

Improved the v0.30 changelog to be more user-centric and less promotional:
- Reduced hyperbolic language ("comprehensive upgrade" → "improved", etc.)
- Reorganized content structure with clearer user value sections
- Added specific usage scenarios and examples for cache strategies
- Enhanced mobile platform sections with iOS and Android subsections
- Simplified technical descriptions to be more objective
- Added cross-platform consistency section for ClearInput feature
- Translated optimized content to English version

These changes make the changelog more professional and easier for users to understand the actual benefits of the update.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>

* fix(ios): correct horizontal scroll direction and improve swipe implementation (#1358)

* fix(ios): correct horizontal scroll direction and improve swipe implementation

Fixed two issues with iOS horizontal scrolling:

1. **Corrected scroll direction semantics**
   - scrollLeft now swipes right (brings left content into view)
   - scrollRight now swipes left (brings right content into view)
   - This aligns with Android and Web scroll behavior where the direction indicates which content enters the viewport

2. **Improved swipe implementation**
   - Implemented W3C Actions API for better scroll support
   - Falls back to dragfromtoforduration if Actions API fails
   - Increased scroll distance from width/3 to width*0.7 (70%) to prevent bounce-back

3. **Fixed scrollUntilBoundary directions**
   - Corrected left/right swipe directions in boundary detection

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* refactor(ios): remove fallback from swipe method, use W3C Actions API only

---------

Co-authored-by: Claude <[email protected]>

* feat(android-playground): enable alwaysFetchScreenInfo for AndroidDevice (#1363)

* fix(docs): add alwaysFetchScreenInfo parameter to AndroidDevice constructor documentation

* feat(android-playground): enable alwaysFetchScreenInfo for AndroidDevice

Configure AndroidDevice instance with alwaysFetchScreenInfo option set to true to ensure screen information is always fetched during device operations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(android): rename alwaysFetchScreenInfo to alwaysRefreshScreenInfo for consistency

---------

Co-authored-by: Claude <[email protected]>

* fix(core): handle ZodEffects and ZodUnion in schema parsing (#1359)

* fix(core): handle ZodEffects and ZodUnion in schema parsing

- Add support for ZodEffects (transformations) in getTypeName and getDescription
- Add support for ZodUnion types with proper type display (type1 | type2)
- Fixes "failed to parse Zod type" warning on first execution with caching

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* test(core): add tests for descriptionForAction with ZodEffects and ZodUnion

* chore(core): update test cases

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: yutao <[email protected]>

* feat(playground): implement task cancellation for Android/iOS playgrounds (#1355)

* feat(playground): implement task cancellation for Android/iOS playgrounds

This PR implements task cancellation functionality for Android and iOS playgrounds using a singleton + recreation pattern.
When users clicked the "Stop" button in Android/iOS playground, the task continued to execute and control the device via ADB commands. This was because:
- Agent instances were global singletons created at server startup
- The /cancel endpoint only deleted progress tips without stopping execution
- There was no mechanism to interrupt ongoing tasks

Implemented a singleton + recreation pattern:
- PlaygroundServer now accepts factory functions instead of instances
- Added task locking mechanism (currentTaskId) to prevent concurrent tasks
- When cancel is triggered, the agent is destroyed and recreated
- Device operations stop immediately as destroyed agents reject new commands

1. **PlaygroundServer** (packages/playground/src/server.ts)
   - Added factory function support for page and agent creation
   - Added `recreateAgent()` method to destroy and recreate agent
   - Added `currentTaskId` to track running tasks
   - Enhanced `/execute` endpoint with task conflict detection
   - Enhanced `/cancel` endpoint to recreate agent on cancellation
   - Backward compatible with existing instance-based usage

2. **Android Playground** (packages/android-playground/src/bin.ts)
   - Updated to use factory pattern for server creation
   - Each recreation creates fresh AndroidDevice and AndroidAgent instances

3. **iOS Playground** (packages/ios/src/bin.ts)
   - Updated to use factory pattern for server creation
   - Each recreation creates fresh IOSDevice and IOSAgent instances

- Added test script `test-cancel-android.sh` for automated testing
- Manual testing confirmed device operations stop when cancel is triggered

```
User clicks Stop
  ↓
Frontend calls /cancel/:requestId
  ↓
Server checks if current running task
  ↓
Call recreateAgent()
  ├─ Destroy old agent (agent.destroy())
  ├─ Destroy old device (device.destroy())
  ├─ Create new device (pageFactory())
  └─ Create new agent (agentFactory(device))
  ↓
Clear task lock and progress tips
  ↓
Device stops operations ✅
```

- ✅ Simple implementation (minimal code changes)
- ✅ Effective cancellation (destroy() immediately sets destroyed flag)
- ✅ Backward compatible (still accepts instances)
- ✅ Natural serialization (one task at a time per device)

```bash
pnpm run android:playground
./test-cancel-android.sh
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(page): ensure keyboard actions return promises for better async handling

* refactor(playground): update PlaygroundServer to use agent factories and simplify server creation

* fix(ios): round coordinates for tap and swipe actions to improve accuracy

* fix(android): round coordinates in scrolling and gesture methods for improved accuracy

* refactor(playground): simplify PlaygroundServer instantiation and improve code readability

---------

Co-authored-by: Claude <[email protected]>

* fix(yaml): skip environment variable interpolation in YAML comments (#1361)

* Initial plan

* fix(yaml): skip environment variable interpolation in YAML comments

* style(yaml): apply biome linting fixes

Co-authored-by: quanru <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: quanru <[email protected]>

* fix(core): handle null data in WaitFor and support array keyName in KeyboardPress (#1354)

* fix(core): handle null data in WaitFor and support array keyName in KeyboardPress

This commit fixes two critical bugs:

1. **Fix null data handling in task execution**
   - Fixed TypeError when AI extract() returns null for WaitFor operations
   - Added null/undefined check before accessing data properties
   - WaitFor operations now return false when data is null (condition not met)
   - Other operations (Assert, Query, String, Number) return null when data is null
   - Location: src/agent/tasks.ts:936-938

2. **Add array support for keyName in KeyboardPress**
   - Updated actionKeyboardPressParamSchema to accept string | string[]
   - Allows key combinations like ['Control', 'A'] for keyboard shortcuts
   - Maintains backward compatibility with string format
   - Updated type definitions in aiKeyboardPress method
   - Locations:
     - src/device/index.ts:197-199
     - src/agent/agent.ts:575-622

**Test Coverage:**
- Added comprehensive unit tests for null data handling (8 test cases)
- Added unit tests for keyName array validation (7 test cases)
- All tests verify edge cases and expected behavior

Fixes issue where executor crashed with: "TypeError: Cannot read properties of null (reading 'StatementIsTruthy')"

And fixes parameter validation error: "Invalid parameters for action KeyboardPress: Expected string, received array"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(ios,android): handle array keyName in KeyboardPress action

- Updated iOS and Android device implementations to handle keyName as string | string[]
- For mobile devices, array keys are joined with '+' (e.g., ['Control', 'A'] becomes 'Control+A')
- This fixes TypeScript compilation errors in iOS and Android packages
- Maintains backward compatibility with string format

Related to the KeyboardPress array support added in the previous commit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* refactor(ios,android): improve KeyboardPress array handling

- Remove incorrect join('+') approach that doesn't work on mobile devices
- Use last key from array instead (e.g., ['Control', 'A'] → 'A')
- Add clear warning messages when array input is used on mobile platforms
- Mobile devices don't support keyboard combinations, this is a graceful degradation

This makes the behavior more predictable and provides better feedback to developers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* test(core): fix TaskExecutor constructor arguments in null data tests

- Fixed TaskExecutor constructor call to match actual signature
- Constructor requires (interface, insight, options) instead of (insight, interface)
- All 8 tests now passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(ios,android): improve logging for unsupported key combinations in device input

* fix(core): handle null data in WaitFor and improve keyName parameter description

This commit fixes the null data handling bug and improves the KeyboardPress parameter description.

## Changes:

### 1. Fix null data handling in task execution
- Fixed TypeError when AI extract() returns null for WaitFor operations
- Added null/undefined check before accessing data properties (tasks.ts:936-938)
- WaitFor operations now return false when data is null (condition not met)
- Other operations (Assert, Query, String, Number) return null when data is null

### 2. Improve KeyboardPress parameter description
- Reverted keyName to only accept string type (not array)
- Added clear description: "Use '+' for key combinations, e.g., 'Control+A', 'Shift+Enter'"
- This provides better guidance to AI for generating key combinations
- Simplified iOS/Android implementations (no special array handling needed)

### 3. Test coverage
- Added 8 unit tests for null data handling
- Updated KeyboardPress tests to validate string-only format
- Added test for key combination strings (e.g., 'Control+A')
- Added test to verify arrays are rejected
- Fixed unused variable warning in test file

## Fixed Issues:

**Issue 1:** Executor crashes with null data
```
TypeError: Cannot read properties of null (reading 'StatementIsTruthy')
```

**Issue 2:** Unclear how to specify key combinations
- Now clearly documented in parameter description with examples

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* docs(core): align KeyboardPress action description with parameter schema

Updated the KeyboardPress action description to explicitly mention support for key combinations (e.g., "Control+A", "Shift+Enter"), making it consistent with the keyName parameter description that already documented this functionality.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(core): handle null and undefined data in WaitFor output processing

---------

Co-authored-by: Claude <[email protected]>

* perf(android): optimize clearInput performance by batching keyevents (#1366)

* perf(android): optimize clearInput performance by batching keyevents

Replace serial keyevent(67) calls with clearTextField() method from appium-adb library, which batches all keyevents into a single shell command.
Performance improvement:
- Before: ~50 seconds (100 sequential shell calls, ~500ms each)
- After: ~1-2 seconds (single batched shell command)
- Speedup: 25-50x

Changes:
- Use adb.clearTextField(100) instead of repeat(() => adb.keyevent(67))
- Add clearTextField mock to unit tests for compatibility

All 75 unit tests passing, build successful.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(android): include device pixel ratio in size calculation for AndroidDevice

---------

Co-authored-by: Claude <[email protected]>

* release: v0.30.6

* fix(tests): enhance null data handling tests by adding uiContext parameter

---------

Co-authored-by: yuyutaotao <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: yutao <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: quanru <[email protected]>
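The null-data rule described for #1354 in the commit message (WaitFor yields `false`, the other operation kinds yield `null`) can be sketched roughly as follows. This is an illustration only, not the code in `tasks.ts`: the function name `readExtractOutput`, the `ExtractKind` type, and the shape of the data object are assumptions; only the `StatementIsTruthy` key and the false/null outcomes come from the message.

```typescript
// Hypothetical operation kinds, named after the list in the commit message.
type ExtractKind = "WaitFor" | "Assert" | "Query" | "String" | "Number";

// Assumed payload shape; the key name is taken from the reported TypeError.
type ExtractedData = Record<string, unknown> | null | undefined;

// Hypothetical helper: guard against null/undefined before touching properties.
function readExtractOutput(kind: ExtractKind, data: ExtractedData): unknown {
  if (data === null || data === undefined) {
    // WaitFor treats missing data as "condition not met"; other kinds return null.
    return kind === "WaitFor" ? false : null;
  }
  if (kind === "WaitFor") {
    return Boolean(data["StatementIsTruthy"]);
  }
  // Assumption: other kinds simply pass the extracted payload through.
  return data;
}

console.log(readExtractOutput("WaitFor", null)); // false
console.log(readExtractOutput("Query", undefined)); // null
console.log(readExtractOutput("WaitFor", { StatementIsTruthy: true })); // true
```

The point of the guard is ordering: the null check runs before any property access, so the `Cannot read properties of null` crash cannot occur in this path.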
1 parent 767127e commit b150f2f

29 files changed: +1142 -145 lines changed
apps/chrome-extension/static/manifest.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 {
   "name": "Midscene.js",
   "description": "Open-source SDK for automating web pages using natural language through AI.",
-  "version": "0.136",
+  "version": "0.137",
   "manifest_version": 3,
   "permissions": [
     "activeTab",

apps/site/docs/en/changelog.mdx

Lines changed: 54 additions & 0 deletions
@@ -2,6 +2,60 @@

 > For the complete changelog, please refer to: [Midscene Releases](https://github.com/web-infra-dev/midscene/releases)

+## v0.30 - 🎯 Cache management upgrade and mobile experience optimization
+
+### 🎯 More flexible cache strategy
+
+v0.30 improves the cache system, allowing you to control cache behavior based on actual needs:
+
+- **Multiple cache modes available**: Supports read-only, write-only, and read-write strategies. For example, use read-only mode in CI environments to reuse cache, and use write-only mode in local development to update cache
+- **Automatic cleanup of unused cache**: Agent can automatically clean up unused cache records when destroyed, preventing cache files from accumulating
+- **Simplified unified configuration**: Cache configuration parameters for CLI and Agent are now unified, no need to remember different configurations
+
+### 📊 Report management convenience
+
+- **Support for merging multiple reports**: In addition to playwright scenarios, all scenarios now support merging multiple automation execution reports into a single file for centralized viewing and sharing of test results
+
+### 📱 Mobile automation optimization
+
+#### iOS platform improvements
+- **Real device support improvement**: Removed simctl check restriction, making iOS real device automation smoother
+- **Auto-adapt device display**: Implemented automatic device pixel ratio detection, ensuring accurate element positioning on different iOS devices
+
+#### Android platform enhancements
+- **Flexible screenshot optimization**: Added `screenshotResizeRatio` option, allowing you to customize screenshot size while ensuring visual recognition accuracy, reducing network transmission and storage overhead
+- **Screen info cache control**: Use `alwaysRefreshScreenInfo` option to control whether to fetch screen information each time, allowing cache reuse in stable environments to improve performance
+- **Direct ADB command execution**: AndroidAgent added `runAdbCommand` method for convenient execution of custom device control commands
+
+#### Cross-platform consistency
+- **ClearInput support on all platforms**: Solves the problem of AI being unable to accurately plan clear input operations across platforms
+
+### 🔧 Feature enhancements
+
+- **Failure classification**: CLI execution results can now distinguish between "skipped failures" and "actual failures", helping locate issue causes
+- **aiInput append mode**: Added `append` option to append input while preserving existing content, suitable for editing scenarios
+- **Chrome extension improvements**:
+  - Popup mode preference saved to localStorage, remembering your choice on next open
+  - Bridge mode supports auto-connect, reducing manual operations
+  - Support for GPT-4o and non-visual language models
+
+### 🛡️ Type safety improvements
+
+- **Zod schema validation**: Introduced type checking for action parameters, detecting parameter errors during development to avoid runtime issues
+- **Number type support**: Fixed `aiInput` support for number type values, making type handling more robust
+
+### 🐞 Bug fixes
+
+- Fixed potential issues caused by Playwright circular dependencies
+- Fixed issue where `aiWaitFor` as the first statement could not generate reports
+- Improved video recorder delay logic to ensure the last frame is captured
+- Optimized report display logic to view both error information and element positioning information simultaneously
+- Fixed issue where `cacheable` option in `aiAction` subtasks was not properly passed
+
+### 📚 Community
+
+- Awesome Midscene section added [midscene-java](./awesome-midscene.md) community project
+
 ## v0.29 - 📱 iOS platform support added

 ### 🚀 iOS platform support added
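The three cache modes named in the v0.30 changelog entry can be pictured as read and write permissions on a record store. The sketch below is only a mental model under that assumption: `StrategyCache` and its methods are hypothetical names, not Midscene's cache API, and the changelog does not say how the modes are implemented.

```typescript
// The three strategy names come from the changelog; everything else is hypothetical.
type CacheStrategy = "read-only" | "write-only" | "read-write";

class StrategyCache {
  private readonly records = new Map<string, string>();
  private readonly strategy: CacheStrategy;

  constructor(strategy: CacheStrategy, seed: Record<string, string> = {}) {
    this.strategy = strategy;
    for (const [key, value] of Object.entries(seed)) {
      this.records.set(key, value);
    }
  }

  // Reads are skipped in write-only mode, so every step is planned afresh.
  get(key: string): string | undefined {
    return this.strategy === "write-only" ? undefined : this.records.get(key);
  }

  // Writes are skipped in read-only mode, so a CI run never mutates the cache.
  // Returns whether the record was stored.
  set(key: string, value: string): boolean {
    if (this.strategy === "read-only") {
      return false;
    }
    this.records.set(key, value);
    return true;
  }
}

const ciCache = new StrategyCache("read-only", { "click login": "plan-a" });
console.log(ciCache.get("click login")); // "plan-a"
console.log(ciCache.set("click login", "plan-b")); // false
```

This mirrors the usage the changelog suggests: read-only in CI to reuse existing records, write-only locally to refresh them.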

apps/site/docs/en/integrate-with-android.mdx

Lines changed: 1 addition & 0 deletions
@@ -129,6 +129,7 @@ The AndroidDevice constructor supports the following parameters:
 - `imeStrategy?: 'always-yadb' | 'yadb-for-non-ascii'` - Optional, when should Midscene invoke [yadb](https://github.com/ysbing/YADB) to input texts. `'yadb-for-non-ascii'` uses yadb only when handling non-ASCII words, while `'always-yadb'` forces yadb for every input task. Try switching between these strategies if the default configuration fails to input texts. (Default: 'yadb-for-non-ascii')
 - `displayId?: number` - Optional, the display id to use. (Default: undefined, means use the current display)
 - `screenshotResizeScale?: number` - Optional, controls the size of the screenshot Midscene sends to the AI model. Default is `1 / devicePixelRatio`, so a 1200×800 display with a device pixel ratio of 3 sends an image of roughly 400×267 to the model. Adjusting this value manually is not recommended.
+- `alwaysRefreshScreenInfo?: boolean` - Optional, whether to re-fetch screen size and orientation information every time. Default is false (uses cache for better performance). Set to true if the device may rotate or you need real-time screen information.

 ### Additional Android Agent Interfaces
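The `screenshotResizeScale` arithmetic in the parameter list above (a 1200×800 display at device pixel ratio 3 giving roughly 400×267) can be checked with a short sketch. The helper name `screenshotSizeForModel` and the use of `Math.round` are assumptions for illustration; the docs only state the default scale of `1 / devicePixelRatio` and the approximate result.

```typescript
interface Size {
  width: number;
  height: number;
}

// Hypothetical helper: scale the display size by screenshotResizeScale,
// defaulting to 1 / devicePixelRatio as the docs describe.
function screenshotSizeForModel(
  display: Size,
  devicePixelRatio: number,
  screenshotResizeScale: number = 1 / devicePixelRatio,
): Size {
  return {
    // Rounding to whole pixels is an assumption of this sketch.
    width: Math.round(display.width * screenshotResizeScale),
    height: Math.round(display.height * screenshotResizeScale),
  };
}

const sent = screenshotSizeForModel({ width: 1200, height: 800 }, 3);
console.log(`${sent.width}x${sent.height}`); // 400x267
```

Passing an explicit third argument overrides the default, which is the manual adjustment the docs advise against.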

apps/site/docs/en/mcp-android.mdx

Lines changed: 1 addition & 1 deletion
@@ -76,7 +76,7 @@ Midscene MCP provides the following Android device automation tools:
 Parameters:
 - deviceId: (Optional) Device ID to connect to. If not provided, uses the first available device.
 - displayId: (Optional) Display ID for multi-display Android devices (e.g., 0, 1, 2). When specified, all ADB input operations will target this specific display.
-- alwaysFetchScreenInfo: (Optional) Whether to always fetch screen size and orientation from the device on each call. Defaults to false (uses cache for better performance). Set to true if the device may rotate or you need real-time screen information.
+- alwaysRefreshScreenInfo: (Optional) Whether to always fetch screen size and orientation from the device on each call. Defaults to false (uses cache for better performance). Set to true if the device may rotate or you need real-time screen information.
 ```

 ### App control

apps/site/docs/zh/changelog.mdx

Lines changed: 54 additions & 0 deletions
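@@ -2,6 +2,60 @@

 > For the complete changelog, please refer to: [Midscene Releases](https://github.com/web-infra-dev/midscene/releases)

+## V0.30 - 🎯 Cache management upgrade and mobile experience optimization
+
+### 🎯 More flexible cache strategy
+
+v0.30 improves the cache system so that you can control cache behavior according to your actual needs:
+
+- **Multiple cache modes available**: Supports read-only, write-only, and read-write strategies. For example, use read-only mode in CI environments to reuse the cache, and use write-only mode during local development to update the cache
+- **Automatic cleanup of unused cache**: The Agent can automatically clean up unused cache records when it is destroyed, so cache files do not keep piling up
+- **Simpler, unified configuration**: Cache configuration parameters for the CLI and the Agent are now unified, so there is no need to remember different configuration styles
+
+### 📊 More convenient report management
+
+- **Support for merging multiple reports**: In addition to playwright scenarios, any scenario can now merge the reports of multiple automation runs into a single file, making it easy to view and share test results in one place
+
+### 📱 Mobile automation optimization
+
+#### iOS platform improvements
+- **Improved real device support**: Removed the simctl check restriction, so automation on real iOS devices runs more smoothly
+- **Auto-adapt device display**: Implemented automatic device pixel ratio detection, ensuring accurate element positioning on different iOS devices
+
+#### Android platform enhancements
+- **Flexible screenshot optimization**: Added the `screenshotResizeRatio` option, which lets you customize the screenshot size while preserving visual recognition accuracy, reducing network transmission and storage overhead
+- **Screen info cache control**: Use the `alwaysRefreshScreenInfo` option to control whether screen information is fetched every time; in stable environments the cache can be reused to improve performance
+- **Direct ADB command execution**: AndroidAgent adds the `runAdbCommand` method for conveniently executing custom device control commands
+
+#### Cross-platform consistency
+- **ClearInput support on all platforms**: Solves the problem of the AI being unable to accurately plan clear-input operations on each platform
+
+### 🔧 Feature enhancements
+
+- **Failure classification**: CLI execution results can now distinguish between "skipped failures" and "actual failures", helping locate the cause of a problem
+- **aiInput append input**: Added the `append` option to append input while keeping the existing content, suitable for editing scenarios
+- **Chrome extension improvements**:
+  - The popup mode preference is saved to localStorage, so your choice is remembered the next time you open it
+  - Bridge mode supports auto-connect, reducing manual operations
+  - Support for GPT-4o and non-visual language models
+
+### 🛡️ Type safety improvements
+
+- **Zod schema validation**: Introduced type checking for action parameters, catching parameter errors during development to avoid runtime issues
+- **Number type support**: Fixed `aiInput` support for number type values, making type handling more robust
+
+### 🐞 Bug fixes
+
+- Fixed potential issues caused by Playwright circular dependencies
+- Fixed an issue where reports could not be generated when `aiWaitFor` was the first statement
+- Improved the video recorder delay logic to ensure the last frame is also captured
+- Optimized the report display logic so that error information and element positioning information can now be viewed at the same time
+- Fixed an issue where the `cacheable` option was not passed correctly in `aiAction` subtasks
+
+### 📚 Community
+
+- The Awesome Midscene section adds the [midscene-java](./awesome-midscene.md) community project
+
 ## v0.29 - 📱 iOS platform support added

 ### 🚀 iOS platform support added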

apps/site/docs/zh/integrate-with-android.mdx

Lines changed: 1 addition & 0 deletions
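@@ -128,6 +128,7 @@ The AndroidDevice constructor supports the following parameters:
 - `imeStrategy?: 'always-yadb' | 'yadb-for-non-ascii'` - Optional, controls when Midscene invokes [yadb](https://github.com/ysbing/YADB) to input text. `'yadb-for-non-ascii'` enables yadb only when inputting non-ASCII text, while `'always-yadb'` uses yadb for every input task. If the default configuration cannot input text correctly, try switching between these two strategies. Default is 'yadb-for-non-ascii'.
 - `displayId?: number` - Optional, specifies the display ID to use. Default is undefined, which means the current display is used.
 - `screenshotResizeScale?: number` - Optional, controls the size of the screenshot sent to the AI model. Default is `1 / devicePixelRatio`, so for a 1200×800 screen with a device pixel ratio (DPR) of 3, the image sent to the model is about 400×267. Changing this parameter manually is not recommended.
+- `alwaysRefreshScreenInfo?: boolean` - Optional, whether to re-fetch screen size and orientation information every time. Default is false (uses cache for better performance). Set to true if the device may rotate or you need real-time screen information.

 ### More interfaces on the Android Agent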

apps/site/docs/zh/mcp-android.mdx

Lines changed: 1 addition & 1 deletion
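@@ -76,7 +76,7 @@ Midscene MCP provides the following Android device automation tools:
 Parameters:
 - deviceId: (Optional) ID of the device to connect to. If not provided, the first available device is used
 - displayId: (Optional) Display ID for multi-display Android devices (e.g., 0, 1, 2). When specified, all ADB input operations target this specific display
-- alwaysFetchScreenInfo: (Optional) Whether to re-fetch screen size and orientation information every time. Defaults to false (uses cache for better performance). Set to true if the device may rotate or you need real-time screen information
+- alwaysRefreshScreenInfo: (Optional) Whether to re-fetch screen size and orientation information every time. Defaults to false (uses cache for better performance). Set to true if the device may rotate or you need real-time screen information
 ```

 ### App control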

packages/android-playground/src/bin.ts

Lines changed: 10 additions & 4 deletions
@@ -119,11 +119,17 @@ const main = async () => {
   const selectedDeviceId = await selectDevice();
   console.log(`✅ Selected device: ${selectedDeviceId}`);

-  // Create device and agent instances with selected device
-  const device = new AndroidDevice(selectedDeviceId);
-  const agent = new AndroidAgent(device);
+  // Create PlaygroundServer with agent factory
+  const playgroundServer = new PlaygroundServer(
+    // Agent factory - creates new agent with device each time
+    async () => {
+      const device = new AndroidDevice(selectedDeviceId);
+      await device.connect();
+      return new AndroidAgent(device);
+    },
+    staticDir,
+  );

-  const playgroundServer = new PlaygroundServer(device, agent, staticDir);
   const scrcpyServer = new ScrcpyServer();

   // Set the selected device in scrcpy server
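The hunk above switches the playground from fixed instances to an agent factory. A minimal sketch of why that enables cancellation, assuming a task lock and a destroy-then-recreate step as the commit message describes, might look like this. `RecreatingServer`, `FakeAgent`, `beginTask`, `finishTask`, and `cancel` are hypothetical names; this is not the actual `PlaygroundServer` code.

```typescript
interface Destroyable {
  destroy(): Promise<void>;
}

// Hypothetical stand-in for a server that owns one agent at a time.
class RecreatingServer<A extends Destroyable> {
  private agent: A | undefined;
  private currentTaskId: string | undefined;
  private readonly agentFactory: () => Promise<A>;

  constructor(agentFactory: () => Promise<A>) {
    this.agentFactory = agentFactory;
  }

  // Task lock: only one task may hold the agent at a time.
  async beginTask(taskId: string): Promise<A> {
    if (this.currentTaskId !== undefined) {
      throw new Error(`Task ${this.currentTaskId} is still running`);
    }
    this.currentTaskId = taskId;
    try {
      if (this.agent === undefined) {
        this.agent = await this.agentFactory();
      }
      return this.agent;
    } catch (error) {
      this.currentTaskId = undefined; // release the lock if creation failed
      throw error;
    }
  }

  finishTask(taskId: string): void {
    if (this.currentTaskId === taskId) {
      this.currentTaskId = undefined;
    }
  }

  // Cancel: destroy the agent in use, build a fresh one, then release the lock.
  async cancel(taskId: string): Promise<boolean> {
    if (this.currentTaskId !== taskId) {
      return false; // not the running task, nothing to cancel
    }
    const oldAgent = this.agent;
    this.agent = undefined;
    if (oldAgent !== undefined) {
      await oldAgent.destroy();
    }
    this.agent = await this.agentFactory();
    this.currentTaskId = undefined;
    return true;
  }
}

// Fake agent used only to exercise the sketch.
class FakeAgent implements Destroyable {
  destroyed = false;
  async destroy(): Promise<void> {
    this.destroyed = true;
  }
}

const demoServer = new RecreatingServer(async () => new FakeAgent());
```

With fixed instances the server has nothing to rebuild after `destroy()`; with a factory it can hand the next task a fresh agent while the cancelled one rejects further commands.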

packages/android/src/agent.ts

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ export async function agentFromAdbDevice(
     usePhysicalDisplayIdForDisplayLookup:
       opts?.usePhysicalDisplayIdForDisplayLookup,
     screenshotResizeScale: opts?.screenshotResizeScale,
-    alwaysFetchScreenInfo: opts?.alwaysFetchScreenInfo,
+    alwaysRefreshScreenInfo: opts?.alwaysRefreshScreenInfo,
   });

   await device.connect();
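The renamed `alwaysRefreshScreenInfo` option is documented as choosing between cached and freshly fetched screen size and orientation. A rough sketch of that trade-off, assuming a simple memoizing wrapper, is shown below. `ScreenInfoCache`, `ScreenInfo`, and the `fetchInfo` callback are hypothetical; how `AndroidDevice` actually stores screen information is not shown in this diff.

```typescript
interface ScreenInfo {
  width: number;
  height: number;
  orientation: number;
}

// Hypothetical wrapper: caches screen info unless alwaysRefreshScreenInfo is set.
class ScreenInfoCache {
  private cached: ScreenInfo | undefined;
  private readonly fetchInfo: () => ScreenInfo;
  private readonly alwaysRefreshScreenInfo: boolean;

  constructor(fetchInfo: () => ScreenInfo, alwaysRefreshScreenInfo = false) {
    this.fetchInfo = fetchInfo;
    this.alwaysRefreshScreenInfo = alwaysRefreshScreenInfo;
  }

  get(): ScreenInfo {
    // Re-fetch on every call when asked to; otherwise fetch once and reuse.
    if (this.alwaysRefreshScreenInfo || this.cached === undefined) {
      this.cached = this.fetchInfo();
    }
    return this.cached;
  }
}

let fetchCount = 0;
const fakeFetch = (): ScreenInfo => {
  fetchCount += 1;
  return { width: 1080, height: 1920, orientation: 0 };
};

const cachedInfo = new ScreenInfoCache(fakeFetch);
cachedInfo.get();
cachedInfo.get();
console.log(fetchCount); // 1
```

With the default (`false`) a rotation after the first call would go unnoticed, which is why the docs suggest `true` for devices that may rotate.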
