Skip to content

Feature/browser automation vision 6915#6959

Closed
Dev0907 wants to merge 2 commits intoaden-hive:mainfrom
Dev0907:feature/browser-automation-vision-6915
Closed

Feature/browser automation vision 6915#6959
Dev0907 wants to merge 2 commits intoaden-hive:mainfrom
Dev0907:feature/browser-automation-vision-6915

Conversation

@Dev0907
Copy link
Copy Markdown

@Dev0907 Dev0907 commented Apr 5, 2026

This PR adds a computer automation layer to the Hive codebase, implementing Playwright-based headless Chrome browser control for agent workflows. This enables agents to perform reliable, deterministic browser automation tasks including navigation, form interactions, screenshot capture, and JavaScript evaluation. Previously, agents could reason about browser tasks but lacked the capability to execute them reliably without manual intervention.

The implementation provides:

  • Headless Chrome browser control using Playwright
  • Isolated browser contexts per agent profile for safe concurrent execution
  • Comprehensive browser action tools (navigation, clicks, typing, scrolling, screenshots, JS evaluation)
  • Structured action execution for step-by-step browser automation
  • Screenshot capture functionality to support vision model workflows
  • MCP tool integration for seamless agent access
    This resolves the core issue of agents being unable to operate browser UIs deterministically and enables complex web automation workflows.

Related Issues
Fixes #6915: Browser automation for using Vision language models
Changes Made

  • Implement BeelineBridge class with Playwright-based headless Chrome automation
  • Add browser session management with isolated contexts per agent profile
  • Create comprehensive MCP tools for browser actions (lifecycle, navigation, interactions, inspection, advanced)
  • Enable screenshot capture for vision model integration
  • Update GCU tools module for agent workflow integration
  • Update GCU documentation to reflect Playwright implementation and new capabilities

Testing performed to verify the changes:

  • Code passes linting checks with ruff check (no errors or warnings)
  • Existing test suite validates API compatibility and tool registration
  • Mock-based testing ensures reliability for browser automation tools
  • Manual verification of implementation against problem statement requirements
  • Documentation updates reviewed for accuracy and completeness

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced Playwright-based headless browser automation with full screenshot capture support for vision model integration.
    • Added automated browser context isolation per profile for deterministic, structured action execution.
  • Documentation

    • Updated operational guidance and selection criteria for headless automation scenarios requiring computer vision capabilities.
  • Refactor

    • Simplified browser automation tool APIs with streamlined parameters and consistent error handling patterns.

Dev0907 added 2 commits April 6, 2026 00:53
- Implement BeelineBridge with headless Chrome control
- Add structured browser actions: navigation, interactions, inspection
- Register MCP tools for deterministic browser automation
- Enable screenshot capture for vision workflows
- Update GCU tools for agent integration

Resolves aden-hive#6915: Browser automation for using Vision language models
- Document Playwright implementation and vision workflow support
- Update system prompt best practices for screenshot usage
- Clarify isolated browser contexts and deterministic execution
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 5, 2026

PR Requirements Warning

This PR does not meet the contribution requirements.
If the issue is not fixed within ~24 hours, it may be automatically closed.

PR Author: @Dev0907
Found issues: #6915 (assignees: none)
Problem: The PR author must be assigned to the linked issue.

To fix:

  1. Assign yourself (@Dev0907) to one of the linked issues
  2. Re-open this PR

Exception: To bypass this requirement, you can:

  • Add the micro-fix label or include micro-fix in your PR title for trivial fixes
  • Add the documentation label or include doc/docs in your PR title for documentation changes

Micro-fix requirements (must meet ALL):

Qualifies Disqualifies
< 20 lines changed Any functional bug fix
Typos & Documentation & Linting Refactoring for "clean code"
No logic/API/DB changes New features (even tiny ones)

Why is this required? See #472 for details.

@github-actions github-actions bot added the pr-requirements-warning PR doesn't follow contribution guidelines. Please fix or it will be auto-closed. label Apr 5, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 5, 2026

📝 Walkthrough

Walkthrough

Replaces the BeelineBridge WebSocket/CDP browser automation architecture with an in-process Playwright-based system. Refactors all GCU browser tools to accept profile identifiers for context isolation, removes advanced parameters and telemetry instrumentation, and updates documentation to describe deterministic headless-Chrome workflows with structured action execution and screenshot capture for vision integration.

Changes

Cohort / File(s) Summary
Documentation
core/framework/agents/queen/reference/gcu_guide.md
Updated GCU guide selection criteria to include vision scenarios, narrowed tool coverage from "all 31 browser tools" to "all browser automation tools," and replaced manual tab isolation guidance with automatic isolated browser context per subagent. Added browser_screenshot recommendation for vision workflows.
Core Bridge Refactor
tools/src/gcu/browser/bridge.py
Replaced WebSocket/CDP extension bridge with in-process Playwright Chromium instance. Added BrowserConfig for headless/viewport/user-agent configuration, refactored to manage per-profile BrowserContexts and Page objects, converted all operations from CDP protocol calls to Playwright method calls (navigation, interactions, evaluation, screenshot capture), and removed module-level singleton lifecycle (init_bridge, get_bridge, start, stop).
Advanced Tools Simplification
tools/src/gcu/browser/tools/advanced.py
Removed 6 MCP-registered tools (browser_wait, browser_get_text, browser_get_attribute, browser_resize, browser_upload, browser_dialog). Retained only browser_evaluate(profile, script) which now derives tab via bridge.get_current_tab(profile) and simplified error handling.
Inspection Tools Reduction
tools/src/gcu/browser/tools/inspection.py
Removed screenshot normalization/annotation (Pillow overlays), shadow DOM queries, accessibility snapshots, HTML/console introspection, and coordinate conversion logic. Kept minimal trio: browser_get_text, browser_screenshot, and browser_evaluate, each accepting profile parameter and deriving active tab from bridge state.
Interaction Tools Refactoring
tools/src/gcu/browser/tools/interactions.py
Consolidated 9+ parameterized interaction tools into 5 simplified async functions: browser_click(profile, selector), browser_type(profile, selector, text), browser_press_key(profile, key), browser_scroll(profile, direction, amount), browser_select_option(profile, selector, values). Removed timing, telemetry, and advanced parameters (button, delay_ms, timeout_ms).
Lifecycle Tools Simplification
tools/src/gcu/browser/tools/lifecycle.py
Removed global context management (_contexts), eliminated browser_setup() and browser_status() tools, simplified remaining browser_start(profile) and browser_stop(profile) to delegate directly to bridge.create_context() and bridge.destroy_context() with minimal error handling.
Navigation Tools Streamlining
tools/src/gcu/browser/tools/navigation.py
Renamed browser_navigate to browser_open(profile, url, wait_until), refactored browser_go_back, browser_go_forward, browser_reload to accept only profile parameter, removed tab lookup/connectivity checks and telemetry, simplified error handling.
Tab Tools Consolidation
tools/src/gcu/browser/tools/tabs.py
Removed 4 MCP-registered tab tools (browser_open, browser_focus, browser_close_all, browser_close_finished). Introduced 2 new functions: browser_tabs(profile) returns tab list, browser_close_tab(tab_id, profile) closes a specific tab; removed context/telemetry plumbing.

Sequence Diagram

sequenceDiagram
    participant Agent as GCU Agent
    participant Tool as Browser Tool<br/>(e.g., browser_click)
    participant Bridge as BeelineBridge<br/>(in-process)
    participant PW as Playwright API
    participant Browser as Chromium Browser

    Agent->>Tool: call browser_click(profile="work", selector=".btn")
    Tool->>Bridge: get_current_tab(profile)
    Bridge-->>Tool: tab_id=42
    Tool->>Bridge: click(tab_id=42, selector=".btn")
    Bridge->>PW: page.click(selector)
    PW->>Browser: send CDP click action
    Browser-->>PW: action completed
    PW-->>Bridge: success
    Bridge-->>Tool: {"ok": True}
    Tool-->>Agent: {"ok": True}

    Agent->>Tool: call browser_screenshot(profile="work")
    Tool->>Bridge: get_current_tab(profile)
    Bridge-->>Tool: tab_id=42
    Tool->>Bridge: screenshot(tab_id=42, full_page=False)
    Bridge->>PW: page.screenshot()
    PW->>Browser: capture screen
    Browser-->>PW: base64 image data
    PW-->>Bridge: base64 string
    Bridge-->>Tool: {"screenshot": "<base64>"}
    Tool-->>Agent: {"screenshot": "<base64>"}
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

Poem

🐰 Hop, hop! The bridge takes a leap,
From sockets dancing deep in the CDP weep,
Now Playwright springs in-process and fleet,
Profile-based contexts, a deterministic beat! 🎯
Browser automation—simple, clean, complete!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Feature/browser automation vision 6915' is a branch name rather than a descriptive summary; it lacks clarity about the actual implementation (Playwright-based headless Chrome with MCP tools). Consider renaming to a more descriptive title like 'Add Playwright-based browser automation with MCP tools for headless Chrome control' to clearly communicate the main change.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed The implementation comprehensively addresses all stated objectives: Playwright-based headless Chrome (BeelineBridge), isolated contexts per profile, MCP tools for lifecycle/navigation/interactions/inspection, screenshot capture, structured action execution, and test coverage.
Out of Scope Changes check ✅ Passed All changes directly support browser automation: GCU guide documentation updates, BeelineBridge implementation, MCP tool modules, and removal of obsolete CDP-based infrastructure are all in scope.
Docstring Coverage ✅ Passed Docstring coverage is 96.61% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/framework/agents/queen/reference/gcu_guide.md (1)

54-58: ⚠️ Potential issue | 🟡 Minor

Update the examples to the new browser_open contract.

tools/src/gcu/browser/tools/navigation.py Lines 14-36 now use browser_open to navigate the current tab and return the final URL; it no longer opens a new tab or returns targetId. These prompt examples still teach agents to capture and propagate target_id, which no longer exists.

Also applies to: 160-164

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/framework/agents/queen/reference/gcu_guide.md` around lines 54 - 58,
Update the workflow and examples to match the new browser_open contract: remove
references to returning or propagating targetId/target_id and instead show that
browser_open(url=TARGET_URL) navigates the current tab and returns the final
URL; adjust the sequence (browser_start, browser_open, browser_snapshot,
[task-specific steps]) so examples capture and pass the returned URL value (not
a target id), and update the other example block mentioned (lines ~160-164) the
same way; look for references to browser_open, browser_start, browser_snapshot,
targetId, and target_id when making the edits.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tools/src/gcu/browser/bridge.py`:
- Around line 130-145: When removing contexts or tabs update the _current_tabs
mapping so it doesn't leave dead tab IDs: in destroy_context (and the analogous
close_tab handler) after deleting from _tabs and _contexts check if
_current_tabs.get(profile) references a removed tab and either set it to another
existing tab ID for that profile (pick any tab in _tabs whose page.context
matches the remaining context or same profile) or delete the _current_tabs entry
to clear focus; ensure you reference and update _tabs, _contexts, _current_tabs,
get_current_tab, destroy_context and close_tab so subsequent calls to
get_current_tab(profile) return a valid tab or None.
- Around line 98-109: When handling an existing profile in browser_start, don't
always allocate a new tabId; if pages exist and pages[0] is already referenced
in _current_tabs return that existing tab id instead of creating a fresh alias.
Modify the branch in browser_start (the block referencing self._contexts,
context.pages, self._next_tab_id, self._tabs, and context.new_page) to: check if
pages and pages[0] is present in self._current_tabs.values (or if you maintain a
profile->tab mapping, check self._current_tabs[profile]) and return that tab id;
otherwise create a new tab id, assign it into both self._tabs and
self._current_tabs (e.g. self._current_tabs[profile] = tab_id) before returning
{"groupId": id(context), "tabId": tab_id}.
- Around line 58-76: The connect() method is racy because multiple concurrent
callers can pass the initial is_connected check and concurrently start
Playwright/Chromium, so add a per-bridge asyncio.Lock (e.g., self._connect_lock
= asyncio.Lock()) and wrap the startup path in an async with self._connect_lock:
block that re-checks self.is_connected, then initializes self._playwright and
self._browser and sets self.is_connected=True only after successful launch;
ensure you reset/cleanup _playwright/_browser and leave is_connected False if
launch raises so subsequent callers can retry (use the existing _playwright and
_browser symbols and the is_connected flag).
- Around line 356-372: The drag() implementation incorrectly uses
page.drag_and_drop() with a synthesized CSS selector for absolute coordinates;
instead locate the source element via the provided selector (use
page.locator(selector) or page.query_selector and call bounding_box()), compute
the start coordinates (center of bounding box), then perform coordinate-based
mouse actions with page.mouse.move(start_x, start_y), page.mouse.down(),
page.mouse.move(target_x, target_y, steps=...), and page.mouse.up(); update the
error handling around PlaywrightError to remain consistent and return {"ok":
False, "error": str(e)} on failure.
- Around line 449-461: The current wait_for_text method interpolates the raw
text into the JS predicate, enabling injection and breaking on
quotes/backslashes; change the call in wait_for_text to pass the text as an
argument to page.wait_for_function instead of string interpolation (e.g., use a
predicate like "text => document.body.innerText.includes(text)" and pass
arg=text), keep the existing timeout handling, and ensure you reference the same
method name wait_for_text and variable page when making the replacement.

In `@tools/src/gcu/browser/tools/advanced.py`:
- Around line 38-40: register_advanced_tools currently re-registers
browser_evaluate and thus never exposes the wait primitives; change
register_advanced_tools to register a browser_wait tool that wraps
BeelineBridge.wait_for_selector and BeelineBridge.wait_for_text (or a single
handler that dispatches to those methods) instead of re-registering
browser_evaluate, and ensure the tool name matches the agents' expectation
("browser_wait"); update the registration in register_advanced_tools and the
exported handler function name so it doesn't conflict with the existing
browser_evaluate registration in inspection.py and so callers of browser_wait
will invoke the BeelineBridge wait_for_selector/wait_for_text behavior.

In `@tools/src/gcu/browser/tools/inspection.py`:
- Around line 86-90: register_inspection_tools currently registers
browser_get_text, browser_screenshot, and browser_evaluate but omits
browser_snapshot, which removes the structured DOM/accessibility snapshot from
the MCP API; update register_inspection_tools to also register browser_snapshot
(i.e., call mcp.tool()(browser_snapshot) alongside the other tools), ensuring
the browser_snapshot function referenced in tools/src/gcu/browser/bridge.py is
imported/available in the module so the MCP exposes the same snapshot primitive
used by the guide and snapshot() implementation.

In `@tools/src/gcu/browser/tools/tabs.py`:
- Around line 53-56: The MCP registration in register_tab_tools only exposes
browser_tabs and browser_close_tab, blocking multi-tab workflows; update
register_tab_tools to also register the tab-creation and tab-activation tools by
adding mcp.tool()(browser_create_tab) and mcp.tool()(browser_activate_tab) (or
whatever the existing tool wrappers are named) so the MCP surface matches
BeelineBridge.create_tab and BeelineBridge.activate_tab; then update
tools/src/gcu/browser/tools/navigation.py to call the newly-registered
browser_create_tab/browser_activate_tab tools (instead of only browser_open)
when the agent intends to open a new tab or switch tabs, ensuring the MCP API
supports creating and activating tabs.
- Around line 33-50: Validate tab ownership before calling bridge.close_tab: use
get_bridge() to retrieve the bridge and check its mapping/registry (e.g.,
bridge.tabs, bridge.get_tab_owner or similar) to assert that the provided tab_id
belongs to the supplied profile; if it does, call await bridge.close_tab(tab_id)
and return the result, otherwise return {"ok": False, "error": "tab does not
belong to profile"} (or raise an appropriate error). Update browser_close_tab to
perform this ownership check and avoid delegating directly to bridge.close_tab
without verifying profile ownership.

---

Outside diff comments:
In `@core/framework/agents/queen/reference/gcu_guide.md`:
- Around line 54-58: Update the workflow and examples to match the new
browser_open contract: remove references to returning or propagating
targetId/target_id and instead show that browser_open(url=TARGET_URL) navigates
the current tab and returns the final URL; adjust the sequence (browser_start,
browser_open, browser_snapshot, [task-specific steps]) so examples capture and
pass the returned URL value (not a target id), and update the other example
block mentioned (lines ~160-164) the same way; look for references to
browser_open, browser_start, browser_snapshot, targetId, and target_id when
making the edits.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 76438fd2-c647-4891-90df-0996ab6d1a97

📥 Commits

Reviewing files that changed from the base of the PR and between aaa5d66 and 0ed1037.

📒 Files selected for processing (8)
  • core/framework/agents/queen/reference/gcu_guide.md
  • tools/src/gcu/browser/bridge.py
  • tools/src/gcu/browser/tools/advanced.py
  • tools/src/gcu/browser/tools/inspection.py
  • tools/src/gcu/browser/tools/interactions.py
  • tools/src/gcu/browser/tools/lifecycle.py
  • tools/src/gcu/browser/tools/navigation.py
  • tools/src/gcu/browser/tools/tabs.py

Comment on lines +58 to +76
async def connect(self) -> None:
"""Initialize Playwright and browser."""
if self.is_connected:
return

try:
# Suppress noisy websockets logging for invalid upgrade attempts
# by providing a null logger
import logging

null_logger = logging.getLogger("websockets.null")
null_logger.setLevel(logging.CRITICAL)
null_logger.addHandler(logging.NullHandler())

self._server = await websockets.serve(
self._handle_connection,
"127.0.0.1",
port,
logger=null_logger,
max_size=50
* 1024
* 1024, # 50 MB — CDP responses (AX tree, screenshots) can be large
)
logger.info("Beeline bridge listening on ws://127.0.0.1:%d", port)
except OSError as e:
logger.warning("Beeline bridge could not start on port %d: %s", port, e)

# Start a tiny HTTP server on port+1 for status polling.
# websockets 16 rejects plain HTTP before process_request is called, so
# we need a separate server.
status_port = port + 1
try:
self._status_server = await asyncio.start_server(
self._http_status_handler,
"127.0.0.1",
status_port,
)
logger.info("Bridge status endpoint on http://127.0.0.1:%d/status", status_port)
except OSError as e:
logger.warning("Bridge status server could not start on port %d: %s", status_port, e)

async def stop(self) -> None:
if self._server:
self._server.close()
try:
await self._server.wait_closed()
except Exception:
pass
self._server = None
if self._status_server:
self._status_server.close()
try:
await self._status_server.wait_closed()
except Exception:
pass
self._status_server = None

async def _http_status_handler(
self, reader: asyncio.StreamReader, writer: asyncio.StreamWriter
) -> None:
"""Minimal asyncio TCP handler serving HTTP GET /status on the status port."""
try:
raw = await asyncio.wait_for(reader.read(512), timeout=2.0)
first_line = raw.split(b"\r\n", 1)[0].decode(errors="replace")
if first_line.startswith("GET /status"):
body = json.dumps({"connected": self.is_connected, "bridge": "running"}).encode()
response = (
b"HTTP/1.1 200 OK\r\n"
b"Content-Type: application/json\r\n"
b"Access-Control-Allow-Origin: *\r\n"
b"Access-Control-Allow-Headers: *\r\n"
+ b"Content-Length: "
+ str(len(body)).encode()
+ b"\r\n"
+ b"Connection: close\r\n"
b"\r\n" + body
)
elif first_line.startswith("OPTIONS "):
response = (
b"HTTP/1.1 204 No Content\r\n"
b"Access-Control-Allow-Origin: *\r\n"
b"Access-Control-Allow-Headers: *\r\n"
b"Content-Length: 0\r\n"
b"Connection: close\r\n"
b"\r\n"
)
self._playwright = await async_playwright().start()
self._browser = await self._playwright.chromium.launch(
headless=self.config.headless,
args=[
"--no-sandbox",
"--disable-setuid-sandbox",
"--disable-dev-shm-usage",
"--disable-accelerated-2d-canvas",
"--no-first-run",
"--no-zygote",
"--disable-gpu",
],
)
self.is_connected = True
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Serialize connect() on the shared bridge.

This singleton is used by concurrent profiles, but connect() has no lock around the async startup path. Two first-use calls can both launch Playwright/Chromium before is_connected flips, leaking one browser instance and leaving _browser/_playwright cleanup nondeterministic.

Suggested fix
 class BeelineBridge:
     def __init__(self, config: BrowserConfig | None = None):
         self.config = config or BrowserConfig()
         self._playwright: Playwright | None = None
         self._browser: Browser | None = None
+        self._connect_lock = asyncio.Lock()
         self._contexts: dict[str, BrowserContext] = {}  # profile -> context
         self._tabs: dict[int, Page] = {}  # tab_id -> page
         self._current_tabs: dict[str, int] = {}  # profile -> current tab_id
         self._next_tab_id = 1000
         self._cdp_attached: set[int] = set()
         self.is_connected = False

     async def connect(self) -> None:
         """Initialize Playwright and browser."""
         if self.is_connected:
             return

-        self._playwright = await async_playwright().start()
-        self._browser = await self._playwright.chromium.launch(
-            headless=self.config.headless,
-            args=[
-                "--no-sandbox",
-                "--disable-setuid-sandbox",
-                "--disable-dev-shm-usage",
-                "--disable-accelerated-2d-canvas",
-                "--no-first-run",
-                "--no-zygote",
-                "--disable-gpu",
-            ],
-        )
-        self.is_connected = True
+        async with self._connect_lock:
+            if self.is_connected:
+                return
+
+            self._playwright = await async_playwright().start()
+            self._browser = await self._playwright.chromium.launch(
+                headless=self.config.headless,
+                args=[
+                    "--no-sandbox",
+                    "--disable-setuid-sandbox",
+                    "--disable-dev-shm-usage",
+                    "--disable-accelerated-2d-canvas",
+                    "--no-first-run",
+                    "--no-zygote",
+                    "--disable-gpu",
+                ],
+            )
+            self.is_connected = True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/src/gcu/browser/bridge.py` around lines 58 - 76, The connect() method
is racy because multiple concurrent callers can pass the initial is_connected
check and concurrently start Playwright/Chromium, so add a per-bridge
asyncio.Lock (e.g., self._connect_lock = asyncio.Lock()) and wrap the startup
path in an async with self._connect_lock: block that re-checks
self.is_connected, then initializes self._playwright and self._browser and sets
self.is_connected=True only after successful launch; ensure you reset/cleanup
_playwright/_browser and leave is_connected False if launch raises so subsequent
callers can retry (use the existing _playwright and _browser symbols and the
is_connected flag).

Comment on lines +98 to +109
if profile in self._contexts:
# Return existing context info
context = self._contexts[profile]
pages = context.pages
tab_id = self._next_tab_id
self._next_tab_id += 1
if pages:
self._tabs[tab_id] = pages[0]
else:
response = (
b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"
)
writer.write(response)
await writer.drain()
except Exception:
pass
finally:
writer.close()

async def _handle_connection(self, ws) -> None:
logger.info("Chrome extension connected")
log_connection_event("connect")
self._ws = ws
try:
async for raw in ws:
try:
msg = json.loads(raw)
except json.JSONDecodeError:
continue

if msg.get("type") == "hello":
logger.info("Extension hello: version=%s", msg.get("version"))
log_connection_event("hello", {"version": msg.get("version")})
continue

msg_id = msg.get("id")
if msg_id and msg_id in self._pending:
fut = self._pending.pop(msg_id)
if not fut.done():
if "error" in msg:
log_bridge_message(
"recv", "response", msg_id=msg_id, error=msg["error"]
)
fut.set_exception(RuntimeError(msg["error"]))
else:
log_bridge_message(
"recv", "response", msg_id=msg_id, result=msg.get("result")
)
fut.set_result(msg.get("result", {}))
except Exception:
pass
finally:
# Only clear self._ws if this handler still owns it.
if self._ws is ws:
logger.info("Chrome extension disconnected")
log_connection_event("disconnect")
self._ws = None
# Cancel any pending requests
for fut in self._pending.values():
if not fut.done():
fut.cancel()
self._pending.clear()

async def _send(self, type_: str, **params) -> dict:
"""Send a command to the extension and wait for the result."""
if not self._ws:
raise RuntimeError("Extension not connected")
self._counter += 1
msg_id = str(self._counter)
fut: asyncio.Future = asyncio.get_event_loop().create_future()
self._pending[msg_id] = fut
start = time.perf_counter()

log_bridge_message("send", type_, msg_id=msg_id, params=params)
page = await context.new_page()
self._tabs[tab_id] = page
return {"groupId": id(context), "tabId": tab_id}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Return the existing tab id for an existing profile.

A second browser_start(profile) currently allocates a fresh tabId for pages[0] without updating _current_tabs. That creates multiple ids for the same Page, and closing one alias leaves the others pointing at a closed tab.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/src/gcu/browser/bridge.py` around lines 98 - 109, When handling an
existing profile in browser_start, don't always allocate a new tabId; if pages
exist and pages[0] is already referenced in _current_tabs return that existing
tab id instead of creating a fresh alias. Modify the branch in browser_start
(the block referencing self._contexts, context.pages, self._next_tab_id,
self._tabs, and context.new_page) to: check if pages and pages[0] is present in
self._current_tabs.values (or if you maintain a profile->tab mapping, check
self._current_tabs[profile]) and return that tab id; otherwise create a new tab
id, assign it into both self._tabs and self._current_tabs (e.g.
self._current_tabs[profile] = tab_id) before returning {"groupId": id(context),
"tabId": tab_id}.

Comment on lines +130 to +145
async def destroy_context(self, profile: str) -> dict[str, Any]:
"""Destroy the browser context for the given profile."""
if profile not in self._contexts:
return {"ok": False, "error": "Context not found"}

async def destroy_context(self, group_id: int) -> dict:
"""Close all tabs in the group and remove it."""
result = await self._send("context.destroy", groupId=group_id)
log_context_event("destroy", _get_active_profile(), group_id=group_id, details=result)
return result
context = self._contexts[profile]
await context.close()
del self._contexts[profile]

# ── Tab Management ─────────────────────────────────────────────────────────
# Remove associated tabs
to_remove = [tab_id for tab_id, page in self._tabs.items() if page.context == context]
for tab_id in to_remove:
del self._tabs[tab_id]
self._cdp_attached.discard(tab_id)

async def create_tab(self, url: str = "about:blank", group_id: int | None = None) -> dict:
"""Create a new tab and optionally add it to a group.
return {"ok": True}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep _current_tabs consistent when tabs or contexts disappear.

All wrappers resolve actions through get_current_tab(profile), but neither destroy_context() nor close_tab() repairs that map. After browser_stop() or closing the active tab, the profile keeps a dead tab id; if other tabs still exist, the session also loses any way to recover because there is no focus tool on the MCP surface.

Suggested fix
 async def destroy_context(self, profile: str) -> dict[str, Any]:
     """Destroy the browser context for the given profile."""
     if profile not in self._contexts:
         return {"ok": False, "error": "Context not found"}

     context = self._contexts[profile]
     await context.close()
     del self._contexts[profile]
+    self._current_tabs.pop(profile, None)

     # Remove associated tabs
     to_remove = [tab_id for tab_id, page in self._tabs.items() if page.context == context]
     for tab_id in to_remove:
         del self._tabs[tab_id]
         self._cdp_attached.discard(tab_id)

     return {"ok": True}

 async def close_tab(self, tab_id: int) -> dict[str, Any]:
     """Close the specified tab."""
     if tab_id not in self._tabs:
         return {"ok": False, "error": "Tab not found"}

     page = self._tabs[tab_id]
+    context = page.context
     await page.close()
     del self._tabs[tab_id]
     self._cdp_attached.discard(tab_id)
+
+    for profile, profile_context in self._contexts.items():
+        if profile_context == context and self._current_tabs.get(profile) == tab_id:
+            replacement = next(
+                (
+                    other_tab_id
+                    for other_tab_id, other_page in self._tabs.items()
+                    if other_page.context == context and not other_page.is_closed()
+                ),
+                None,
+            )
+            if replacement is None:
+                self._current_tabs.pop(profile, None)
+            else:
+                self._current_tabs[profile] = replacement
+            break

     return {"ok": True}

Also applies to: 164-174

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/src/gcu/browser/bridge.py` around lines 130 - 145, When removing
contexts or tabs update the _current_tabs mapping so it doesn't leave dead tab
IDs: in destroy_context (and the analogous close_tab handler) after deleting
from _tabs and _contexts check if _current_tabs.get(profile) references a
removed tab and either set it to another existing tab ID for that profile (pick
any tab in _tabs whose page.context matches the remaining context or same
profile) or delete the _current_tabs entry to clear focus; ensure you reference
and update _tabs, _contexts, _current_tabs, get_current_tab, destroy_context and
close_tab so subsequent calls to get_current_tab(profile) return a valid tab or
None.

Comment on lines +356 to +372
async def drag(
self, tab_id: int, selector: str, target_x: float, target_y: float
) -> dict[str, Any]:
"""Drag the element to the target coordinates."""
if tab_id not in self._tabs:
return {"ok": False, "error": "Tab not found"}

async def evaluate(self, tab_id: int, script: str) -> dict:
"""Execute JavaScript in the page."""
await self.cdp_attach(tab_id)
await self._try_enable_domain(tab_id, "Runtime")
page = self._tabs[tab_id]
try:
await page.drag_and_drop(
selector,
f"css=[style*='position: absolute; left: {target_x}px; top: {target_y}px;']",
timeout=self.config.timeout_ms,
)
return {"ok": True}
except PlaywrightError as e:
return {"ok": False, "error": str(e)}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Playwright Python Page.drag_and_drop API coordinates vs target selector

💡 Result:

The Playwright Python Page.drag_and_drop API supports both selector-based targeting and precise coordinate positioning relative to element corners. Basic usage with selectors (defaults to element centers): page.drag_and_drop("#source", "#target") With coordinates (added in v1.14): page.drag_and_drop( "#source", "#target", source_position={"x": 34, "y": 7}, target_position={"x": 10, "y": 20} ) source_position and target_position are optional Dict[str, float] specifying drop points relative to the top-left corner of each element's padding box. Without them, it drags from/to the center. Selectors find the first matching element if multiple exist. Other options include force (bypass actionability), timeout, etc. This high-level method handles mousedown on source, move to target, mouseup automatically, unlike manual page.mouse.move/down/up which may not trigger HTML5 drag events reliably.

Citations:


🌐 Web query:

Playwright Python page.mouse drag absolute coordinates target position

💡 Result:

To perform a mouse drag using absolute coordinates (viewport CSS pixels) in Playwright Python, use the page.mouse API with move, down, move to target, and up. There is no direct page.mouse.drag method; drag is composed from these primitives. Example code: python # Move to start position (absolute viewport coordinates) page.mouse.move(start_x, start_y) # Press mouse button to start drag page.mouse.down # Move to target position (absolute viewport coordinates) page.mouse.move(target_x, target_y, steps=10) # steps for smoother drag # Release mouse button to end drag page.mouse.up This simulates dragging from (start_x, start_y) to (target_x, target_y). Coordinates are relative to the top-left of the viewport in CSS pixels. Use element.bounding_box to get coordinates from elements, e.g.: python src_box = page.locator('#source').bounding_box page.mouse.move(src_box['x'] + src_box['width']/2, src_box['y'] + src_box['height']/2) page.mouse.down tgt_box = page.locator('#target').bounding_box page.mouse.move(tgt_box['x'] + tgt_box['width']/2, tgt_box['y'] + tgt_box['height']/2, steps=10) page.mouse.up For HTML5 drag-and-drop, this basic mouse simulation may not trigger drag events; use page.drag_and_drop or dispatch_event('dragstart'/'drop') instead. Note: page.drag_and_drop supports source_position/target_position as offsets relative to elements, not absolute viewport coords.

Citations:


🏁 Script executed:

cd tools/src/gcu/browser && head -20 bridge.py && tail -20 bridge.py && sed -n '1,50p' bridge.py | grep -E 'import|from'

Repository: aden-hive/hive

Length of output: 1359


🏁 Script executed:

# Check the suggested fix syntax - is bounding_box async?
rg -A 5 "bounding_box" --type py

Repository: aden-hive/hive

Length of output: 40


🏁 Script executed:

sed -n '356,372p' tools/src/gcu/browser/bridge.py

Repository: aden-hive/hive

Length of output: 718


🏁 Script executed:

# Check Playwright Page.locator().bounding_box() signature - is it async?
rg "bounding_box" --type py -B 2 -A 2 | head -50

Repository: aden-hive/hive

Length of output: 40


🏁 Script executed:

# Search for mouse operations in the file to see how they're used
rg "mouse\." tools/src/gcu/browser/bridge.py -B 2 -A 2

Repository: aden-hive/hive

Length of output: 214


drag() wires to the wrong Playwright API for coordinate-based dragging.

Playwright's page.drag_and_drop() expects source and target selectors (optionally with element-relative offsets via source_position/target_position), not absolute viewport coordinates. Synthesizing a CSS selector from target_x/target_y only works if the DOM contains an element with that exact inline style, which is unreliable and incorrect.

For absolute coordinate dragging, use manual mouse operations: page.mouse.move() to position, page.mouse.down()/up() to drag. Get source element coordinates via locator.bounding_box(), move to its center, then drag to the target coordinates.

Suggested fix
 async def drag(
     self, tab_id: int, selector: str, target_x: float, target_y: float
 ) -> dict[str, Any]:
     """Drag the element to the target coordinates."""
     if tab_id not in self._tabs:
         return {"ok": False, "error": "Tab not found"}

     page = self._tabs[tab_id]
     try:
-        await page.drag_and_drop(
-            selector,
-            f"css=[style*='position: absolute; left: {target_x}px; top: {target_y}px;']",
-            timeout=self.config.timeout_ms,
-        )
+        locator = page.locator(selector)
+        box = await locator.bounding_box()
+        if box is None:
+            return {"ok": False, "error": "Source element is not visible"}
+
+        await page.mouse.move(box["x"] + box["width"] / 2, box["y"] + box["height"] / 2)
+        await page.mouse.down()
+        await page.mouse.move(target_x, target_y)
+        await page.mouse.up()
         return {"ok": True}
     except PlaywrightError as e:
         return {"ok": False, "error": str(e)}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/src/gcu/browser/bridge.py` around lines 356 - 372, The drag()
implementation incorrectly uses page.drag_and_drop() with a synthesized CSS
selector for absolute coordinates; instead locate the source element via the
provided selector (use page.locator(selector) or page.query_selector and call
bounding_box()), compute the start coordinates (center of bounding box), then
perform coordinate-based mouse actions with page.mouse.move(start_x, start_y),
page.mouse.down(), page.mouse.move(target_x, target_y, steps=...), and
page.mouse.up(); update the error handling around PlaywrightError to remain
consistent and return {"ok": False, "error": str(e)} on failure.

Comment on lines +449 to 461
async def wait_for_text(
self, tab_id: int, text: str, timeout: int | None = None
) -> dict[str, Any]:
"""Wait for the specified text to appear."""
if tab_id not in self._tabs:
return {"ok": False, "error": "Tab not found"}

return {"ok": False, "error": f"Element not found: {selector}"}

async def get_attribute(
self, tab_id: int, selector: str, attribute: str, timeout_ms: int = 30000
) -> dict:
"""Get an attribute value of an element."""
await self.cdp_attach(tab_id)

script = f"""
(function() {{
const el = document.querySelector({json.dumps(selector)});
return el ? el.getAttribute({json.dumps(attribute)}) : null;
}})()
"""

deadline = asyncio.get_event_loop().time() + timeout_ms / 1000
while asyncio.get_event_loop().time() < deadline:
result = await self._cdp(
tab_id,
"Runtime.evaluate",
{"expression": script, "returnByValue": True},
page = self._tabs[tab_id]
try:
await page.wait_for_function(
f"document.body.innerText.includes('{text}')",
timeout=timeout or self.config.timeout_ms,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Playwright Python wait_for_function pass argument to predicate

💡 Result:

In Playwright Python, page.wait_for_function(expression, arg=None, ...) evaluates the JavaScript expression in the browser context repeatedly until it returns a truthy value. To pass arguments to the predicate function defined in expression, provide them via the arg parameter. The expression must be written to accept parameters matching the arg structure. Example (from official docs): selector = ".foo" page.wait_for_function("selector => !!document.querySelector(selector)", selector) Here, selector is passed as arg, and the JS function receives it as the first parameter. This works for primitives, arrays, objects, and JSHandles. Complex Python objects are serialized (JSON-compatible values). For sync API (from playwright.sync_api): page.wait_for_function(expression: str, arg: "EvaluationArgument" = None, timeout: float = None, polling: Literal["raf", "mutate", float] = None) -> JSHandle For async API, use await page.wait_for_function(...). Note: wait_for_function executes in the browser context, so arg must be serializable to pass from Python to JS. Use page.evaluate with arg for one-off evaluations, but wait_for_function polls the expression with the arg until truthy. Official Python docs confirm this pattern matches JS/Node.js API, with explicit examples for passing selectors and other values.

Citations:


🏁 Script executed:

cat -n tools/src/gcu/browser/bridge.py | sed -n '440,470p'

Repository: aden-hive/hive

Length of output: 1425


🏁 Script executed:

rg "wait_for_function" tools/src/gcu/browser/bridge.py -A 3

Repository: aden-hive/hive

Length of output: 235


🏁 Script executed:

head -50 tools/src/gcu/browser/bridge.py | grep -E "^import|^from"

Repository: aden-hive/hive

Length of output: 249


🌐 Web query:

Playwright Python page.wait_for_function arg parameter syntax documentation

💡 Result:

The syntax for the arg parameter in Playwright Python's page.wait_for_function method is page.wait_for_function(expression, arg=value), where expression is a string representing the JavaScript function or expression to evaluate (that receives arg as its first parameter), and arg is an optional serializable Python value (e.g., string, number, list, dict) passed to the expression. Example from analogous language docs and patterns (Python docs lack explicit example but match): selector = ".foo" page.wait_for_function("selector => !!document.querySelector(selector)", arg=selector) This waits until an element matching the selector exists. Official Playwright Python docs confirm the method exists (Added before v1.9) and returns when the expression returns truthy, resolving to JSHandle. The arg parameter enables passing data into the browser context JS evaluation, consistent across Playwright bindings (e.g., Ruby shows arg: nil). GitHub source indicates standard API structure with parameters like arg. For full signature: page.wait_for_function(expression, arg=None, timeout=None, polling=None) (inferred from patterns and impl). Use serializable args only; complex objects use page.evaluate instead.

Citations:


Pass text as a wait_for_function() argument to prevent injection vulnerabilities.

Interpolating raw text directly into the JavaScript expression breaks on quotes and backslashes, enabling unintended execution for inputs like Bob's. Playwright's wait_for_function() API accepts arguments separately via the arg parameter, allowing safe data passing to the predicate function.

Suggested fix
         page = self._tabs[tab_id]
         try:
             await page.wait_for_function(
-                f"document.body.innerText.includes('{text}')",
-                timeout=timeout or self.config.timeout_ms,
+                "(text) => document.body?.innerText.includes(text)",
+                text,
+                timeout=timeout or self.config.timeout_ms,
             )
             return {"ok": True}
         except PlaywrightError as e:
             return {"ok": False, "error": str(e)}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/src/gcu/browser/bridge.py` around lines 449 - 461, The current
wait_for_text method interpolates the raw text into the JS predicate, enabling
injection and breaking on quotes/backslashes; change the call in wait_for_text
to pass the text as an argument to page.wait_for_function instead of string
interpolation (e.g., use a predicate like "text =>
document.body.innerText.includes(text)" and pass arg=text), keep the existing
timeout handling, and ensure you reference the same method name wait_for_text
and variable page when making the replacement.

Comment on lines +38 to +40
def register_advanced_tools(mcp: FastMCP) -> None:
"""Register advanced browser tools with the MCP server."""
mcp.tool()(browser_evaluate)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

The advanced registration no longer exposes any wait primitive.

Right now this module spends its only registration on another browser_evaluate, even though tools/src/gcu/browser/tools/inspection.py Lines 86-90 already register the same tool name and tools/src/gcu/browser/__init__.py calls both registration functions. That leaves no advanced MCP wrapper for BeelineBridge.wait_for_selector() / wait_for_text(), even though core/framework/agents/queen/reference/gcu_guide.md Line 111 still tells agents to call browser_wait after navigation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/src/gcu/browser/tools/advanced.py` around lines 38 - 40,
register_advanced_tools currently re-registers browser_evaluate and thus never
exposes the wait primitives; change register_advanced_tools to register a
browser_wait tool that wraps BeelineBridge.wait_for_selector and
BeelineBridge.wait_for_text (or a single handler that dispatches to those
methods) instead of re-registering browser_evaluate, and ensure the tool name
matches the agents' expectation ("browser_wait"); update the registration in
register_advanced_tools and the exported handler function name so it doesn't
conflict with the existing browser_evaluate registration in inspection.py and so
callers of browser_wait will invoke the BeelineBridge
wait_for_selector/wait_for_text behavior.

Comment on lines +86 to +90
def register_inspection_tools(mcp: FastMCP) -> None:
"""Register browser inspection tools with the MCP server."""
mcp.tool()(browser_get_text)
mcp.tool()(browser_screenshot)
mcp.tool()(browser_evaluate)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

browser_snapshot is missing from the inspection API.

The guide still uses browser_snapshot as the primary inspection primitive in core/framework/agents/queen/reference/gcu_guide.md Lines 57 and 109, and tools/src/gcu/browser/bridge.py Lines 386-396 still implement snapshot(). This registration change removes that structured DOM/accessibility view from MCP entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/src/gcu/browser/tools/inspection.py` around lines 86 - 90,
register_inspection_tools currently registers browser_get_text,
browser_screenshot, and browser_evaluate but omits browser_snapshot, which
removes the structured DOM/accessibility snapshot from the MCP API; update
register_inspection_tools to also register browser_snapshot (i.e., call
mcp.tool()(browser_snapshot) alongside the other tools), ensuring the
browser_snapshot function referenced in tools/src/gcu/browser/bridge.py is
imported/available in the module so the MCP exposes the same snapshot primitive
used by the guide and snapshot() implementation.

Comment on lines +33 to +50
async def browser_close_tab(tab_id: int, profile: str) -> dict[str, Any]:
"""Close the specified tab in the browser session.

bridge = get_bridge()
if not bridge or not bridge.is_connected:
result = {"ok": False, "error": "Browser extension not connected"}
log_tool_call("browser_close", params, result=result)
return result
Args:
tab_id: ID of the tab to close
profile: Unique identifier for the agent/profile

ctx = _get_context(profile)
if not ctx:
result = {"ok": False, "error": "Browser not started. Call browser_start first."}
log_tool_call("browser_close", params, result=result)
return result

# Use active tab if not specified
target_tab = tab_id or ctx.get("activeTabId")
if target_tab is None:
result = {"ok": False, "error": "No tab to close"}
log_tool_call("browser_close", params, result=result)
return result

try:
await bridge.close_tab(target_tab)

# Update active tab if we closed it
if ctx.get("activeTabId") == target_tab:
result = await bridge.list_tabs(ctx.get("groupId"))
tabs = result.get("tabs", [])
ctx["activeTabId"] = tabs[0].get("id") if tabs else None

result = {"ok": True, "closed": target_tab}
log_tool_call(
"browser_close",
params,
result=result,
duration_ms=(time.perf_counter() - start) * 1000,
)
return result
except Exception as e:
result = {"ok": False, "error": str(e)}
log_tool_call(
"browser_close", params, error=e, duration_ms=(time.perf_counter() - start) * 1000
)
return result

@mcp.tool()
async def browser_focus(tab_id: int, profile: str | None = None) -> dict:
"""
Focus a browser tab.

Args:
tab_id: Chrome tab ID to focus
profile: Browser profile name (default: "default")

Returns:
Dict with focus status
"""
start = time.perf_counter()
params = {"tab_id": tab_id, "profile": profile}

bridge = get_bridge()
if not bridge or not bridge.is_connected:
result = {"ok": False, "error": "Browser extension not connected"}
log_tool_call("browser_focus", params, result=result)
return result

ctx = _get_context(profile)
if not ctx:
result = {"ok": False, "error": "Browser not started. Call browser_start first."}
log_tool_call("browser_focus", params, result=result)
return result

try:
await bridge.activate_tab(tab_id)
ctx["activeTabId"] = tab_id
result = {"ok": True, "tabId": tab_id}
log_tool_call(
"browser_focus",
params,
result=result,
duration_ms=(time.perf_counter() - start) * 1000,
)
return result
except Exception as e:
result = {"ok": False, "error": str(e)}
log_tool_call(
"browser_focus", params, error=e, duration_ms=(time.perf_counter() - start) * 1000
)
return result

@mcp.tool()
async def browser_close_all(
keep_active: bool = True,
profile: str | None = None,
) -> dict:
"""
Close all browser tabs in the agent's group, optionally keeping active.

Args:
keep_active: If True (default), keep the active tab open.
If False, close ALL tabs (group remains but empty).
profile: Browser profile name (default: "default")

Returns:
Dict with number of closed tabs and remaining count
"""
start = time.perf_counter()
params = {"keep_active": keep_active, "profile": profile}

bridge = get_bridge()
if not bridge or not bridge.is_connected:
result = {"ok": False, "error": "Browser extension not connected"}
log_tool_call("browser_close_all", params, result=result)
return result

ctx = _get_context(profile)
if not ctx:
result = {"ok": False, "error": "Browser not started. Call browser_start first."}
log_tool_call("browser_close_all", params, result=result)
return result

try:
result = await bridge.list_tabs(ctx.get("groupId"))
tabs = result.get("tabs", [])
active_tab_id = ctx.get("activeTabId")

closed = 0
for tab in tabs:
tid = tab.get("id")
if keep_active and tid == active_tab_id:
continue
try:
await bridge.close_tab(tid)
closed += 1
except Exception:
pass

# Update active tab
if not keep_active:
ctx["activeTabId"] = None
else:
result = await bridge.list_tabs(ctx.get("groupId"))
remaining = result.get("tabs", [])
ctx["activeTabId"] = remaining[0].get("id") if remaining else None

result = {
"ok": True,
"closed_count": closed,
"remaining": len(tabs) - closed,
}
log_tool_call(
"browser_close_all",
params,
result=result,
duration_ms=(time.perf_counter() - start) * 1000,
)
return result
except Exception as e:
result = {"ok": False, "error": str(e)}
log_tool_call(
"browser_close_all",
params,
error=e,
duration_ms=(time.perf_counter() - start) * 1000,
)
return result

@mcp.tool()
async def browser_close_finished(
keep_active: bool = True,
profile: str | None = None,
) -> dict:
"""
Close all tabs except the active one.
Returns:
Dict with ok status
"""
from .lifecycle import get_bridge

This is a convenience wrapper around browser_close_all.
bridge = get_bridge()
try:
result = await bridge.close_tab(tab_id)
return result
except Exception as e:
return {"ok": False, "error": str(e)}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Validate that the tab belongs to profile before closing it.

tab_id is global across the singleton bridge, but this wrapper ignores profile and delegates straight to close_tab(tab_id). That lets one profile close another profile's tab if it ever learns the id, which breaks the isolation guarantee this PR is adding.

Suggested fix
# tools/src/gcu/browser/tools/tabs.py
-        result = await bridge.close_tab(tab_id)
+        result = await bridge.close_tab(tab_id, profile)

# tools/src/gcu/browser/bridge.py
-    async def close_tab(self, tab_id: int) -> dict[str, Any]:
+    async def close_tab(self, tab_id: int, profile: str) -> dict[str, Any]:
         if tab_id not in self._tabs:
             return {"ok": False, "error": "Tab not found"}

         page = self._tabs[tab_id]
+        context = self._contexts.get(profile)
+        if context is None or page.context != context:
+            return {"ok": False, "error": "Tab does not belong to profile"}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/src/gcu/browser/tools/tabs.py` around lines 33 - 50, Validate tab
ownership before calling bridge.close_tab: use get_bridge() to retrieve the
bridge and check its mapping/registry (e.g., bridge.tabs, bridge.get_tab_owner
or similar) to assert that the provided tab_id belongs to the supplied profile;
if it does, call await bridge.close_tab(tab_id) and return the result, otherwise
return {"ok": False, "error": "tab does not belong to profile"} (or raise an
appropriate error). Update browser_close_tab to perform this ownership check and
avoid delegating directly to bridge.close_tab without verifying profile
ownership.

Comment on lines +53 to +56
def register_tab_tools(mcp: FastMCP) -> None:
"""Register browser tab management tools with the MCP server."""
mcp.tool()(browser_tabs)
mcp.tool()(browser_close_tab)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

The registered tab tools no longer support multi-tab workflows.

BeelineBridge still exposes create_tab() and activate_tab(), but the MCP surface only registers list and close. tools/src/gcu/browser/tools/navigation.py Lines 14-36 now use browser_open to navigate the current tab instead of creating one, so agents cannot open a second tab or switch the active tab at all.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/src/gcu/browser/tools/tabs.py` around lines 53 - 56, The MCP
registration in register_tab_tools only exposes browser_tabs and
browser_close_tab, blocking multi-tab workflows; update register_tab_tools to
also register the tab-creation and tab-activation tools by adding
mcp.tool()(browser_create_tab) and mcp.tool()(browser_activate_tab) (or whatever
the existing tool wrappers are named) so the MCP surface matches
BeelineBridge.create_tab and BeelineBridge.activate_tab; then update
tools/src/gcu/browser/tools/navigation.py to call the newly-registered
browser_create_tab/browser_activate_tab tools (instead of only browser_open)
when the agent intends to open a new tab or switch tabs, ensuring the MCP API
supports creating and activating tabs.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

Closing PR because the contribution requirements were not resolved within the 24-hour grace period.
If this was closed in error, feel free to reopen the PR after fixing the requirements.

@github-actions github-actions bot closed this Apr 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-requirements-warning PR doesn't follow contribution guidelines. Please fix or it will be auto-closed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Browser automation for using Vision language models

1 participant