feat: connect to remote browser services #3545
… support
# Task 1: Type Definitions & LaunchContext `isRemote` Flag
## Goal
Add the foundational types and the `isRemote` flag that all other remote browser tasks depend on.
## Dependencies
None — this is the foundation task.
## Scope
### 1. Add `isRemote` to `LaunchContext`
**File:** `packages/browser-pool/src/launch-context.ts`
- Add `isRemote?: boolean` to the `LaunchContextOptions` interface (alongside `id`, `browserPlugin`, etc.)
- Add a public readonly `isRemote: boolean` property to the `LaunchContext` class
- Set it from constructor options, defaulting to `false`
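A minimal sketch of the three bullets above, assuming simplified stand-in types. `SketchLaunchContextOptions` and `SketchLaunchContext` are hypothetical names used only for illustration; the real `LaunchContext` carries many more fields.

```typescript
// Hypothetical, simplified stand-ins for the real LaunchContext types.
interface SketchLaunchContextOptions {
    id?: string;
    isRemote?: boolean;
}

class SketchLaunchContext {
    readonly id?: string;
    readonly isRemote: boolean;

    constructor(options: SketchLaunchContextOptions = {}) {
        this.id = options.id;
        // Treat the context as a local launch unless the caller opts in.
        this.isRemote = options.isRemote ?? false;
    }
}
```

Using `??` rather than `||` keeps the intent explicit: only a missing value falls back to `false`.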
### 2. Define connect option types on PlaywrightPlugin
**File:** `packages/browser-pool/src/playwright/playwright-plugin.ts`
Add the following type to the plugin file (or a co-located types file):
```typescript
// Mirrors browserType.connectOverCDP(endpointURL, options)
interface PlaywrightConnectOverCDPOptions {
    endpointURL: string;
    options?: Parameters<BrowserType['connectOverCDP']>[1];
}

// Mirrors browserType.connect(wsEndpoint, options)
interface PlaywrightConnectOptions {
    wsEndpoint: string;
    options?: Parameters<BrowserType['connect']>[1];
}
```
Use the existing `Parameters` utility type pattern (see how `SafeParameters` is used elsewhere in the codebase) — do NOT redefine Playwright's types manually.
### 3. Define connect option types on PuppeteerPlugin
**File:** `packages/browser-pool/src/puppeteer/puppeteer-plugin.ts`
```typescript
// Mirrors puppeteer.connect({ browserWSEndpoint, ...rest })
// Flat object matching Puppeteer's ConnectOptions
type PuppeteerConnectOverCDPOptions = Parameters<typeof puppeteer.connect>[0];
```
Use the `Parameters` pattern to extract the type from Puppeteer's `connect` method.
### 4. Add connect option fields to `BrowserPluginOptions`
**File:** `packages/browser-pool/src/abstract-classes/browser-plugin.ts`
This is a design choice — the PRD says connect options live on the plugin subclass, not on `LaunchContext`. Add the fields to the plugin options type so they flow through the constructor:
- `PlaywrightPlugin` options should accept `connectOptions?` and `connectOverCDPOptions?`
- `PuppeteerPlugin` options should accept `connectOverCDPOptions?`
These can be added to subclass-specific option types rather than the base `BrowserPluginOptions`.
### 5. Add connect option fields to launcher-level interfaces
**File:** `packages/playwright-crawler/src/internals/playwright-launcher.ts`
Add to `PlaywrightLaunchContext`:
```typescript
connectOptions?: PlaywrightConnectOptions;
connectOverCDPOptions?: PlaywrightConnectOverCDPOptions;
```
**File:** `packages/puppeteer-crawler/src/internals/puppeteer-launcher.ts`
Add to `PuppeteerLaunchContext`:
```typescript
connectOverCDPOptions?: PuppeteerConnectOverCDPOptions;
```
This enables IDE autocomplete when users configure `launchContext` on the crawler.
### 6. Export new types
**File:** `packages/browser-pool/src/index.ts`
Export the new connect option types so they're available to consumers.
## Key Files
| File | Change |
|------|--------|
| `packages/browser-pool/src/launch-context.ts` | Add `isRemote` option + property |
| `packages/browser-pool/src/playwright/playwright-plugin.ts` | Add connect option types |
| `packages/browser-pool/src/puppeteer/puppeteer-plugin.ts` | Add connect option type |
| `packages/playwright-crawler/src/internals/playwright-launcher.ts` | Add connect options to `PlaywrightLaunchContext` |
| `packages/puppeteer-crawler/src/internals/puppeteer-launcher.ts` | Add connect options to `PuppeteerLaunchContext` |
| `packages/browser-pool/src/index.ts` | Export new types |
| `packages/browser-crawler/src/internals/browser-launcher.ts` | May need connect options on `BrowserLaunchContext` base |
## Acceptance Criteria
- [x] `LaunchContext` has `isRemote` boolean property, defaults to `false`
- [x] Connect option types are defined using library `Parameters` extraction (not manual redefinition)
- [x] `PlaywrightLaunchContext` shows `connectOptions` and `connectOverCDPOptions` in IDE autocomplete
- [x] `PuppeteerLaunchContext` shows `connectOverCDPOptions` in IDE autocomplete
- [x] New types are exported from `@crawlee/browser-pool`
- [x] TypeScript compiles with no errors
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…and `connectOverCDP()`
# Task 2: PlaywrightPlugin Remote Connection Routing
## Goal
Make `PlaywrightPlugin._launch()` branch to `connect()` or `connectOverCDP()` when remote connection options are present, instead of calling `launch()`.
## Dependencies
- Task 1 (types and `isRemote` flag)
## Scope
### 1. Store connect options on the plugin instance
**File:** `packages/browser-pool/src/playwright/playwright-plugin.ts`
- Accept `connectOptions` and `connectOverCDPOptions` in the constructor options
- Store them as instance properties
- **Validation:** If both `connectOptions` AND `connectOverCDPOptions` are provided, throw an error immediately in the constructor:
```
Cannot set both 'connectOptions' and 'connectOverCDPOptions' — pick one protocol.
```
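The fail-fast validation could look roughly like the sketch below. The `Sketch*` types are hypothetical simplifications, not the plugin's real option types, and the error message is shortened for the example.

```typescript
// Hypothetical, simplified option shapes for illustration only.
interface SketchConnectOptions {
    wsEndpoint: string;
}

interface SketchConnectOverCDPOptions {
    endpointURL: string;
}

interface SketchPluginOptions {
    connectOptions?: SketchConnectOptions;
    connectOverCDPOptions?: SketchConnectOverCDPOptions;
}

class SketchPlaywrightPlugin {
    readonly connectOptions?: SketchConnectOptions;
    readonly connectOverCDPOptions?: SketchConnectOverCDPOptions;

    constructor(options: SketchPluginOptions = {}) {
        // Reject ambiguous configuration before any browser work starts.
        if (options.connectOptions && options.connectOverCDPOptions) {
            throw new Error("Cannot set both 'connectOptions' and 'connectOverCDPOptions'.");
        }
        this.connectOptions = options.connectOptions;
        this.connectOverCDPOptions = options.connectOverCDPOptions;
    }
}
```

Throwing in the constructor means a misconfigured crawler fails at setup time rather than on the first browser launch.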
### 2. Branch in `_launch()` for remote connections
**File:** `packages/browser-pool/src/playwright/playwright-plugin.ts`
In the existing `_launch()` method (currently lines 22-102), add branching logic **before** the existing local launch code:
```typescript
protected async _launch(launchContext: LaunchContext<...>): Promise<Browser> {
    // Remote CDP connection
    if (this.connectOverCDPOptions) {
        const { endpointURL, options } = this.connectOverCDPOptions;
        const browser = await browserType.connectOverCDP(endpointURL, options);
        return browser;
    }

    // Remote Playwright WebSocket connection
    if (this.connectOptions) {
        const { wsEndpoint, options } = this.connectOptions;
        const browser = await browserType.connect(wsEndpoint, options);
        return browser;
    }

    // Existing local launch logic...
}
```
**Reference:** See `StagehandPlugin._launch()` at `packages/stagehand-crawler/src/internals/stagehand-plugin.ts:102-107` for the CDP connection pattern:
```typescript
const cdpUrl = await stagehand.connectURL();
const browser = await chromium.connectOverCDP(cdpUrl);
```
### 3. Set `isRemote` on LaunchContext
**File:** `packages/browser-pool/src/playwright/playwright-plugin.ts`
In `createLaunchContext()` (or wherever the plugin creates the LaunchContext), pass `isRemote: true` when connect options are present. This can be done by overriding `createLaunchContext()` in the subclass, or by passing it through the options.
Check how the base `BrowserPlugin.createLaunchContext()` works (at `packages/browser-pool/src/abstract-classes/browser-plugin.ts:149-174`) and determine the best insertion point.
## Key Design Decisions
- **No new abstract method:** The routing happens inside `_launch()` via internal branching, not a new `_connect()` method. This keeps the abstract interface unchanged and doesn't affect custom plugins like StagehandPlugin.
- **`browser.close()` for cleanup:** Remote browsers are closed the same way as local browsers — via `browser.close()`. No special disconnect handling.
- **No proxy server setup for remote:** The remote branch skips the local proxy server setup that exists in the current `_launch()` code.
## Key Files
| File | Change |
|------|--------|
| `packages/browser-pool/src/playwright/playwright-plugin.ts` | Constructor stores options, `_launch()` branches for remote |
## Acceptance Criteria
- [x] `PlaywrightPlugin` accepts `connectOptions` in constructor and calls `browserType.connect()` with `wsEndpoint` and `options`
- [x] `PlaywrightPlugin` accepts `connectOverCDPOptions` in constructor and calls `browserType.connectOverCDP()` with `endpointURL` and `options`
- [x] Setting both `connectOptions` and `connectOverCDPOptions` throws an error
- [x] `launchContext.isRemote` is `true` when connect options are present
- [x] Remote branch skips local proxy server setup and persistent context logic
- [x] TypeScript compiles with no errors
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nnect()`
# Task 3: PuppeteerPlugin Remote Connection Routing
## Goal
Make `PuppeteerPlugin._launch()` branch to `puppeteer.connect()` when remote connection options (CDP) are present, instead of calling `puppeteer.launch()`.
## Dependencies
- Task 1 (types and `isRemote` flag)
## Scope
### 1. Store connect options on the plugin instance
**File:** `packages/browser-pool/src/puppeteer/puppeteer-plugin.ts`
- Accept `connectOverCDPOptions` in the constructor options
- Store as an instance property
- Puppeteer only supports CDP — there is no `connectOptions` field (Playwright-only)
### 2. Branch in `_launch()` for remote connections
**File:** `packages/browser-pool/src/puppeteer/puppeteer-plugin.ts`
In the existing `_launch()` method (currently lines 22-203), add branching logic **before** the existing local launch code:
```typescript
protected async _launch(launchContext: LaunchContext<...>): Promise<Browser> {
    // Remote CDP connection
    if (this.connectOverCDPOptions) {
        const browser = await puppeteer.connect(this.connectOverCDPOptions);
        // Wrap with the same Proxy handler for newPage() interception
        // (see existing code at lines 138-200)
        return wrappedBrowser;
    }

    // Existing local launch logic...
}
```
**Important:** Puppeteer's `connect()` takes a flat options object: `puppeteer.connect({ browserWSEndpoint, ...rest })`. This is different from Playwright's two-argument pattern. The type should match Puppeteer's `ConnectOptions`.
### 3. Handle the `newPage()` Proxy wrapper for remote
The existing `_launch()` wraps the browser in a `Proxy` that intercepts `newPage()` calls to support `useIncognitoPages` (lines 138-200). This proxy wrapper should also be applied to remote browsers so that incognito context creation works correctly.
### 4. Set `isRemote` on LaunchContext
Same pattern as Task 2 — pass `isRemote: true` when `connectOverCDPOptions` is present.
## Key Design Decisions
- **Flat options object:** Puppeteer's `connect()` API takes a single options object (not `endpointURL, options` like Playwright). The `connectOverCDPOptions` type matches this flat shape directly.
- **`browser.close()` for cleanup:** Same as Playwright — remote browsers closed via `browser.close()`, not `browser.disconnect()`.
- **`newPage()` proxy still needed:** The Proxy wrapper that intercepts `newPage()` to create incognito contexts must still wrap remote browsers.
## Key Files
| File | Change |
|------|--------|
| `packages/browser-pool/src/puppeteer/puppeteer-plugin.ts` | Constructor stores options, `_launch()` branches for remote |
## Acceptance Criteria
- [x] `PuppeteerPlugin` accepts `connectOverCDPOptions` in constructor and calls `puppeteer.connect()` with the options object
- [x] The `newPage()` Proxy wrapper is applied to remote browsers (for incognito support)
- [x] `launchContext.isRemote` is `true` when connect options are present
- [x] Remote branch skips user data directory setup, headless handling, and other local-only logic
- [x] TypeScript compiles with no errors
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nection logging
## Goal
Make `BrowserPlugin.launch()` skip proxy injection and webdriver hiding when `launchContext.isRemote` is `true`, since these operations modify `launchOptions` which are not used for remote connections.
## Dependencies
- Task 1 (`isRemote` flag on LaunchContext)
## Scope
### 1. Skip `_addProxyToLaunchOptions()` for remote
**File:** `packages/browser-pool/src/abstract-classes/browser-plugin.ts`
In the `launch()` method, the call to `_addProxyToLaunchOptions()` is now gated on `!isRemote`:
```typescript
if (launchContext.proxyUrl && !launchContext.isRemote) {
    await this._addProxyToLaunchOptions(launchContext);
}
```
### 2. Skip `_mergeArgsToHideWebdriver()` for remote
```typescript
if (!launchContext.isRemote && this._isChromiumBasedBrowser(launchContext)) {
    this._mergeArgsToHideWebdriver(launchContext);
}
```
### 3. No changes to `_addProxyToLaunchOptions()` or `_mergeArgsToHideWebdriver()` themselves
The methods remain unchanged — the skip logic lives in the calling `launch()` method.
## Key Design Decisions
- **Skip at call site, not in the methods**
- **`proxyUrl` + remote triggers a warning:** Handled in Task 6 (Warnings)
- **Fingerprinting hooks are unchanged**
## Additional
- Fixed `isRemote` not being passed through base class `createLaunchContext()`
- Added info-level logs for remote connections in base class and both plugins
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ht overloads
Playwright: change PlaywrightConnectOverCDPOptions and PlaywrightConnectOptions from type aliases (all-optional fields) to interfaces with required `wsEndpoint`. Use the non-deprecated two-argument overloads in _launch().
Puppeteer: add runtime guard that throws if neither `browserWSEndpoint` nor `browserURL` is provided in connectOverCDPOptions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
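The Puppeteer runtime guard mentioned in this commit might be sketched as follows. `SketchPuppeteerConnectOptions` and `assertHasEndpoint` are hypothetical names, and the error wording is an assumption rather than the message used in the PR.

```typescript
// Hypothetical subset of Puppeteer's connect options, for illustration only.
interface SketchPuppeteerConnectOptions {
    browserWSEndpoint?: string;
    browserURL?: string;
}

// Throws when the options carry no endpoint to connect to.
function assertHasEndpoint(options: SketchPuppeteerConnectOptions): void {
    if (!options.browserWSEndpoint && !options.browserURL) {
        throw new Error("connectOverCDPOptions requires either 'browserWSEndpoint' or 'browserURL'.");
    }
}
```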
…tions
# Task 5: `useIncognitoPages` Defaults to `true` for Remote
## Goal
When remote connection options are present and `useIncognitoPages` was not explicitly set by the user, default it to `true` and log an info message. If the user explicitly sets `false`, log a warning.
## Dependencies
- Task 2 (PlaywrightPlugin stores connect options)
- Task 3 (PuppeteerPlugin stores connect options)
## Scope
### 1. Preserve `undefined` vs `false` in base constructor
The base `BrowserPlugin` constructor currently collapses `useIncognitoPages` to `false`. The subclass checks `options.useIncognitoPages` directly (preserves `undefined`) and overrides after `super()`.
### 2. Override default in PlaywrightPlugin constructor
After the `super()` call, if connect options are present:
- `undefined` → set to `true`, info log
- `false` → warning log
- `true` → no extra log
### 3. Override default in PuppeteerPlugin constructor
Same logic, checking `connectOverCDPOptions`.
## Key Design Decisions
- **Info vs warning:** Defaulting to `true` is an info message (expected behavior). Explicit `false` is a warning (user should understand implications).
- **`useIncognitoPages: false` + `connect()` is not special-cased:** The warning covers this case — no additional error or fallback.
- **Uses existing `this.log`:** All logging uses the inherited `BrowserPlugin.log` logger.
## Acceptance Criteria
- [x] When `connectOptions` or `connectOverCDPOptions` is set and `useIncognitoPages` is not provided → defaults to `true`, info message logged
- [x] When `connectOptions` or `connectOverCDPOptions` is set and `useIncognitoPages: false` → stays `false`, warning logged
- [x] When `connectOptions` or `connectOverCDPOptions` is set and `useIncognitoPages: true` → stays `true`, no extra log
- [x] When no connect options are set → existing behavior unchanged
- [x] Base constructor preserves `undefined` vs `false` distinction
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
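The decision table in Task 5 can be sketched as a pure function. This reflects the behavior as described in this task only (later commits in the PR change the remote default), and `resolveUseIncognitoPages` and `IncognitoDecision` are hypothetical names.

```typescript
type SketchLogLevel = 'info' | 'warning' | 'none';

interface IncognitoDecision {
    useIncognitoPages: boolean;
    log: SketchLogLevel;
}

// userValue is the raw option, so `undefined` means "not set by the user".
function resolveUseIncognitoPages(userValue: boolean | undefined, hasConnectOptions: boolean): IncognitoDecision {
    if (!hasConnectOptions) {
        // Local launch: existing behavior, undefined collapses to false.
        return { useIncognitoPages: userValue ?? false, log: 'none' };
    }
    if (userValue === undefined) {
        return { useIncognitoPages: true, log: 'info' };
    }
    if (userValue === false) {
        return { useIncognitoPages: false, log: 'warning' };
    }
    return { useIncognitoPages: true, log: 'none' };
}
```

The function takes the raw option rather than the already-normalized instance property, which is the reason the task insists on preserving `undefined` versus `false`.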
- Rename PlaywrightConnectOverCDPOptions.wsEndpoint → endpointURL to match Playwright's own terminology and avoid field conflict with inherited ConnectOverCDPOptions.endpointURL
- Wrap connectOverCDP() and connect() failures with BrowserLaunchError including sanitized endpoint URL (credentials stripped) and actionable guidance
- Move endpoint validation to constructors (fail fast) — Playwright validates endpointURL and wsEndpoint are non-empty, Puppeteer validates browserWSEndpoint || browserURL
- Add _sanitizeEndpointForLog() to both plugins to strip credentials from URLs before including them in error messages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ions
- Close BrowserContext on page close when useIncognitoPages is true.
Previously contexts were only cleaned up when an anonymized proxy was
active, causing context accumulation on remote browsers without proxy.
- Clean up targetcreated listener on remote browser disconnect via
browser.once('disconnected') handler to prevent listener leaks.
- Guard anonymizeProxySugar call with proxyUrl check — skip the async
call entirely when no proxy is configured (common for remote browsers).
- Conditionally omit proxyServer from context options when no proxy is
set, instead of passing { proxyServer: undefined }.
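The last bullet, omitting `proxyServer` rather than passing it as `undefined`, might look like this sketch. `SketchContextOptions` and `buildContextOptions` are hypothetical names, not the helper used in the plugin.

```typescript
// Hypothetical shape of the options passed when creating a browser context.
interface SketchContextOptions {
    proxyServer?: string;
}

// Returns an object that has no proxyServer key at all when no proxy is set.
function buildContextOptions(proxyUrl?: string): SketchContextOptions {
    return proxyUrl ? { proxyServer: proxyUrl } : {};
}
```

The difference matters when the receiving API checks for the presence of the key rather than its value.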
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ket connections
- Add comments in both plugin constructors explaining why options.useIncognitoPages is checked instead of this.useIncognitoPages (super() collapses undefined to false, losing the "not set" signal).
- Strengthen warning for Playwright connectOptions (WebSocket) + useIncognitoPages: false — connect() returns a browser with no default context, which is more severe than just sharing cookies.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove spurious launchOptions warning that always fired due to framework-injected defaults, and share log instances in launchers.
PRD Task 6: Warnings for Ignored & Conflicting Options
- proxyUrl + remote → warning in base BrowserPlugin.launch()
- useChrome + remote → warning in launcher constructors
- executablePath + remote → warning in launcher constructors
- useIncognitoPages: false + remote → handled by Task 5
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PRD Task 7: Unit Tests
- Connection routing (Playwright CDP/WS/local, Puppeteer CDP/local)
- Validation (mutual exclusion, missing endpoints)
- isRemote correctness for all plugin variants
- Proxy/webdriver skipping for remote, applied for local
- useIncognitoPages defaults (true for remote, false for local)
- Warnings (proxyUrl, useIncognitoPages: false, CDP vs WS variants)
- 40 tests, all mocked (no real browser instances)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…texts
When useIncognitoPages is true (default for remote) and proxyUrl is set, the newPage handler was passing proxyServer to createBrowserContext even for remote connections. For credentialed proxies this also spun up a localhost tunnel unreachable by the remote browser.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Examples for Browserbase, Browserless, Rebrowser, and Steel using Playwright and Puppeteer.
…oteBrowser config
Add a unified API for connecting crawlers to remote browser services (Browserbase, Browserless, Steel, Rebrowser). Users can either pass a RemoteBrowserConfig object or extend RemoteBrowserProvider with typed connect()/release() lifecycle methods.
- Add RemoteBrowserProvider abstract class with generic TContext
- Add RemoteBrowserConfig interface (endpoint + release + type)
- Wire remoteBrowser through BrowserPlugin, PlaywrightPlugin, PuppeteerPlugin
- Auto-call release() on browser close/crash/pool destroy
- Skip fingerprinting, proxy injection, and webdriver stealth for remote browsers
- Skip session-based browser retirement for remote browsers (isRemote guard)
- Default useIncognitoPages to true for remote connections
- Add 30+ unit tests for both config and provider patterns
- Update all temp-examples to use RemoteBrowserProvider
… overflow
Remote browser services enforce concurrent session limits. During browser retirement transitions, the pool could briefly exceed the limit by launching a new browser before the retired one fully closed.
- Add maxOpenBrowsers to RemoteBrowserConfig and RemoteBrowserProvider
- BrowserCrawler reads it from the plugin and applies it to the pool
- Gate new tasks via _isTaskReadyFunction (same pattern as maxConcurrency)
- Add hasFreeBrowserSlot() and hasActiveBrowserWithFreeCapacity() to BrowserPool
- Only activates when maxOpenBrowsers is set (remote browsers); local browsers unaffected
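A rough sketch of the slot check, assuming that retired browsers keep occupying a session until they are fully closed. `SketchBrowserPool` and its counters are hypothetical; only the method name `hasFreeBrowserSlot()` comes from the commit message, and the real `BrowserPool` bookkeeping may differ.

```typescript
// Hypothetical pool model: counts active and retired-but-still-open browsers.
class SketchBrowserPool {
    private readonly maxOpenBrowsers: number | undefined;
    private activeCount = 0;
    private retiredCount = 0;

    constructor(maxOpenBrowsers?: number) {
        this.maxOpenBrowsers = maxOpenBrowsers;
    }

    hasFreeBrowserSlot(): boolean {
        // Without a limit (local browsers) the gate never blocks.
        if (this.maxOpenBrowsers === undefined) return true;
        return this.activeCount + this.retiredCount < this.maxOpenBrowsers;
    }

    launch(): void {
        this.activeCount += 1;
    }

    retire(): void {
        this.activeCount -= 1;
        this.retiredCount += 1;
    }

    closeRetired(): void {
        this.retiredCount -= 1;
    }
}
```

Counting retired browsers toward the limit is what prevents the brief overflow described above: a replacement cannot launch until the old session is actually gone.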
| * } | ||
| * ``` | ||
| */ | ||
| export abstract class RemoteBrowserProvider<TContext extends Record<string, unknown> = Record<string, unknown>> { |
Browserless is a good example of a minimal implementation:
temp-examples/examples/browserless-playwright.ts
const endpointUrl = `wss://production-sfo.browserless.io?token=${token}`
class BrowserlessProvider extends RemoteBrowserProvider {
    async connect() {
        return { url: endpointUrl };
    }
}
On the other hand, Browserbase needs a full implementation:
temp-examples/examples/browserbase-playwright.ts
class BrowserbaseWsProvider extends RemoteBrowserProvider<{ id: string }> {
    type = 'websocket' as const;
    maxOpenBrowsers = 2;
    async connect() {
        const response = await fetch('https://api.browserbase.com/v1/sessions', {
            method: 'POST',
            headers: { 'x-bb-api-key': apiKey, 'Content-Type': 'application/json' },
            body: JSON.stringify({ projectId }),
        });
        if (!response.ok) {
            throw new Error(`Failed to create Browserbase session: ${response.status} ${response.statusText}`);
        }
        const session = await response.json();
        const url = `wss://connect.browserbase.com?apiKey=${apiKey}&sessionId=${session.id}`;
        return { url, context: { id: session.id } };
    }
    async release({ id }: { id: string }) {
        await fetch(`https://api.browserbase.com/v1/sessions/${id}`, {
            method: 'POST',
            headers: { 'x-bb-api-key': apiKey, 'Content-Type': 'application/json' },
            body: JSON.stringify({ status: 'REQUEST_RELEASE' }),
        })
    }
}
Should we have a separate package (@crawlee/remote or similar) with these implementations for major remote browser providers?
So our users can do
import { BrowserbaseBackend } from '@crawlee/remote';
import { PlaywrightCrawler } from '@crawlee/playwright';
new PlaywrightCrawler({
    launchContext: {
        remoteBrowser: new BrowserbaseBackend({ url: '', token: '', ...})
    }
})
Imo this could be really useful. Definitely not necessary to add in this PR, though, it looks pretty self-contained.
Somewhere in the process we dropped this idea, but I am open to creating an issue. In general (at least for the services I worked with) the setup is similar from service to service. Maybe `class BrowserbaseWsProvider extends RemoteBrowserProvider` is also too much boilerplate for what it really does.
 *
 * @param _context The same `context` object returned by {@link connect}.
 */
async release(_context: TContext): Promise<void> {}
I am aware that this is not the best name for it
… cookie sharing
Remote CDP browsers (both Puppeteer and Playwright) now default useIncognitoPages to false, matching local behavior. For Playwright CDP, the browser's default context is wrapped in PlaywrightBrowserWithPersistentContext so pages share cookies — the same mechanism used locally with launchPersistentContext(). Playwright WebSocket still defaults to true since connect() returns a browser with no default context to wrap. The wrapper passes the real Browser as parentBrowser so close() also closes the WebSocket transport and disconnected events are forwarded to the pool.
if (isWebSocket) {
    this.useIncognitoPages = true;
    this.log.info(
        'Remote Playwright WebSocket connection detected — defaulting useIncognitoPages to true.',
    );
A traditional WS connection does not return a context (as CDP does), so there is no way to pass this context to another browser instance.
 * When useIncognitoPages is false and we have a CDP-connected browser,
 * wrap its default context in PlaywrightBrowser so that all pages share
 * a single context (matching local persistent-context behavior).
 *
 * Playwright's browser.newPage() always creates a new context, so without
 * this wrapper, pages would never share cookies even with useIncognitoPages: false.
 */
private _maybeWrapWithSharedContext(
Locally this is handled by launchPersistentContext, but here we have to grab the context from the remote browser.
The proxyUrl from Crawlee's ProxyConfiguration is now passed to
RemoteBrowserProvider.connect({ proxyUrl }) and RemoteBrowserConfig.endpoint({ proxyUrl }),
letting provider implementations forward it to the remote service's proxy API
(e.g. Browserless externalProxyServer, Browserbase external proxies).
Also adds a userDataDir warning for remote connections, matching the existing
proxyUrl warning pattern.
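As an illustration of forwarding `proxyUrl` to a remote service, the sketch below appends it to the endpoint as a query parameter. The parameter name `proxy`, the helper `buildRemoteEndpoint`, and the host `remote.example` are all hypothetical; each real service defines its own proxy API, so check the provider's documentation for the actual field.

```typescript
// Hypothetical helper: builds the endpoint a provider's connect() would return.
function buildRemoteEndpoint(baseUrl: string, proxyUrl?: string): string {
    const url = new URL(baseUrl);
    if (proxyUrl) {
        // 'proxy' is a placeholder parameter name, not a real service API.
        url.searchParams.set('proxy', proxyUrl);
    }
    return url.toString();
}
```

A provider's `connect({ proxyUrl })` could call such a helper and return `{ url: buildRemoteEndpoint(base, proxyUrl) }`.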
I guess it shouldn't be a problem, but can you wire up https://gologin.com/docs/api-reference/sdks/nodejs-sdk into crawlee with this?
const connectOptionsPresent = !!(launchContext.connectOptions || launchContext.connectOverCDPOptions);

if (connectOptionsPresent && (launchContext.useChrome || launchContext.launchOptions?.executablePath)) {
Btw, it seems that different launchOptions are ignored when using connectOptions and connectOverCDPOptions (and some are ignored in both).
Maybe I'm being too rough, but I'd just enforce having at most one of launchOptions | connectOptions | connectOverCDPOptions for now. We can always relax this if the need ever arises.
True, I am afraid there will be more funny stuff after my last 6 rewrites of this :] Let's throw when launchOptions is used with remote, and make connectOptions | connectOverCDPOptions | remoteBrowser mutually exclusive (also throwing).
/**
 * Connection type to use. `'cdp'` uses `browserType.connectOverCDP()`,
 * `'websocket'` uses `browserType.connect()`.
 * @default 'cdp'
 */
type?: 'cdp' | 'websocket';
Options for connecting to a remote browser via WebSocket.
Both the Playwright's internal "client-server" protocol (no public name that I know of) and CDP are based on WebSocket (both are higher-level protocols).
Perhaps we can make this into type?: 'cdp' | 'playwright'?
I guess I will drop `websocket` completely; it is specific to Playwright (and obsolete?). cc @janbuchar
Au contraire - according to the docs, the Playwright-specific protocol offers more detailed control over the browser. Especially since this is inside playwright-launcher.ts, we should imo keep it.
nvm, I see the commit message now
Callers who genuinely need Playwright's connect() can still use connectOptions directly.
sounds reasonable 👍
| * } | ||
| * ``` | ||
| */ | ||
| export interface RemoteBrowserConfig { |
Is there something we can express with RemoteBrowserConfig, that we cannot do with the other interfaces?
The RemoteBrowserConfig interface is IIUC halfway between the connectOptions object (simple) and the RemoteBrowserProvider class (for advanced use cases)... and I'm not convinced we need all three.
edit: perhaps we can keep using the ...Config internally, but export only ...Provider?
protected _sanitizeEndpointForLog(endpoint: string): string {
    try {
        const url = new URL(endpoint);
        if (url.username || url.password) {
            url.username = '***';
            url.password = '***';
        }
        return url.toString();
    } catch {
        return '<invalid URL>';
    }
Some providers (e.g., Browserless) authenticate using a token in the query params (docs), so we'd leak the secret anyway.
It feels like this should be the logger's responsibility, and we shouldn't deal with this here. How about, e.g., logging just the hostname in the error messages?
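The hostname-only alternative suggested here could be sketched as below. `endpointHostForLog` is a hypothetical name, and the `'<invalid URL>'` fallback simply mirrors the snippet under review.

```typescript
// Returns only the hostname, so userinfo, path, and query tokens never reach the logs.
function endpointHostForLog(endpoint: string): string {
    try {
        return new URL(endpoint).hostname;
    } catch {
        return '<invalid URL>';
    }
}
```

This drops both `user:pass@` credentials and query-string tokens in one step, at the cost of losing the port and path in error messages.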
this.log.warning(
    'Both remoteBrowser and connectOverCDPOptions/connectOptions are set. ' +
        'remoteBrowser is ignored when explicit connect options are provided.',
);
So... which one is right? 😄
It seems that the comment there is wrong, and this error message is right, see example:
import { PlaywrightCrawler } from './packages/playwright-crawler/src/index.js';
import { chromium, firefox } from 'playwright';
const firefoxServer = await firefox.launchServer();
const crawler = new PlaywrightCrawler({
    launchContext: {
        remoteBrowser: {
            connect: async () => {
                const chromiumServer = await chromium.launchServer();
                return { url: chromiumServer.wsEndpoint() };
            },
            type: 'websocket',
            release: async () => {}
        },
        connectOptions: {
            wsEndpoint: firefoxServer.wsEndpoint()
        }
    },
    requestHandler: async ({ page, log }) => {
        log.info(await page.evaluate(() => navigator.userAgent));
    },
});
await crawler.run(['https://crawlee.dev/js']);
This will print Firefox's user agent, i.e., remoteBrowser is dropped, connectOptions prevails
One more reason to enforce the mutual exclusivity on the type level 😄
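One way to express that exclusivity in the type system is a union with `never`-typed optional fields, paired with a runtime check for untyped callers. This is only a sketch: `SketchRemoteChoice`, `RemoteSources`, and `assertSingleRemoteSource` are hypothetical names, and the option shapes are simplified.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface SketchCdpOptions {
    endpointURL: string;
}
interface SketchWsOptions {
    wsEndpoint: string;
}
interface SketchProvider {
    connect(): Promise<{ url: string }>;
}

// Each union member allows exactly one source; the others must be absent.
type SketchRemoteChoice =
    | { connectOptions: SketchWsOptions; connectOverCDPOptions?: never; remoteBrowser?: never }
    | { connectOptions?: never; connectOverCDPOptions: SketchCdpOptions; remoteBrowser?: never }
    | { connectOptions?: never; connectOverCDPOptions?: never; remoteBrowser: SketchProvider }
    | { connectOptions?: never; connectOverCDPOptions?: never; remoteBrowser?: never };

interface RemoteSources {
    connectOptions?: unknown;
    connectOverCDPOptions?: unknown;
    remoteBrowser?: unknown;
}

// Runtime counterpart for plain JavaScript callers that bypass the types.
function assertSingleRemoteSource(ctx: RemoteSources): void {
    const set = [ctx.connectOptions, ctx.connectOverCDPOptions, ctx.remoteBrowser].filter((v) => v !== undefined);
    if (set.length > 1) {
        throw new Error('Set at most one of remoteBrowser, connectOptions, connectOverCDPOptions.');
    }
}

const validChoice: SketchRemoteChoice = { connectOverCDPOptions: { endpointURL: 'http://localhost:9222' } };
assertSingleRemoteSource(validChoice);
```

With this union, an object literal that sets both `remoteBrowser` and `connectOptions` fails to compile, while the runtime check still covers JavaScript users.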
Please let's also add a docs article about how to set up the remote browser-enabled crawler (short page with examples, main limitations, etc. would imo suffice for now).
done, also temp-examples dropped
Drop the _maybeWrapWithSharedContext workaround that emulated launchPersistentContext semantics over connectOverCDP. Remote connections now always run with useIncognitoPages: true; explicit false is overridden with a warning pointing users at SessionPool for cross-request state sharing. Also remove the now-unused parentBrowser plumbing from PlaywrightBrowser, which only existed to keep the underlying CDP Browser alive while the wrapper was active.
Adds a vitest-based integration suite under test/integration that exercises Crawlee end-to-end against a real Browserless instance. The first test verifies the force-incognito behavior for remote Playwright CDP connections: two requests landing on the same browser do not share cookies even when retireBrowserAfterPageCount is high and saveResponseCookies is disabled. Gated on CRAWLEE_DIFFICULT_TESTS so `pnpm test` skips the suite by default — `pnpm test:integration` and `pnpm test:full` set the flag. The suite expects Browserless and httpbin running on a shared Docker network; `pnpm test:integration:services:up` spins them up locally, and a new GitHub Actions workflow provides them as service containers. Also sets core-js-pure: false in pnpm-workspace.yaml allowBuilds to match prior skip-by-default behavior under pnpm 11.
Replace the warn-and-silently-drop path with a constructor throw in both PlaywrightPlugin and PuppeteerPlugin when more than one of remoteBrowser, connectOptions, or connectOverCDPOptions is set. Fixes the doc/impl mismatch where JSDoc claimed remoteBrowser "Takes precedence" but the implementation actually dropped it.
PlaywrightLauncher and PuppeteerLauncher now throw if launchOptions is set alongside connectOptions, connectOverCDPOptions, or remoteBrowser. The launcher is the right layer for this check — at the plugin level the launcher always injects defaults (executablePath) into launchOptions, so the plugin cannot distinguish user-set from framework-default. Removes the now-unreachable executablePath warning and consolidates the useChrome warning behind the unified hasRemote flag.
CDP is also a WebSocket protocol, so 'websocket' was a misleading label. Rename to 'playwright', which names the actual transport (Playwright's client-server protocol exposed via browserType.connect()). Updated: RemoteBrowserConfig.type, PlaywrightRemoteBrowserConfig.type, RemoteBrowserProvider.type, the playwright-plugin branch, the puppeteer "not supported" error message, the connect log line, and all tests.
The RemoteBrowserConfig / RemoteBrowserProvider abstraction is built for remote browser services (Browserless, Browserbase, Steel), which all speak CDP. The 'websocket'/'playwright' branch (browserType.connect()) had no real provider behind it, and naming it 'websocket' was misleading (CDP also rides WebSocket). Rather than commit to a name that BiDi will make obsolete anyway, drop the field entirely. Callers who genuinely need Playwright's connect() can still use connectOptions directly.
Removes:
- RemoteBrowserConfig.type and RemoteBrowserProvider.type
- PlaywrightRemoteBrowserConfig and PuppeteerRemoteBrowserConfig (now-empty interface extensions)
- The 'playwright' branch in PlaywrightPlugin._launch
- The "Puppeteer does not support 'playwright'" throw + tests
- 5 type-related test cases
The crawler-level 'headless' shortcut synthesized a launchContext.launchOptions object, which then tripped the launcher's mutual-exclusion check against remoteBrowser. Warn and skip the mutation instead — remote services control headless mode anyway. Mirrors the existing useChrome warning in the launcher.
Replace apify/workflows/pnpm-install@main with a direct pnpm install call without --loglevel error and without the pnpm store cache, to surface the actual error behind the 8-min silent hang on Node 24. Revert once root cause is identified.
This reverts commit ca529b6.

This is still WIP, but I will leave comments on the places that are worth looking at.