6 changes: 6 additions & 0 deletions pages/docs/configuration/tools/_meta.ts
@@ -1,4 +1,10 @@
export default {
  index: 'Intro',
  azure_ai_search: 'Azure AI Search',
  flux: 'Flux',
  google_search: 'Google Search',
  openweather: 'OpenWeather',
  stable_diffusion: 'Stable Diffusion',
  webnavigator: 'WebNavigator',
  wolfram: 'Wolfram Alpha',
}
285 changes: 285 additions & 0 deletions pages/docs/configuration/tools/webnavigator.mdx
@@ -0,0 +1,285 @@
---
title: 🌐 WebNavigator
description: Configure the WebNavigator plugin for LibreChat
---

# WebNavigator Configuration

The WebNavigator plugin is a versatile tool for making HTTP requests and parsing web content. It provides comprehensive options for customizing requests and controlling how responses are processed, including support for Firecrawl integration for enhanced web scraping capabilities.

## Prerequisites

- No API key required for basic functionality
- Optional: Firecrawl API key for enhanced web scraping capabilities

## Getting Started

### Basic Usage

The WebNavigator tool works out of the box without any configuration. It can make HTTP requests to any public URL and parse the content.

### Enhanced Features (Optional)

For advanced web scraping capabilities, especially for JavaScript-heavy sites or pages with anti-bot protection, you can integrate with Firecrawl:

1. Sign up for a Firecrawl account at [Firecrawl](https://firecrawl.dev/)
2. Get your API key from the dashboard
3. Configure the environment variable (see below)

## Configuration

### Environment Variables

Add the following to your `.env` file for enhanced functionality:

```bash
# Optional: For enhanced web scraping with Firecrawl
FIRECRAWL_API_KEY=your_firecrawl_api_key_here

# Optional: Custom Firecrawl API URL (defaults to https://api.firecrawl.dev)
FIRECRAWL_API_URL=https://api.firecrawl.dev

# Optional: Disable Firecrawl even if API key is present
WEB_NAVIGATOR_USE_FIRECRAWL=false
```
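
As a rough illustration of how these three variables might interact, the sketch below resolves them into a single settings object. `resolveFirecrawlConfig` is a hypothetical helper, and the rule that only the literal string `false` disables Firecrawl is an assumption; the plugin's actual logic may differ.

```javascript
// Hypothetical helper: one plausible way to combine the three variables.
// This is not the plugin's real code.
function resolveFirecrawlConfig(env) {
  const apiKey = (env.FIRECRAWL_API_KEY || '').trim();
  // Default URL taken from the comment in the .env example; trailing slashes removed.
  const apiUrl = (env.FIRECRAWL_API_URL || 'https://api.firecrawl.dev').replace(/\/+$/, '');
  // Assumption: only the string "false" (any letter case) switches Firecrawl off.
  const disabled = String(env.WEB_NAVIGATOR_USE_FIRECRAWL || '').toLowerCase() === 'false';
  return { enabled: apiKey !== '' && !disabled, apiKey, apiUrl };
}
```

Calling `resolveFirecrawlConfig(process.env)` at startup would give a single place to check whether Firecrawl is usable before any request is made.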

### Plugin Configuration

Add the WebNavigator tool to any [agent](https://www.librechat.ai/docs/features/agents).

## Usage

The WebNavigator plugin supports various HTTP methods and content parsing options:

### Basic Web Scraping

```
Get the main content from this news article: https://example.com/article
```

### API Requests

```
Make a POST request to https://api.example.com/data with this JSON: {"key": "value"}
```

### Targeted Content Extraction

```
Extract only the navigation links from https://example.com
```

### Image Download

```
Download and save this image: https://example.com/image.jpg
```

## Features

### HTTP Methods
- GET, POST, PUT, PATCH, DELETE requests
- Custom headers and query parameters
- Request body support for POST/PUT/PATCH
- Cookie handling for session management

### Content Parsing
- **returnOnlyTags**: Extract specific HTML tags (e.g., `["article", "main", "h1"]`)
- **excludeTags**: Remove unwanted elements (defaults include `style`, `script`, `svg`)
- **returnTextOnly**: Get clean text without HTML markup
- **includeAttributes**: Control which HTML attributes to keep (default: `["href", "alt"]`)
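
To make the defaults concrete, the sketch below merges caller-supplied options over the defaults named in the list above. `PARSE_DEFAULTS` and `withParseDefaults` are hypothetical names, and `returnTextOnly: false` is an assumed default; the plugin may define additional excluded tags.

```javascript
// Defaults as stated in the list above; returnTextOnly: false is an assumption.
const PARSE_DEFAULTS = {
  excludeTags: ['style', 'script', 'svg'],
  includeAttributes: ['href', 'alt'],
  returnTextOnly: false,
};

// Hypothetical helper: caller options override the defaults key by key.
function withParseDefaults(input) {
  return { ...PARSE_DEFAULTS, ...input };
}

// Keep only <article> and <h1>, and only href attributes.
const request = withParseDefaults({
  url: 'https://example.com/article',
  returnOnlyTags: ['article', 'h1'],
  includeAttributes: ['href'],
});
```

Because the merge is shallow, passing `includeAttributes` replaces the default list rather than extending it.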

### Browser Simulation
- **browserImpersonation**: Simulate different browsers (`chrome`, `firefox`, `safari`, `plain`)
- Automatic user-agent and header management

### Caching
- Built-in 15-minute cache for successful requests
- **bypassCache**: Force fresh requests when needed
- Performance optimization with SQLite storage
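
The caching behaviour can be pictured with a small time-to-live cache. The sketch below uses an in-memory `Map` as a stand-in for the SQLite store, treats "successful" as any 2xx status (an assumption), and uses the hypothetical class name `ResponseCache`; it is not the plugin's implementation.

```javascript
const CACHE_TTL_MS = 15 * 60 * 1000; // 15 minutes, per the list above

// In-memory stand-in for the SQLite store. `now` is injectable for testing.
class ResponseCache {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map();
  }

  // Returns the cached response, or undefined on a miss, expiry, or bypass.
  get(key, { bypassCache = false } = {}) {
    if (bypassCache) return undefined;
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (this.now() - hit.storedAt >= CACHE_TTL_MS) {
      this.entries.delete(key); // expired: drop it so the caller refetches
      return undefined;
    }
    return hit.value;
  }

  // Assumption: only 2xx responses count as successful and are stored.
  set(key, response) {
    if (response.status >= 200 && response.status < 300) {
      this.entries.set(key, { value: response, storedAt: this.now() });
    }
  }
}
```

A caller would check `cache.get(url, { bypassCache })` first and only make the HTTP request on `undefined`.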

### Image Handling
- **imageDownloadLink**: Download and save images locally
- Automatic file naming and organization
- Multiple image format support

### Firecrawl Integration
- **useFirecrawl**: Enable/disable Firecrawl for specific requests
- **firecrawlProxy**: Choose proxy mode (`basic`, `stealth`, `auto`)
- **firecrawlScrapeOptions**: Advanced Firecrawl configuration
- Automatic fallback for 403/404 errors
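
The fallback rule can be read as a small decision function. The sketch below is an assumption about how the options combine; `shouldRetryWithFirecrawl` and `firecrawlConfigured` are hypothetical names, while `useFirecrawl` is the option listed above.

```javascript
// Hypothetical decision helper, not the plugin's actual code.
function shouldRetryWithFirecrawl(status, { firecrawlConfigured, useFirecrawl }) {
  if (!firecrawlConfigured) return false;   // no API key, nothing to fall back to
  if (useFirecrawl === false) return false; // caller opted out for this request
  return status === 403 || status === 404;  // the two statuses named above
}
```

Leaving `useFirecrawl` undefined keeps the automatic behaviour in this sketch, while an explicit `false` suppresses the retry.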

## Advanced Configuration

### Firecrawl Options

The `firecrawlScrapeOptions` parameter supports all Firecrawl `/v1/scrape` API options:

```javascript
{
  "url": "https://example.com",
  "useFirecrawl": true,
  "firecrawlScrapeOptions": {
    "formats": ["markdown", "html", "links"],
    "onlyMainContent": true,
    "waitFor": 2000,
    "timeout": 30000,
    "includeTags": ["article", "main"],
    "excludeTags": ["nav", "footer"],
    "jsonOptions": {
      "schema": {"title": "string", "content": "string"}
    }
  }
}
```
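
For orientation, the sketch below shows how such an input could be turned into a request for Firecrawl's `/v1/scrape` endpoint. It assumes the options are forwarded verbatim in the request body alongside `url`; `buildScrapeRequest` is a hypothetical helper and the function only builds the request, it does not send it.

```javascript
// Builds (but does not send) a POST request for Firecrawl's /v1/scrape endpoint.
// Assumption: firecrawlScrapeOptions is passed through unchanged next to `url`.
function buildScrapeRequest(apiUrl, apiKey, input) {
  const body = { url: input.url, ...(input.firecrawlScrapeOptions || {}) };
  return {
    endpoint: `${apiUrl.replace(/\/+$/, '')}/v1/scrape`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}
```

The result can be handed to `fetch(endpoint, init)`. Keeping construction separate from sending makes the payload easy to inspect before any credits are spent.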

### Content Filtering Examples

**Extract main article content:**
```javascript
{
  "url": "https://news-site.com/article",
  "returnOnlyTags": ["article"]
}
```

**Get navigation links:**
```javascript
{
  "url": "https://example.com",
  "returnOnlyTags": ["header", "nav"]
}
```

**Clean text extraction:**
```javascript
{
  "url": "https://example.com",
  "returnTextOnly": true
}
```

## Use Cases

### Web Scraping
- Extract article content from news sites
- Gather product information from e-commerce sites
- Monitor website changes
- Collect contact information

### API Integration
- Interact with REST APIs
- Submit form data
- Retrieve JSON data
- Handle authentication

### Content Analysis
- Extract specific data from web pages
- Monitor competitor websites
- Collect research data
- Analyze website structure

## Troubleshooting

### Common Issues

1. **403 Forbidden or 404 Not Found**
   - Tool automatically attempts Firecrawl fallback if configured
   - Try enabling Firecrawl with `"useFirecrawl": true`
   - Use stealth mode: `"firecrawlProxy": "stealth"`

2. **JavaScript-Heavy Sites**
   - Enable Firecrawl integration for better JavaScript handling
   - Use `"browserImpersonation": "chrome"` for better compatibility
   - Add wait time with Firecrawl: `"waitFor": 3000`

3. **Rate Limiting**
   - Tool includes built-in caching to reduce requests
   - Respect robots.txt and site terms of service
   - Use delays between requests for bulk operations

4. **Large Response Size**
   - Use `returnOnlyTags` to limit content
   - Enable `returnTextOnly` for cleaner output
   - Exclude unnecessary tags with `excludeTags`

### Performance Tips

- Use caching for repeated requests to the same URLs
- Specify `returnOnlyTags` to reduce response size
- Enable `returnTextOnly` when HTML markup isn't needed
- Use browser impersonation for better compatibility

## API Limits

### Without Firecrawl
- No specific limits, but respect the target site's rate limits
- Built-in caching helps reduce redundant requests

### With Firecrawl
- Check your [Firecrawl subscription](https://firecrawl.dev/pricing) for specific limits
- Monitor usage through the Firecrawl dashboard
- Consider implementing request queuing for high-volume usage

## Security Considerations

⚠️ **IMPORTANT SECURITY WARNING** ⚠️

The WebNavigator tool has **unrestricted web access** and can make HTTP requests to any URL, including internal network resources. This creates significant security risks that must be carefully considered before deployment.

### Key Security Risks

1. **Data Exfiltration**
   - The tool can send confidential data to external servers
   - It can access internal APIs and services within your network
   - Chat conversations containing sensitive information could be transmitted to arbitrary URLs

2. **Server-Side Request Forgery (SSRF)**
   - Can access internal services not intended for public access
   - May bypass firewall restrictions by making requests from the server
   - Could potentially access cloud metadata services (AWS, GCP, Azure)

3. **Network Reconnaissance**
   - Can be used to scan internal network infrastructure
   - May reveal information about internal services and their configurations
   - Could identify vulnerable endpoints within your network
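
To illustrate the SSRF risk, the sketch below shows the kind of pre-flight check a deployment could place in front of outbound requests. It is not part of WebNavigator; `isPrivateIPv4` and `isUrlAllowed` are hypothetical helpers. The check inspects only the literal hostname and does not resolve DNS, so a public name that points at an internal address is not caught.

```javascript
// Returns true when an IPv4 literal is in a loopback, private, or link-local range.
function isPrivateIPv4(host) {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||          // link-local, includes 169.254.169.254 metadata
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Hypothetical pre-flight check on the literal URL only (no DNS resolution).
function isUrlAllowed(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable input is rejected
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  const host = url.hostname.toLowerCase();
  if (host === 'localhost' || host.endsWith('.localhost')) return false;
  if (host.startsWith('[')) return false; // IPv6 literals rejected wholesale in this sketch
  return !isPrivateIPv4(host);
}
```

A production guard would also need to validate the resolved IP address and re-check every redirect target, since both can lead to internal hosts after this check has passed.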

### The "Lethal Trifecta" Threat Model

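This tool contributes to what security researcher Simon Willison describes as the ["Lethal Trifecta"](https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/) for AI agents, the combination of three capabilities:

1. **Access to private data** - The agent can read internal documents, APIs, or conversation history
2. **Exposure to untrusted content** - Content the agent processes, such as a fetched web page, can carry prompt-injection instructions
3. **Ability to communicate externally** - The agent can send data out, for example by making HTTP requests to any URL (this tool)

WebNavigator supplies the second and third capabilities on its own. When an agent also has access to private data, an attacker who plants instructions in a web page can trick the agent into sending that data to a server they control.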

### Best Practices for Safe Usage

- **Never use** in environments where prompt injection is possible
- **Disable in agents** that have access to internal data
- **Educate users** about the risks of sharing sensitive information in chats

### Disabling the Tool

If the security risks are too high for your environment, you can disable the WebNavigator tool by:

1. Removing it from agent configurations
2. Filtering it out with the `filteredTools` [config in librechat.yaml](https://www.librechat.ai/docs/configuration/librechat_yaml/object_structure/config#filteredtools)

Remember: **Security should always be prioritized over convenience** when deploying tools with unrestricted network access.

## Support

For issues with the plugin:
- Open an issue at [LibreChat Issues](https://github.com/danny-avila/LibreChat/issues)
- Check the [LibreChat Documentation](https://www.librechat.ai/docs)
- For Firecrawl-specific issues, contact [Firecrawl support](https://firecrawl.dev/support)

## Notes

- The tool automatically handles redirects and common HTTP scenarios
- Caching is enabled by default for performance optimization
- Firecrawl integration provides enhanced capabilities for complex sites
- Always respect website terms of service and robots.txt files
- Consider implementing delays for bulk scraping operations