
# 🕸️ Web Content Scraper Extension

A browser extension that extracts structured content from any webpage and exports it in multiple formats. Built for developers, researchers, and data collectors who need clean, structured page data without writing custom scraping scripts.

Supports Chrome and Firefox.


## 🎬 Demo

Screen.Recording.2026-03-29.at.11.09.34.AM.mov

## ✨ Features

| Feature | Description |
| --- | --- |
| Page Extraction | Pulls titles, headings (h1–h6), paragraphs, lists, links, images, and metadata from any page |
| Multi-Format Export | Export as JSON, XML, Markdown, or plain text; download to file or copy to clipboard |
| Content Preview & Edit | Review and remove unwanted sections before exporting |
| Custom CSS Selectors | Define your own selectors to target specific elements on any site |
| Visual Element Picker | Click any element on a page and get its CSS selector generated automatically |
| Clean Content Mode | Strips ads, navbars, sidebars, and other noise before extracting |
| Site-Specific Profiles | Save selector rules per domain; applied automatically on revisit |
| Batch Scraping | Provide a list of URLs and scrape them all in one run with smart rule application |
| Data Sync | Rules and preferences sync across devices via `storage.sync` |
| Advanced Settings | Configure scraping preferences, timeouts, and batch processing options |
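To give a feel for the site-specific profiles feature, here is an illustrative sketch of how per-domain selector rules could be represented and matched against a page's hostname. The `SiteRule` shape and `findRule` function are hypothetical, not the actual API in `src/storage/rules.ts`:

```typescript
// Hypothetical rule shape: one saved profile per domain.
interface SiteRule {
  domain: string;      // e.g. "example.com"
  selectors: string[]; // CSS selectors to extract on that site
}

// Pick the rule matching a page's hostname, preferring an exact
// match over a parent-domain match (e.g. "blog.example.com" falls
// back to a rule saved for "example.com").
function findRule(rules: SiteRule[], hostname: string): SiteRule | undefined {
  return (
    rules.find((r) => r.domain === hostname) ??
    rules.find((r) => hostname.endsWith("." + r.domain))
  );
}
```

The real implementation persists these profiles with `storage.sync`, which is what lets the same rules follow you across devices.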

## 🛠️ Tech Stack

| Area | Tool |
| --- | --- |
| UI & bundling | Vite + React + TypeScript |
| Language | TypeScript |
| Styling | Tailwind CSS |
| Browser API | WebExtensions API + webextension-polyfill |
| Testing | Jest + Testing Library + jsdom |
| CI/CD | GitHub Actions + Docker |
| Target browsers | Chrome (Manifest V3), Firefox |
| Package management | npm |

## 📁 Project Structure

```
app/
 ├── src/
 │   ├── background.ts              # Service worker - message routing between popup and content
 │   ├── content.ts                 # Injected into pages - DOM scraping and element picker
 │   ├── content-picker.ts          # Visual element picker UI and interaction handling
 │   ├── scraper/
 │   │   ├── extractor.ts           # Core content extraction logic
 │   │   ├── html-parser.ts         # HTML parsing with URL resolution
 │   │   ├── selector-builder.ts    # CSS selector generation from DOM elements
 │   │   ├── batch-scraper.ts       # Multi-URL batch scraping functionality
 │   │   └── formatter.ts           # JSON / XML / Markdown / plain text formatters
 │   ├── popup/
 │   │   ├── App.tsx                # Main popup UI and view routing
 │   │   └── components/
 │   │       ├── MainView.tsx       # Main interface with scraping options
 │   │       ├── SelectorsView.tsx  # Custom selector configuration
 │   │       ├── Preview.tsx        # Tabbed content preview with per-item removal
 │   │       ├── ExportButtons.tsx  # Download and clipboard export controls
 │   │       ├── BatchView.tsx      # Batch URL input and progress
 │   │       ├── BatchResultsView.tsx # Batch results display and export
 │   │       ├── RulesView.tsx      # Saved site rules management
 │   │       └── SettingsView.tsx   # User preferences and settings
 │   ├── storage/
 │   │   └── rules.ts               # Site rule profiles and user preferences
 │   └── styles/
 │       └── tailwind.css           # Tailwind base + custom component classes
 ├── test/
 │   ├── App.test.tsx
 │   ├── background.test.ts
 │   ├── content.test.ts
 │   ├── extractor.test.ts
 │   ├── formatter.test.ts
 │   └── storage.test.ts
 ├── manifest.json
 ├── vite.config.ts
 └── popup.html
```
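The core idea behind `formatter.ts` is converting one extracted-content object into several output formats. A simplified, hypothetical sketch of the Markdown path (the interface and function names here are illustrative, not the module's actual exports):

```typescript
// Illustrative shape of what the extractor might hand the formatter.
interface ExtractedContent {
  title: string;
  headings: { level: number; text: string }[];
  paragraphs: string[];
  links: { text: string; href: string }[];
}

// Render the extracted content as a Markdown document.
function toMarkdown(content: ExtractedContent): string {
  const lines: string[] = [`# ${content.title}`, ""];
  for (const h of content.headings) {
    lines.push(`${"#".repeat(h.level)} ${h.text}`, "");
  }
  for (const p of content.paragraphs) {
    lines.push(p, "");
  }
  for (const l of content.links) {
    lines.push(`- [${l.text}](${l.href})`);
  }
  return lines.join("\n");
}
```

Keeping every format a pure function over the same input object is what makes the JSON / XML / Markdown / plain text exporters easy to test in isolation (see `test/formatter.test.ts`).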

## ⚙️ Setup

### Requirements

- Node.js 18+
- Docker (optional, for containerized builds)

### Install & Build Locally

```shell
cd app
npm install

# Build for Chrome
npm run build:chrome

# Build for Firefox
npm run build:firefox
```

Built extensions are output to `app/dist/chrome` and `app/dist/firefox`.
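The Chrome build targets Manifest V3, where the background script runs as a service worker. For orientation, a minimal manifest for an extension of this shape might look like the following (an illustrative sketch only, not the project's actual `app/manifest.json`):

```json
{
  "manifest_version": 3,
  "name": "Web Content Scraper",
  "version": "1.0.0",
  "action": { "default_popup": "popup.html" },
  "background": { "service_worker": "background.js" },
  "content_scripts": [
    { "matches": ["<all_urls>"], "js": ["content.js"] }
  ],
  "permissions": ["storage", "activeTab"]
}
```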

### Run with Docker

```shell
# Build and run both browser variants
docker compose up

# Chrome only
docker compose up chrome

# Firefox only
docker compose up firefox
```

### Run Tests

```shell
cd app
npm test
```

## 🔌 Loading the Extension in Dev Mode

### Chrome

1. Build: `npm run build:chrome`
2. Open Chrome and navigate to `chrome://extensions`
3. Enable **Developer mode** using the toggle in the top right
4. Click **Load unpacked**
5. Select the `app/dist/chrome` folder
6. The extension icon appears in your toolbar

To reload after a code change: rebuild, then click the refresh icon on the extension card.

### Firefox

1. Build: `npm run build:firefox`
2. Open Firefox and navigate to `about:debugging`
3. Click **This Firefox** in the left sidebar
4. Click **Load Temporary Add-on**
5. Open the `app/dist/firefox` folder and select `manifest.json`
6. The extension icon appears in your toolbar

Temporary add-ons in Firefox are removed when the browser closes. Repeat these steps after each restart.

### Using Docker Builds with the Browser

Docker builds write output to `app/dist/chrome` or `app/dist/firefox` on your host machine via the volume mount. Load that folder exactly the same way as above; Docker only handles the build step, not the browser loading.


## 🎯 Goal

Provide a comprehensive browser scraping tool that lets users extract structured content from any webpage without writing custom scripts. It combines visual element selection, batch processing, site-specific rule profiles, and flexible export options to handle diverse scraping needs while staying easy to use.
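The visual element selection mentioned above rests on generating a CSS selector for whatever the user clicks. A simplified, hypothetical sketch of that idea, working on plain descriptors rather than live DOM nodes (the real `selector-builder.ts` is more robust):

```typescript
// Illustrative descriptor for one node in the ancestor chain (root first).
interface NodeInfo {
  tag: string;
  id?: string;
  classes?: string[];
  nthOfType?: number; // 1-based position among same-tag siblings
}

// Build a CSS selector from an ancestor chain.
function buildSelector(path: NodeInfo[]): string {
  const parts: string[] = [];
  for (const node of path) {
    if (node.id) {
      // An id is unique, so everything above it can be dropped.
      parts.length = 0;
      parts.push(`#${node.id}`);
      continue;
    }
    let part = node.tag;
    if (node.classes?.length) part += "." + node.classes.join(".");
    else if (node.nthOfType && node.nthOfType > 1)
      part += `:nth-of-type(${node.nthOfType})`;
    parts.push(part);
  }
  return parts.join(" > ");
}
```

Preferring ids, then classes, then positional `:nth-of-type()` fallbacks keeps the generated selectors short and resilient to unrelated page changes.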


## 📄 License

MIT - see LICENSE for details.
