ShopScraper is a modular, no-browser web scraper for online shops with a modern Qt interface.
It extracts product data (schema.org / JSON-LD), supports keyword search and domain crawling with pagination, keeps a persistent run history, and exports clean CSV/Excel files — all wrapped in a polished UI.
🔎 Scraping Modes
- Keyword mode: search inside target domains using seed keywords, collect valid URLs, and fetch results.
- Domain mode: provide multiple domains/URLs and crawl them with simple pagination (`?page=2`, `/page/2`, etc.).
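The two modes imply two URL-handling steps. Keyword mode keeps only links that belong to a target domain, and domain mode builds the next page URL in the `?page=N` or `/page/N` style. Below is a minimal standard-library sketch of both steps. The helper names `is_target_url` and `page_url` are hypothetical and do not come from ShopScraper's source, and real shops may use other pagination schemes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def is_target_url(url: str, domains: set[str]) -> bool:
    """Keyword mode: keep only http(s) URLs on a target domain or one of its subdomains."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname  # lowercased, without port or credentials
    targets = {d.lower().lstrip(".") for d in domains}
    return any(host == d or host.endswith("." + d) for d in targets)


def page_url(base: str, page: int, style: str = "query") -> str:
    """Domain mode: build the URL of page `page` in either ?page=N or /page/N style."""
    parts = urlsplit(base)
    if style == "query":
        # Replace (or add) the `page` parameter while keeping the other query parameters.
        query = [(k, v) for k, v in parse_qsl(parts.query) if k != "page"]
        query.append(("page", str(page)))
        return urlunsplit(parts._replace(query=urlencode(query)))
    if style == "path":
        # Append /page/N to the existing path, avoiding a double slash.
        path = parts.path.rstrip("/") + f"/page/{page}"
        return urlunsplit(parts._replace(path=path))
    raise ValueError(f"unknown pagination style: {style!r}")


if __name__ == "__main__":
    print(page_url("https://shop.example/c/shoes?sort=price", 2))
    # https://shop.example/c/shoes?sort=price&page=2
    print(page_url("https://shop.example/c/shoes/", 3, style="path"))
    # https://shop.example/c/shoes/page/3
```

Subdomains such as `www.shop.example` count as on-target here. Look-alike hosts such as `evilshop.example` do not, because the check requires a full label match.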
📦 Product Extraction
Extracts `title`, `brand`, `price`, `currency`, `availability`, and `images` from JSON-LD product blocks.
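This kind of extraction can be sketched with the standard library alone. The steps are: collect every `<script type="application/ld+json">` element, parse it, walk top-level lists and `@graph` containers, and map schema.org `Product` properties to the fields listed above. This is an illustrative sketch, not ShopScraper's implementation. Several choices are assumptions: mapping `name` to `title` and `offers.priceCurrency` to `currency`, and reading only the first `Offer`.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def iter_nodes(data):
    """Yield every JSON object in a payload, descending into top-level lists and @graph."""
    if isinstance(data, list):
        for item in data:
            yield from iter_nodes(item)
    elif isinstance(data, dict):
        yield data
        yield from iter_nodes(data.get("@graph", []))


def first(value):
    """schema.org properties may hold one value or a list; take the first one."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def extract_products(html: str) -> list[dict]:
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    products = []
    for raw in collector.blocks:
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        for node in iter_nodes(payload):
            types = node.get("@type", [])
            if "Product" not in (types if isinstance(types, list) else [types]):
                continue
            brand = first(node.get("brand"))
            offer = first(node.get("offers"))
            offer = offer if isinstance(offer, dict) else {}
            images = node.get("image") or []
            products.append({
                "title": node.get("name"),
                "brand": brand.get("name") if isinstance(brand, dict) else brand,
                "price": offer.get("price"),
                "currency": offer.get("priceCurrency"),
                "availability": offer.get("availability"),
                "images": images if isinstance(images, list) else [images],
            })
    return products
```

`extract_products(html)` takes page HTML that has already been fetched and returns one dict per product. A page without JSON-LD yields an empty list, which is the case the planned CSS-based fallback would cover.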
💾 Outputs
- CSV and Excel (dynamic columns)
- Persistent history stored in SQLite and visible in the Runs tab
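The `csv` and `sqlite3` standard-library modules are enough to sketch both outputs. For "dynamic columns", the CSV header is the union of keys seen across all products. For the run history, each run appends one row to a SQLite table. The `runs` table and its columns are hypothetical, not ShopScraper's actual schema. Joining list values with `|` is also an assumption. Excel export would need a third-party library such as openpyxl and is left out.

```python
import csv
import io
import sqlite3
from datetime import datetime, timezone


def dynamic_columns(rows: list[dict]) -> list[str]:
    """Union of all keys across rows, in first-seen order."""
    columns: dict[str, None] = {}
    for row in rows:
        for key in row:
            columns.setdefault(key, None)
    return list(columns)


def write_csv(rows: list[dict], stream) -> None:
    """Write rows with one column per distinct key; missing fields become empty cells."""
    writer = csv.DictWriter(stream, fieldnames=dynamic_columns(rows), restval="")
    writer.writeheader()
    for row in rows:
        # Assumption: list values (e.g. several image URLs) are joined with "|".
        writer.writerow({k: "|".join(map(str, v)) if isinstance(v, list) else v
                         for k, v in row.items()})


def record_run(db: sqlite3.Connection, mode: str, item_count: int, output_path: str) -> int:
    """Append one row to a hypothetical `runs` table and return its id."""
    db.execute(
        """CREATE TABLE IF NOT EXISTS runs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               started_at TEXT NOT NULL,
               mode TEXT NOT NULL,
               item_count INTEGER NOT NULL,
               output_path TEXT NOT NULL)"""
    )
    cur = db.execute(
        "INSERT INTO runs (started_at, mode, item_count, output_path) VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), mode, item_count, output_path),
    )
    db.commit()
    return cur.lastrowid


if __name__ == "__main__":
    buffer = io.StringIO()
    write_csv([{"title": "A", "price": "1"}, {"title": "B", "brand": "X"}], buffer)
    print(buffer.getvalue())
    print(record_run(sqlite3.connect(":memory:"), "domain", 2, "products.csv"))
```

When writing to a real file, the `csv` module documentation recommends opening it with `newline=""`, for example `open("products.csv", "w", newline="", encoding="utf-8")`.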
⚙️ Settings
Configurable delay & jitter, retries & backoff, custom User-Agent, proxy, Light/Dark themes, and output format (CSV/Excel).
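The delay and retry settings reduce to two small formulas. Between requests, the wait is a fixed delay plus random jitter. Between retries of a failed request, the wait grows exponentially up to a cap. The sketch below assumes uniform jitter and a backoff of `initial * factor**attempt` capped at `cap`. The function names and defaults (1 s, factor 2, 30 s cap) are hypothetical, not ShopScraper's actual settings.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def polite_delay(base: float, jitter: float, rng: random.Random) -> float:
    """Base delay plus a uniform random jitter in [0, jitter] seconds."""
    return base + rng.uniform(0.0, jitter)


def backoff_delay(attempt: int, initial: float = 1.0, factor: float = 2.0,
                  cap: float = 30.0) -> float:
    """Exponential backoff: initial * factor**attempt seconds, never more than `cap`."""
    return min(cap, initial * factor ** attempt)


def with_retries(fetch: Callable[[], T], retries: int = 3,
                 sleep: Callable[[float], None] = time.sleep, **backoff) -> T:
    """Call `fetch`, retrying up to `retries` extra times with backoff between attempts."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except OSError:  # urllib's URLError/HTTPError and socket timeouts are OSError subclasses
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(backoff_delay(attempt, **backoff))
    raise ValueError("retries must be >= 0")  # only reached when the loop never runs


if __name__ == "__main__":
    rng = random.Random(42)
    print([round(polite_delay(2.0, 0.5, rng), 3) for _ in range(3)])
    print([backoff_delay(a) for a in range(6)])
```

In a crawl loop you might call `time.sleep(polite_delay(base, jitter, rng))` between pages and wrap each fetch as `with_retries(lambda: urlopen(url, timeout=10).read())`. Because `sleep` is a parameter, the retry logic can be tested without real waits.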
🖥 Modern UI
PySide6 + custom QSS themes, icons, and Linux desktop launcher support.
➡️ Download the latest Linux build here
Currently, only the Linux `tar.gz` package is available; Windows builds will be added in a future release.
Run from source:

```bash
git clone https://github.com/hesameworks/shop-scraper.git
cd shop-scraper
python -m venv .venv
source .venv/bin/activate
pip install -U pip -r requirements.txt
PYTHONPATH=src python src/ui/main_window.py
```

Build a standalone binary with PyInstaller:

```bash
pyinstaller src/ui/main_window.py \
  --name ShopScraper \
  --noconsole \
  --paths src \
  --add-data "assets:assets"
```

Then run it:

```bash
./dist/ShopScraper/ShopScraper
```

To add a Linux desktop launcher, create a file at `~/.local/share/applications/ShopScraper.desktop`:
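```ini
[Desktop Entry]
Type=Application
Name=ShopScraper
Comment=Modular shop web scraper
Exec=/absolute/path/to/dist/ShopScraper/ShopScraper
Icon=/absolute/path/to/dist/ShopScraper/assets/logo.png
Terminal=false
Categories=Utility;Development;Network;
StartupNotify=true
```

With PyInstaller 6.0 or newer, onedir builds place bundled data under `dist/ShopScraper/_internal/` by default. In that case the icon lives at `.../_internal/assets/logo.png` unless you build with `--contents-directory .`.

Then make it executable: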
```bash
chmod +x ~/.local/share/applications/ShopScraper.desktop
```

This project ships with a GitHub Actions workflow (`.github/workflows/build.yml`) that automatically builds Linux (and later Windows) binaries on every push to `main`.
Artifacts are uploaded as build outputs and can also be attached to GitHub Releases when a version is tagged (e.g. `v1.0.0`).
Roadmap:
- Add Windows builds (zip + exe)
- CSS-based extraction fallback (when JSON-LD is missing)
- Smarter rate limiting and concurrency controls
- AppImage / `.deb` packages for Linux
- Code signing for Windows builds
Pull requests and issues are welcome! Please:
- Run code style checks (`ruff`/`flake8`, if available).
- Provide a clear description of your changes.
- Add screenshots if you modify the UI.
Released under the MIT License.