Modular no-browser shop web scraper with Qt UI, product extraction, history, and CSV/Excel export.

hesameworks/shop-scraper

🛒 ShopScraper

ShopScraper is a modular, no-browser web scraper for online shops with a modern Qt interface.
It extracts product data from schema.org JSON-LD markup, supports keyword search and domain crawling with pagination, keeps a persistent run history, and exports CSV/Excel files, all from a single desktop UI.


✨ Features

  • 🔎 Scraping Modes

    • Keyword mode: search inside target domains using seed keywords, collect valid URLs, and fetch results.
    • Domain mode: provide multiple domains/URLs and crawl with simple pagination (?page=2, /page/2, etc.).
  • 📦 Product Extraction
    Extracts title, brand, price, currency, availability, and images from JSON-LD product blocks.

  • 💾 Outputs

    • CSV and Excel (dynamic columns)
    • Persistent history stored in SQLite and visible in the Runs tab
  • ⚙️ Settings
    Delay & jitter, retries & backoff, custom User-Agent, proxy, Light/Dark themes, output format (CSV/Excel).

  • 🖥 Modern UI
    PySide6 + custom QSS themes, icons, and Linux desktop launcher support.
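
To make the Product Extraction feature above more concrete, here is a minimal sketch of how JSON-LD product blocks can be read from a page using only the Python standard library. It is not taken from the ShopScraper source: the names `JsonLdCollector`, `iter_nodes`, and `extract_products` are hypothetical, and the field mapping (`name`, `brand`, `offers.price`, `offers.priceCurrency`, `offers.availability`, `image`) assumes the usual schema.org Product layout.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def iter_nodes(data):
    """Yield every dict in a parsed JSON-LD payload, including @graph members."""
    if isinstance(data, list):
        for item in data:
            yield from iter_nodes(item)
    elif isinstance(data, dict):
        yield data
        yield from iter_nodes(data.get("@graph", []))


def extract_products(html):
    """Return one flat dict per schema.org Product found in the HTML (hypothetical helper)."""
    collector = JsonLdCollector()
    collector.feed(html)
    products = []
    for raw in collector.blocks:
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD blocks instead of failing the page
        for node in iter_nodes(payload):
            node_type = node.get("@type")
            types = node_type if isinstance(node_type, list) else [node_type]
            if "Product" not in types:
                continue
            # "offers" may be a single Offer or a list; this sketch uses the first one.
            offers = node.get("offers") or {}
            if isinstance(offers, list):
                offers = offers[0] if offers else {}
            # "brand" may be a plain string or a Brand object with a name.
            brand = node.get("brand")
            if isinstance(brand, dict):
                brand = brand.get("name")
            # "image" may be a single URL or a list of URLs.
            images = node.get("image") or []
            if isinstance(images, str):
                images = [images]
            products.append({
                "title": node.get("name"),
                "brand": brand,
                "price": offers.get("price"),
                "currency": offers.get("priceCurrency"),
                "availability": offers.get("availability"),
                "images": images,
            })
    return products
```

Calling `extract_products(page_html)` on a fetched page returns a list of dicts, one per product. Pages without JSON-LD return an empty list, which is the case the CSS-based fallback on the roadmap is meant to cover.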


📥 Download (Linux only)

➡️ Download the latest Linux build here

Currently only the Linux tar.gz package is available.
Windows builds will be added in future releases.


🚀 Quick Start (From Source)

git clone https://github.com/hesameworks/shop-scraper.git
cd shop-scraper
python -m venv .venv
source .venv/bin/activate
pip install -U pip -r requirements.txt
PYTHONPATH=src python src/ui/main_window.py

🛠 Local Build (Linux)

pyinstaller src/ui/main_window.py \
  --name ShopScraper \
  --noconsole \
  --paths src \
  --add-data "assets:assets"

./dist/ShopScraper/ShopScraper

🖼 Linux Desktop Launcher

Create a file at ~/.local/share/applications/ShopScraper.desktop:

[Desktop Entry]
Type=Application
Name=ShopScraper
Comment=Modular shop web scraper
Exec=/absolute/path/to/dist/ShopScraper/ShopScraper
Icon=/absolute/path/to/dist/ShopScraper/assets/logo.png
Terminal=false
Categories=Utility;Development;Network;
StartupNotify=true

Then make it executable:

chmod +x ~/.local/share/applications/ShopScraper.desktop

⚡ Continuous Integration

This project ships with a GitHub Actions workflow (.github/workflows/build.yml) that builds Linux binaries on every push to main; Windows builds are planned. Artifacts are uploaded as build outputs and can also be attached to GitHub Releases when a version is tagged (e.g. v1.0.0).


🗺 Roadmap

  • Add Windows builds (zip + exe)
  • CSS-based extraction fallback (when JSON-LD is missing)
  • Smarter rate limiting and concurrency controls
  • AppImage / .deb packages for Linux
  • Code signing for Windows builds

🤝 Contributing

Pull requests and issues are welcome! Please:

  1. Run code style checks (ruff/flake8 if available).
  2. Provide a clear description of your changes.
  3. Add screenshots if you modify the UI.

📜 License

Released under the MIT License.
