
Redis-based components for Scrapy.

Features

  • Distributed crawling/scraping

    • You can start multiple spider instances that share a single Redis queue. Best suited for broad, multi-domain crawls (see the spider sketch after this list).
  • Distributed post-processing

    • Scraped items are pushed into a Redis queue, meaning you can start as many post-processing processes as needed, all sharing the same items queue (see the consumer sketch after this list).
  • Scrapy plug-and-play components

    • Scheduler + Duplication Filter, Item Pipeline, Base Spiders — all wired up through project settings (see the settings sketch after this list).
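
Because the components are plug-and-play, enabling them is mostly a matter of settings. Below is a minimal sketch of a `settings.py` fragment; the setting names are scrapy-redis settings, while the Redis URL and the pipeline priority (300) are placeholder values.

```python
# settings.py -- minimal sketch of enabling the scrapy-redis components.

# Use scrapy-redis' scheduler and duplication filter so all spider
# instances share one request queue and one seen-requests set in Redis.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and dupefilter in Redis between runs instead of clearing them.
SCHEDULER_PERSIST = True

# Push scraped items into a Redis list so separate consumer
# processes can post-process them.
ITEM_PIPELINES = {
    "scrapy_redis.pipelines.RedisPipeline": 300,
}

# Where the shared Redis server lives (placeholder URL).
REDIS_URL = "redis://localhost:6379"
```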
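For distributed crawling, every instance runs the same spider and pulls its start URLs from a shared Redis list. A minimal sketch using the `RedisSpider` base class, with a hypothetical spider name and key:

```python
# myspider.py -- sketch of a spider fed from Redis.
from scrapy_redis.spiders import RedisSpider


class MySpider(RedisSpider):
    name = "myspider"  # illustrative name
    # The spider waits on this Redis list and crawls whatever URLs get pushed to it.
    redis_key = "myspider:start_urls"

    def parse(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}
```

You can start as many instances of this spider as you need (`scrapy crawl myspider` on several machines pointing at the same Redis server) and feed them work with, for example, `redis-cli lpush myspider:start_urls https://example.com`.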
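For distributed post-processing, any number of consumer processes can pop items off the shared items queue. A minimal sketch using redis-py, assuming `RedisPipeline`'s default items key (`<spider>:items`) and its default JSON serialization; `myspider` is the illustrative spider name from above:

```python
# process_items.py -- sketch of one post-processing consumer.
import json

import redis

r = redis.from_url("redis://localhost:6379")  # same placeholder URL as settings.py

while True:
    # BLPOP blocks until an item arrives; returns (key, raw_item) or None on timeout.
    popped = r.blpop("myspider:items", timeout=30)
    if popped is None:
        continue  # nothing within the timeout; keep waiting
    _key, raw_item = popped
    item = json.loads(raw_item)
    print(item)  # replace with real post-processing (DB insert, file write, ...)
```

Because `BLPOP` delivers each item to exactly one client, running several copies of this script shares the load safely.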

Requirements

  • Python 2.7, 3.4 or 3.5
  • Redis >= 2.8
  • Scrapy >= 1.0
  • redis-py >= 2.10

Documentation pages

  • Overview
  • Basic Concept
  • Contribution
  • History
  • Examples
  • Persist data on database or local file
