CoderHXL edited this page Mar 25, 2023 · 5 revisions

x-crawl


x-crawl is a flexible Node.js crawler library. It can crawl pages, control pages, send network requests in batches, download file resources in batches, run polling crawls, and more. It supports crawling data in either asynchronous or synchronous mode. It runs on Node.js, its usage is flexible and simple, and it is friendly to JS/TS developers.
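To illustrate what the asynchronous/synchronous distinction means, here is a minimal TypeScript sketch that does not use x-crawl itself. It assumes that "asynchronous" means all requests are issued concurrently and "synchronous" means each request finishes before the next one starts. `crawlAsync`, `crawlSync`, and `makeTrackedFetcher` are hypothetical names chosen for illustration, not x-crawl functions.

```typescript
// Minimal sketch of the two crawl modes, written without x-crawl.
// `Fetcher` stands in for any async request function (hypothetical, for illustration).
type Fetcher<T> = (target: string) => Promise<T>;

// Asynchronous mode: start every request immediately, then wait for all of them.
// Promise.all keeps results in the same order as `targets`.
function crawlAsync<T>(targets: string[], fetchTarget: Fetcher<T>): Promise<T[]> {
  return Promise.all(targets.map((target) => fetchTarget(target)));
}

// Synchronous mode: wait for each request to finish before starting the next.
async function crawlSync<T>(targets: string[], fetchTarget: Fetcher<T>): Promise<T[]> {
  const results: T[] = [];
  for (const target of targets) {
    results.push(await fetchTarget(target));
  }
  return results;
}

// Demo helper: a fake fetcher that records the peak number of in-flight requests.
function makeTrackedFetcher(delayMs: number) {
  let active = 0;
  let peak = 0;
  const fetchTarget: Fetcher<string> = async (target) => {
    active += 1;
    peak = Math.max(peak, active);
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    active -= 1;
    return `done:${target}`;
  };
  return { fetchTarget, peak: () => peak };
}
```

With three targets, the tracked fetcher reports a peak of three in-flight requests under `crawlAsync` and one under `crawlSync`; both return results in target order.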

If you find x-crawl useful, consider giving the x-crawl repository a Star. Your Star is what motivates me to keep updating it.

Features

  • Crawl data in asynchronous or synchronous mode.
  • Flexible syntax: request configurations can be written, and crawl results obtained, in several ways.
  • Flexible crawl intervals: no interval, a fixed interval, or a random interval, so you decide whether to use or avoid highly concurrent crawling.
  • Simple configuration is enough to crawl pages, send network requests in batches, download file resources in batches, run polling crawls, and more.
  • Crawl SPAs (single-page applications) to generate pre-rendered content (i.e. "SSR", server-side rendering), then parse it with the jsdom library or with your own parser.
  • Submit forms, simulate keystrokes, trigger events, take screenshots of generated pages, and more.
  • Capture and record the success or failure of each crawl, with highlighted notifications.
  • Written in TypeScript, with type definitions and generics.
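The interval options and success/failure recording listed above can be sketched in plain TypeScript. This is an illustration under stated assumptions, not x-crawl's implementation: the `Interval` shape (undefined for no interval, a number for a fixed interval in milliseconds, `{ min, max }` for a random interval), `nextDelay`, `crawlWithInterval`, and the report format are all hypothetical.

```typescript
// Hypothetical interval option (an assumption, not x-crawl's actual option type):
// undefined = no interval, number = fixed ms, { min, max } = random ms in [min, max).
type Interval = undefined | number | { min: number; max: number };

// Compute the delay before the next request. `rand` is injectable for testing.
function nextDelay(interval: Interval, rand: () => number = Math.random): number {
  if (interval === undefined) return 0;
  if (typeof interval === "number") return interval;
  return interval.min + rand() * (interval.max - interval.min);
}

interface CrawlReport<T> {
  succeeded: { target: string; value: T }[];
  failed: { target: string; error: string }[];
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Process targets one at a time, waiting nextDelay(interval) between requests,
// and record each outcome instead of aborting on the first failure.
async function crawlWithInterval<T>(
  targets: string[],
  fetchTarget: (target: string) => Promise<T>,
  interval: Interval,
): Promise<CrawlReport<T>> {
  const report: CrawlReport<T> = { succeeded: [], failed: [] };
  for (let i = 0; i < targets.length; i++) {
    if (i > 0) await sleep(nextDelay(interval));
    const target = targets[i];
    try {
      report.succeeded.push({ target, value: await fetchTarget(target) });
    } catch (err) {
      report.failed.push({ target, error: err instanceof Error ? err.message : String(err) });
    }
  }
  // Plain summary line; x-crawl's own highlighted output is not reproduced here.
  console.log(`success: ${report.succeeded.length}, failure: ${report.failed.length}`);
  return report;
}
```

The loop is sequential on purpose: an interval between requests only has a clear meaning when requests do not overlap. A concurrent variant would need a separate rate limiter.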