Commit daf1555

docs: Solution to program crash caused by crawlPage API

1 parent: 0303959

File tree: 3 files changed (+84 -6 lines)

- README.md
- docs/cn.md
- publish/README.md

README.md (+28 -2)

@@ -131,6 +131,7 @@ x-crawl is an open source project under the MIT license, completely free to use.
 - [AnyObject](#anyobject)
 - [FAQ](#faq)
 - [The relationship between crawlPage API and puppeteer](#the-relationship-between-crawlpage-api-and-puppeteer)
+- [Using crawlPage API causes the program to crash](#using-crawlpage-api-causes-the-program-to-crash)
 - [More](#more)
 - [Community](#community)
 - [Issues](#issues)

@@ -334,8 +335,6 @@ Lifecycle functions owned by the crawlPage API:

 In the onCrawlItemComplete function, you can get the result of each crawl target in advance.

-**Note:** If you need to crawl many pages at one time, use this lifecycle function to process the result of each target and close the page instance after each page is crawled. If you do not close the page instance, the program will crash because too many pages are open.
-
 #### Open Browser

 Disable running the browser in headless mode.

@@ -2072,6 +2071,33 @@ export interface AnyObject extends Object {

 The crawlPage API has [puppeteer](https://github.com/puppeteer/puppeteer) built in; you only need to pass in some configuration options to let x-crawl simplify the operation and hand back the intact Browser instance and Page instance. x-crawl does not rewrite them.
+
+### Using crawlPage API causes the program to crash
+
+If you need to crawl many pages in one crawlPage call, it is recommended to use the [onCrawlItemComplete lifecycle function](#onCrawlItemComplete) to process the result of each target and close the page instance after each page is crawled. If you do not close it, the program may crash because too many pages are open (this depends on the performance of the device itself).
+
+```js
+import xCrawl from 'x-crawl'
+
+const myXCrawl = xCrawl()
+
+// Use the advanced configuration mode
+myXCrawl.crawlPage({
+  targets: [
+    'https://www.example.com/page-1',
+    'https://www.example.com/page-2',
+    'https://www.example.com/page-3',
+    'https://www.example.com/page-4',
+    'https://www.example.com/page-5',
+    'https://www.example.com/page-6'
+  ],
+  onCrawlItemComplete(crawlPageSingleResult) {
+    const { page } = crawlPageSingleResult.data
+
+    page.close()
+  }
+})
+```
+
 ## More

 ### Community
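For reference, the pattern added above closes each page without reading anything from it first. Here is a minimal sketch of the more common case, where data is extracted from each page before it is closed. It assumes only that `page` is the standard puppeteer Page instance which, per the FAQ above, crawlPage hands back unmodified; the `titles` accumulator is illustrative, not part of the x-crawl API:

```js
import xCrawl from 'x-crawl'

const myXCrawl = xCrawl()

// Illustrative accumulator for data pulled from each crawled page
const titles = []

myXCrawl.crawlPage({
  targets: ['https://www.example.com/page-1', 'https://www.example.com/page-2'],
  onCrawlItemComplete(crawlPageSingleResult) {
    const { page } = crawlPageSingleResult.data

    // page is a plain puppeteer Page, so its normal methods apply here
    page
      .title()
      .then((title) => titles.push(title))
      // Close the page even if reading it failed, so the number of
      // open pages stays bounded regardless of how many targets there are
      .finally(() => page.close())
  }
})
```

The point of the pattern is that `page.close()` runs after every target, success or failure, so memory use does not grow with the size of `targets`.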

docs/cn.md (+28 -2)

@@ -131,6 +131,7 @@ x-crawl is an open source project under the MIT license, completely free to use. If you
 - [AnyObject](#AnyObject)
 - [FAQ](#常见问题)
 - [The relationship between crawlPage API and puppeteer](#crawlPage-API-跟-puppeteer-的关系)
+- [Using crawlPage API causes the program to crash](#使用-crawlPage-API-造成程序崩溃)
 - [More](#更多)
 - [Community](#社区)
 - [Issues](#Issues)

@@ -331,8 +332,6 @@ Lifecycle functions owned by the crawlPage API:

 In the onCrawlItemComplete function, you can get the result of each crawl target in advance.

-**Note:** If you need to crawl many pages at one time, use this lifecycle function to process the result of each target and close the page instance after each page is crawled. If you do not close the page instance, the program will crash because too many pages are open.
-
 #### Open Browser

 Disable running the browser in headless mode.

@@ -2060,6 +2059,33 @@ export interface AnyObject extends Object {

 The crawlPage API has [puppeteer](https://github.com/puppeteer/puppeteer) built in; you only need to pass in some configuration options to let x-crawl simplify the operation and hand back the intact Browser instance and Page instance. x-crawl does not rewrite them.
+
+### Using crawlPage API causes the program to crash
+
+If you need to crawl many pages in one crawlPage call, it is recommended to use the [onCrawlItemComplete lifecycle function](#onCrawlItemComplete) to process the result of each target and close the page instance after each page is crawled. If you do not close it, the program may crash because too many pages are open (this depends on the performance of the device itself).
+
+```js
+import xCrawl from 'x-crawl'
+
+const myXCrawl = xCrawl()
+
+// Use the advanced configuration mode
+myXCrawl.crawlPage({
+  targets: [
+    'https://www.example.com/page-1',
+    'https://www.example.com/page-2',
+    'https://www.example.com/page-3',
+    'https://www.example.com/page-4',
+    'https://www.example.com/page-5',
+    'https://www.example.com/page-6'
+  ],
+  onCrawlItemComplete(crawlPageSingleResult) {
+    const { page } = crawlPageSingleResult.data
+
+    page.close()
+  }
+})
+```
+
 ## More

 ### Community

publish/README.md (+28 -2)

@@ -131,6 +131,7 @@ x-crawl is an open source project under the MIT license, completely free to use.
 - [AnyObject](#anyobject)
 - [FAQ](#faq)
 - [The relationship between crawlPage API and puppeteer](#the-relationship-between-crawlpage-api-and-puppeteer)
+- [Using crawlPage API causes the program to crash](#using-crawlpage-api-causes-the-program-to-crash)
 - [More](#more)
 - [Community](#community)
 - [Issues](#issues)

@@ -334,8 +335,6 @@ Lifecycle functions owned by the crawlPage API:

 In the onCrawlItemComplete function, you can get the result of each crawl target in advance.

-**Note:** If you need to crawl many pages at one time, use this lifecycle function to process the result of each target and close the page instance after each page is crawled. If you do not close the page instance, the program will crash because too many pages are open.
-
 #### Open Browser

 Disable running the browser in headless mode.

@@ -2072,6 +2071,33 @@ export interface AnyObject extends Object {

 The crawlPage API has [puppeteer](https://github.com/puppeteer/puppeteer) built in; you only need to pass in some configuration options to let x-crawl simplify the operation and hand back the intact Browser instance and Page instance. x-crawl does not rewrite them.
+
+### Using crawlPage API causes the program to crash
+
+If you need to crawl many pages in one crawlPage call, it is recommended to use the [onCrawlItemComplete lifecycle function](#onCrawlItemComplete) to process the result of each target and close the page instance after each page is crawled. If you do not close it, the program may crash because too many pages are open (this depends on the performance of the device itself).
+
+```js
+import xCrawl from 'x-crawl'
+
+const myXCrawl = xCrawl()
+
+// Use the advanced configuration mode
+myXCrawl.crawlPage({
+  targets: [
+    'https://www.example.com/page-1',
+    'https://www.example.com/page-2',
+    'https://www.example.com/page-3',
+    'https://www.example.com/page-4',
+    'https://www.example.com/page-5',
+    'https://www.example.com/page-6'
+  ],
+  onCrawlItemComplete(crawlPageSingleResult) {
+    const { page } = crawlPageSingleResult.data
+
+    page.close()
+  }
+})
+```
+
 ## More

 ### Community
