Commit daf1555

docs: Solution to program crash caused by crawlPage API

1 parent: 0303959

File tree: 3 files changed (+84 -6 lines)

- README.md
- docs/cn.md
- publish/README.md

README.md (+28 -2)

@@ -131,6 +131,7 @@ x-crawl is an open source project under the MIT license, completely free to use.
 - [AnyObject](#anyobject)
 - [FAQ](#faq)
 - [The relationship between crawlPage API and puppeteer](#the-relationship-between-crawlpage-api-and-puppeteer)
+- [Using crawlPage API causes the program to crash](#using-crawlpage-api-causes-the-program-to-crash)
 - [More](#more)
 - [Community](#community)
 - [Issues](#issues)

@@ -334,8 +335,6 @@ Lifecycle functions owned by the crawlPage API:

 In the onCrawlItemComplete function, you can get the result of each crawl target in advance.

-**Note:** If you need to crawl many pages at one time, use this lifecycle function to process the result of each target and close the page instance after each page is crawled. If you do not close the page instance, the program will crash because too many pages are open.
-
 #### Open Browser

 Disable running the browser in headless mode.

@@ -2072,6 +2071,33 @@ export interface AnyObject extends Object {

 The crawlPage API has [puppeteer](https://github.com/puppeteer/puppeteer) built in; you only need to pass in some configuration options to let x-crawl simplify the operation and hand back the intact Browser instance and Page instance. x-crawl does not rewrite them.
+
+### Using crawlPage API causes the program to crash
+
+If you need to crawl many pages in one crawlPage call, it is recommended to use the [onCrawlItemComplete lifecycle function](#onCrawlItemComplete) to process the result of each target and close the page instance after each page is crawled. If you do not close it, the program may crash because too many pages are open (this depends on the performance of the device itself).
+
+```js
+import xCrawl from 'x-crawl'
+
+const myXCrawl = xCrawl()
+
+// Use the advanced configuration mode
+myXCrawl.crawlPage({
+  targets: [
+    'https://www.example.com/page-1',
+    'https://www.example.com/page-2',
+    'https://www.example.com/page-3',
+    'https://www.example.com/page-4',
+    'https://www.example.com/page-5',
+    'https://www.example.com/page-6'
+  ],
+  onCrawlItemComplete(crawlPageSingleResult) {
+    const { page } = crawlPageSingleResult.data
+
+    page.close()
+  }
+})
+```
+
 ## More

 ### Community
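For reference, the pattern added above closes each page without reading anything from it first. Here is a minimal sketch of the more common case, where data is extracted from each page before it is closed. It assumes only that `page` is the standard puppeteer Page instance which, per the FAQ above, crawlPage hands back unmodified; the `titles` accumulator is illustrative, not part of the x-crawl API:

```js
import xCrawl from 'x-crawl'

const myXCrawl = xCrawl()

// Illustrative accumulator for data pulled from each crawled page
const titles = []

myXCrawl.crawlPage({
  targets: ['https://www.example.com/page-1', 'https://www.example.com/page-2'],
  onCrawlItemComplete(crawlPageSingleResult) {
    const { page } = crawlPageSingleResult.data

    // page is a plain puppeteer Page, so its normal methods apply here
    page
      .title()
      .then((title) => titles.push(title))
      // Close the page even if reading it failed, so the number of
      // open pages stays bounded regardless of how many targets there are
      .finally(() => page.close())
  }
})
```

The point of the pattern is that `page.close()` runs after every target, success or failure, so memory use does not grow with the size of `targets`.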

docs/cn.md (+28 -2)

@@ -131,6 +131,7 @@ x-crawl is an open source project under the MIT license, completely free to use. If you
 - [AnyObject](#AnyObject)
 - [FAQ](#常见问题)
 - [The relationship between crawlPage API and puppeteer](#crawlPage-API-跟-puppeteer-的关系)
+- [Using crawlPage API causes the program to crash](#使用-crawlPage-API-造成程序崩溃)
 - [More](#更多)
 - [Community](#社区)
 - [Issues](#Issues)

@@ -331,8 +332,6 @@ Lifecycle functions owned by the crawlPage API:

 In the onCrawlItemComplete function, you can get the result of each crawl target in advance.

-**Note:** If you need to crawl many pages at one time, use this lifecycle function to process the result of each target and close the page instance after each page is crawled. If you do not close the page instance, the program will crash because too many pages are open.
-
 #### Open Browser

 Disable running the browser in headless mode.

@@ -2060,6 +2059,33 @@ export interface AnyObject extends Object {

 The crawlPage API has [puppeteer](https://github.com/puppeteer/puppeteer) built in; you only need to pass in some configuration options to let x-crawl simplify the operation and hand back the intact Browser instance and Page instance. x-crawl does not rewrite them.
+
+### Using crawlPage API causes the program to crash
+
+If you need to crawl many pages in one crawlPage call, it is recommended to use the [onCrawlItemComplete lifecycle function](#onCrawlItemComplete) to process the result of each target and close the page instance after each page is crawled. If you do not close it, the program may crash because too many pages are open (this depends on the performance of the device itself).
+
+```js
+import xCrawl from 'x-crawl'
+
+const myXCrawl = xCrawl()
+
+// Use the advanced configuration mode
+myXCrawl.crawlPage({
+  targets: [
+    'https://www.example.com/page-1',
+    'https://www.example.com/page-2',
+    'https://www.example.com/page-3',
+    'https://www.example.com/page-4',
+    'https://www.example.com/page-5',
+    'https://www.example.com/page-6'
+  ],
+  onCrawlItemComplete(crawlPageSingleResult) {
+    const { page } = crawlPageSingleResult.data
+
+    page.close()
+  }
+})
+```
+
 ## More

 ### Community

publish/README.md (+28 -2)

@@ -131,6 +131,7 @@ x-crawl is an open source project under the MIT license, completely free to use.
 - [AnyObject](#anyobject)
 - [FAQ](#faq)
 - [The relationship between crawlPage API and puppeteer](#the-relationship-between-crawlpage-api-and-puppeteer)
+- [Using crawlPage API causes the program to crash](#using-crawlpage-api-causes-the-program-to-crash)
 - [More](#more)
 - [Community](#community)
 - [Issues](#issues)

@@ -334,8 +335,6 @@ Lifecycle functions owned by the crawlPage API:

 In the onCrawlItemComplete function, you can get the result of each crawl target in advance.

-**Note:** If you need to crawl many pages at one time, use this lifecycle function to process the result of each target and close the page instance after each page is crawled. If you do not close the page instance, the program will crash because too many pages are open.
-
 #### Open Browser

 Disable running the browser in headless mode.

@@ -2072,6 +2071,33 @@ export interface AnyObject extends Object {

 The crawlPage API has [puppeteer](https://github.com/puppeteer/puppeteer) built in; you only need to pass in some configuration options to let x-crawl simplify the operation and hand back the intact Browser instance and Page instance. x-crawl does not rewrite them.
+
+### Using crawlPage API causes the program to crash
+
+If you need to crawl many pages in one crawlPage call, it is recommended to use the [onCrawlItemComplete lifecycle function](#onCrawlItemComplete) to process the result of each target and close the page instance after each page is crawled. If you do not close it, the program may crash because too many pages are open (this depends on the performance of the device itself).
+
+```js
+import xCrawl from 'x-crawl'
+
+const myXCrawl = xCrawl()
+
+// Use the advanced configuration mode
+myXCrawl.crawlPage({
+  targets: [
+    'https://www.example.com/page-1',
+    'https://www.example.com/page-2',
+    'https://www.example.com/page-3',
+    'https://www.example.com/page-4',
+    'https://www.example.com/page-5',
+    'https://www.example.com/page-6'
+  ],
+  onCrawlItemComplete(crawlPageSingleResult) {
+    const { page } = crawlPageSingleResult.data
+
+    page.close()
+  }
+})
+```
+
 ## More

 ### Community
