
Commit ae4674b

feat: version update
1 parent 4023787 commit ae4674b

5 files changed: +36 -6 lines


Diff for: CHANGELOG.md (+12)

@@ -1,3 +1,15 @@
+# [v10.0.1](https://github.com/coder-hxl/x-crawl/compare/v10.0.0..v10.0.1) (2024-04-10)
+
+### 🐞 Bug fixes
+
+- Fix the wrong export
+
+---
+
+### 🐞 Bug fixes
+
+- Fix the wrong export
+
 # [v10.0.0](https://github.com/coder-hxl/x-crawl/compare/v9.0.0..v10.0.0) (2024-04-10)
 
 ### 🚀 Features

Diff for: package.json (+1 -1)

@@ -1,7 +1,7 @@
 {
   "private": true,
   "name": "x-crawl",
-  "version": "10.0.0",
+  "version": "10.0.1",
   "author": "coderHXL",
   "description": "x-crawl is a flexible Node.js AI-assisted crawler library.",
   "license": "MIT",

Diff for: publish/README.md (+21 -3)

@@ -26,6 +26,16 @@ It consists of two parts:
 - **🧾 Crawl information** - Controllable crawl information, which will output colored string information in the terminal.
 - **🦾 TypeScript** - Own types and implement complete types through generics.
 
+## AI-assisted crawler
+
+As websites are updated more and more frequently, changes to class names or page structure pose a considerable challenge for crawlers that rely on those elements. Against this background, crawlers combined with AI technology have become a powerful way to meet that challenge.
+
+Traditional crawlers rely on fixed class names or structures to locate and extract the required information. Once those elements change after a website update, the crawler may no longer find the data accurately, which hurts both the effectiveness and the accuracy of crawling.
+
+Crawlers combined with AI handle such changes far better: using natural language processing and related techniques, the AI understands and parses the semantic information of the page and can extract the required data more accurately.
+
+In short, crawlers combined with AI can better cope with class name or structure changes after website updates.
+
 ## Example
 
 Combining the crawler with AI allows them to obtain pictures of highly rated vacation rentals according to our instructions:
@@ -54,10 +64,10 @@ crawlApp.crawlPage('https://www.airbnb.cn/s/select_homes').then(async (res) => {
   await page.waitForSelector(targetSelector)
   const highlyHTML = await page.$eval(targetSelector, (el) => el.innerHTML)
 
-  // Let AI obtain the url of img and remove duplicates
+  // Let the AI get the image link and de-duplicate it (the more detailed the description, the better)
   const srcResult = await crawlOpenAIApp.parseElements(
     highlyHTML,
-    'Get the url of img and remove duplicates'
+    `Get the image link, don't source it inside, and de-duplicate it`
   )
 
   browser.close()
@@ -70,13 +80,21 @@ crawlApp.crawlPage('https://www.airbnb.cn/s/select_homes').then(async (res) => {
 })
 ```
 
+**You can even send the whole HTML to the AI and let it help us operate. Because the full page content is more complex, you also need to describe the target location more precisely to get accurate results, and it will consume a lot of tokens.**
+
+Procedure:
+
+![](https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/example.gif)
+
 Pictures of highly rated vacation rentals that were crawled:
 
 ![](https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/example.png)
 
 **Want to know more?**
 
-https://coder-hxl.github.io/x-crawl/guide/#example
+For example: view the HTML that the AI needs to process, or the srcResult (image URLs) returned by the AI after parsing the HTML according to our instructions.
+
+It is all at the bottom of this example: https://coder-hxl.github.io/x-crawl/guide/#example
 
 **warning**: x-crawl is for legal use only. Any illegal activity using this tool is prohibited. Please be sure to comply with the robots.txt file regulations of the target website. This example is only used to demonstrate the use of x-crawl and is not targeted at a specific website.
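
For orientation, below is a minimal sketch of the full README example that the fragments in the hunks above belong to. Only `crawlPage`, `waitForSelector`, `$eval`, `parseElements`, and `browser.close()` appear in this diff; the factory functions `createCrawl` and `createCrawlOpenAI`, their options, the `res.data` destructuring, and the `targetSelector` value are assumptions added here for illustration.

```js
import { createCrawl, createCrawlOpenAI } from 'x-crawl'

// Assumed setup (not shown in this diff): a crawler instance and an
// OpenAI-assisted parsing instance.
const crawlApp = createCrawl({ maxRetry: 3 })
const crawlOpenAIApp = createCrawlOpenAI({
  clientOptions: { apiKey: process.env.OPENAI_API_KEY }
})

crawlApp.crawlPage('https://www.airbnb.cn/s/select_homes').then(async (res) => {
  const { browser, page } = res.data

  // Hypothetical selector for the block of highly rated listings we want
  // the AI to read; the real README uses a site-specific selector.
  const targetSelector = '.listing-card-section'
  await page.waitForSelector(targetSelector)
  const highlyHTML = await page.$eval(targetSelector, (el) => el.innerHTML)

  // Hand the raw HTML to the AI and describe the goal in plain language
  // (the more detailed the description, the better).
  const srcResult = await crawlOpenAIApp.parseElements(
    highlyHTML,
    `Get the image link, don't source it inside, and de-duplicate it`
  )

  browser.close()

  console.log(srcResult)
})
```

As the README paragraph above notes, you could instead pass the entire page HTML (for example via Puppeteer's `page.content()`) to `parseElements`; that removes the need for a selector, but it requires a more precise description of the target location and consumes far more tokens.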

Diff for: publish/index.js (+1 -1)

@@ -1 +1 @@
-export * from './dist/x-crawl'
+export * from './dist/x-crawl.js'
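
The changelog entry "Fix the wrong export" presumably refers to this one-line change. Under Node's native ESM resolution, relative specifiers must spell out the file extension, so the bare './dist/x-crawl' specifier would fail to resolve when the published package is loaded as an ES module. A commented sketch of the fixed entry file (the ESM framing is an assumption, not stated in this diff):

```js
// publish/index.js: re-export the bundled build.
// Node's ESM resolver does not append extensions to relative specifiers,
// so './dist/x-crawl' throws ERR_MODULE_NOT_FOUND at import time, while
// './dist/x-crawl.js' resolves correctly (assuming the package is
// published and consumed as ES modules).
export * from './dist/x-crawl.js'
```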

Diff for: publish/package.json (+1 -1)

@@ -1,6 +1,6 @@
 {
   "name": "x-crawl",
-  "version": "10.0.0",
+  "version": "10.0.1",
   "author": "coderHXL",
   "description": "x-crawl is a flexible Node.js AI-assisted crawler library.",
   "license": "MIT",
