Crawl the page

Let's start crawling real web pages! For these remaining steps, you'll need a website you can crawl. Preferably a small one with less than 100 pages so the crawling doesn't take all day. You can use my personal blog, https://wagslane.dev if you don't have another in mind.

crawlPage(base_url, url, pages)

Create a crawlPage function in crawl.js. For now, it will just take a base URL (the root of the site we're going to crawl).

For now, your function should:

Use fetch to fetch the webpage of the baseURL
If the HTTP status code is an error level code, print an error and return
If the response content-type header isn't text/html print and error and return
Otherwise, just print the HTML body as a string and be done

Remember to use try/catch as appropriate for anything that could result in an error!

main.js

Import crawlPage into your main function, and call it with the base_url passed in and an empty dictionary. Give your program a shot! It should print some HTML that it fetched from the internet!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

readme.md

readme.md

Crawl the page

crawlPage(base_url, url, pages)

main.js

Files

readme.md

Latest commit

History

readme.md

File metadata and controls

Crawl the page

crawlPage(base_url, url, pages)

main.js