Web Scraper With Selenium

Overview

This Python script is designed to scrape data from a webpage, including links, images, titles, and emails, and save the extracted data into an Excel file.

Features

Extracts internal and external links from the webpage.
Downloads images from the webpage and saves them locally.
Retrieves titles and emails from the webpage.
Generates an Excel file containing the extracted data.
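
As a rough illustration of the link and email features, the sketch below separates same-site links from links to other sites and pulls email-like strings out of page text. It is not taken from webScraper.py: the helper names `split_links` and `find_emails` and the regular expression are assumptions made for this example, and the real script may classify links differently.

```python
import re
from urllib.parse import urljoin, urlparse

# Deliberately simple pattern (an assumption); it will miss some valid addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def split_links(page_url, hrefs):
    """Return (same_site, other_site) lists of absolute URLs (hypothetical helper)."""
    page_host = urlparse(page_url).netloc.lower()
    same_site, other_site = [], []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if parsed.netloc.lower() == page_host:
            same_site.append(absolute)
        else:
            other_site.append(absolute)
    return same_site, other_site


def find_emails(text):
    """Return unique email-like strings in order of first appearance."""
    return list(dict.fromkeys(EMAIL_RE.findall(text)))
```

Here a link counts as same-site only when its host matches the page's host exactly, so subdomains are treated as other sites. That is a design choice of this sketch, not a statement about the script.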

Requirements

Python 3.x
Selenium
Requests
BeautifulSoup4
Pandas
ChromeDriver (for Selenium WebDriver)

Installation

Clone or download the repository.
Install the required Python packages using pip:
```
pip install -r requirements.txt
```
Make sure to run this command from the directory where the requirements.txt file is located.
Use the ChromeDriver included in the repository, or download a ChromeDriver executable that matches your installed Chrome version. Then either place the executable on your system PATH or specify its path in the script.
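
One way to handle the driver location is sketched below, assuming Selenium 4. The function names `resolve_chromedriver` and `make_driver` are hypothetical and do not come from webScraper.py; the script itself may set the path differently.

```python
import os
import shutil


def resolve_chromedriver(explicit_path=None):
    """Return a usable ChromeDriver path, or None if none is found."""
    if explicit_path and os.path.isfile(explicit_path):
        return os.path.abspath(explicit_path)
    # shutil.which searches the directories listed in PATH.
    return shutil.which("chromedriver")


def make_driver(explicit_path=None):
    """Start Chrome through Selenium 4 (requires the selenium package)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    path = resolve_chromedriver(explicit_path)
    # With no path, Service() falls back to Selenium's default driver lookup.
    service = Service(executable_path=path) if path else Service()
    return webdriver.Chrome(service=service)
```

Calling `make_driver("chromedriver.exe")` from the repository directory would use the bundled executable, while `make_driver()` relies on whatever is on PATH.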

Usage

Instantiate the webScraper class with the URL of the webpage you want to scrape.
```
url = "https://example.com"
bot = webScraper(url)
```
Run all scraping functions using the runAllFunctions() method.
```
bot.runAllFunctions()
```
Generate the Excel file containing the scraped data using the makeExcelSheet() method.
```
bot.makeExcelSheet()
```
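
For readers curious how scraped lists might end up in a spreadsheet, here is a minimal sketch using pandas. It is an assumption about the general approach, not the actual body of makeExcelSheet(); the helpers `pad_columns` and `write_excel` are hypothetical. Padding matters because pandas raises an error when the columns of a DataFrame built from a dict have different lengths.

```python
def pad_columns(columns):
    """Pad every list to the longest length so the columns line up."""
    longest = max((len(values) for values in columns.values()), default=0)
    return {
        name: list(values) + [""] * (longest - len(values))
        for name, values in columns.items()
    }


def write_excel(columns, path):
    """Write the padded columns to an .xlsx file.

    Needs pandas plus an Excel engine such as openpyxl.
    """
    import pandas as pd

    pd.DataFrame(pad_columns(columns)).to_excel(path, index=False)
```

For example, `write_excel({"links": links, "emails": emails}, "output.xlsx")` would write one column per key, with blank cells where a list is shorter than the longest one.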

Example Code Snippet

```
url = "https://www.monolithai.com/blog/4-ways-ai-is-changing-the-packaging-industry"
bot = webScraper(url)
bot.runAllFunctions()
bot.makeExcelSheet()
```

Example Excel File and Image Directory

Running the example snippet (it is included in the Python file by default) produces:

An Excel file named monolithai.xlsx containing the scraped data.
A directory named monolithai containing the images downloaded from the webpage.

An example Excel file and image directory are included in the repository.

Note: Please make sure to have proper permissions to create directories and write files in the script execution directory.
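
The output names above appear to follow the site's host name. The sketch below shows one way such names could be derived and the image directory created; it is a guess at the naming rule, and `output_name` and `prepare_outputs` are hypothetical helpers rather than functions from webScraper.py.

```python
import os
from urllib.parse import urlparse


def output_name(url):
    """Derive a base name such as 'monolithai' from the URL's host."""
    host = urlparse(url).netloc.lower().split(":")[0]  # drop any port
    if host.startswith("www."):
        host = host[4:]
    return host.split(".")[0]


def prepare_outputs(url, base_dir="."):
    """Create the image directory and return (image_dir, excel_path)."""
    name = output_name(url)
    image_dir = os.path.join(base_dir, name)
    # exist_ok avoids an error on reruns; PermissionError still propagates.
    os.makedirs(image_dir, exist_ok=True)
    return image_dir, os.path.join(base_dir, name + ".xlsx")
```

If the process lacks write permission in `base_dir`, `os.makedirs` raises `PermissionError`, which is the situation the note warns about.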

Author

Abhai Matta
