This is a Ruby script to scrape products from https://www.coldwellbankerhomes.com.
The scraped states, regions and product data are saved as CSV files in the
output/ directory.
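The CSV export step can be sketched with Ruby's standard CSV class. This is a minimal sketch, not the repository's code: the helper name `export_csv`, the file name `states.csv`, the column names and the sample row are all hypothetical, since the real column layout is not documented here.

```ruby
require 'csv'
require 'fileutils'

# Hypothetical helper: writes a header row plus data rows to
# <dir>/<name>.csv, creating the directory first if it is missing.
# Returns the path of the written file.
def export_csv(dir, name, headers, rows)
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "#{name}.csv")
  CSV.open(path, 'w') do |csv|
    csv << headers
    rows.each { |row| csv << row }
  end
  path
end

# Example with made-up placeholder data (not real scraper output).
path = export_csv('output', 'states', %w[name url],
                  [['Example State', 'https://example.com/state']])
puts File.read(path)
```

Writing one file per entity type (states, regions, products) keeps each CSV flat and easy to open in a spreadsheet.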
Features:
- Static HTML (DOM) parsing for links and general information
- Recognition of semantic annotations (product/residence microformats) to parse the real-estate data embedded in product pages
- Service Object pattern: each service exposes a single public method, #call
- Ruby executable script
- All required gems installed with Bundler
- curl support with Curb for fetching page HTML
- Nokogiri for HTML parsing with XPath and CSS selector support
- CSV export via CSV Ruby class
- Logging via Logger Ruby class
- Code style enforced with RuboCop
- Code quality reports via RubyCritic
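The Service Object pattern listed above can be illustrated with a small sketch: each service is a class that receives its inputs in the constructor and exposes only #call, with a class-level shortcut and a Logger for progress messages. The names `ApplicationService` and `RegionsBuilder` are hypothetical, and fetching (Curb) and parsing (Nokogiri) are stubbed out so the sketch runs with the standard library alone; the real services in the repository may be organized differently.

```ruby
require 'logger'

# Hypothetical base class: ServiceClass.call(args) is shorthand for
# ServiceClass.new(args).call.
class ApplicationService
  def self.call(*args)
    new(*args).call
  end
end

# Hypothetical service: turns region names that were already scraped for
# a state into CSV-ready rows. A real service would fetch the HTML and
# parse it before this step; both are omitted here.
class RegionsBuilder < ApplicationService
  def initialize(state, region_names, logger = Logger.new($stdout))
    @state = state
    @region_names = region_names
    @logger = logger
  end

  # The only public instance method.
  def call
    @logger.info("Building #{@region_names.size} regions for #{@state}")
    @region_names.map { |name| [@state, name] }
  end
end

rows = RegionsBuilder.call('Example State', ['Region A', 'Region B'])
p rows
```

Keeping a single public entry point makes each scraping step easy to chain from the executable script and easy to swap out.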
Requirements:
- System: Linux, macOS
- Git
- Ruby version manager (rbenv or RVM)
- Ruby 2.5.0
- Bundler
- Gems installed via the Bundler Gemfile
Clone with SSH:
$ git clone git@github.com:alex-petr/coldwell_banker_scraper.git

Or clone with HTTPS:
$ git clone https://github.com/alex-petr/coldwell_banker_scraper.git
$ cd coldwell_banker_scraper/ && brew install rbenv
$ rbenv install 2.5.0
$ gem install bundler && bundle

No test suite is available. To check that the scraper works, run it and inspect
the terminal output and the CSV files in the output/ directory:
$ bin/scraper

When it finishes, the script will have generated a set of CSV files inside the output/ directory.
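Because there is no test suite, the manual check of the generated files can be scripted. This is a minimal sketch assuming the CSV files sit directly in output/ and each starts with a header row; `summarize_csv_dir` is a hypothetical helper, not part of the repository.

```ruby
require 'csv'

# Hypothetical helper: maps each CSV file directly inside dir to its
# number of data rows (the header row is not counted).
def summarize_csv_dir(dir = 'output')
  Dir.glob(File.join(dir, '*.csv')).sort.each_with_object({}) do |path, summary|
    summary[File.basename(path)] = CSV.read(path, headers: true).size
  end
end

summary = summarize_csv_dir
warn 'No CSV files found in output/' if summary.empty?
summary.each { |file, count| puts format('%-30s %6d rows', file, count) }
```

A file reporting zero rows is a quick hint that a page layout changed and a selector no longer matches.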
