Open
Description
It would be very useful to have a method to extract the sitemaps listed in a robots.txt file, per the sitemaps specification: https://www.sitemaps.org/protocol.html#submit_robots
Example usage, given the following entries in http://www.nytimes.com/robots.txt:
Sitemap: http://spiderbites.nytimes.com/sitemaps/www.nytimes.com/sitemap.xml.gz
Sitemap: http://www.nytimes.com/sitemaps/sitemap_news/sitemap.xml.gz
Sitemap: http://spiderbites.nytimes.com/sitemaps/sitemap_video/sitemap.xml.gz
Sitemap: http://spiderbites.nytimes.com/sitemaps/www.nytimes.com_realestate/sitemap.xml.gz
Sitemap: http://spiderbites.nytimes.com/sitemaps/www.nytimes.com/2016_election_sitemap.xml.gz
> robotex = Robotex.new
> robotex.sitemaps('http://www.nytimes.com')
=> ["http://spiderbites.nytimes.com/sitemaps/www.nytimes.com/sitemap.xml.gz",
"http://www.nytimes.com/sitemaps/sitemap_news/sitemap.xml.gz",
"http://spiderbites.nytimes.com/sitemaps/sitemap_video/sitemap.xml.gz",
"http://spiderbites.nytimes.com/sitemaps/www.nytimes.com_realestate/sitemap.xml.gz",
"http://spiderbites.nytimes.com/sitemaps/www.nytimes.com/2016_election_sitemap.xml.gz"]
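For discussion, here is a rough sketch of how the extraction step might look. It is not Robotex's actual code: `extract_sitemaps` is a hypothetical helper that takes the already-fetched robots.txt body as a string, and matching the `Sitemap:` field name case-insensitively is an assumption on my part. The example.com URLs are placeholders.

```ruby
# Hypothetical helper (not part of Robotex): collect Sitemap URLs from robots.txt text.
def extract_sitemaps(robots_txt)
  urls = robots_txt.each_line.map do |line|
    # Assumption: the field name is matched case-insensitively.
    # The URL is taken to run up to the next whitespace character.
    match = line.match(/\A\s*sitemap\s*:\s*(\S+)/i)
    match && match[1]
  end
  # Drop non-matching lines (nil) and repeated entries, keeping file order.
  urls.compact.uniq
end

body = <<~ROBOTS
  User-agent: *
  Disallow: /private/
  Sitemap: http://www.example.com/sitemap.xml
  sitemap: http://www.example.com/news/sitemap.xml.gz
ROBOTS

p extract_sitemaps(body)
```

A `robotex.sitemaps(url)` method could presumably fetch the robots.txt for the given host and pass its body through something like this. Since Sitemap lines are not tied to any User-agent group, the sketch scans every line rather than only the group that applies to the crawler.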