This Java-based web scraper extracts metadata and PDF links from NIPS (NeurIPS) conference papers. It stores the data in a CSV file and downloads the PDFs to local directories by year.
- Scrapes paper metadata (title, authors, year, PDF link) from multiple years of NIPS.
- Downloads PDFs into year-specific folders.
- Stores metadata in papers_metadata.csv.
- Retry mechanism with exponential backoff for network errors.
- Progress bar for individual paper downloads and overall progress.
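The metadata extraction can be sketched with Jsoup. In the real scraper the `Document` would come from `Jsoup.connect(url).get()`; here a string is parsed so the sketch runs offline. The `h4.paper-title` selector is an assumption — the actual NeurIPS page markup may use different tags and classes.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MetadataSketch {
    // Pull a paper title out of proceedings-page HTML.
    // "h4.paper-title" is an assumed selector, not the verified NeurIPS markup.
    public static String extractTitle(String html) {
        Document doc = Jsoup.parse(html);
        Element title = doc.selectFirst("h4.paper-title");
        return title == null ? "" : title.text();
    }
}
```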
- Java 8 or higher
- Jsoup for HTML parsing
- Apache HttpClient for HTTP requests
- Clone this repository:
git clone https://github.com/yourusername/nips-papers-scraper.git
- Add the required dependencies (Jsoup and Apache HttpClient) to your project.
- Compile and run the Scraper class.
- The scraper will:
  - Scrape metadata and download PDFs from NIPS papers.
  - Store metadata in papers_metadata.csv.
  - Create year-specific directories to save the PDFs.
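The year-specific layout above can be sketched as a small path helper. The sanitization scheme below is an assumption for illustration, not the project's exact naming rule:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PdfPathSketch {
    // Build a year-specific save path such as 2017/attention_is_all_you_need.pdf.
    // The title-sanitization rules here are assumed, not taken from the project.
    public static Path pdfPath(int year, String title) {
        String safe = title.toLowerCase()
                .replaceAll("[^a-z0-9]+", "_")  // collapse non-alphanumerics to _
                .replaceAll("^_|_$", "");       // trim leading/trailing underscores
        return Paths.get(String.valueOf(year), safe + ".pdf");
    }
}
```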
- CSV format: "Title", "Year", "Authors", "PDF Link"
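A row in that format might be assembled as below, with RFC 4180-style quoting so commas inside author lists and quotes inside titles stay intact. The helper names are hypothetical:

```java
public class CsvRowSketch {
    // Quote one field per RFC 4180: wrap in quotes, double any embedded quotes.
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join fields into one line matching the "Title","Year","Authors","PDF Link" layout.
    public static String row(String title, String year, String authors, String pdfLink) {
        return String.join(",", quote(title), quote(year), quote(authors), quote(pdfLink));
    }
}
```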
Progress bars are shown for each paper download and overall progress. Updates are printed in the terminal during the download process.
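A terminal progress bar of the kind described can be rendered as a plain string; printing it with a leading carriage return (`"\r"`) redraws it in place. This is a generic sketch, not the project's exact rendering:

```java
public class ProgressBarSketch {
    // Render a fixed-width text bar, e.g. [#####-----] 50%.
    public static String render(long done, long total, int width) {
        int filled = total == 0 ? width : (int) (width * done / total);
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < width; i++) sb.append(i < filled ? '#' : '-');
        int pct = total == 0 ? 100 : (int) (100 * done / total);
        return sb.append("] ").append(pct).append('%').toString();
    }
}
```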
- Thread pool size: adjust the thread pool size (newFixedThreadPool(10)) for more or fewer threads.
- CSV path: change the CSV_FILE_PATH constant to customize the CSV location.
- Retries and timeouts: adjust the retry count and timeouts with constants like MAX_RETRIES and TIMEOUT.
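The retry-with-exponential-backoff behavior those constants control can be sketched as a generic wrapper. The constant values and the doubling schedule below are assumptions, not the project's exact configuration:

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    // Assumed values mirroring the README's MAX_RETRIES / TIMEOUT-style knobs.
    static final int MAX_RETRIES = 3;
    static final long BASE_DELAY_MS = 200;

    // Run a task; on failure, retry after BASE, 2*BASE, 4*BASE, ... milliseconds.
    public static <T> T withRetries(Callable<T> task) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < MAX_RETRIES) {
                    Thread.sleep(BASE_DELAY_MS << attempt); // exponential backoff
                }
            }
        }
        throw last; // all attempts failed
    }
}
```

In the scraper itself, each download task submitted to the thread pool would be wrapped this way so transient network errors are retried rather than failing the run.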
Published by basim-12