Paper Scraper

Code to streamline grabbing raw full text from Wiley journals using the Crossref API.

The workflow takes a list of DOIs for papers in Wiley journals, retrieves the metadata for each one, and then downloads the full text as a PDF. Each PDF is then converted to plain text and written to a file, ready to be ingested by StanfordNLP for analysis.
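The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names are hypothetical, the PDF-to-text step assumes the `pdftotext` command-line tool (from Poppler) is installed, and any access token or subscription check the publisher may require for full-text downloads is not handled here.

```python
import json
import subprocess
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Optional

# Public Crossref REST endpoint for looking up a single work by DOI.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def metadata_url(doi: str) -> str:
    """Build the Crossref metadata URL for one DOI (hypothetical helper)."""
    return CROSSREF_WORKS + urllib.parse.quote(doi.strip(), safe="/")


def fetch_metadata(doi: str) -> dict:
    """Fetch the Crossref record and return its 'message' object."""
    with urllib.request.urlopen(metadata_url(doi)) as resp:
        return json.load(resp)["message"]


def pick_pdf_link(message: dict) -> Optional[str]:
    """Return the first full-text link marked as a PDF, or None if absent."""
    for link in message.get("link", []):
        if link.get("content-type") == "application/pdf":
            return link.get("URL")
    return None


def download_pdf(url: str, dest: Path) -> Path:
    """Save the PDF at `url` to `dest` (no publisher authentication here)."""
    urllib.request.urlretrieve(url, str(dest))
    return dest


def pdf_to_text(pdf: Path) -> Path:
    """Convert a PDF to plain text; assumes `pdftotext` is on the PATH."""
    txt = pdf.with_suffix(".txt")
    subprocess.run(["pdftotext", str(pdf), str(txt)], check=True)
    return txt


def scrape(dois: list, out_dir: Path) -> None:
    """Run the whole pipeline for each DOI, skipping records without a PDF link."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for doi in dois:
        url = pick_pdf_link(fetch_metadata(doi))
        if url is None:
            continue  # Crossref lists no PDF link for this record
        pdf = download_pdf(url, out_dir / (doi.replace("/", "_") + ".pdf"))
        pdf_to_text(pdf)
```

Calling `scrape(list_of_dois, Path("fulltext"))` would leave one `.pdf` and one `.txt` file per DOI in the `fulltext` directory, with the text files ready for downstream analysis.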

SWDG -- 30 August