Following disinfoRG/ZeroScraper#105, we have these articles snapshotted from PTTRead. We also have the PTTRead parser ready with #25. To get them into the datasets, we still need a way to switch between the PTT and PTTRead parsers for these snapshots. Since the ZeroScraper project is concerned only with scraping, it seems more reasonable to leave the choice of parser to the ArticleParser project. That means we should somehow replicate here the information in the SnapshotLoss table in the scraper db.
I think this will happen again in the future, so it is better to build a general mechanism for it. We can:
- Add a "parser" field in `publication_mapping.info`.
- Add a CLI option for `ap-parse.py` to manually choose a parser for one article, overriding the default parser. This information should be recorded in `publication_mapping.info`.
- Have the program always check `publication_mapping.info` to see if a parser is specified when updating a publication; use the default parser if there is none.
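
The three steps above could be sketched roughly as follows. This is only an illustration under several assumptions: that `publication_mapping.info` holds a JSON object stored as text, that parsers are registered under short names such as `"ptt"` and `"pttread"`, and that the CLI flag is called `--parser`. The function names, the registry, and the flag are all hypothetical and not taken from the existing ArticleParser code.

```python
import argparse
import json
from typing import Callable, Dict, Optional


# Hypothetical stand-ins for the real PTT and PTTRead parsers.
def parse_ptt(snapshot: str) -> dict:
    return {"parser": "ptt", "length": len(snapshot)}


def parse_pttread(snapshot: str) -> dict:
    return {"parser": "pttread", "length": len(snapshot)}


# Hypothetical registry mapping parser names to parser functions.
PARSERS: Dict[str, Callable[[str], dict]] = {
    "ptt": parse_ptt,
    "pttread": parse_pttread,
}


def load_info(info_json: Optional[str]) -> dict:
    """Decode the info column, assumed here to be JSON text or NULL."""
    return json.loads(info_json) if info_json else {}


def choose_parser_name(
    info_json: Optional[str], default: str, override: Optional[str] = None
) -> str:
    """Pick a parser: CLI override first, then info["parser"], then the default."""
    name = override if override is not None else load_info(info_json).get("parser", default)
    if name not in PARSERS:
        raise ValueError(f"unknown parser: {name!r}")
    return name


def record_parser(info_json: Optional[str], name: str) -> str:
    """Return updated info JSON with the chosen parser recorded, keeping other keys."""
    info = load_info(info_json)
    info["parser"] = name
    return json.dumps(info, sort_keys=True)


def build_arg_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: the --parser flag is an assumption, not an existing option."""
    cli = argparse.ArgumentParser(prog="ap-parse.py")
    cli.add_argument("--parser", choices=sorted(PARSERS), default=None)
    return cli


if __name__ == "__main__":
    args = build_arg_parser().parse_args(["--parser", "pttread"])
    stored_info = '{"note": "snapshot taken from PTTRead"}'
    name = choose_parser_name(stored_info, default="ptt", override=args.parser)
    print(name, record_parser(stored_info, name))
```

Recording the override back into `info` means later updates of the same publication pick the same parser without the flag, which matches the third bullet.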