Recover lost PTT data #27

@pm5

Following disinfoRG/ZeroScraper#105, we have these articles snapshotted from PTTRead. We also have the PTTRead parser ready with #25. To get them into the datasets, we still need a way to switch between the PTT and PTTRead parsers for these snapshots. Since the ZeroScraper project is concerned only with scraping, it seems more reasonable to leave the choice of parser to the ArticleParser project. That means we should somehow replicate here the information in the SnapshotLoss table in the scraper db.

I think this will happen again in the future, so it is better to build a mechanism for it. We can:

  • Add a "parser" field in publication_mapping.info.
  • Add a CLI option for ap-parse.py to manually choose a parser for one article, overriding the default parser. This information should be recorded in publication_mapping.info.
  • Have the program always check publication_mapping.info to see if a parser is specified when updating a publication; use the default parser if there is none.
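
A minimal sketch of how the three steps above could fit together, not the actual ArticleParser code. It assumes publication_mapping.info can be treated as a dict; the names `PARSERS`, `DEFAULT_PARSER`, `choose_parser`, `parse_publication`, and the `--parser` flag are all hypothetical, and the two parser functions are stand-ins for the real PTT and PTTRead parsers.

```python
import argparse
from typing import Callable, Dict, Optional, Tuple

# Stand-in parsers (hypothetical); the real ones live in ArticleParser.
def parse_ptt(snapshot: str) -> dict:
    return {"parsed_by": "ptt", "length": len(snapshot)}

def parse_pttread(snapshot: str) -> dict:
    return {"parsed_by": "pttread", "length": len(snapshot)}

# Hypothetical registry mapping the "parser" field value to a parser function.
PARSERS: Dict[str, Callable[[str], dict]] = {
    "ptt": parse_ptt,
    "pttread": parse_pttread,
}
DEFAULT_PARSER = "ptt"  # assumed default for PTT publications

def choose_parser(info: dict, override: Optional[str] = None) -> str:
    """Pick a parser name: CLI override first, then info["parser"], then default."""
    name = override or info.get("parser") or DEFAULT_PARSER
    if name not in PARSERS:
        raise ValueError(f"unknown parser: {name!r}")
    return name

def parse_publication(
    snapshot: str, info: dict, override: Optional[str] = None
) -> Tuple[dict, dict]:
    """Parse a snapshot and return (parsed article, updated info)."""
    name = choose_parser(info, override)
    parsed = PARSERS[name](snapshot)
    new_info = dict(info)
    if override is not None:
        # Record the manual choice so later updates reuse the same parser.
        new_info["parser"] = name
    return parsed, new_info

def build_arg_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI shape; the real ap-parse.py options may differ."""
    cli = argparse.ArgumentParser(prog="ap-parse.py")
    cli.add_argument("article_id")
    cli.add_argument("--parser", choices=sorted(PARSERS), default=None)
    return cli
```

With this shape, a one-off run with the override writes `"parser"` into the info, and every later update of the same publication picks it up without the flag. Publications with no `"parser"` entry keep using the default.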
