Open
Description
Right now we create url_hash with zlib.crc32 in Python. It would be less work to compute url_hash with MySQL's md5, as we did for redirect_to_hash (issue #120). However, this requires some major changes to our current code:
- migration: change `url_hash` to `binary(16)`, set `url_hash = unhex(md5(url))`, and create an index on the new `url_hash`
- update the queries that insert or look up `url_hash`:
  - insert_article.sql
  - get_article_id_by_url.sql
  - get_article_by_url.sql
- update the scripts that use `zlib.crc32` for `url_hash` and use the queries mentioned above:
  - newsSpiders/ptt.py
  - toutiao_discover_spider.py
  - basic_discover_spider.py
  - dcard_dicsover_spider.py
  - webapi/articles.py
  - zs-article.py
  - newsSpiders/webapi/articles.py
  - newsSpiders/items.py
  - newsSpiders/pipelines.py
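
For reference while updating the scripts, here is a minimal sketch of the two hashing schemes side by side. The function names are hypothetical and not taken from the codebase, and it assumes URLs reach MySQL as UTF-8, so that `hashlib.md5(...).digest()` yields the same 16 bytes as `unhex(md5(url))`. It may be handy for spot-checking migrated rows from Python.

```python
import hashlib
import zlib


def url_hash_crc32(url: str) -> int:
    """Current scheme described in this issue: 32-bit CRC of the URL."""
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
    return zlib.crc32(url.encode("utf-8"))


def url_hash_md5(url: str) -> bytes:
    """Proposed scheme: 16 raw bytes, suitable for a binary(16) column.

    Assumption: MySQL hashes the same UTF-8 bytes, so this should match
    unhex(md5(url)) computed in SQL.
    """
    return hashlib.md5(url.encode("utf-8")).digest()


if __name__ == "__main__":
    sample = "https://example.com/article/1"  # illustrative URL only
    print(url_hash_crc32(sample))
    print(url_hash_md5(sample).hex())
```

Once the queries compute `unhex(md5(url))` themselves, the scripts should no longer need to pass a precomputed hash at all; the helper above would only be needed for verification.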