CheerioCrawler not working "allow running single crawler instance multiple times" #2637
Replies: 1 comment
As you can see in the PR you mentioned, this behavior is covered by tests, and those are still passing. I guess it is something about your specific setup, so please provide a complete reproduction. Also note that this is generally a problem of reusing the same default storages. You can disable storage persistence for your crawler to get around it. That also lets you run crawlers in parallel, which is otherwise not possible without disabling persistence; I can imagine you are already doing that but did not share it as part of the reproduction. https://crawlee.dev/api/core/interface/ConfigurationOptions#persistStorage
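For illustration only, here is a minimal sketch of running several crawl batches in a loop while keeping each run's crawler isolated. The names `CrawlerLike`, `runBatches`, and `makeCrawler` are hypothetical and not part of Crawlee; the sketch only assumes the crawler exposes an async `run()` that accepts a list of URLs.

```typescript
// Hypothetical minimal shape of a crawler for this sketch:
// anything with an async run() that accepts a list of URLs.
interface CrawlerLike {
    run(urls: string[]): Promise<unknown>;
}

// Runs each batch of URLs with a crawler obtained from `makeCrawler`.
// Building the crawler (and its storage configuration) per batch keeps
// state from one run out of the next. Returns the number of batches run.
async function runBatches(
    makeCrawler: () => CrawlerLike,
    batches: string[][],
): Promise<number> {
    let completed = 0;
    for (const urls of batches) {
        const crawler = makeCrawler();
        await crawler.run(urls);
        completed += 1;
    }
    return completed;
}
```

With Crawlee, the factory could be something like `() => new CheerioCrawler({ requestHandler }, new Configuration({ persistStorage: false }))`, where `requestHandler` is your own handler and `persistStorage` is the option from the link above. Whether this resolves the reported behavior depends on the setup, so treat it as a starting point for a reproduction rather than a confirmed fix.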
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/cheerio (CheerioCrawler)
Issue description
I believe this is expected to work but it does not
allow running single crawler instance multiple times
#1844
If I call run() in a loop, the first iteration works fine, but every subsequent iteration logs the following:
2024-08-26T22:00:05.502Z INFO CheerioCrawler: Starting the crawler.
2024-08-26T22:00:05.576Z INFO CheerioCrawler: All requests from the queue have been processed, the crawler will shut down.
2024-08-26T22:00:05.783Z INFO CheerioCrawler: Final request statistics: {"requestsFinished":0,"requestsFailed":0,"retryHistogram":[],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":0,"requestsTotal":0,"crawlerRuntimeMillis":451}
Code sample