Handling Errors from failedRequestHandler in Single URL Flow with PlaywrightCrawler
#2175
-
Hello Crawlee maintainers and community, I am working on a project where I use …

Issue
The challenge arises when the …

Expected Behavior
What I expect is a mechanism by which …

Current Behavior
As of now, errors in the …

Code Snippet
Here is the relevant part of the main function that orchestrates the crawling process:

// ... previous code

export const main = async (dbs) => {
  // ... existing main function logic
  while (true) {
    // ... more code
    try {
      // ...
      const result = await checkUrl(checkUrlParams);
      // ...
    } catch (error) {
      // I want to handle errors from failedRequestHandler here
    }
    // ...
  }
};

workerpool.worker({ main });

// ... more code

The …

import { BasicCrawler, PlaywrightCrawler } from 'crawlee';
import { Page } from 'playwright';
import { CONFIG } from '../config.js';
// ... other code

export class PlaywrightCrawleePageResolver extends CrawleePromiseResolver<
  PlaywrightCrawler,
  Page
> {
  constructor() {
    super(
      (resolveUrl) =>
        new PlaywrightCrawler({
          // ... other configurations
          failedRequestHandler({ request, log }, error) {
            log.info(`Request to: ${request.url} failed...`);
            // Here, I want to propagate this error to the main function's context
            // throw new CriticalCrawlError('Critical error occurred', { cause: error });
          },
          // ... rest of the configuration
        })
    );
  }
}

Specific Flow
I do not follow a broad crawling flow as outlined in the Crawlee documentation; my application processes URLs one at a time and handles different scenarios depending on the page content. Therefore, I am looking for a solution that fits this particular use case.

Request
Is there an existing pattern or a recommended way to propagate errors from …

Handling errors effectively in this flow is crucial, and I believe having this feature would significantly improve Crawlee's flexibility for various use cases. Thank you for your time and for the excellent work on the library.
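Since the flow handles one URL at a time, one possible pattern (a sketch under stated assumptions, not an official Crawlee API) is to wrap each crawl in a Promise and hand its resolve/reject callbacks to the crawler's handlers, so that failedRequestHandler calls reject instead of throwing. The names crawlSingleUrl, CrawlerLike and SettleCallbacks are hypothetical, and the crawler is abstracted behind a minimal object with a run() method so the sketch does not depend on Crawlee itself.

```typescript
// Hypothetical minimal shape of a crawler: anything with run(urls).
// Crawlee's PlaywrightCrawler.run() also returns a Promise, so an instance
// can be returned from the factory below (assumption: only run() is used).
interface CrawlerLike {
  run(urls: string[]): Promise<unknown>;
}

// Callbacks handed to the factory so the crawler's handlers can settle the promise.
interface SettleCallbacks<T> {
  resolve: (value: T) => void;
  reject: (error: unknown) => void;
}

// Crawls exactly one URL and turns the outcome into a promise:
// - the request handler is expected to call resolve(result)
// - failedRequestHandler is expected to call reject(error)
// If run() finishes without either being called, the promise rejects.
function crawlSingleUrl<T>(
  url: string,
  makeCrawler: (settle: SettleCallbacks<T>) => CrawlerLike,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    let settled = false;
    const settle: SettleCallbacks<T> = {
      resolve: (value) => {
        settled = true;
        resolve(value);
      },
      reject: (error) => {
        settled = true;
        reject(error);
      },
    };

    makeCrawler(settle)
      .run([url])
      .then(() => {
        // Neither handler settled the promise (e.g. the request was skipped).
        if (!settled) {
          reject(new Error(`Crawl of ${url} finished without a result`));
        }
      })
      // run() itself rejected, e.g. because a CriticalError was thrown.
      .catch((error) => reject(error));
  });
}
```

With a real crawler, the factory could return something like new PlaywrightCrawler({ requestHandler: async ({ page }) => settle.resolve(await page.title()), failedRequestHandler: ({ request }, error) => settle.reject(error) }), and checkUrl could return the promise from crawlSingleUrl, so the existing try/catch in main receives the failure as an ordinary rejected await. Calling resolve or reject a second time is a no-op, which is why the fallback rejection after run() is safe.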
Replies: 2 comments 4 replies
-
What you can do is re-add it to the queue (with a different …
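Crawlee's request queue deduplicates requests by their uniqueKey (derived from the URL by default), so re-adding a URL that was already handled is treated as already present and is not processed again unless it gets a different key. Assuming that is what the truncated reply refers to, a sketch of a helper that re-enqueues a failed URL under a fresh uniqueKey, with a cap on attempts, might look like this. The function reEnqueueWithFreshKey, the RequestAdder and RetryRequest types, the retryAttempt userData field and the #retry-N key format are all hypothetical; only the url, uniqueKey and userData request fields and the crawler's addRequests() method come from Crawlee.

```typescript
// Hypothetical request shape: url, uniqueKey and userData are Crawlee request
// options; retryAttempt is an illustrative field we store in userData ourselves.
type RetryRequest = { url: string; uniqueKey: string; userData: { retryAttempt: number } };

// Minimal structural type (hypothetical). Crawlee crawlers expose an
// addRequests() method that accepts request objects like RetryRequest,
// so a PlaywrightCrawler instance should satisfy this interface.
interface RequestAdder {
  addRequests(requests: RetryRequest[]): Promise<unknown>;
}

// Re-enqueues `url` under a new uniqueKey so the queue does not treat it as a
// duplicate. `attempt` is the number of retries already made. Returns the key
// that was used, or null once maxAttempts is reached, so a permanently failing
// URL is not retried forever.
async function reEnqueueWithFreshKey(
  adder: RequestAdder,
  url: string,
  attempt: number,
  maxAttempts = 3,
): Promise<string | null> {
  if (attempt >= maxAttempts) {
    return null;
  }
  const next = attempt + 1;
  const uniqueKey = `${url}#retry-${next}`;
  await adder.addRequests([{ url, uniqueKey, userData: { retryAttempt: next } }]);
  return uniqueKey;
}
```

Inside failedRequestHandler, the current count could be read back from request.userData.retryAttempt (defaulting to 0) and passed as attempt; when the helper returns null, that is the point at which to give up and report the error to the caller instead.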
-
I had a similar requirement, whereby:

import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler(/* your options here */);
const url = '<some-url-here>';

try {
  await crawler.run([url]);
} catch (error) {
  // Error does not seem to propagate
}

This was because I was throwing the default Error in the crawler request handler logic:

throw new Error('some error message');

The workaround is to throw a CriticalError instead, e.g.:

import { CriticalError } from 'crawlee';

throw new CriticalError('some error message');
I managed to solve this issue by adding reject …