[NUTCH-3160] Remove System.exit(..) from reusable code#903
[NUTCH-3160] Remove System.exit(..) from reusable code#903lewismc merged 4 commits intoapache:masterfrom
Conversation
lewismc
left a comment
There was a problem hiding this comment.
Hi @lfoppiano congratulations on your first pull request!
For each error message, we can now capture and increase a counter metric as well. See the ErrorTracker Javadoc for more details.
Thank you
|
@lewismc thank you for the review. I'm not sure I understood well, you want me to use the error tracker for this specific error? Would something like the following be OK? /**
* Configurable constructor
* @param config A populated {@link CommonCrawlConfig}
*/
public CommonCrawlDataDumper(CommonCrawlConfig config) {
this();
this.config = config;
}
/**
* Constructor
*/
public CommonCrawlDataDumper() {
this.errorTracker = new ErrorTracker(NutchMetrics.ERROR_OTHER_TOTAL);
}and then something like this: if (parts == null || parts.size() == 0) {
LOG.error( "No segment directories found in {} ",
segmentRootDir.getAbsolutePath());
this.errorTracker.recordError(ErrorTracker.ErrorType.OTHER);
return;
}I'm not sure which type of error I should use actually. Thank you in advance |
|
Another question, @lewismc , would you like me to apply this to all error messages or only to the one related to the initial change? |
@lfoppiano: I think it's sufficient to just track this error for now. PRs should be focused on a single problem, otherwise they tend to take longer to be ready for merge. We can open another issue to roll-out the error tracker entirely. What do you think, @lewismc ? |
[NUTCH-3160] Remove System.exit(..) from reusable code
|
Hi @lfoppiano I sent you a small PR to address some issues. This very small change has uncovered something really interesting about how metrics can/should be implemented in Nutch Tools Vs Nutch MapReduce jobs. The core issue is that in Nutch Tools (such as
What the dumper does: It uses the one-arg constructor and calls The result is that errors are counted in memory inside the dumper process, but they are never written to Hadoop’s counter system. They don’t show up in the job UI, in job history, or in any tool that reads job counters. The behavior can be considered as “record but don’t report.”... which for now is fine. As we continue to evolve the Nutch metrics system we can design how these errors can be made visible in the same way as in MapReduce jobs. Off the top of my head, we would likely have to add another reporting path (e.g. logging, a file, or a different metrics backend) because the Hadoop counter path is not available without a MapReduce context. This is an architectural choice and functional limitation of being so closely tied to the Hadoop metrics systems. |
|
|
Thanks @lewismc for the extensive explanation. Probably a simpler approach would be to have a logger reading those metrics and output them either regularly or when they change, in the log console. I did merged your PR in my branch, let me know if this PR is good to go. Thanks! |
I don't disagree 😄 I do think that we could sort that out in a different/follow-up issue though. This PR is good. I'll merge. |
lewismc
left a comment
There was a problem hiding this comment.
+1 thanks for accommodating the suggestions @lfoppiano



This small patch removes a
System.exit()within the code, causing e.g. tests to be terminated without notice.