NUTCH-3130 Address deprecated API usage across Nutch codebase and build#869
NUTCH-3130 Address deprecated API usage across Nutch codebase and build#869lewismc wants to merge 2 commits intoapache:masterfrom
Conversation
| int threshold = Math.min(3, searched.length() / TRESHOLD_DIVIDER); | ||
| for (int i = 0; i < normalized.length && value == null; i++) { | ||
| if (StringUtils.getLevenshteinDistance(searched, normalized[i]) < threshold) { | ||
| if (StringUtils.compareIgnoreCase(searched, normalized[i]) < threshold) { //.getLevenshteinDistance(searched, normalized[i]) < threshold) { |
There was a problem hiding this comment.
I still need to investigate whether this is a suitable replacement and also need to remove thie comment.
There was a problem hiding this comment.
SpellCheckedMetadata is used only by protocol-http and protocol-httpclient. We could deprecate it, use CaseInsensitiveMetadata instead (see NUTCH-3002) and later remove the class SpellCheckedMetadata entirely. Nowadays, spell-checking HTTP headers sounds odd, while 20 years ago it might have been a good idea.
Changing the behavior in opposite to the name does not seem the right way.
If we want to keep the class, we need to use LevenshteinDistance.
|
|
||
| @Override | ||
| protected void finalize() throws Throwable { | ||
| shutDown(); |
There was a problem hiding this comment.
I'm not sure we can simply remove the call to shutdown. I need to further investigate options and confirm.
|
|
||
| @Override | ||
| protected void finalize() { | ||
| /** |
There was a problem hiding this comment.
I'm also not sure on this right now. I need to do some more research.
|
I did want to also note, there are 15 instances of the @deprecated annotation across 9 files in the codebase: I'm not sure whether is the correct time to be removing these methods but if we decide to keep them we should definitely augment the Javadoc to detail the replacement API usage. |
sebastian-nagel
left a comment
There was a problem hiding this comment.
Great! Thanks, @lewismc!
Some work still to be done. Let me know whether I shall take over.
| int threshold = Math.min(3, searched.length() / TRESHOLD_DIVIDER); | ||
| for (int i = 0; i < normalized.length && value == null; i++) { | ||
| if (StringUtils.getLevenshteinDistance(searched, normalized[i]) < threshold) { | ||
| if (StringUtils.compareIgnoreCase(searched, normalized[i]) < threshold) { //.getLevenshteinDistance(searched, normalized[i]) < threshold) { |
There was a problem hiding this comment.
SpellCheckedMetadata is used only by protocol-http and protocol-httpclient. We could deprecate it, use CaseInsensitiveMetadata instead (see NUTCH-3002) and later remove the class SpellCheckedMetadata entirely. Nowadays, spell-checking HTTP headers sounds odd, while 20 years ago it might have been a good idea.
Changing the behavior in opposite to the name does not seem the right way.
If we want to keep the class, we need to use LevenshteinDistance.
|
|
||
| @Override | ||
| protected void finalize() throws Throwable { | ||
| shutDown(); |
| installExtensions(this.fRegisteredPlugins); | ||
| } catch (PluginRuntimeException e) { | ||
| LOG.error("Could not install extensions.", e.toString()); | ||
| LOG.error("Could not install extensions. {}", e.toString()); |
There was a problem hiding this comment.
+1
Or: LOG.error("Could not install extensions:", e);
|
I didn't see you had responded here @sebastian-nagel . I will revisit this PR, update it and perform some more testing. |
Proposed fix for https://issues.apache.org/jira/browse/NUTCH-3130
This is a fairly straightforward PR. Some grunt work involved...
I took a bit of time to investigate issues with and options for removing instances of
@override finalize()... this may need more investigation and certainly some testing to make sure this PR doesn't break any behavior.