NUTCH-3108 Fix SLF4J Class Loader Conflict in language-identifier #849
@maciejpuzianowski, thanks for the PR. @tballison, any comments on the Tika dependency chain in Nutch? Should we be looking to upgrade Tika to 3.1.0? Would that address the slf4j issue?
We currently use a shaded Tika package (2.9.1.0, thanks @tballison!) because of a conflict with the commons-io version required by Tika (or POI) and provided by Hadoop, see NUTCH-2959. Upgrading will force everybody to use at least Hadoop 3.4.0 in distributed mode. @maciejpuzianowski, could you provide the Hadoop version of your cluster? This may help to reproduce the issue and test alternative solutions, such as an upgrade to a more recent version of Tika. Thanks!
Sure @sebastian-nagel,
Thank you @sebastian-nagel for beating me to it. Yes, we had to shade commons-io because Hadoop was using an old version, and Tika and POI were using some of the newer API calls. I just released this shim for 2.9.3 and 3.1.0. Maybe give those a try? I did notice this: https://github.com/tballison/hadoop-safe-tika/blob/main/tika-parsers-standard-package-shaded/pom.xml#L67, which may be causing the problems. If we have to downgrade slf4j to match Hadoop, we can do that... or maybe we shade logging too? I'm happy to make those updates and release a 2.9.3.1 and/or 3.1.0.1. :D Let me know what makes sense.
Hi @maciejpuzianowski, finally got back to this issue and your PR:
@tballison: I'll try your shaded packages as well...
Merging this PR. The Tika upgrade is tracked in NUTCH-3110 and #850. Thanks, @maciejpuzianowski and @tballison!
When running Apache Nutch 1.20 on a distributed Hadoop cluster with the language-identifier plugin enabled, a class loader conflict occurs during the parse process. This results in the following error:
2025-02-24 08:58:59,152 INFO mapreduce.Job: Task Id : attempt_1740061418437_0135_m_000001_0, Status : FAILED
Error: loader constraint violation: when resolving method 'org.slf4j.ILoggerFactory org.slf4j.impl.StaticLoggerBinder.getLoggerFactory()' the class loader org.apache.nutch.plugin.PluginClassLoader @4c5228e7 of the current class, org/slf4j/LoggerFactory, and the class loader 'app' for the method's defining class, org/slf4j/impl/StaticLoggerBinder, have different Class objects for the type org/slf4j/ILoggerFactory used in the signature (org.slf4j.LoggerFactory is in unnamed module of loader org.apache.nutch.plugin.PluginClassLoader @4c5228e7, parent loader 'app'; org.slf4j.impl.StaticLoggerBinder is in unnamed module of loader 'app')
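For readers unfamiliar with this kind of error, the underlying mechanism can be sketched in a few lines of Java. This is a simplified illustration, not Nutch or SLF4J code: `SharedApi` is a hypothetical stand-in for `org.slf4j.ILoggerFactory`, and `IsolatingLoader` is an assumed, much reduced model of a plugin class loader that defines its own copy of a type instead of delegating to its parent. The JVM treats a class as the pair (name, defining loader), so the two copies are different types even though their names match.

```java
import java.io.IOException;
import java.io.InputStream;

public class LoaderConstraintDemo {

    /** Hypothetical stand-in for a shared API type such as org.slf4j.ILoggerFactory. */
    public static class SharedApi { }

    /** Assumption: a simplified child-first loader, loosely modelling a plugin class loader. */
    static class IsolatingLoader extends ClassLoader {
        IsolatingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(SharedApi.class.getName())) {
                return super.loadClass(name, resolve); // normal parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    // Define a second copy of the type in this loader.
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the compiled class file through the parent loader's resources. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                return in.readAllBytes();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads a copy of SharedApi that is defined by a fresh IsolatingLoader. */
    public static Class<?> loadIsolatedCopy() {
        try {
            ClassLoader app = SharedApi.class.getClassLoader();
            return new IsolatingLoader(app).loadClass(SharedApi.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> fromApp = SharedApi.class;
        Class<?> fromPlugin = loadIsolatedCopy();
        System.out.println("same name:  " + fromApp.getName().equals(fromPlugin.getName()));
        System.out.println("same class: " + (fromApp == fromPlugin));
        System.out.println("assignable: " + fromApp.isAssignableFrom(fromPlugin));
    }
}
```

In the sketch the names match while the `Class` objects differ and are not assignable to each other. The error above reports the same situation for `org.slf4j.ILoggerFactory`: `LoggerFactory` was loaded by `PluginClassLoader`, `StaticLoggerBinder` by the `app` loader, and each sees its own copy of the interface.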
I have managed to resolve this issue by modifying the following files:
ivy.xml ->
and plugin.xml ->