Hello! I am trying to run a Dumbo job for our data team, and it fails with the error below. We are using the latest versions of Dumbo and Cloudera.
Command used to run the job:
[benjamin@arya dedup]$ dumbo start jaccard.py -input products -output products-output13 -hadoop /usr/ -hadooplib /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/
Output and stack trace:
13/10/30 13:05:32 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
13/10/30 13:05:32 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead.
packageJobJar: [/home/benjamin/mapreduce/jobs/dedup/typedbytes.pyc, /home/benjamin/mapreduce/jobs/dedup/jaccard.py, /home/benjamin/mapreduce/jobs/dedup/dumbo/backends/common.pyc] [] /tmp/streamjob5478521893861821465.jar tmpDir=null
13/10/30 13:05:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/10/30 13:05:34 INFO mapred.FileInputFormat: Total input paths to process : 1
13/10/30 13:05:35 INFO mapred.JobClient: Running job: job_201310231818_0015
13/10/30 13:05:36 INFO mapred.JobClient: map 0% reduce 0%
13/10/30 13:05:47 INFO mapred.JobClient: Task Id : attempt_201310231818_0015_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.AutoInputFormat not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1649)
    at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:620)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:394)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.AutoInputFormat not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1617)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1641)
Any help would be greatly appreciated!