Hadoop MapReduce with two jars (one of the jars is needed on the namenode only)


The MapReduce task is a simple 'WordCount' implemented in Java (please see http://wiki.apache.org/hadoop/wordcount ).

After the last line, "job.waitForCompletion(true);", I add some code implemented in Jython.

This means the Jython libraries are needed only on the namenode. However, I added all the Jython libraries to a single jar and executed it:

hadoop jar wordcount.jar in out 

The WordCount job completes without any problem.

The problem I want to solve is that the heavy Jython libraries are not needed on the slave nodes (mappers and reducers). The jar is about 15 MB (more than 14 MB of it is Jython).

Can I split them and still get the same results?

Nobody seems to know the answer to this question.

I've solved the problem as follows, even if it's not the best way:

Simply copy jython.jar to /usr/local/hadoop (or whatever path Hadoop is installed in), which is on Hadoop's default classpath, and build the job jar without jython.jar.
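One way to confirm that the driver JVM can actually see Jython after the copy is a small reflective check. This is only a minimal sketch: the class `ClasspathCheck` and its `isAvailable` helper are hypothetical names, and it assumes `org.python.util.PythonInterpreter` is the Jython class the driver code uses.

```java
// Hypothetical helper: reports whether a class can be loaded by the current JVM.
class ClasspathCheck {

    // Returns true if the named class is visible to this class's class loader.
    static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed Jython entry point; replace with whichever Jython class you use.
        String jython = "org.python.util.PythonInterpreter";
        System.out.println(jython + " available: " + isAvailable(jython));
    }
}
```

Calling `isAvailable` at the start of the driver's `main` would let the job fail early with a clear message if jython.jar is not where Hadoop expects it.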

If you do need big libraries in the MapReduce tasks themselves:

  1. Upload jython.jar to HDFS

    hadoop fs -put jython.jar lib/jython.jar

  2. Add the following line to the main code

    DistributedCache.addFileToClassPath(new Path("lib/jython.jar"), job.getConfiguration());
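To show where that call fits, here is a minimal driver sketch. It assumes the older `org.apache.hadoop.filecache.DistributedCache` API, whose `addFileToClassPath` takes a `Path` and a `Configuration`, and a driver shaped like the wiki WordCount example. The class name `WordCountDriver` is hypothetical, and the mapper, reducer, and key/value class setup is left out, so it should be filled in as in the wiki example. It needs the Hadoop jars on the classpath to compile.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class; mapper/reducer configuration omitted for brevity.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCountDriver.class);

        // Register the jar already uploaded to HDFS so that task JVMs get it
        // on their classpath. This must happen before the job is submitted.
        DistributedCache.addFileToClassPath(new Path("lib/jython.jar"), job.getConfiguration());

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}
```

The relative path `lib/jython.jar` mirrors the `hadoop fs -put` target above; both are resolved against the submitting user's HDFS home directory.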

