Monday, November 7, 2016

Setting up Hadoop to run on a Single Node in Ubuntu 15.04

This is tested on hadoop-2.7.3.

An improvement on the Hadoop documentation: http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html

Step 1 

Make sure Java is installed

Installation instructions: http://suhothayan.blogspot.com/2010/02/how-to-set-javahome-in-ubuntu.html
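
A quick way to confirm Java is visible to the shell:

$ java -version
$ echo $JAVA_HOME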

Step 2

Install prerequisites

$ sudo apt-get install ssh
$ sudo apt-get install rsync
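
To verify both were installed:

$ ssh -V
$ rsync --version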

Step 3

Setup Hadoop

$ gedit hadoop-2.7.3/etc/hadoop/core-site.xml

Add the following (replace {user-name} with your system username, e.g. "foo" for /home/foo/)

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.proxyuser.{user-name}.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.{user-name}.hosts</name>
        <value>*</value>
    </property>
</configuration>
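
For example, if the system username is "foo", the two proxyuser properties become:

    <property>
        <name>hadoop.proxyuser.foo.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.foo.hosts</name>
        <value>*</value>
    </property>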

$ gedit hadoop-2.7.3/etc/hadoop/hdfs-site.xml 

Add 

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Step 4

Run

$ ssh localhost 

If it asks for a password, run:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys

Try ssh localhost again.
If it still asks for a password, run the following and try again:

$ ssh-keygen -t rsa
# Press Enter at each prompt
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod og-wx ~/.ssh/authorized_keys 
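
ssh localhost should now log in without prompting for a password; a quick check:

$ ssh localhost 'echo passwordless ssh works'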

Step 5

Format the NameNode

$ ./hadoop-2.7.3/bin/hdfs namenode -format

Step 6 * Not covered in the Hadoop documentation

Replace ${JAVA_HOME} with a hardcoded path in hadoop-env.sh

$ gedit hadoop-2.7.3/etc/hadoop/hadoop-env.sh

Edit the file as follows (replace {path} with the directory containing your JDK)

# The java implementation to use.
export JAVA_HOME={path}/jdk1.8.0_111
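
If unsure where the JDK lives, the following usually reveals it (jdk1.8.0_111 is just an example and the directory will vary with your install):

$ readlink -f $(which java)
# Prints something like {path}/jdk1.8.0_111/bin/java; use everything before /bin/java as JAVA_HOME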

Step 7

Start Hadoop 

$ ./hadoop-2.7.3/sbin/start-all.sh
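
In Hadoop 2.x, start-all.sh is deprecated and simply delegates to the HDFS and YARN start scripts, so the daemons can also be started explicitly:

$ ./hadoop-2.7.3/sbin/start-dfs.sh
$ ./hadoop-2.7.3/sbin/start-yarn.sh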

The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

Browse the web interface for the NameNode:

http://localhost:50070/

Step 8

Check which Hadoop processes are running:

$ jps

Expected output (xxxxx are process IDs):

xxxxx NameNode
xxxxx ResourceManager
xxxxx DataNode
xxxxx NodeManager
xxxxx SecondaryNameNode
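
If any of these daemons is missing, check its log under hadoop-2.7.3/logs/; for example, for the NameNode (the exact file name includes your username and hostname):

$ ls hadoop-2.7.3/logs/
$ tail -n 50 hadoop-2.7.3/logs/hadoop-*-namenode-*.log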

Step 9

Make the HDFS directories required to run MapReduce jobs:

$ ./hadoop-2.7.3/bin/hdfs dfs -mkdir /user
$ ./hadoop-2.7.3/bin/hdfs dfs -mkdir /user/{user-name}
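
As an end-to-end test, the example job from the Hadoop documentation linked above can be run against these directories (the examples jar path assumes the stock hadoop-2.7.3 distribution):

$ ./hadoop-2.7.3/bin/hdfs dfs -put hadoop-2.7.3/etc/hadoop /user/{user-name}/input
$ ./hadoop-2.7.3/bin/hadoop jar hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep /user/{user-name}/input /user/{user-name}/output 'dfs[a-z.]+'
$ ./hadoop-2.7.3/bin/hdfs dfs -cat /user/{user-name}/output/*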