Installing Hadoop in Pseudo-Distributed Mode (Single-Node Cluster) on Ubuntu 16.04

In this post we shall see how to install Hadoop on a single-node cluster (pseudo-distributed mode).

The Hadoop software library is a framework that allows distributed processing of large data sets across clusters of computers (commodity machines included) using simple programming models. It is designed to scale up from a single server to thousands of machines, each offering local computation and storage. “Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.” For more information, see the following link.

  1. Hadoop YARN.
  2. Hadoop Distributed File System (HDFS).
Architecture of Hadoop

The following is the step-by-step procedure for installing a Hadoop single-node cluster.

Ubuntu system.
Hadoop tar (stable version: 2.9.1), which is about 345 MB and can be downloaded from the following link.

java -version (If Java is installed, this prints the version installed on the system.)
Command for installing Java using apt-get:
sudo apt-get install openjdk-8-jre
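
The check and the install hint can be combined into one small guard. This is only a minimal sketch: the function name check_java is illustrative, and the package name is the one used above.

```shell
# check_java: print the first line of `java -version` if Java is on the PATH,
# otherwise print the apt-get command from this post and return non-zero.
check_java() {
  if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr, so redirect it to stdout first
    java -version 2>&1 | head -n 1
  else
    echo "java not found; install it with: sudo apt-get install openjdk-8-jre"
    return 1
  fi
}

check_java || true
```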
Step 1: Add a new user group. (We add the new user/users to this group.)

sudo addgroup hadoop    # the new group hadoop is added
sudo adduser --ingroup hadoop hduser
Adding a new user to the hadoop group.
Logging in as hduser, a member of the hadoop group.

ssh-keygen -t rsa -P ""

We get the following prompt:

Using the command ssh-keygen -t rsa -P "".
Using ssh localhost for passwordless login.
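
For ssh localhost to work without a password, the new public key also has to be listed in the account's authorized_keys file. The exact commands from the original screenshots are not available, so the following is a sketch under that assumption. The function name setup_passwordless_ssh and its directory argument are illustrative, not part of Hadoop or OpenSSH.

```shell
# setup_passwordless_ssh DIR: create an RSA key with an empty passphrase in DIR
# (if none exists yet) and authorize it for logins to this same account.
setup_passwordless_ssh() {
  ssh_dir="${1:-$HOME/.ssh}"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
  if [ ! -f "$ssh_dir/id_rsa" ]; then
    ssh-keygen -q -t rsa -P "" -f "$ssh_dir/id_rsa"
  fi
  # Append the public key only if it is not already authorized
  touch "$ssh_dir/authorized_keys"
  if ! grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys"; then
    cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
  fi
  chmod 600 "$ssh_dir/authorized_keys"
}

# As hduser:  setup_passwordless_ssh "$HOME/.ssh" && ssh localhost
```

Running it a second time is harmless, because the key is neither regenerated nor appended twice.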
hadoop-env.sh in /usr/local/hadoop/etc/hadoop/
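
hadoop-env.sh is where Hadoop reads JAVA_HOME. The value used in the original is not shown, so the following is only a sketch: it derives a candidate JAVA_HOME from the path of the java binary by dropping the trailing /bin/java. The helper name java_home_from and the example path are assumptions; check the path on your own system.

```shell
# java_home_from PATH_TO_JAVA: strip the trailing /bin/java to get a JAVA_HOME candidate.
java_home_from() {
  printf '%s\n' "${1%/bin/java}"
}

# Example path is an assumption (typical for openjdk-8-jre on 64-bit Ubuntu)
java_home_from /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
# prints /usr/lib/jvm/java-8-openjdk-amd64/jre

# On a real system, resolve the symlinked binary first, then put the result
# in hadoop-env.sh as:  export JAVA_HOME=<printed path>
# java_home_from "$(readlink -f "$(command -v java)")"
```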
Create the tmp directory.
Change its ownership.
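
These two steps can be sketched as a single helper. The directory path /app/hadoop/tmp in the example is hypothetical (the original path is not shown), as are the function name prepare_hadoop_tmp, the SUDO variable, and the 750 permission mode. The user and group are the hduser and hadoop created in Step 1.

```shell
# prepare_hadoop_tmp DIR OWNER GROUP: create DIR and hand it over to OWNER:GROUP.
# Set SUDO=sudo when DIR lives under a root-owned location.
prepare_hadoop_tmp() {
  dir="$1"; owner="$2"; group="$3"
  ${SUDO:-} mkdir -p "$dir"
  ${SUDO:-} chown "$owner:$group" "$dir"
  # Assumed mode: owner full access, group read/execute, others none
  ${SUDO:-} chmod 750 "$dir"
}

# Example (hypothetical path):
# SUDO=sudo prepare_hadoop_tmp /app/hadoop/tmp hduser hadoop
```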
Format the NameNode before using Hadoop.
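
A guarded way to run the format step might look like the sketch below. It assumes Hadoop is unpacked under /usr/local/hadoop (the path used for hadoop-env.sh above) unless HADOOP_HOME says otherwise; the function name format_namenode is illustrative. Formatting erases existing HDFS metadata, so it is normally done only once, on a fresh install.

```shell
# format_namenode: run `hdfs namenode -format` from the Hadoop install,
# or report where the hdfs binary was expected if it is missing.
format_namenode() {
  home="${HADOOP_HOME:-/usr/local/hadoop}"
  if [ -x "$home/bin/hdfs" ]; then
    "$home/bin/hdfs" namenode -format
  else
    echo "hdfs not found under $home/bin; check the install path" >&2
    return 1
  fi
}

# As hduser:  format_namenode
```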
Stopping all the started processes.
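
Shutting down can be sketched with the stop scripts shipped in Hadoop's sbin directory. The choice of stop-yarn.sh followed by stop-dfs.sh is an assumption about what the original ran (stop-all.sh also exists in 2.x but is marked deprecated); the function name stop_hadoop and the default install path are illustrative.

```shell
# stop_hadoop: stop YARN first, then HDFS, skipping any script that is missing.
stop_hadoop() {
  home="${HADOOP_HOME:-/usr/local/hadoop}"
  for script in stop-yarn.sh stop-dfs.sh; do
    if [ -x "$home/sbin/$script" ]; then
      "$home/sbin/$script"
    else
      echo "skipping $script: not found under $home/sbin" >&2
    fi
  done
}

# As hduser:  stop_hadoop
# Afterwards, `jps` (shipped with the JDK) should no longer list NameNode or DataNode.
```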

