
How To Install Hadoop On Ubuntu 20

Install Hadoop 3.3.1 on an Ubuntu 20 Machine:

Before you proceed with the installation, please make sure you have installed VMware 16, Ubuntu 20, and Java 8. If you haven’t installed them yet, please go through this blog to install them.

Configure Linux Before Installing Hadoop:

You need to make sure that your Linux machine is configured for the Hadoop installation. So, in this blog, we will first see how to configure your Linux machine and then go through the Hadoop installation. To configure Linux, follow these steps:

  1. Login as Root

    $ sudo su
    Enter password: provide your password.

     

  2. Adding A Dedicated User Called “hduser”.

    This will create a separate user for running Hadoop. But before we add the dedicated user, let’s create a group first.

  3. Create a group called “hadoop”.

    root@ubuntu:/home/sumeet# sudo addgroup hadoop
  4.  Create a User "hduser".

    root@ubuntu:/home/sumeet# sudo adduser hduser
    Adding user `hduser' ...
    Adding new group `hduser' (1004) ...
    Adding new user `hduser' (1002) with group `hduser' ...
    Creating home directory `/home/hduser' ...
    Copying files from `/etc/skel' ...
    New password:  (suggest to use hadoop so that you can remember it easily)
    Retype new password: 
    passwd: password updated successfully
    Changing the user information for hduser
    Enter the new value, or press ENTER for the default
    	Full Name []: 
    	Room Number []: 
    	Work Phone []: 
    	Home Phone []: 
    	Other []: 
    Is the information correct? [Y/n] y

    Note: It is optional to provide the above details. Just hit Enter through the prompts, press 'Y', and you are done.

  5. Add “hduser” to “hadoop” group:

    root@ubuntu:/home/sumeet#  sudo  adduser  hduser hadoop
  6. Add “hduser” to the sudoers list so that “hduser” gets permission to run administrative commands.

    sudo visudo

     Go to the end of the file and add the below line:

    hduser ALL=(ALL) ALL

    Then press Ctrl+X, then Y, then Enter to save it.

    This grants sudo privileges to “hduser” on the local machine.
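    Once you log back in as “hduser” (next step), you can quickly confirm that the sudoers entry works; a minimal check is running a command through sudo, which should print "root":

    hduser@ubuntu:~$ sudo whoami
    [sudo] password for hduser:
    root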

  7. Logout of your system and now, Login as “hduser”.

  8. Now try copy-pasting any file to check that the new user works correctly.

  9. Install OpenSSH on your system.

    sudo apt-get install openssh-server
    Enter Password:

     When prompted with yes/no, press 'Y', wait till it finishes, and you are done.

     
  10. Generate SSH.

    hduser@ubuntu:~$ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/hduser/.ssh/id_rsa):

     Press Enter at each prompt to accept the defaults and continue.

    Now, it’s time to test the SSH setup by connecting the local machine with our “hduser” user.

  11. Copy the public key to the authorized_keys file and set the permission so that ssh becomes passwordless.

    hduser@ubuntu:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    hduser@ubuntu:~$ chmod 700 ~/.ssh/authorized_keys
  12. Now, let’s start ssh:

    hduser@ubuntu:~$ sudo /etc/init.d/ssh  start

    Enter your password when asked. If ssh does not start, use the below command to restart it.

    hduser@ubuntu:~$ sudo /etc/init.d/ssh  restart
  13. Test your ssh connectivity. When asked type "yes".

    hduser@ubuntu:~$ ssh  localhost
    The authenticity of host 'localhost (127.0.0.1)' can't be established.
    ECDSA key fingerprint is SHA256:EvPe69qMTCmunO9dUznIkEO8QKGpACO+vWiKz19+oAk.
    Are you sure you want to continue connecting (yes/no/[fingerprint])? yes

     

  14. Disable IPv6.

    It is advisable to disable IPv6 by editing the sysctl.conf file.

    hduser@ubuntu:~$ sudo vim /etc/sysctl.conf
    [sudo] password for hduser:

     Now, press 'i' to go into insert mode and paste the below lines of code:

    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1
    net.ipv6.conf.lo.disable_ipv6 = 1

     

    Press Esc, then type :wq to save and exit.
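    The new settings are not applied until they are reloaded or the machine is rebooted. Assuming you edited /etc/sysctl.conf as above, you can apply them immediately with:

    hduser@ubuntu:~$ sudo sysctl -p
    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1
    net.ipv6.conf.lo.disable_ipv6 = 1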
  15. Check if IPv6 is disabled.

    The output should be 1. If not, reboot your Ubuntu machine.

    hduser@ubuntu:~$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
    1

    So, we have successfully configured our Linux for Hadoop. We are good to go with the installation of Hadoop.

Download and Install Hadoop on Ubuntu: 

We are going to use the hadoop-3.3.1.tar.gz release in this tutorial. You can also work with other versions. So, follow the steps: 

  1. Download hadoop-3.3.1.tar.gz (available from the Apache Hadoop releases page).

  2. Move the file to /usr/local/ and do the below operations:

    hduser@ubuntu:~$ sudo mv ~/Desktop/hadoop-3.3.1.tar.gz /usr/local/
    Enter password: Enter your password
    
    hduser@ubuntu:~$ cd /usr/local
    // Extract the tar file.
    hduser@ubuntu:/usr/local$ sudo tar -xvf hadoop-3.3.1.tar.gz
    
    // Remove the archive after extracting it.
    hduser@ubuntu:/usr/local$ sudo rm hadoop-3.3.1.tar.gz
    
    // Create a symbolic link to the extracted directory.
    hduser@ubuntu:/usr/local$ sudo ln -s hadoop-3.3.1 hadoop
    
    // Change the ownership of the directory.
    hduser@ubuntu:/usr/local$ sudo chown -R hduser:hadoop hadoop-3.3.1
    
    // Change the permissions of the directory.
    hduser@ubuntu:/usr/local$ sudo chmod 777 hadoop-3.3.1
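    Optionally, you can confirm the unpacked distribution before configuring anything. This is just a quick sanity check and assumes JAVA_HOME is already set in your shell from the Java 8 installation:

    hduser@ubuntu:/usr/local$ hadoop-3.3.1/bin/hadoop version
    Hadoop 3.3.1    -- (first line of the output)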
  3. Edit hadoop-env.sh file and also configure Java.

    First, open the hadoop-env.sh file with the below command:

    hduser@ubuntu:/usr/local$ sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh

     Then press ‘i’ to enter insert mode, go to the end of the file, and paste the below lines: 

    // The first line tells Hadoop's JVM to prefer IPv4 over IPv6.
    export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true 
    export HADOOP_HOME_WARN_SUPPRESS="TRUE" 
    export JAVA_HOME=/usr/local/java/jdk

     

     Press Esc, then type :wq to save and exit.

  4. Update $HOME/.bashrc file

    First, open the .bashrc file. Then type :$ to jump to the last line of the file and press ‘i’ to switch into insert mode.

    Note: make sure you don’t delete any line.

    hduser@ubuntu:/usr/local$ vim ~/.bashrc
    
    export HADOOP_HOME=/usr/local/hadoop 
    export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    export HADOOP_COMMON_HOME=${HADOOP_HOME}
    export HADOOP_HDFS_HOME=${HADOOP_HOME}
    export HADOOP_YARN_HOME=${HADOOP_HOME}
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    
    # Native Path
    export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
    
    # Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on) 
    export JAVA_HOME=/usr/local/java/jdk
    # Some convenient aliases and functions for running Hadoop-related commands 
    unalias fs &> /dev/null 
    alias fs="hadoop fs" 
    unalias hls &> /dev/null 
    alias hls="fs -ls"  
    
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin

    Press Esc, then type :wq to save and exit.

    Note: close the terminal and open a new one for the changes to take effect.
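    Alternatively, you can reload the file in the current shell and verify that the variables are picked up (this assumes the /usr/local/hadoop symlink created earlier exists):

    $ source ~/.bashrc
    $ echo $HADOOP_HOME    -- should print /usr/local/hadoop
    $ which hadoop         -- should print /usr/local/hadoop/bin/hadoop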

  5. Update yarn-site.xml file.

    // Open yarn-site.xml file.
    hduser@ubuntu:/usr/local$ sudo  vim /usr/local/hadoop/etc/hadoop/yarn-site.xml

     Add the below lines between the <configuration> and </configuration> tags.

    
    <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
    </property>
    <property>
       <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
       <name>yarn.nodemanager.vmem-check-enabled</name>
       <value>false</value>
       <description>Whether virtual memory limits will be enforced for containers</description>
    </property>
    <property>
       <name>yarn.nodemanager.vmem-pmem-ratio</name>
       <value>4</value>
       <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
    </property>

      

  6.  Update core-site.xml file.

    // Open the core-site.xml file.
    hduser@ubuntu:/usr/local$ sudo vim /usr/local/hadoop/etc/hadoop/core-site.xml

     Add the below lines between the <configuration> and </configuration> tags. 

     
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/hadoop/tmp</value>  <!-- it could be any path -->
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
        <description>default host and port</description>
    </property>
    <property>
        <name>hadoop.proxyuser.hduser.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hduser.groups</name>
        <value>*</value>
    </property>
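    Note: fs.default.name still works but is deprecated in Hadoop 2.x/3.x in favour of fs.defaultFS. If you prefer the current key, use this property instead:

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>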
    

     

  7.  Create the above “/app/hadoop/tmp” folder and give appropriate permissions. 

    sudo mkdir -p /app/hadoop/tmp
    sudo chown hduser:hadoop -R /app/hadoop/tmp
    sudo chmod 750 /app/hadoop/tmp
  8. Edit mapred-site.xml.

    // Open the mapred-site.xml file.
    $ sudo vim  /usr/local/hadoop/etc/hadoop/mapred-site.xml

     Add the below lines between the <configuration> and </configuration> tags.

    
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>localhost:10020</value>
        <description>Host and port for Job History Server (default 0.0.0.0:10020)</description>
    </property>
    
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>

      

  9. Create a temporary directory that will be used as a base location for DFS. Also, set the ownerships. 

    sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
    sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
    sudo chown hduser:hadoop -R /usr/local/hadoop_tmp/

     

    Note: If you forget to set the ownership, you will see a java.io.IOException when you try to format the NameNode in the next step. 

  10. Update hdfs-site.xml file.

    $ sudo vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml

     Add the below lines between the <configuration> and </configuration> tags.

    
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
    </property>

     

  11. Format NameNode. 

    Open a new terminal and format the HDFS filesystem with the below command: 

    $ hadoop namenode -format

     Note: If the command fails, check the ownership and permissions of the directories, or recheck your .bashrc file.
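     The hadoop namenode command still works but is marked deprecated in Hadoop 3.x; the current equivalent is:

    $ hdfs namenode -format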

  12. Start Your Single Node Cluster.

     Hadoop is now ready to use. You can test it with the below commands.

    $ start-dfs.sh    -- starts NameNode, SecondaryNameNode, DataNode  -- type "yes" if asked
    $ start-yarn.sh   -- starts ResourceManager, NodeManager
    
    $ start-dfs.sh && start-yarn.sh  -- both in a single line
    $ start-all.sh                   -- or start everything at once

     If anything is asked, type “yes”, wait for some time, and you are done.
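     When you want to shut the cluster down later, the matching stop scripts are:

    $ stop-yarn.sh && stop-dfs.sh
    $ stop-all.sh    -- or stop everything at once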

  13. Components like “pig” depend on the history server. To start and stop the history server, use the below commands: 

    $mr-jobhistory-daemon.sh start historyserver   -- starts 
    $mr-jobhistory-daemon.sh stop historyserver   -- stops
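
     Note: on Hadoop 3.x the mr-jobhistory-daemon.sh script still works but is deprecated; the equivalent commands are:

    $ mapred --daemon start historyserver
    $ mapred --daemon stop historyserver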

     

  14. Make sure your Hadoop services are up and running. To check, use the “jps” command as shown below:

    hduser@ubuntu:~/Desktop$ jps
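
     If everything started correctly, you should see output similar to the following (the process IDs will differ, and JobHistoryServer appears only if you started it in the previous step):

    12001 NameNode
    12189 DataNode
    12401 SecondaryNameNode
    12655 ResourceManager
    12820 NodeManager
    13010 JobHistoryServer
    13120 Jps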

     

  15. Now we have to check whether the home folder has been created in HDFS. 

    $ hadoop fs -ls

    Note: If you get an error that says “no such file or directory”, it means the home directory was not created. To create the directory in HDFS, use one of the below commands:

    $ hadoop fs -mkdir -p /user/hduser --- (Deprecated – but works fine)
    $ hdfs dfs -mkdir -p /user/hduser  --- (Use this)

     Now, check again and this time you will not get the error. Since the folder is empty, the command should produce no output.
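
     As a further sanity check, you can copy a local file into your HDFS home directory and list it (the file used here is just an example):

    $ hdfs dfs -put ~/.bashrc /user/hduser/
    $ hdfs dfs -ls    -- the listing should now show .bashrc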

      

  16. Check that Hadoop is accessible through the browser.

    Below are the URLs of each Hadoop service:

    Note: Make sure your Hadoop services are up and running.
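
     For Hadoop 3.x with the default ports, the web UIs are available at:

    NameNode:            http://localhost:9870
    ResourceManager:     http://localhost:8088
    DataNode:            http://localhost:9864
    Job History Server:  http://localhost:19888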
Published By: Sumeet Vishwakarma
Please write comments if you find anything incorrect, or you want to share more information about the topic discussed above.
