
How To Install Hadoop On Ubuntu 20

Install Hadoop 3.3.1 on Ubuntu 20 Machine:

Before you start the installation, please make sure you have installed VMware 16, Ubuntu 20, and Java 8. If you haven’t installed them, please go through this blog to install them.

Configure Linux Before Installing Hadoop:

You need to make sure your Linux machine is configured correctly before installing Hadoop. So, in this blog, we will first configure Linux and then walk through the Hadoop installation. To configure Linux, follow these steps:

  1. Login as Root

    $ sudo su
    Enter password: provide your password.


  2. Adding A Dedicated User Called “hduser”.

    This will create a separate user for running Hadoop. But before we add a dedicated user let’s create a group first.

  3. Create a group called “hadoop”.

    root@ubuntu:/home/sumeet# sudo addgroup hadoop
  4.  Create a User "hduser".

    root@ubuntu:/home/sumeet# sudo adduser hduser
    Adding user `hduser' ...
    Adding new group `hduser' (1004) ...
    Adding new user `hduser' (1002) with group `hduser' ...
    Creating home directory `/home/hduser' ...
    Copying files from `/etc/skel' ...
    New password:  (suggest to use hadoop so that you can remember it easily)
    Retype new password: 
    passwd: password updated successfully
    Changing the user information for hduser
    Enter the new value, or press ENTER for the default
    	Full Name []: 
    	Room Number []: 
    	Work Phone []: 
    	Home Phone []: 
    	Other []: 
    Is the information correct? [Y/n] y

    Note: Providing the above details is optional. Just press Enter through each prompt, type 'Y' to confirm, and you are done.

  5. Add “hduser” to “hadoop” group:

    root@ubuntu:/home/sumeet# sudo adduser hduser hadoop
  6. Add “hduser” to the sudoers list so that “hduser” gets permission to perform administrative tasks.

    sudo visudo

     Go down and paste the below line:

    hduser ALL=(ALL) ALL

    Then press Ctrl+X, then Y to save it.

    This grants “hduser” sudo privileges on the local machine.

  7. Logout of your system and now, Login as “hduser”.

  8. Now try copy-pasting any file to check if it is working or not.

  9. Install OpenSSH on your system.

    sudo apt-get install openssh-server
    Enter Password:

     When prompted [Y/n], press Y, wait until it finishes, and you are done.

  10. Generate SSH.

    hduser@ubuntu:~$ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/hduser/.ssh/id_rsa):

     Press Enter when asked anything to continue.

    Now, it’s time to test the SSH setup by connecting the local machine with our “hduser” user.

  11. Copy the public key to the authorized_keys file and set the permission so that ssh becomes passwordless.

    hduser@ubuntu:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    hduser@ubuntu:~$ chmod 600 ~/.ssh/authorized_keys
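    The key-generation and copy steps can also be done non-interactively. Below is a sketch that demonstrates the same sequence against a temporary directory, so it will not touch your real ~/.ssh (swap in your home directory to apply it for real):

```shell
# Sketch of the passwordless-SSH setup (steps 10-11), run against a
# throwaway directory instead of the real ~/.ssh so it is safe to try.
SSH_DIR="$(mktemp -d)/.ssh"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"                              # ssh insists the .ssh dir is private
ssh-keygen -q -t rsa -N "" -f "$SSH_DIR/id_rsa"   # -N "": empty passphrase, no prompts
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"              # ssh ignores the file if others can write it
stat -c '%a' "$SSH_DIR/authorized_keys"           # prints 600
```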
  12. Now, let’s start ssh:

    hduser@ubuntu:~$ sudo /etc/init.d/ssh  start

    Enter your password when asked. If ssh is not already running, use the below command to restart it.

    hduser@ubuntu:~$ sudo /etc/init.d/ssh  restart
  13. Test your ssh connectivity. When asked type "yes".

    hduser@ubuntu:~$ ssh localhost
    The authenticity of host 'localhost (127.0.0.1)' can't be established.
    ECDSA key fingerprint is SHA256:EvPe69qMTCmunO9dUznIkEO8QKGpACO+vWiKz19+oAk.
    Are you sure you want to continue connecting (yes/no/[fingerprint])? yes


  14. Disable IPv6.

    It is advisable to disable IPv6 by editing the sysctl.conf file.

    hduser@ubuntu:~$ sudo vim /etc/sysctl.conf
    [sudo] password for hduser:

     Now, press 'i' to go into insert mode and paste the below lines of code:

    net.ipv6.conf.all.disable_ipv6 = 1	
    net.ipv6.conf.default.disable_ipv6 = 1	
    net.ipv6.conf.lo.disable_ipv6 = 1


    Press Esc, then type :wq to save and exit.
  15. Check if IPv6 is disabled.

    The output should show 1. If it does not, reboot your Ubuntu machine.

    hduser@ubuntu:~$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
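    If you want the check to be self-explanatory, a tiny helper can interpret the flag (a sketch; `ipv6_status` is a hypothetical name, and the file path is the one from the step above):

```shell
# Interpret the kernel flag: 1 means IPv6 is disabled, anything else
# means it is still enabled.
ipv6_status() {
  if [ "$1" = "1" ]; then
    echo "IPv6 disabled"
  else
    echo "IPv6 still enabled - reboot and re-check"
  fi
}

# Read the live value; fall back to 0 if the file does not exist.
flag="$(cat /proc/sys/net/ipv6/conf/all/disable_ipv6 2>/dev/null || echo 0)"
ipv6_status "$flag"
```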

    So, we have successfully configured our Linux for Hadoop. We are good to go with the installation of Hadoop.

Download and Install Hadoop on Ubuntu: 

We are going to use the hadoop-3.3.1.tar.gz release in this tutorial. You can also work with other versions. So, follow the steps:

  1. Download hadoop-3.3.1.tar.gz from the Apache Hadoop downloads page.

  2. Move the file to /usr/local/ and do the below operations:

    hduser@ubuntu:~$ sudo mv ~/Desktop/hadoop-3.3.1.tar.gz /usr/local/
    Enter password: Enter your password
    hduser@ubuntu:~$ cd /usr/local
    # Extract the tar file.
    hduser@ubuntu:/usr/local$ sudo tar -xvf hadoop-3.3.1.tar.gz
    # Remove the archive after extracting it.
    hduser@ubuntu:/usr/local$ sudo rm hadoop-3.3.1.tar.gz
    # Create a symbolic link to the directory.
    hduser@ubuntu:/usr/local$ sudo ln -s hadoop-3.3.1 hadoop
    # Change the ownership of the directory.
    hduser@ubuntu:/usr/local$ sudo chown -R hduser:hadoop hadoop-3.3.1
    # Change the permissions of the directory.
    hduser@ubuntu:/usr/local$ sudo chmod 777 hadoop-3.3.1
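    The symlink in step 2 is what makes upgrades painless: every config path can refer to /usr/local/hadoop no matter which version is installed. Below is a sketch of the same pattern on a temporary directory (no sudo needed; the 3.3.6 directory is purely hypothetical):

```shell
# Demonstrate the version-agnostic symlink pattern on a temp directory.
base="$(mktemp -d)"
mkdir "$base/hadoop-3.3.1"
ln -s "$base/hadoop-3.3.1" "$base/hadoop"     # "hadoop" now points at the versioned dir
readlink "$base/hadoop"                       # shows the current target

# Upgrading later is just re-pointing the link:
mkdir "$base/hadoop-3.3.6"
ln -sfn "$base/hadoop-3.3.6" "$base/hadoop"   # -f replace existing, -n treat the old link as a file
readlink "$base/hadoop"
```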
  3. Edit the hadoop-env.sh file and configure Java.

    First, open the file with the below command:

    hduser@ubuntu:/usr/local$ sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh

     Then, press ‘i’ to go to insert mode, go down, and paste the below lines: 

    # First line is used to disable IPv6 and continue with IPv4.
    export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
    export JAVA_HOME=/usr/local/java/jdk


     Type esc:wq to save and exit.

  4. Update $HOME/.bashrc file

    First, open the .bashrc file. Then type :$ to go to the last line of the file and press ‘i’ to switch into “insert mode”.

    Note: make sure you don’t delete any line.

    hduser@ubuntu:/usr/local$ vim ~/.bashrc
    export HADOOP_HOME=/usr/local/hadoop
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    # Native Path
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
    # Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
    export JAVA_HOME=/usr/local/java/jdk
    # Some convenient aliases for running Hadoop-related commands
    unalias fs &> /dev/null
    alias fs="hadoop fs"
    unalias hls &> /dev/null
    alias hls="fs -ls"

    Press esc :wq to save and exit.

    Note: close the terminal and open a new terminal to make changes effective.
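    One gotcha with the alias lines: bash only expands aliases in interactive shells, so they will appear to do nothing inside scripts. A sketch of the mechanism, using a stand-in command instead of the real hadoop binary:

```shell
# Aliases are disabled in non-interactive bash unless expand_aliases is set.
shopt -s expand_aliases

unalias fs &> /dev/null          # ignore the error if "fs" was never defined
alias fs="echo hadoop fs"        # stand-in for the real "hadoop fs" wrapper
alias fs                         # prints the alias definition
```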

  5. Update yarn-site.xml file.

    // Open yarn-site.xml file.
    hduser@ubuntu:/usr/local$ sudo  vim /usr/local/hadoop/etc/hadoop/yarn-site.xml

     Add the below lines between … tag.

       Whether virtual memory limits will be enforced for containers
       Ratio between virtual memory to physical memory when setting memory limits for containers


  6.  Update core-site.xml file.

    // Open the core-site.xml file.
    hduser@ubuntu:/usr/local$ sudo vim /usr/local/hadoop/etc/hadoop/core-site.xml

     Add the below lines between the <configuration> ... </configuration> tags. 

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/app/hadoop/tmp</value>  <!-- it could be anything -->
      <description>A base for other temporary directories.</description>
    </property>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
      <description>Default host and port of the NameNode</description>
    </property>
    <property>
      <name>hadoop.proxyuser.hduser.groups</name>
      <value>*</value>
    </property>


  7.  Create the above “/app/hadoop/tmp” folder and give appropriate permissions. 

    sudo mkdir -p /app/hadoop/tmp
    sudo chown hduser:hadoop -R /app/hadoop/tmp
    sudo chmod 750 /app/hadoop/tmp
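    Mode 750 means rwx for the owner, r-x for the hadoop group, and nothing for everyone else. The pattern can be confirmed on a throwaway directory (a sketch; the real path is /app/hadoop/tmp and needs sudo):

```shell
# Verify the chmod 750 pattern without sudo, on a temp directory.
d="$(mktemp -d)"
chmod 750 "$d"
stat -c '%a' "$d"    # prints 750

# For the real directory you would also confirm the owner and group:
#   stat -c '%U:%G %a' /app/hadoop/tmp    # expect: hduser:hadoop 750
```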
  8. Edit mapred-site.xml.

    # Open the mapred-site.xml file.
    $ sudo vim  /usr/local/hadoop/etc/hadoop/mapred-site.xml

     Add the below lines between the <configuration> ... </configuration> tags.

    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>localhost:10020</value>
      <description>Host and port for Job History Server (default 0.0.0.0:10020)</description>
    </property>



  9. Create a temporary directory that will be used as a base location for DFS. Also, set the ownerships. 

    sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
    sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
    sudo chown hduser:hadoop -R /usr/local/hadoop_tmp/


    Note: If you forget to set the ownership, you will see an error at the time you try to format the NameNode in our next step. 

  10. Update hdfs-site.xml file.

    $ sudo vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml

     Add the below lines between the <configuration> ... </configuration> tags (the two directory paths are the ones created in step 9).

    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
    </property>

  11. Format NameNode. 

    Open a New Terminal and format the hdfs cluster with the below command: 

    $ hadoop namenode -format

     Note: If the command doesn’t execute, try checking the ownership and permissions of the directories, or check the .bashrc file.

  12. Start Your Single Node Cluster.

     Hadoop is now ready to use. You can test it with the below commands.

    $ start-dfs.sh                      -- starts NN, SNN, DN  -- type "yes" if anything is asked
    $ start-yarn.sh                     -- starts NodeManager, ResourceManager
    $ start-dfs.sh && start-yarn.sh     -- in a single line

     If anything is asked type “yes”, wait for some time and you are done.

  13. Components like Pig depend on the history server. To start and stop the history server, use the below commands: 

    $ mapred --daemon start historyserver   -- starts it
    $ mapred --daemon stop historyserver    -- stops it


  14. Make sure your Hadoop services are up and running. To check use “jps” command as shown below:

    hduser@ubuntu:~/Desktop$ jps

     You should see NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and Jps listed (plus JobHistoryServer if you started it).


  15. Now we have to check if the home folder is created in hdfs or not. 

    $ hadoop fs -ls

    Note: If you get an error that says “no such file or directory” then it means the home directory was not created. To create a directory in hdfs use the below command:

    $ hadoop fs -mkdir -p /user/hduser --- (Deprecated – but works fine)
    $ hdfs dfs -mkdir -p /user/hduser  --- (Use this)

     Now, check again and this time you will not get any such error. And as the folder is empty you should not get any output.


  16. Check Hadoop is accessible through the browser.

    Below are the default URLs of each Hadoop 3 service:

    NameNode web UI:          http://localhost:9870
    ResourceManager web UI:   http://localhost:8088
    Job History Server UI:    http://localhost:19888

    Note: Make sure your Hadoop services are up and running.
Published By : Sumeet Vishwakarma
Please write comments if you find anything incorrect, or you want to share more information about the topic discussed above.