
How To Install Hadoop On Ubuntu 20

Install Hadoop 3.3.1 on an Ubuntu 20 Machine:

Before you proceed with the installation, please make sure you have installed VMware 16, Ubuntu 20, and Java 8. If you haven’t installed them yet, please go through this blog to install them.

Configure Linux Before Installing Hadoop:

You need to make sure that your Linux machine is configured for the Hadoop installation. So, in this blog, we will first see how to configure your Linux machine and then go through the Hadoop installation. To configure Linux, follow these steps:

  1. Login as Root

    $ sudo su
    Enter password: provide your password.

     

  2. Adding A Dedicated User Called “hduser”.

    This will create a separate user for running Hadoop. But before we add the dedicated user, let’s create a group first.

  3. Create a group called “hadoop”.

    root@ubuntu:/home/sumeet# sudo addgroup hadoop
  4.  Create a User "hduser".

    root@ubuntu:/home/sumeet# sudo adduser hduser
    Adding user `hduser' ...
    Adding new group `hduser' (1004) ...
    Adding new user `hduser' (1002) with group `hduser' ...
    Creating home directory `/home/hduser' ...
    Copying files from `/etc/skel' ...
    New password:  (suggest to use hadoop so that you can remember it easily)
    Retype new password: 
    passwd: password updated successfully
    Changing the user information for hduser
    Enter the new value, or press ENTER for the default
    	Full Name []: 
    	Room Number []: 
    	Work Phone []: 
    	Home Phone []: 
    	Other []: 
    Is the information correct? [Y/n] y

    Note: It is optional to provide the above details. Just hit Enter through the prompts, press 'Y', and you are done.

  5. Add “hduser” to “hadoop” group:

    root@ubuntu:/home/sumeet#  sudo  adduser  hduser hadoop
  6. Add “hduser” to the sudoers list so that “hduser” gets permission to run administrative commands.

    sudo visudo

     Go to the end of the file and add the below line:

    hduser ALL=(ALL) ALL

    Then press Ctrl+X, then Y, then Enter to save it.

    This grants sudo privileges to “hduser” on the local machine.
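    Once you log back in as “hduser” (next step), you can quickly confirm that the sudoers entry works; a minimal check is running a command through sudo, which should print "root":

    hduser@ubuntu:~$ sudo whoami
    [sudo] password for hduser:
    root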

  7. Logout of your system and now, Login as “hduser”.

  8. Now try copy-pasting any file to check that the new user works correctly.

  9. Install OpenSSH on your system.

    sudo apt-get install openssh-server
    Enter Password:

     When prompted with yes/no, press 'Y', wait till it finishes, and you are done.

     
  10. Generate SSH.

    hduser@ubuntu:~$ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/hduser/.ssh/id_rsa):

     Press Enter at each prompt to accept the defaults and continue.

    Now, it’s time to test the SSH setup by connecting the local machine with our “hduser” user.

  11. Copy the public key to the authorized_keys file and set the permission so that ssh becomes passwordless.

    hduser@ubuntu:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    hduser@ubuntu:~$ chmod 700 ~/.ssh/authorized_keys
  12. Now, let’s start ssh:

    hduser@ubuntu:~$ sudo /etc/init.d/ssh  start

    Enter your password when asked. If ssh does not start, use the below command to restart it.

    hduser@ubuntu:~$ sudo /etc/init.d/ssh  restart
  13. Test your ssh connectivity. When asked type "yes".

    hduser@ubuntu:~$ ssh  localhost
    The authenticity of host 'localhost (127.0.0.1)' can't be established.
    ECDSA key fingerprint is SHA256:EvPe69qMTCmunO9dUznIkEO8QKGpACO+vWiKz19+oAk.
    Are you sure you want to continue connecting (yes/no/[fingerprint])? yes

     

  14. Disable IPv6.

    It is advisable to disable IPv6 by editing the sysctl.conf file.

    hduser@ubuntu:~$ sudo vim /etc/sysctl.conf
    [sudo] password for hduser:

     Now, press 'i' to go into insert mode and paste the below lines of code:

    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1
    net.ipv6.conf.lo.disable_ipv6 = 1

     

    Press Esc, then type :wq to save and exit.
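    The new settings are not applied until they are reloaded or the machine is rebooted. Assuming you edited /etc/sysctl.conf as above, you can apply them immediately with:

    hduser@ubuntu:~$ sudo sysctl -p
    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1
    net.ipv6.conf.lo.disable_ipv6 = 1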
  15. Check if IPv6 is disabled.

    The output should be 1. If not, reboot your Ubuntu machine.

    hduser@ubuntu:~$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
    1

    So, we have successfully configured our Linux for Hadoop. We are good to go with the installation of Hadoop.

Download and Install Hadoop on Ubuntu: 

We are going to use the hadoop-3.3.1.tar.gz release in this tutorial. You can also work with other versions. So, follow the steps: 

  1. Download hadoop-3.3.1.tar.gz (available from the Apache Hadoop releases page).

  2. Move the file to /usr/local/ and do the below operations:

    hduser@ubuntu:~$ sudo mv ~/Desktop/hadoop-3.3.1.tar.gz /usr/local/
    Enter password: Enter your password
    
    hduser@ubuntu:~$ cd /usr/local
    // Extract the tar file.
    hduser@ubuntu:/usr/local$ sudo tar -xvf hadoop-3.3.1.tar.gz
    
    // Remove the archive after extracting it.
    hduser@ubuntu:/usr/local$ sudo rm hadoop-3.3.1.tar.gz
    
    // Create a symbolic link to the extracted directory.
    hduser@ubuntu:/usr/local$ sudo ln -s hadoop-3.3.1 hadoop
    
    // Change the ownership of the directory.
    hduser@ubuntu:/usr/local$ sudo chown -R hduser:hadoop hadoop-3.3.1
    
    // Change the permissions of the directory.
    hduser@ubuntu:/usr/local$ sudo chmod 777 hadoop-3.3.1
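    Optionally, you can confirm the unpacked distribution before configuring anything. This is just a quick sanity check and assumes JAVA_HOME is already set in your shell from the Java 8 installation:

    hduser@ubuntu:/usr/local$ hadoop-3.3.1/bin/hadoop version
    Hadoop 3.3.1    -- (first line of the output)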
  3. Edit hadoop-env.sh file and also configure Java.

    First, open the hadoop-env.sh file with the below command:

    hduser@ubuntu:/usr/local$ sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh

     Then press ‘i’ to enter insert mode, go to the end of the file, and paste the below lines: 

    // The first line tells Hadoop's JVM to prefer IPv4 over IPv6.
    export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true 
    export HADOOP_HOME_WARN_SUPPRESS="TRUE" 
    export JAVA_HOME=/usr/local/java/jdk

     

     Press Esc, then type :wq to save and exit.

  4. Update $HOME/.bashrc file

    First, open the .bashrc file. Then type :$ to jump to the last line of the file and press ‘i’ to switch into insert mode.

    Note: make sure you don’t delete any line.

    hduser@ubuntu:/usr/local$ vim ~/.bashrc
    
    export HADOOP_HOME=/usr/local/hadoop 
    export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    export HADOOP_COMMON_HOME=${HADOOP_HOME}
    export HADOOP_HDFS_HOME=${HADOOP_HOME}
    export HADOOP_YARN_HOME=${HADOOP_HOME}
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    
    # Native Path
    export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
    
    # Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on) 
    export JAVA_HOME=/usr/local/java/jdk
    # Some convenient aliases and functions for running Hadoop-related commands 
    unalias fs &> /dev/null 
    alias fs="hadoop fs" 
    unalias hls &> /dev/null 
    alias hls="fs -ls"  
    
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin

    Press Esc, then type :wq to save and exit.

    Note: close the terminal and open a new one for the changes to take effect.
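    Alternatively, you can reload the file in the current shell and verify that the variables are picked up (this assumes the /usr/local/hadoop symlink created earlier exists):

    $ source ~/.bashrc
    $ echo $HADOOP_HOME    -- should print /usr/local/hadoop
    $ which hadoop         -- should print /usr/local/hadoop/bin/hadoop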

  5. Update yarn-site.xml file.

    // Open yarn-site.xml file.
    hduser@ubuntu:/usr/local$ sudo  vim /usr/local/hadoop/etc/hadoop/yarn-site.xml

     Add the below lines between the <configuration> and </configuration> tags.

    
    <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
    </property>
    <property>
       <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
       <name>yarn.nodemanager.vmem-check-enabled</name>
       <value>false</value>
       <description>Whether virtual memory limits will be enforced for containers</description>
    </property>
    <property>
       <name>yarn.nodemanager.vmem-pmem-ratio</name>
       <value>4</value>
       <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
    </property>

      

  6.  Update core-site.xml file.

    // Open the core-site.xml file.
    hduser@ubuntu:/usr/local$ sudo vim /usr/local/hadoop/etc/hadoop/core-site.xml

     Add the below lines between the <configuration> and </configuration> tags. 

     
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/hadoop/tmp</value>  <!-- it could be any path -->
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
        <description>default host and port</description>
    </property>
    <property>
        <name>hadoop.proxyuser.hduser.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hduser.groups</name>
        <value>*</value>
    </property>
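    Note: fs.default.name still works but is deprecated in Hadoop 2.x/3.x in favour of fs.defaultFS. If you prefer the current key, use this property instead:

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>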
    

     

  7.  Create the above “/app/hadoop/tmp” folder and give appropriate permissions. 

    sudo mkdir -p /app/hadoop/tmp
    sudo chown hduser:hadoop -R /app/hadoop/tmp
    sudo chmod 750 /app/hadoop/tmp
  8. Edit mapred-site.xml.

    // Open the mapred-site.xml file.
    $ sudo vim  /usr/local/hadoop/etc/hadoop/mapred-site.xml

     Add the below lines between the <configuration> and </configuration> tags.

    
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>localhost:10020</value>
        <description>Host and port for Job History Server (default 0.0.0.0:10020)</description>
    </property>
    
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>

      

  9. Create a temporary directory that will be used as a base location for DFS. Also, set the ownerships. 

    sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
    sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
    sudo chown hduser:hadoop -R /usr/local/hadoop_tmp/

     

    Note: If you forget to set the ownership, you will see a java.io.IOException when you try to format the NameNode in the next step. 

  10. Update hdfs-site.xml file.

    $ sudo vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml

     Add the below lines between the <configuration> and </configuration> tags.

    
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
    </property>

     

  11. Format NameNode. 

    Open a new terminal and format the HDFS filesystem with the below command: 

    $ hadoop namenode -format

     Note: If the command fails, check the ownership and permissions of the directories, or recheck your .bashrc file.
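     The hadoop namenode command still works but is marked deprecated in Hadoop 3.x; the current equivalent is:

    $ hdfs namenode -format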

  12. Start Your Single Node Cluster.

     Hadoop is now ready to use. You can test it with the below commands.

    $ start-dfs.sh    -- starts NameNode, SecondaryNameNode, DataNode  -- type "yes" if asked
    $ start-yarn.sh   -- starts ResourceManager, NodeManager
    
    $ start-dfs.sh && start-yarn.sh  -- both in a single line
    $ start-all.sh                   -- or start everything at once

     If anything is asked, type “yes”, wait for some time, and you are done.
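     When you want to shut the cluster down later, the matching stop scripts are:

    $ stop-yarn.sh && stop-dfs.sh
    $ stop-all.sh    -- or stop everything at once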

  13. Components like “pig” depend on the history server. To start and stop the history server, use the below commands: 

    $mr-jobhistory-daemon.sh start historyserver   -- starts 
    $mr-jobhistory-daemon.sh stop historyserver   -- stops
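
     Note: on Hadoop 3.x the mr-jobhistory-daemon.sh script still works but is deprecated; the equivalent commands are:

    $ mapred --daemon start historyserver
    $ mapred --daemon stop historyserver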

     

  14. Make sure your Hadoop services are up and running. To check, use the “jps” command as shown below:

    hduser@ubuntu:~/Desktop$ jps
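
     If everything started correctly, you should see output similar to the following (the process IDs will differ, and JobHistoryServer appears only if you started it in the previous step):

    12001 NameNode
    12189 DataNode
    12401 SecondaryNameNode
    12655 ResourceManager
    12820 NodeManager
    13010 JobHistoryServer
    13120 Jps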

     

  15. Now we have to check whether the home folder has been created in HDFS. 

    $ hadoop fs -ls

    Note: If you get an error that says “no such file or directory”, it means the home directory was not created. To create the directory in HDFS, use one of the below commands:

    $ hadoop fs -mkdir -p /user/hduser --- (Deprecated – but works fine)
    $ hdfs dfs -mkdir -p /user/hduser  --- (Use this)

     Now, check again and this time you will not get the error. Since the folder is empty, the command should produce no output.
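
     As a further sanity check, you can copy a local file into your HDFS home directory and list it (the file used here is just an example):

    $ hdfs dfs -put ~/.bashrc /user/hduser/
    $ hdfs dfs -ls    -- the listing should now show .bashrc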

      

  16. Check that Hadoop is accessible through the browser.

    Below are the URLs of each Hadoop service:

    Note: Make sure your Hadoop services are up and running.
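
     For Hadoop 3.x with the default ports, the web UIs are available at:

    NameNode:            http://localhost:9870
    ResourceManager:     http://localhost:8088
    DataNode:            http://localhost:9864
    Job History Server:  http://localhost:19888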
Published By: Sumeet Vishwakarma
Please write comments if you find anything incorrect, or you want to share more information about the topic discussed above.
