YARN (Yet Another Resource Negotiator) was introduced in Hadoop 2.x to allocate resources for various jobs and to schedule tasks. Earlier, in Hadoop 1.x, MapReduce was the only framework that could be executed on the Hadoop cluster.
In Hadoop YARN, a couple of daemons were replaced, so the daemon lineup became:
- Secondary NameNode
- Resource Manager (Replacement of Job Tracker)
- Node Manager (Replacement of Task Tracker)
- Data Node
So, what are the Resource Manager and Node Manager?
The Resource Manager is the master daemon; it runs on the master/NameNode machine. It processes incoming requests and then passes them on to the corresponding Node Managers.
The Node Manager is a daemon that runs on each slave machine/DataNode. It is responsible for executing the tasks on its Data Node.
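The master/slave relationship above can be pictured with a small toy sketch (plain Python, not Hadoop code; all class and method names are invented for illustration): each Node Manager registers its slave node's resources with the Resource Manager, which maintains the cluster-wide view.

```python
# Toy sketch (not Hadoop code): a Resource Manager tracking the
# Node Managers that register from the slave machines.

class NodeManager:
    def __init__(self, host, memory_mb, vcores):
        self.host = host
        self.memory_mb = memory_mb   # capacity this slave node reports
        self.vcores = vcores

class ResourceManager:
    def __init__(self):
        self.nodes = {}              # host -> NodeManager

    def register(self, nm):
        # Each slave's Node Manager registers with the master's Resource Manager
        self.nodes[nm.host] = nm

    def total_capacity(self):
        # The master daemon's view of cluster-wide resources
        return (sum(n.memory_mb for n in self.nodes.values()),
                sum(n.vcores for n in self.nodes.values()))

rm = ResourceManager()
rm.register(NodeManager("datanode-1", 8192, 4))
rm.register(NodeManager("datanode-2", 4096, 2))
print(rm.total_capacity())  # (12288, 6)
```

In real Hadoop this registration and reporting happens over RPC heartbeats; the sketch only shows the division of roles.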
Features of YARN:
Compatibility: Hadoop 1 MapReduce applications can easily run on YARN without any disruption.
Scalability: Whenever the number of nodes in a Hadoop cluster needs to grow, the Resource Manager can scale to manage them as well.
Multi-Tenancy: Different engines that access data on the Hadoop cluster can work in sync efficiently.
Cluster Utilization: YARN allocates cluster resources dynamically rather than statically, which enables better utilization of the Hadoop cluster.
Components of Hadoop YARN:
The YARN components are as follows:
Client: It submits a MapReduce job.
Resource Manager: It accepts the job submitted by the client, and it assigns and manages resources for all the applications. It has two components of its own:
- Scheduler: It allocates resources to the applications. It does not monitor or track the status of any application, and it offers no guarantees about restarting tasks that fail.
- Applications Manager: It is responsible for accepting job submissions and negotiating the first container from the Resource Manager. It launches the Application Master on a slave machine, monitors its progress, and re-launches it in case of failure.
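The Scheduler's "allocate only, no tracking" role can be sketched as a toy FIFO allocator (illustrative Python, not YARN's actual implementation; YARN ships pluggable schedulers such as the CapacityScheduler and FairScheduler):

```python
from collections import deque

# Toy FIFO scheduler sketch: hands out cluster memory to queued
# applications in submission order, and does nothing about failed
# tasks -- matching the Scheduler's allocate-only role above.

class FifoScheduler:
    def __init__(self, cluster_memory_mb):
        self.free_mb = cluster_memory_mb
        self.queue = deque()

    def submit(self, app_id, needed_mb):
        self.queue.append((app_id, needed_mb))

    def allocate(self):
        granted = []
        # Grant from the head of the queue while memory remains
        while self.queue and self.queue[0][1] <= self.free_mb:
            app_id, needed_mb = self.queue.popleft()
            self.free_mb -= needed_mb
            granted.append(app_id)
        return granted

sched = FifoScheduler(cluster_memory_mb=10240)
sched.submit("app-1", 4096)
sched.submit("app-2", 4096)
sched.submit("app-3", 4096)
print(sched.allocate())  # ['app-1', 'app-2']; app-3 waits for free memory
```

Note what the sketch deliberately omits: once memory is granted, the scheduler never checks on the application again — monitoring is the job of the Applications Manager and the Application Master.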
Node Manager: It is a daemon that runs on the slave machines; every slave machine has a Node Manager running. It monitors the operation of the containers and sends reports to the Resource Manager.
Application Master: Every job submitted to the framework is an application, and each application gets its own Application Master. It negotiates resources from the Resource Manager, and it monitors and tracks the progress of the tasks. It also sends periodic health reports to the Resource Manager.
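The Application Master's loop — negotiate containers, track task progress, report back — can be sketched like this (toy Python; in real Hadoop the Application Master uses Java APIs and RPC heartbeats, and every name here is invented for illustration):

```python
# Toy sketch of an Application Master: containers granted by the
# Resource Manager each run one task; progress is reported back
# periodically.

class ApplicationMaster:
    def __init__(self, app_id, num_tasks):
        self.app_id = app_id
        self.remaining = num_tasks

    def run_granted_containers(self, granted_containers):
        # Each granted container executes one of the remaining tasks
        finished = min(granted_containers, self.remaining)
        self.remaining -= finished
        return finished

    def heartbeat(self):
        # Periodic health/progress report sent to the Resource Manager
        return {"app": self.app_id, "tasks_left": self.remaining}

am = ApplicationMaster("app-42", num_tasks=5)
am.run_granted_containers(3)
print(am.heartbeat())  # {'app': 'app-42', 'tasks_left': 2}
```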
Container: It is a collection of physical resources (RAM, CPU cores, etc.) on a single node, within which the tasks of a job are executed (typically inside a JVM). When a request reaches the Resource Manager, it launches a container; once it is launched, the Application Master executes the task within the container, and the result is sent back to the client.
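Since a container is just a bundle of one node's physical resources, it can be modeled in a few lines (a toy Python sketch under that assumption, not Hadoop code):

```python
# Toy sketch: a container carved out of a single node's capacity.
# A launch succeeds only if the node still has enough free resources.

class Node:
    def __init__(self, memory_mb, vcores):
        self.memory_mb = memory_mb
        self.vcores = vcores

    def launch_container(self, memory_mb, vcores):
        if memory_mb > self.memory_mb or vcores > self.vcores:
            return None  # not enough free resources on this node
        self.memory_mb -= memory_mb
        self.vcores -= vcores
        return {"memory_mb": memory_mb, "vcores": vcores}

node = Node(memory_mb=8192, vcores=4)
print(node.launch_container(2048, 1))  # {'memory_mb': 2048, 'vcores': 1}
print(node.launch_container(8192, 4))  # None: only 6144 MB / 3 cores left
```

This is why the Resource Manager needs the Node Managers' reports: it can only place a container on a node that actually has the requested resources free.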