YARN Architecture

Apache YARN is one of the core components of Hadoop. YARN (Yet Another Resource Negotiator) is the resource management layer of the Hadoop architecture and was introduced in Hadoop 2.x. To process the data stored in HDFS, YARN allows different data processing engines to run on the same cluster, for example:

  • Graph Processing

  • Interactive Processing

  • Stream Processing

  • Batch Processing

YARN provides job scheduling and is often described as the data operating system of Hadoop 2.x. It allows different frameworks to run on the same hardware on which Hadoop is deployed.

YARN Architecture:-

YARN Components:-

Apache YARN has three core components:

  1. Resource Manager

  2. Application Master

  3. Node Manager

Resource Manager:-

Resource Manager is the master component that arbitrates all the available cluster resources, and so helps manage the distributed applications running on the YARN system. It works together with the other two components, Application Master and Node Manager.

Resource Manager has two main subcomponents: Application Manager and Resource Scheduler.

  1. Application Manager:-

Application Manager is responsible for managing all the submitted applications. It validates each application's specification and rejects the application if it requests resources for its Application Master that cannot be satisfied. It forwards admitted applications to the scheduler. Application Manager is also responsible for recording and managing finished applications for a while, before they are completely evicted from the Resource Manager's memory.

  2. Resource Scheduler:-

Resource Scheduler is responsible for allocating resources to the running applications. It is a pure scheduler: it does not monitor or track the progress of applications.
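The division of work between the two subcomponents can be illustrated with a small Python model. This is only a sketch under stated assumptions, not YARN's actual implementation: the class names, the `max_allocation_mb` ceiling and the memory-only resource model are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AppSpec:
    """Hypothetical submission: a name plus the memory its Application Master needs."""
    name: str
    am_memory_mb: int


@dataclass
class ResourceScheduler:
    """Toy pure scheduler: it hands out memory and keeps no record of progress."""
    free_memory_mb: int

    def allocate(self, memory_mb: int) -> bool:
        if memory_mb > self.free_memory_mb:
            return False  # not enough free capacity right now
        self.free_memory_mb -= memory_mb
        return True


@dataclass
class ApplicationManager:
    """Toy admission control that sits in front of the scheduler."""
    scheduler: ResourceScheduler
    max_allocation_mb: int  # assumed ceiling for a single request
    admitted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def submit(self, spec: AppSpec) -> str:
        # Reject requests that could never be satisfied.
        if spec.am_memory_mb <= 0 or spec.am_memory_mb > self.max_allocation_mb:
            self.rejected.append(spec.name)
            return "rejected"
        # Forward admitted applications to the scheduler.
        self.admitted.append(spec.name)
        if self.scheduler.allocate(spec.am_memory_mb):
            return "scheduled"
        return "pending"  # admitted, but waiting for free capacity
```

Note that the scheduler in this sketch only answers allocation requests; any bookkeeping about which applications were admitted or rejected lives in the Application Manager.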

Application Master:-

Application Master negotiates containers from the Resource Manager and works with the Node Managers to run and monitor the tasks of a specific job. It runs on one of the slave machines, and one Application Master is engaged for each separate job. Hadoop 2 provides high availability for both storage and processing through an Active/Standby architecture: at any point in time one of the masters is Active, while the other Resource Managers are in Standby mode, waiting to take over if anything happens to the Active one.

From version 2 of Hadoop, YARN can run two Resource Managers: the Resource Manager and a standby, referred to here as HARM (High Availability Resource Manager). This lets Hadoop keep operating without interruption, because when the Resource Manager goes down the HARM automatically takes over, and vice versa.
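The Active/Standby behaviour can be sketched in a few lines of Python. This is a toy model, not how YARN elects its active Resource Manager; the class `ResourceManagerPair` and the names `rm1` and `rm2` are hypothetical.

```python
class ResourceManagerPair:
    """Toy Active/Standby pair of Resource Managers (names are hypothetical)."""

    def __init__(self, names=("rm1", "rm2")):
        self.healthy = {name: True for name in names}
        self.active = names[0]  # assumption: the first one starts as Active

    def mark_down(self, name):
        self.healthy[name] = False
        if name == self.active:
            self._fail_over()

    def mark_up(self, name):
        # A recovered Resource Manager rejoins as Standby
        # unless there is currently no Active one at all.
        self.healthy[name] = True
        if self.active is None:
            self.active = name

    def _fail_over(self):
        # Promote the first healthy Standby, if there is one.
        standbys = [n for n, ok in self.healthy.items() if ok]
        self.active = standbys[0] if standbys else None
```

For example, after `pair.mark_down("rm1")` the standby `rm2` becomes Active, and when `rm1` comes back it stays in Standby until `rm2` fails in turn.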

Node Manager:-

Node Manager is responsible for executing tasks and for starting and stopping containers. It tracks the health of the node on which it runs, monitors the resource usage of its containers, and reports this to the Resource Manager.
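A minimal Python sketch of these duties follows. It assumes a memory-only resource model and a dictionary-shaped status report; the class, its fields and the report keys are hypothetical and do not mirror YARN's real heartbeat protocol.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy Node Manager: starts and stops containers and reports node status."""
    node_id: str
    total_memory_mb: int
    healthy: bool = True
    containers: dict = field(default_factory=dict)  # container id -> memory in MB

    def used_memory_mb(self) -> int:
        return sum(self.containers.values())

    def start_container(self, container_id: str, memory_mb: int) -> None:
        if memory_mb > self.total_memory_mb - self.used_memory_mb():
            raise ValueError("not enough free memory on this node")
        self.containers[container_id] = memory_mb

    def stop_container(self, container_id: str) -> None:
        self.containers.pop(container_id, None)

    def heartbeat(self) -> dict:
        """Build the status report that would be sent to the Resource Manager."""
        return {
            "node_id": self.node_id,
            "healthy": self.healthy,
            "used_memory_mb": self.used_memory_mb(),
            "running_containers": sorted(self.containers),
        }
```

In this sketch the Resource Manager would call, or periodically receive, `heartbeat()` to learn which containers are running and how much memory they use.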

The application startup process:-

  1. Client submits a job.

  2. A request goes to Resource Manager.

  3. Resource Manager hands over responsibility for executing the job to Application Manager.

  4. Application Manager fetches the metadata of the input file from the NameNode.

  5. Application Manager identifies the slave node that is closest to all the other slaves.

  6. Application Manager orders the Node Manager of that slave to start the Application Master and hands over the job responsibilities to it.

  7. Application Master divides the job into tasks.

  8. Application Master selects a Node Manager (slave) for each task.

  9. Application Master estimates the container capacity needed on each Node Manager.

  10. Application Master sends the estimated container request to the Resource Scheduler.

  11. Once the containers are granted, Application Master orders the selected Node Managers to start them.

  12. The Node Managers start their containers.

  13. Application Master informs Application Manager.

  14. After approval from Application Manager, Application Master orders the Node Managers to initiate and execute their tasks.
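The startup steps listed above can be traced with a short Python simulation. It is a sketch under simplifying assumptions: one task per input block, round-robin placement of tasks on nodes, and the first node hosting the Application Master. The function name and log messages are hypothetical, and a real cluster places tasks using data locality and available capacity rather than round-robin.

```python
def simulate_startup(input_blocks, nodes):
    """Trace a simplified application startup; returns (log, assignment)."""
    if not nodes:
        raise ValueError("at least one node is required")

    log = [
        "Client -> ResourceManager: submit job",
        "ResourceManager -> ApplicationManager: hand over job",
    ]

    # Assumption: the Application Master is started on the first node.
    am_node = nodes[0]
    log.append(f"ApplicationManager -> NodeManager({am_node}): start ApplicationMaster")

    # The Application Master creates one task per input block and
    # spreads the tasks over the nodes in round-robin order.
    assignment = {node: [] for node in nodes}
    for index, block in enumerate(input_blocks):
        node = nodes[index % len(nodes)]
        assignment[node].append(f"task-{block}")

    # One container is requested from the scheduler for each task.
    requested = sum(len(tasks) for tasks in assignment.values())
    log.append(f"ApplicationMaster -> ResourceScheduler: request {requested} containers")

    # Once granted, the selected Node Managers start their containers.
    for node, tasks in assignment.items():
        if tasks:
            log.append(
                f"ApplicationMaster -> NodeManager({node}): start {len(tasks)} containers"
            )

    log.append("ApplicationMaster -> ApplicationManager: containers ready")
    log.append("ApplicationMaster -> NodeManagers: execute tasks")
    return log, assignment
```

Calling `simulate_startup(["b0", "b1", "b2"], ["node-1", "node-2"])` returns the message log together with a mapping from each node to the tasks placed on it.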

About COEPD

COEPD is primarily a community of Business Analysts. Its objective is to minimize project failures by contributing to the field of Business Analysis. BAs who are committed to this cause came together to form the COEPD community. Through COEPD, we strive to raise awareness of the Business Analyst role and of the benefits of having a BA on a project. As part of this, we share Business Analysis knowledge with all enthusiastic professionals who are keen to move into the BA role.
