YARN & Resource Management

YARN (Yet Another Resource Negotiator), introduced in Hadoop 2.x, decouples resource management from data processing. This allows Hadoop clusters to run MapReduce, Apache Spark, Flink, Tez, and other frameworks side-by-side.

YARN Architecture

Component                Role
ResourceManager (RM)     Master daemon; allocates cluster resources globally
NodeManager (NM)         Per-machine daemon; manages containers and reports health
ApplicationMaster (AM)   Per-job daemon; negotiates resources and monitors tasks
Container                A unit of resource allocation (CPU cores + memory)

How a Job is Submitted

  1. Client submits application to the ResourceManager.
  2. RM allocates a container to start the ApplicationMaster.
  3. AM requests containers from the RM for individual tasks.
  4. RM grants containers; AM launches tasks on the corresponding NodeManagers.
  5. Tasks run, report progress to AM, and write output to HDFS.
  6. AM reports completion to RM; resources are released.
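
The six steps above can be sketched as a toy simulation. This is only an illustration of the control flow, not how YARN is implemented: the ToyResourceManager class, the run_application function, and the idea of counting containers as plain integers are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for the RM: tracks only a count of free containers."""
    free_containers: int
    log: list = field(default_factory=list)

    def allocate(self, requested: int, purpose: str) -> int:
        # Grant as many containers as are free, up to the requested number.
        granted = min(requested, self.free_containers)
        self.free_containers -= granted
        self.log.append(f"RM granted {granted} container(s) for {purpose}")
        return granted

    def release(self, count: int) -> None:
        self.free_containers += count
        self.log.append(f"RM reclaimed {count} container(s)")


def run_application(rm: ToyResourceManager, num_tasks: int) -> list:
    # Steps 1-2: the client submits; the RM allocates one container for the AM.
    am_container = rm.allocate(1, "ApplicationMaster")
    if am_container == 0:
        rm.log.append("Application waits: no container available for the AM")
        return rm.log

    # Steps 3-5: the AM requests task containers; if the cluster is too small,
    # tasks run in several waves as containers are released and re-granted.
    remaining = num_tasks
    while remaining > 0:
        granted = rm.allocate(remaining, "tasks")
        if granted == 0:
            rm.log.append("AM waits: no free containers for tasks")
            break
        rm.log.append(f"AM ran {granted} task(s)")
        rm.release(granted)
        remaining -= granted

    # Step 6: the AM reports completion and its own container is released.
    rm.release(am_container)
    return rm.log


if __name__ == "__main__":
    for line in run_application(ToyResourceManager(free_containers=4), num_tasks=5):
        print(line)
```

With four free containers and five tasks, one container goes to the AM, so the tasks run in two waves (three, then two) before everything is returned to the pool.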

Resource Configuration

Key settings in yarn-site.xml:

<configuration>
  <!-- Total memory per NodeManager (MB) -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>

  <!-- Maximum memory a single container can request (MB) -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>

  <!-- Total vCores per NodeManager -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
  </property>
</configuration>
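
To see how these three values interact, the following sketch estimates how many containers of a given size fit on one node. It is a rough calculation under stated assumptions: the function name containers_per_node is made up for this example, the scheduler is assumed to account for both memory and vCores (the default resource calculator considers memory only), and rounding to the minimum allocation size is ignored.

```python
def containers_per_node(node_memory_mb, node_vcores,
                        container_memory_mb, container_vcores,
                        max_allocation_mb):
    """Estimate how many identical containers fit on one NodeManager."""
    # A request larger than yarn.scheduler.maximum-allocation-mb is rejected.
    if container_memory_mb > max_allocation_mb:
        raise ValueError(
            f"{container_memory_mb} MB exceeds the {max_allocation_mb} MB maximum allocation"
        )
    by_memory = node_memory_mb // container_memory_mb
    by_vcores = node_vcores // container_vcores
    # The scarcer resource determines how many containers fit.
    return min(by_memory, by_vcores)


if __name__ == "__main__":
    # Values taken from the yarn-site.xml above; the container sizes are examples.
    print(containers_per_node(8192, 4, 2048, 1, 4096))
    print(containers_per_node(8192, 4, 4096, 1, 4096))
```

With the settings shown above, a node fits four containers of 2048 MB and 1 vCore each, or two containers of 4096 MB each. A request for more than 4096 MB is refused regardless of how much memory the node has free.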

YARN Schedulers

Capacity Scheduler (default)

The cluster is divided into queues, each with a guaranteed capacity. Useful for multi-tenant clusters shared across teams.
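
As a rough illustration of guaranteed capacity, the sketch below converts per-queue percentages into memory amounts. The queue names (engineering, analytics), the ten-node cluster size, and the function guaranteed_memory_mb are hypothetical; the real scheduler also lets queues borrow idle capacity, which this calculation ignores.

```python
def guaranteed_memory_mb(cluster_memory_mb, queue_capacity_percent):
    """Translate per-queue capacity percentages into guaranteed memory (MB)."""
    total = sum(queue_capacity_percent.values())
    # In this sketch, sibling queue capacities must add up to exactly 100 percent.
    if total != 100:
        raise ValueError(f"queue capacities sum to {total}, expected 100")
    return {
        queue: cluster_memory_mb * percent // 100
        for queue, percent in queue_capacity_percent.items()
    }


if __name__ == "__main__":
    # Hypothetical cluster: 10 NodeManagers with 8192 MB each.
    cluster_mb = 10 * 8192
    print(guaranteed_memory_mb(cluster_mb, {"engineering": 60, "analytics": 40}))
```

On this hypothetical 81920 MB cluster, the engineering queue is guaranteed 49152 MB and the analytics queue 32768 MB, even when the other team's queue is busy.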

Fair Scheduler

All jobs get, on average, an equal share of resources over time. Short jobs can finish in a reasonable time instead of waiting behind a large job, and long-running jobs are not starved.
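
One common way to formalize "equal share" is max-min fairness: no job receives more than it asks for, and leftover capacity is split evenly among the jobs that still want more. The sketch below implements that idea. It is not claimed to be the Fair Scheduler's exact algorithm, and the function name max_min_fair_share and the job names are assumptions for illustration.

```python
def max_min_fair_share(capacity, demands):
    """Split capacity among jobs using max-min fairness.

    demands maps a job name to the amount of resource it wants.
    """
    shares = {}
    pending = dict(demands)
    remaining = capacity
    while pending:
        equal = remaining / len(pending)
        # Jobs that want no more than an equal split are fully satisfied.
        small = [job for job, demand in pending.items() if demand <= equal]
        if not small:
            # Every remaining job wants more than an equal split: divide evenly.
            for job in pending:
                shares[job] = equal
            break
        for job in small:
            shares[job] = pending.pop(job)
            remaining -= shares[job]
    return shares


if __name__ == "__main__":
    # Hypothetical demands, measured in containers, on a 12-container cluster.
    print(max_min_fair_share(12, {"short": 2, "medium": 4, "large": 20}))
```

In this example the short and medium jobs receive everything they asked for (2 and 4 containers), and the large job gets the remaining 6 rather than monopolizing the cluster.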

Useful YARN Commands

# List applications (by default, those in the SUBMITTED, ACCEPTED, or RUNNING state)
yarn application -list

# Check application status
yarn application -status <application_id>

# Kill a running application
yarn application -kill <application_id>

# View application logs
yarn logs -applicationId <application_id>

# Check cluster node status
yarn node -list
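
When these commands are called from a script, it can help to validate the application ID before running anything destructive such as -kill. The following is a minimal sketch that wraps only the commands listed above. The helper names build_yarn_command and run_yarn are made up for this example, and run_yarn assumes the yarn executable is on the PATH.

```python
import re
import subprocess

# Application IDs look like application_<cluster timestamp>_<sequence number>.
APP_ID_PATTERN = re.compile(r"^application_\d+_\d+$")

# Actions that need no application ID.
CLUSTER_COMMANDS = {
    "list": ["yarn", "application", "-list"],
    "nodes": ["yarn", "node", "-list"],
}

# Actions that operate on a single application.
APP_COMMANDS = {
    "status": ["yarn", "application", "-status"],
    "kill": ["yarn", "application", "-kill"],
    "logs": ["yarn", "logs", "-applicationId"],
}


def build_yarn_command(action, application_id=None):
    """Return the argument list for one of the commands shown above."""
    if action in CLUSTER_COMMANDS:
        return list(CLUSTER_COMMANDS[action])
    if action not in APP_COMMANDS:
        raise ValueError(f"unknown action: {action}")
    if application_id is None or not APP_ID_PATTERN.match(application_id):
        raise ValueError(f"invalid application ID: {application_id!r}")
    return APP_COMMANDS[action] + [application_id]


def run_yarn(action, application_id=None):
    """Run the command and return its standard output (requires a YARN client)."""
    result = subprocess.run(
        build_yarn_command(action, application_id),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, build_yarn_command("kill", "application_1700000000000_0001") returns the same argument list you would type by hand, while a malformed ID raises ValueError before any process is started.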

YARN Web UI

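Access the ResourceManager UI at http://localhost:8088 (8088 is the default port; on a multi-node cluster, replace localhost with the ResourceManager host) to: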

  • Monitor running and completed jobs
  • Drill into individual task attempt logs
  • View per-node resource utilization
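
The same address also serves the ResourceManager REST API, which is convenient when you want the job list in a script rather than a browser. The sketch below reads the /ws/v1/cluster/apps endpoint. The helper names (apps_url, summarize_apps, fetch_app_summaries) are assumptions for this example, and the code assumes an unsecured ResourceManager reachable at the address used above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

RM_BASE_URL = "http://localhost:8088"  # same address as the web UI above


def apps_url(base_url=RM_BASE_URL, states=None):
    """Build the URL of the cluster applications endpoint."""
    url = f"{base_url}/ws/v1/cluster/apps"
    if states:
        # Optional filter, e.g. states=["RUNNING"].
        url += "?" + urlencode({"states": ",".join(states)})
    return url


def summarize_apps(payload):
    """Extract (id, name, state) tuples from the decoded JSON response."""
    # The response nests the list as {"apps": {"app": [...]}};
    # "apps" is null when no application matches.
    apps = (payload.get("apps") or {}).get("app") or []
    return [(app.get("id"), app.get("name"), app.get("state")) for app in apps]


def fetch_app_summaries(base_url=RM_BASE_URL, states=None):
    """Query a live ResourceManager (needs network access to the RM)."""
    with urlopen(apps_url(base_url, states), timeout=10) as response:
        return summarize_apps(json.load(response))
```

Calling fetch_app_summaries(states=["RUNNING"]) against a live cluster returns one tuple per running application; summarize_apps can also be tested offline with a saved response.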

Next Steps

Explore The Hadoop Ecosystem to learn about the higher-level tools built on HDFS and YARN.