YARN & Resource Management
YARN (Yet Another Resource Negotiator), introduced in Hadoop 2.x, decouples resource management from data processing. This allows a single Hadoop cluster to run MapReduce, Apache Spark, Flink, Tez, and other frameworks side by side.
YARN Architecture
| Component | Role |
|---|---|
| ResourceManager (RM) | Master daemon; allocates cluster resources globally |
| NodeManager (NM) | Per-machine daemon; manages containers and reports health |
| ApplicationMaster (AM) | Per-application process; negotiates resources with the RM and monitors the application's tasks |
| Container | A unit of resource allocation (memory + vcores) on a single node |
How a Job is Submitted
- The client submits an application to the ResourceManager.
- The RM allocates a container on a NodeManager and starts the ApplicationMaster in it.
- The AM requests containers from the RM for individual tasks.
- The RM grants containers; the AM asks the corresponding NodeManagers to launch the tasks in them.
- Tasks run, report progress to the AM, and write output to HDFS.
- The AM reports completion to the RM; its containers are released.
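The steps above can be sketched as a toy simulation. This is only an illustration of the control flow, not how YARN is implemented: the `ResourceManager`, `NodeManager`, and `run_application` names, the memory-only accounting, and the first-fit placement are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    name: str
    free_mb: int
    containers: list = field(default_factory=list)  # (purpose, mb) pairs


class ResourceManager:
    """Toy RM (hypothetical): memory-only containers, first node that fits."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, purpose, mb):
        for node in self.nodes:
            if node.free_mb >= mb:
                node.free_mb -= mb
                node.containers.append((purpose, mb))
                return node
        return None  # no node currently has enough free memory

    def release(self, purpose_prefix):
        for node in self.nodes:
            kept = []
            for purpose, mb in node.containers:
                if purpose.startswith(purpose_prefix):
                    node.free_mb += mb  # give the memory back to the node
                else:
                    kept.append((purpose, mb))
            node.containers = kept


def run_application(rm, app_id, num_tasks, am_mb=1024, task_mb=2048):
    events = []
    # Steps 1-2: the client submits; the RM allocates a container for the AM.
    am_node = rm.allocate(f"{app_id}/am", am_mb)
    if am_node is None:
        return ["no room for ApplicationMaster"]
    events.append(f"AM started on {am_node.name}")
    # Steps 3-4: the AM requests task containers; the RM grants what fits.
    for i in range(num_tasks):
        node = rm.allocate(f"{app_id}/task{i}", task_mb)
        events.append(f"task{i} on {node.name}" if node else f"task{i} pending")
    # Steps 5-6: tasks finish, the AM reports completion, containers are freed.
    rm.release(f"{app_id}/")
    events.append("application finished; containers released")
    return events


rm = ResourceManager([NodeManager("nm1", 8192), NodeManager("nm2", 8192)])
for event in run_application(rm, "app_1", num_tasks=4):
    print(event)
```

In a real cluster, a request that cannot be satisfied immediately stays pending until resources free up; the sketch simply records it as pending and moves on.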
Resource Configuration
Key settings in yarn-site.xml:
<configuration>
  <!-- Total memory available to containers on each NodeManager (MB) -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <!-- Maximum memory a single container can request (MB) -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>
  <!-- Total vCores available to containers on each NodeManager -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
  </property>
</configuration>
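To see how these three settings interact, the following sketch parses the same property values and estimates how many containers of a given size fit on one node. The helper names `load_properties` and `containers_per_node` are hypothetical, and the estimate assumes the scheduler accounts for both memory and vcores; some scheduler configurations consider memory only, in which case the vcore limit would not apply.

```python
import xml.etree.ElementTree as ET

# Same values as the yarn-site.xml example above.
YARN_SITE = """
<configuration>
  <property><name>yarn.nodemanager.resource.memory-mb</name><value>8192</value></property>
  <property><name>yarn.scheduler.maximum-allocation-mb</name><value>4096</value></property>
  <property><name>yarn.nodemanager.resource.cpu-vcores</name><value>4</value></property>
</configuration>
"""


def load_properties(xml_text):
    """Return a {name: value} dict from Hadoop-style configuration XML."""
    root = ET.fromstring(xml_text.strip())
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def containers_per_node(props, container_mb, container_vcores=1):
    """Estimate how many identical containers fit on one NodeManager."""
    max_mb = int(props["yarn.scheduler.maximum-allocation-mb"])
    if container_mb > max_mb:
        raise ValueError(f"{container_mb} MB exceeds the per-container limit of {max_mb} MB")
    by_memory = int(props["yarn.nodemanager.resource.memory-mb"]) // container_mb
    by_vcores = int(props["yarn.nodemanager.resource.cpu-vcores"]) // container_vcores
    return min(by_memory, by_vcores)  # the scarcer resource is the bottleneck


props = load_properties(YARN_SITE)
print(containers_per_node(props, 2048))  # 4: memory and vcores both allow 4
print(containers_per_node(props, 4096))  # 2: limited by memory
print(containers_per_node(props, 1024))  # 4: memory allows 8, vcores allow 4
```

With these values, requesting a container larger than 4096 MB is rejected outright, and containers of 1024 MB leave half the node's memory unused unless vcores are raised as well.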
YARN Schedulers
Capacity Scheduler (default in Apache Hadoop)
The cluster is divided into queues, each with a guaranteed capacity. Useful for multi-tenant clusters shared across teams.
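As a rough illustration of guaranteed capacity, the sketch below splits a cluster's memory across queues by percentage. The queue names, the percentages, and the 10-node cluster size are made up for the example, and the function `guaranteed_memory` is hypothetical. It models only the guarantee; in practice a queue can typically also borrow idle capacity from other queues, which is not shown here.

```python
def guaranteed_memory(cluster_mb, queue_capacities):
    """Split cluster memory across queues by their capacity percentages."""
    total = sum(queue_capacities.values())
    if total != 100:
        raise ValueError(f"queue capacities must sum to 100, got {total}")
    # Integer arithmetic keeps the results in whole megabytes.
    return {queue: cluster_mb * pct // 100 for queue, pct in queue_capacities.items()}


# Hypothetical cluster: 10 NodeManagers with 8192 MB each.
cluster_mb = 10 * 8192
queues = {"engineering": 50, "analytics": 30, "adhoc": 20}

for queue, mb in guaranteed_memory(cluster_mb, queues).items():
    print(f"{queue}: {mb} MB guaranteed")
```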
Fair Scheduler
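Resources are balanced so that, over time, every running application receives roughly an equal share. A short job can therefore start and finish quickly even while a large job is running, instead of queueing behind it.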
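The idea of an equal share can be sketched with single-resource max-min fairness: every application gets an equal split, and whatever a small application does not need is redistributed to the others. This is a simplified model under stated assumptions, not the Fair Scheduler's actual algorithm, which also involves queues, weights, and configurable policies. The function `fair_shares` and the application names are hypothetical.

```python
def fair_shares(total, demands):
    """Max-min fair split of `total` among apps with the given demands."""
    shares = {app: 0 for app in demands}
    unsatisfied = dict(demands)
    remaining = total
    while unsatisfied and remaining > 0:
        equal = remaining / len(unsatisfied)
        # Apps whose demand fits within an equal split get all they asked for.
        capped = {app: d for app, d in unsatisfied.items() if d <= equal}
        if not capped:
            # Everyone left wants more than an equal split: divide evenly.
            for app in unsatisfied:
                shares[app] = equal
            return shares
        for app, demand in capped.items():
            shares[app] = demand
            remaining -= demand
            del unsatisfied[app]
    return shares


# A short job asking for 1024 MB next to a large job asking for 16384 MB
# on a cluster with 8192 MB available.
print(fair_shares(8192, {"short_job": 1024, "large_job": 16384}))
```

Here the short job receives everything it asked for and the large job gets the remainder, which is why the short job is not starved by the large one.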
Useful YARN Commands
# List active applications (by default: SUBMITTED, ACCEPTED, and RUNNING)
yarn application -list
# Check application status
yarn application -status <application_id>
# Kill a running application
yarn application -kill <application_id>
# View application logs
yarn logs -applicationId <application_id>
# Check cluster node status
yarn node -list
YARN Web UI
Access the ResourceManager UI at http://localhost:8088 (the default port, here on a local single-node setup) to:
- Monitor running and completed jobs
- Drill into individual task attempt logs
- View per-node resource utilization
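The same information is also exposed as JSON through the ResourceManager's REST API, served on the same port as the web UI. The sketch below is a minimal example, assuming the RM is reachable at http://localhost:8088 and that the response follows the usual `{"apps": {"app": [...]}}` shape; the helper names `parse_apps` and `fetch_running_apps` are hypothetical.

```python
import json
from urllib.request import Request, urlopen

RM_URL = "http://localhost:8088"  # assumption: local single-node setup


def parse_apps(payload):
    """Extract (id, name, state) tuples from a cluster apps response."""
    # "apps" may be null when no application matches the query.
    apps = (payload.get("apps") or {}).get("app", [])
    return [(app["id"], app["name"], app["state"]) for app in apps]


def fetch_running_apps(rm_url=RM_URL):
    """Query the RM for applications currently in the RUNNING state."""
    request = Request(
        f"{rm_url}/ws/v1/cluster/apps?states=RUNNING",
        headers={"Accept": "application/json"},  # ask for JSON explicitly
    )
    with urlopen(request, timeout=5) as response:
        return parse_apps(json.load(response))
```

Calling `fetch_running_apps()` against a live cluster returns one tuple per running application; it raises a `urllib.error.URLError` if the ResourceManager is not reachable.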
Next Steps
Explore The Hadoop Ecosystem to learn about the higher-level tools built on HDFS and YARN.