High Availability

With only one NameNode, HDFS has a single point of failure: if that NameNode goes down, the whole file system is unavailable. Hadoop 2.x introduced NameNode High Availability (HA), which runs two NameNodes, an Active and a Standby, to eliminate this risk.

How HA Works

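The Active NameNode handles all client operations. The Standby NameNode keeps its state synchronized by reading edits from the JournalNodes, a quorum-based shared edit log that the Active writes to. If the Active NameNode fails and automatic failover is enabled, a ZKFailoverController (ZKFC) process running alongside each NameNode uses ZooKeeper to detect the failure and promote the Standby.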

        ZooKeeper Cluster (3+ nodes)
                    |
   |--- Active NameNode ---|
   |                       |--- JournalNode 1 ---|
   |--- Standby NameNode --|--- JournalNode 2 ---| (edit log quorum)
                           |--- JournalNode 3 ---|
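
To make the ZooKeeper dependency in the diagram concrete: the failover controllers locate the ensemble through the ha.zookeeper.quorum property, which normally lives in core-site.xml, and clients address the pair through its logical name in fs.defaultFS. The fragment below is a minimal sketch, not a complete file. The hostnames zk1, zk2, and zk3 are hypothetical placeholders, and 2181 is assumed to be the ZooKeeper client port in use.

```xml
<!-- core-site.xml (fragment); zk1..zk3 are placeholder hostnames -->
<property>
  <!-- Clients use the logical nameservice instead of a single NameNode host -->
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<property>
  <!-- ZooKeeper ensemble used by the failover controllers -->
  <name>ha.zookeeper.quorum</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
```

An odd number of ZooKeeper nodes is the usual choice, because the ensemble stays available only while a majority of its members are reachable.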

Configuring NameNode HA

In hdfs-site.xml:

<configuration>
  <!-- Logical name for the pair -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>

  <!-- The two NameNode IDs -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>

  <!-- RPC addresses -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode2.example.com:8020</value>
  </property>

  <!-- JournalNode edit log URI -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
  </property>

  <!-- Enable automatic failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
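
The properties above are usually not sufficient on their own. A working HA setup typically also tells clients how to find the current Active NameNode, defines a fencing method, and gives each JournalNode a local directory for its edits. The following is a sketch of those additional hdfs-site.xml properties under stated assumptions: the SSH key path and the JournalNode directory are hypothetical and must be adapted, and sshfence is only one of the available fencing methods.

```xml
<!-- hdfs-site.xml (additional properties; paths are placeholders) -->
<property>
  <!-- Client-side class that tries the configured NameNodes to find the Active one -->
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <!-- How to fence the previous Active NameNode during a failover -->
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <!-- Hypothetical key path used by sshfence -->
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>
<property>
  <!-- Hypothetical local directory where each JournalNode stores edits -->
  <name>dfs.journalnode.edits.dir</name>
  <value>/data/hadoop/journalnode</value>
</property>
```

Fencing matters because two NameNodes that both believe they are Active could otherwise serve conflicting state to clients. The JournalNodes accept edits from only one writer at a time, but a fencing method still has to be configured.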

Manual Failover

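You can also switch the Active NameNode manually, for example before planned maintenance. The command takes the current Active NameNode ID first and the target second, so the following fails over from nn1 to nn2: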

hdfs haadmin -failover nn1 nn2

Checking HA Status

Query each NameNode by its ID. The command prints either active or standby:

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

ResourceManager HA

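YARN's ResourceManager also supports HA. The configuration pattern is similar: in yarn-site.xml, enable HA, name the cluster, and list the ResourceManager IDs. ZooKeeper is used for leader election and can also back the state store: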

<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>mycluster</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
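
The IDs rm1 and rm2 still need to be mapped to hosts, and the ResourceManagers need to know where ZooKeeper is. A minimal sketch of the remaining yarn-site.xml properties follows. The hostnames are hypothetical, and the ZooKeeper address property shown is the Hadoop 2.x name (newer releases use hadoop.zk.address instead), so check the name against the version in use.

```xml
<!-- yarn-site.xml (additional properties; hostnames are placeholders) -->
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>rm1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>rm2.example.com</value>
</property>
<property>
  <!-- Hadoop 2.x property name; hadoop.zk.address in newer releases -->
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
<property>
  <!-- Let a newly active ResourceManager recover application state -->
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Keep that state in ZooKeeper -->
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
```

With recovery enabled and a ZooKeeper-backed store, a ResourceManager that becomes active can reload application state rather than starting from scratch.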