Upgrading from Hadoop 2 to Hadoop 3: A Complete How-To
Hadoop 3.x introduced erasure coding, YARN Timeline Service v2, multiple NameNode support, and significant performance improvements. If you're still running Hadoop 2.x, this guide walks through a safe, rolling upgrade path — without losing data or taking extended downtime.
What Changed in Hadoop 3.x
Before upgrading, understand the key differences:
| Area | Hadoop 2.x | Hadoop 3.x |
|---|---|---|
| HDFS replication default | 3x replication | Erasure Coding option |
| NameNodes (HA) | 1 active + 1 standby | Up to 5 NameNodes |
| Minimum Java | Java 7 | Java 8 |
| YARN Timeline Service | v1 | v2 (HBase-backed) |
| Shell scripts | Common scripts | Reworked, cleaner separation |
| Default ports | 50070, 50075, 50010, etc. | 9870, 9864, 9866, etc. (moved out of the ephemeral range) |
The port changes alone can break existing monitoring, firewall rules, and client configs — plan for those carefully.
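To put the erasure-coding row in concrete terms: 3x replication stores two extra copies of every byte, while the RS-6-3 policy (the system default EC policy in 3.x) stores three parity blocks for every six data blocks. A quick sketch of the arithmetic:

```shell
# Storage overhead: extra bytes stored per byte of user data, as a percentage.
data_units=6
parity_units=3                              # RS-6-3: 6 data + 3 parity blocks
rep_overhead=$(( (3 - 1) * 100 ))           # 3x replication = 2 extra full copies
ec_overhead=$(( parity_units * 100 / data_units ))
echo "3x replication overhead: ${rep_overhead}%"
echo "RS-6-3 erasure coding overhead: ${ec_overhead}%"
```

Same fault tolerance class (survives loss of any 3 blocks in a group vs. 2 replicas), at a quarter of the extra storage.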
Pre-Upgrade Checklist
Before touching a single config file:
- Audit all client applications for hardcoded ports (50070, 8020, 50010, etc.)
- Check Java version — every node must run Java 8 or higher
- Review deprecated APIs — several mapred and dfs shell commands were removed
- Back up namenode metadata:
hdfs dfsadmin -saveNamespace
cp -r /path/to/namenode/current /backup/namenode-$(date +%Y%m%d)
- Snapshot your HDFS data directories on each DataNode if possible
- Read the release notes for your specific target version (3.3.x or 3.4.x)
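The first checklist item lends itself to automation: a recursive grep over your config (and client code) directories catches most hardcoded legacy ports. A minimal sketch; the default directory and the port list are assumptions to adapt to your environment:

```shell
#!/usr/bin/env bash
# Scan a directory tree for Hadoop 2.x default ports that move in 3.x.
# (8020 is deliberately absent: the NameNode RPC port did not change.)
CONF_DIR="${1:-/etc/hadoop/conf}"                 # assumption: your config root
LEGACY_PORTS='50070|50090|50075|50010|50020'
if grep -rnE ":(${LEGACY_PORTS})" "$CONF_DIR" 2>/dev/null; then
  echo "Legacy ports found - fix these before upgrading." >&2
else
  echo "No legacy ports detected (or directory missing)."
fi
```

Run it against config directories, monitoring rules, and any client code checked out locally.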
Step 1: Upgrade HDFS Metadata
The NameNode must be upgraded to the new metadata layout before any DataNode runs the 3.x binaries; finalization comes later, only after you have validated the cluster.
1.1 — Put NameNode in safemode and save namespace
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave
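If you script this step, make it fail fast: you do not want to leave safemode and proceed after a failed checkpoint. A sketch; the echo fallback is only there so the script can be dry-run on a machine without a cluster:

```shell
#!/usr/bin/env bash
set -e                              # abort on the first failed step
# Dry-run fallback (assumption): echo the commands when hdfs is not on PATH.
HDFS_CMD="${HDFS_CMD:-$(command -v hdfs || echo echo)}"
$HDFS_CMD dfsadmin -safemode enter
$HDFS_CMD dfsadmin -saveNamespace   # checkpoint while the namespace is quiescent
$HDFS_CMD dfsadmin -safemode leave
echo "Namespace checkpointed; safe to stop services."
```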
1.2 — Stop all services in order
stop-yarn.sh
stop-dfs.sh
Stop the MapReduce Job History Server last. Note that the mapred --daemon syntax only exists in 3.x; on the still-running 2.x binaries the daemon script is:
mr-jobhistory-daemon.sh stop historyserver
1.3 — Upgrade NameNode
Replace the Hadoop binaries on the NameNode host with the 3.x release, then run:
hdfs namenode -upgrade
This writes a new metadata layout while preserving the old layout in a previous/ directory — allowing rollback if needed.
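Before moving on, it is worth confirming the rollback image actually exists and noting how much disk it pins until finalization. A small sketch; the metadata path is an assumption, substitute your dfs.namenode.name.dir:

```shell
#!/usr/bin/env bash
# Check for the pre-upgrade metadata layout kept for rollback.
NN_DIR="${1:-/data/namenode}"       # assumption: your dfs.namenode.name.dir
if [ -d "$NN_DIR/previous" ]; then
  echo "Rollback image present:"
  du -sh "$NN_DIR/previous"         # space reclaimed later by -finalizeUpgrade
else
  echo "No previous/ directory - upgrade not started or already finalized."
fi
```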
Step 2: Upgrade DataNodes
With the NameNode upgraded and running, start each DataNode with the new binaries:
hdfs --daemon start datanode
An upgraded NameNode keeps serving not-yet-upgraded DataNodes during the upgrade window, so you can upgrade them one at a time and keep HDFS serving data throughout.
Monitor upgrade progress (the old -upgradeProgress command was removed after Hadoop 1.x; 3.x uses -upgrade query):
hdfs dfsadmin -upgrade query
Step 3: Upgrade YARN
ResourceManager and NodeManagers can be rolled independently in Hadoop 3.x thanks to the work-preserving restart feature.
3.1 — Upgrade ResourceManager
yarn --daemon stop resourcemanager
# Replace binaries
yarn --daemon start resourcemanager
3.2 — Rolling NodeManager upgrade
# On each node, one at a time:
yarn --daemon stop nodemanager
# Replace binaries
yarn --daemon start nodemanager
Running containers are preserved across NodeManager restarts (work-preserving upgrade).
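Across a large cluster the stop/replace/start cycle is worth scripting. A hedged sketch of the rolling loop: the workers.txt host list, passwordless SSH, and the /opt/hadoop-3.3.6 staging path are all assumptions; SSH and PAUSE are overridable so the loop can be dry-run:

```shell
#!/usr/bin/env bash
# Rolling NodeManager restart, one host at a time.
SSH="${SSH:-ssh}"                   # override with 'echo' for a dry run
PAUSE="${PAUSE:-30}"                # seconds to let each NM re-register
HOSTS_FILE="${1:-workers.txt}"
if [ -f "$HOSTS_FILE" ]; then
  while read -r host; do
    "$SSH" "$host" "yarn --daemon stop nodemanager"
    "$SSH" "$host" "ln -sfn /opt/hadoop-3.3.6 /opt/hadoop"   # swap binaries
    "$SSH" "$host" "yarn --daemon start nodemanager"
    sleep "$PAUSE"                  # pause between hosts to limit blast radius
  done < "$HOSTS_FILE"
else
  echo "No host list at $HOSTS_FILE" >&2
fi
```

Pausing between hosts keeps capacity loss bounded to one node at a time while each NodeManager rejoins the ResourceManager.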
Step 4: Update Configuration Files
Hadoop 3.x uses different default ports. Update core-site.xml, hdfs-site.xml, and any clients pointing to old ports:
Old → New port mappings:
NameNode RPC: 8020 (unchanged; 3.0.0 briefly moved it to 9820, reverted in 3.0.1)
NameNode Web UI: 50070 → 9870
Secondary NN: 50090 → 9868
DataNode Web UI: 50075 → 9864
DataNode transfer: 50010 → 9866
DataNode IPC: 50020 → 9867
Double-check fs.defaultFS in core-site.xml; the RPC port itself did not change, so an existing URI keeps working:
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode-host:8020</value>
</property>
Step 5: Finalize the Upgrade
Once you've validated that everything is working correctly, finalize the upgrade to reclaim the space used by the previous/ layout backup:
hdfs dfsadmin -finalizeUpgrade
Warning: After finalization, rollback is no longer possible.
Rollback Procedure (if needed)
If you encounter critical issues before finalization:
stop-dfs.sh
hdfs namenode -rollback
start-dfs.sh
This reverts the NameNode metadata to the Hadoop 2.x layout. DataNodes likewise need to be restarted on the 2.x binaries with the -rollback startup option so they restore their own previous/ block directories.
Common Upgrade Issues
Shell Script Changes
Hadoop 3.x reworked the shell scripts. Commands like hadoop-daemon.sh are deprecated in favor of:
# Old (2.x)
hadoop-daemon.sh start datanode
# New (3.x)
hdfs --daemon start datanode
Classpath Changes
Third-party tools (Hive, HBase, Spark) that relied on Hadoop's classpath may need updated versions compatible with Hadoop 3.x. Check each ecosystem component's compatibility matrix.
YARN Timeline Service v2
YARN Timeline Service v2 requires HBase as a backend. If you relied on Timeline Service v1, plan the HBase deployment before enabling v2:
<!-- yarn-site.xml -->
<property>
<name>yarn.timeline-service.version</name>
<value>2.0f</value>
</property>
Post-Upgrade Verification
# Verify HDFS health
hdfs dfsadmin -report
hdfs fsck / -summary
# Check YARN cluster
yarn node -list
yarn application -list
# Run a test job
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 10 100
A successful Pi estimation job confirms that HDFS and YARN are both operational end-to-end.
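For unattended verification, the fsck summary can be checked programmatically; fsck prints a Status: HEALTHY line when the namespace is clean. A sketch:

```shell
# Reads fsck output on stdin; fails unless the filesystem is healthy.
check_hdfs_health() {
  if grep -q "Status: HEALTHY"; then
    echo "HDFS healthy"
  else
    echo "HDFS check failed" >&2
    return 1
  fi
}
# usage: hdfs fsck / -summary | check_hdfs_health
```

Wiring this into a post-upgrade smoke test lets a corrupt or under-replicated state block finalization automatically.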
Summary
| Phase | Action |
|---|---|
| Pre-upgrade | Backup metadata, check Java 8+, audit ports |
| Step 1 | Save namespace, stop services, upgrade NameNode |
| Step 2 | Roll DataNodes one at a time |
| Step 3 | Roll ResourceManager and NodeManagers |
| Step 4 | Update config files for new default ports |
| Step 5 | Finalize upgrade (reclaims rollback space) |
Upgrading Hadoop 2 to 3 is operationally straightforward when done in order. The biggest surprises tend to come from port changes and ecosystem tool compatibility — audit those before you start and the rest is mechanical.
