Upgrading from Hadoop 2 to Hadoop 3: A Complete How-To
Hadoop 3.x introduced erasure coding, YARN Timeline Service v2, multiple NameNode support, and significant performance improvements. If you're still running Hadoop 2.x, this guide walks through a safe, rolling upgrade path — without losing data or taking extended downtime.
What Changed in Hadoop 3.x
Before upgrading, understand the key differences:
| Area | Hadoop 2.x | Hadoop 3.x |
|---|---|---|
| HDFS replication default | 3x replication | Erasure Coding option |
| NameNodes (HA) | 1 active + 1 standby | Up to 5 NameNodes |
| Minimum Java | Java 7 | Java 8 |
| YARN Timeline Service | v1 | v2 (HBase-backed) |
| Shell scripts | Common scripts | Reworked, cleaner separation |
| Default ports | 50070, 50075, 50010, etc. | 9870, 9864, 9866, etc. (moved out of the ephemeral range) |
The port changes alone can break existing monitoring, firewall rules, and client configs — plan for those carefully.
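To put the erasure-coding row in concrete terms: 3x replication stores two extra copies of every byte, while the RS-6-3 policy (the system default EC policy in 3.x) stores three parity blocks for every six data blocks. A quick sketch of the arithmetic:

```shell
# Storage overhead: extra bytes stored per byte of user data, as a percentage.
data_units=6
parity_units=3                              # RS-6-3: 6 data + 3 parity blocks
rep_overhead=$(( (3 - 1) * 100 ))           # 3x replication = 2 extra full copies
ec_overhead=$(( parity_units * 100 / data_units ))
echo "3x replication overhead: ${rep_overhead}%"
echo "RS-6-3 erasure coding overhead: ${ec_overhead}%"
```

Same fault tolerance class (survives loss of any 3 blocks in a group vs. 2 replicas), at a quarter of the extra storage.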
Pre-Upgrade Checklist
Before touching a single config file:
- Audit all client applications for hardcoded ports (50070, 8020, 50010, etc.)
- Check Java version — every node must run Java 8 or higher
- Review deprecated APIs — several mapred and dfs shell commands were removed
- Back up namenode metadata:
hdfs dfsadmin -saveNamespace
cp -r /path/to/namenode/current /backup/namenode-$(date +%Y%m%d)
- Snapshot your HDFS data directories on each DataNode if possible
- Read the release notes for your specific target version (3.3.x or 3.4.x)
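The first checklist item lends itself to automation: a recursive grep over your config (and client code) directories catches most hardcoded legacy ports. A minimal sketch; the default directory and the port list are assumptions to adapt to your environment:

```shell
#!/usr/bin/env bash
# Scan a directory tree for Hadoop 2.x default ports that move in 3.x.
# (8020 is deliberately absent: the NameNode RPC port did not change.)
CONF_DIR="${1:-/etc/hadoop/conf}"                 # assumption: your config root
LEGACY_PORTS='50070|50090|50075|50010|50020'
if grep -rnE ":(${LEGACY_PORTS})" "$CONF_DIR" 2>/dev/null; then
  echo "Legacy ports found - fix these before upgrading." >&2
else
  echo "No legacy ports detected (or directory missing)."
fi
```

Run it against config directories, monitoring rules, and any client code checked out locally.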
Step 1: Upgrade HDFS Metadata
The NameNode must be upgraded to the new metadata layout before any DataNode runs the 3.x binaries; finalization comes later, only after you have validated the cluster.
1.1 — Put NameNode in safemode and save namespace
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave
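If you script this step, make it fail fast: you do not want to leave safemode and proceed after a failed checkpoint. A sketch; the echo fallback is only there so the script can be dry-run on a machine without a cluster:

```shell
#!/usr/bin/env bash
set -e                              # abort on the first failed step
# Dry-run fallback (assumption): echo the commands when hdfs is not on PATH.
HDFS_CMD="${HDFS_CMD:-$(command -v hdfs || echo echo)}"
$HDFS_CMD dfsadmin -safemode enter
$HDFS_CMD dfsadmin -saveNamespace   # checkpoint while the namespace is quiescent
$HDFS_CMD dfsadmin -safemode leave
echo "Namespace checkpointed; safe to stop services."
```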
1.2 — Stop all services in order
stop-yarn.sh
stop-dfs.sh
Stop the MapReduce Job History Server last. Note that the mapred --daemon syntax only exists in 3.x; on the still-running 2.x binaries the daemon script is:
mr-jobhistory-daemon.sh stop historyserver
1.3 — Upgrade NameNode
Replace the Hadoop binaries on the NameNode host with the 3.x release, then run:
hdfs namenode -upgrade
This writes a new metadata layout while preserving the old layout in a previous/ directory — allowing rollback if needed.
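Before moving on, it is worth confirming the rollback image actually exists and noting how much disk it pins until finalization. A small sketch; the metadata path is an assumption, substitute your dfs.namenode.name.dir:

```shell
#!/usr/bin/env bash
# Check for the pre-upgrade metadata layout kept for rollback.
NN_DIR="${1:-/data/namenode}"       # assumption: your dfs.namenode.name.dir
if [ -d "$NN_DIR/previous" ]; then
  echo "Rollback image present:"
  du -sh "$NN_DIR/previous"         # space reclaimed later by -finalizeUpgrade
else
  echo "No previous/ directory - upgrade not started or already finalized."
fi
```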
Step 2: Upgrade DataNodes
With the NameNode upgraded and running, start each DataNode with the new binaries:
hdfs --daemon start datanode
An upgraded NameNode keeps serving not-yet-upgraded DataNodes during the upgrade window, so you can upgrade them one at a time and keep HDFS serving data throughout.
Monitor upgrade progress (the old -upgradeProgress command was removed after Hadoop 1.x; 3.x uses -upgrade query):
hdfs dfsadmin -upgrade query
Step 3: Upgrade YARN
ResourceManager and NodeManagers can be rolled independently in Hadoop 3.x thanks to the work-preserving restart feature.
3.1 — Upgrade ResourceManager
yarn --daemon stop resourcemanager
# Replace binaries
yarn --daemon start resourcemanager
3.2 — Rolling NodeManager upgrade
# On each node, one at a time:
yarn --daemon stop nodemanager
# Replace binaries
yarn --daemon start nodemanager
Running containers are preserved across NodeManager restarts (work-preserving upgrade).
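Across a large cluster the stop/replace/start cycle is worth scripting. A hedged sketch of the rolling loop: the workers.txt host list, passwordless SSH, and the /opt/hadoop-3.3.6 staging path are all assumptions; SSH and PAUSE are overridable so the loop can be dry-run:

```shell
#!/usr/bin/env bash
# Rolling NodeManager restart, one host at a time.
SSH="${SSH:-ssh}"                   # override with 'echo' for a dry run
PAUSE="${PAUSE:-30}"                # seconds to let each NM re-register
HOSTS_FILE="${1:-workers.txt}"
if [ -f "$HOSTS_FILE" ]; then
  while read -r host; do
    "$SSH" "$host" "yarn --daemon stop nodemanager"
    "$SSH" "$host" "ln -sfn /opt/hadoop-3.3.6 /opt/hadoop"   # swap binaries
    "$SSH" "$host" "yarn --daemon start nodemanager"
    sleep "$PAUSE"                  # pause between hosts to limit blast radius
  done < "$HOSTS_FILE"
else
  echo "No host list at $HOSTS_FILE" >&2
fi
```

Pausing between hosts keeps capacity loss bounded to one node at a time while each NodeManager rejoins the ResourceManager.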
Step 4: Update Configuration Files
Hadoop 3.x uses different default ports. Update core-site.xml, hdfs-site.xml, and any clients pointing to old ports:
Old → New port mappings:
NameNode RPC: 8020 (unchanged; 3.0.0 briefly moved it to 9820, reverted in 3.0.1)
NameNode Web UI: 50070 → 9870
Secondary NN: 50090 → 9868
DataNode Web UI: 50075 → 9864
DataNode transfer: 50010 → 9866
DataNode IPC: 50020 → 9867
Double-check fs.defaultFS in core-site.xml; the RPC port itself did not change, so an existing URI keeps working:
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode-host:8020</value>
</property>
Step 5: Finalize the Upgrade
Once you've validated that everything is working correctly, finalize the upgrade to reclaim the space used by the previous/ layout backup:
hdfs dfsadmin -finalizeUpgrade
Warning: After finalization, rollback is no longer possible.
Rollback Procedure (if needed)
If you encounter critical issues before finalization:
stop-dfs.sh
hdfs namenode -rollback
start-dfs.sh
This reverts the NameNode metadata to the Hadoop 2.x layout. DataNodes likewise need to be restarted on the 2.x binaries with the -rollback startup option so they restore their own previous/ block directories.
Common Upgrade Issues
Shell Script Changes
Hadoop 3.x reworked the shell scripts. Commands like hadoop-daemon.sh are deprecated in favor of:
# Old (2.x)
hadoop-daemon.sh start datanode
# New (3.x)
hdfs --daemon start datanode
Classpath Changes
Third-party tools (Hive, HBase, Spark) that relied on Hadoop's classpath may need updated versions compatible with Hadoop 3.x. Check each ecosystem component's compatibility matrix.
YARN Timeline Service v2
YARN Timeline Service v2 requires HBase as a backend. If you relied on Timeline Service v1, plan the HBase deployment before enabling v2:
<!-- yarn-site.xml -->
<property>
<name>yarn.timeline-service.version</name>
<value>2.0f</value>
</property>
Post-Upgrade Verification
# Verify HDFS health
hdfs dfsadmin -report
hdfs fsck / -summary
# Check YARN cluster
yarn node -list
yarn application -list
# Run a test job
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 10 100
A successful Pi estimation job confirms that HDFS and YARN are both operational end-to-end.
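For unattended verification, the fsck summary can be checked programmatically; fsck prints a Status: HEALTHY line when the namespace is clean. A sketch:

```shell
# Reads fsck output on stdin; fails unless the filesystem is healthy.
check_hdfs_health() {
  if grep -q "Status: HEALTHY"; then
    echo "HDFS healthy"
  else
    echo "HDFS check failed" >&2
    return 1
  fi
}
# usage: hdfs fsck / -summary | check_hdfs_health
```

Wiring this into a post-upgrade smoke test lets a corrupt or under-replicated state block finalization automatically.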
Summary
| Phase | Action |
|---|---|
| Pre-upgrade | Backup metadata, check Java 8+, audit ports |
| Step 1 | Save namespace, stop services, upgrade NameNode |
| Step 2 | Roll DataNodes one at a time |
| Step 3 | Roll ResourceManager and NodeManagers |
| Step 4 | Update config files for new default ports |
| Step 5 | Finalize upgrade (reclaims rollback space) |
Upgrading Hadoop 2 to 3 is operationally straightforward when done in order. The biggest surprises tend to come from port changes and ecosystem tool compatibility — audit those before you start and the rest is mechanical.
