Hadoop and Java: A Version Compatibility Guide
Picking the wrong Java version for your Hadoop cluster is one of the most common causes of cryptic build failures, runtime exceptions, and upgrade blockers. This guide maps Hadoop releases to their supported Java versions, explains what changed between Java versions, and offers practical recommendations for 2025.
Quick Reference: Hadoop ↔ Java Compatibility Matrix
| Hadoop Version | Java 8 | Java 11 | Java 17 | Java 21 |
|---|---|---|---|---|
| 2.10.x | Required | Not supported | Not supported | Not supported |
| 3.2.x | Supported | Supported | Not supported | Not supported |
| 3.3.x | Supported | Supported | Limited (3.3.5+) | Not supported |
| 3.4.x | Supported | Supported | Supported | Preview |
| 3.5.x (planned) | Deprecated | Supported | Supported | Supported |
Rule of thumb for 2025: Run Java 11 for stability. Java 17 with Hadoop 3.4.x is becoming viable. Java 8 is approaching EOL and should be phased out.
Java 8: The Legacy Standard
Hadoop 3.x requires Java 8 as the minimum. Java 8 was the de facto standard for Hadoop workloads from 2014 through about 2022.
Current status: Oracle ended free public updates for commercial use of Java 8 in January 2019, and Oracle Premier Support for Java 8 ended in March 2022. Adoptium (Temurin) and Amazon Corretto still provide free, maintained Java 8 builds, but the ecosystem is moving on.
When to use: Only if your organization has a hard dependency on a third-party library or Hadoop ecosystem component that isn't yet compatible with Java 11+.
Install Temurin 8 (Ubuntu/Debian)
sudo apt-get install -y wget apt-transport-https gpg
sudo mkdir -p /etc/apt/keyrings
wget -qO - https://packages.adoptium.net/artifactory/api/gpg/key/public | sudo gpg --dearmor -o /etc/apt/keyrings/adoptium.gpg
echo "deb [signed-by=/etc/apt/keyrings/adoptium.gpg] https://packages.adoptium.net/artifactory/deb $(awk -F= '/^VERSION_CODENAME/{print $2}' /etc/os-release) main" | sudo tee /etc/apt/sources.list.d/adoptium.list
sudo apt-get update
sudo apt-get install -y temurin-8-jdk
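If several JDKs end up installed side by side, Debian's alternatives system selects the system default (the Temurin package registers itself with it; the install path below matches the package's usual layout but may vary):
# Confirm the install and pick the system default if needed
/usr/lib/jvm/temurin-8-jdk-amd64/bin/java -version
sudo update-alternatives --config java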
Java 11: The Recommended Choice (LTS)
Java 11, released in 2018, is the most tested Java version with Hadoop 3.x as of 2025. All major ecosystem tools (Spark, Hive, HBase, Flink) support Java 11.
Key changes from Java 8
- Module system (Project Jigsaw) — --add-opens flags needed for reflective access
- G1GC is now the default collector — better pause times for large heaps (critical for NameNode with hundreds of millions of files)
- Java EE modules removed from the JDK (the javax.* → jakarta.* migration) — does not affect Hadoop itself, but affects some ecosystem tools
- Deprecated APIs removed — sun.misc.Unsafe usage patterns changed
Required JVM flags for Hadoop on Java 11+
Hadoop's internal code (and many of its dependencies) uses reflective access that Java 11's module system restricts by default. Add these flags to hadoop-env.sh:
export HADOOP_OPTS="$HADOOP_OPTS \
--add-opens=java.base/java.lang=ALL-UNNAMED \
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED \
--add-opens=java.base/java.io=ALL-UNNAMED \
--add-opens=java.base/java.net=ALL-UNNAMED \
--add-opens=java.base/java.util=ALL-UNNAMED \
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED \
--add-opens=java.base/sun.net.dns=ALL-UNNAMED \
--add-opens=java.base/sun.net.util=ALL-UNNAMED"
Hadoop 3.3.0 and later automatically adds most of these — check your version before manually adding them.
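To confirm which flags actually reached a running daemon, you can inspect its command line with the JDK's jcmd tool (the jps/awk PID lookup here is just one convenient way to find the process):
# Print the full command line of the running NameNode,
# including any --add-opens flags Hadoop injected
jcmd $(jps | awk '/NameNode/ {print $1}') VM.command_line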
Java 17: Stricter Modules, Better Performance
Java 17 (LTS, 2021) tightened the module encapsulation introduced in Java 9: it fully enforces strong encapsulation of JDK internals (JEP 403), where Java 11 only emitted warnings for illegal reflective access.
Hadoop compatibility
- Hadoop 3.3.4 and earlier: Not supported — too many internal reflection violations
- Hadoop 3.3.5+: Experimental support
- Hadoop 3.4.0+: Officially supported
What required fixing for Java 17
The main breaking changes were in sun.misc.Unsafe usage and java.lang.reflect access patterns throughout Hadoop's RPC and serialization code. The community patched these across multiple JIRAs (HADOOP-17975, HADOOP-18079, and others).
GC improvements relevant to Hadoop
Java 17 includes ZGC and Shenandoah as production-ready collectors (ZGC was still experimental in Java 11, and Shenandoah was absent from many Java 11 builds). For NameNode workloads with large heaps (32–512GB), ZGC's sub-millisecond pauses can dramatically improve responsiveness:
# hadoop-env.sh — NameNode with ZGC
export HDFS_NAMENODE_OPTS="-Xms64g -Xmx64g -XX:+UseZGC $HDFS_NAMENODE_OPTS"
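To verify that ZGC is actually delivering low pauses under your workload, unified GC logging (-Xlog, available since Java 9) is a low-overhead check; the log path and rotation settings below are illustrative:
# hadoop-env.sh — ZGC plus rotating GC logs (path is an example)
export HDFS_NAMENODE_OPTS="-Xms64g -Xmx64g -XX:+UseZGC \
  -Xlog:gc*:file=/var/log/hadoop/namenode-gc.log:time,uptime:filecount=5,filesize=64m \
  $HDFS_NAMENODE_OPTS"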
Java 21: The Future LTS
Java 21 (LTS, 2023) is the current long-term support release. It brings virtual threads (Project Loom), pattern matching, and record patterns.
Hadoop compatibility
Full support for Java 21 is still in progress as of early 2025. Hadoop 3.4.x has preliminary support — test thoroughly before using in production.
Virtual threads are particularly interesting for Hadoop's IPC layer, which handles thousands of concurrent RPC connections. Future Hadoop releases may leverage virtual threads to reduce thread-pool overhead on NameNodes and ResourceManagers.
Setting JAVA_HOME for Hadoop
Hadoop reads JAVA_HOME from hadoop-env.sh. Always set it explicitly rather than relying on the system default — different users or shell environments may have different defaults:
# /etc/hadoop/conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/temurin-11-jdk-amd64
# Or use java_home helper on macOS:
# export JAVA_HOME=$(/usr/libexec/java_home -v 11)
Verify that every node in the cluster uses the same Java version and JAVA_HOME path to avoid subtle serialization incompatibilities.
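One quick way to spot drift is to ask each worker for its active Java version over SSH; this sketch assumes passwordless SSH and the standard workers file location, so adjust paths for your layout:
# Report the Java version in use on every worker node
for host in $(cat /etc/hadoop/conf/workers); do
  ssh "$host" 'printf "%s: " "$(hostname)"; java -version 2>&1 | head -1'
done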
Multi-JDK Environments: SDKMAN!
If you maintain multiple Hadoop environments with different Java requirements, SDKMAN! makes switching trivial:
# Install SDKMAN!
curl -s "https://get.sdkman.io" | bash
# Install multiple JDKs
sdk install java 11.0.22-tem
sdk install java 17.0.10-tem
sdk install java 21.0.2-tem
# Switch for a session
sdk use java 11.0.22-tem
# Set a default
sdk default java 11.0.22-tem
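SDKMAN! can also pin a version per directory via an .sdkmanrc file, which is handy when each Hadoop environment lives under its own checkout (the directory below is hypothetical):
# Pin the current JDK for this directory, then re-apply it later
cd /path/to/hadoop-env    # hypothetical environment checkout
sdk env init              # writes .sdkmanrc recording the JDK in use
sdk env                   # switches this shell to the pinned version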
Checking Your Current Setup
# Check Java version
java -version
# Check the Hadoop build
hadoop version
# Check what Hadoop thinks JAVA_HOME is (Hadoop 3.x)
hadoop envvars
# Check which JVM is actually running the NameNode
# (the [N] trick keeps grep from matching itself)
ps aux | grep [N]ameNode
# Find the PID, then resolve the JVM binary it was launched from:
readlink -f /proc/<PID>/exe
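If the JDK's jps tool is on the PATH, the PID lookup and the symlink resolution collapse into one line (a convenience sketch; jps output can vary slightly between JDK builds):
# One-liner: resolve the JVM binary behind the NameNode process
readlink -f /proc/$(jps | awk '/NameNode/ {print $1}')/exe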
Summary Recommendations
| Scenario | Recommended Java |
|---|---|
| Hadoop 2.10.x (legacy) | Java 8 |
| Hadoop 3.2.x production | Java 11 |
| Hadoop 3.3.x production | Java 11 (stable), Java 17 (3.3.5+ experimental) |
| Hadoop 3.4.x production | Java 11 or Java 17 |
| New deployments in 2025 | Java 11 (safe), Java 17 (forward-looking) |
| Future / Hadoop 3.5.x | Java 17 or Java 21 |
The Java version you run affects not just Hadoop but every ecosystem tool layered on top: Spark, Hive, HBase, Kafka. Coordinate your Java version across the entire platform before upgrading, and always test with your actual workloads — JVM GC behavior at scale can differ significantly from simple benchmarks.
