
Hadoop and Java: A Version Compatibility Guide

Hadoop.so Editorial Team · Big Data Engineers · 5 min read

Picking the wrong Java version for your Hadoop cluster is one of the most common causes of cryptic build failures, runtime exceptions, and upgrade blockers. This guide maps Hadoop releases to their supported Java versions, explains what changed between Java versions, and offers practical recommendations for 2025.

Quick Reference: Hadoop ↔ Java Compatibility Matrix

| Hadoop Version | Java 8 | Java 11 | Java 17 | Java 21 |
|---|---|---|---|---|
| 2.10.x | Required | Not supported | Not supported | Not supported |
| 3.2.x | Supported | Supported | Not supported | Not supported |
| 3.3.x | Supported | Supported | Limited (3.3.5+) | Not supported |
| 3.4.x | Supported | Supported | Supported | Preview |
| 3.5.x (planned) | Deprecated | Supported | Supported | Supported |

Rule of thumb for 2025: Run Java 11 for stability. Java 17 with Hadoop 3.4.x is becoming viable. Java 8 is approaching EOL and should be phased out.
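For provisioning scripts, the matrix above can be encoded as a small lookup helper. This is our own sketch — the function name and the status strings simply mirror the table; it is not a Hadoop-provided tool:

```shell
# Hypothetical lookup mirroring the matrix: echo the support status for a
# Hadoop release line ($1) on a Java major version ($2).
hadoop_java_support() {
  case "$1:$2" in
    2.10:8)               echo "Required" ;;
    3.2:8|3.2:11)         echo "Supported" ;;
    3.3:8|3.3:11)         echo "Supported" ;;
    3.3:17)               echo "Limited (3.3.5+)" ;;
    3.4:8|3.4:11|3.4:17)  echo "Supported" ;;
    3.4:21)               echo "Preview" ;;
    *)                    echo "Not supported" ;;
  esac
}

hadoop_java_support 3.3 17   # Limited (3.3.5+)
```

A deployment script can then refuse to start daemons whenever the status comes back "Not supported".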


Java 8: The Legacy Standard

Hadoop 3.x requires Java 8 as the minimum. Java 8 was the de facto standard for Hadoop workloads from 2014 through about 2022.

Current status: Oracle ended free public updates for commercial use of Java 8 in January 2019, and Premier Support ended in March 2022. Adoptium (Temurin) and Amazon Corretto still provide free, maintained Java 8 builds, but the ecosystem is moving on.

When to use: Only if your organization has a hard dependency on a third-party library or Hadoop ecosystem component that isn't yet compatible with Java 11+.

Install Temurin 8 (Ubuntu/Debian)

sudo apt-get install -y wget apt-transport-https
wget -qO - https://packages.adoptium.net/artifactory/api/gpg/key/public | sudo apt-key add -
echo "deb https://packages.adoptium.net/artifactory/deb $(awk -F= '/^VERSION_CODENAME/{print$2}' /etc/os-release) main" | sudo tee /etc/apt/sources.list.d/adoptium.list
sudo apt-get update
sudo apt-get install -y temurin-8-jdk

Java 11: The Stable Default

Java 11 (LTS, released in 2018) is the most thoroughly tested Java version with Hadoop 3.x as of 2025. All major ecosystem tools (Spark, Hive, HBase, Flink) support Java 11.

Key changes from Java 8

  • Module system (Project Jigsaw): --add-opens flags needed for reflective access
  • G1GC is now default — better pause times for large heaps (critical for NameNode with hundreds of millions of files)
  • javax.* → jakarta.* namespace migration: does not affect Hadoop itself, but affects some ecosystem tools
  • Removed deprecated APIs: sun.misc.Unsafe usage patterns changed

Required JVM flags for Hadoop on Java 11+

Hadoop's internal code (and many dependencies) use reflective access that Java 11's module system restricts by default. Add these to hadoop-env.sh:

export HADOOP_OPTS="$HADOOP_OPTS \
--add-opens=java.base/java.lang=ALL-UNNAMED \
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED \
--add-opens=java.base/java.io=ALL-UNNAMED \
--add-opens=java.base/java.net=ALL-UNNAMED \
--add-opens=java.base/java.util=ALL-UNNAMED \
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED \
--add-opens=java.base/sun.net.dns=ALL-UNNAMED \
--add-opens=java.base/sun.net.util=ALL-UNNAMED"

Hadoop 3.3.0 and later automatically add most of these — check your version before manually adding them.
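A deployment script can make that decision with a plain version comparison. A sketch that relies on GNU sort's -V (version sort); hadoop_adds_opens is our own name, not a Hadoop command:

```shell
# Hypothetical check: succeeds (exit 0) when the given Hadoop version is
# 3.3.0 or newer, i.e. when Hadoop injects the --add-opens flags itself.
hadoop_adds_opens() {
  printf '%s\n3.3.0\n' "$1" | sort -V | head -n1 | grep -qx '3.3.0'
}

hadoop_adds_opens "3.3.6" && echo "no manual flags needed"
hadoop_adds_opens "3.2.4" || echo "add the flags to hadoop-env.sh"
```

On a running daemon you can also confirm the flags actually took effect with jcmd <pid> VM.command_line.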


Java 17: Stricter Modules, Better Performance

Java 17 (LTS, 2021) tightened the module encapsulation started in Java 11. It fully enforces strong encapsulation of JDK internals that Java 11 only warned about.

Hadoop compatibility

  • Hadoop 3.3.4 and earlier: Not supported — too many internal reflection violations
  • Hadoop 3.3.5+: Experimental support
  • Hadoop 3.4.0+: Officially supported

What required fixing for Java 17

The main breaking changes were in sun.misc.Unsafe usage and java.lang.reflect access patterns throughout Hadoop's RPC and serialization code. The community patched these across multiple JIRAs (HADOOP-17975, HADOOP-18079, and others).

GC improvements relevant to Hadoop

Java 17 includes ZGC and Shenandoah as production-ready collectors (not just experimental as in Java 11). For NameNode workloads with large heaps (32–512GB), ZGC's sub-millisecond pauses can dramatically improve NameNode responsiveness:

# hadoop-env.sh — NameNode with ZGC
export HDFS_NAMENODE_OPTS="-Xms64g -Xmx64g -XX:+UseZGC $HDFS_NAMENODE_OPTS"
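If you enable ZGC, verify the pause-time claim on your own workload rather than taking it on faith. A variant of the same setting with unified GC logging (-Xlog, available since JDK 9) added; the log path and rotation sizes here are illustrative, not prescriptive:

```shell
# hadoop-env.sh — NameNode with ZGC plus rotating GC logs for verification
export HDFS_NAMENODE_OPTS="-Xms64g -Xmx64g -XX:+UseZGC \
-Xlog:gc*:file=/var/log/hadoop/namenode-gc.log:time,uptime:filecount=5,filesize=64m \
$HDFS_NAMENODE_OPTS"
```

Grep the resulting log for "Pause" entries to see the actual pause durations under your load.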

Java 21: The Future LTS

Java 21 (LTS, 2023) is the current long-term support release. It brings virtual threads (Project Loom), pattern matching, and record patterns.

Hadoop compatibility

Full support for Java 21 is still in progress as of early 2025. Hadoop 3.4.x has preliminary support — test thoroughly before using in production.

Virtual threads are particularly interesting for Hadoop's IPC layer, which handles thousands of concurrent RPC connections. Future Hadoop releases may leverage virtual threads to reduce thread-pool overhead on NameNodes and ResourceManagers.


Setting JAVA_HOME for Hadoop

Hadoop reads JAVA_HOME from hadoop-env.sh. Always set it explicitly rather than relying on the system default — different users or shell environments may have different defaults:

# /etc/hadoop/conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/temurin-11-jdk-amd64

# Or use java_home helper on macOS:
# export JAVA_HOME=$(/usr/libexec/java_home -v 11)

Verify that every node in the cluster uses the same Java version and JAVA_HOME path to avoid subtle serialization incompatibilities.
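One way to verify that is to gather one version line per node and assert there is exactly one distinct value. The helper below is our own sketch; the ssh loop in the comment assumes passwordless ssh and the stock workers file location:

```shell
# Succeeds only when every line on stdin is identical (one distinct value).
all_same_version() {
  [ "$(sort -u | wc -l)" -eq 1 ]
}

# Illustrative cluster-wide check (hypothetical paths, passwordless ssh):
#   while read -r host; do
#     ssh "$host" 'java -version 2>&1 | head -n1'
#   done < /etc/hadoop/conf/workers | all_same_version || echo "version drift"

printf '11.0.22\n11.0.22\n' | all_same_version && echo "consistent"
```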


Multi-JDK Environments: SDKMAN!

If you maintain multiple Hadoop environments with different Java requirements, SDKMAN! makes switching trivial:

# Install SDKMAN!
curl -s "https://get.sdkman.io" | bash

# Install multiple JDKs
sdk install java 11.0.22-tem
sdk install java 17.0.10-tem
sdk install java 21.0.2-tem

# Switch for a session
sdk use java 11.0.22-tem

# Set a default
sdk default java 11.0.22-tem
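SDKMAN! can also pin a JDK per directory via an .sdkmanrc file and its sdk env subcommand; the version pinned below is just an example:

```shell
# Pin a JDK for this environment's working directory
echo "java=11.0.22-tem" > .sdkmanrc

# "sdk" is a shell function, so this guard only fires in shells where
# SDKMAN! has actually been sourced; elsewhere the snippet is a no-op.
if command -v sdk >/dev/null 2>&1; then
  sdk env install   # install the pinned version if missing
  sdk env           # activate it for this shell
fi
```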

Checking Your Current Setup

# Check Java version
java -version

# Check what Hadoop thinks JAVA_HOME is (hadoop version only prints build info)
hadoop envvars | grep JAVA_HOME

# Check which JVM is actually running the NameNode
ps aux | grep NameNode
# Find the PID, then:
ls -la /proc/<PID>/exe
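Note that the java -version banner format differs between Java 8 ("1.8.0_392") and later releases ("11.0.22"), which trips up naive scripts. A hypothetical normalizer that reduces either form to the major number:

```shell
# Reduce a Java version string to its major number:
# "1.8.0_392" -> 8, "11.0.22" -> 11, "21.0.2" -> 21
java_major_version() {
  ver=$1
  case "$ver" in
    1.*) ver=${ver#1.} ;;   # old "1.x" scheme: drop the leading "1."
  esac
  echo "${ver%%.*}"
}

# Feed it the quoted string from the banner, e.g.:
#   java_major_version "$(java -version 2>&1 | awk -F '"' '/version/ {print $2}')"
java_major_version "1.8.0_392"   # 8
```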

Summary Recommendations

| Scenario | Recommended Java |
|---|---|
| Hadoop 2.10.x (legacy) | Java 8 |
| Hadoop 3.2.x production | Java 11 |
| Hadoop 3.3.x production | Java 11 (stable), Java 17 (3.3.5+ experimental) |
| Hadoop 3.4.x production | Java 11 or Java 17 |
| New deployments in 2025 | Java 11 (safe), Java 17 (forward-looking) |
| Future / Hadoop 3.5.x | Java 17 or Java 21 |

The Java version you run affects not just Hadoop but every ecosystem tool layered on top: Spark, Hive, HBase, Kafka. Coordinate your Java version across the entire platform before upgrading, and always test with your actual workloads — JVM GC behavior at scale can differ significantly from simple benchmarks.