Rack Awareness
What Is Rack Awareness?
In a data center, servers are organized into racks — physical enclosures sharing a top-of-rack (ToR) switch. Network bandwidth within a rack is much higher than bandwidth between racks.
Without rack awareness, HDFS treats every node as equivalent and places replicas effectively at random. With rack awareness configured, HDFS places replicas to balance data locality against fault tolerance:
Default 3-replica placement:
Replica 1 → Same node as the writer, if the writer runs on a DataNode; otherwise a random node (maximum locality)
Replica 2 → Different rack (fault tolerance)
Replica 3 → Same rack as Replica 2, different node (bandwidth efficiency)
This means the cluster survives an entire rack failure without data loss, while keeping at least one replica close to clients for fast reads.
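The placement rules above can be sketched as a toy model. This is illustrative only: the real logic lives in HDFS's `BlockPlacementPolicyDefault`, which also weighs node load, free space, and decommissioning status.

```python
import random

def place_replicas(writer, topology, n=3):
    """Pick n DataNodes following HDFS's default rack-aware rules (sketch).

    topology: dict mapping node name -> rack path.
    Returns: writer's node, one node on a different rack, and a second
    node on that same remote rack.
    """
    writer_rack = topology[writer]
    replicas = [writer]                                  # replica 1: writer's node
    remote = [node for node in topology if topology[node] != writer_rack]
    r2 = random.choice(remote)                           # replica 2: different rack
    replicas.append(r2)
    same_remote = [node for node in remote
                   if topology[node] == topology[r2] and node != r2]
    if len(replicas) < n and same_remote:                # replica 3: same rack as r2
        replicas.append(random.choice(same_remote))
    return replicas

topology = {"dn01": "/dc1/rack01", "dn02": "/dc1/rack01",
            "dn03": "/dc1/rack02", "dn04": "/dc1/rack02"}
print(place_replicas("dn01", topology))
# e.g. ['dn01', 'dn03', 'dn04'] -- replicas 2 and 3 share the remote rack
```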
How It Works
HDFS determines rack location by calling an external topology script. The NameNode invokes this script with one or more DataNode IP addresses (or hostnames) as arguments and expects one rack path string per argument in return:
/datacenter1/rack01
/datacenter1/rack02
/datacenter2/rack01
If no script is configured, all nodes are placed in the default rack (/default-rack), which disables rack-aware placement.
Writing a Topology Script
Create /etc/hadoop/topology.sh:
#!/bin/bash
# Maps IP addresses to rack paths.
# Called by HDFS with one or more IPs as arguments; prints one
# rack path per argument, in order.

RACK_MAP=(
  "10.0.1.0/24=/dc1/rack01"
  "10.0.2.0/24=/dc1/rack02"
  "10.0.3.0/24=/dc2/rack01"
  "10.0.4.0/24=/dc2/rack02"
)

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

for ip in "$@"; do
  rack="/default-rack"
  for entry in "${RACK_MAP[@]}"; do
    cidr="${entry%%=*}"
    rack_path="${entry##*=}"
    network="${cidr%/*}"
    prefix="${cidr#*/}"
    # The IP is in the subnet when both share the same network address.
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    if (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$network") & mask) )); then
      rack="$rack_path"
      break
    fi
  done
  echo "$rack"
done
Make it executable:
chmod +x /etc/hadoop/topology.sh
Test it before deploying:
/etc/hadoop/topology.sh 10.0.1.11 10.0.2.45 10.0.3.22
# Expected output:
# /dc1/rack01
# /dc1/rack02
# /dc2/rack01
Configuration
core-site.xml
<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/topology.sh</value>
</property>

<!-- Maximum args passed to the script per invocation (default: 100) -->
<property>
  <name>net.topology.script.number.args</name>
  <value>100</value>
</property>
Restart the NameNode after changing topology configuration:
hdfs --daemon stop namenode
hdfs --daemon start namenode
Verifying Rack Assignment
# Show rack topology for all DataNodes
hdfs dfsadmin -report | grep -E "Name:|Rack:"
# Display topology tree
hdfs dfsadmin -printTopology
Example output:
Rack: /dc1/rack01
10.0.1.11:9866 (dn01.example.com)
10.0.1.12:9866 (dn02.example.com)
Rack: /dc1/rack02
10.0.2.21:9866 (dn03.example.com)
10.0.2.22:9866 (dn04.example.com)
Rack: /dc2/rack01
10.0.3.31:9866 (dn05.example.com)
10.0.3.32:9866 (dn06.example.com)
Using a Static Topology File
For clusters where IPs are stable, a Python script reading a static map file is simpler than subnet matching:
/etc/hadoop/topology.data:
10.0.1.11 /dc1/rack01
10.0.1.12 /dc1/rack01
10.0.2.21 /dc1/rack02
10.0.2.22 /dc1/rack02
10.0.3.31 /dc2/rack01
/etc/hadoop/topology.py:
#!/usr/bin/env python3
# Maps IP addresses to rack paths using a static lookup table.
import sys

# Load the ip -> rack map once per invocation.
topology = {}
with open("/etc/hadoop/topology.data") as f:
    for line in f:
        parts = line.split()
        if len(parts) == 2:
            topology[parts[0]] = parts[1]

# Print one rack path per argument, in order.
for ip in sys.argv[1:]:
    print(topology.get(ip, "/default-rack"))
chmod +x /etc/hadoop/topology.py
Then set net.topology.script.file.name to /etc/hadoop/topology.py.
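Note that Hadoop also ships a built-in TableMapping implementation that reads exactly this two-column file format, so the same result can be achieved with no script at all (property names assume Hadoop 2.x or later):

```xml
<property>
  <name>net.topology.node.switch.mapping.impl</name>
  <value>org.apache.hadoop.net.TableMapping</value>
</property>
<property>
  <name>net.topology.table.file.name</name>
  <value>/etc/hadoop/topology.data</value>
</property>
```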
Impact on YARN and MapReduce
Rack awareness also benefits YARN scheduling:
- YARN attempts to launch Map tasks on the node storing the input split (data-local).
- If unavailable, it tries a node in the same rack (rack-local).
- Only as a last resort does it schedule on a remote rack.
This locality preference dramatically reduces cross-rack network traffic for MapReduce and Spark jobs.
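The preference order above can be illustrated with a small classifier. This is a sketch only; the real decision lives in YARN's schedulers, which also apply delay scheduling before giving up on locality.

```python
def classify_locality(candidate, split_nodes, topology):
    """Classify a container placement relative to an HDFS input split.

    candidate:   node the scheduler is considering
    split_nodes: nodes holding a replica of the split
    topology:    dict mapping node -> rack path
    """
    if candidate in split_nodes:
        return "data-local"
    if topology[candidate] in {topology[n] for n in split_nodes}:
        return "rack-local"
    return "off-rack"

topology = {"dn01": "/dc1/rack01", "dn02": "/dc1/rack01", "dn05": "/dc2/rack01"}
split = ["dn01"]
print(classify_locality("dn01", split, topology))  # data-local
print(classify_locality("dn02", split, topology))  # rack-local
print(classify_locality("dn05", split, topology))  # off-rack
```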
Multi-Datacenter Topology
For clusters spanning datacenters, use a three-level path:
/datacenter1/rack01
/datacenter1/rack02
/datacenter2/rack01
A deeper topology gives the scheduler more information, but be aware that HDFS's default block placement policy only distinguishes "same rack" from "different rack" at the leaf level: it does not by itself guarantee a replica in the remote datacenter. Guaranteed cross-datacenter placement requires a custom block placement policy, and stretching a single HDFS cluster across datacenters is generally discouraged because the write pipeline then crosses WAN links.
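One way to audit whether existing placements actually span datacenters is to group the replicas' rack paths by their first component (assuming the three-level `/dc/rack` convention above):

```python
def spans_datacenters(replica_racks):
    """True if a block's replicas cover more than one datacenter.

    replica_racks: rack paths like '/dc1/rack01'; the first path
    component is taken as the datacenter.
    """
    return len({rack.split("/")[1] for rack in replica_racks}) > 1

print(spans_datacenters(["/dc1/rack01", "/dc1/rack02", "/dc2/rack01"]))  # True
print(spans_datacenters(["/dc1/rack01", "/dc1/rack02", "/dc1/rack02"]))  # False
```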
Summary
| Scenario | Rack Path Format | Benefit |
|---|---|---|
| Single DC, multiple racks | /rack01 | Rack failure tolerance |
| Multiple DCs | /dc1/rack01 | DC failure tolerance |
| All on one rack | /default-rack | No benefit (avoid this in production) |