HDFS Deep Dive
HDFS (Hadoop Distributed File System) is the primary storage layer of Hadoop. It is designed to run on commodity hardware and to reliably store very large files, typically gigabytes to terabytes each, on clusters that scale to petabytes of total data.
Architecture
HDFS uses a master/worker architecture:
- NameNode — Manages the filesystem namespace (file tree and metadata). There is one active NameNode (plus an optional Standby for HA).
- DataNode — Stores actual data blocks. There are many DataNodes spread across the cluster.
- Secondary NameNode — Periodically merges the NameNode's edit log with the filesystem image (not a hot standby).
Client
├─► NameNode (metadata: where is block X?)
└─► DataNode 1, DataNode 2, DataNode 3 (actual data blocks)
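You can inspect this topology from the command line. As a quick sketch (assuming a running cluster and a configured client; the report command may require HDFS superuser privileges), the following shows the configured NameNode hosts and the state of each DataNode:
# Show the configured NameNode host(s)
hdfs getconf -namenodes
# Summarize cluster capacity and list live/dead DataNodes
hdfs dfsadmin -report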
How Replication Works
By default, each block is replicated 3 times across different DataNodes, with rack-aware placement so that the replicas do not all share one rack. If a DataNode fails (stops sending heartbeats), the NameNode detects the under-replicated blocks and schedules new copies from the surviving replicas onto other DataNodes.
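A quick way to see replication in action (a sketch; the path below is just an example file you have already uploaded) is to change a file's replication factor and then ask fsck where each replica lives:
# Raise the replication factor of one file to 4 and wait until it is applied
hdfs dfs -setrep -w 4 /user/hadoop/data/localfile.txt
# Show which DataNodes and racks hold each replica of the file's blocks
hdfs fsck /user/hadoop/data/localfile.txt -files -blocks -locations -racks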
Basic HDFS Commands
# List root directory
hdfs dfs -ls /
# Create a directory
hdfs dfs -mkdir -p /user/hadoop/data
# Upload a local file
hdfs dfs -put localfile.txt /user/hadoop/data/
# Download a file from HDFS
hdfs dfs -get /user/hadoop/data/localfile.txt ./output.txt
# View file contents
hdfs dfs -cat /user/hadoop/data/localfile.txt
# Check disk usage
hdfs dfs -du -h /user/hadoop/
# Delete a file
hdfs dfs -rm /user/hadoop/data/localfile.txt
# Check filesystem health
hdfs fsck / -files -blocks
Block Size
The default HDFS block size is 128 MB (configurable cluster-wide or per file via dfs.blocksize). Larger blocks mean fewer blocks per file, which reduces the metadata the NameNode must keep in memory and keeps sequential reads dominated by data transfer rather than per-block seek and connection overhead.
# Upload a file with a 256 MB block size (one-off override of dfs.blocksize)
hdfs dfs -D dfs.blocksize=256m -put bigfile.csv /data/
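To confirm what was actually applied (a sketch; /data/bigfile.csv is just the example path from above), you can query the cluster default and the per-file value:
# Print the cluster-wide default block size, in bytes
hdfs getconf -confKey dfs.blocksize
# Print block size, replication factor, length, and name for one file
hdfs dfs -stat "%o %r %b %n" /data/bigfile.csv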
Safe Mode
After startup, the NameNode enters safe mode while DataNodes report their blocks. The filesystem is read-only during this time: writes, deletions, and replication are blocked until a configurable percentage of blocks has reached its minimum replication, after which the NameNode leaves safe mode automatically.
# Check whether the NameNode is currently in safe mode
hdfs dfsadmin -safemode get
# Force the NameNode out of safe mode (use with care)
hdfs dfsadmin -safemode leave
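In startup scripts it is usually safer to wait for the NameNode to leave safe mode on its own rather than forcing it out. A minimal sketch (the mkdir is just a placeholder for whatever work follows):
# Block until the NameNode exits safe mode, then proceed
hdfs dfsadmin -safemode wait
hdfs dfs -mkdir -p /user/hadoop/data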
Next Steps
Move on to MapReduce Fundamentals to learn how to process the data you store.