MapReduce Fundamentals
MapReduce is Hadoop's built-in programming model for processing large datasets in parallel across a cluster. A job runs in two user-defined phases, Map and Reduce, with a framework-managed Shuffle & Sort step between them.
The MapReduce Model
Input Data (HDFS blocks)
|
v
Mapper (runs on each block locally) — emits (key, value) pairs
|
v
Shuffle & Sort (framework groups values by key)
|
v
Reducer (aggregates values per key)
|
v
Output Data (written back to HDFS)
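To make the flow concrete, here is a hand trace of the word count computation from the next section over a hypothetical two-line input:

Input:    "apache hadoop"          "hadoop runs"
Map:      (apache,1) (hadoop,1)    (hadoop,1) (runs,1)
Shuffle:  apache -> [1]   hadoop -> [1,1]   runs -> [1]
Reduce:   (apache,1)  (hadoop,2)  (runs,1)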
Classic Example: Word Count
Mapper
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper
    extends Mapper<Object, Text, Text, IntWritable> {

  private static final IntWritable one = new IntWritable(1);
  private Text word = new Text();

  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // With the default TextInputFormat, key is the line's byte offset
    // and value is one line of input text.
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one); // emit (word, 1)
    }
  }
}
Reducer
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    // values holds every count emitted for this word across all mappers.
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    context.write(key, new IntWritable(sum)); // emit (word, total)
  }
}
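Neither class runs by itself: a small driver configures the job and submits it. Below is a minimal driver sketch following the standard Hadoop wiring; the class name WordCount and the use of command-line arguments for the paths are assumptions for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class for this example.
public class WordCount {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);         // tells Hadoop which jar to ship
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);          // reducer output key type
    job.setOutputValueClass(IntWritable.class); // reducer output value type
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /input
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /output
    System.exit(job.waitForCompletion(true) ? 0 : 1); // block until the job finishes
  }
}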
Running the Job
# Create an input directory in HDFS and upload the local text files
hdfs dfs -mkdir -p /input
hdfs dfs -put *.txt /input/

# Run the bundled word count example. The /output directory must not
# already exist; MapReduce refuses to overwrite an existing output path.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /input /output

# Print the reducer output (one part-r-* file per reducer)
hdfs dfs -cat /output/part-r-00000
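Each line in a part-r-* file is a tab-separated key/value pair. For the hypothetical two-line input traced earlier, the output would read:

apache	1
hadoop	2
runs	1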
Key Concepts
| Concept | Description |
|---|---|
| InputSplit | A logical chunk of input assigned to one Mapper |
| Combiner | Optional mini-Reducer that runs after Map to reduce shuffle data |
| Partitioner | Determines which Reducer receives each key (see the sketch below) |
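By default Hadoop uses HashPartitioner, which routes each key by its hashCode() modulo the number of reducers. Below is a minimal sketch of a custom partitioner; the class name AlphabetPartitioner and the a-m split are hypothetical.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical partitioner: words starting with a-m go to reducer 0,
// everything else to reducer 1. Assumes job.setNumReduceTasks(2).
public class AlphabetPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    if (numPartitions < 2) return 0; // single reducer: nothing to decide
    char first = Character.toLowerCase(key.toString().charAt(0));
    return (first >= 'a' && first <= 'm') ? 0 : 1;
  }
}

Register it in the driver with job.setPartitionerClass(AlphabetPartitioner.class) alongside job.setNumReduceTasks(2).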
Combiner Optimization
Because integer addition is associative and commutative, the reducer can double as a combiner: it pre-sums counts on each mapper's local output before the shuffle, shrinking the data sent over the network. The framework may run a combiner zero, one, or several times per map task, so it must never change the final result.
job.setCombinerClass(IntSumReducer.class);
Monitoring Jobs
mapred job -list              # list jobs that are currently running
mapred job -status <job_id>   # show completion percentage and counters
mapred job -kill <job_id>     # terminate a running job
Next Steps
See YARN & Resource Management to understand how Hadoop schedules and manages jobs.