Next Steps

You now understand the four pillars of Apache Hadoop — HDFS, MapReduce, YARN, and the Ecosystem. Here’s how to continue your journey:

Deepen Your Knowledge

Try Real Data

Download a public dataset and experiment:

# Wikipedia pagecount data (a classic Hadoop dataset)
wget https://dumps.wikimedia.org/other/pagecounts-raw/2016/2016-01/pagecounts-20160101-000000.gz

# Create the target directory, then upload the file to HDFS
# (MapReduce reads gzip files directly; no need to decompress first)
hdfs dfs -mkdir -p /data/wikipedia
hdfs dfs -put pagecounts-20160101-000000.gz /data/wikipedia/

# Run a word count job on it
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  wordcount /data/wikipedia/ /output/wiki-wordcount
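Once the job finishes, you can inspect the results directly from HDFS. A quick sketch, assuming the default MapReduce output layout (tab-separated `word<TAB>count` lines in `part-r-*` files):

```shell
# List the files the job wrote (expect _SUCCESS plus part-r-* files)
hdfs dfs -ls /output/wiki-wordcount

# Preview the first few result lines
hdfs dfs -cat /output/wiki-wordcount/part-r-* | head -n 10

# Find the most frequent words by sorting on the count column
hdfs dfs -cat /output/wiki-wordcount/part-r-* | sort -t$'\t' -k2 -rn | head -n 10
```

Note that MapReduce refuses to overwrite an existing output directory, so delete `/output/wiki-wordcount` (`hdfs dfs -rm -r`) before re-running the job.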

Explore the Ecosystem

Once comfortable with the core, explore these projects:

| Next Step | Why |
| --- | --- |
| Apache Hive | Query HDFS data with SQL |
| Apache Spark | Faster, more flexible processing |
| Apache HBase | Random-access storage on HDFS |
| Apache Kafka | Real-time event streaming |
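As a first taste of the ecosystem, Hive lets you layer SQL over files already in HDFS. A minimal sketch, assuming Hive is installed and the word-count output from the earlier job sits in `/output/wiki-wordcount` (the table and column names here are illustrative):

```shell
# Hypothetical Hive session: map the tab-separated word counts to an
# external table, then query it with plain SQL. EXTERNAL means Hive
# reads the files in place without copying or taking ownership of them.
hive -e "
CREATE EXTERNAL TABLE wiki_wordcount (word STRING, cnt BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/output/wiki-wordcount';

-- Top 10 most frequent words, now via SQL instead of a MapReduce job
SELECT word, cnt FROM wiki_wordcount ORDER BY cnt DESC LIMIT 10;
"
```

Behind the scenes Hive still compiles this query into distributed jobs, which is exactly the point: the same HDFS data, a far friendlier interface.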

Join the Community