About hadoop.so

hadoop.so is a community-driven reference site for Apache Hadoop and the broader big data ecosystem.

Our goal is to provide clear, up-to-date documentation and practical guides for data engineers, developers, and architects working with Hadoop at any scale.

What We Cover

  • Apache Hadoop Core — HDFS, MapReduce, YARN
  • The Hadoop Ecosystem — Hive, Spark, HBase, Kafka, Oozie, ZooKeeper, and more
  • Operations — Cluster setup, high availability, security (Kerberos), and performance tuning
  • Cloud Integration — Running Hadoop on AWS, GCP, and Azure; HDFS vs S3 trade-offs

Contributing

Found an error or want to add a guide? Visit the Apache Hadoop project or reach out via Stack Overflow.