About hadoop.so

hadoop.so is a community-driven reference site for Apache Hadoop and the broader big data ecosystem.

Our goal is to provide clear, up-to-date documentation and practical guides for data engineers, developers, and architects working with Hadoop at any scale.

What We Cover

Apache Hadoop Core — HDFS, MapReduce, YARN
The Hadoop Ecosystem — Hive, Spark, HBase, Kafka, Oozie, ZooKeeper, and more
Operations — Cluster setup, high availability, security (Kerberos), and performance tuning
Cloud Integration — Running Hadoop on AWS, GCP, and Azure; HDFS vs S3 trade-offs

Contributing

Found an error or want to add a guide? Visit the Apache Hadoop project or reach out via Stack Overflow.