About hadoop.so
hadoop.so is a community-driven reference site for Apache Hadoop and the broader big data ecosystem.
Our goal is to provide clear, up-to-date documentation and practical guides for data engineers, developers, and architects working with Hadoop at any scale.
What We Cover
- Apache Hadoop Core — HDFS, MapReduce, YARN
- The Hadoop Ecosystem — Hive, Spark, HBase, Kafka, Oozie, ZooKeeper, and more
- Operations — Cluster setup, high availability, security (Kerberos), and performance tuning
- Cloud Integration — Running Hadoop on AWS, GCP, and Azure; HDFS vs S3 trade-offs
Contributing
Found an error or want to add a guide? Visit the Apache Hadoop project or reach out via Stack Overflow.