4 posts tagged with "Release"

Hadoop release notes and highlights

Apache Spark 4.0 for Big Data Engineering: What's New and Why It Matters

· 7 min read
Bryan
Big Data Practitioner

Apache Spark 4.0 is the biggest leap for the project in years — and it's squarely aimed at the people who build and operate big data pipelines. The release sharpens four areas at once: SQL and workflow authoring, data types and observability, the Python/PySpark experience, and how clients connect to Spark. If you spin up a cluster on Databricks Runtime 17.0, these capabilities are available out of the box.

This article is an original, engineer-focused tour of what changed in Spark 4.0 and why each change matters in practice. If you want the fundamentals first, see our primers on Spark's key components and how Spark supports big data processing.

Hadoop 3 Features and Enhancements: A Deep Dive (2026)

· 12 min read
Hadoop.so Editorial Team
Big Data Engineers

Apache Hadoop 3 was the first release in nearly a decade that made operators rethink how they buy storage. Erasure coding cut disk overhead from 200% to 50%. NameNode HA grew from a single standby to multiple standby NameNodes. The MapReduce shuffle path moved into native code. YARN learned to manage long-running services and Docker workloads. And every default port that lived in the Linux ephemeral range was moved out of it.
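
The 200% to 50% figure is simple arithmetic, sketched below. The excerpt does not name the erasure coding policy, so this assumes the comparison is between default 3x replication and a Reed-Solomon layout with six data cells and three parity cells per stripe. The helper name `storage_overhead` is hypothetical and used only for illustration.

```python
def storage_overhead(data_units: int, redundant_units: int) -> float:
    """Extra raw storage as a fraction of the logical data size."""
    return redundant_units / data_units


# 3x replication: one logical block plus two additional full copies.
replication = storage_overhead(data_units=1, redundant_units=2)

# Assumed Reed-Solomon RS(6,3) layout: six data cells, three parity cells.
rs_6_3 = storage_overhead(data_units=6, redundant_units=3)

print(f"3x replication overhead: {replication:.0%}")
print(f"RS(6,3) overhead: {rs_6_3:.0%}")
```

Under these assumptions, replication costs two extra bytes per logical byte, while the erasure-coded layout costs half a byte.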

Several years after the 3.0 GA, the Hadoop 3.3 and 3.4 lines are the de facto on-prem standard, and most cloud Hadoop distributions (EMR, Dataproc, HDInsight, CDP) ship a 3.x core. This deep dive walks through every major feature in the Hadoop 3 line (what changed, why it matters, and where the tradeoffs hide) and ends with a side-by-side Hadoop 2.x vs 3.x comparison table.