Hadoop Version - Understanding Hadoop Versions
Apache Hadoop is the core engine of open-source, large-scale data processing and has seen massive growth over the last few years. Most big data technologies and systems are based on Hadoop and its ecosystem.
This article describes the different Hadoop release branches, the latest version in each branch, and the core features of the different Hadoop versions.
Main Hadoop Versions and Release Dates
There are three active Hadoop release branches, identified by their major and minor version numbers: 2.7, 2.6, and 2.5. A 3.0 release is also in preparation, but it is currently in alpha.
Release Branch | Version | Release Date |
---|---|---|
3.0 release | 3.0.0-alpha1 | Sep 3, 2016 |
2.7 release | 2.7.3 | Aug 25, 2016 |
2.6 release | 2.6.5 | Oct 8, 2016 |
2.5 release | 2.5.2 | Nov 19, 2014 |
Which Hadoop Version to Use
As of Jan 2017, the 2.7 branch of Hadoop is the most stable and widely used. If you are starting fresh, use the latest release in the 2.7 branch, i.e., 2.7.3.
If you are on the 2.6 branch, it is recommended to upgrade to version 2.6.5.
If you are on the 2.5 branch, it is recommended to upgrade to the 2.7 branch.
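The recommendations above can be written down as a small lookup. The following is a minimal Python sketch, not an official tool: the names `LATEST_BY_BRANCH`, `branch_of`, and `recommended_upgrade` are hypothetical, and the version numbers are simply the ones from the table in this article (as of Jan 2017).

```python
# Latest release per branch, copied from the table in this article (Jan 2017).
LATEST_BY_BRANCH = {"2.7": "2.7.3", "2.6": "2.6.5", "2.5": "2.5.2"}


def branch_of(version):
    """Return the 'major.minor' branch of a version string such as '2.6.0'."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError("expected a version like '2.6.0', got %r" % version)
    return parts[0] + "." + parts[1]


def recommended_upgrade(version):
    """Apply this article's recommendations to an installed version.

    Returns the version to move to, or None if the installed version is
    already the recommended one or its branch is not covered here.
    """
    branch = branch_of(version)
    if branch == "2.5":
        target = LATEST_BY_BRANCH["2.7"]   # 2.5 users: move to the 2.7 branch
    elif branch in ("2.6", "2.7"):
        target = LATEST_BY_BRANCH[branch]  # stay on the branch, take its latest
    else:
        return None                        # e.g. 3.0 alpha: no advice given here
    return None if target == version else target
```

The sketch only looks up the branch; it does not compare version numbers, so it assumes the installed version is not newer than the releases listed in the table.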
Hadoop Versions - Major Features
Version 3.0 - Major features
Currently in alpha state
- Minimum required Java version increased from Java 7 to Java 8
- Support for erasure coding in HDFS
- YARN Timeline Service v.2
- Hadoop Shell script rewrite
- MapReduce task-level native optimization
- Support for more than two NameNodes
- Default ports of multiple services have been changed
- Support for Microsoft Azure Data Lake filesystem connector
- Intra-datanode balancer
- Reworked daemon and task heap management
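Because version 3.0 raises the minimum Java version to Java 8, it can be useful to check the Java version string of a node before upgrading. The following Python sketch parses a version string such as the one reported by `java -version`. The function names are hypothetical, and the parsing assumes the common `1.<major>` scheme used up to Java 8 and the plain `<major>` scheme used by later releases.

```python
import re


def java_major_version(version_string):
    """Extract the Java major version from strings like '1.8.0_112' or '11.0.2'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError("unrecognised Java version: %r" % version_string)
    first = int(match.group(1))
    # Up to Java 8 the scheme was '1.<major>', e.g. '1.8.0_112' is Java 8.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def meets_hadoop3_java_requirement(version_string):
    """True if the Java version is at least Java 8, the minimum for Hadoop 3.0."""
    return java_major_version(version_string) >= 8
```

For example, `meets_hadoop3_java_requirement("1.7.0_80")` is false, while the same check on `"1.8.0_112"` is true.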
Version 2.7 - Major features
2.7.3 version
- 221 bug fixes
2.7.2 version
- 155 bug fixes
2.7.1 version
- 131 bug fixes
2.7.0 version - New features
- HADOOP-9629 - Support Windows Azure Storage - Blob as a file system in Hadoop.
- HDFS (HDFS-3107) - Support for file truncate
- HDFS (HDFS-7584) - Support for quotas per storage type
- HDFS (HDFS-3689) - Support for files with variable-length blocks
- YARN (YARN-3100) - Make YARN authorization pluggable
- YARN (YARN-1492) - Automatic shared, global caching of YARN localized resources (beta)
- MAPREDUCE-5583 - Ability to limit running Map/Reduce tasks of a job
- MAPREDUCE-4815 - Speed up FileOutputCommitter for very large jobs with many output files.
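Several of the 2.7.0 features above are switched on through job properties or shell commands. The Python sketch below only builds the command lines; it does not run them. It rests on a few assumptions:

- The property names `mapreduce.job.running.map.limit`, `mapreduce.job.running.reduce.limit`, and `mapreduce.fileoutputcommitter.algorithm.version` are believed to be the ones associated with MAPREDUCE-5583 and MAPREDUCE-4815. Check them against `mapred-default.xml` for your release.
- The `-D` options only take effect if the job parses generic options, for example through `ToolRunner`.
- The jar name, class name, and paths are hypothetical.

```python
def job_limit_options(max_maps=0, max_reduces=0, fast_committer=False):
    """Build generic '-D' options for the 2.7.0 MapReduce features listed above.

    A limit of 0 is treated as 'no limit' and the option is left out.
    """
    options = []
    if max_maps > 0:
        options += ["-D", "mapreduce.job.running.map.limit=%d" % max_maps]
    if max_reduces > 0:
        options += ["-D", "mapreduce.job.running.reduce.limit=%d" % max_reduces]
    if fast_committer:
        # Assumed switch for the faster commit path from MAPREDUCE-4815.
        options += ["-D", "mapreduce.fileoutputcommitter.algorithm.version=2"]
    return options


def truncate_command(path, new_length, wait=True):
    """Build an 'hdfs dfs -truncate' command line (file truncate, HDFS-3107)."""
    if new_length < 0:
        raise ValueError("new_length must be non-negative")
    command = ["hdfs", "dfs", "-truncate"]
    if wait:
        command.append("-w")  # wait for block recovery to finish, if needed
    command += [str(new_length), path]
    return command


# Hypothetical jar, class and paths, for illustration only.
job_command = (["hadoop", "jar", "my-job.jar", "com.example.MyJob"]
               + job_limit_options(max_maps=50, max_reduces=10)
               + ["/input", "/output"])
print(" ".join(job_command))
print(" ".join(truncate_command("/data/events.log", 1024)))
```

Returning argument lists rather than a single string keeps the result suitable for passing to `subprocess.run` without shell quoting issues.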