Apache Hadoop is the leading Big Data platform that allows to process large datasets efficiently and at low cost. Other Big Data platforms are MongoDB, Cassandra, and CouchDB. This section describes Apache Hadoop core concepts and its ecosystem.
The following image shows core Hadoop components:
At the core, Hadoop has two key components:
For example, say we need to store a large file of 1 TB in size and we only have some commodity servers each with limited storage. Hadoop Distributed File System can help here. We first install ...