Chapter 4: Hadoop

Hadoop is a popular and fast-growing technology for tackling big data and the increasing volume of data being generated. It offers a new approach to managing your structured and semi-structured data, analyzing it, and delivering results. You can find many books, blogs, articles, and websites about Hadoop. In addition, many conferences focus on Hadoop, and many vendors are jumping on the Hadoop bandwagon to develop, integrate, and implement this technology. Customers and prospects are excited because it is open source and is claimed to be low cost. I will cover Hadoop at a high level, in its simplest form: how it relates to analytics and data management, and how it can fit into your architecture. If you want a more in-depth understanding of Hadoop, I suggest additional resources from the Internet and from software vendors such as Hortonworks or Cloudera. The following topics are covered in this chapter:

  • What is Hadoop?
  • Why is Hadoop in the big data environment?
  • How does Hadoop fit in the modern architecture?
  • What are some best practices?
  • What are some use cases and success stories?
  • What are the benefits of using Hadoop in big data?

BACKGROUND

The history of Hadoop is fluid. According to my research, the fundamental technology behind Hadoop was invented by Google and was heavily influenced by Yahoo!. The underlying concept for this technology is to conveniently index ...
