CHAPTER 15


Data Science with Hadoop

“Data science” is a broad term for the study of extracting knowledge from data. It is interdisciplinary: it usually requires expertise in several fields, including these:

  • Business domain expertise
  • Mathematics and statistics
  • Scientific method
  • Computer science and data engineering
  • Visualization

The business domain and the nature of the data strongly influence which techniques are needed to solve a given problem. There is no one-size-fits-all solution for data science. Some problems of interest include these:

  • Text mining and natural language processing
  • Machine learning ...
