Summary

In this chapter, we examined the sources of unstructured data and the motivation for analyzing it. We explained the techniques required to pre-process unstructured data and showed that Spark provides most of these tools out of the box. We also covered some of the algorithms supported by Spark that can be used in text analytics.

In the next chapter, we will go through different visualization techniques that provide insight at different stages of the data analytics lifecycle.
