Chapter 8. Spark SQL

Spark SQL provides an important capability in the Spark ecosystem: integration with different data sources, along with the ability to interact with other subsystems such as visualization. In modern data stacks, no component is an island, and versatile integration with other components matters a great deal. The role of Spark SQL is not to replace SQL databases. We see it more as a versatile query interface for Spark data that complements Spark's data wrangling and input capabilities. The ability to scale complex data operations makes sense only when one can use the results in flexible ways, and Spark SQL makes that possible. We'll cover the following topics ...

