Summary

This is an interesting chapter where we discussed the broader and wider picture of where Spark fits in the big data and analytics ecosystem. First, we looked at the Datasets that accompany this book as well as some interesting IDEs. We then discussed the role of data scientists and what they expect from a Spark stack, which led to our discussion to the Spark-based Data Lake architecture and then the Spark stack. We also looked at Parquest as an efficient storage format.

Get Fast Data Processing with Spark 2 - Third Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.