Loading DataFrames and setup from an external source

In this recipe, we examine data manipulation using SQL. Spark's approach of providing both a programmatic interface and a SQL interface works well in production settings. There, we need more than machine learning: we also need to reach existing data sources through SQL, which keeps us compatible with existing SQL-based systems and familiar to the people who use them. Combining DataFrames with SQL gives an elegant path to integration in real-life settings.
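The sketch below runs one query through both interfaces. It is a minimal Scala sketch, not the recipe's own code, and it assumes Spark 2.x on the classpath and a local master. The CSV file, its columns (`name`, `dept`, `salary`), the view name `employees`, and the helpers `writeSampleCsv` and `highEarners` are all hypothetical. They exist only for illustration. The sketch writes the file itself, so it runs without any external data.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object DataFrameSqlSketch {

  // Hypothetical external source: writes a tiny CSV to a temp file and returns its path.
  def writeSampleCsv(): String = {
    val csv = "name,dept,salary\nAlice,eng,120\nBob,eng,100\nCarol,sales,90\n"
    val path = Files.createTempFile("employees", ".csv")
    Files.write(path, csv.getBytes(StandardCharsets.UTF_8))
    path.toString
  }

  // Runs the same query twice: once through the DataFrame API, once through SQL.
  def highEarners(spark: SparkSession, csvPath: String, threshold: Int): (Seq[String], Seq[String]) = {
    // Load the file; the header row supplies column names and inferSchema types salary as an integer.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(csvPath)

    // Programmatic interface: filter, project, and sort with DataFrame methods.
    val viaApi = df.filter(col("salary") > threshold)
      .select("name")
      .orderBy("name")

    // SQL interface: expose the DataFrame as a temporary view and query it with SQL text.
    df.createOrReplaceTempView("employees")
    val viaSql = spark.sql(
      s"SELECT name FROM employees WHERE salary > $threshold ORDER BY name")

    (viaApi.collect().map(_.getString(0)).toSeq,
     viaSql.collect().map(_.getString(0)).toSeq)
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; a cluster deployment would configure the master differently.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("DataFrameSqlSketch")
      .getOrCreate()

    val (viaApi, viaSql) = highEarners(spark, writeSampleCsv(), 95)
    println(s"DataFrame API: ${viaApi.mkString(", ")}")
    println(s"SQL:           ${viaSql.mkString(", ")}")

    spark.stop()
  }
}
```

With the sample rows and a threshold of 95, both interfaces select Alice and Bob. The two forms are interchangeable over the same DataFrame. Use the method chain when the query is built in code, and use the temporary view when SQL text is the natural form. Note that the view lives only as long as its `SparkSession`.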
