How it works...

In this recipe, we first discover the underlying schema of a person object with a quick method: reading a sample JSON object, as described in step 6. The resulting DataFrame carries the inferred schema, which we then impose on the streaming input (simulated by streaming a file) so that the input can be treated as a streaming DataFrame, as seen in step 7.
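As a minimal sketch of this idea (not the recipe's exact code), the following Scala program infers a schema from a sample JSON file with a batch read and then applies it to a streaming file source. The object name and both paths are hypothetical placeholders, and the sketch assumes that the sample file and the stream directory already exist.

```scala
import org.apache.spark.sql.SparkSession

object SchemaFromSampleSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SchemaFromSampleSketch")
      .getOrCreate()

    // Hypothetical locations; both must exist before the program runs.
    val samplePath = "data/people_sample.json"
    val streamDir  = "data/people_stream/"

    // Batch read: Spark scans the sample and infers field names and types.
    val sample = spark.read.json(samplePath)
    val personSchema = sample.schema
    personSchema.printTreeString()

    // Streaming read: file sources expect an explicit schema by default,
    // so we reuse the one inferred from the sample.
    val people = spark.readStream
      .schema(personSchema)
      .json(streamDir)

    // isStreaming distinguishes a streaming DataFrame from a batch one.
    println(s"Streaming DataFrame: ${people.isStreaming}")

    spark.stop()
  }
}
```

Inferring the schema once from a small sample avoids writing the StructType by hand, at the cost of trusting that the sample is representative of the records that arrive later.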

The ability to treat the stream as a DataFrame and query it with either the functional API or SQL is a powerful concept, demonstrated in step 8. We then write the result using writeStream in append mode, with a trigger that fires a micro-batch every second.
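The sketch below illustrates the same pattern under stated assumptions: the person schema (name and age), the stream directory, the age filter, and the console sink are all hypothetical choices made for illustration, not taken from the recipe. It uses Trigger.ProcessingTime, which is available from Spark 2.2 onward; earlier 2.x releases use ProcessingTime("1 second") from the same package instead.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

object StreamQuerySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("StreamQuerySketch")
      .getOrCreate()
    import spark.implicits._

    // Assumed schema, written by hand so the sketch is self-contained.
    val personSchema = new StructType()
      .add("name", StringType)
      .add("age", IntegerType)

    // Hypothetical directory that receives JSON files over time; it must exist.
    val people = spark.readStream
      .schema(personSchema)
      .json("data/people_stream/")

    // Functional style: the usual DataFrame operators work on a stream.
    val adults = people.filter($"age" >= 18).select($"name", $"age")

    // SQL style: register the stream as a view and query it.
    // The result is equivalent to `adults`; only one of the two is written out.
    people.createOrReplaceTempView("people")
    val adultsSql = spark.sql("SELECT name, age FROM people WHERE age >= 18")

    // Append mode emits only rows added since the last trigger;
    // the trigger starts a micro-batch every second.
    val query = adults.writeStream
      .format("console")
      .outputMode("append")
      .trigger(Trigger.ProcessingTime("1 second"))
      .start()

    query.awaitTermination()
  }
}
```

Append mode suits this kind of query because a filter and projection never revise rows that have already been emitted; aggregations without a watermark would need a different output mode.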
