Summary

At the time of writing, Structured Streaming is not production-ready. It is, however, a paradigm shift in Spark that should make it easier for data scientists and data engineers to build continuous applications. Although the previous sections did not call them out explicitly, streaming applications raise many potential problems that you need to design for, such as late events, partial outputs, state recovery on failure, and distributed reads and writes. Structured Streaming is designed to abstract many of these issues away, making it easier for you to build continuous applications.
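To make the "late events" problem concrete, here is a minimal pure-Python sketch of one common way to handle it: counting events per event-time window and dropping events that arrive too far behind the newest event seen (a watermark). This is a toy illustration only. The function name `windowed_counts` and the parameters `window_size` and `allowed_lateness` are hypothetical, it does not use Spark, and it does not reproduce Spark's exact semantics.

```python
from collections import defaultdict


def windowed_counts(events, window_size=10, allowed_lateness=5):
    """Count events per tumbling event-time window.

    `events` is an iterable of (event_time, key) tuples in ARRIVAL order,
    which may differ from event-time order. Events whose event time falls
    behind the watermark are dropped instead of being counted.
    All names here are hypothetical and for illustration only.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = None

    for event_time, key in events:
        # Track the largest event time observed so far.
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time

        # The watermark trails the newest event time by the allowed lateness.
        watermark = max_event_time - allowed_lateness

        if event_time < watermark:
            # Too late: record it as dropped and skip the aggregation.
            dropped.append((event_time, key))
            continue

        # Assign the event to the tumbling window that contains it.
        window_start = (event_time // window_size) * window_size
        counts[(window_start, key)] += 1

    return dict(counts), dropped


# Event 9 arrives after event 12 but is within the allowed lateness;
# event 3 arrives after event 12 and is too late.
events = [(1, "a"), (4, "a"), (12, "a"), (9, "a"), (3, "a"), (15, "a")]
counts, dropped = windowed_counts(events)
print(counts)
print(dropped)
```

Even this small sketch has to keep state (`counts`, `max_event_time`) that would need to be recovered after a failure and coordinated across workers, which is the kind of bookkeeping a streaming engine is meant to take off your hands.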

We encourage you to try Spark Structured Streaming so you will be able to easily ...
