Chapter 6. Ingesting data with Spark Streaming

This chapter covers

  • Using discretized streams
  • Saving computation state over time
  • Using window operations
  • Reading from and writing to Kafka
  • Obtaining good performance

In today’s fast-paced, interconnected world, real-time data ingestion is becoming increasingly important. There is much talk about the so-called Internet of Things: a world of devices used in our daily lives that continually stream data to the internet and to each other and make our lives easier (in theory, at least). Even without those micro-devices overwhelming our networks with their data, many companies today need to receive data in real time, learn from it, and act on it immediately. After all, ...

