Spark is a general-purpose, in-memory, distributed computation engine. The Spark Streaming API is an extension of the core Spark library, designed for scalable, high-throughput, fault-tolerant processing of streaming (unbounded) data. Spark Streaming integrates with a variety of data sources such as TCP network sockets, HTTP server logs, Kafka producers, social media streams, and so on.
Streams and complex events are processed with generic operations such as map, reduce, join, and window. Data in motion can be analysed, aggregated, filtered, and sent to downstream applications, persistent storage, or live dashboards. Machine learning and graph processing algorithms and APIs can be applied to ...
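To make the map-and-reduce and windowing operations described above more concrete, here is a minimal sketch in plain Python. It does not use the Spark API; it only simulates the idea of treating a stream as a sequence of small batches and aggregating over a sliding window of recent batches. The function names `count_words` and `windowed_counts`, the sample input, and the window size are all hypothetical and chosen for illustration.

```python
from collections import Counter, deque
from typing import Deque, Iterable, Iterator, List


def count_words(batch: Iterable[str]) -> Counter:
    """Split each line into words (map) and tally them (reduce) for one batch."""
    counts: Counter = Counter()
    for line in batch:
        counts.update(line.lower().split())
    return counts


def windowed_counts(
    batches: Iterable[List[str]], window_size: int
) -> Iterator[Counter]:
    """Yield word counts over a sliding window of the last `window_size` batches."""
    # deque(maxlen=...) drops the oldest batch automatically as new ones arrive.
    window: Deque[Counter] = deque(maxlen=window_size)
    for batch in batches:
        window.append(count_words(batch))
        total: Counter = Counter()
        for batch_counts in window:
            total.update(batch_counts)
        yield total


# Hypothetical input: three small batches of text lines arriving in order.
stream = [
    ["spark streaming", "spark"],
    ["kafka spark"],
    ["kafka kafka"],
]

results = list(windowed_counts(stream, window_size=2))
for index, counts in enumerate(results):
    print(index, dict(counts))
```

In an actual Spark Streaming job, the engine would distribute this work across a cluster and handle failures; the sketch only shows the shape of the computation on a single machine.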