Storage

The preferred format for stream data is the key-value pair, which the JSON and Avro formats represent well. The preferred storage for key-value data is a NoSQL data store such as HBase or Cassandra. There are roughly 100 open source NoSQL databases on the market today. Choosing the right one for storing real-time events is challenging, because each database offers its own features for data persistence. A few examples are schema-agnostic design, high distributability, commodity hardware support, and data replication.
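As a minimal sketch of the key-value idea, the following Python snippet turns one stream event into a key and a JSON-encoded value. The event fields, the `to_record` helper, and the in-memory `store` dictionary are all hypothetical and used only for illustration; a real deployment would write to a NoSQL store such as HBase or Cassandra through that store's own client library.

```python
import json


def to_record(event, key_field="event_id"):
    """Split an event into a (key, JSON value) pair for a key-value store.

    The field used as the key is an assumption made for this example.
    """
    key = str(event[key_field])
    # sort_keys gives a stable serialization, which makes values easy to compare
    value = json.dumps(event, sort_keys=True)
    return key, value


# A plain dict stands in for a NoSQL table in this sketch.
store = {}

# Hypothetical real-time event; the field names are illustrative only.
event = {
    "event_id": "e-1001",
    "sensor": "temp-7",
    "reading": 21.5,
    "ts": 1700000000,
}

key, value = to_record(event)
store[key] = value

# Reading the event back: look up by key, then decode the JSON value.
restored = json.loads(store["e-1001"])
```

Because the value is self-describing JSON, a new field can be added to later events without changing any table definition, which is the schema-agnostic property mentioned above.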

The following image explains all stream processing components:

In this chapter we will talk about message queue and stream processing frameworks in ...
