Chapter 6: Storing Streaming Data

One of the primary reasons for building a streaming data system is to allow decoupled communication and access between the different parts of the system. A key part is the storage and backup mechanism, both for raw data and for data that has been processed by one or more of the processing environments covered in the previous chapter.

Processing the data is one thing, but for it to be delivered to the end user it needs to be stored somewhere. That storage location could be the processing system itself, using something like Storm's Distributed Remote Procedure Calls (DRPC) and in-bolt memory storage. In a production environment, however, this simply isn't practical. First, the data usually needs to persist for some time, so the memory requirements become prohibitive. Second, keeping results inside the processing system means that maintenance on that system forces an outage of any external interfaces, even though the two otherwise have nothing to do with each other. Finally, it is usually desirable to persist results to tertiary storage (disks or “cloud” storage devices) so that the data may be more easily analyzed for long-term trends.
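The second point, decoupling the results from the lifetime of the processing system, can be illustrated with a minimal sketch. It is not taken from the book: the `SqliteResultStore` class, the `run_processor` function, and the page-count events are hypothetical, and SQLite stands in for whatever external store a real deployment would use.

```python
import os
import sqlite3
import tempfile


class SqliteResultStore:
    """Hypothetical result store backed by a SQLite file on disk."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS results "
            "(key TEXT PRIMARY KEY, value INTEGER)"
        )
        self.conn.commit()

    def put(self, key, value):
        # Overwrite any earlier value stored under the same key.
        self.conn.execute(
            "INSERT OR REPLACE INTO results (key, value) VALUES (?, ?)",
            (key, value),
        )
        self.conn.commit()

    def get(self, key):
        row = self.conn.execute(
            "SELECT value FROM results WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else row[0]

    def close(self):
        self.conn.close()


def run_processor(store, events):
    """Stand-in for the processing system: count events per page."""
    counts = {}
    for page in events:
        counts[page] = counts.get(page, 0) + 1
    for page, n in counts.items():
        store.put(page, n)


# Assumption: a temporary file plays the role of the external store.
db_path = os.path.join(tempfile.mkdtemp(), "results.db")

# The processing side writes its results, then goes away (for example,
# because it was taken down for maintenance).
writer = SqliteResultStore(db_path)
run_processor(writer, ["/home", "/about", "/home"])
writer.close()

# The delivery side (an API or UI) opens the store independently and can
# still read the results while the processor is not running.
reader = SqliteResultStore(db_path)
home_count = reader.get("/home")
missing = reader.get("/contact")
reader.close()
```

Because the reader only depends on the store, the processing code can be restarted or replaced without interrupting whatever serves the results. Had the counts lived only in the processor's memory, they would have disappeared when it shut down.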

This chapter considers how to store data after it has been processed. There are a number of storage options available for processing systems that need to deliver their data to some sort of front-end interface, typically either an application programming interface (API) or a user interface (UI). Although there are dozens of potential options, ...

