Chapter 5. Processing Streaming Data for Visualization

Processing data is the most common operation mentioned in this book. Processing streaming data that will be visualized raises some specific considerations.

Batch Processing

Batch processing is the most common approach to handling high volumes of data. Batching means that data is cached somewhere and processed at intervals. The interval is chosen according to how significant the data is and how quickly anyone can act on it. Processing daily batches overnight is by far the most common approach. It falls short, however, for significant events: by the time a person reviews the report, such an event may have occurred almost 24 hours earlier. A sign that your brand has been mimicked publicly for malicious purposes is one case where every minute counts. To handle cases like this, hourly batch processing is often used. Most applications do not process batches more often than hourly because of perceived limits on how much faster anyone could act on the data. Another reason not to process batches too often is that batch processing is complex. A run may not finish before the next batch is due to begin, which causes a backlog.
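
The two ideas above can be illustrated together: cached events are grouped into fixed intervals, and a run that takes longer than its interval delays the runs after it. The following is a minimal sketch, assuming events are held in memory with timestamps and that runs execute one at a time. The names Event, bucket_events, and simulate_schedule are hypothetical and exist only for illustration. Switching from daily to hourly batches is just a change to the interval argument.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    """Hypothetical cached event: a timestamp plus an arbitrary payload."""
    timestamp: datetime
    payload: str


def bucket_events(events, interval):
    """Group cached events into batches keyed by the start of their interval."""
    batches = defaultdict(list)
    for event in events:
        # Floor the timestamp to the start of its interval (e.g. the top of the hour).
        offset = (event.timestamp - datetime.min) % interval
        batches[event.timestamp - offset].append(event)
    return dict(sorted(batches.items()))


def simulate_schedule(durations, interval):
    """Return how late each run starts.

    Run k is scheduled at k * interval, but it cannot start until the
    previous run has finished. A growing lag means a backlog is building.
    """
    lags = []
    previous_finish = timedelta(0)
    for k, duration in enumerate(durations):
        scheduled = k * interval
        start = max(scheduled, previous_finish)
        lags.append(start - scheduled)
        previous_finish = start + duration
    return lags


if __name__ == "__main__":
    hour = timedelta(hours=1)
    events = [
        Event(datetime(2024, 1, 1, 9, 5), "a"),
        Event(datetime(2024, 1, 1, 9, 55), "b"),
        Event(datetime(2024, 1, 1, 10, 10), "c"),
    ]
    for start, batch in bucket_events(events, hour).items():
        print(start, [e.payload for e in batch])

    # Two runs that overrun the hourly interval push every later run back.
    minutes = [timedelta(minutes=m) for m in (50, 70, 70, 30)]
    print(simulate_schedule(minutes, hour))
```

With run durations of 50, 70, 70, and 30 minutes on an hourly schedule, the start delays are 0, 0, 10, and 20 minutes. Each overrun carries forward into every later run, which is how a backlog accumulates.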

The process that runs at the chosen interval will query the data from where it’s stored in order to create ...
