Building Real-Time Data Pipelines by Steven Camina, Kevin White, Conor Doherty, Gary Orenstein

Chapter 3. Moving from Data Silos to Real-Time Data Pipelines

Providing a modern user experience at scale requires a streamlined data processing infrastructure. Users expect tailored content, short load times, and information that is always up to date. Framing business operations with these same guiding principles can improve their effectiveness. For example, publishers, advertisers, and retailers can drive higher conversion by targeting display media and recommendations based on users’ history and demographic information. Applications like real-time personalization create problems for legacy data processing systems with separate operational and analytical data silos.

The Enterprise Architecture Gap

A traditional data architecture uses an OLTP-optimized database for operational data processing and a separate OLAP-optimized data warehouse for business intelligence and other analytics. In practice, these systems are often very different from one another and likely come from different vendors. Transferring data between systems requires ETL (extract, transform, load) (Figure 3-1).
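To make the ETL step concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular vendor's tooling: two in-memory SQLite databases stand in for the operational database and the data warehouse, and the table names (`orders`, `daily_sales`), column names, and sample rows are all hypothetical.

```python
import sqlite3
from collections import defaultdict


def extract(oltp):
    """Extract: read raw order rows from the operational (OLTP) store."""
    return oltp.execute("SELECT order_day, amount_cents FROM orders").fetchall()


def transform(rows):
    """Transform: collapse individual orders into one total per day."""
    totals = defaultdict(int)
    for order_day, amount_cents in rows:
        totals[order_day] += amount_cents
    return sorted(totals.items())


def load(warehouse, daily_totals):
    """Load: write the aggregated rows into the warehouse in one transaction."""
    with warehouse:  # commits on success, rolls back on error
        warehouse.executemany(
            "INSERT INTO daily_sales (order_day, total_cents) VALUES (?, ?)",
            daily_totals,
        )


def run_etl_demo():
    # Hypothetical operational database with a few sample orders.
    oltp = sqlite3.connect(":memory:")
    oltp.execute(
        "CREATE TABLE orders "
        "(id INTEGER PRIMARY KEY, order_day TEXT, amount_cents INTEGER)"
    )
    oltp.executemany(
        "INSERT INTO orders (order_day, amount_cents) VALUES (?, ?)",
        [("2016-01-01", 1200), ("2016-01-01", 800), ("2016-01-02", 500)],
    )

    # Hypothetical warehouse holding the analytics-friendly summary table.
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE daily_sales (order_day TEXT PRIMARY KEY, total_cents INTEGER)"
    )

    load(warehouse, transform(extract(oltp)))
    return warehouse.execute(
        "SELECT order_day, total_cents FROM daily_sales ORDER BY order_day"
    ).fetchall()
```

Calling `run_etl_demo()` leaves the warehouse with one summary row per day. In a real deployment the two systems would typically be separate products, and the ETL job would run on a schedule rather than inline.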

Legacy operational databases and data warehouses ingest data differently. In particular, legacy data warehouses cannot efficiently handle one-off inserts and updates. Instead, data must be organized into large batches and loaded all at once. Because of the batch size and the rate of loading, this is generally not an online operation; it runs overnight or at the end of the week.
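The practical effect of batch loading can be sketched with a small buffer that holds rows until a batch is full. This is a simplified illustration under stated assumptions: the `BatchLoader` class, the `events` table, and the batch size of 3 are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3


class BatchLoader:
    """Buffers rows and loads them into the warehouse only in full batches."""

    def __init__(self, warehouse, batch_size):
        self.warehouse = warehouse
        self.batch_size = batch_size
        self.pending = []

    def add(self, row):
        # Rows accumulate in memory; nothing reaches the warehouse yet.
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        with self.warehouse:  # one transaction for the whole batch
            self.warehouse.executemany(
                "INSERT INTO events (user_id, action) VALUES (?, ?)",
                self.pending,
            )
        self.pending.clear()


def visible_rows(warehouse):
    """Count the rows an analyst querying the warehouse could see right now."""
    return warehouse.execute("SELECT COUNT(*) FROM events").fetchone()[0]


def run_batch_demo():
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE events (user_id INTEGER, action TEXT)")
    loader = BatchLoader(warehouse, batch_size=3)

    loader.add((1, "view"))
    loader.add((2, "click"))
    before = visible_rows(warehouse)  # batch not full, so nothing is loaded

    loader.add((3, "purchase"))       # third row fills the batch and triggers the load
    after = visible_rows(warehouse)
    return before, after
```

Until the batch is loaded, queries against the warehouse do not see the newest events. With overnight or weekly loads, that gap stretches from a few rows to hours or days of data.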

Figure 3-1. Legacy data processing model ...
