Chapter 6. Architecting Multipurpose Infrastructure

As data processing technology has matured, enterprise developers and architects have realized that one of the keys to scaling effectively is minimizing complexity. In the interest of limiting complexity, the trend in enterprise data architecture is moving toward fewer, more versatile systems rather than many narrow-purpose systems. Beyond the complexity itself, each additional system demands more administration, developers and administrators with more specialized skill sets, and more development work to glue all of the systems together.

The rise of NoSQL grew out of limitations in legacy RDBMS technology, specifically limited scalability and the inability to handle semi-structured data. Suppose you’re an AdTech company: you manage business operations, such as tracking the funds remaining in active campaigns, in a relational database, but record clickstream data, such as whether or not a user clicked on an ad, in a NoSQL key-value store or document store.
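To make the scenario concrete, here is a minimal sketch of that split write path, assuming sqlite3 stands in for the operational relational database and a plain Python list stands in for the key-value/document store. The table schema, field names, and helper function are illustrative assumptions, not taken from any particular product.

import sqlite3
import time

# Operational data: campaign budgets live in the relational database.
ops_db = sqlite3.connect(":memory:")
ops_db.execute(
    "CREATE TABLE campaigns (id INTEGER PRIMARY KEY, budget_remaining REAL)"
)
ops_db.execute("INSERT INTO campaigns VALUES (1, 5000.0)")
ops_db.commit()

# Clickstream data: each impression is written as a small document to a
# separate NoSQL store; a plain list of dicts stands in for that store here.
clickstream = []

def record_impression(campaign_id, user_id, clicked):
    """Append an impression/click event, as a document-store insert would."""
    clickstream.append({
        "campaign_id": campaign_id,
        "user_id": user_id,
        "clicked": clicked,
        "ts": time.time(),
    })

record_impression(1, "user-42", True)
record_impression(1, "user-77", False)

Nothing ties the two stores together: the only link between a campaign row and its click events is a convention the application enforces on its own.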

Now suppose you want to analyze the effectiveness of a given campaign. To correlate dollars spent with conversions, you need to synthesize data coming from two different sources, which requires an additional aggregation layer, most likely in your application. Besides the latency introduced by moving data between systems, this architecture forces you to write and maintain that custom aggregation layer and, potentially, additional custom code to keep the separate stores synchronized.
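The following is a rough sketch of what that application-level aggregation layer might look like. It assumes the spend figure and the click events have already been fetched from the relational and NoSQL stores, respectively; the field names and metrics are illustrative assumptions rather than part of any product’s API.

# Results already fetched from the two stores (illustrative sample data):
# a spend row from the relational database and click-event documents from
# the NoSQL store.
spend_row = {"campaign_id": 1, "dollars_spent": 1200.0}
click_events = [
    {"campaign_id": 1, "user_id": "user-42", "clicked": True},
    {"campaign_id": 1, "user_id": "user-77", "clicked": False},
    {"campaign_id": 1, "user_id": "user-90", "clicked": True},
]

def campaign_effectiveness(spend, events):
    """Join spend and clickstream data in application code."""
    clicks = sum(1 for e in events if e["clicked"])
    impressions = len(events)
    return {
        "campaign_id": spend["campaign_id"],
        "dollars_spent": spend["dollars_spent"],
        "impressions": impressions,
        "clicks": clicks,
        "cost_per_click": spend["dollars_spent"] / clicks if clicks else None,
    }

print(campaign_effectiveness(spend_row, click_events))

Every metric that spans both stores has to be computed this way, in custom code that the team writes, tests, and keeps consistent as schemas on either side evolve.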

While ...
