Chapter 4. The Goal and Architecture of a Customer Event Hub

Modern data infrastructures operate on vast volumes of data generated continuously by many independent channels. Enterprises such as consumer banks, which have many such channels, are beginning to implement a single view of customers that can power all points of customer contact.

In a session at Strata + Hadoop World New York 2015, Arvind Prabhakar, CTO at data integration company StreamSets, presented an architectural approach for implementing a customer event hub. He also discussed the key challenges and solutions to overcome them.

What Is a Customer Event Hub?

The Customer Event Hub (CEH) makes it possible for organizations to combine data from disparate sources in order to create a single view of customer information. This centralized information can be used across departments and systems to gain a greater understanding of the customer. “It’s the next logical step from what has traditionally been called a 360 degree customer view in the enterprise,” said Prabhakar. “But it differs greatly from the 360 degree in that it is bi-directional and allows for an interactive experience for the customer,” he said. The goal is to enhance customer experience and provide targeted, personalized customer service.

360-Degree Customer View versus Customer Event Hub

In the 360-degree customer view, a customer is surrounded by an ever-increasing set of channels of interaction; the view is an aggregation of all of these channels, covering all the data and all the interactions that are happening with one particular customer across them. The 360 view brings the data together to create a single view for consumption.

The purpose and advantage of having a 360 view, explained Prabhakar, is that it gives you a consistent understanding of the customer and helps you build relevant functionality. The problem is that these various channels are often implemented as silos and therefore they are isolated from one another, which creates a fragmented user experience. The CEH collapses all these channels into a single omnichannel.

“The key difference between a CEH and a 360-degree customer view is the interactivity,” said Prabhakar. “A 360-degree view is for consumption by the enterprise, whereas a CEH is a bi-directional channel” that also allows for an interactive experience for the customer. It gives the customer a consistent view of the enterprise, which is critical in establishing relationships with customers.

A Customer Event Hub in Action

As an example, Prabhakar described a high-value banking customer who is trying to transfer money online but cannot do it. As a result, the customer calls the bank’s technical support line. Unfortunately, this leads to even greater frustration.

Prabhakar suggests instead that financial institutions consider the possibilities of a call center response application that understands the needs of the caller. If the system knew, for example, what the customer wanted, it could route the caller to a much more immediate answer and result in a much more satisfying experience. “That’s the kind of use you can get from a Customer Event Hub,” he said.

Key Advantages for Your Business

According to Prabhakar, “All enterprises need to operate a CEH; it’s imperative for business agility as well as competitive advantage.” Some of the benefits of operating a CEH include:

Enhanced customer service and real-time personalization

“We all want the services and channels we engage with to be aware of who we are, what we like, and to respond accordingly,” said Prabhakar. “But there’s often a lag between when we exhibit certain behaviors and when the systems pick them up.” A CEH provides a way for enterprises to bridge that gap.

Innovative event-driven applications

As we increasingly find new ways of engaging and working with social channels, the CEH gives you the capability to build the next-generation infrastructure for new applications.

Security

Security is enhanced because the CEH lets you track up-to-the-minute activity on all your users and customers interacting with your enterprise.

Increased operational efficiency

With the CEH, you can eliminate the losses that are the result of a mismanaged application, mismanaged effort, or mismanaged expenses. This lowers the operational costs, which also means you increase the operational efficiency of the enterprise.

Now that we understand the purpose and benefits of CEHs, let’s take a look at how to build one.

Architecture of a CEH

At a high level, there are three processes that go into the working of a CEH:

  • Capturing and integrating events coming from all channels.

  • Normalizing, sanitizing, and standardizing events, including addressing regulatory compliance concerns.

  • Delivering data for consumption through various feeds and end-consuming applications.
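
The three processes above can be pictured as stages chained into one stream. The following is a minimal sketch in Python, not Prabhakar's implementation: the function names, the sample events, and the rule that drops a `card_number` field are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, object]


def capture(channels: Dict[str, Iterable[Event]]) -> Iterator[Event]:
    """Stage 1: merge events from every channel into one stream."""
    for channel_name, events in channels.items():
        for event in events:
            # Tag each event with the channel it came from.
            yield {**event, "channel": channel_name}


def standardize(events: Iterable[Event]) -> Iterator[Event]:
    """Stage 2: normalize field names and drop sensitive fields."""
    sensitive = {"card_number"}  # hypothetical compliance rule
    for event in events:
        yield {
            key.lower(): value
            for key, value in event.items()
            if key.lower() not in sensitive
        }


def deliver(events: Iterable[Event], sinks: List[Callable[[Event], None]]) -> int:
    """Stage 3: hand each event to every consuming endpoint."""
    count = 0
    for event in events:
        for sink in sinks:
            sink(event)
        count += 1
    return count


# Hypothetical sample data: two channels with inconsistent field naming.
channels = {
    "web": [{"Customer_ID": "c1", "Action": "transfer_failed"}],
    "call_center": [
        {"customer_id": "c1", "action": "call_started", "card_number": "4111"}
    ],
}
received: List[Event] = []
delivered = deliver(standardize(capture(channels)), [received.append])
```

Because each stage is a generator, events flow through one at a time rather than being collected in memory first, which is closer in spirit to a continuous event stream than to a batch job.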

Capturing and Integrating Events

According to Prabhakar, the first phase of enablement involves pulling together or capturing all the interaction channels and then integrating them. This means creating an event consolidation framework, often referred to as the event fire hose. This is how you bring the events into the CEH.

What kind of data and events are in the fire hose? Social media, structured and unstructured data, electronic files, binary files, teller notebooks, and so on—in other words, an ever-exploding and always expanding set of formats, both human- and machine-generated.

Naturally, due to the diversity of formats, you’re not going to have a uniform level of control over all of this data. “Your capability of running an application across all these channels will be limited by not being natively tied to those channels,” said Prabhakar. And this is what the CEH solves.

Sanitizing and Standardizing Events

Next, you need to sanitize and standardize the data. According to Prabhakar, “The goal is to create a consistent understanding of those events for your consuming endpoints.” An additional goal, of course, is to meet compliance and regulatory requirements. Ultimately though, standardization makes it possible for you to thread a story together across these channels and events.

Prabhakar explained that standardizing the data and preparing it for consumption primarily involves attaching metadata to every event. This process generally involves threading a handling mechanism around each event so that anybody can identify it, parse it out, and take action around it.
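
One way to picture attaching metadata to every event is an envelope that wraps the raw payload. The sketch below is an illustration only; the envelope fields (`event_id`, `channel`, `content_type`, `received_at`) and the function names are assumptions, not a format described in the session.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict


def wrap_event(payload: Any, channel: str, content_type: str) -> Dict[str, Any]:
    """Attach descriptive metadata to a raw event without altering the payload."""
    return {
        "event_id": str(uuid.uuid4()),        # lets any consumer identify the event
        "channel": channel,                   # where the event originated
        "content_type": content_type,         # tells consumers how to parse it
        "received_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }


def handle(envelope: Dict[str, Any]) -> Dict[str, Any]:
    """Consumers dispatch on metadata instead of guessing the payload format."""
    if envelope["content_type"] == "application/json":
        return json.loads(envelope["payload"])
    # Fallback for formats this consumer does not understand.
    return {"raw": envelope["payload"]}


envelope = wrap_event('{"customer_id": "c1"}', channel="web",
                      content_type="application/json")
parsed = handle(envelope)
```

Keeping the payload untouched and putting everything the consumer needs in the envelope means a new format can be added without changing how existing events are stored.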

Delivering Data for Consumption

With the CEH, you can deliver data to various feeds and applications. According to Prabhakar, “If you’re delivering the data to an HBase cluster, chances are your online web applications could directly reference them and you can have it deliver these events in a very low latency manner.” Thus, you can access the data online across your enterprise. Prabhakar explained that you can also send this data into batch or offline processing stores.
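
The split between low-latency and batch delivery can be sketched as a fan-out to two kinds of sinks. This example uses in-memory stand-ins only: `OnlineStore` and `BatchBuffer` are hypothetical classes, and neither uses the HBase API or any other real storage client.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict, List

Event = Dict[str, Any]


class OnlineStore:
    """In-memory stand-in for a low-latency store keyed by customer."""

    def __init__(self, max_per_customer: int = 100) -> None:
        self._recent: Dict[str, Deque[Event]] = defaultdict(
            lambda: deque(maxlen=max_per_customer)
        )

    def put(self, event: Event) -> None:
        self._recent[event["customer_id"]].append(event)

    def recent(self, customer_id: str) -> List[Event]:
        # .get avoids creating an empty entry for unknown customers.
        return list(self._recent.get(customer_id, ()))


class BatchBuffer:
    """Stand-in for an offline store that receives events in batches."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self._pending: List[Event] = []
        self.flushed: List[List[Event]] = []

    def put(self, event: Event) -> None:
        self._pending.append(event)
        if len(self._pending) >= self.batch_size:
            self.flushed.append(self._pending)
            self._pending = []


def deliver(event: Event, sinks: List[Any]) -> None:
    """Send one standardized event to every registered sink."""
    for sink in sinks:
        sink.put(event)


online = OnlineStore()
batch = BatchBuffer(batch_size=2)
for event in [
    {"customer_id": "c1", "action": "login"},
    {"customer_id": "c1", "action": "transfer_failed"},
    {"customer_id": "c2", "action": "login"},
]:
    deliver(event, [online, batch])
```

An application on another channel could call `online.recent("c1")` to see what that customer just attempted, while the batch buffer accumulates the same events for offline processing.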

In the earlier customer experience example, the call center application could know that a valuable customer had been trying to complete a transfer on the company’s website. It knows because the data from the web channel has been delivered to the call center channel, which produces a more meaningful user engagement.

Sounds relatively straightforward, doesn’t it? If so, why isn’t everyone building one?

Drift: The Key Challenge in Implementing a High-Level Architecture

Why aren’t CEHs very common yet? Prabhakar explains that, at a high level, it boils down to one word: drift. “Drift is the manifestation of change in the data landscape,” he said. Drift can be defined as the accumulation of unanticipated changes that occur in data streams and can corrupt data quality and pipeline reliability. This results in unreliable real-time analysis, which ultimately means that bad data can lead to bad decisions that affect the entire business. Drift can be categorized into three distinct types:

Infrastructure drift

This refers to the hardware and software and everything related to them, such as physical layouts, topologies, data centers, and deployments, all of which are in a constant state of flux.

Structural drift

Prabhakar explained that flexibility is usually a positive structural attribute; therefore, formats such as JSON are popular in part because they are flexible. The drawback, however, is the very thing that makes them attractive: they can change without notice. This means that if you have events in JSON format, their fields and structure might change from one event to the next.
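
To make structural drift concrete, the sketch below compares an incoming JSON event against the fields a consumer expects and reports differences instead of failing. The `EXPECTED` schema and the function name are assumptions made up for this illustration.

```python
import json
from typing import Dict, List

# Hypothetical schema: the fields this consumer was written against.
EXPECTED: Dict[str, type] = {"customer_id": str, "amount": float}


def detect_structural_drift(raw: str, expected: Dict[str, type]) -> List[str]:
    """Return a description of every way the event departs from the schema."""
    record = json.loads(raw)
    findings: List[str] = []
    for name, expected_type in expected.items():
        if name not in record:
            findings.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            actual = type(record[name]).__name__
            findings.append(
                f"type changed: {name} is {actual}, expected {expected_type.__name__}"
            )
    for name in record:
        if name not in expected:
            findings.append(f"new field: {name}")
    return findings


unchanged = detect_structural_drift(
    '{"customer_id": "c1", "amount": 10.5}', EXPECTED
)
drifted = detect_structural_drift(
    '{"customer_id": "c1", "amount": "10.50", "currency": "USD"}', EXPECTED
)
```

Here `drifted` flags that `amount` arrived as a string and that a `currency` field appeared, both valid JSON changes that a hand-coded consumer could otherwise mishandle silently.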

Semantic drift

The most subtle and perhaps most dangerous kind of drift, says Prabhakar, is semantic drift. Semantic drift refers to data that you’re consuming that has either changed its meaning or for which the consuming applications must change their interpretation of it. According to Prabhakar in a 2015 blog post written for elastic.co, “When semantic changes are missed or ignored—as is common—data quality erodes over time with the accumulation of dropped records, null values, and changing meaning of values.”
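
Semantic drift leaves the structure intact, so it has to be caught by watching the values themselves. The following is a rough sketch, assuming a numeric field and two simple signals (null rate and mean); the thresholds and the sample values are invented for illustration and are not taken from the session.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence


def profile(values: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarize a field by its null rate and the mean of its non-null values."""
    if not values:
        return {"null_rate": 0.0, "mean": None}
    present = [v for v in values if v is not None]
    return {
        "null_rate": 1 - len(present) / len(values),
        "mean": mean(present) if present else None,
    }


def semantic_drift(
    baseline: Sequence[Optional[float]],
    window: Sequence[Optional[float]],
    max_null_increase: float = 0.1,   # hypothetical threshold
    max_mean_ratio: float = 5.0,      # hypothetical threshold
) -> List[str]:
    """Compare a recent window of values against a trusted baseline."""
    base, cur = profile(baseline), profile(window)
    warnings: List[str] = []
    if cur["null_rate"] - base["null_rate"] > max_null_increase:
        warnings.append("null rate increased")
    # Skip the ratio check when either mean is missing or zero.
    if base["mean"] and cur["mean"]:
        ratio = cur["mean"] / base["mean"]
        if ratio > max_mean_ratio or ratio < 1 / max_mean_ratio:
            warnings.append("mean shifted")
    return warnings


# Hypothetical case: an upstream system starts reporting cents instead of dollars.
baseline_amounts = [12.0, 8.5, 20.0, 15.0]
recent_amounts = [1200.0, None, 850.0, 2000.0]
warnings = semantic_drift(baseline_amounts, recent_amounts)
```

Every value in `recent_amounts` is still a valid number or null, so a structural check would pass; only the comparison against the baseline reveals that the meaning of the field has probably changed.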

According to Prabhakar, drift becomes a monumental challenge to overcome in order to be able to build a CEH. Why? Because drift means change, and everything from the applications to the data streams is in a constant state of change and evolution.

So, how do you deal with this? One way, he says, is to write your own code, such as your own topologies or producers and consumers. Unfortunately, as Prabhakar points out, “Those will get brutally tied to what you know today, and they would not be resilient enough to accommodate the changes that are coming in.”

Ingestion Infrastructures to Combat Drift

CEHs act as the “front door” for an event pipeline. Reliable and high-quality data ingestion is a critical component of any analytics pipeline; therefore, what you need is an ingestion infrastructure to address the problem of drift.

One such infrastructure, according to Prabhakar, is the StreamSets Data Collector. In Prabhakar’s 2015 blog post for elastic.co, he writes, “StreamSets Data Collector provides an enhanced data ingestion process to ensure that data streaming into Elasticsearch is pristine, and remains so on a continuous basis.” It provides an open source Apache-licensed ingestion infrastructure that helps you to build continuously curated ingestion pipelines, and improves upon legacy ETL and hand-coded solutions.

Microsoft Azure also offers Azure Event Hubs, a highly scalable data ingress service. Additional ingestion and streaming tools include Flume, Chukwa, Sqoop, and others.
