Scalable Big Data Architecture: A Practitioner’s Guide to Choosing Relevant Big Data Architecture

CHAPTER 4

Streaming Data

In the previous chapter, we focused on a long-running processing job, which runs in a Hadoop cluster and leverages YARN or Hive. In this chapter, I would like to introduce you to what I call the 2014 way of processing data: streaming. Indeed, more and more data processing infrastructures rely on streaming or logging architectures that ingest the data, transform it, and then transport it to a data persistence layer.
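To make the ingest, transform, and transport steps concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not code from this book: the function names `ingest`, `transform`, and `persist` are hypothetical, the "LEVEL message" log format is invented for the example, and an in-memory list stands in for a real persistence layer.

```python
import json
from typing import Iterable, Iterator


def ingest(lines: Iterable[str]) -> Iterator[str]:
    """Ingest stage (hypothetical): yield raw log lines one at a time, skipping blanks."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def transform(lines: Iterable[str]) -> Iterator[dict]:
    """Transform stage (hypothetical): split 'LEVEL message' into a structured event."""
    for line in lines:
        level, _, message = line.partition(" ")
        yield {"level": level.upper(), "message": message}


def persist(events: Iterable[dict], store: list) -> int:
    """Transport stage (hypothetical): serialize each event as JSON and append it to the store."""
    count = 0
    for event in events:
        store.append(json.dumps(event, sort_keys=True))
        count += 1
    return count


# Assumed sample input; a real pipeline would read from a log shipper or message broker.
raw_lines = ["info service started", "", "error disk full"]
store: list = []  # stand-in for the persistence layer
written = persist(transform(ingest(raw_lines)), store)
print(written, store)
```

Because each stage is a generator, events flow through one at a time instead of being collected into a batch first, which is the property that distinguishes this style from the long-running jobs of the previous chapter.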

This chapter will focus on three key technologies: Kafka, Spark, and the ELK stack from Elastic. We will work on combining them to implement different kinds of logging architecture ...

Scalable Big Data Architecture: A Practitioner’s Guide to Choosing Relevant Big Data Architecture by Bahaaldine Azarmi
