Streaming Architecture

Book Description

More and more data-driven companies are looking to adopt stream processing and streaming analytics. With this concise ebook, you’ll learn best practices for designing a reliable architecture that supports this emerging big-data paradigm.

Authors Ted Dunning and Ellen Friedman (authors of Real World Hadoop) help you explore some of the best technologies for stream processing and analytics, with a focus on the upstream queuing or message-passing layer. To illustrate the effectiveness of these technologies, the book also includes specific use cases.

Ideal for developers and non-technical people alike, this book describes:

  • Key elements in good design for streaming analytics, focusing on the essential characteristics of the messaging layer
  • New messaging technologies, including Apache Kafka and MapR Streams, with links to sample code
  • Technology choices for streaming analytics: Apache Spark Streaming, Apache Flink, Apache Storm, and Apache Apex
  • How stream-based architectures help support microservices
  • Specific use cases such as fraud detection and geo-distributed data streams

Ted Dunning is Chief Applications Architect at MapR Technologies and is active in the open source community. He currently serves as VP for Incubator at the Apache Software Foundation, as a champion and mentor for a large number of projects, and as a committer and PMC member of the Apache ZooKeeper and Drill projects. Ted is on Twitter as @ted_dunning.

Ellen Friedman, a committer for the Apache Drill and Apache Mahout projects, is a solutions consultant and well-known speaker and author, currently writing mainly about big data topics. With a PhD in Biochemistry, she has years of experience as a research scientist and has written about a variety of technical topics. Ellen is on Twitter as @Ellen_Friedman.

Table of Contents

  Preface
    1. Who Should Use This Book
    2. What Is Covered
    3. Conventions Used in This Book
    4. Safari® Books Online
    5. How to Contact Us
  1. Why Stream?
    1. Planes, Trains, and Automobiles: Connected Vehicles and the IoT
    2. Streaming Data: Life As It Happens
      1. Where Streaming Matters
    3. Beyond Real Time: More Benefits of Streaming Architecture
    4. Emerging Best Practices for Streaming Architectures
    5. Healthcare Example with Data Streams
    6. Streaming Data as a Central Aspect of Architectural Design
  2. Stream-based Architecture
    1. A Limited View: Single Real-Time Application
    2. Key Aspects of a Universal Stream-based Architecture
    3. Importance of the Messaging Technology
    4. Choices for Real-Time Analytics
      1. Apache Storm
      2. Apache Spark Streaming
      3. Apache Flink
      4. Apache Apex
    5. Comparison of Capabilities for Streaming Analytics
    6. Summary
  3. Streaming Architecture: Ideal Platform for Microservices
    1. Why Microservices Matter
    2. What Is Needed to Support Microservices
    3. Microservices in More Detail
    4. Designing a Streaming Architecture: Online Video Service Example
      1. A New Design: Infrastructure to Support Messaging
    5. Importance of a Universal Microarchitecture
    6. What’s in a Name?
    7. Why Use Distributed Files and NoSQL Databases?
    8. New Design for the Video Service
    9. Summary: The Converged Platform View
  4. Kafka as Streaming Transport
    1. Motivations for Kafka
    2. Kafka Innovations
    3. Kafka Basic Concepts
      1. Ordering
      2. Persistence
    4. The Kafka APIs
      1. KafkaProducer API
      2. KafkaConsumer API
      3. Legacy APIs
    5. Kafka Utility Programs
      1. Load Balancing
      2. Mirroring
    6. Kafka Gotchas
      1. Kafka in Production Settings
      2. Limited Number of Topics and Partitions
      3. Manual Balancing of Partitions and Load
      4. No Inherent Serialization Mechanism
      5. Mirroring Deficiencies
    7. Summary
  5. MapR Streams
    1. Innovations in MapR Streams
    2. History and Context of MapR’s Streaming System
    3. How MapR Streams Works
    4. How to Configure MapR Streams
    5. Geo-Distributed Replication
    6. MapR Streams Gotchas
  6. Fraud Detection with Streaming Data
    1. Card Velocity
    2. Fast Response Decision to the Question: “Is It Fraud?”
    3. Multiuse Streaming Data
    4. Scaling Up the Fraud Detector
    5. Summary
  7. Geo-Distributed Data Streams
    1. Stakeholders
    2. Design Goals
    3. Design Choices
      1. Our Design
      2. Follow the Data
      3. Control Who Has Access to Stream Data
    4. Advantages of Streams-based Geo-Replication
  8. Putting It All Together
    1. Benefits of Stream-based Architectures
    2. Making the Transition to Streaming Architecture
    3. Conclusion
  A. Additional Resources
    1. Streaming Data Topics
    2. Selected O’Reilly Publications by the Authors