Introduction to Apache Flink

Book Description

There’s growing interest in learning how to analyze streaming data from large-scale sources such as web traffic, financial transactions, machine logs, and industrial sensors. But analyzing data streams at scale has been difficult to do well, until now. This practical book delivers a deep introduction to Apache Flink, a highly innovative open source stream processor with a surprising range of capabilities.

Table of Contents

  1. Preface
    1. How to Use This Book
    2. Conventions Used in This Book
  2. 1. Why Apache Flink?
    1. Consequences of Not Doing Streaming Well
      1. Retail and Marketing
      2. The Internet of Things
      3. Telecom
      4. Banking and Financial Sector
    2. Goals for Processing Continuous Event Data
    3. Evolution of Stream Processing Technologies
    4. First Look at Apache Flink
      1. Batch and Stream Processing
    5. Flink in Production
      1. Bouygues Telecom
      2. Other Examples of Apache Flink in Production
    6. Where Flink Fits
  3. 2. Stream-First Architecture
    1. Traditional Architecture versus Streaming Architecture
    2. Message Transport and Message Processing
    3. The Transport Layer: Ideal Capabilities
      1. Performance with Persistence
      2. Decoupling of Multiple Producers from Multiple Consumers
    4. Streaming Data for a Microservices Architecture
      1. Data Stream as the Centralized Source of Data
      2. Fraud Detection Use Case: Better Design with Stream-First Architecture
      3. Flexibility for Developers
    5. Beyond Real-Time Applications
    6. Geo-Distributed Replication of Streams
  4. 3. What Flink Does
    1. Different Types of Correctness
      1. Natural Fit for Sessions
      2. Event Time
      3. Accuracy Under Failures: Keeping Track of State
      4. Answers When They Matter
      5. Ease of Development and Operations
    2. Hierarchical Use Cases: Adopting Flink in Stages
  5. 4. Handling Time
    1. Counting with Batch and Lambda Architectures
    2. Counting with Streaming Architecture
      1. Batching in Stream Processing Systems
    3. Notions of Time
    4. Windows
      1. Time Windows
      2. Count Windows
      3. Session Windows
      4. Triggers
      5. Implementation of Windows
    5. Time Travel
    6. Watermarks
      1. How Watermarks Are Generated
    7. A Real-World Example: Kappa Architecture at Ericsson
  6. 5. Stateful Computation
    1. Notions of Consistency
    2. Flink Checkpoints: Guaranteeing Exactly Once
    3. Savepoints: Versioning State
    4. End-to-End Consistency and the Stream Processor as a Database
    5. Flink Performance: the Yahoo! Streaming Benchmark
      1. Original Application with the Yahoo! Streaming Benchmark
      2. First Modification: Using Flink State
      3. Second Modification: Increase Volume Through Improved Data Generator
      4. Third Modification: Dealing with Network Bottleneck
      5. Fifth Modification: Increased Cardinality and Direct Query
    6. Conclusion
  7. 6. Batch Is a Special Case of Streaming
    1. Batch Processing Technology
    2. Case Study: Flink as a Batch Processor
  8. A. Additional Resources
    1. Going Further with Apache Flink
      1. More on Time and Windows
      2. More on Flink’s State and Checkpointing
      3. Handling Batch Processing with Flink
      4. Flink Use Cases and User Stories
      5. Stream-First Architecture
      6. Message Transport: Apache Kafka
      7. Message Transport: MapR Streams
    2. Selected O’Reilly Publications by Ted Dunning and Ellen Friedman