Storm Real-time Processing Cookbook

Book Description

Java developers can expand into real-time data processing with this guide to Storm. Its cookbook approach, built on practical recipes, offers an approachable way to learn how to process unbounded data streams.

  • Learn the key concepts of processing data in real time with Storm

  • Covers concepts ranging from log stream processing to data management with Storm

  • Written in a cookbook style, with plenty of practical recipes, well-explained code examples, and relevant screenshots and diagrams

In Detail

    Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

    Storm Real-time Processing Cookbook offers basic to advanced recipes for real-time computation with Storm.

    The book begins with setting up the development environment and then teaches log stream processing. It goes on to cover a real-time payments workflow, distributed RPC, integration with other software such as Hadoop and Apache Camel, and more.

    Table of Contents

    1. Storm Real-time Processing Cookbook
      1. Table of Contents
      2. Storm Real-time Processing Cookbook
      3. Credits
      4. About the Author
      5. About the Reviewers
      6. www.packtpub.com
        1. Support files, eBooks, discount offers and more
          1. Why Subscribe?
          2. Free Access for Packt account holders
      7. Preface
        1. An introduction to the Storm processor
        2. What this book covers
        3. What you need for this book
        4. Who this book is for
        5. Conventions
        6. Reader feedback
        7. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      8. 1. Setting Up Your Development Environment
        1. Introduction
        2. Setting up your development environment
          1. How to do it…
          2. How it works…
        3. Distributed version control
          1. How to do it…
        4. Creating a "Hello World" topology
          1. How to do it…
          2. How it works…
        5. Creating a Storm cluster – provisioning the machines
          1. How to do it...
          2. How it works...
        6. Creating a Storm cluster – provisioning Storm
          1. How to do it…
          2. How it works…
        7. Deriving basic click statistics
          1. Getting ready
          2. How to do it…
          3. How it works…
        8. Unit testing a bolt
          1. Getting ready
          2. How to do it…
          3. How it works…
        9. Implementing an integration test
          1. How to do it…
          2. How it works…
        10. Deploying to the cluster
          1. How to do it…
          2. How it works…
      9. 2. Log Stream Processing
        1. Introduction
        2. Creating a log agent
          1. How to do it…
          2. How it works…
        3. Creating the log spout
          1. How to do it…
          2. How it works…
          3. There's more…
        4. Rule-based analysis of the log stream
          1. How to do it…
          2. How it works…
        5. Indexing and persisting the log data
          1. How to do it…
          2. How it works…
        6. Counting and persisting log statistics
          1. How to do it…
          2. How it works…
        7. Creating an integration test for the log stream cluster
          1. How to do it…
          2. How it works…
        8. Creating a log analytics dashboard
          1. How to do it…
          2. How it works…
      10. 3. Calculating Term Importance with Trident
        1. Introduction
        2. Creating a URL stream using a Twitter filter
          1. How to do it…
          2. How it works…
          3. There's more…
        3. Deriving a clean stream of terms from the documents
          1. How to do it…
          2. How it works…
        4. Calculating the relative importance of each term
          1. How to do it…
          2. How it works…
          3. There's more…
      11. 4. Distributed Remote Procedure Calls
        1. Introduction
        2. Using DRPC to complete the required processing
          1. How to do it…
          2. How it works…
          3. There's more...
        3. Integration testing of a Trident topology
          1. How to do it…
          2. How it works…
          3. There's more…
        4. Implementing a rolling window topology
          1. How to do it…
          2. How it works…
        5. Simulating time in integration testing
          1. How to do it…
          2. How it works…
      12. 5. Polyglot Topology
        1. Introduction
        2. Implementing the multilang protocol in Qt
          1. Getting ready
          2. How to do it…
          3. How it works…
        3. Implementing the SplitSentence bolt in Qt
          1. How to do it…
          2. How it works…
          3. There's more…
        4. Implementing the count bolt in Ruby
          1. How to do it…
          2. How it works…
        5. Defining the word count topology in Clojure
          1. How to do it…
          2. How it works…
          3. There's more…
      13. 6. Integrating Storm and Hadoop
        1. Introduction
        2. Implementing TF-IDF in Hadoop
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        3. Persisting documents from Storm
          1. How to do it…
          2. How it works…
        4. Integrating the batch and real-time views
          1. How to do it…
          2. How it works…
      14. 7. Real-time Machine Learning
        1. Introduction
        2. Implementing a transactional topology
          1. Getting ready
          2. How to do it…
          3. How it works...
        3. Creating a Random Forest classification model using R
          1. Getting ready
          2. How to do it…
          3. How it works...
          4. There's more...
        4. Operational classification of transactional streams using Random Forest
          1. Getting ready
          2. How to do it…
          3. How it works...
          4. There's more...
        5. Creating an association rules model in R
          1. Getting ready
          2. How to do it…
          3. How it works...
        6. Creating a recommendation engine
          1. How to do it…
          2. How it works...
          3. There's more...
        7. Real-time online machine learning
          1. How to do it…
          2. How it works...
      15. 8. Continuous Delivery
        1. Introduction
        2. Setting up a CI server
          1. Getting ready
          2. How to do it…
          3. How it works…
        3. Setting up system environments
          1. Getting ready
          2. How to do it…
          3. How it works…
        4. Defining a delivery pipeline
          1. How to do it…
          2. How it works...
          3. There's more...
        5. Implementing automated acceptance testing
          1. Getting ready
          2. How to do it…
          3. How it works...
          4. There's more...
      16. 9. Storm on AWS
        1. Introduction
        2. Deploying Storm on AWS using Pallet
          1. Getting ready
          2. How to do it…
          3. There's more…
        3. Setting up a Virtual Private Cloud
          1. How to do it…
        4. Deploying Storm into Virtual Private Cloud using Vagrant
          1. Getting ready
          2. How to do it…
      17. Index