Building Python Real-Time Applications with Storm

Book Description

Learn to process massive real-time data streams using Storm and Python—no Java required!

About This Book

  • Learn to use Apache Storm and the Python Petrel library to build distributed applications that process large streams of data

  • Explore sample real-time applications and analyze their data in the popular NoSQL databases MongoDB and Redis

  • Discover how to apply software development best practices to improve performance, productivity, and quality in your Storm projects

Who This Book Is For

This book is intended for Python developers who want to benefit from Storm’s real-time data processing capabilities. If you are new to Python, you’ll benefit from the attention to key supporting tools and techniques such as automated testing, virtual environments, and logging. If you’re an experienced Python developer, you’ll appreciate the thorough and detailed examples.

What You Will Learn

  • Install Storm and learn about the prerequisites

  • Get to know the components of a Storm topology and how to control the flow of data between them

  • Ingest Twitter data directly into Storm

  • Use Storm with MongoDB and Redis

  • Build topologies and run them in Storm

  • Use an interactive graphical debugger to debug your topology as it’s running in Storm

  • Test your topology components outside of Storm

  • Configure your topology using YAML

In Detail

Big data is a trending concept that everyone wants to learn about. With its ability to process all kinds of data in real time, Storm is an important addition to your big data “bag of tricks.”

At the same time, Python is one of the fastest-growing programming languages today. It has become a top choice for both data science and everyday application development. Together, Storm and Python enable you to build and deploy real-time big data applications quickly and easily.

You will begin with basic tutorials on the commands needed to set up Storm, and then learn about its configuration in detail. You will then work through the requirements for creating a Storm cluster. Next, you’ll get an overview of Petrel, followed by an example Twitter topology and persistence using Redis and MongoDB. Finally, you will build a production-quality Storm topology using development best practices.

Style and approach

This book takes an easy-to-follow, practical approach to help you understand all the concepts related to Storm and Python.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code files sent to you.

Table of Contents

    1. Building Python Real-Time Applications with Storm
      1. Table of Contents
      2. Building Python Real-Time Applications with Storm
      3. Credits
      4. About the Authors
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      8. 1. Getting Acquainted with Storm
        1. Overview of Storm
          1. Before the Storm era
          2. Key features of Storm
          3. Storm cluster modes
            1. Developer mode
            2. Single-machine Storm cluster
            3. Multimachine Storm cluster
            4. The Storm client
          4. Prerequisites for a Storm installation
            1. Zookeeper installation
        2. Storm installation
          1. Enabling native (Netty only) dependency
            1. Netty configuration
            2. Starting daemons
          2. Playing with optional configurations
        3. Summary
      9. 2. The Storm Anatomy
        1. Storm processes
          1. Supervisor
          2. Zookeeper
          3. The Storm UI
        2. Storm-topology-specific terminologies
          1. The worker process, executor, and task
          2. Worker processes
          3. Executors
          4. Tasks
          5. Interprocess communication
        3. A physical view of a Storm cluster
          1. Stream grouping
          2. Fault tolerance in Storm
          3. Guaranteed tuple processing in Storm
            1. XOR magic in acking
        4. Tuning parallelism in Storm – scaling a distributed computation
        5. Summary
      10. 3. Introducing Petrel
        1. What is Petrel?
          1. Building a topology
          2. Packaging a topology
          3. Logging events and errors
          4. Managing third-party dependencies
        2. Installing Petrel
        3. Creating your first topology
          1. Sentence spout
          2. Splitter bolt
          3. Word Counting Bolt
            1. Defining a topology
        4. Running the topology
        5. Troubleshooting
        6. Productivity tips with Petrel
          1. Improving startup performance
          2. Enabling and using logging
          3. Automatic logging of fatal errors
        7. Summary
      11. 4. Example Topology – Twitter
        1. Twitter analysis
        2. Twitter's Streaming API
          1. Creating a Twitter app to use the Streaming API
          2. The topology configuration file
          3. The Twitter stream spout
          4. Splitter bolt
          5. Rolling word count bolt
          6. The intermediate rankings bolt
          7. The total rankings bolt
          8. Defining the topology
        3. Running the topology
        4. Summary
      12. 5. Persistence Using Redis and MongoDB
        1. Finding the top n ranked topics using Redis
          1. The topology configuration file – the Redis case
          2. Rolling word count bolt – the Redis case
          3. Total rankings bolt – the Redis case
          4. Defining the topology – the Redis case
        2. Running the topology – the Redis case
          1. Finding the hourly count of tweets by city name using MongoDB
          2. Defining the topology – the MongoDB case
        3. Running the topology – the MongoDB case
        4. Summary
      13. 6. Petrel in Practice
        1. Testing a bolt
          1. Example – testing SplitSentenceBolt
          2. Example – testing SplitSentenceBolt with WordCountBolt
        2. Debugging
        3. Installing Winpdb
          1. Add Winpdb breakpoint
          2. Launching and attaching the debugger
        4. Profiling your topology's performance
          1. Split sentence bolt log
          2. Word count bolt log
        5. Summary
      14. A. Managing Storm Using Supervisord
        1. Storm administration over a cluster
          1. Introducing supervisord
          2. Supervisord components
            1. Supervisord installation
              1. Configuration of supervisord.conf
              2. Configuration of supervisord.conf on 172-31-19-62
        2. Summary
      15. Index