Big Data Analytics Beyond Hadoop: Real-Time Applications with Storm, Spark, and More Hadoop Alternatives

Book Description

Master alternative Big Data technologies that can do what Hadoop can't: real-time analytics and iterative machine learning.

When most technical professionals think of Big Data analytics today, they think of Hadoop. But there are many cutting-edge applications that Hadoop isn't well suited for, especially real-time analytics and contexts requiring the use of iterative machine learning algorithms. Fortunately, several powerful new technologies have been developed specifically for use cases such as these. Big Data Analytics Beyond Hadoop is the first guide specifically designed to help you take the next steps beyond Hadoop. Dr. Vijay Srinivas Agneeswaran introduces the breakthrough Berkeley Data Analytics Stack (BDAS) in detail, including its motivation, design, architecture, Mesos cluster management, performance, and more. He presents realistic use cases and up-to-date example code for:

  • Spark, the next-generation in-memory computing technology from UC Berkeley

  • Storm, the parallel real-time Big Data analytics technology from Twitter

  • GraphLab, the next-generation graph processing paradigm from CMU and the University of Washington (with comparisons to alternatives such as Pregel and Piccolo)

He also offers architectural and design guidance and code sketches for scaling machine learning algorithms to Big Data, and then realizing them in real time. He concludes by previewing emerging trends, including real-time video analytics, SDNs, and even Big Data governance, security, and privacy issues. He identifies intriguing startups and new research possibilities, including BDAS extensions and cutting-edge model-driven analytics.

Big Data Analytics Beyond Hadoop is an indispensable resource for everyone who wants to reach the cutting edge of Big Data analytics, and stay there: practitioners, architects, programmers, data scientists, researchers, startup entrepreneurs, and advanced students.

Table of Contents

    1. About This eBook
    2. Title Page
    3. Copyright Page
    4. Dedication Page
    5. Contents
    6. Foreword
    7. Acknowledgments
    8. About the Author
    9. 1. Introduction: Why Look Beyond Hadoop Map-Reduce?
      1. Hadoop Suitability
      2. Big Data Analytics: Evolution of Machine Learning Realizations
      3. Closing Remarks
      4. References
    10. 2. What Is the Berkeley Data Analytics Stack (BDAS)?
      1. Motivation for BDAS
      2. BDAS Design and Architecture
      3. Spark: Paradigm for Efficient Data Processing on a Cluster
      4. Shark: SQL Interface over a Distributed System
      5. Mesos: Cluster Scheduling and Management System
      6. Closing Remarks
      7. References
    11. 3. Realizing Machine Learning Algorithms with Spark
      1. Basics of Machine Learning
      2. Logistic Regression: An Overview
      3. Logistic Regression Algorithm in Spark
      4. Support Vector Machine (SVM)
      5. PMML Support in Spark
      6. Machine Learning on Spark with MLbase
      7. References
    12. 4. Realizing Machine Learning Algorithms in Real Time
      1. Introduction to Storm
      2. Design Patterns in Storm
      3. Implementing Logistic Regression Algorithm in Storm
      4. Implementing Support Vector Machine Algorithm in Storm
      5. Naive Bayes PMML Support in Storm
      6. Real-Time Analytic Applications
      7. Spark Streaming
      8. References
    13. 5. Graph Processing Paradigms
      1. Pregel: Graph-Processing Framework Based on BSP
      2. Open Source Pregel Implementations
      3. GraphLab
      4. References
    14. 6. Conclusions: Big Data Analytics Beyond Hadoop Map-Reduce
      1. Overview of Hadoop YARN
      2. Other Frameworks over YARN
      3. What Does the Future Hold for Big Data Analytics?
      4. References
    15. A. Code Sketches
      1. Code for Naive Bayes PMML Scoring in Spark
      2. Code for Linear Regression PMML Support in Spark
      3. Page Rank in GraphLab
      4. SGD in GraphLab
    16. Index