Big Data video edition

Video description

In Video Editions, the narrator reads the book while the content, figures, code listings, diagrams, and text appear on the screen. It's like an audiobook that you can also watch as a video.

"Transcends individual tools or platforms. Required reading for anyone working with big data systems."
Jonathan Esterhazy, Groupon

Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this Video Editions book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they're built.

Inside:
  • Introduction to big data systems
  • Real-time processing of web-scale data
  • Tools like Hadoop, Cassandra, and Storm
  • Extensions to traditional database skills

This Video Editions book requires no previous exposure to large-scale data analysis or NoSQL tools. Familiarity with traditional databases is helpful.

Nathan Marz is the creator of Apache Storm and the originator of the Lambda Architecture for big data systems. James Warren is an analytics architect with a background in machine learning and scientific computing.

"A comprehensive, example-driven tour of the Lambda Architecture with its originator as your guide."
Mark Fisher, Pivotal

"Contains wisdom that can only be gathered after tackling many big data projects. A must-read."
Pere Ferrera Bertran, Datasalt

"The de facto guide to streamlining your data pipeline in batch and near-real time."
Alex Holmes, Author of "Hadoop in Practice"

NARRATED BY MARK THOMAS AND CHRIS PENICK

Table of contents

  1. A NEW PARADIGM FOR BIG DATA
    1. Chapter 1. A new paradigm for Big Data
    2. Chapter 1. Scaling with a traditional database
    3. Chapter 1. NoSQL is not a panacea
    4. Chapter 1. The problems with fully incremental architectures
    5. Chapter 1. Lambda Architecture
    6. Chapter 1. Batch and serving layers satisfy almost all properties
    7. Chapter 1. Recent trends in technology
  2. PART 1 BATCH LAYER
    1. Chapter 2. Data model for Big Data
    2. Chapter 2. Data is raw
    3. Chapter 2. Data is immutable
    4. Chapter 2. The fact-based model for representing data
    5. Chapter 2. Graph schemas
    6. Chapter 3. Data model for Big Data: Illustration
    7. Chapter 3. Tying everything together into data objects
    8. Chapter 4. Data storage on the batch layer
    9. Chapter 4. Storing a master dataset with a distributed filesystem
    10. Chapter 5. Data storage on the batch layer: Illustration
    11. Chapter 5. Data storage in the batch layer with Pail
    12. Chapter 5. Storing the master dataset for SuperWebAnalytics.com
    13. Chapter 6. Batch layer
    14. Chapter 6. Recomputation algorithms vs. incremental algorithms
    15. Chapter 6. Scalability in the batch layer
    16. Chapter 6. Low-level nature of MapReduce
    17. Chapter 6. Pipe diagrams: a higher-level way of thinking about batch computation
    18. Chapter 7. Batch layer: Illustration
    19. Chapter 7. An introduction to JCascalog
    20. Chapter 7. Grouping and aggregators
    21. Chapter 7. Composition
    22. Chapter 8. An example batch layer: Architecture and algorithms
    23. Chapter 8. Workflow overview
    24. Chapter 8. Deduplicate pageviews
    25. Chapter 9. An example batch layer: Implementation
    26. Chapter 9. URL normalization
  3. PART 2 SERVING LAYER
    1. Chapter 10. Serving layer
    2. Chapter 10. The serving layer solution to the normalization/denormalization problem
    3. Chapter 10. Designing a serving layer for SuperWebAnalytics.com
    4. Chapter 10. Contrasting with a fully incremental solution
    5. Chapter 10. Comparing to the Lambda Architecture solution
    6. Chapter 11. Serving layer: Illustration
    7. Chapter 11. Building the serving layer for SuperWebAnalytics.com
  4. PART 3 SPEED LAYER
    1. Chapter 12. Realtime views
    2. Chapter 12. Storing realtime views
    3. Chapter 12. Challenges of incremental computation
    4. Chapter 12. Asynchronous versus synchronous updates
    5. Chapter 13. Realtime views: Illustration
    6. Chapter 14. Queuing and stream processing
    7. Chapter 14. Stream processing
    8. Chapter 14. Higher-level, one-at-a-time stream processing
    9. Chapter 14. Guaranteeing message processing
    10. Chapter 14. SuperWebAnalytics.com speed layer
    11. Chapter 14. Topology structure
    12. Chapter 15. Queuing and stream processing: Illustration
    13. Chapter 15. Implementing the SuperWebAnalytics.com uniques-over-time speed layer
    14. Chapter 16. Micro-batch stream processing
    15. Chapter 16. Micro-batch processing topologies
    16. Chapter 16. Core concepts of micro-batch stream processing
    17. Chapter 16. Extending pipe diagrams for micro-batch processing
    18. Chapter 16. Bounce-rate analysis
    19. Chapter 16. Another look at the bounce-rate-analysis example
    20. Chapter 17. Micro-batch stream processing: Illustration
    21. Chapter 17. Finishing the SuperWebAnalytics.com speed layer
    22. Chapter 17. Fully fault-tolerant, in-memory, micro-batch processing
    23. Chapter 18. Lambda Architecture in depth
    24. Chapter 18. Batch and serving layers
    25. Chapter 18. Incremental batch processing - part 1
    26. Chapter 18. Incremental batch processing - part 2
    27. Chapter 18. Measuring and optimizing batch layer resource usage
    28. Chapter 18. Speed layer

Product information

  • Title: Big Data video edition
  • Author(s): Nathan Marz, James Warren
  • Release date: April 2015
  • Publisher(s): Manning Publications
  • ISBN: 9781617290343VE