
Big Data Video Edition

Video Description

"Transcends individual tools or platforms. Required reading for anyone working with big data systems."
Jonathan Esterhazy, Groupon

Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this Video Editions book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they're built.

Inside:
  • Introduction to big data systems
  • Real-time processing of web-scale data
  • Tools like Hadoop, Cassandra, and Storm
  • Extensions to traditional database skills

This Video Editions book requires no previous exposure to large-scale data analysis or NoSQL tools. Familiarity with traditional databases is helpful.

Nathan Marz is the creator of Apache Storm and the originator of the Lambda Architecture for big data systems. James Warren is an analytics architect with a background in machine learning and scientific computing.

"A comprehensive, example-driven tour of the Lambda Architecture with its originator as your guide."
Mark Fisher, Pivotal

"Contains wisdom that can only be gathered after tackling many big data projects. A must-read."
Pere Ferrera Bertran, Datasalt

"The de facto guide to streamlining your data pipeline in batch and near-real time."
Alex Holmes, author of "Hadoop in Practice"

NARRATED BY MARK THOMAS AND CHRIS PENICK

Table of Contents

  1. A NEW PARADIGM FOR BIG DATA
    1. Chapter 1. A new paradigm for Big Data 00:04:54
    2. Chapter 1. Scaling with a traditional database 00:08:58
    3. Chapter 1. NoSQL is not a panacea 00:10:23
    4. Chapter 1. The problems with fully incremental architectures 00:12:22
    5. Chapter 1. Lambda Architecture 00:06:13
    6. Chapter 1. Batch and serving layers satisfy almost all properties 00:07:44
    7. Chapter 1. Recent trends in technology 00:09:15
  2. PART 1 BATCH LAYER
    1. Chapter 2. Data model for Big Data 00:05:59
    2. Chapter 2. Data is raw 00:05:49
    3. Chapter 2. Data is immutable 00:06:59
    4. Chapter 2. The fact-based model for representing data 00:11:47
    5. Chapter 2. Graph schemas 00:07:46
    6. Chapter 3. Data model for Big Data: Illustration 00:06:58
    7. Chapter 3. Tying everything together into data objects 00:06:21
    8. Chapter 4. Data storage on the batch layer 00:10:06
    9. Chapter 4. Storing a master dataset with a distributed filesystem 00:07:16
    10. Chapter 5. Data storage on the batch layer: Illustration 00:06:55
    11. Chapter 5. Data storage in the batch layer with Pail 00:10:46
    12. Chapter 5. Storing the master dataset for SuperWebAnalytics.com 00:06:30
    13. Chapter 6. Batch layer 00:08:48
    14. Chapter 6. Recomputation algorithms vs. incremental algorithms 00:09:12
    15. Chapter 6. Scalability in the batch layer 00:13:20
    16. Chapter 6. Low-level nature of MapReduce 00:05:44
    17. Chapter 6. Pipe diagrams: a higher-level way of thinking about batch computation 00:12:15
    18. Chapter 7. Batch layer: Illustration 00:10:51
    19. Chapter 7. An introduction to JCascalog 00:09:26
    20. Chapter 7. Grouping and aggregators 00:10:27
    21. Chapter 7. Composition 00:12:09
    22. Chapter 8. An example batch layer: Architecture and algorithms 00:11:26
    23. Chapter 8. Workflow overview 00:12:11
    24. Chapter 8. Deduplicate pageviews 00:06:01
    25. Chapter 9. An example batch layer: Implementation 00:10:10
    26. Chapter 9. URL normalization 00:10:58
  3. PART 2 SERVING LAYER
    1. Chapter 10. Serving layer 00:07:25
    2. Chapter 10. The serving layer solution to the normalization/denormalization problem 00:08:14
    3. Chapter 10. Designing a serving layer for SuperWebAnalytics.com 00:06:09
    4. Chapter 10. Contrasting with a fully incremental solution 00:16:43
    5. Chapter 10. Comparing to the Lambda Architecture solution 00:02:39
    6. Chapter 11. Serving layer: Illustration 00:09:33
    7. Chapter 11. Building the serving layer for SuperWebAnalytics.com 00:05:30
  4. PART 3 SPEED LAYER
    1. Chapter 12. Realtime views 00:06:16
    2. Chapter 12. Storing realtime views 00:06:10
    3. Chapter 12. Challenges of incremental computation 00:10:35
    4. Chapter 12. Asynchronous versus synchronous updates 00:08:15
    5. Chapter 13. Realtime views: Illustration 00:09:05
    6. Chapter 14. Queuing and stream processing 00:09:16
    7. Chapter 14. Stream processing 00:05:07
    8. Chapter 14. Higher-level, one-at-a-time stream processing 00:06:41
    9. Chapter 14. Guaranteeing message processing 00:05:10
    10. Chapter 14. SuperWebAnalytics.com speed layer 00:07:07
    11. Chapter 14. Topology structure 00:05:30
    12. Chapter 15. Queuing and stream processing: Illustration 00:07:06
    13. Chapter 15. Implementing the SuperWebAnalytics.com uniques-over-time speed layer 00:08:45
    14. Chapter 16. Micro-batch stream processing 00:07:26
    15. Chapter 16. Micro-batch processing topologies 00:05:03
    16. Chapter 16. Core concepts of micro-batch stream processing 00:04:32
    17. Chapter 16. Extending pipe diagrams for micro-batch processing 00:05:55
    18. Chapter 16. Bounce-rate analysis 00:10:36
    19. Chapter 16. Another look at the bounce-rate-analysis example 00:03:08
    20. Chapter 17. Micro-batch stream processing: Illustration 00:08:17
    21. Chapter 17. Finishing the SuperWebAnalytics.com speed layer 00:09:25
    22. Chapter 17. Fully fault-tolerant, in-memory, micro-batch processing 00:05:36
    23. Chapter 18. Lambda Architecture in depth 00:05:15
    24. Chapter 18. Batch and serving layers 00:06:44
    25. Chapter 18. Incremental batch processing - part 1 00:08:06
    26. Chapter 18. Incremental batch processing - part 2 00:06:33
    27. Chapter 18. Measuring and optimizing batch layer resource usage 00:10:41
    28. Chapter 18. Speed layer 00:08:31