Learning Path: Architect and Build Big Data Applications

Video Description

With datasets growing increasingly large, the need for custom data solutions has soared as well. This Learning Path will take you through the entire process of designing and building data applications that can visualize, navigate, and interpret reams of data. Get a thorough introduction to the most important tools in the big data ecosystem.

Table of Contents

  1. Introduction to Big Data, by Vladimir Bacvanski
    1. Introduction To The Course 00:00:58
    2. About The Author 00:01:20
    3. Big Data Challenges 00:02:04
    4. Big Data Characteristics 00:05:50
    5. Problems In Capitalizing On Big Data 00:03:10
    6. Solving Big Data Problems 00:02:10
    7. The Challenges Of Relational Databases 00:03:38
    8. MapReduce And Hadoop 00:02:27
    9. MapReduce Algorithm 00:06:08
    10. Introducing Hadoop 00:03:26
    11. Hadoop Distributed File System 00:05:41
    12. Interacting With HDFS 00:02:57
    13. Hadoop Infrastructure 00:03:48
    14. YARN 00:02:19
    15. Programming Hadoop 00:04:54
    16. Hive 00:06:12
    17. Hive Architecture 00:04:45
    18. Hive Data Model 00:05:19
    19. Hive Queries 00:03:09
    20. When To Use Hive 00:03:06
    21. Pig 00:03:01
    22. Pig Data Model 00:03:30
    23. Pig Latin 00:05:56
    24. Pig Example 00:03:40
    25. When To Use Pig 00:03:12
    26. Scalding 00:03:14
    27. Programming With Scalding 00:05:43
    28. When To Use Scalding 00:02:47
    29. Hadoop Ecosystem 00:08:49
    30. HBase 00:08:19
    31. When To Use HBase 00:02:08
    32. Beyond Classic Hadoop - Spark And Flink 00:07:15
    33. NoSQL Stores 00:07:48
    34. Key-Value Stores 00:02:51
    35. Columnar Stores 00:03:25
    36. Document Stores 00:02:36
    37. Graph Stores 00:02:28
    38. Data Modeling For NoSQL Stores 00:03:34
    39. Streaming 00:01:58
    40. Storm 00:03:45
    41. Spark And Flink Streaming 00:02:09
    42. Lambda Architecture 00:02:50
    43. Introducing Big Data And NoSQL In The Enterprise 00:06:34
    44. Polyglot Persistence 00:05:27
    45. Seven Habits Of Successful Big Data And NoSQL Projects 00:02:41
    46. Wrap-Up 00:00:17
  2. Learning Apache Cassandra, by Ruth Stryker
    1. Introducing The Course 00:04:41
    2. Understanding What Cassandra Is 00:04:58
    3. Learning What Cassandra Is Being Used For 00:04:56
    4. Understanding The System Requirements 00:06:54
    5. How To Access Your Working Files 00:01:15
    6. Opening The Main Virtual Machine 00:02:53
    7. Pop Quiz - Intro to Cassandra 00:01:24
    8. Understanding That Cassandra Is A Distributed Database 00:02:23
    9. Learning What Snitch Is For 00:03:53
    10. Learning What Gossip Is For 00:01:52
    11. Learning How Data Gets Distributed 00:05:35
    12. Learning About Replication 00:02:12
    13. Learning About Virtual Nodes 00:03:01
    14. Pop Quiz - Getting Started with Architecture 00:01:25
    15. Downloading Cassandra 00:02:48
    16. Ensuring Oracle Java 7 Is Installed 00:02:02
    17. Installing Cassandra 00:03:44
    18. Viewing The Main Configuration File 00:02:46
    19. Providing Cassandra With Permission To Directories 00:01:46
    20. Starting Cassandra 00:03:41
    21. Checking Status 00:04:00
    22. Accessing The Cassandra system.log File 00:02:06
    23. Pop Quiz - Installing Cassandra 00:01:28
    24. Understanding Ways To Communicate With Cassandra 00:03:47
    25. Using CQLSH 00:02:29
    26. Pop Quiz - Communicating with Cassandra 00:01:08
    27. Understanding A Cassandra Database 00:01:54
    28. Defining A Keyspace 00:04:57
    29. Deleting A Keyspace 00:00:52
    30. Pop Quiz - Creating a Database 00:01:53
    31. Lab: Create A Second Database 00:02:39
    32. Creating A Table 00:01:49
    33. Defining Columns And Data Types 00:02:48
    34. Defining A Primary Key 00:01:49
    35. Recognizing A Partition Key 00:02:44
    36. Specifying A Descending Clustering Order 00:03:02
    37. Pop Quiz - Creating a Table 00:01:54
    38. Lab: Create A Second Table 00:02:33
    39. Understanding Ways To Write Data 00:01:28
    40. Using The INSERT INTO Command 00:04:45
    41. Using The COPY Command 00:05:53
    42. How Data Is Stored In Cassandra 00:04:21
    43. How Data Is Stored On Disk 00:05:29
    44. Pop Quiz - Inserting Data 00:02:15
    45. Lab: Insert Data 00:09:10
    46. Understanding Data Modeling In Cassandra 00:01:21
    47. Using A WHERE Clause 00:04:17
    48. Understanding Secondary Indexes 00:02:18
    49. Creating A Secondary Index 00:01:38
    50. Defining A Composite Partition Key 00:09:34
    51. Pop Quiz - Modeling Data 00:03:34
    52. Understanding Cassandra Drivers 00:02:31
    53. Exploring The DataStax Java Driver 00:03:14
    54. Setting Up A Development Environment 00:04:04
    55. Creating An Application Page 00:04:51
    56. Acquiring The DataStax Java Driver Files 00:03:24
    57. Getting The DataStax Java Driver Files Through Maven 00:02:23
    58. Providing The DataStax Java Driver Files Manually 00:02:36
    59. Connecting To A Cassandra Cluster 00:03:39
    60. Executing A Query 00:07:47
    61. Displaying Query Results - Part 1 00:05:59
    62. Displaying Query Results - Part 2 00:07:20
    63. Using An MVC Pattern 00:04:59
    64. Pop Quiz - Creating an Application 00:02:50
    65. Lab: Create A Second Application - Part 1 00:05:20
    66. Lab: Create A Second Application - Part 2 00:09:49
    67. Lab: Create A Second Application - Part 3 00:03:08
    68. Updating Data 00:03:39
    69. Understanding How Updating Works 00:03:55
    70. Deleting Data 00:07:10
    71. Understanding Tombstones 00:07:18
    72. Using TTLs 00:05:09
    73. Updating A TTL 00:02:38
    74. Pop Quiz - Updating and Deleting Data 00:02:38
    75. Lab: Update And Delete Data 00:07:00
    76. Understanding Hardware Choices 00:00:30
    77. Understanding RAM And CPU Recommendations 00:02:45
    78. Selecting Storage 00:04:08
    79. Deploying In The Cloud 00:04:07
    80. Pop Quiz - Selecting Hardware 00:02:06
    81. Understanding Cassandra Nodes 00:03:39
    82. Having A Network Connection - Part 1 00:05:35
    83. Having A Network Connection - Part 2 00:05:02
    84. Having A Network Connection - Part 3 00:04:46
    85. Specifying The IP Address Of A Node In Cassandra 00:04:12
    86. Specifying Seed Nodes 00:06:30
    87. Bootstrapping A Node 00:06:18
    88. Cleaning Up A Node 00:02:59
    89. Using cassandra-stress 00:10:33
    90. Pop Quiz - Adding Nodes to a Cluster 00:01:39
    91. Lab: Add A Third Node 00:10:42
    92. Understanding Cassandra Monitoring Tools 00:00:46
    93. Using Nodetool 00:04:54
    94. Using JConsole 00:03:24
    95. Learning About OpsCenter 00:03:24
    96. Pop Quiz - Monitoring a Cluster 00:01:49
    97. Understanding Repair 00:05:17
    98. Repairing Nodes 00:04:17
    99. Understanding Consistency - Part 1 00:06:26
    100. Understanding Consistency - Part 2 00:04:33
    101. Understanding Hinted Handoff 00:03:30
    102. Understanding Read Repair 00:01:58
    103. Pop Quiz - Repairing Nodes 00:03:30
    104. Lab: Repair Nodes For A Keyspace 00:05:45
    105. Understanding Removing A Node 00:00:54
    106. Decommissioning A Node 00:04:36
    107. Putting A Node Back Into Service 00:06:38
    108. Removing A Dead Node 00:06:42
    109. Pop Quiz - Removing a Node 00:04:10
    110. Lab: Put A Node Back Into Service 00:05:00
    111. Redefining For Multiple Data Centers - Part 1 00:04:50
    112. Redefining For Multiple Data Centers - Part 2 00:05:59
    113. Changing Snitch Type 00:05:25
    114. Modifying cassandra-rackdc.properties 00:07:45
    115. Changing Replication Strategy - Part 1 00:05:55
    116. Changing Replication Strategy - Part 2 00:03:58
    117. Pop Quiz - Redefining a Cluster 00:02:30
    118. Accessing Documentation 00:02:51
    119. Reading Blogs And Books 00:04:53
    120. Watching Video Recordings 00:04:05
    121. Posting Questions 00:04:10
    122. Attending Events 00:03:00
    123. Wrap Up 00:01:03
  3. Introduction to Apache Kafka, by Gwen Shapira
    1. The Case for Kafka 00:11:23
    2. The Basics 00:09:10
    3. Setting up a Kafka Cluster 00:15:30
    4. Writing a Kafka Producer 00:14:33
    5. Writing a Kafka Consumer 00:16:34
    6. Using Kafka from Python 00:08:03
    7. Troubleshooting Kafka 00:29:29
    8. Integrating Kafka and Hadoop with Flafka 00:26:06
    9. Kafka Availability and Consistency 00:22:38
    10. Kafka Ecosystem 00:13:13
    11. Future of Kafka 00:08:53
  4. Introduction to Apache Spark, by Paco Nathan
    1. Pre-Flight Check 00:13:08
    2. Spark Deconstructed 00:14:31
    3. A Brief History 00:23:28
    4. Simple Spark Apps 00:25:07
    5. Spark Essentials 00:35:18
    6. Spark Examples 00:21:55
    7. Unifying the Pieces - Spark SQL 00:24:07
    8. Unifying the Pieces - Spark Streaming 00:14:48
    9. Unifying the Pieces - MLlib and GraphX 00:20:00
    10. Unified Workflows Demo 00:22:35
    11. The Full SDLC 00:04:01
    12. Developer Certification 00:06:10
    13. Resources 00:04:44
    14. Introduction - Why DataFrames? 00:02:28
    15. ETL to Prepare the Data from Capital Bikeshare 00:02:46
    16. Create a DataFrame, Explore using SQL 00:02:47
    17. Data Preparation for Machine Learning Models 00:05:33
    18. Build a Classifier Using Naive Bayes 00:04:43
    19. Build a Classifier Using Decision Trees 00:02:26
    20. Build a Classifier Using Random Forests 00:02:20
    21. Use a DataFrame to Compare Models 00:04:15
    22. Parquet as a Best Practice with DataFrames 00:00:58
    23. How to Store a DataFrame with Parquet 00:03:25
    24. How to Read a DataFrame Back in From Parquet 00:02:57
    25. Use SQL to Estimate Route Durations 00:01:41
    26. Data Preparation for GraphX - Model Route Costs 00:04:43
    27. Use PageRank to Rank Popular Stations 00:03:14
    28. Optimize Routes to Columbus Circle 00:03:43
    29. Compare Results with Google Maps 00:01:58
    30. Analyze a Popular Tourist Route 00:02:30
    31. Examples of How to Use DataFrames in Python 00:02:57
    32. Summary - The New DataFrames Features in Spark 00:01:03
  5. Building Big Data Platforms
    1. Introduction - Building big data platforms at Strata+Hadoop World - Ben Lorica 00:00:53
    2. Big Data at Netflix: Faster and Easier - Kurt Brown 00:40:27
    3. Building Interactive Data Applications at Scale - Fangjin Yang and Vadim Ogievetsky 00:42:56
    4. Open Source Real Time BI using Storm, Hadoop, Titan, Druid & D3 - Anil Madan 00:50:36
    5. Building Real-time Data Products at LinkedIn with Apache Samza - Martin Kleppmann 00:49:42
    6. An Open Source Approach to Gathering and Analyzing Device Sourced Health Data - Ian Eslick 00:41:41
    7. Ticketmaster: Marketing and Selling the World's Tickets - John Carnahan 00:39:35
    8. Unlocking Big Data at CERN - Matthias Braeger and Manish Devgan 00:41:13
    9. Unboxing Data Startups - Michael Abbott 00:38:50
  6. Architectural Considerations for Hadoop Applications, by Mark Grover, Gwen Shapira, Jonathan Seidman, and Ted Malaska
    1. Introduction to Clickstream Case Study 00:11:19
    2. Requirements 00:08:04
    3. Data Modeling 00:14:55
    4. Data Ingest 00:16:16
    5. Data Processing Engines - Part 1 00:16:23
    6. Data Processing Engines - Part 2 00:10:59
    7. Data Processing Patterns 00:09:32
    8. Orchestration 00:14:34
    9. Putting It All Together 00:03:08
    10. Demo 00:21:47
    11. Q&A 00:24:35
  7. An Introduction to Time Series with Team Apache, by Patrick McFadin
    1. Introduction to Time Series Problems 00:09:58
    2. Kafka Architecture and Deployment 00:11:33
    3. Kafka Usage 00:03:42
    4. Introduction to Spark 00:15:43
    5. Spark Architecture 00:12:02
    6. Spark Streaming: Windows & Slides 00:08:35
    7. Spark Streaming: Ingestion Sources & Using Kafka 00:08:32
    8. Spark Streaming: Operations on the Stream 00:01:30
    9. Introduction to Cassandra 00:08:56
    10. Cassandra Basic Architecture 00:11:59
    11. Replication, High Availability and Multi Datacenter 00:14:06
    12. Cassandra Weather Website Example 00:11:46
    13. Cassandra Query Language (CQL) 00:18:00
    14. Cassandra Partitions & Clustering 00:08:22
    15. Cassandra Read and Write Path 00:12:17
    16. Working with Cassandra 00:06:32
    17. Cassandra Drivers and Access Patterns 00:10:37
    18. Spark and Cassandra Architecture 00:12:00
    19. Analyzing Cassandra Data & Spark SQL 00:12:12
    20. Spark and Cassandra DataStax Enterprise 00:04:31
    21. Real World Use Cases: Streaming Problems 00:17:11
    22. Real World Use Cases: In-place Analytic Problems 00:10:58