Learning Path: Real-Time Data Applications

Video Description

Real-time data has many useful applications: quickly identifying general patterns and trends, performing sentiment analysis, crafting responses in real time, and, perhaps most important, handling situations in which immediate analysis changes the outcome. This Learning Path provides an in-depth tour of the technologies used to process and analyze real-time data.

Table of Contents

  1. Learning Apache Cassandra, by Ruth Stryker
    1. Introducing The Course 00:04:41
    2. Understanding What Cassandra Is 00:04:58
    3. Learning What Cassandra Is Being Used For 00:04:56
    4. Understanding The System Requirements 00:06:54
    5. How To Access Your Working Files 00:01:15
    6. Opening The Main Virtual Machine 00:02:53
    7. Pop Quiz - Intro to Cassandra 00:01:24
    8. Understanding That Cassandra Is A Distributed Database 00:02:23
    9. Learning What Snitch Is For 00:03:53
    10. Learning What Gossip Is For 00:01:52
    11. Learning How Data Gets Distributed 00:05:35
    12. Learning About Replication 00:02:12
    13. Learning About Virtual Nodes 00:03:01
    14. Pop Quiz - Getting Started with Architecture 00:01:25
    15. Downloading Cassandra 00:02:48
    16. Ensuring Oracle Java 7 Is Installed 00:02:02
    17. Installing Cassandra 00:03:44
    18. Viewing The Main Configuration File 00:02:46
    19. Providing Cassandra With Permission To Directories 00:01:46
    20. Starting Cassandra 00:03:41
    21. Checking Status 00:04:00
    22. Accessing The Cassandra system.log File 00:02:06
    23. Pop Quiz - Installing Cassandra 00:01:28
    24. Understanding Ways To Communicate With Cassandra 00:03:47
    25. Using CQLSH 00:02:29
    26. Pop Quiz - Communicating with Cassandra 00:01:08
    27. Understanding A Cassandra Database 00:01:54
    28. Defining A Keyspace 00:04:57
    29. Deleting A Keyspace 00:00:52
    30. Pop Quiz - Creating a Database 00:01:53
    31. Lab: Create A Second Database 00:02:39
    32. Creating A Table 00:01:49
    33. Defining Columns And Data Types 00:02:48
    34. Defining A Primary Key 00:01:49
    35. Recognizing A Partition Key 00:02:44
    36. Specifying A Descending Clustering Order 00:03:02
    37. Pop Quiz - Creating a Table 00:01:54
    38. Lab: Create A Second Table 00:02:33
    39. Understanding Ways To Write Data 00:01:28
    40. Using The INSERT INTO Command 00:04:45
    41. Using The COPY Command 00:05:53
    42. How Data Is Stored In Cassandra 00:04:21
    43. How Data Is Stored On Disk 00:05:29
    44. Pop Quiz - Inserting Data 00:02:15
    45. Lab: Insert Data 00:09:10
    46. Understanding Data Modeling In Cassandra 00:01:21
    47. Using A WHERE Clause 00:04:17
    48. Understanding Secondary Indexes 00:02:18
    49. Creating A Secondary Index 00:01:38
    50. Defining A Composite Partition Key 00:09:34
    51. Pop Quiz - Modeling Data 00:03:34
    52. Understanding Cassandra Drivers 00:02:31
    53. Exploring The DataStax Java Driver 00:03:14
    54. Setting Up A Development Environment 00:04:04
    55. Creating An Application Page 00:04:51
    56. Acquiring The DataStax Java Driver Files 00:03:24
    57. Getting The DataStax Java Driver Files Through Maven 00:02:23
    58. Providing The DataStax Java Driver Files Manually 00:02:36
    59. Connecting To A Cassandra Cluster 00:03:39
    60. Executing A Query 00:07:47
    61. Displaying Query Results - Part 1 00:05:59
    62. Displaying Query Results - Part 2 00:07:20
    63. Using An MVC Pattern 00:04:59
    64. Pop Quiz - Creating an Application 00:02:50
    65. Lab: Create A Second Application - Part 1 00:05:20
    66. Lab: Create A Second Application - Part 2 00:09:49
    67. Lab: Create A Second Application - Part 3 00:03:08
    68. Updating Data 00:03:39
    69. Understanding How Updating Works 00:03:55
    70. Deleting Data 00:07:10
    71. Understanding Tombstones 00:07:18
    72. Using TTLs 00:05:09
    73. Updating A TTL 00:02:38
    74. Pop Quiz - Updating and Deleting Data 00:02:38
    75. Lab: Update And Delete Data 00:07:00
    76. Understanding Hardware Choices 00:00:30
    77. Understanding RAM And CPU Recommendations 00:02:45
    78. Selecting Storage 00:04:08
    79. Deploying In The Cloud 00:04:07
    80. Pop Quiz - Selecting Hardware 00:02:06
    81. Understanding Cassandra Nodes 00:03:39
    82. Having A Network Connection - Part 1 00:05:35
    83. Having A Network Connection - Part 2 00:05:02
    84. Having A Network Connection - Part 3 00:04:46
    85. Specifying The IP Address Of A Node In Cassandra 00:04:12
    86. Specifying Seed Nodes 00:06:30
    87. Bootstrapping A Node 00:06:18
    88. Cleaning Up A Node 00:02:59
    89. Using cassandra-stress 00:10:33
    90. Pop Quiz - Adding Nodes to a Cluster 00:01:39
    91. Lab: Add A Third Node 00:10:42
    92. Understanding Cassandra Monitoring Tools 00:00:46
    93. Using Nodetool 00:04:54
    94. Using JConsole 00:03:24
    95. Learning About OpsCenter 00:03:24
    96. Pop Quiz - Monitoring a Cluster 00:01:49
    97. Understanding Repair 00:05:17
    98. Repairing Nodes 00:04:17
    99. Understanding Consistency - Part 1 00:06:26
    100. Understanding Consistency - Part 2 00:04:33
    101. Understanding Hinted Handoff 00:03:30
    102. Understanding Read Repair 00:01:58
    103. Pop Quiz - Repairing Nodes 00:03:30
    104. Lab: Repair Nodes For A Keyspace 00:05:45
    105. Understanding Removing A Node 00:00:54
    106. Decommissioning A Node 00:04:36
    107. Putting A Node Back Into Service 00:06:38
    108. Removing A Dead Node 00:06:42
    109. Pop Quiz - Removing a Node 00:04:10
    110. Lab: Put A Node Back Into Service 00:05:00
    111. Redefining For Multiple Data Centers - Part 1 00:04:50
    112. Redefining For Multiple Data Centers - Part 2 00:05:59
    113. Changing Snitch Type 00:05:25
    114. Modifying cassandra-rackdc.properties 00:07:45
    115. Changing Replication Strategy - Part 1 00:05:55
    116. Changing Replication Strategy - Part 2 00:03:58
    117. Pop Quiz - Redefining a Cluster 00:02:30
    118. Accessing Documentation 00:02:51
    119. Reading Blogs And Books 00:04:53
    120. Watching Video Recordings 00:04:05
    121. Posting Questions 00:04:10
    122. Attending Events 00:03:00
    123. Wrap Up 00:01:03
  2. Introduction to Apache Kafka, by Gwen Shapira
    1. The Case for Kafka 00:11:23
    2. The Basics 00:09:10
    3. Setting up a Kafka Cluster 00:15:30
    4. Writing a Kafka Producer 00:14:33
    5. Writing a Kafka Consumer 00:16:34
    6. Using Kafka from Python 00:08:03
    7. Troubleshooting Kafka 00:29:29
    8. Integrating Kafka and Hadoop with Flafka 00:26:06
    9. Kafka Availability and Consistency 00:22:38
    10. Kafka Ecosystem 00:13:13
    11. Future of Kafka 00:08:53
  3. Introduction to Apache Spark, by Paco Nathan
    1. Pre-Flight Check 00:13:08
    2. Spark Deconstructed 00:14:31
    3. A Brief History 00:23:28
    4. Simple Spark Apps 00:25:07
    5. Spark Essentials 00:35:18
    6. Spark Examples 00:21:55
    7. Unifying the Pieces - Spark SQL 00:24:07
    8. Unifying the Pieces - Spark Streaming 00:14:48
    9. Unifying the Pieces - MLlib and GraphX 00:20:00
    10. Unified Workflows Demo 00:22:35
    11. The Full SDLC 00:04:01
    12. Developer Certification 00:06:10
    13. Resources 00:04:44
    14. Introduction - Why DataFrames? 00:02:28
    15. ETL to Prepare the Data from Capital Bikeshare 00:02:46
    16. Create a DataFrame, Explore using SQL 00:02:47
    17. Data Preparation for Machine Learning Models 00:05:33
    18. Build a Classifier Using Naive Bayes 00:04:43
    19. Build a Classifier Using Decision Trees 00:02:26
    20. Build a Classifier Using Random Forests 00:02:20
    21. Use a DataFrame to Compare Models 00:04:15
    22. Parquet as a Best Practice with DataFrames 00:00:58
    23. How to Store a DataFrame with Parquet 00:03:25
    24. How to Read a DataFrame Back in From Parquet 00:02:57
    25. Use SQL to Estimate Route Durations 00:01:41
    26. Data Preparation for GraphX - Model Route Costs 00:04:43
    27. Use PageRank to Rank Popular Stations 00:03:14
    28. Optimize Routes to Columbus Circle 00:03:43
    29. Compare Results with Google Maps 00:01:58
    30. Analyze a Popular Tourist Route 00:02:30
    31. Examples of How to Use DataFrames in Python 00:02:57
    32. Summary - The New DataFrames Features in Spark 00:01:03
  4. Large-scale Real-time Stream Processing and Analytics
    1. Introduction - Large-scale real time stream processing and analytics at Strata+Hadoop World - Ben Lorica 00:01:08
    2. Going Real-time: Data Collection and Stream Processing with Apache Kafka - Jay Kreps 00:39:29
    3. Say goodbye to batch - Tyler Akidau (Google) 00:42:35
    4. Stream Processing Everywhere - What to Use? - Jim Scott 00:39:06
    5. From Source to Solution: Building A System for Machine and Event-Oriented Data - Eric Sammer 00:41:59
    6. Spark Streaming - The State of the Union, and Beyond - Tathagata Das 00:36:46
    7. Dynamic Events in Massive Data Streams, from Astrophysics to Marketing Automation - Kirk Borne 00:40:06
    8. TSAR (the TimeSeries AggregatoR) - How to Count Tens of Billions of Daily Events in Real Time Using Open Source Technologies - Anirudh Todi 00:41:28
    9. Streaming Analytics: It’s Not The Same Game - Subutai Ahmad 00:38:46
    10. Realtime Data Analysis Patterns - Mikio Braun (streamdrill) 00:39:24
    11. The IoT P2P Backbone - Bruno Fernandez-Ruiz 00:27:05
    12. Practical Methods for Identifying Anomalies That Matter in Large Datasets - Robert Grossman 00:36:43
  5. An Introduction to Time Series with Team Apache, by Patrick McFadin
    1. Introduction to Time Series Problems 00:09:58
    2. Kafka Architecture and Deployment 00:11:33
    3. Kafka Usage 00:03:42
    4. Introduction to Spark 00:15:43
    5. Spark Architecture 00:12:02
    6. Spark Streaming: Windows & Slides 00:08:35
    7. Spark Streaming: Ingestion Sources & Using Kafka 00:08:32
    8. Spark Streaming: Operations on the Stream 00:01:30
    9. Introduction to Cassandra 00:08:56
    10. Cassandra Basic Architecture 00:11:59
    11. Replication, High Availability and Multi Datacenter 00:14:06
    12. Cassandra Weather Website Example 00:11:46
    13. Cassandra Query Language (CQL) 00:18:00
    14. Cassandra Partitions & Clustering 00:08:22
    15. Cassandra Read and Write Path 00:12:17
    16. Working with Cassandra 00:06:32
    17. Cassandra Drivers and Access Patterns 00:10:37
    18. Spark and Cassandra Architecture 00:12:00
    19. Analyzing Cassandra Data & Spark SQL 00:12:12
    20. Spark and Cassandra DataStax Enterprise 00:04:31
    21. Real World Use Cases: Streaming Problems 00:17:11
    22. Real World Use Cases: In-place Analytic Problems 00:10:58