Learning Path: Advanced Architecture for Big Data Applications

Video Description

Sharpen your architectural skills by understanding the challenges in the main areas of distributed systems: storage, computation, messaging, timing, and consensus. You’ll learn how to develop highly scalable big data applications using Apache Accumulo, to model and design an agile data warehouse, and to use Elasticsearch to search, aggregate, analyze, and scale large-volume datastores. You’ll also learn how to identify security weaknesses in your big data cluster and to address them using MIT Kerberos, Active Directory authentication, and authorization.

Table of Contents

  1. Welcome 00:10:42
  2. What Distributed Systems Are, and Why They Exist 00:02:51
  3. Read Replication 00:04:57
  4. Sharding 00:08:45
  5. Consistent Hashing 00:19:27
  6. CAP Theorem 00:12:08
  7. Distributed Transactions 00:16:54
  8. Distributed Computation Introduction 00:04:32
  9. MapReduce 00:12:25
  10. Hadoop 00:22:53
  11. Spark 00:18:22
  12. Storm 00:16:27
  13. Lambda Architecture 00:08:36
  14. Synchronization 00:04:19
  15. Network Time Protocol 00:08:10
  16. Vector Clocks 00:12:26
  17. Distributed Consensus: Paxos 00:16:32
  18. Messaging Introduction 00:06:25
  19. Kafka 00:16:13
  20. ZooKeeper 00:18:57
  21. Wrap-Up 00:06:13
  22. Getting Started
    1. About The Course 00:02:38
    2. About The Author 00:02:54
    3. What Is A Data Warehouse? 00:03:52
    4. Comparing Operational Applications And Data Warehouses 00:04:31
    5. How To Access Your Working Files 00:01:15
  23. Data Warehouse Overview
    1. Development Approach 00:03:07
    2. Data Sources 00:02:26
    3. Staging Tables 00:01:29
    4. Data Warehouse Model 00:06:57
    5. Data Warehouse Design 00:03:08
    6. Data Warehouse Data 00:02:42
    7. End User Access, Old Data, And Metadata Management 00:01:21
    8. Introduction To The Case Study 00:01:54
  24. Data Sources
    1. Data Modeling Review - Part 1 00:04:49
    2. Data Modeling Review - Part 2 00:03:29
    3. Data Sources Overview 00:02:04
    4. Source Data: Menu Definition 00:02:12
    5. Source Data: Miscellaneous Metadata 00:01:51
    6. Source Data: Customer Order 00:02:16
    7. Source Data: Customer Account 00:01:37
    8. Source Data: Customer Prospect 00:00:31
    9. Source Data: Vendor Procurement 00:01:58
    10. Case Study: Assess Source Data 00:06:36
  25. Staging Tables
    1. Staging Tables Overview 00:03:58
    2. Case Study: Create Staging Model 00:05:13
  26. Data Warehouse Modeling Basics
    1. The Star Schema 00:06:10
    2. Dimension 00:03:27
    3. Fact 00:05:03
    4. Surrogate Keys 00:03:36
    5. The Bus Architecture 00:02:53
    6. Dimensional Modeling And Agile Development 00:01:02
    7. Practical Tips 00:02:44
    8. Self Assessment Test 00:06:40
    9. Case Study: Business Requirements 00:05:04
    10. Case Study: Bus Architecture 00:04:43
  27. Recurrent Dimensions
    1. Date 00:05:45
    2. Time 00:02:53
    3. Customer 00:05:48
    4. Account 00:01:41
    5. Employee 00:03:15
    6. Unit of Measure 00:03:20
    7. Product 00:02:45
    8. Currency 00:02:39
    9. Audit 00:01:14
    10. Case Study: Initial Warehouse Model - Part 1 00:06:02
    11. Case Study: Initial Warehouse Model - Part 2 00:03:58
    12. Case Study: Initial Warehouse Model - Part 3 00:02:36
  28. DW Modeling - Advanced Dimension
    1. Kinds of Conformed Dimensions 00:01:45
    2. Junk Dimension 00:05:14
    3. Degenerate Dimension 00:05:19
    4. Slowly Changing Dimension - Part 1 00:07:40
    5. Slowly Changing Dimension - Part 2 00:05:32
    6. Snowflake, Outrigger, and Bridge 00:04:24
    7. Swappable Dimension 00:02:17
    8. Master Dimension 00:01:26
    9. Hierarchy 00:07:03
    10. Practical Tips 00:02:36
    11. Self Assessment Test 00:09:10
    12. Case Study: Elaborate Dimensions 00:03:18
  29. DW Modeling - Advanced Fact
    1. Kinds Of Facts 00:01:30
    2. Transaction Fact 00:02:28
    3. Periodic Snapshot 00:01:32
    4. Accumulating Snapshot 00:01:51
    5. Aggregate Fact 00:01:36
    6. Consolidated Fact 00:01:06
    7. Practical Tips 00:01:10
    8. Case Study: Elaborate Facts 00:04:23
  30. Data Warehouse Modeling Recap
    1. Warehouse Modeling Review 00:01:47
    2. Common Warehouse Modeling Mistakes 00:02:42
  31. Data Warehouse Design
    1. Conceptual, Logical, Physical Models 00:00:55
    2. System Attributes - Part 1 00:06:22
    3. System Attributes - Part 2 00:04:57
    4. Data Types And Domains 00:07:34
    5. Nullability 00:09:52
    6. Constraints 00:03:03
    7. Data Warehouse Tuning - Part 1 00:03:38
    8. Data Warehouse Tuning - Part 2 00:08:57
    9. Views - Part 1 00:09:24
    10. Views - Part 2 00:05:03
    11. Miscellaneous Aspects Of Design 00:02:40
    12. Practical Tips 00:01:49
    13. Self Assessment Test 00:04:48
    14. Case Study: Create Staging SQL 00:02:15
    15. Case Study: Execute Staging SQL 00:07:28
    16. Case Study: Create Warehouse SQL 00:07:45
    17. Case Study: Execute Warehouse SQL 00:01:26
  32. Data Warehouse Data
    1. Warehouse Data Overview 00:01:59
    2. Source-To-Target Mappings 00:12:18
    3. Data Profiling 00:12:55
    4. Loading Staging Tables - Part 1 00:10:38
    5. Loading Staging Tables - Part 2 00:07:53
    6. Loading The Date and Time Dimensions - Part 1 00:11:07
    7. Loading The Date and Time Dimensions - Part 2 00:05:32
    8. Initial Warehouse Loading: Dimensions 00:13:29
    9. Initial Warehouse Loading: Facts 00:15:30
    10. Updating The Warehouse 00:02:07
    11. Warehouse Data Processing And Agile Development 00:01:55
    12. Case Study: Load Warehouse Data 00:01:50
  33. End User Access
    1. End User Access Overview 00:04:29
    2. Case Study: Analyze Data - Part 1 00:12:50
    3. Case Study: Analyze Data - Part 2 00:10:34
  34. Data And Metadata Management
    1. Offload Of Old Data 00:02:07
    2. Metadata Management 00:02:43
  35. Conclusion
    1. Course Wrap-Up 00:04:42
  36. In Search Of Database Nirvana
    1. The Swinging Database Pendulum 00:18:28
    2. Hybrid Transaction/Analytical Processing Workloads 00:32:32
    3. Query Versus Storage Engines 00:23:12
    4. The Challenges Of HTAP 00:10:44
  37. Getting Started
    1. Introduction To Elasticsearch 00:02:24
    2. About The Author 00:00:56
    3. How To Access Your Working Files 00:01:15
  38. Basic Operations
    1. Installing And Configuring Elasticsearch 00:08:08
    2. Document CRUD - Creating, Retrieving, Updating And Deleting 00:04:19
    3. Running Searches And Aggregations 00:07:09
  39. Data Structure
    1. Mappings And Predefined Fields 00:05:54
    2. Core Types For Your Own Fields 00:06:34
    3. Using Predefined And Custom Analyzers 00:05:24
  40. Queries And Relevance
    1. Returning Specific Fields, Sorting And Pagination 00:05:20
    2. Full-Text Search With Match And Multi-Match Queries 00:08:38
    3. Using The Lucene Query Syntax In Query Strings 00:07:56
    4. Combining Full-Text And Term-Oriented Queries With The Bool Query 00:07:14
    5. Tuning Relevance 00:06:06
  41. Aggregations
    1. Using Queries And Aggregations Together In A Cluster 00:05:09
    2. Combining Different Kinds Of Aggregations 00:05:53
    3. Important Aggregation Types 00:07:00
  42. Document Relationships
    1. Objects And Nested Documents 00:05:07
    2. Parent-Child Relations 00:06:31
    3. Denormalizing And Application-Side Joins 00:03:24
  43. Performance And Scaling
    1. Optimizing Indexing And Searching 00:08:01
    2. Optimizing Node Settings 00:08:18
    3. Configuring Shards And Replicas 00:05:34
    4. Scaling Strategies 00:07:33
  44. Monitoring And Administration
    1. Easy Maintenance With Aliases And Index Templates 00:02:31
    2. Tuning Your Cluster For Stability 00:04:30
    3. Monitoring Elasticsearch Logs And Metrics 00:07:17
    4. Backups And Upgrades 00:04:59
  45. Conclusion
    1. Course Wrap-Up 00:03:44
  46. Data Model And Architecture
    1. Introduction To Accumulo 00:03:42
    2. About The Author 00:02:41
    3. The Accumulo Data Model 00:05:25
    4. Architecture 00:05:20
    5. How To Access Your Working Files 00:01:15
  47. Working With Accumulo
    1. Installation And Configuration 00:05:36
    2. Running And Monitoring 00:04:25
    3. Using The Shell 00:03:59
  48. Basic Application Development
    1. Starting Development 00:03:02
    2. Writing Data 00:05:41
    3. Reading Data 00:04:20
    4. Table API 00:03:09
  49. Application Security
    1. Authentication 00:03:03
    2. Authorization 00:04:11
  50. Intermediate Application Development
    1. Updates And Deletes 00:04:17
    2. Writing Secondary Indexes 00:04:46
    3. Reading Secondary Indexes 00:03:19
    4. Handling Hardware Failure 00:04:15
  51. Advanced Application Development
    1. MapReduce 00:03:04
    2. Spark 00:02:22
    3. Iterators 00:03:40
    4. Thrift Proxy 00:03:53
  52. Performance
    1. Table Design 00:06:23
    2. Optimization Features 00:07:07
  53. Administration
    1. Monitoring 00:04:58
    2. Table Management 00:04:08
    3. Importing And Exporting Tables 00:04:36
    4. Cluster Changes 00:04:02
    5. Replication 00:07:35
  54. Conclusion
    1. Conclusion 00:01:07
  55. Course Overview
    1. About This Course 00:10:55
    2. About The Instructor 00:07:07
    3. Course Sittings 00:07:06
    4. At The End Of This Course 00:07:00
    5. How To Access Your Working Files 00:01:15
  56. Tooling
    1. Initializing Amazon Web Services 00:07:56
    2. Using Cloudera Director To Spin Up A Test Cluster 00:09:25
    3. Crash Course In Cloudera Manager 00:17:04
  57. Hadoop Insecurities
    1. Permissions And Encryption 00:09:28
    2. Where Permissions Stop 00:04:31
    3. Hive: Transform Harmful 00:06:58
  58. Authentication With MIT Kerberos
    1. Installing MIT Kerberos 00:08:57
    2. Enabling Kerberos Authentication 00:06:54
    3. Using MIT Kerberos 00:04:28
    4. Submitting Jobs And Running Queries With Kerberos Auth 00:06:48
  59. Authentication With Active Directory
    1. Installing An AD Server 00:06:30
    2. Preparing AD Server For Hadoop 00:03:46
    3. Impala LDAP Authentication With Active Directory 00:07:36
    4. Using Hue With Active Directory 00:09:10
    5. Preparing Cluster With Kerberos Authentication 00:04:41
    6. Running The CM Wizard 00:03:54
    7. Using Kerberos 00:06:05
    8. Sharing Kerberos Tickets With Active Directory 00:07:56
  60. Authorization
    1. No Authorization 00:02:04
    2. Enabling Sentry Authorization 00:05:56
    3. Using Sentry - Defining Roles 00:07:16
    4. Using Sentry - Querying With Hue 00:05:29
    5. Custom Code And Hive UDFs With Sentry 00:03:51
    6. HDFS Extended ACLs 00:05:17
    7. HDFS Sentry Sync 00:02:13
    8. Sentry Authentication With Solr - Part 1 00:06:02
    9. Sentry Authentication With Solr - Part 2 00:04:18
  61. Encryption
    1. Creating An HDFS Encryption Zone 00:05:27
    2. Using HDFS Encryption Zones 00:02:36
    3. SSL: Crash Course In SSL Tools 00:08:40
    4. SSL: Preparing A Cluster For SSL Using A Self-Signed Root CA 00:04:27
    5. SSL: Enabling SSL For HDFS And YARN 00:03:46
    6. SSL: Verifying SSL With HDFS And YARN 00:02:52
    7. SASL Hive And HiveServer2 00:02:41
    8. SSL With HBase And Oozie 00:01:57
    9. SSL With Impala 00:01:34
    10. SSL With Hue 00:05:29
  62. Developer Topics
    1. UserGroupInformation Basics 00:09:46
    2. Delegation Tokens 00:03:53
    3. Secure Impersonation 00:08:55
  63. Administrator Topics
    1. Role Assignments And Gateway Isolation 00:12:28
    2. HBase ACLs 00:05:16
    3. Audits 00:06:19
    4. Sqoop 00:04:50
    5. Joining An AD Domain 00:11:31
  64. Secure Hadoop Topics
    1. The Secure Hadoop Market 00:05:04
    2. Cheats 00:05:52
  65. Conclusion
    1. Wrap Up 00:04:40