You are previewing Aster Data SQL and MapReduce.
O'Reilly logo
Aster Data SQL and MapReduce

Book Description

The Aster Data SQL and MapReduce book shows you the fundamental architecture that will allow you to fully understand how Aster Data works. You will be able to create tables, perform partitioning, follow the best Aster modeling strategies, and have a great reference for Aster Data SQL. You will also elevate your knowledge immensely from the brilliant MapReduce portion of the books. You will have MapReduce examples, explanations, and workshops that are designed to make you a MapReduce wizard.

Table of Contents

  1. Cover
  2. The Tera-Tom Genius Series
  3. Tera-Tom- Author of over 50 Books
  4. The Best Query Tool Works on all Systems
  5. Trademarks and Copyrights
  6. About Tom Coffing
  7. About John Nolan
  8. Contents
  9. Chapter 1 – The Aster Data Architecture
    1. What is Parallel Processing?
    2. Aster Data is a Parallel Processing System
    3. Each vworker holds a Portion of Every Table
    4. The Rows of a Table are Spread Across All vworkers
    5. Aster Tables are defined as Fact or Dimension when Created
    6. Fact Table
    7. A More Detailed Look at the Fact Table Distribution
    8. Dimension Table are Replicated
    9. A Dimension Table is often Replicated across vworkers
    10. Aster Data has Fact and Dimension Tables
    11. Aster Tables are defined as Fact or Dimension when Created
    12. Fact and Dimension Tables can be Hashed by the same Key
    13. Distribution Key Rules
    14. Aster Data Uses a Hash Formula
    15. The Hash Map Determines which vworker will own the Row
    16. The Hash Formula, Hash Map and vworker
    17. Placing rows on the vworker
    18. Placing rows on the vworker Continued
    19. A Review of the Hashing Process
    20. Like Data Hashes to the Same vworker
    21. Distribution Key Data Types
    22. Run ANALYZE to COLLECT STATISTICS on a Table
    23. Some Examples of ANALYZE
    24. What Columns to Analyze
  10. Chapter 2 - Four Options for Aster Data Table Design
    1. There are Four Options to Aster Table Design
    2. Straight up Distribute by Hash
    3. Straight up Distribute by Hash - Problems
    4. Straight up Distribute by Replication
    5. Partition the Table with Logical Partitioning
    6. This Partitioned Table Sorts Rows by Month of Order_Date
    7. An All vworkers Retrieve By Way of a Single Partition
    8. You can Partition a Table by Range or by List
    9. A Partitioned By List Example with Three Tactical Queries
    10. Aster Data Multi-Level Partitioning
    11. Aster Allows for Multi-Level Partitioning
    12. SQL Commands for Logical Partitioning as One Table
    13. What Partitions are on my Table?
    14. What does a Columnar Table look like?
    15. A Comparison of Data for Normal Vs. Columnar
    16. A Columnar Table is best for Queries with Few Columns
    17. Quiz – How Many Blocks Move to vworker Memory?
    18. Answer – How Many Containers Move to vworker Memory?
    19. When to use a Columnar Table
  11. Chapter 3 - How Joins Work Inside the Aster Engine
    1. Aster Join Quiz
    2. Aster Join Quiz Answer
    3. The Joining of Two Tables
    4. Aster Moves Joining Rows to the Same vworker
    5. Because of the Join Rule – Dimension Table are Replicated
    6. The Two Different Philosophies for Table Join Design
    7. What Could You Do If Two Tables Joined 1000 Times a Day?
    8. Fact and Dimension Tables can be Hashed by the same Key
    9. Joining Two Tables with the same PK/FK Distribution Key
    10. A Join With Co-Location
    11. A Performance Tuning Technique for Large Joins
    12. The Joining of Two Tables with an Additional WHERE Clause
    13. Aster Performs Joins Using Three Different Methods
    14. The Hash Join
    15. The Merge Join
    16. Nested Loop Joins
  12. Chapter 4 - Temporary and Analytic Tables
    1. Aster has Three Types of Data
    2. Create a Permanent Table Using Create Table AS (CTAS)
    3. Create a Logically Partitioned Table and Populate It
    4. Create a Temporary Table with using Create Table AS (CTAS)
    5. A Temporary Table in Action
    6. A Temporary Table That Uses an Insert/Select
    7. Create an Analytic Table Using an Insert/Select
    8. Create an Analytic Table Using CREATE TABLE AS (CTAS)
    9. Operations that Invalidate an Analytic Table
    10. If an Analytic Table is Invalid
    11. Tera-Tom History
  13. Chapter 5 – Tera-Tom’s Top Tips
    1. Tera-Tom’s Top Tips
    2. Tera-Tom’s Top Tips # 2
    3. Tera-Tom’s Top Tips #3
    4. Tera-Tom’s Top Tips # 3 Rewritten
    5. Tera-Tom’s Top Tips #4
    6. When the GROUP BY Column is NOT the Distribution Key
    7. Example of GROUP BY Column is NOT the Distribution Key
    8. Tera-Tom’s Top Tips #5
    9. Tera-Tom’s Top Tips #6 – Use EXPLAIN
    10. Query Plan and Estimates
    11. Explain Plan Showing a Hash Join
    12. Explain Plan Showing a Merge Join
    13. Explain Plan Showing a Nested Loop Join
  14. Chapter 6 - Indexes
    1. There are Only Three Types of Scans
    2. Guidelines for Indexes
    3. An Index Syntax Example
    4. The B-Tree Index
    5. Which Columns Might You Create an Index?
    6. Answer - Which Columns Might You Create an Index?
    7. A Visual of an Index (Conceptually)
    8. A Query Using an Index Uses All vworkers
    9. Multicolumn indexes
    10. A NUSI BITMAP Theory
    11. A NUSI Bitmap in Action
    12. Indexes on Expressions
    13. Indexes on Extracts of Dates
    14. GiST Indexes
    15. Five Operational Tips for Efficient Indexing
    16. REINDEX
    17. createCompressedIndexOnCompressedTableByDefault Flag
  15. Chapter 7 – SQL-MapReduce
    1. MapReduce History
    2. What is MapReduce?
    3. What is SQL-MapReduce?
    4. SQL-MapReduce Input
    5. SQL-MapReduce Output
    6. Subtle SQL-MapReduce Processing
    7. Aster Data Provides an Analytic Foundation
    8. Path Analysis
    9. Text Analysis
    10. Statistical Analysis
    11. Segmentation (Data Mining)
    12. Graph Analysis
    13. Transformation of Data
    14. Sessionize
    15. Tokenize
    16. SQL-MapReduce Function . . . nPath
    17. nPath SELECT Clause
    18. nPath ON Clause
    19. nPath PARTITION BY Expression
    20. nPath DIMENSION Expression
    21. nPath ORDER BY Expression
    22. nPath MODE Clause has Overlapping or NonOverlapping
    23. nPath PATTERN Clause
    24. Pattern Operators
    25. Pattern Operators Order of Precedence
    26. Matching Patterns Which Repeat
    27. nPath SYMBOLS Clause
    28. nPath RESULTS Clause
    29. Adding an Aggregate to nPath Results
    30. Adding an Aggregate to nPath Results (Continued)
    31. SQL-MapReduce Examples - Use Regular SQL
    32. SQL-MapReduce Examples - Create Objects
    33. SQL-MapReduce Examples - Subquery
    34. SQL-MapReduce Examples - Query as Input
    35. SQL-MapReduce Examples - Nesting Functions
    36. SQL-MapReduce Examples - Functions in Derived Tables
    37. SQL-MapReduce Examples - SMAVG
    38. SQL-MapReduce Examples - Pack Function
    39. SQL-MapReduce Examples - Pack Function (Continued)
    40. SQL-MapReduce Examples - Pivot Columns
    41. Workshop: Create This Table
    42. Login to your GNOME Terminal
    43. Login to your Linux
    44. Using the GNOME Terminal Unzip the bank_web_data.zip
    45. Use the Function ncluster_loader to Load the Bank Data
    46. Run this nPath Map Reduce Function on your Table
    47. nPath in Action
    48. Operators at their Simplest
    49. Pattern
    50. Accumulate
    51. Accumulate With All Pages
    52. Accumulate – nPath with a WHERE Clause
    53. SQL-MapReduce Examples - Path Generator
    54. SQL-MapReduce Examples - Path Generator (Continued)
    55. SQL-MapReduce Examples - Path Generator (Continued)
    56. SQL-MapReduce Examples - Path Generator (Continued)
    57. SQL-MapReduce Examples - Path Generator (Continued)
    58. SQL-MapReduce Examples - Linear Regression
    59. SQL-MapReduce Examples - Linear Regression (Continued)
    60. SQL-MapReduce Examples - Linear Regression (Continued)
    61. SQL-MapReduce Examples - Naive Bayes
    62. SQL-MapReduce Examples - Naive Bayes (Continued)
    63. SQL-MapReduce Examples - Naive Bayes (Continued)
    64. SQL-MapReduce Examples - Naive Bayes (Continued)
    65. SQL-MapReduce Examples - Naive Bayes (Continued)
    66. SQL-MapReduce Examples - Naive Bayes (Continued)
    67. SQL-MapReduce Examples - Naive Bayes (Continued)
    68. SQL-MapReduce Examples - Naive Bayes (Continued)
    69. Join Aster, Teradata and Hadoop Tables; feed into MapReduce
    70. Run Both of these Examples Together and Compare
    71. Run this nPath Map Reduce Function
    72. nPath in Action
    73. Another nPath Example
    74. Finding Out What Functions You Have Installed
    75. Workshop 1 – Fill in the x’s
    76. Answer Workshop 1 - Fill in the x’s
    77. Workshop 2 – Fill in the x’s
    78. Answer Workshop 2 – Fill in the x’s
    79. Answer Workshop 2 – You Could Have Used a GROUP BY
    80. Workshop 3 – Add to the Query
    81. Workshop 3 – Answer to Add to the Query
    82. Workshop 4 – Fill in the x’s
    83. Answer to Workshop 4 – Fill in the x’s
    84. Workshop 5 – Find that Customer
    85. Answer to Workshop 5 – Find that Customer
    86. Workshop 6 – Change the MapReduce Function
    87. Answer to Workshop 6 – Change the MapReduce Function
    88. Workshop 7 – Build the MapReduce Function
    89. Answer to Workshop 7 – Build the MapReduce Function
    90. Best Answer to Workshop 7 – Build the MapReduce Function
    91. Workshop 8 – Build the Accumulate in the Result
    92. Answer to Workshop 8 – Build the Accumulate in the Result
    93. SQL-MapReduce Examples - Linear Regression (Continued)
    94. Workshop 9 – Build the Subquery
    95. Answer to Workshop 9 – Build the Subquery
    96. Workshop 10 – Do Your First Join
    97. Answer to Workshop 10 – Do Your First Join
    98. Answer to Workshop 10 – Do the Join Using a New Syntax
    99. Workshop 11 – Super Join the Tables
    100. Answer to Workshop 11 –Super Join the Tables
    101. Answer to Workshop 11 – Super Join the Tables
    102. Workshop 12 – Sessionize the Data
    103. Answer to Workshop 12 – Sessionize the Data
    104. Workshop 13 – What is this Query Doing?
    105. Answer to Workshop 13 – What is this Query Doing?
    106. Workshop 14 – Using ilike
    107. Answer to Workshop 14 – Using ilike
    108. Answer to Workshop 14 – Using ilike
    109. Workshop 15 – What are the First Two Pages Visited?
    110. Workshop 15 – What are the First Two Pages Visited?
    111. Workshop 16 – Advanced - First Two Pages Visited?
    112. Answer to Workshop 16 Advanced - First Two Pages Visited?
    113. Workshop 17 – Can You Clean Up the Results?
    114. Answer to Workshop 17 – Can You Clean Up the Results?
    115. Answer to Workshop 17 – Format the Date
    116. Workshop 18 – Build a Churn Table
    117. Workshop 18 – Run the Query Before Building to Test
    118. Workshop 18 – A Better Example
    119. Answer to Workshop 18 – Build a Basic Churn Table
    120. Workshop 18 – Create the Churn Table with a Better Example
    121. Multi-Case
    122. The Multi-Case Function
    123. The Multi-Case Function in Nexus
    124. The Multi-Case Function Mixing and Matching
    125. The Multi-Case Function Mixing and Matching
    126. SQL-MapReduce Examples - cFilter
    127. SQL-MapReduce Examples - cFilter (Continued)
    128. SQL-MapReduce Examples - Linear Regression (Continued)
    129. SQL-MapReduce Examples - cFilter (Continued)
    130. SQL-MapReduce Examples - Linear Regression (Continued)
    131. SQL-MapReduce Examples - cFilter (Continued)
    132. SQL-MapReduce Examples - cFilter (Continued)
    133. SQL-MapReduce Examples - cFilter (Continued)
    134. SQL-MapReduce Examples - cFilter (Continued)
    135. SQL-MapReduce Examples - cFilter (Continued)
    136. SQL-MapReduce Examples - cFilter (Continued)
    137. SQL-MapReduce Examples - cFilter (Continued)
    138. CFILTER in Action with Bank_Web_Clicks
    139. CFILTER in Action
    140. CFILTER using Nexus
    141. nPath Error
  16. Chapter 8 – Time and Date
    1. Date, Time, and Timestamp Keywords
    2. Add or Subtract Days from a date
    3. The to_char command
    4. A Summary of Math Operations on Dates
    5. Using a Math Operation to find your Age in Years
    6. Find What Day of the week you were Born
    7. Date Related Functions
    8. The EXTRACT Command
    9. EXTRACT from DATES and TIME
    10. EXTRACT with DATE and TIME Literals
    11. EXTRACT of the Month on Aggregate Queries
    12. A Side Title example with Reserved Words as an Alias
    13. Implied Extract of Day, Month and Year
    14. DATE_PART Function
    15. DATE_TRUNC Function
    16. DATE_TRUNC Function using TIME
    17. Aster NOW() Function
  17. Chapter 9 – Aster Windows Functions
    1. Cumulative Sum
    2. Cumulative Sum - Major and Minor Sort Key(s)
    3. The ANSI CSUM – Getting a Sequential Number
    4. The ANSI OLAP – Reset with a PARTITION BY Statement
    5. PARTITION BY only Resets a Single OLAP not ALL of them
    6. ANSI Moving Sum is Current Row and Preceding n Rows
    7. How ANSI Moving SUM Handles the Sort
    8. Quiz – How is that Total Calculated?
    9. Answer to Quiz – How is that Total Calculated?
    10. Moving SUM every 3-rows vs. a Continuous Sum
    11. Moving Average
    12. Quiz – How is that Total Calculated?
    13. Answer to Quiz – How is that Total Calculated?
    14. Quiz – How is that 4th Row Calculated?
    15. Answer to Quiz – How is that 4th Row Calculated?
    16. Partition By Resets an ANSI OLAP
    17. Moving Average Using BETWEEN
    18. Moving Difference using ANSI Syntax
    19. Moving Difference using ANSI Syntax with Partition By
    20. RANK Defaults to Ascending Order
    21. Getting RANK to Sort in DESC Order
    22. You can use Window Functions in Expressions
    23. RANK() OVER and PARTITION BY
    24. DENSE_RANK() OVER
    25. PERCENT_RANK() OVER
    26. PERCENT_RANK() OVER with 14 rows in Calculation
    27. PERCENT_RANK() OVER with 21 rows in Calculation
    28. RANK With ORDER BY SUM()
    29. COUNT OVER for a Sequential Number
    30. Quiz – What caused the COUNT OVER to Reset?
    31. Answer to Quiz – What caused the COUNT OVER to Reset?
    32. The MAX OVER Command
    33. MAX OVER with PARTITION BY Reset
    34. The MIN OVER Command
    35. Quiz – Fill in the Blank
    36. Answer to Quiz – Fill in the Blank
    37. The Row_Number Command
    38. Quiz – How did the Row_Number Reset?
    39. Quiz – How did the Row_Number Reset?
    40. NTILE
    41. NTILE Using a Value of 10
    42. NTILE With a Partition
    43. CUME_DIST
    44. CUME_DIST With a Partition
    45. LEAD
    46. LEAD With Partitioning
    47. LAG
    48. LAG with Partitioning
    49. FIRST_VALUE
    50. FIRST_VALUE After Sorting by the Highest Value
    51. FIRST_VALUE with Partitioning
    52. LAST_VALUE
    53. NTH_VALUE
    54. NTH_VALUE With Partition
    55. SUM(SUM(n))
  18. Chapter 10 – The Fundamental SQL Commands That Work on Aster
    1. BETWEEN is Inclusive
    2. BETWEEN Works for Character Data
    3. LIKE uses Wildcards Percent ‘%’ and Underscore ‘_’
    4. LIKE command Underscore is Wildcard for one Character
    5. GROUP BY Vs. DISTINCT – Good Advice
    6. The Five Aggregates of Aster Data
    7. GROUP BY when Aggregates and Normal Columns Mix
    8. GROUP BY Delivers one row per Group
    9. GROUP BY Dept_No or GROUP BY 1 the same thing
    10. Limiting Rows and Improving Performance with WHERE
    11. WHERE Clause in Aggregation limits unneeded Calculations
    12. Keyword HAVING tests Aggregates after they are Totaled
    13. Keyword HAVING is like an Extra WHERE Clause for Totals
    14. Getting the Average Values per Column
    15. Getting the Average Values per Column
    16. Average Values per Column for All Columns in a Table
    17. A two-table join using Non-ANSI Syntax
    18. A two-table join using Non-ANSI Syntax with Table Alias
    19. Aliases and Fully Qualifying Columns
    20. A two-table join using ANSI Syntax
    21. Both Queries have the same Results and Performance
    22. Quiz – Can You Finish the Join Syntax?
    23. Answer to Quiz – Can You Finish the Join Syntax?
    24. Quiz – Can You Find the Error?
    25. Answer to Quiz – Can You Find the Error?
    26. Quiz – Which rows from both tables Won’t Return?
    27. Answer to Quiz – Which rows from both tables Won’t Return?
    28. LEFT OUTER JOIN
    29. LEFT OUTER JOIN Brings Back All Rows in the Left Table
    30. RIGHT OUTER JOIN
    31. RIGHT OUTER JOIN Brings Back All Rows in the RIGHT Table
    32. FULL OUTER JOIN
    33. FULL OUTER JOIN Brings Back All Rows in All Tables
    34. Which Tables are the Left and which are the Right?
    35. Answer - Which Tables are the Left and which are the Right?
    36. INNER JOIN with Additional AND Clause
    37. ANSI INNER JOIN with Additional AND Clause
    38. ANSI INNER JOIN with Additional WHERE Clause
    39. OUTER JOIN with Additional WHERE Clause
    40. OUTER JOIN with Additional AND Clause
    41. Results from OUTER JOIN with Additional AND Clause
    42. Quiz – Why is this considered an INNER JOIN?
    43. The DREADED Product Join
    44. Result Set of the DREADED Product Join
    45. The Horrifying Cartesian Product Join
    46. The ANSI Cartesian Join will ERROR
    47. Quiz – Do these Joins Return the Same Answer Set?
    48. Answer – Do these Joins Return the Same Answer Set?
    49. How would you Join these two tables?
    50. How would you Join these two tables? You Can’t Yet!
    51. An Associative Table is a Bridge that Joins Two Tables
    52. Quiz – Can you Write the 3-Table Join?
    53. Answer to Quiz – Can you Write the 3-Table Join?
    54. Quiz – Can you Write the 3-Table Join to ANSI Syntax?
    55. Answer – Can you Write the 3-Table Join to ANSI Syntax?
    56. Quiz – Can you Place the ON Clauses at the End?
    57. Answer – Can you Place the ON Clauses at the End?
    58. The 5-Table Join – Logical Insurance Model
    59. Quiz - Write a Five Table Join Using ANSI Syntax
    60. Answer - Write a Five Table Join Using ANSI Syntax
    61. Quiz - Write a Five Table Join Using ANSI Syntax
    62. Answer - Write a Five Table Join Using ANSI Syntax
    63. Quiz - Write a Five Table Join Using Non-ANSI Syntax
    64. Answer - Write a Five Table Join Using Non-ANSI Syntax
    65. Quiz –Re-Write this putting the ON clauses at the END
    66. Answer –Re-Write this putting the ON clauses at the END
    67. The Nexus Query Chameleon Writes the SQL for Users
    68. An IN List is much like a Subquery
    69. An IN List Never has Duplicates – Just like a Subquery
    70. An IN List Ignores Duplicates
    71. The Subquery
    72. How a Basic Subquery Works
    73. The Final Answer Set from the Subquery
    74. Quiz- Answer the Difficult Question
    75. Answer to Quiz- Answer the Difficult Question
    76. Should you use a Subquery or a Join?
    77. Quiz- Write the Subquery
    78. Answer to Quiz- Write the Subquery
    79. Quiz- Write the More Difficult Subquery
    80. Answer to Quiz- Write the More Difficult Subquery
    81. Quiz- Write the Subquery with an Aggregate
    82. Answer to Quiz- Write the Subquery with an Aggregate
    83. Quiz – Write the Triple Subquery
    84. Answer to Quiz – Write the Triple Subquery
    85. CHARACTER_LENGTH AND OCTET_LENGTH
    86. The TRIM Command trims both Leading and Trailing Spaces
    87. Trim and Trailing is Case Sensitive
    88. Trim and Trailing works if Case right
    89. The SUBSTRING Command
    90. How SUBSTRING Works with NO ENDING POSITION
    91. Using SUBSTRING to move Backwards
    92. How SUBSTRING Works with a Starting Position of -1
    93. How SUBSTRING Works with an Ending Position of 0
    94. An Example using SUBSTRING, TRIM and CHAR Together
    95. SUBSTRING and SUBSTR are equal, but use different syntax
    96. The POSITION Command finds a Letters Position
    97. Concatenation
    98. The Basics of CAST (Convert and Store)
    99. Some Great CAST (Convert and Store) Examples
    100. Some Great CAST (Convert and Store) Examples
    101. Combining Searched Case and Valued Case
    102. A Trick for getting a Horizontal Case
    103. Nested Case
    104. Put a CASE in the ORDER BY