You are previewing Teradata Aster Data.
O'Reilly logo
Teradata Aster Data

Book Description

This book will explain Aster Data so that users are instant experts. It will explain how things are set up, how things work, and all of the advanced analytics that Aster can provide. Like the other Genius Series books, it begins at a very basic level and gradually gets more and more detailed.

Table of Contents

  1. Cover
  2. The Tera-Tom Genius Series
  3. Tera-Tom- Author of over 50 Books
  4. The Best Query Tool Works on all Systems
  5. Trademarks and Copyrights
  6. About Tom Coffing
  7. About John Nolan
  8. Contents
  9. Chapter 1 – The Aster Data Architecture
    1. What is Parallel Processing?
    2. Aster Data is a Parallel Processing System
    3. Each vworker holds a Portion of Every Table
    4. The Rows of a Table are Spread Across All vworkers
    5. The Aster Data Architecture
    6. The Queen Node
    7. The Worker Node
    8. The Loader Node
    9. The Backup Node
    10. The Aster Architecture Interconnect
    11. Backup and Loader Nodes Do Not use the Interconnect
    12. The Aster Architecture has Spare Nodes
    13. The Aster Architecture Allows Flexibility based on Need
    14. Aster Data Provides Four Fundamental Hardware Strengths
    15. Replication Failover
    16. Data is Compressed on Data Transfers
    17. Aster Utilizes Dual Optimizers
    18. Aster Allows a Hybrid of SQL and MapReduce
    19. MapReduce History
    20. What is MapReduce?
    21. What is SQL-MR?
    22. Sessionize – An Example of SQL-MR
    23. Support for Mixed Workload Management and Prioritization
  10. Chapter 2 – Fact and Dimension Tables
    1. Aster Tables are defined as Fact or Dimension when Created
    2. Fact Table
    3. A More Detailed Look at the Fact Table Distribution
    4. Dimension Table are Replicated
    5. A Dimension Table is often Replicated across vworkers
    6. Aster Data has Fact and Dimension Tables
    7. Aster Tables are defined as Fact or Dimension when Created
    8. Fact and Dimension Tables can be Hashed by the same Key
    9. Distribution Key Rules
    10. Aster Data Uses a Hash Formula
    11. The Hash Map Determines which vworker will own the Row
    12. The Hash Formula, Hash Map and vworker
    13. Placing rows on the vworker
    14. Placing rows on the vworker Continued
    15. A Review of the Hashing Process
    16. Like Data Hashes to the Same vworker
    17. Distribution Key Data Types
    18. Run ANALYZE to COLLECT STATISTICS on a Table
    19. Some Examples of ANALYZE
    20. What Columns to Analyze
  11. Chapter 3 – How Aster Processes Data
    1. When a Table is Created, a Table Header is Created
    2. Every vworker has the Exact Same Tables
    3. All Aster Tables are spread across All vworkers
    4. The Table Header and the Data Rows are Stored Separately
    5. A vworker Stores the Rows of a Table inside a Data Block
    6. To Read Rows, a vworker Moves the Data Block into Memory
    7. A Full Table Scan Means All vworkers must Read All Rows
    8. The “Achilles Heel”, or Slowest Process, is Block Transfer
    9. Each Table has a Distribution Key
    10. A Query Using the Distribution Key uses a Single vworker
    11. As Rows are Added, a Data Block will Eventually Split
    12. A Full Table Scan Means All vworkers Read All Blocks
    13. Distribution Key Query uses One vworker
    14. Each vworker Can Have Many Blocks for a Single Table
    15. A Full Table Scan Means All vworkers Read All Blocks
    16. Quiz – How Many Blocks Move into vworker Memory?
    17. Answer – How Many Blocks Move into vworker Memory?
    18. Quiz – How Many Blocks Move Using the Distribution Key?
    19. Answer-How Many Blocks Move Using the Distribution Key?
  12. Chapter 4 - Four Options for Aster Data Table Design
    1. There are Four Options to Aster Table Design
    2. Straight up Distribute by Hash
    3. Straight up Distribute by Hash - Problems
    4. Straight up Distribute by Replication
    5. Partition the Table with Logical Partitioning
    6. This Partitioned Table Sorts Rows by Month of Order_Date
    7. An All vworkers Retrieve By Way of a Single Partition
    8. You can Partition a Table by Range or by List
    9. A Partitioned By List Example with Three Tactical Queries
    10. Aster Data Multi-Level Partitioning
    11. Aster Allows for Multi-Level Partitioning
    12. SQL Commands for Logical Partitioning as One Table
    13. What Partitions are on my Table?
    14. What does a Columnar Table look like?
    15. A Comparison of Data for Normal Vs. Columnar
    16. A Columnar Table is best for Queries with Few Columns
    17. Quiz – How Many Blocks Move to vworker Memory?
    18. Answer – How Many Containers Move to vworker Memory?
    19. When to use a Columnar Table
  13. Chapter 5 - How Joins Work Inside the Aster Engine
    1. Aster Join Quiz
    2. Aster Join Quiz Answer
    3. The Joining of Two Tables
    4. Aster Moves Joining Rows to the Same vworker
    5. Because of the Join Rule – Dimension Table are Replicated
    6. The Two Different Philosophies for Table Join Design
    7. What Could You Do If Two Tables Joined 1000 Times a Day?
    8. Fact and Dimension Tables can be Hashed by the same Key
    9. Joining Two Tables with the same PK/FK Distribution Key
    10. A Join With Co-Location
    11. A Performance Tuning Technique for Large Joins
    12. The Joining of Two Tables with an Additional WHERE Clause
    13. Aster Performs Joins Using Three Different Methods
    14. The Hash Join
    15. The Merge Join
    16. Nested Loop Joins
  14. Chapter 6 - Temporary and Analytic Tables
    1. Aster has Three Types of Data
    2. Create a Permanent Table Using Create Table AS (CTAS)
    3. Create a Logically Partitioned Table and Populate It
    4. Create a Temporary Table with using Create Table AS (CTAS)
    5. A Temporary Table in Action
    6. A Temporary Table That Uses an Insert/Select
    7. Create an Analytic Table Using an Insert/Select
    8. Create an Analytic Table Using CREATE TABLE AS (CTAS)
    9. Operations that Invalidate an Analytic Table
    10. If an Analytic Table is Invalid
    11. Tera-Tom History
  15. Chapter 7 – Aster Modeling Rules
    1. Modeling Rules for Aster Data
    2. Three Principles that Govern the Modeling Rules
    3. Modeling Rule 1 – Dimensionalize your Model
    4. A Dimensional Model is called a "Star Schema"
    5. To Read a Data Block, a vworker Moves the Block to Memory
    6. A Dimensional Model Moves Less Mass into Memory
    7. Which Move From Disk to Memory Would You Choose?
    8. Vworkers transfer their Fact Table into Memory in Parallel
    9. Modeling Rule 2 – Use Columnar
    10. Which Move From Disk to Memory Would You Choose?
    11. Let's Discuss Modeling and Joins at the Simplest Level
    12. Let's Discuss Modeling and Joins at the Simplest Level
    13. Let's Discuss Joins at the Simplest Level
    14. Modeling Rule 3 – Distribute your Tables Based on Joins
    15. The Two Different Philosophies for Table Join Design
    16. Facts are Hashed and most often the Dimension is Replicated
    17. Fact and Dimension Tables can be Hashed by the same Key
    18. Joining Two Tables with the same PK/FK Primary Index
    19. A Join With No Redistribution or Duplication
    20. Aster Hates Joining Tables with a Different Distribution Key
    21. Aster Hates to Redistribute by Hash to Join Tables
    22. Modeling Rule 4 – Replicate Dimension Tables
    23. Modeling Rule 5 – Partition Your Tables
    24. Modeling Rule 6 – Make Fact Tables Skinny
    25. Modeling Rule 6 – Make Fact Tables Skinny Example
    26. Modeling Rule 7 – Index Your Tables
    27. The B-Tree Index
    28. Which Columns Might You Create an Index?
    29. Answer - Which Columns Might You Create an Index?
    30. Modeling Rule 8 – Denormalize based on Your Environment
    31. Modeling Rule 8 – Denormalize based on Your Environment
  16. Chapter 8 – Tera-Tom's Top Tips
    1. Tera-Tom's Top Tips
    2. Tera-Tom's Top Tips # 2
    3. Tera-Tom's Top Tips #3
    4. Tera-Tom's Top Tips # 3 Rewritten
    5. Tera-Tom's Top Tips #4
    6. When the GROUP BY Column is NOT the Distribution Key
    7. Example of GROUP BY Column is NOT the Distribution Key
    8. Tera-Tom's Top Tips #5
    9. Tera-Tom's Top Tips #6 – Use EXPLAIN
    10. Query Plan and Estimates
    11. Explain Plan Showing a Hash Join
    12. Explain Plan Showing a Merge Join
    13. Explain Plan Showing a Nested Loop Join
  17. Chapter 9 - Indexes
    1. There are Only Three Types of Scans
    2. Guidelines for Indexes
    3. An Index Syntax Example
    4. The B-Tree Index
    5. Which Columns Might You Create an Index?
    6. Answer - Which Columns Might You Create an Index?
    7. A Visual of an Index (Conceptually)
    8. A Query Using an Index Uses All vworkers
    9. Multicolumn indexes
    10. A NUSI BITMAP Theory
    11. A NUSI Bitmap in Action
    12. Indexes on Expressions
    13. Indexes on Extracts of Dates
    14. GiST Indexes
    15. Five Operational Tips for Efficient Indexing
    16. REINDEX
    17. createCompressedIndexOnCompressedTableByDefault Flag
  18. Chapter 10 – Aster Windows Functions
    1. Cumulative Sum
    2. Cumulative Sum - Major and Minor Sort Key(s)
    3. The ANSI CSUM – Getting a Sequential Number
    4. The ANSI OLAP – Reset with a PARTITION BY Statement
    5. PARTITION BY only Resets a Single OLAP not ALL of them
    6. ANSI Moving Sum is Current Row and Preceding n Rows
    7. How ANSI Moving SUM Handles the Sort
    8. Quiz – How is that Total Calculated?
    9. Answer to Quiz – How is that Total Calculated?
    10. Moving SUM every 3-rows vs. a Continuous Sum
    11. Moving Average
    12. Quiz – How is that Total Calculated?
    13. Answer to Quiz – How is that Total Calculated?
    14. Quiz – How is that 4th Row Calculated?
    15. Answer to Quiz – How is that 4th Row Calculated?
    16. Partition By Resets an ANSI OLAP
    17. Moving Average Using BETWEEN
    18. Moving Difference using ANSI Syntax
    19. Moving Difference using ANSI Syntax with Partition By
    20. RANK Defaults to Ascending Order
    21. Getting RANK to Sort in DESC Order
    22. You can use Window Functions in Expressions
    23. RANK() OVER and PARTITION BY
    24. DENSE_RANK() OVER
    25. PERCENT_RANK() OVER
    26. PERCENT_RANK() OVER with 14 rows in Calculation
    27. PERCENT_RANK() OVER with 21 rows in Calculation
    28. RANK With ORDER BY SUM()
    29. COUNT OVER for a Sequential Number
    30. Quiz – What caused the COUNT OVER to Reset?
    31. Answer to Quiz – What caused the COUNT OVER to Reset?
    32. The MAX OVER Command
    33. MAX OVER with PARTITION BY Reset
    34. The MIN OVER Command
    35. Quiz – Fill in the Blank
    36. Answer to Quiz – Fill in the Blank
    37. The Row_Number Command
    38. Quiz – How did the Row_Number Reset?
    39. Quiz – How did the Row_Number Reset?
    40. NTILE
    41. NTILE Using a Value of 10
    42. NTILE With a Partition
    43. CUME_DIST
    44. CUME_DIST With a Partition
    45. LEAD
    46. LEAD With Partitioning
    47. LAG
    48. LAG with Partitioning
    49. FIRST_VALUE
    50. FIRST_VALUE After Sorting by the Highest Value
    51. FIRST_VALUE with Partitioning
    52. LAST_VALUE
    53. NTH_VALUE
    54. NTH_VALUE With Partition
    55. SUM(SUM(n))
  19. Chapter 11 – SQL-MapReduce
    1. MapReduce History
    2. What is MapReduce?
    3. What is SQL-MapReduce?
    4. SQL-MapReduce Input
    5. SQL-MapReduce Output
    6. Subtle SQL-MapReduce Processing
    7. Aster Data Provides an Analytic Foundation
    8. Path Analysis
    9. Text Analysis
    10. Statistical Analysis
    11. Segmentation (Data Mining)
    12. Graph Analysis
    13. Transformation of Data
    14. Sessionize
    15. Tokenize
    16. SQL-MapReduce Function . . . nPath
    17. nPath SELECT Clause
    18. nPath ON Clause
    19. nPath PARTITION BY Expression
    20. nPath DIMENSION Expression
    21. nPath ORDER BY Expression
    22. nPath MODE Clause has Overlapping or NonOverlapping
    23. nPath PATTERN Clause
    24. Pattern Operators
    25. Pattern Operators Order of Precedence
    26. Matching Patterns Which Repeat
    27. nPath SYMBOLS Clause
    28. nPath RESULTS Clause
    29. Adding an Aggregate to nPath Results
    30. Adding an Aggregate to nPath Results (Continued)
    31. SQL-MapReduce Examples - Use Regular SQL
    32. SQL-MapReduce Examples - Create Objects
    33. SQL-MapReduce Examples - Subquery
    34. SQL-MapReduce Examples - Query as Input
    35. SQL-MapReduce Examples - Nesting Functions
    36. SQL-MapReduce Examples - Functions in Derived Tables
    37. SQL-MapReduce Examples - SMAVG
    38. SQL-MapReduce Examples - Pack Function
    39. SQL-MapReduce Examples - Pack Function (Continued)
    40. SQL-MapReduce Examples - Pivot Columns