You are previewing Teradata Database Administration – Teradata Internals.
O'Reilly logo
Teradata Database Administration – Teradata Internals

Book Description

This DBA book takes the reader deep inside the guts of Teradata, down to the row level, block level, and cylinder level. Readers have an opportunity to understand Teradata at a level that 1% of the world understands Teradata. This book will bring you into the top 1% of knowledge of Teradata.

Table of Contents

  1. Cover Page
  2. Title Page
  3. Copyright Page
  4. About Tom Coffing
  5. About Todd Wilson
  6. Contents
  7. Chapter 1 - The Cold, Hard Teradata Facts
    1. What is Parallel Processing?
    2. The Basics of a Single Computer
    3. Teradata Parallel Processes Data
    4. Parallel Architecture
    5. The Teradata Architecture
    6. All Teradata Tables are spread across All AMPS
    7. Teradata Systems can Add AMPs for Linear Scalability
    8. Understand that Teradata can Scale to Incredible Size
    9. AMPs and Parsing Engines (PEs) Live inside SMP Nodes
    10. Each Node is attached via a Network to a Disk Farm
    11. Two SMP Nodes Connected become One MPP System
    12. There are Many Nodes in a Teradata Cabinet
    13. Inside a Teradata Node
    14. The Boardless BYNET and the Physical BYNET
    15. The Parsing Engine
    16. The AMPs Responsibilities
    17. Teradata Parallel Processing
    18. Each Table has a Primary Index that is Unique or Non-Unique
    19. The Hash Map Determines which AMP will Own the Row
    20. A Unique Primary Index Spreads the Data Evenly
    21. The AMP Adds a Uniqueness Value to Create the Row-ID
    22. Each AMP Sorts their Tables by the Row-ID
    23. A Non-Unique Primary Index Skews the Data
    24. Comparing the Same Table with Different Primary Indexes
    25. Unique Primary Index Queries are a Single AMP Retrieve
    26. A Non-Unique Primary Index is also a Single AMP Retrieve
    27. Teradata has a No Primary Index Table Called a NoPI Table
    28. There are Normal Tables and then there are Partitioned Tables
    29. A Visual of One Year of Data with Range_N Per Month
    30. Partitioning is Designed to Eliminate the Full Table Scan
    31. A Partition # and Row-ID = Row Key
    32. An AMP Stores its Rows Sorted in only Two Different Ways
    33. AMPs Moves their Data Blocks into Memory to Read/Write
    34. The Most Taxing thing for an AMP is Moving Blocks into Memory
    35. Rows are Stored in Data Blocks which are stored in Cylinders
    36. Rows for an AMP Stored inside a Data Block in a Cylinder
    37. An AMP’s Master Index is Used to Find the Right Cylinder
    38. The Row Reference Array (RRA) Does the Binary Search
    39. A Block Splits into Two Blocks at Maximum Block Size
    40. Data Blocks Maximum Block Size has Changed (V14.10)
    41. The New Block Split with Teradata V14.10
    42. The Block Split with Even More Detail in Teradata V14.10
    43. There is One Master Index and Thousands of Cylinder Indexes
    44. Each Table has a 48-bit TableID
  8. Chapter 2 - How Teradata Tracks Objects
    1. Teradata Assigns Each Object a Unique Numeric ID
    2. The Table ID
    3. The Table ID in Greater Detail
    4. Looking at the TableID inside the Actual Cylinders
    5. A More Detailed View of TableID inside the Actual Cylinders
    6. The Blocks Below are All Associated with the Same Table
    7. Bits, Bytes, and More
    8. Cylinder Sizes
  9. Chapter 3 – AMP Worker Tasks
    1. Teradata is a Message Passing System
    2. The Parsing Engine Parses the SQL and Comes up with a Plan
    3. What is an AMP Worker Task (AWT)?
    4. Each AMP has 80 AMP Worker Tasks (AWTs)
    5. Each Query Takes Up One or More AMP Worker Tasks
    6. An All-AMP Query Usually Won’t Use More Than 4 AWTs
    7. A Live Example of AWTs in Action
    8. There are 24 AWTs Reserved for Internal Work
    9. How Utilities Use AWTs
    10. Monitoring AMP Worker Tasks with ResAMPCpuByGroup
  10. Chapter 4 - Deep Dive Overhead for Each Row
    1. Why Go Deep inside the Overhead of a Row?
    2. A Row Layout in Teradata
    3. Row Length
    4. Row ID
    5. How the Row Hash is Created for Each Row
    6. Unique Primary Indexes have Even Distribution
    7. The AMP Adds a Uniqueness Value to its Rows
    8. The Row-Hash is 32-bits and so is the Uniqueness Value
    9. Non-Unique Primary Indexes have Skewed Data
    10. Flag Byte
    11. Presence Byte
    12. Presence Byte is used to show Null Values in Each Row
    13. A Close-up Look at the Presence Byte for Nulls
    14. An Extreme Example to Look at the Presence Byte for Nulls
    15. Quiz – How Many Presence Bits used for these Columns?
    16. Answer – How Many Presence Bits used for these Columns?
    17. Quiz – How Many Presence Bits used with NOT NULL?
    18. Answer – How Many Presence Bits used with NOT NULL?
    19. Quiz – Which bit will be Set to a One?
    20. Answer – Which bit will be Set to a One?
    21. Quiz – How Many Presence Bits Needed Now?
    22. Answer – How Many Presence Bits Needed Now?
    23. What Happens when we need More than One Presence Byte?
    24. An Example that Must use a 2nd Presence Byte
    25. An Example that Must use a 2nd Presence Byte
    26. Quiz – Answer the Presence Bit/Byte Questions?
    27. Answers to the – Answer the Presence Bit/Byte Questions?
    28. Quiz - How Many Nullable Columns are Possible?
    29. Answer- How Many Nullable Columns are Possible?
  11. Chapter 5 - Compression
    1. Important Information about Compression
    2. Presence Bytes are also used for Compression
    3. Why One Byte (8 bits) can Represent up to 255 Values
    4. Answers to One Byte (8 bits) can Represent up to 255 Values
    5. Now that you Understand that 8 Bits can Represent 0 – 255
    6. A Compression Example that Compresses Two Values
    7. A Compression Example that Compresses Three Values
    8. Quiz – Name that Compression Value
    9. The Next Important Concept in Compression
    10. Quiz – Can you Fill in the Compression Values?
    11. Answer – Can you Fill in the Compression Values?
    12. The Last Major Concept in Compression
    13. Quiz – Using One Presence Byte for Multiple Columns
    14. Answer – Using One Presence Byte for Multiple Columns
    15. Quiz – How Many Presence Bytes are Needed?
    16. Answer – How Many Presence Bytes are Needed?
    17. Advanced Quiz – Fill in the Presence Bits?
    18. Answer to Advanced Quiz – Fill in the Presence Bits?
    19. The Cost Vs. the Savings
    20. The Cost List of Compression
    21. A Deeper Dive into NULL Values
    22. Quiz - How Much Space did we Just Save?
    23. Answer - How Much Space did we Just Save?
    24. Advanced Quiz - How Much Space did we Just Save?
    25. Advanced Quiz - How Much Space did we Just Save?
    26. Using the DBC Tables in a Compression Experiment
    27. A Compression Test
    28. A Compression Test
    29. We then Moved all Eight Tables to another Database
    30. Compression Reports with Nexus and SmartCompress
    31. We then Created Two Global Temporary Tables
    32. We then Created and Executed Our Macro
    33. Report Comparing Compressed and NonCompressed Tables
  12. Chapter 6 - Data Stored in the Row
    1. The Varchar Offset
    2. An Example of a Varchar Offset
    3. An Example of Two Varchar Columns
    4. The Fixed Columns
    5. An Example with Multiple Fixed Columns
    6. Compressible Columns
    7. An Example with Fixed Columns and a Compression Column
    8. An Example with a Fixed Column and a Compressed Varchar
    9. Varchar Columns
    10. An Example of a Fixed Column and a Varchar
    11. Teradata’s Maximum Row Size
  13. Chapter 7 - How Data Rows are Stored in Blocks
    1. Why Go Deep inside Data Blocks?
    2. In the Beginning a Table is created
    3. Every AMP has the Exact Same Tables
    4. Rows are Stored in Blocks
    5. Each Table Header and Data Block have the Same TableID
    6. AMPs Moves their Data Blocks into Memory to Read/Write
    7. AMPs can Read/Write their Rows once they are in FSG Cache
    8. Every Data Block Starts with a Data Block Header
    9. Every Data Block Ends with a Data Block Trailer
    10. Each Block has a Row Reference Array (RRA)
    11. The Row Reference Array (RRA) is in Descending Order
    12. A Binary Search is always done through the RRA
    13. A Binary Search is a Quick Search among Thousands of Rows
    14. The Ref Array Pointer in the Row Layout in Teradata
    15. How Blocks of Data Begin in Teradata
    16. How Blocks of Data Grow in Teradata
    17. Did You Notice the Row Reference Array (RRA)?
    18. A Great Picture of a Single AMP’s Data Block with Details
    19. Data Blocks Grow until they Reach Maximum Block Size
    20. The Block Split
    21. The Block Split with Even More Detail
    22. The Block Split Showing Two Blocks with Greater Detail
    23. Blocks Continue to Split as Tables Grow Larger
    24. Reminder – Data Blocks Maximum Block Size has Changed (V14.10)
    25. Reminder – The New Block Split with Teradata V14.10
    26. Reminder – The Block Split with Even More Detail in Teradata V14.10
    27. Reminder – Teradata V14.10 Block Split Defaults
    28. FYI – Some Advanced Information about Data Block Headers
  14. Chapter 8 - Disk Cylinders and the Master Index
    1. Disks have Cylinders which hold Data Blocks
    2. Rows are Stored in Data Blocks which are Stored in Cylinders
    3. A Real World View of Rows inside a Data Block in a Cylinder
    4. A Top down View of Cylinders
    5. There are Hot, Warm, and Cold Cylinders
    6. Cylinders are used for Perm, Spool, Temp, and Journals
    7. Synchronized Scan (Sync Scan)
    8. EXPLAIN Using a Synchronized Scan
    9. Intelligent Memory (Teradata V14.10)
    10. Teradata V14.10 Intelligent Memory Gives Data a Temperature
    11. Data Deemed VeryHot Stays in Each AMP's Intelligent Memory
    12. Intelligent Memory Stays in Memory
    13. Each AMP has their Own Master Index
    14. Each Cylinder on an AMP has a Cylinder Index
    15. Quiz – What Two Things does and AMP Read?
    16. Answer – What Two Things does and AMP Read?
    17. Quiz – How Many Row Reference Arrays do you See?
    18. Answer – How Many Row Reference Arrays do you See?
    19. Quiz – How Many Row Reference Arrays are there Now?
    20. Answer – How Many Row Reference Arrays do you See?
    21. Quiz – How Many Row Reference Arrays in Total?
    22. Answer – How Many Row Reference Arrays in Total?
    23. Quiz – How Many Cylinder Indexes are Here?
    24. Answer – How Many Cylinder Indexes are Here?
    25. A More Detailed Illustration of the Master Index
    26. A Real-World View of the Master Index
    27. An Even More Realistic View of an AMP’s Master Index
    28. The Cylinder Index
    29. An Even More Realistic View of a Cylinder Index
    30. How a Query using the Primary Index works
    31. How the AMPs do a Full Table Scan
    32. How an AMP Reads Using a Primary Index
    33. Teradata Assigns Each Object a Unique Numeric ID
    34. The Table ID
    35. The Table ID in Greater Detail
    36. Looking at the TableID inside the Actual Cylinders
    37. A More Detailed View of TableID inside the Actual Cylinders
    38. An Even More Realistic View of a Cylinder Index
    39. Bits, Bytes, and More
    40. Cylinder Sizes
    41. How TVS Monitors and Migrates Tables
    42. How TVS Monitors and Migrates Partitioned (PPI) Tables
    43. A Summary of the Master and Cylinder Index
  15. Chapter 9 - Teradata Virtual Storage (TVS)
    1. Solid State Drives (SSD) Vs. Hard Disk Drives (HDD)
    2. Teradata Uses Two Types of Disks
    3. Traditional Teradata Without Teradata Virtual Storage (TVS)
    4. Teradata With TVS in a Conceptual Diagram
    5. What TVS is Responsible for Doing
    6. The Benefits of Teradata Virtual Storage (TVS)
    7. What is a Clique?
    8. If a Node goes down the AMPs Migrate within the Clique?
    9. Review of a Clique
    10. Teradata Virtual Storage (TVS) Manages within a Clique
    11. Before TVS vs. TVS Today with Teradata V13.10
    12. TVS Operates in Two Different Modes
    13. TVS knows the Disks and Which Cylinders are the Fastest
    14. A Concept Called Recency
    15. Data Placement and Migration
    16. Review – Intelligent Memory (Teradata V14.10)
    17. Review – Teradata V14.10 Intelligent Memory Gives Data a Temperature
    18. Review – Data Deemed VeryHot Stays in each AMP's Intelligent Memory
    19. Review – Intelligent Memory Stays in Memory
  16. Chapter 10 - Teradata Writes and Blocks
    1. A Teradata Write
    2. Teradata Insert (Option 1 of 3) has Enough Space for the Insert
    3. Teradata Insert (Option 2 of 3) is a Defragment to Make Space
    4. Teradata Insert (Option 3 of 3) is to Get a Bigger Block
    5. Checksum Determines if a New Block is Needed
    6. A Reminder of How Rows are Sorted with Block Utilities
    7. A Reminder of How Rows are Sorted with SQL Inserts
    8. When a Block Reaches Maximum Size, it Splits into Two
    9. A Block Split always Sorts the Rows Perfectly Once Again
    10. In Teradata V14.10 the Maximum Block Size is 1 Megabyte
    11. Cylinder Sizes
    12. Teradata V14 Large Cylinders
    13. Quiz on Block Split
    14. Answer – Quiz on Block Split
    15. Quiz – How many Items are in the Picture?
    16. Answer to Quiz – How many Items are in the Picture?
    17. Quiz on Teradata V14.10 Block Split
    18. Answer – Quiz on Block Split
    19. Blocking Terms for Teradata V14 and Below
    20. Blocking Terms for Teradata V14.10 and Beyond
    21. Block Sizes and Filling of Cylinders
    22. Space Fragmentation (1 of 2)
    23. Space Fragmentation (2 of 2)
    24. What is a Defrag? (1 of 2)
    25. What is a Defrag? (2 of 2)
    26. What Happens when a Cylinder is Full?
    27. What is a Mini-Cylpack? (1 of 2)
    28. What is a Mini-Cylpack? (2 of 2)
    29. What is a Mini-Cylpack Vs. a Pack Disk?
    30. A Pack Disk Picture
    31. New Teradata 13.10 Auto Cylinder Pack Feature
    32. A Pack Disk Honors the Free Space Percent
    33. Free Space Percent
    34. Three Examples of the Free Space Percent
    35. Simpler Terms for Our Free Space Percent Examples
    36. The Free Space Percent can be Set in Three Ways
    37. Two Table Create Examples of Min and Max Block Size
    38. The Same Table Creates using KILOBYTES (KBYTES)
    39. Why would I want Bigger or Smaller Block Sizes?
    40. How does Teradata Manage Space?
    41. How can Many Big Blocks become Many Small Blocks?
    42. Merge Datablocks (13.10 Feature)
    43. Merge Datablocks Details
    44. Setting Merge Datablocks in DBS Control or at Table Level
    45. How Have Customers Previously Handled Shrinking Blocks?
  17. Chapter 11 – Access Logging
    1. Access Logging
    2. Security for the DBA
    3. The Tables and Views Associated with Access Logging
    4. Begin Logging Options
    5. Begin Logging, View Rules, See Log Data, and End Logging
    6. Begin Logging Examples
    7. The DBC.AccessLogV View
    8. The DBC.AccessLogV View for Today’s Queries
    9. The DBC.AccessLogV View Denials For Today
    10. Controlling the Log Files
  18. Chapter 12 – DBQL Query Logging
    1. DBQL Query Logging
    2. The Tables and Views Associated with DBQL
    3. There are Seven Major Tables to Store DBQL Entries
    4. The Views for the Major DBQL Tables
    5. Begin Query Logging Default Information
    6. Begin Query Logging with Options
    7. Begin Query Logging Examples
    8. Begin Query Logging Limit Options
    9. Begin Query Logging Limit Examples
    10. Summary and Threshold have Additional Options
    11. Begin Query Logging with Additional Options Examples
    12. Begin and End Query Logging Examples
    13. Replace Query Logging Statement
    14. An inside Look at the View DBC.DBQLRulesV
    15. The Columns in the View DBC.DBQLRulesV
    16. Begin Logging, View Rules, See Log Data and End Logging
    17. DBC.DBQLRulesV
    18. DBC.QryLogV
    19. DBC.QryLogSummaryV
  19. Chapter 13 – ResUsage
    1. ResUsage
    2. Major Tables to Store ResUsage Entries
    3. The ResUsage Views
    4. ResUsage Macro Information
    5. ResUsage Macros
    6. Executing the ResUsage Macro DBC.Resnode
    7. DBC.Resnode Major Column Explanation
    8. ResAMPCpuByGroup
    9. ResCpuByAMP
    10. ResCpuByGroup
    11. ResCpuByNode
    12. ResCpuByPE
    13. ResHostByGroup
    14. ResHostByLink
    15. ResHostTotal
    16. ResHostTotalDay
    17. ResHostTotalHour
    18. ResIvprMigrate
    19. ResIvprMigrateHour
    20. ResLdvByGroup
    21. ResLdvByNode
    22. ResMemByGroup
    23. ResMemMgmtByNode
    24. ResNetByGroup
    25. ResNetByNode
    26. ResPeCpuByGroup
    27. ResPeCpuByGroup
    28. ResScpuDayTotal
    29. ResScpuSec
    30. ResSvprDetailReadTotal
    31. ResSvprPreReadBySec
    32. ResSvprQLenAvg
    33. ResSvprQLenAvgByVproc
    34. ResSvprQLenAvgByVproc
    35. ResSvprQLenMaxHour
    36. ResSvprReadByVprocSec
    37. ResSvprReadByVprocSec
    38. ResSvprReadTotal
    39. ResSvprReadTotal
    40. ResSvprWriteTotal
    41. ResSvprWriteTotalHour
    42. ResSyncScan
  20. Chapter 14 – Top SQL Commands Cheat Sheet
    1. Select All Columns from a Table and Sort
    2. Select Specific Columns and Limiting the Rows
    3. Changing your Default Database
    4. Keywords that Describe You
    5. Select TOP Rows in a Rank Order
    6. A Sample Number of Rows
    7. Getting a Sample Percentage of Rows
    8. Find Information about a Database
    9. Find Information about a Table
    10. Using Aggregates
    11. Performing a Join
    12. Performing a Join using ANSI Syntax
    13. Using Date, Time and Timestamp
    14. Using Date Functions
    15. Using the System Calendar
    16. Using the System Calendar in a Query
    17. Formatting Data
    18. Using Rank
    19. Using a Derived Table
    20. Using a Subquery
    21. Correlated Subquery
    22. Using Substring
    23. Basic Case Statement
    24. Advanced Case Statement
    25. Using an Access Lock in your SQL
    26. Collect Statistics
    27. Creating a Volatile Table with a Primary Index
    28. Creating a Volatile Table that is Partitioned (PPI)
    29. Creating a Volatile Table that is Deleted after the Query
    30. Finding the Typical Rows per Value for Specific Column
    31. Finding Out how much Space you have
    32. How much Space you have Per AMP
    33. Finding your Space
    34. Finding Space Skew in Tables in a Database
    35. Finding the Number of rows per AMP for a Column
    36. Finding Account Information
    37. Ordered Analytics
  21. Back Cover