The Art of SQL

Book Description

For all the buzz about trendy IT techniques, data processing is still at the core of our systems, especially now that enterprises all over the world are confronted with exploding volumes of data. Database performance has become a major headache, and most IT departments believe that developers should provide simple SQL code to solve immediate problems and let DBAs tune any "bad SQL" later.

In The Art of SQL, author and SQL expert Stephane Faroult argues that this "safe approach" only leads to disaster. His insightful book, named after The Art of War by Sun Tzu, contends that writing quick, inefficient code is sweeping the dirt under the rug. SQL code may run for 5 to 10 years, surviving several major releases of the database management system and several generations of hardware. The code must be fast and sound from the start, and that requires a firm understanding of SQL and relational theory.

The Art of SQL offers best practices that teach experienced SQL users to focus on strategy rather than specifics. Faroult's approach takes a page from Sun Tzu's classic treatise by viewing database design as a military campaign. You need knowledge, skills, and talent. Talent can't be taught, but every strategist from Sun Tzu to modern-day generals believed that it can be nurtured through the experience of others. They passed on their experience acquired in the field through basic principles that served as guiding stars amid the sound and fury of battle. This is what Faroult does with SQL.

Like a successful battle plan, good architectural choices are based on contingencies. What if the volume of this or that table increases unexpectedly? What if, following a merger, the number of users doubles? What if you want to keep several years of data online? Faroult's way of looking at SQL performance may be unconventional and unique, but he's deadly serious about writing good SQL and using SQL well. The Art of SQL is not a cookbook, listing problems and giving recipes. The aim is to get you, and your manager, to raise good questions.

Table of Contents

  1. The Art of SQL
    1. SPECIAL OFFER: Upgrade this ebook with O’Reilly
    2. Preface
      1. Why Another SQL Book?
      2. Audience
      3. Assumptions This Book Makes
      4. Contents of This Book
      5. Conventions Used in This Book
      6. Using Code Examples
      7. Comments and Questions
      8. Safari® Enabled
      9. Acknowledgments
    3. 1. Laying Plans
      1. 1.1. The Relational View of Data
      2. 1.2. The Importance of Being Normal
        1. 1.2.1. Step 1: Ensure Atomicity
        2. 1.2.2. Step 2: Check Dependence on the Whole Key
        3. 1.2.3. Step 3: Check Attribute Independence
      3. 1.3. To Be or Not to Be, or to Be Null
      4. 1.4. Qualifying Boolean Columns
      5. 1.5. Understanding Subtypes
      6. 1.6. Stating the Obvious
      7. 1.7. The Dangers of Excess Flexibility
      8. 1.8. The Difficulties of Historical Data
      9. 1.9. Design and Performance
      10. 1.10. Processing Flow
      11. 1.11. Centralizing Your Data
      12. 1.12. System Complexity
      13. 1.13. The Completed Plans
    4. 2. Waging War
      1. 2.1. Query Identification
      2. 2.2. Stable Database Connections
      3. 2.3. Strategy Before Tactics
      4. 2.4. Problem Definition Before Solution
      5. 2.5. Stable Database Schema
      6. 2.6. Operations Against Actual Data
      7. 2.7. Set Processing in SQL
      8. 2.8. Action-Packed SQL Statements
      9. 2.9. Profitable Database Accesses
      10. 2.10. Closeness to the DBMS Kernel
      11. 2.11. Doing Only What Is Required
      12. 2.12. SQL Statements Mirror Business Logic
      13. 2.13. Program Logic into Queries
      14. 2.14. Multiple Updates at Once
      15. 2.15. Careful Use of User-Written Functions
      16. 2.16. Succinct SQL
      17. 2.17. Offensive Coding with SQL
      18. 2.18. Discerning Use of Exceptions
    5. 3. Tactical Dispositions
      1. 3.1. The Identification of "Entry Points"
      2. 3.2. Indexes and Content Lists
      3. 3.3. Making Indexes Work
      4. 3.4. Indexes with Functions and Conversions
      5. 3.5. Indexes and Foreign Keys
      6. 3.6. Multiple Indexing of the Same Columns
      7. 3.7. System-Generated Keys
      8. 3.8. Variability of Index Accesses
    6. 4. Maneuvering
      1. 4.1. The Nature of SQL
        1. 4.1.1. SQL and Databases
        2. 4.1.2. SQL and the Optimizer
        3. 4.1.3. Limits of the Optimizer
      2. 4.2. Five Factors Governing the Art of SQL
        1. 4.2.1. Total Quantity of Data
        2. 4.2.2. Criteria Defining the Result Set
        3. 4.2.3. Size of the Result Set
        4. 4.2.4. Number of Tables
          1. 4.2.4.1. Joins
          2. 4.2.4.2. Complex queries and complex views
        5. 4.2.5. Number of Other Users
      3. 4.3. Filtering
        1. 4.3.1. Meaning of Filtering Conditions
        2. 4.3.2. Evaluation of Filtering Conditions
          1. 4.3.2.1. Buyers of Batmobiles
          2. 4.3.2.2. More Batmobile purchases
          3. 4.3.2.3. Lessons to be learned from the Batmobile trade
        3. 4.3.3. Querying Large Quantities of Data
        4. 4.3.4. The Proportions of Retrieved Data
    7. 5. Terrain
      1. 5.1. Structural Types
      2. 5.2. The Conflicting Goals
      3. 5.3. Considering Indexes as Data Repositories
      4. 5.4. Forcing Row Ordering
      5. 5.5. Automatically Grouping Data
        1. 5.5.1. Round-Robin Partitioning
        2. 5.5.2. Data-Driven Partitioning
      6. 5.6. The Double-Edged Sword of Partitioning
      7. 5.7. Partitioning and Data Distribution
      8. 5.8. The Best Way to Partition Data
      9. 5.9. Pre-Joining Tables
      10. 5.10. Holy Simplicity
    8. 6. The Nine Situations
      1. 6.1. Small Result Set, Direct Specific Criteria
        1. 6.1.1. Index Usability
        2. 6.1.2. Query Efficiency and Index Usage
        3. 6.1.3. Data Dispersion
        4. 6.1.4. Criterion Indexability
      2. 6.2. Small Result Set, Indirect Criteria
      3. 6.3. Small Intersection of Broad Criteria
      4. 6.4. Small Intersection, Indirect Broad Criteria
      5. 6.5. Large Result Set
      6. 6.6. Self-Joins on One Table
      7. 6.7. Result Set Obtained by Aggregation
      8. 6.8. Simple or Range Searching on Dates
        1. 6.8.1. Many Items, Few Historical Values
          1. 6.8.1.1. Using subqueries
          2. 6.8.1.2. Using OLAP functions
        2. 6.8.2. Many Historical Values Per Item
        3. 6.8.3. Current Values
      9. 6.9. Result Set Predicated on Absence of Data
    9. 7. Variations in Tactics
      1. 7.1. Tree Structures
        1. 7.1.1. Tree Structures Versus Master/Detail Relationships
        2. 7.1.2. Practical Examples of Hierarchies
      2. 7.2. Representing Trees in an SQL Database
      3. 7.3. Practical Implementation of Trees
        1. 7.3.1. Adjacency Model
        2. 7.3.2. Materialized Path Model
        3. 7.3.3. Nested Sets Model (After Celko)
      4. 7.4. Walking a Tree with SQL
        1. 7.4.1. Top-Down Walk: The Vandamme Query
          1. 7.4.1.1. Adjacency model
          2. 7.4.1.2. Materialized path model
          3. 7.4.1.3. Nested sets model
          4. 7.4.1.4. Comparing the Vandamme query under the various models
        2. 7.4.2. Bottom-Up Walk: The Highlanders Query
          1. 7.4.2.1. Adjacency model
          2. 7.4.2.2. Materialized path model
          3. 7.4.2.3. Nested sets model
          4. 7.4.2.4. Comparing the various models for the Highlanders query
      5. 7.5. Aggregating Values from Trees
        1. 7.5.1. Aggregation of Values Stored in Leaf Nodes
          1. 7.5.1.1. Modeling head counts
          2. 7.5.1.2. Computing head counts at every level
        2. 7.5.2. Propagation of Percentages Across Different Levels
    10. 8. Weaknesses and Strengths
      1. 8.1. Deceiving Criteria
      2. 8.2. Abstract Layers
      3. 8.3. Distributed Systems
      4. 8.4. Dynamically Defined Search Criteria
        1. 8.4.1. Designing a Simple Movie Database and the Main Query
        2. 8.4.2. Right-Sizing Queries
        3. 8.4.3. Wrapping SQL in PHP
    11. 9. Multiple Fronts
      1. 9.1. The Database Engine as a Service Provider
        1. 9.1.1. The Virtues of Indexes
        2. 9.1.2. A Just-So Story
        3. 9.1.3. Get in Line
      2. 9.2. Concurrent Data Changes
        1. 9.2.1. Locking
          1. 9.2.1.1. Locking granularity
          2. 9.2.1.2. Lock handling
          3. 9.2.1.3. Locking and committing
          4. 9.2.1.4. Locking and scalability
        2. 9.2.2. Contention
          1. 9.2.2.1. Insertion and contention
          2. 9.2.2.2. DBA solutions
          3. 9.2.2.3. Architectural solutions
          4. 9.2.2.4. Development solutions
          5. 9.2.2.5. Results
    12. 10. Assembly of Forces
      1. 10.1. Increasing Volumes
        1. 10.1.1. Sensitivity of Operations to Volume Increases
          1. 10.1.1.1. Insensitivity to volume increase
          2. 10.1.1.2. Linear sensitivity to volume increases
          3. 10.1.1.3. Non-linear sensitivity to volume increases
          4. 10.1.1.4. Putting it all together
          5. 10.1.1.5. Disentangling subqueries
        2. 10.1.2. Partitioning to the Rescue
        3. 10.1.3. Data Purges
      2. 10.2. Data Warehousing
        1. 10.2.1. Facts and Dimensions: the Star Schema
        2. 10.2.2. Query Tools
        3. 10.2.3. Extraction, Transformation, and Loading
          1. 10.2.3.1. Data extraction
          2. 10.2.3.2. Transformation
          3. 10.2.3.3. Loading
          4. 10.2.3.4. Integrity constraints and indexes
        4. 10.2.4. Querying Dimensions and Facts: Ad Hoc Reports
          1. 10.2.4.1. The star transformation
          2. 10.2.4.2. Emulating the star transformation
          3. 10.2.4.3. Querying a star schema the way it is not intended to be queried
        5. 10.2.5. A (Strong) Word of Caution
    13. 11. Stratagems
      1. 11.1. Turning Data Around
        1. 11.1.1. Rows That Should Have Been Columns
        2. 11.1.2. Columns That Should Have Been Rows
          1. 11.1.2.1. Creating a pivot table
          2. 11.1.2.2. Multiplying rows with a pivot table
          3. 11.1.2.3. Using pivot table values
          4. 11.1.2.4. The pivot and unpivot operators
        3. 11.1.3. Single Columns That Should Have Been Something Else
          1. 11.1.3.1. First normal form on the fly
          2. 11.1.3.2. Lifting the veil on the Chapter 7 mystery path explosion
      2. 11.2. Querying with a Variable in List
      3. 11.3. Aggregating by Range (Bands)
      4. 11.4. Superseding a General Case
      5. 11.5. Selecting Rows That Match Several Items in a List
      6. 11.6. Finding the Best Match
      7. 11.7. Optimizer Directives
    14. 12. Employment of Spies
      1. 12.1. The Database Is Slow
      2. 12.2. The Components of Server Load
      3. 12.3. Defining Good Performance
        1. 12.3.1. Knowing What You Spend
        2. 12.3.2. Knowing What You Get
        3. 12.3.3. Checking Against Acknowledged Standards
        4. 12.3.4. Defining Performance Goals
      4. 12.4. Thinking in Business Tasks
      5. 12.5. Execution Plans
        1. 12.5.1. Identifying the Fastest Execution Plan
          1. 12.5.1.1. Our contestants
          2. 12.5.1.2. Our battle field
          3. 12.5.1.3. And the winner is...
        2. 12.5.2. Forcing the Right Execution Plan
          1. 12.5.2.1. A stubborn query
          2. 12.5.2.2. Study of search criteria
          3. 12.5.2.3. A moral to the story
      6. 12.6. Using Execution Plans Properly
        1. 12.6.1. How Not to Execute a Query
        2. 12.6.2. Hidden Complexity
      7. 12.7. What Really Matters?
    15. PHOTO CREDITS
    16. About the Authors
    17. About the Author