You are previewing Optimizing Oracle Performance.
O'Reilly logo
Optimizing Oracle Performance

Book Description

Oracle system performance inefficiencies often go undetected for months or even years--even under intense scrutiny--because traditional Oracle performance analysis methods and tools are fundamentally flawed. They're unreliable and inefficient. Oracle DBAs and developers are all too familiar with the outlay of time and resources, blown budgets, missed deadlines, and marginally effective performance fiddling that is commonplace with traditional methods of Oracle performance tuning. In this crucial book, Cary Millsap, former VP of Oracle's System Performance Group, clearly and concisely explains how to use Oracle's response time statistics to diagnose and repair performance problems. Cary also shows how "queueing theory" can be applied to response time statistics to predict the impact of upgrades and other system changes. Optimizing Oracle Performance eliminates the time-consuming, trial-and-error guesswork inherent in most conventional approaches to tuning. You can determine exactly where a system's performance problem is, and with equal importance, where it is not, in just a few minutes--even if the problem is several years old. Optimizing Oracle Performance cuts a path through the complexity of current tuning methods, and streamlines an approach that focuses on optimization techniques that any DBA can use quickly and successfully to make noticeable--even dramatic--improvements. For example, the one thing database users care most about is response time. Naturally, DBAs focus much of their time and effort towards improving response time. But it is entirely too easy to spend hundreds of hours to improve important system metrics such as hit ratios, average latencies, and wait times, only to find users are unable to perceive the difference. And an expensive hardware upgrade may not help either. It doesn't have to be that way. Technological advances have added impact, efficiency, measurability, predictive capacity, reliability, speed, and practicality to the science of Oracle performance optimization. Optimizing Oracle Performance shows you how to slash the frustration and expense associated with unraveling the true root cause of any type of performance problem, and reliably predict future performance. The price of this essential book will be paid back in hours saved the first time its methods are used.

Table of Contents

  1. Optimizing Oracle Performance
  2. Dedication
  3. A Note Regarding Supplemental Files
  4. Foreword
  5. Preface
    1. Why I Wrote This Book
    2. Audience for This Book
    3. Structure of This Book
    4. Which Platform and Version?
    5. What This Book Is and Is Not
    6. About the Tools, Examples, and Exercises
    7. Citations
    8. Conventions Used in This Book
    9. Comments and Questions
    10. Acknowledgments
  6. I. Method
    1. 1. A Better Way to Optimize
      1. 1.1. “You’re Doing It Wrong”
      2. 1.2. Requirements of a Good Method
      3. 1.3. Three Important Advances
        1. 1.3.1. User Action Focus
        2. 1.3.2. Response Time Focus
        3. 1.3.3. Amdahl’s Law
        4. 1.3.4. All Together Now
      4. 1.4. Tools for Analyzing Response Time
        1. 1.4.1. Sequence Diagram
        2. 1.4.2. Resource Profile
      5. 1.5. Method R
        1. 1.5.1. Who Uses the Method
          1. 1.5.1.1. The abominable smokestack
          2. 1.5.1.2. The optimal performance analyst
          3. 1.5.1.3. Your role
        2. 1.5.2. Overcoming Common Objections
          1. 1.5.2.1. “But my whole system is slow”
          2. 1.5.2.2. “The method only works if the problem is the database”
          3. 1.5.2.3. “The method is unconventional”
        3. 1.5.3. Evaluation of Effectiveness
    2. 2. Targeting the Right User Actions
      1. 2.1. Specification Reliability
        1. 2.1.1. The System
        2. 2.1.2. Economic Constraints
      2. 2.2. Making a Good Specification
        1. 2.2.1. User Action
        2. 2.2.2. Identifying the Right User Actions and Contexts
        3. 2.2.3. Prioritizing the User Actions
        4. 2.2.4. Determining Who Will Execute Each Action and When
      3. 2.3. Specification Over-Constraint
    3. 3. Targeting the Right Diagnostic Data
      1. 3.1. Expectations About Data Collection
      2. 3.2. Data Scope
        1. 3.2.1. Scoping Errors
        2. 3.2.2. Long-Running User Actions
        3. 3.2.3. “Too Much Data” Is Really Not Enough Data
      3. 3.3. Oracle Diagnostic Data Sources
      4. 3.4. For More Information
    4. 4. Targeting the Right Improvement Activity
      1. 4.1. A New Standard of Customer Care
      2. 4.2. How to Find the Economically Optimal Performance Improvement Activity
      3. 4.3. Making Sense of Your Diagnostic Data
      4. 4.4. Forecasting Project Net Payoff
        1. 4.4.1. Forecasting Project Benefits
          1. 4.4.1.1. Monetizing the benefits
          2. 4.4.1.2. If you can’t monetize the benefits
        2. 4.4.2. Forecasting Project Cost
        3. 4.4.3. Forecasting Project Risk
  7. II. Reference
    1. 5. Interpreting Extended SQL Trace Data
      1. 5.1. Trace File Walk-Through
      2. 5.2. Extended SQL Trace Data Reference
        1. 5.2.1. Trace File Element Definitions
          1. 5.2.1.1. Cursor numbers
          2. 5.2.1.2. Session identification and timestamps
          3. 5.2.1.3. Application identification
          4. 5.2.1.4. Cursor identification
          5. 5.2.1.5. Database calls
          6. 5.2.1.6. Wait events
          7. 5.2.1.7. Bind variables
          8. 5.2.1.8. Row source operations
          9. 5.2.1.9. Transaction end markers
          10. 5.2.1.10. Reference summary
        2. 5.2.2. Oracle Time Units
      3. 5.3. Response Time Accounting
        1. 5.3.1. Time Within a Database Call
        2. 5.3.2. Time Between Database Calls
        3. 5.3.3. Recursive SQL Double-Counting
          1. 5.3.3.1. Parent-child relationships
          2. 5.3.3.2. Recursive statistics
      4. 5.4. Evolution of the Response Time Model
      5. 5.5. Walking the Clock
        1. 5.5.1. Oracle Release 8 and Prior
        2. 5.5.2. Oracle Release 9
        3. 5.5.3. Clock Walk Formulas
      6. 5.6. Forward Attribution
        1. 5.6.1. Forward Attribution for Within-Call Events
        2. 5.6.2. Forward Attribution for Between-Call Events
      7. 5.7. Detailed Trace File Walk-Through
      8. 5.8. Exercises
    2. 6. Collecting Extended SQL Trace Data
      1. 6.1. Understanding Your Application
      2. 6.2. Activating Extended SQL Trace
        1. 6.2.1. Tracing Your Own Source Code
        2. 6.2.2. Tracing Someone Else’s Source Code
          1. 6.2.2.1. Triggering a session to activate its own trace
          2. 6.2.2.2. Activating trace from a third-party session
      3. 6.3. Finding Your Trace File(s)
        1. 6.3.1. Trace File Names
        2. 6.3.2. Simple Client-Server Applications
        3. 6.3.3. Oracle Parallel Execution
        4. 6.3.4. Oracle Multi-Threaded Server
        5. 6.3.5. Connection-Pooling Applications
        6. 6.3.6. Some Good News
      4. 6.4. Eliminating Collection Error
        1. 6.4.1. Time Scope Errors at Trace Activation
          1. 6.4.1.1. Missing wait event data at trace activation
          2. 6.4.1.2. Missing database call data at trace activation
          3. 6.4.1.3. Excess database call data at trace activation
        2. 6.4.2. Missing Time at Trace Deactivation
        3. 6.4.3. Incomplete Recursive SQL Data
      5. 6.5. Exercises
    3. 7. Oracle Kernel Timings
      1. 7.1. Operating System Process Management
        1. 7.1.1. The sys call Transition
        2. 7.1.2. The interrupt Transition
        3. 7.1.3. Other States and Transitions
      2. 7.2. Oracle Kernel Timings
      3. 7.3. How Software Measures Itself
        1. 7.3.1. Elapsed Time
        2. 7.3.2. CPU Consumption
      4. 7.4. Unaccounted-for Time
      5. 7.5. Measurement Intrusion Effect
      6. 7.6. CPU Consumption Double-Counting
      7. 7.7. Quantization Error
        1. 7.7.1. Measurement Resolution
        2. 7.7.2. Definition of Quantization Error
        3. 7.7.3. Complications in Measuring CPU Consumption
          1. 7.7.3.1. How gettimeofday works
          2. 7.7.3.2. How getrusage works
        4. 7.7.4. Detection of Quantization Error
        5. 7.7.5. Bounds of Quantization Error
      8. 7.8. Time Spent Not Executing
        1. 7.8.1. Instrumenting the Experiment
        2. 7.8.2. Process States and Transitions Revisited
      9. 7.9. Un-Instrumented Oracle Kernel Code
        1. 7.9.1. Effect
        2. 7.9.2. Trace Writing
      10. 7.10. Exercises
    4. 8. Oracle Fixed View Data
      1. 8.1. Deficiencies of Fixed View Data
        1. 8.1.1. Too Many Data Sources
        2. 8.1.2. Lack of Detail
        3. 8.1.3. Measurement Intrusion Effect of Polling
        4. 8.1.4. Difficulty of Proper Action-Scoping
        5. 8.1.5. Difficulty of Proper Time-Scoping
        6. 8.1.6. Susceptibility to Overflow and Other Errors
        7. 8.1.7. Lack of Database Call Duration Data
        8. 8.1.8. Lack of Read Consistency
      2. 8.2. Fixed View Reference
        1. 8.2.1. V$SQL
        2. 8.2.2. V$SESS_IO
        3. 8.2.3. V$SYSSTAT
        4. 8.2.4. V$SESSTAT
        5. 8.2.5. V$SYSTEM_EVENT
        6. 8.2.6. V$SESSION_EVENT
        7. 8.2.7. V$SESSION_WAIT
      3. 8.3. Useful Fixed View Queries
        1. 8.3.1. Tom Kyte’s Test Harness
        2. 8.3.2. Finding a Fixed View Definition
        3. 8.3.3. Finding Inefficient SQL
        4. 8.3.4. Finding Where a Session Is Stuck
        5. 8.3.5. Finding Where a System Is Stuck
        6. 8.3.6. Approximating a Session’s Resource Profile
        7. 8.3.7. Viewing Waits System-Wide
          1. 8.3.7.1. The “idle events” problem
          2. 8.3.7.2. The denominator problem
          3. 8.3.7.3. Infinite capacity for waiting
          4. 8.3.7.4. Idle events in background sessions
          5. 8.3.7.5. Targeting revisited
      4. 8.4. The Oracle “Wait Interface”
      5. 8.5. Exercises
    5. 9. Queueing Theory for the Oracle Practitioner
      1. 9.1. Performance Models
      2. 9.2. Queueing
        1. 9.2.1. Queueing Economics
        2. 9.2.2. Queueing Visualized
      3. 9.3. Queueing Theory
        1. 9.3.1. Model Input and Output Values
          1. 9.3.1.1. Arrivals and completions
          2. 9.3.1.2. Service channels, utilization, and stability
          3. 9.3.1.3. Service time and service rate
          4. 9.3.1.4. Queueing delay and response time
          5. 9.3.1.5. Maximum effective throughput
          6. 9.3.1.6. Cumulative distribution function (CDF) of response time
        2. 9.3.2. Random Variables
          1. 9.3.2.1. Expected value
          2. 9.3.2.2. Probability density function (pdf)
          3. 9.3.2.3. Using the pdf
          4. 9.3.2.4. Why understanding distribution is important
        3. 9.3.3. Queueing Theory Versus the “Wait Interface”
          1. 9.3.3.1. Oracle wait times
          2. 9.3.3.2. Differences in queueing theory notation
      4. 9.4. The M/M/m Queueing Model
        1. 9.4.1. M/M/m Systems
        2. 9.4.2. Non-M/M/m Systems
        3. 9.4.3. Exponential Distribution
          1. 9.4.3.1. Poisson-exponential relationship
          2. 9.4.3.2. Testing for fit to exponential distribution
          3. 9.4.3.3. A program to test for exponential distribution
        4. 9.4.4. Behavior of M/M/m Systems
          1. 9.4.4.1. Multi-channel scalability
          2. 9.4.4.2. The knee
          3. 9.4.4.3. Response time fluctuations
          4. 9.4.4.4. Parameter sensitivity
        5. 9.4.5. Using M/M/m: Worked Example
          1. 9.4.5.1. Suitability for modeling with M/M/m
          2. 9.4.5.2. Computing the required number of CPUs
          3. 9.4.5.3. What we can learn from an optimistic model
          4. 9.4.5.4. Negotiating the negotiable parameters
          5. 9.4.5.5. Using Goal Seek in Microsoft Excel
          6. 9.4.5.6. Sensitivity analysis
      5. 9.5. Perspective
      6. 9.6. Exercises
  8. III. Deployment
    1. 10. Working the Resource Profile
      1. 10.1. How to Work a Resource Profile
        1. 10.1.1. Work in Descending Response Time Order
          1. 10.1.1.1. Why targeting is vital
          2. 10.1.1.2. Possible benefits of low-return improvements
        2. 10.1.2. Eliminate Unnecessary Calls
          1. 10.1.2.1. Why workload elimination works so well
          2. 10.1.2.2. Supply and demand in the technology stack
          3. 10.1.2.3. How to eliminate calls
          4. 10.1.2.4. Thinking in a bigger box
        3. 10.1.3. Eliminate Inter-Process Competition
          1. 10.1.3.1. How to attack a latency problem
          2. 10.1.3.2. How to find competing workload
        4. 10.1.4. Upgrade Capacity
      2. 10.2. How to Forecast Improvement
      3. 10.3. How to Tell When Your Work Is Done
    2. 11. Responding to the Diagnosis
      1. 11.1. Beyond the Resource Profile
      2. 11.2. Response Time Components
        1. 11.2.1. Oracle Pseudoevents
          1. 11.2.1.1. CPU service
          2. 11.2.1.2. unaccounted-for
        2. 11.2.2. No Event Is Inherently “Unimportant”
          1. 11.2.2.1. Responding to large SQL*Net response time contributions
          2. 11.2.2.2. Responding to large response time contributions from other events
      3. 11.3. Eliminating Wasteful Work
        1. 11.3.1. Logical I/O Optimization
          1. 11.3.1.1. Why LIO problems are so common
          2. 11.3.1.2. How to optimize SQL
        2. 11.3.2. Parse Optimization
        3. 11.3.3. Write Optimization
      4. 11.4. Attributes of a Scalable Application
    3. 12. Case Studies
      1. 12.1. Case 1: Misled by System-Wide Data
        1. 12.1.1. Targeting
        2. 12.1.2. Diagnosis and Response
        3. 12.1.3. Results
        4. 12.1.4. Lessons Learned
      2. 12.2. Case 2: Large CPU Service Duration
        1. 12.2.1. Targeting
        2. 12.2.2. Diagnosis and Response
        3. 12.2.3. Results
        4. 12.2.4. Lessons Learned
      3. 12.3. Case 3: Large SQL*Net Event Duration
        1. 12.3.1. Targeting
        2. 12.3.2. Diagnosis and Response
        3. 12.3.3. Results
        4. 12.3.4. Lessons Learned
      4. 12.4. Case 4: Large Read Event Duration
        1. 12.4.1. Targeting
        2. 12.4.2. Diagnosis and Repair
        3. 12.4.3. Results
        4. 12.4.4. Lessons Learned
      5. 12.5. Conclusion
  9. IV. Appendixes
    1. Glossary
    2. A. Greek Alphabet
    3. B. Optimizing Your Database Buffer Cache Hit Ratio
    4. C. M/M/m Queueing Theory Formulas
    5. D. References
  10. Index
  11. About the Authors
  12. Colophon
  13. Copyright