Video Game Optimization

Book Description

Video Game Optimization describes a process for improving the performance of a video game to deliver better gameplay and a better visual experience. Few game developers understand how to optimize an entire video game, yet the process is surprisingly simple to learn and applicable to a broad audience. The book approaches optimization by first describing how to determine where a game is limited, then providing detailed solutions and examples for removing that limitation. All the examples in the book apply to a variety of game types, and coverage includes how to optimize system memory, CPU processing, graphics, and shaders.

Table of Contents

  1. Copyright
    1. Dedication
  2. Acknowledgments
  3. About the Authors
  4. Introduction
    1. What You’ll Find in This Book
    2. Who This Book Is For
    3. How This Book Is Organized
  5. 1. The Basics of Optimization
    1. Getting to Better Optimization
    2. Optimization Lifecycle
      1. 1: Benchmark
      2. 2: Detection
      3. 3: Solve
      4. 4: Check
      5. 5: Repeat
    3. Hotspots and Bottlenecks
      1. Hotspots
      2. Bottlenecks
    4. Trade-Offs
    5. Levels of Optimization
      1. System Level
      2. Algorithmic Level
      3. Micro-Level
    6. Optimization Pitfalls
      1. Assumptions
      2. Premature Optimization
      3. Optimizing on Only One Machine
      4. Optimizing Debug Builds
      5. Bad Benchmarks
    7. Concurrency
    8. Middleware
    9. Big O Notation
    10. Conclusion
  6. 2. Planning for Your Project
    1. Project Lifecycle
    2. The Performance Budget
      1. Setting Specifications
      2. Developing Line Items
    3. Typical Project Trajectory
    4. Maximizing Return on Investment
    5. Visualizing Performance
    6. Understanding Slowness
      1. High Frame Rate
      2. The Value of Consistency
    7. Conclusion
  7. 3. The Tools
    1. Intrusiveness
    2. Types of Tools
      1. Profilers
      2. System Monitors
      3. System Adjusters
    3. Timers 101
    4. Code Instrumentation
      1. Simple Timing
      2. Hierarchical Profiling
      3. Counters
      4. Reports
    5. Tool Spotlight
      1. Intel VTune
        1. Counter Monitor
        2. Sampling
        3. Call Graph
      2. Microsoft PIX for Windows
      3. NVIDIA PerfHUD
      4. NVIDIA FX Composer
      5. DirectX Debug Runtime
      6. gprof
      7. AMD CodeAnalyst
      8. AMD GPU PerfStudio
    6. Conclusion
    7. Sources Cited
  8. 4. Hardware Fundamentals
    1. Memory
      1. Registers and Caches
      2. Memory Mapping
      3. Dynamic Random Access Memory
      4. Direct Memory Access
      5. Virtual Memory
      6. GPU and Memory
      7. Alignment and Fetching
      8. Caching
    2. CPU
      1. Lifecycle of an Instruction
        1. Load/Fetch/Decode
        2. Execution
        3. Retirement
      2. Running Out of Order
      3. Data Dependencies
      4. Branching and Branch Prediction
      5. Simultaneous Multi-Threading
      6. Multi-Core
    3. GPU: From API to Pixel
      1. Application Calls API
      2. Geometry
      3. Rasterization
    4. GPU Performance Terms
      1. GPU Programmability
      2. Shader Hardware
      3. Shader Languages
      4. Shader Models
      5. Shaders and Stream Processing
    5. Conclusion
    6. Works Cited
  9. 5. Holistic Video Game Optimization
    1. Holistic—The Optimal Approach
      1. Parallelism and a Holistic Approach
      2. The Power Is in the System
    2. The Process
    3. The Benchmark
    4. GPU Utilization
      1. The Decision
      2. The Tools
    5. CPU Bound: Overview
    6. CPU: Source Bound
      1. What to Expect
      2. The Tools
      3. Third-Party Module Bound
    7. GPU Bound
      1. Pre-Unified Shader Architecture
        1. The Tools
      2. Unified Shader Architecture
        1. The Tools
      3. Kernels
        1. Balancing Within the GPU
        2. Fragment Occlusion
    8. Graphics Bus
    9. Example
    10. Conclusion
    11. Works Cited
  10. 6. CPU Bound: Memory
    1. Detecting Memory Problems
    2. Solutions
    3. Pre-Fetching
    4. Access Patterns and Cache
      1. Randomness
      2. Streams
      3. AOS vs. SOA
      4. Solution: Strip Mining
    5. Stack, Global, and Heap
      1. Stack
      2. Global
      3. Heap
      4. Solution: Don’t Allocate
      5. Solution: Linearize Allocation
      6. Solution: Memory Pools
      7. Solution: Don’t Construct or Destruct
      8. Solution: Time-Scoped Pools
    6. Runtime Performance
      1. Aliasing
      2. Runtime Memory Alignment
      3. Fix Critical Stride Issues
    7. SSE Loads and Pre-Fetches
    8. Write-Combined Memory
    9. Conclusion
  11. 7. CPU Bound: Compute
    1. Micro-Optimizations
    2. Compute Bound
    3. Lookup Table
    4. Memoization
    5. Function Inlining
    6. Branch Prediction
      1. Make Branches More Predictable
      2. Remove Branches
      3. Profile-Guided Optimization
    7. Loop Unrolling
    8. Floating-Point Math
    9. Slow Instructions
      1. Square Root
      2. Bitwise Operations
      3. Datatype Conversions
    10. SSE Instructions
      1. History
      2. Basics
      3. Example: Adding with SIMD
    11. Trusting the Compiler
      1. Removing Loop Invariant Code
      2. Consolidating Redundant Functions
      3. Loop Unrolling
      4. Cross-.Obj Optimizations
      5. Hardware-Specific Optimizations
    12. Conclusion
    13. Works Cited
  12. 8. From CPU to GPU
    1. Project Lifecycle and You
    2. Points of Project Failure
      1. Synchronization
      2. Caps Management
      3. Resource Management
      4. Global Ordering
      5. Instrumentation
      6. Debugging
    3. Managing the API
      1. Assume Nothing
      2. Build Correct Wrappers
      3. State Changes
      4. Draw Calls
      5. State Blocks
      6. Instancing and Batching
      7. Render Managers
      8. Render Queues
    4. Managing VRAM
      1. Dealing with Device Resets
      2. Resource Uploads/Locks
      3. Resource Lifespans
      4. Look Out for Fragmentation
    5. Other Tricks
      1. Frame Run-Ahead
      2. Lock Culling
      3. Stupid Texture (Debug) Tricks
    6. Conclusion
  13. 9. The GPU
    1. Categories of GPU
    2. 3D Pipeline
    3. I’m GPU Bound!?
    4. What Does One Frame Look Like?
    5. Front End vs. Back End
      1. Back End
        1. Fill-Rate
        2. Render Target Format
        3. Blending
        4. Shading
        5. Texture Sampling
        6. Z/Stencil Culling
        7. Clearing
      2. Front End
        1. Vertex Transformation
        2. Vertex Fetching and Caching
        3. Tessellation
    6. Special Cases
      1. MSAA
      2. Lights and Shadows
      3. Forward vs. Deferred Rendering
      4. MRT
    7. Conclusion
  14. 10. Shaders
    1. Shader Assembly
    2. Full Circle
    3. Find Your Bottleneck
    4. Memory
      1. Inter-Shader Communication
      2. Texture Sampling
    5. Compute
      1. Hide Behind Latency
      2. Sacrifice Quality
      3. Trade Space for Time
      4. Flow Control
    6. Constants
    7. Runtime Considerations
    8. Conclusion
  15. 11. Networking
    1. Fundamental Issues
    2. Types of Traffic
    3. Game State and Events
    4. Bandwidth and Bit Packing and Packets, Oh My!
    5. How to Optimize Networking
    6. Embrace Failure
    7. Lie to the User
    8. Typical Scenarios
      1. Asset Download
      2. Streaming Audio/Video
      3. Chat
      4. Gameplay
    9. Profiling Networking
    10. How to Build Good Networking into Your Game
    11. Conclusion
  16. 12. Mass Storage
    1. What Are the Performance Issues?
    2. How to Profile
      1. Worst Case
      2. Best Case
      3. What About Fragmentation?
      4. SSDs to the Rescue!
      5. The Actual Data
      6. Bottom Line
    3. A Caveat on Profiling
    4. What Are the Big Opportunities?
      1. Hide Latency, Avoid Hitches
      2. Minimize Reads and Writes
      3. Asynchronous Access
      4. Optimize File Order
      5. Optimize Data for Fast Loading
    5. Tips and Tricks
      1. Know Your Disk Budget
      2. Filters
      3. Support Development and Runtime File Formats
      4. Support Dynamic Reloading
      5. Automate Resource Processing
      6. Centralized Resource Loading
      7. Preload When Appropriate
      8. For Those Who Stream
      9. Downloadable Content
    6. Conclusion
  17. 13. Concurrency
    1. Why Multi-Core?
    2. Why Multi-Threading Is Difficult
    3. Data and Task Parallelism
    4. Performance
      1. Scalability
      2. Contention
      3. Balancing
    5. Thread Creation
    6. Thread Destruction
    7. Thread Management
    8. Semaphore
    9. Win32 Synchronization
      1. Critical Sections and Mutex
      2. Semaphore
      3. Events
      4. WaitFor*Object Calls
    10. Multi-Threading Problems
      1. Race Condition
      2. Sharing/False Sharing
      3. Deadlock
      4. Balancing
      5. Practical Limits
      6. How Do We Measure?
      7. Solutions
    11. Example
      1. Naïve ReadWriter Implementation
      2. Array Implementation
      3. Batched Array
      4. Thread Count
      5. Sharing, Balancing, and Synchronization
    12. Conclusion
  18. 14. Consoles
    1. Know Your Console
    2. Keep Your Fundamentals Strong
    3. Push It to the Limit
    4. Trimming It Down
    5. Mind the Dip: Middleware
    6. Support Exists
    7. Understand the Contract
    8. RAM and the Bus
    9. Console GPUs Are Crazy
    10. Pushing the Algorithms
    11. Fixed Output, Tuned Data
    12. Specialized Tools
    13. Conclusion
  19. 15. Managed Languages
    1. Characteristics of a Managed Language
    2. Concerns for Profiling
    3. Change Your Assumptions
      1. What Should Be Implemented in a Managed Language?
      2. What Should Not Be Implemented in a Managed Language?
    4. Dealing with the Garbage Collector
      1. Under Pressure
      2. The Write Barrier
      3. Strategies for Better GC Behavior
    5. Dealing with JIT
      1. When Is JIT Active?
      2. Analyzing the JIT
    6. Practical Examples—ActionScript 3 and C#
      1. Watch Out for Function Call Overhead
      2. Language Features Can Be Traps
      3. Boxing and Unboxing
    7. Conclusion
  20. 16. GPGPU
    1. What Is GPGPU?
      1. When Is It Appropriate to Process on the GPU?
      2. How Fast Is the GPU for Things Other Than Graphics?
    2. GPU System Execution
    3. Architecture
      1. Unified Cores and Kernels
      2. Execution: From Bottom to Top
      3. Warps
      4. Block
      5. Grid
      6. Kernel
    4. Bottlenecks
      1. Host Communication
      2. Memory and Compute
    5. Conclusion