You are previewing Surreptitious Software.
O'Reilly logo
Surreptitious Software

Book Description

“This book gives thorough, scholarly coverage of an area of growing importance in computer security and is a ‘must have’ for every researcher, student, and practicing professional in software protection.”
    —Mikhail Atallah, Distinguished Professor of Computer Science at Purdue University

Theory, Techniques, and Tools for Fighting Software Piracy, Tampering, and Malicious Reverse Engineering

The last decade has seen significant progress in the development of techniques for resisting software piracy and tampering. These techniques are indispensable for software developers seeking to protect vital intellectual property. Surreptitious Software is the first authoritative, comprehensive resource for researchers, developers, and students who want to understand these approaches, the level of security they afford, and the performance  penalty they incur.

Christian Collberg and Jasvir Nagra bring together techniques drawn from related areas of computer science, including cryptography, steganography, watermarking, software metrics, reverse engineering, and compiler optimization. Using extensive sample code, they show readers how to implement protection schemes ranging from code obfuscation and software fingerprinting to tamperproofing and birthmarking, and discuss the theoretical and practical limitations of these techniques.

Coverage includes

  • Mastering techniques that both attackers and defenders use to analyze programs

  • Using code obfuscation to make software harder to analyze and understand

  • Fingerprinting software to identify its author and to trace software pirates

  • Tamperproofing software using guards that detect and respond to illegal modifications of code and data

  • Strengthening content protection through dynamic watermarking and dynamic obfuscation

  • Detecting code theft via software similarity analysis and birthmarking algorithms

  • Using hardware techniques to defend software and media against piracy and tampering

  • Detecting software tampering in distributed system

  • Understanding the theoretical limits of code obfuscation

  • Table of Contents

    1. Copyright
      1. Dedication
    2. Addison-Wesley Software Security Series
      1. Titles in the Series
    3. Preface
      1. Why Should You Read This Book?
      2. Who Uses Surreptitious Software?
      3. What’s the Goal of This Book?
    4. About the Authors
    5. Acknowledgments
    6. 1. What Is Surreptitious Software?
      1. 1.1. Setting the Scene
      2. 1.2. Attack and Defense
      3. 1.3. Program Analysis
        1. 1.3.1. A Simple Reverse Engineering Example
      4. 1.4. Code Obfuscation
        1. 1.4.1. Applications of Code Obfuscation
          1. 1.4.1.1. Malicious Reverse Engineering
          2. 1.4.1.2. Digital Rights Management
          3. 1.4.1.3. Mobile Agent Computing
          4. 1.4.1.4. Grid Computing
          5. 1.4.1.5. Artificial Diversity
        2. 1.4.2. Obfuscating Transformations
        3. 1.4.3. Black Hat Code Obfuscation
          1. 1.4.3.1. Misdirection—Stealthy Obfuscation
          2. 1.4.3.2. Obfuscating Viruses
      5. 1.5. Tamperproofing
        1. 1.5.1. Applications of Tamperproofing
        2. 1.5.2. An Example
      6. 1.6. Software Watermarking
        1. 1.6.1. An Example
        2. 1.6.2. Attacks on Watermarking Systems
      7. 1.7. Software Similarity
        1. 1.7.1. Plagiarism
        2. 1.7.2. Software Forensics
        3. 1.7.3. Birthmarking
        4. 1.7.4. A Birthmarking Example
      8. 1.8. Hardware-Based Protection Techniques
        1. 1.8.1. Distribution with Physical Token
        2. 1.8.2. Tying the Program to the CPU
        3. 1.8.3. Ensuring Safe Execution Environment
        4. 1.8.4. Encrypted Execution
        5. 1.8.5. Physical Barriers
      9. 1.9. Discussion
        1. 1.9.1. Reasons to Use Software Protection...
        2. 1.9.2. . . . and Reasons Not To
        3. 1.9.3. So Which Algorithms Should I Use?
      10. 1.10. Notation
    7. 2. Methods of Attack and Defense
      1. 2.1. Attack Strategies
        1. 2.1.1. A Prototypical Cracking Target
        2. 2.1.2. What’s the Adversary’s Motivation?
        3. 2.1.3. What Does the Adversary Get to Crack?
          1. 2.1.3.1. Static vs. Dynamic and Stripped vs. Unstripped
          2. 2.1.3.2. Architecture-Neutral Distribution Formats
        4. 2.1.4. What’s the Adversary’s Attack Methodology?
          1. 2.1.4.1. Dynamic Analysis—Cracking vs. Debugging
          2. 2.1.4.2. Dynamic Analysis—Exploiting Cracks in the Black Box
          3. 2.1.4.3. Static Analysis
          4. 2.1.4.4. Editing
          5. 2.1.4.5. Automation
        5. 2.1.5. What Tools Does the Adversary Use?
        6. 2.1.6. What Techniques Does the Adversary Use?
          1. 2.1.6.1. Learning About the Executable
          2. 2.1.6.2. Breaking on Library Functions
          3. 2.1.6.3. Static Pattern Matching
          4. 2.1.6.4. Watching Memory
          5. 2.1.6.5. Recovering Internal Data
          6. 2.1.6.6. Tampering with the Environment
          7. 2.1.6.7. Dynamic Pattern Matching
          8. 2.1.6.8. Differential Attacks
          9. 2.1.6.9. Recovering Algorithms Through Decompilation
        7. 2.1.7. Discussion
      2. 2.2. Defense Strategies
        1. 2.2.1. Notation
        2. 2.2.2. The cover Primitive
        3. 2.2.3. The duplicate Primitive
        4. 2.2.4. The split and merge Primitives
        5. 2.2.5. The reorder Primitive
        6. 2.2.6. The map Primitive
        7. 2.2.7. The indirect Primitive
        8. 2.2.8. The mimic Primitive
        9. 2.2.9. The advertise Primitive
        10. 2.2.10. The detect-respond Primitive
        11. 2.2.11. The dynamic Primitive
        12. 2.2.12. Discussion
      3. 2.3. Discussion
        1. 2.3.1. What Do We Need from Attack and Defense Models?
        2. 2.3.2. How Do We Use the Models to Devise Algorithms?
    8. 3. Program Analysis
      1. 3.1. Static Analysis
        1. 3.1.1. Control Flow Analysis
          1. 3.1.1.1. Dealing with Exceptions
          2. 3.1.1.2. Algorithm REAMB: Dealing with Self-Modifying Code
          3. 3.1.1.3. Identifying Loops
          4. 3.1.1.4. Interprocedural Control Flow
        2. 3.1.2. Data Flow Analysis
        3. 3.1.3. Data Dependence Analysis
        4. 3.1.4. Alias Analysis
          1. 3.1.4.1. Where Does Aliasing Occur?
          2. 3.1.4.2. Classifying Alias Analysis Problems
          3. 3.1.4.3. Alias Analysis Algorithms
        5. 3.1.5. Slicing
        6. 3.1.6. Abstract Interpretation
      2. 3.2. Dynamic Analysis
        1. 3.2.1. Debugging
          1. 3.2.1.1. Software vs. Hardware Breakpoints
          2. 3.2.1.2. Algorithm REBB: Reverse Debugging
          3. 3.2.1.3. Relative Debugging
        2. 3.2.2. Profiling
          1. 3.2.2.1. Profiler Implementation
        3. 3.2.3. Tracing
          1. 3.2.3.1. Algorithm RELJ: Compressing Traces
        4. 3.2.4. Emulation
      3. 3.3. Reconstituting Source
        1. 3.3.1. Disassembly
          1. 3.3.1.1. Linear vs. Recursive Traversal
          2. 3.3.1.2. Algorithm REHM: Disassembling Stripped Binaries
        2. 3.3.2. Decompilation
          1. 3.3.2.1. Algorithm RECG: Recovering High-Level Control Flow
          2. 3.3.2.2. Decompiling High-Level Languages
      4. 3.4. Pragmatic Analysis
        1. 3.4.1. Style Metrics
        2. 3.4.2. Software Complexity Metrics
        3. 3.4.3. Software Visualization
      5. 3.5. Discussion
    9. 4. Code Obfuscation
      1. 4.1. Semantics-Preserving Obfuscating Transformations
        1. 4.1.1. Algorithm OBFCF: Diversifying Transformations
          1. 4.1.1.1. Expression Equivalence
          2. 4.1.1.2. Algorithm OBFCFrecorder: Reordering Code and Data
          3. 4.1.1.3. Algorithm OBFCFinoutline: Splitting and Merging Functions
          4. 4.1.1.4. Algorithm OBFCFcopy: Copying Code
          5. 4.1.1.5. Algorithm OBFCFinterp: Interpretation
        2. 4.1.2. Algorithm OBFTP: Identifier Renaming
        3. 4.1.3. Obfuscation Executives
          1. 4.1.3.1. Algorithm OBFCTJOE: Maximizing Bang for the Buck
          2. 4.1.3.2. Algorithm OBFHC: Modeling Dependencies
      2. 4.2. Definitions
        1. 4.2.1. Potent Obfuscating Transformations
        2. 4.2.2. Efficient Obfuscating Transformations
        3. 4.2.3. Stealth
        4. 4.2.4. Other Definitions
      3. 4.3. Complicating Control Flow
        1. 4.3.1. Opaque Expressions
        2. 4.3.2. Algorithm OBFWHKD: Control-Flow Flattening
        3. 4.3.3. Introducing Aliasing
          1. 4.3.3.1. Algorithm OBFCTJalias: Adding Spurious Aliases
          2. 4.3.3.2. Algorithm OBFWHKDalias: Control-Flow Flattening, Take 2
        4. 4.3.4. Algorithm OBFCTJbogus: Inserting Bogus Control Flow
          1. 4.3.4.1. Irreducible Graphs
          2. 4.3.4.2. Mutually Dependent Opaque Predicates
        5. 4.3.5. Algorithm OBFLDK: Jumps Through Branch Functions
        6. 4.3.6. Attacks
          1. 4.3.6.1. Algorithm REUDM: Dynamic Attacks Against Control-Flow Flattening
          2. 4.3.6.2. Algorithm REMASB: Dynamic Attacks Against Branch Functions
      4. 4.4. Opaque Predicates
        1. 4.4.1. Algorithm OBFCTJpointer: Opaque Predicates from Pointer Aliasing
        2. 4.4.2. OBFWHKDopaque: Opaque Values from Array Aliasing
        3. 4.4.3. Algorithm OBFCTJthread: Opaque Predicates from Concurrency
        4. 4.4.4. Breaking Opaque Predicates
          1. 4.4.4.1. Algorithm REPMBG: Breaking: n| p(x)
          2. 4.4.4.2. Algorithm OBFCTJslice: Preventing Slicing Attacks
      5. 4.5. Data Encodings
        1. 4.5.1. Encoding Integers
          1. 4.5.1.1. Algorithm OBFBDKMRVnum: Number-Theoretic Tricks
          2. 4.5.1.2. Algorithm OBFBDKMRVcrypto: Encrypting Integers
        2. 4.5.2. Encoding Booleans
          1. 4.5.2.1. Algorithm OBFBDKMRVbool [28]: Multiple Values
          2. 4.5.2.2. Algorithm OBFCTJbool: Splitting Booleans
        3. 4.5.3. Encoding Literal Data
        4. 4.5.4. Encoding Arrays
          1. 4.5.4.1. Algorithm OBFZCW: Array Permutation
          2. 4.5.4.2. Algorithm OBFCTJarray: Array Restructuring
      6. 4.6. Breaking Abstractions
        1. 4.6.1. Algorithm OBFWCsig: Merging Function Signatures
        2. 4.6.2. Algorithm OBFCTJclass: Splitting and Merging Classes
        3. 4.6.3. Algorithm OBFDMRVSL: Destroying High-Level Structures
          1. 4.6.3.1. An Example
          2. 4.6.3.2. Evaluation
        4. 4.6.4. Algorithm OBFAJV: Modifying Instruction Encodings
      7. 4.7. Discussion
    10. 5. Obfuscation Theory
      1. 5.1. Definitions
      2. 5.2. Provably Secure Obfuscation: Possible or Impossible?
        1. 5.2.1. Turing’s Halting Problem
        2. 5.2.2. Algorithm REAA: De-obfuscating Programs
      3. 5.3. Provably Secure Obfuscation: It’s Possible (Sometimes)!
        1. 5.3.1. Algorithm OBFLBS: Obfuscating with Point Functions
          1. 5.3.1.1. Obfuscating access control
          2. 5.3.1.2. Obfuscating Regular Expressions
        2. 5.3.2. Algorithm OBFNS: Obfuscating Databases
        3. 5.3.3. Algorithm OBFPP: Homomorphic Encryption
        4. 5.3.4. Algorithm OBFCEJO: Whitebox DES
          1. 5.3.4.1. Traditional DES
          2. 5.3.4.2. Obfuscating DES
      4. 5.4. Provably Secure Obfuscation: It’s Impossible (Sometimes)!
        1. 5.4.1. A General Obfuscator
        2. 5.4.2. Obfuscating Learnable Functions
        3. 5.4.3. Proving that Obfuscation Is Impossible
        4. 5.4.4. Discussion
      5. 5.5. Provably Secure Obfuscation: Can It Be Saved?
        1. 5.5.1. Overcoming Impossibility
        2. 5.5.2. Definitions Revisited: Make Obfuscation Interactive
          1. 5.5.2.1. Cheapskate Problem
          2. 5.5.2.2. Millionaire Problem
        3. 5.5.3. Definition Revisited: Make Obfuscation Non-Semantics Preserving
          1. 5.5.3.1. Remote Procedure Call
          2. 5.5.3.2. Whitebox Remote Program Execution
      6. 5.6. Discussion
    11. 6. Dynamic Obfuscation
      1. 6.1. Definitions
      2. 6.2. Moving Code Around
        1. 6.2.1. Algorithm OBFKMNM: Replacing Instructions
        2. 6.2.2. OBFAGswap:Self-Modifying State Machine
          1. 6.2.2.1. Building the State Machine
          2. 6.2.2.2. Example Execution
          3. 6.2.2.3. Coding the Example
        3. 6.2.3. OBFMAMDSB: Dynamic Code Merging
      3. 6.3. Encryption
        1. 6.3.1. OBFCKSP: Code as Key Material
        2. 6.3.2. OBFAGcrypt: Combining Self-Modification and Encryption
          1. 6.3.2.1. An Example
          2. 6.3.2.2. Deriving the Keystream
          3. 6.3.2.3. Coding the Example
      4. 6.4. Discussion
    12. 7. Software Tamperproofing
      1. 7.1. Definitions
        1. 7.1.1. Checking for Tampering
        2. 7.1.2. Responding to Tampering
        3. 7.1.3. System Design
      2. 7.2. Introspection
        1. 7.2.1. Algorithm TPCA: Checker Network
        2. 7.2.2. Generating Hash Functions
        3. 7.2.3. Algorithm TPHMST: Hiding Hash Values
          1. 7.2.3.1. System Design
          2. 7.2.3.2. Interval Construction
          3. 7.2.3.3. Computing Corrector Slot Values
        4. 7.2.4. The Skype Obfuscated Protocol
          1. 7.2.4.1. Algorithm REBD: Attacking the Skype Client
        5. 7.2.5. Algorithm REWOS: Attacking Self-Hashing Algorithms
          1. 7.2.5.1. Algorithm TPGCK: Detecting Memory Splits
        6. 7.2.6. Discussion
        7. 7.3. Algorithm RETCJ: Response Mechanisms
      3. 7.4. State Inspection
        1. 7.4.1. Algorithm TPCVCPSJ: Oblivious Hash Functions
        2. 7.4.2. Algorithm TPJJV: Overlapping Instructions
      4. 7.5. Remote Tamperproofing
        1. 7.5.1. Distributed Check and Respond
        2. 7.5.2. Solution Strategies
        3. 7.5.3. Algorithm TPZG: Slicing Functions
        4. 7.5.4. Algorithm TPSLSPDK: Measuring Remote Hardware
          1. 7.5.4.1. Applications
          2. 7.5.4.2. The Pioneer Protocol
        5. 7.5.5. TPCNS: Continuous Replacement
      5. 7.6. Discussion
    13. 8. Software Watermarking
      1. 8.1. History and Applications
        1. 8.1.1. Applications
          1. 8.1.1.1. Visible vs. Invisible Marks
          2. 8.1.1.2. Robust vs. Fragile Marks
          3. 8.1.1.3. Authorship Marks
          4. 8.1.1.4. Fingerprint Marks
          5. 8.1.1.5. Licensing Marks
          6. 8.1.1.6. Meta-Data Marks
          7. 8.1.1.7. Validation Marks
          8. 8.1.1.8. Filtering Marks
          9. 8.1.1.9. Secret Marks
          10. 8.1.1.10. Inadvertent Authorship Marks
        2. 8.1.2. Embedding a Mark in Audio
        3. 8.1.3. Embedding a Mark in an Image
        4. 8.1.4. Embedding a Mark in Natural-Language Text
      2. 8.2. Watermarking Software
      3. 8.3. Definitions
        1. 8.3.1. Watermark Credibility
        2. 8.3.2. Attacks
        3. 8.3.3. Watermarking vs. Fingerprinting
      4. 8.4. Watermarking by Permutation
        1. 8.4.1. Algorithm WMDM: Reordering Basic Blocks
        2. 8.4.2. Renumbering
        3. 8.4.3. Algorithm WMQP: Improving Credibility
      5. 8.5. Tamperproofing Watermarks
        1. 8.5.1. Algorithm WMMC: Embedding Media Watermarks
      6. 8.6. Improving Resilience
        1. 8.6.1. Algorithm WMSHKQ: Statistical Watermarking
          1. 8.6.1.1. Embedding
          2. 8.6.1.2. Recognition
          3. 8.6.1.3. Discussion
      7. 8.7. Improving Stealth
        1. 8.7.1. Algorithm WMMIMIT: Mapping Instructions
        2. 8.7.2. Algorithm WMVVS: Watermarks in CFGs
          1. 8.7.2.1. Embedding
          2. 8.7.2.2. Recognition
          3. 8.7.2.3. Discussion
        3. 8.7.3. Algorithm WMCC: Abstract Interpretation
          1. 8.7.3.1. Embedding
          2. 8.7.3.2. Recognition
          3. 8.7.3.3. Discussion
      8. 8.8. Steganographic Embeddings
        1. 8.8.1. Algorithm WMASB: The Compiler as Embedder
          1. 8.8.1.1. Embedding
          2. 8.8.1.2. Recognition
          3. 8.8.1.3. Discussion
      9. 8.9. Splitting Watermark Integers
        1. 8.9.1. Splitting a Large Mark into Small Pieces
        2. 8.9.2. Redundant Watermark Pieces
          1. 8.9.2.1. Filtering Out Bogus Pieces
        3. 8.9.3. Sparse Codes for Increased Credibility
      10. 8.10. Graph Codecs
        1. 8.10.1. Oriented Parent-Pointer Tree
        2. 8.10.2. Radix Graphs
        3. 8.10.3. Permutation Graphs
        4. 8.10.4. Planted Plane Cubic Trees
        5. 8.10.5. Reducible Permutation Graphs
      11. 8.11. Discussion
        1. 8.11.1. Embedding Techniques
        2. 8.11.2. Attack Models
    14. 9. Dynamic Watermarking
      1. 9.1. Algorithm WMCT: Exploiting Aliasing
        1. 9.1.1. A Simple Example
        2. 9.1.2. Recognition Problems
        3. 9.1.3. Increasing Bitrate
          1. 9.1.3.1. Choosing an Efficient Graph Encoding
          2. 9.1.3.2. Generating Minimal Code
          3. 9.1.3.3. Increasing Stealth and Bitrate by Graph Splitting
        4. 9.1.4. Increasing Resilience to Attack
          1. 9.1.4.1. Protecting against Edge Flips
          2. 9.1.4.2. Protecting Against Node Splitting
          3. 9.1.4.3. Protecting Against Alias Analysis
        5. 9.1.5. Increasing Stealth
          1. 9.1.5.1. Avoiding Global Variables
          2. 9.1.5.2. Avoiding Unstealthy Node Classes
          3. 9.1.5.3. Avoiding Weak Cuts
        6. 9.1.6. Discussion
      2. 9.2. Algorithm WMNT: Exploiting Parallelism
        1. 9.2.1. Embedding Watermarking Widgets
        2. 9.2.2. Embedding Example
        3. 9.2.3. Recognition
        4. 9.2.4. Avoiding Pattern-Matching Attacks
        5. 9.2.5. Tamperproofing Widgets
        6. 9.2.6. Discussion
      3. 9.3. Algorithm WMCCDKHLSpaths: Expanding Execution Paths
        1. 9.3.1. Encoding and Embedding
        2. 9.3.2. Recognition
        3. 9.3.3. Discussion
      4. 9.4. Algorithm WMCCDKHLSbf: Tamperproofing Execution Paths
        1. 9.4.1. Embedding
        2. 9.4.2. Recognition
        3. 9.4.3. Tamperproofing the Branches
        4. 9.4.4. Discussion
      5. 9.5. Discussion
    15. 10. Software Similarity Analysis
      1. 10.1. Applications
        1. 10.1.1. Clone Detection
        2. 10.1.2. Software Forensics
        3. 10.1.3. Plagiarism Detection
        4. 10.1.4. Birthmark Detection
      2. 10.2. Definitions
        1. 10.2.1. Similarity Measures
          1. 10.2.1.1. Sequence Similarity
          2. 10.2.1.2. Set Similarity
          3. 10.2.1.3. Graph Similarity
      3. 10.3. k-gram-Based Analysis
        1. 10.3.1. SSSWAWINNOW: Selecting k-gram Hashes
        2. 10.3.2. SSSWAMOSS: Software Plagiarism Detection
          1. 10.3.2.1. Example—Source Code
        3. 10.3.3. SSMCkgram: k-gram Java Bytecode Birthmarks
      4. 10.4. API-Based Analysis
        1. 10.4.1. SSTNMM: Object-Oriented Birthmarks
        2. 10.4.2. SSTONMM: Dynamic Function Call Birthmarks
        3. 10.4.3. SSSDL: Dynamic k-gram API Birthmarks
      5. 10.5. Tree-Based Analysis
        1. 10.5.1. SSEFM: AST-Based Clone Detection
      6. 10.6. Graph-Based Analysis
        1. 10.6.1. SSKH: PDG-Based Clone Detection
        2. 10.6.2. SSLCHY: PDG-Based Plagiarism Detection
        3. 10.6.3. SSMCwpp: Dynamic Whole Program Birthmarks
      7. 10.7. Metrics-Based Analysis
        1. 10.7.1. SSKK: Metrics-Based Clone Detection
        2. 10.7.2. SSLM: Metrics-Based Authorship Analysis
          1. 10.7.2.1. Classification Using Histograms
          2. 10.7.2.2. Selecting the Right Metrics
      8. 10.8. Discussion
    16. 11. Hardware for Protecting Software
      1. 11.1. Anti-Piracy by Physical Distribution
        1. 11.1.1. Distribution Disk Protection
          1. 11.1.1.1. Audio CD Protection
          2. 11.1.1.2. CD-ROM Protection
          3. 11.1.1.3. Floppy Disk Protection
          4. 11.1.1.4. DVD Movie Protection
          5. 11.1.1.5. High-Definition Movie Protection
        2. 11.1.2. Dongles and Tokens
          1. 11.1.2.1. A Typical Dongle API
          2. 11.1.2.2. Attacking Dongles
      2. 11.2. Authenticated Boot Using a Trusted Platform Module
        1. 11.2.1. Trusted Boot
        2. 11.2.2. Taking Measurements
        3. 11.2.3. The TPM
        4. 11.2.4. The Challenge
        5. 11.2.5. Social Trust and Privacy Issues
        6. 11.2.6. Applications and Controversies
      3. 11.3. Encrypted Execution
        1. 11.3.1. The XOM Architecture
          1. 11.3.1.1. ISA Modifications
        2. 11.3.2. Preventing Replay Attacks
        3. 11.3.3. Fixing a Leaky Address Bus
        4. 11.3.4. Fixing a Leaky Data Bus
        5. 11.3.5. Discussion
      4. 11.4. Attacks on Tamperproof Devices
        1. 11.4.1. Tapping the Bus—The Microsoft XBOX Hack
        2. 11.4.2. Injecting Ciphertext—Dallas Semiconductor DS5002FP
        3. 11.4.3. Hacking Smartcards
          1. 11.4.3.1. Invasive Attacks
        4. 11.4.4. Non-Invasive Attacks
          1. 11.4.4.1. Fault Induction Attacks
          2. 11.4.4.2. Timing Attacks
          3. 11.4.4.3. Power Analysis Attacks
          4. 11.4.4.4. Countermeasures
        5. 11.4.5. Board-Level Protection
      5. 11.5. Discussion
    17. Bibliography