Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation

Book Description

The defining characteristic of reconfigurable computing is hardware that can be reconfigured to implement specific functionality that is better suited to specially tailored hardware than to execution on a simple uniprocessor. Reconfigurable computing systems combine microprocessors with programmable hardware to exploit the complementary strengths of hardware and software, and they have been used in applications ranging from embedded systems to high-performance computing. Many of the field's fundamental ideas were identified and developed within the hardware/software co-design research community; although the two areas share this common background, they pursue different goals and use different approaches.

This book is intended as an introduction to the entire range of issues important to reconfigurable computing, using FPGAs as the context, or "computing vehicles," for this powerful technology. It takes readers with a background in the basics of digital design and software programming and provides them with the knowledge needed to become effective designers or researchers in this rapidly evolving field.

Table of Contents

  1. Copyright
  2. The Morgan Kaufmann Series in Systems on Silicon
  3. List of Contributors
  4. Preface
    1. Acknowledgments
  5. Introduction
  6. I. Reconfigurable Computing Hardware
    1. 1. Device Architecture
      1. 1.1. Logic—The Computational Fabric
        1. 1.1.1. Logic Elements
        2. 1.1.2. Programmability
      2. 1.2. The Array and Interconnect
        1. 1.2.1. Interconnect Structures
          1. Nearest neighbor
          2. Segmented
          3. Hierarchical
        2. 1.2.2. Programmability
        3. 1.2.3. Summary
      3. 1.3. Extending Logic
        1. 1.3.1. Extended Logic Elements
          1. Fast carry chain
          2. Multipliers
          3. RAM
          4. Processor blocks
        2. 1.3.2. Summary
      4. 1.4. Configuration
        1. 1.4.1. SRAM
        2. 1.4.2. Flash Memory
        3. 1.4.3. Antifuse
        4. 1.4.4. Summary
      5. 1.5. Case Studies
        1. 1.5.1. Altera Stratix
          1. Logic architecture
          2. Routing architecture
        2. 1.5.2. Xilinx Virtex-II Pro
          1. Logic architecture
          2. Routing architecture
      6. 1.6. Summary
      7. References
    2. 2. Reconfigurable Computing Architectures
      1. 2.1. Reconfigurable Processing Fabric Architectures
        1. 2.1.1. Fine-grained
          1. Garp’s nonsymmetrical RPF
        2. 2.1.2. Coarse-grained
          1. PipeRench
      2. 2.2. RPF Integration into Traditional Computing Systems
        1. 2.2.1. Independent Reconfigurable Coprocessor Architectures
          1. RaPiD
        2. 2.2.2. Processor + RPF Architectures
          1. Loosely coupled RPF and processor architecture
          2. Tightly coupled RPF and processor
          3. Chimaera
      3. 2.3. Summary and Future Work
      4. References
    3. 3. Reconfigurable Computing Systems
      1. 3.1. Early Systems
      2. 3.2. PAM, VCC, and Splash
        1. 3.2.1. PAM
        2. 3.2.2. Virtual Computer
        3. 3.2.3. Splash
      3. 3.3. Small-scale Reconfigurable Systems
        1. 3.3.1. PRISM
        2. 3.3.2. CAL and XC6200
        3. 3.3.3. Cloning
      4. 3.4. Circuit Emulation
        1. 3.4.1. AMD/Intel
        2. 3.4.2. Virtual Wires
      5. 3.5. Accelerating Technology
        1. 3.5.1. Teramac
      6. 3.6. Reconfigurable Supercomputing
        1. 3.6.1. Cray, SRC, and Silicon Graphics
        2. 3.6.2. The CMX-2X
      7. 3.7. Non-FPGA Research
      8. 3.8. Other System Issues
      9. 3.9. The Future of Reconfigurable Systems
      10. References
    4. 4. Reconfiguration Management
      1. 4.1. Reconfiguration
      2. 4.2. Configuration Architectures
        1. 4.2.1. Single-context
        2. 4.2.2. Multi-context
        3. 4.2.3. Partially Reconfigurable
        4. 4.2.4. Relocation and Defragmentation
        5. 4.2.5. Pipeline Reconfigurable
        6. 4.2.6. Block Reconfigurable
        7. 4.2.7. Summary
      3. 4.3. Managing the Reconfiguration Process
        1. 4.3.1. Configuration Grouping
        2. 4.3.2. Configuration Caching
        3. 4.3.3. Configuration Scheduling
        4. 4.3.4. Software-based Relocation and Defragmentation
        5. 4.3.5. Context Switching
      4. 4.4. Reducing Configuration Transfer Time
        1. 4.4.1. Architectural Approaches
        2. 4.4.2. Configuration Compression
        3. 4.4.3. Configuration Data Reuse
      5. 4.5. Configuration Security
      6. 4.6. Summary
      7. References
  7. II. Programming Reconfigurable Systems
    1. 5. Compute Models and System Architectures
      1. 5.1. Compute Models
        1. 5.1.1. Challenges
        2. 5.1.2. Common Primitives
          1. Function
          2. Transform or object
        3. 5.1.3. Dataflow
          1. Single-rate synchronous dataflow
          2. Synchronous dataflow
          3. Dynamic streaming dataflow
          4. Dynamic streaming dataflow with peeks
          5. Streaming dataflow with allocation
          6. General dataflow
        4. 5.1.4. Sequential Control
          1. Finite state
          2. Sequential control with allocation
          3. Single memory pool
        5. 5.1.5. Data Parallel
        6. 5.1.6. Data-centric
        7. 5.1.7. Multi-threaded
        8. 5.1.8. Other Compute Models
      2. 5.2. System Architectures
        1. 5.2.1. Streaming Dataflow
          1. Data presence
          2. Datapath sharing
          3. Streaming coprocessors
          4. Interconnect sharing
        2. 5.2.2. Sequential Control
          1. FSMD
          2. VLIW datapath control
          3. Processor
          4. Instruction augmentation
            1. Functional Unit model
            2. Coprocessor model
          5. Phased reconfiguration manager
          6. Worker farm
        3. 5.2.3. Bulk Synchronous Parallelism
        4. 5.2.4. Data Parallel
          1. Single program, multiple data
          2. Single-instruction multiple data
          3. Vector
          4. Vector coprocessors
        5. 5.2.5. Cellular Automata
          1. Folded CA
        6. 5.2.6. Multi-threaded
          1. Communicating FSMs with datapaths
          2. Processors with channels
          3. Message passing
          4. Shared memory
        7. 5.2.7. Hierarchical Composition
      3. References
    2. 6. Programming FPGA Applications in VHDL
      1. 6.1. VHDL Programming
        1. 6.1.1. Structural Description
        2. 6.1.2. RTL Description
        3. 6.1.3. Parametric Hardware Generation
        4. 6.1.4. Finite-state Machine Datapath Example
        5. 6.1.5. Advanced Topics
          1. Delta delay
          2. Multivalued logic
      2. 6.2. Hardware Compilation Flow
        1. 6.2.1. Constraints
      3. 6.3. Limitations of VHDL
      4. References
    3. 7. Compiling C for Spatial Computing
      1. 7.1. Overview of How C Code Runs on Spatial Hardware
        1. 7.1.1. Data Connections between Operations
        2. 7.1.2. Memory
        3. 7.1.3. If-then-else Using Multiplexers
        4. 7.1.4. Actual Control Flow
        5. 7.1.5. Optimizing the Common Path
        6. 7.1.6. Summary and Challenges
      2. 7.2. Automatic Compilation
        1. 7.2.1. Hyperblocks
        2. 7.2.2. Building a Dataflow Graph for a Hyperblock
          1. Top-level build algorithms
          2. Building data edges
          3. Building muxes
          4. Predicates
          5. Ordering edges
          6. Live variables at exits
          7. Scalar variables in memory
        3. 7.2.3. DFG Optimization
          1. Constant folding
          2. Identity simplification
          3. Strength reduction
          4. Dead node elimination
          5. Common subexpression elimination
          6. Boolean value identification
          7. Type-based operator size reduction
          8. Dataflow analysis-based operator size reduction
          9. Memory access optimization
          10. Removing redundant loads
        4. 7.2.4. From DFG to Reconfigurable Fabric
          1. Packing operations into clock cycles
          2. Scheduling
          3. Pipelined scheduling
          4. Connecting memory nodes to the memory ports
          5. What next?
      3. 7.3. Uses and Variations of C Compilation to Hardware
        1. 7.3.1. Automatic HW/SW Partitioning
        2. 7.3.2. Programmer Assistance
          1. Useful code changes
            1. Loop interchange, reversal, and other transforms
            2. Loop fusion and fission
            3. Local arrays
            4. Control structure
            5. Address indirection
            6. Declaration of data sizes
          2. Useful annotations
          3. Integrating operator-level modules
          4. Integrating large blocks
      4. 7.4. Summary
      5. References
    4. 8. Programming Streaming FPGA Applications Using Block Diagrams in Simulink
      1. 8.1. Designing High-performance Datapaths Using Stream-based Operators
      2. 8.2. An Image-processing Design Driver
        1. 8.2.1. Converting RGB Video to Grayscale
        2. 8.2.2. Two-dimensional Video Filtering
        3. 8.2.3. Mapping the Video Filter to the BEE2 FPGA Platform
      3. 8.3. Specifying Control in Simulink
        1. 8.3.1. Explicit Controller Design with Simulink Blocks
        2. 8.3.2. Controller Design Using the Matlab M Language
        3. 8.3.3. Controller Design Using VHDL or Verilog
        4. 8.3.4. Controller Design Using Embedded Microprocessors
      4. 8.4. Component Reuse: Libraries of Simple and Complex Subsystems
        1. 8.4.1. Signal-processing Primitives
        2. 8.4.2. Tiled Subsystems
      5. 8.5. Summary
        1. Acknowledgments
      6. References
    5. 9. Stream Computations Organized for Reconfigurable Execution
      1. 9.1. Programming
        1. 9.1.1. Task Description Format
        2. 9.1.2. C++ Integration and Composition
      2. 9.2. System Architecture and Execution Patterns
        1. 9.2.1. Stream Support
        2. 9.2.2. Phased Reconfiguration
        3. 9.2.3. Sequential versus Parallel
        4. 9.2.4. Fixed-size and Standard I/O Page
      3. 9.3. Compilation
      4. 9.4. Runtime
        1. 9.4.1. Scheduling
        2. 9.4.2. Placement
        3. 9.4.3. Routing
      5. 9.5. Highlights
      6. References
    6. 10. Programming Data Parallel FPGA Applications Using the SIMD/Vector Model
      1. 10.1. SIMD Computing on FPGAs: An Example
      2. 10.2. SIMD Processing Architectures
      3. 10.3. Data Parallel Languages
      4. 10.4. Reconfigurable Computers for SIMD/Vector Processing
      5. 10.5. Variations of SIMD/Vector Computing
        1. 10.5.1. Multiple SIMD Engines
        2. 10.5.2. A Multi-SIMD Coarse-grained Array
        3. 10.5.3. SPMD Model
      6. 10.6. Pipelined SIMD/Vector Processing
      7. 10.7. Summary
        1. Acknowledgments
      8. References
    7. 11. Operating System Support for Reconfigurable Computing
      1. 11.1. History
      2. 11.2. Abstracted Hardware Resources
        1. 11.2.1. Programming Model
      3. 11.3. Flexible Binding
        1. 11.3.1. Install Time Binding
        2. 11.3.2. Runtime Binding
        3. 11.3.3. Fast CAD for Flexible Binding
      4. 11.4. Scheduling
        1. 11.4.1. On-demand Scheduling
        2. 11.4.2. Static Scheduling
        3. 11.4.3. Dynamic Scheduling
        4. 11.4.4. Quasi-static Scheduling
        5. 11.4.5. Real-time Scheduling
        6. 11.4.6. Preemption
      5. 11.5. Communication
        1. 11.5.1. Communication Styles
          1. Shared memory
          2. Method calls
          3. Streams
        2. 11.5.2. Virtual Memory
        3. 11.5.3. I/O
        4. 11.5.4. Uncertain Communication Latency
      6. 11.6. Synchronization
        1. 11.6.1. Explicit Synchronization
        2. 11.6.2. Implicit Synchronization
        3. 11.6.3. Deadlock Prevention
      7. 11.7. Protection
        1. 11.7.1. Hardware Protection
        2. 11.7.2. Intertask Communication
        3. 11.7.3. Task Configuration Protection
      8. 11.8. Summary
      9. References
    8. 12. The JHDL Design and Debug System
      1. 12.1. JHDL Background and Motivation
      2. 12.2. The JHDL Design Language
        1. 12.2.1. Level-1 Design: Primitive Instantiation
        2. 12.2.2. Level-2 Design: Using the Logic Class and Its Provided Methods
        3. 12.2.3. Level-3 Design: Programmatic Circuit Generation (Module Generators)
        4. 12.2.4. JHDL Is a Structural Design Language
        5. 12.2.5. JHDL Is a Programmatic Circuit Design Language
      3. 12.3. The JHDL CAD System
        1. 12.3.1. Testbenches in JHDL
        2. 12.3.2. The cvt Class
      4. 12.4. JHDL’s Hardware Mode
      5. 12.5. Advanced JHDL Capabilities
        1. 12.5.1. Dynamic Testbenches
        2. 12.5.2. Behavioral Synthesis
        3. 12.5.3. Advanced Debugging Capabilities
          1. Debug circuitry synthesis
          2. Checkpointing, context switching, and remote access
      6. 12.6. Summary
      7. References
  8. III. Mapping Designs to Reconfigurable Platforms
    1. 13. Technology Mapping
      1. 13.1. Structural Mapping Algorithms
        1. 13.1.1. Cut Generation
        2. 13.1.2. Area-oriented Mapping
        3. 13.1.3. Performance-driven Mapping
        4. 13.1.4. Power-aware Mapping
      2. 13.2. Integrated Mapping Algorithms
        1. 13.2.1. Simultaneous Logic Synthesis, Mapping
        2. 13.2.2. Integrated Retiming, Mapping
        3. 13.2.3. Placement-driven Mapping
      3. 13.3. Mapping Algorithms for Heterogeneous Resources
        1. 13.3.1. Mapping to LUTs of Different Input Sizes
        2. 13.3.2. Mapping to Complex Logic Blocks
        3. 13.3.3. Mapping Logic to Embedded Memory Blocks
        4. 13.3.4. Mapping to Macrocells
      4. 13.4. Summary
      5. References
    2. FPGA Placement
    3. 14. Placement for General-purpose FPGAs
      1. 14.1. The FPGA Placement Problem
        1. 14.1.1. Device Legality Constraints
        2. 14.1.2. Optimization Goals
        3. 14.1.3. Designer Placement Directives
      2. 14.2. Clustering
      3. 14.3. Simulated Annealing for Placement
        1. 14.3.1. VPR and Related Annealing Algorithms
        2. 14.3.2. Simultaneous Placement and Routing with Annealing
      4. 14.4. Partition-based Placement
      5. 14.5. Analytic Placement
      6. 14.6. Further Reading and Open Challenges
      7. References
    4. 15. Datapath Composition
      1. 15.1. Fundamentals
        1. 15.1.1. Regularity
        2. 15.1.2. Datapath Layout
      2. 15.2. Tool Flow Overview
      3. 15.3. The Impact of Device Architecture
        1. 15.3.1. Architecture Irregularities
      4. 15.4. The Interface to Module Generators
        1. 15.4.1. The Flow Interface
        2. 15.4.2. The Data Model
        3. 15.4.3. The Library Specification
        4. 15.4.4. The Intra-module Layout
      5. 15.5. The Mapping
        1. 15.5.1. 1:1 Mapping
        2. 15.5.2. N:1 Mapping
        3. 15.5.3. The Combined Approach
      6. 15.6. Placement
        1. 15.6.1. Linear Placement
        2. 15.6.2. Constrained Two-dimensional Placement
        3. 15.6.3. Two-dimensional Placement
      7. 15.7. Compaction
        1. 15.7.1. Selecting HWOPs for Compaction
        2. 15.7.2. Regularity Analysis
        3. 15.7.3. Optimization Techniques
          1. Word-level optimization
          2. Context-sensitive optimization
          3. Logic optimization
        4. 15.7.4. Building the Super-HWOP
        5. 15.7.5. Discussion
      8. 15.8. Summary and Future Work
      9. References
    5. 16. Specifying Circuit Layout on FPGAs
      1. 16.1. The Problem
      2. 16.2. Explicit Cartesian Layout Specification
      3. 16.3. Algebraic Layout Specification
        1. 16.3.1. Case Study: Batcher’s Bitonic Sorter
      4. 16.4. Layout Verification for Parameterized Designs
      5. 16.5. Summary
      6. References
    6. 17. PathFinder: A Negotiation-based, Performance-driven Router for FPGAs
      1. 17.1. The History of PathFinder
      2. 17.2. The PathFinder Algorithm
        1. 17.2.1. The Circuit Graph Model
        2. 17.2.2. A Negotiated Congestion Router
        3. 17.2.3. The Negotiated Congestion/Delay Router
        4. 17.2.4. Applying A* to PathFinder
      3. 17.3. Enhancements and Extensions to PathFinder
        1. 17.3.1. Incremental Rerouting
        2. 17.3.2. The Cost Function
        3. 17.3.3. Resource Cost
        4. 17.3.4. The Relationship of PathFinder to Lagrangian Relaxation
        5. 17.3.5. Circuit Graph Extensions
          1. Symmetric device inputs
          2. De-multiplexers
          3. Bidirectional switches
      4. 17.4. Parallel PathFinder
      5. 17.5. Other Applications of the PathFinder Algorithm
      6. 17.6. Summary
        1. Acknowledgments
      7. References
    7. 18. Retiming, Repipelining, and C-slow Retiming
      1. 18.1. Retiming: Concepts, Algorithm, and Restrictions
      2. 18.2. Repipelining and C-slow Retiming
        1. 18.2.1. Repipelining
        2. 18.2.2. C-slow Retiming
      3. 18.3. Implementations of Retiming
      4. 18.4. Retiming on Fixed-frequency FPGAs
      5. 18.5. C-slowing as Multi-threading
      6. 18.6. Why Isn’t Retiming Ubiquitous?
      7. References
    8. 19. Configuration Bitstream Generation
      1. 19.1. The Bitstream
      2. 19.2. Downloading Mechanisms
      3. 19.3. Software to Generate Configuration Data
      4. 19.4. Summary
      5. References
    9. 20. Fast Compilation Techniques
      1. 20.1. Accelerating Classical Techniques
        1. 20.1.1. Accelerating Simulated Annealing
        2. 20.1.2. Accelerating PathFinder
      2. 20.2. Alternative Algorithms
        1. 20.2.1. Multiphase Solutions
        2. 20.2.2. Incremental Place and Route
      3. 20.3. Effect of Architecture
      4. 20.4. Summary
      5. References
  9. IV. Application Development
    1. 21. Implementing Applications with FPGAs
      1. 21.1. Strengths and Weaknesses of FPGAs
        1. 21.1.1. Time to Market
        2. 21.1.2. Cost
        3. 21.1.3. Development Time
        4. 21.1.4. Power Consumption
        5. 21.1.5. Debug and Verification
        6. 21.1.6. FPGAs and Microprocessors
      2. 21.2. Application Characteristics and Performance
        1. 21.2.1. Computational Characteristics and Performance
          1. Data parallelism
          2. Data element size and arithmetic complexity
          3. Pipelining
          4. Simple control requirements
        2. 21.2.2. I/O and Performance
      3. 21.3. General Implementation Strategies for FPGA-based Systems
        1. 21.3.1. Configure-once
        2. 21.3.2. Runtime Reconfiguration
          1. Global RTR
          2. Local RTR
          3. RTR applications
        3. 21.3.3. Summary of Implementation Issues
      4. 21.4. Implementing Arithmetic in FPGAs
        1. 21.4.1. Fixed-point Number Representation and Arithmetic
        2. 21.4.2. Floating-point Arithmetic
        3. 21.4.3. Block Floating Point
        4. 21.4.4. Constant Folding and Data-oriented Specialization
      5. 21.5. Summary
      6. References
    2. 22. Instance-specific Design
      1. 22.1. Instance-specific Design
        1. 22.1.1. Taxonomy
          1. Types of instance-specific optimizations
            1. Constant folding
            2. Function adaptation
            3. Architecture adaptation
        2. 22.1.2. Approaches
        3. 22.1.3. Examples of Instance-specific Designs
          1. Constant coefficient multipliers
          2. Key-specific crypto-processors
          3. Network intrusion detection
          4. Customizable instruction processors
      2. 22.2. Partial Evaluation
        1. 22.2.1. Motivation
        2. 22.2.2. Process of Specialization
        3. 22.2.3. Partial Evaluation in Practice
          1. Constant folding in logical expressions
          2. Unnecessary logic removal
        4. 22.2.4. Partial Evaluation of a Multiplier
          1. Optimizing a simple description
          2. Functional specialization for constant inputs
          3. Geometric specialization
        5. 22.2.5. Partial Evaluation at Runtime
        6. 22.2.6. FPGA-specific Concerns
          1. LUT mapping
          2. Static resources
          3. Verification of runtime specialization
      3. 22.3. Summary
      4. References
    3. 23. Precision Analysis for Fixed-point Computation
      1. 23.1. Fixed-point Number System
        1. 23.1.1. Multiple-wordlength Paradigm
        2. 23.1.2. Optimization for Multiple Wordlength
      2. 23.2. Peak Value Estimation
        1. 23.2.1. Analytic Peak Estimation
          1. Linear time-invariant systems
            1. Transfer function calculation
            2. Example
            3. Scaling with transfer functions
          2. Data range propagation
            1. Forward propagation
        2. 23.2.2. Simulation-based Peak Estimation
        3. 23.2.3. Summary of Peak Estimation
      3. 23.3. Wordlength Optimization
        1. 23.3.1. Error Estimation and Area Models
          1. Simulation-based methods
          2. An analytic technique for linear time-invariant systems
            1. Noise model
            2. Noise propagation and power estimation
          3. A hybrid approach for nonlinear differentiable systems
            1. Perturbation analysis
            2. Derivative monitors
            3. Linearization
            4. Noise injection
          4. High-level area models
        2. 23.3.2. Search Techniques
          1. A heuristic search procedure
          2. Alternative search procedures
      4. 23.4. Summary
      5. References
    4. 24. Distributed Arithmetic
      1. 24.1. Theory
      2. 24.2. DA Implementation
      3. 24.3. Mapping DA onto FPGAs
      4. 24.4. Improving DA Performance
      5. 24.5. An Application of DA on an FPGA
      6. References
    5. 25. CORDIC Architectures for FPGA Computing
      1. 25.1. CORDIC Algorithm
        1. 25.1.1. Rotation Mode
        2. 25.1.2. Scaling Considerations
        3. 25.1.3. Vectoring Mode
        4. 25.1.4. Multiple Coordinate Systems and a Unified Description
        5. 25.1.5. Computational Accuracy
          1. Angle approximation error
          2. Datapath rounding error
      2. 25.2. Architectural Design
      3. 25.3. FPGA Implementation of CORDIC Processors
        1. 25.3.1. Convergence
        2. 25.3.2. Folded CORDIC
        3. 25.3.3. Parallel Linear Array
        4. 25.3.4. Scaling Compensation
      4. 25.4. Summary
      5. References
    6. 26. Hardware/Software Partitioning
      1. 26.1. The Trend Toward Automatic Partitioning
      2. 26.2. Partitioning of Sequential Programs
        1. 26.2.1. Granularity
        2. 26.2.2. Partition Evaluation
        3. 26.2.3. Alternative Region Implementations
        4. 26.2.4. Implementation Models
        5. 26.2.5. Exploration
          1. Simple formulation
          2. Formulation with asymmetric communication and greedy/nongreedy automated heuristics
          3. Complex formulations and powerful automated heuristics
          4. Other issues
      3. 26.3. Partitioning of Parallel Programs
        1. 26.3.1. Differences among Parallel Programming Models
          1. Granularity
          2. Evaluation
          3. Alternative region implementations
          4. Implementation models
          5. Exploration
      4. 26.4. Summary and Directions
      5. References
  10. V. Case Studies of FPGA Applications
    1. 27. SPIHT Image Compression
      1. 27.1. Background
      2. 27.2. SPIHT Algorithm
        1. 27.2.1. Wavelets and the Discrete Wavelet Transform
        2. 27.2.2. SPIHT Coding Engine
      3. 27.3. Design Considerations and Modifications
        1. 27.3.1. Discrete Wavelet Transform Architectures
        2. 27.3.2. Fixed-point Precision Analysis
        3. 27.3.3. Fixed Order SPIHT
      4. 27.4. Hardware Implementation
        1. 27.4.1. Target Hardware Platform
        2. 27.4.2. Design Overview
        3. 27.4.3. Discrete Wavelet Transform Phase
        4. 27.4.4. Maximum Magnitude Phase
        5. 27.4.5. The SPIHT Coding Phase
      5. 27.5. Design Results
      6. 27.6. Summary and Future Work
      7. References
    2. 28. Automatic Target Recognition Systems on Reconfigurable Devices
      1. 28.1. Automatic Target Recognition Algorithms
        1. 28.1.1. Focus of Attention
        2. 28.1.2. Second-level Detection
      2. 28.2. Dynamically Reconfigurable Designs
        1. 28.2.1. Algorithm Modifications
        2. 28.2.2. Image Correlation Circuit
        3. 28.2.3. Performance Analysis
        4. 28.2.4. Template Partitioning
        5. 28.2.5. Implementation Method
      3. 28.3. Reconfigurable Static Design
        1. 28.3.1. Design-specific Parameters
        2. 28.3.2. Order of Correlation Tasks
          1. Zero mask rows
        3. 28.3.3. Reconfigurable Image Correlator
        4. 28.3.4. Application-specific Computation Unit
      4. 28.4. ATR Implementations
        1. 28.4.1. A Dynamically Reconfigurable System
        2. 28.4.2. A Statically Reconfigurable System
        3. 28.4.3. Reconfigurable Computing Models
      5. 28.5. Summary
        1. Acknowledgments
      6. References
    3. 29. Boolean Satisfiability: Creating Solvers Optimized for Specific Problem Instances
      1. 29.1. Boolean Satisfiability Basics
        1. 29.1.1. Problem Formulation
        2. 29.1.2. SAT Applications
      2. 29.2. SAT-solving Algorithms
        1. 29.2.1. Basic Backtrack Algorithm
        2. 29.2.2. Improving the Backtrack Algorithm
      3. 29.3. A Reconfigurable SAT Solver Generated According to an SAT Instance
        1. 29.3.1. Problem Analysis
        2. 29.3.2. Implementing a Basic Backtrack Algorithm with Reconfigurable Hardware
        3. 29.3.3. Implementing an Improved Backtrack Algorithm with Reconfigurable Hardware
      4. 29.4. A Different Approach to Reduce Compilation Time and Improve Algorithm Efficiency
        1. 29.4.1. System Architecture
        2. 29.4.2. Performance
        3. 29.4.3. Implementation Issues
      5. 29.5. Discussion
      6. References
    4. 30. Multi-FPGA Systems: Logic Emulation
      1. 30.1. Background
      2. 30.2. Uses of Logic Emulation Systems
      3. 30.3. Types of Logic Emulation Systems
        1. 30.3.1. Single-FPGA Emulation
        2. 30.3.2. Multi-FPGA Emulation
        3. 30.3.3. Design-mapping Overview
        4. 30.3.4. Multi-FPGA Partitioning and Placement Approaches
        5. 30.3.5. Multi-FPGA Routing Approaches
      4. 30.4. Issues Related to Contemporary Logic Emulation
        1. 30.4.1. In-circuit Emulation
        2. 30.4.2. Coverification
        3. 30.4.3. Logic Analysis
      5. 30.5. The Need for Fast FPGA Mapping
      6. 30.6. Case Study: The VirtuaLogic VLE Emulation System
        1. 30.6.1. The VirtuaLogic VLE Emulation System Structure
        2. 30.6.2. The VirtuaLogic Emulation Software Flow
        3. 30.6.3. Multiported Memory Mapping
        4. 30.6.4. Design Mapping with Multiple Asynchronous Clocks
        5. 30.6.5. Incremental Compilation of Designs
        6. 30.6.6. VLE Interfaces for Coverification
        7. 30.6.7. Parallel FPGA Compilation for the VLE System
      7. 30.7. Future Trends
      8. 30.8. Summary
      9. References
    5. 31. The Implications of Floating Point for FPGAs
      1. 31.1. Why Is Floating Point Difficult?
        1. 31.1.1. General Implementation Considerations
        2. 31.1.2. Adder Implementation
        3. 31.1.3. Multiplier Implementation
      2. 31.2. Floating-point Application Case Studies
        1. 31.2.1. Matrix Multiply
          1. FPGA implementation
          2. Performance
        2. 31.2.2. Dot Product
          1. FPGA implementation
          2. Performance
        3. 31.2.3. Fast Fourier Transform
          1. FPGA implementation
            1. Parallel architecture
            2. Pipelined architecture
            3. Parallel–pipelined architecture
          2. Performance
      3. 31.3. Summary
      4. References
    6. 32. Finite Difference Time Domain: A Case Study Using FPGAs
      1. 32.1. The FDTD Method
        1. 32.1.1. Background
        2. 32.1.2. The FDTD Algorithm
        3. 32.1.3. FDTD Applications
          1. Ground-penetrating radar
          2. Breast cancer detection
          3. Spiral antenna model
        4. 32.1.4. The Advantages of FDTD on an FPGA
          1. Parallelism and deep pipelining
          2. Fixed-point arithmetic
      2. 32.2. FDTD Hardware Design Case Study
        1. 32.2.1. The WildStar-II Pro FPGA Computing Board
        2. 32.2.2. Data Analysis and Fixed-point Quantization
        3. 32.2.3. Hardware Implementation
          1. Memory hierarchy and memory interface
          2. Managed-cache module
            1. Memory transfer bottleneck
            2. Dataflow and processing core optimization
            3. Expansion to three dimensions
          3. Pipelining and parallelism
            1. Pipelining
            2. Parallelism
            3. Two hardware implementations
        4. 32.2.4. Performance Results
      3. 32.3. Summary
      4. References
    7. 33. Evolvable FPGAs
      1. 33.1. The POE Model of Bioinspired Design Methodologies
      2. 33.2. Artificial Evolution
        1. 33.2.1. Genetic Algorithms
      3. 33.3. Evolvable Hardware
        1. 33.3.1. Genome Encoding
          1. High-level languages
          2. Low-level languages
          3. Fitness calculation
      4. 33.4. Evolvable Hardware: A Taxonomy
        1. 33.4.1. Extrinsic Evolution
        2. 33.4.2. Intrinsic Evolution
        3. 33.4.3. Complete Evolution
          1. Centralized evolution
          2. Population-oriented evolution
        4. 33.4.4. Open-ended Evolution
      5. 33.5. Evolvable Hardware Digital Platforms
        1. 33.5.1. Xilinx XC6200 Family
        2. 33.5.2. Evolution on Commercial FPGAs
          1. Virtual reconfiguration
          2. Dynamic partial reconfiguration
        3. 33.5.3. Custom Evolvable FPGAs
      6. 33.6. Conclusions and Future Directions
      7. References
    8. 34. Network Packet Processing in Reconfigurable Hardware
      1. 34.1. Networking with Reconfigurable Hardware
        1. 34.1.1. The Motivation for Building Networks with Reconfigurable Hardware
        2. 34.1.2. Hardware and Software for Packet Processing
        3. 34.1.3. Network Data Processing with FPGAs
        4. 34.1.4. Network Processing System Modularity
      2. 34.2. Network Protocol Processing
        1. 34.2.1. Internet Protocol Wrappers
        2. 34.2.2. TCP Wrappers
        3. 34.2.3. Payload-processing Modules
        4. 34.2.4. Payload Processing with Regular Expression Scanning
        5. 34.2.5. Payload Scanning with Bloom Filters
      3. 34.3. Intrusion Detection and Prevention
        1. 34.3.1. Worm and Virus Protection
        2. 34.3.2. An Integrated Header, Payload, and Queuing System
        3. 34.3.3. Automated Worm Detection
      4. 34.4. Semantic Processing
        1. 34.4.1. Language Identification
        2. 34.4.2. Semantic Processing of TCP Data
      5. 34.5. Complete Networking System Issues
        1. 34.5.1. The Rack-mount Chassis Form Factor
        2. 34.5.2. Network Control and Configuration
        3. 34.5.3. A Reconfiguration Mechanism
        4. 34.5.4. Dynamic Hardware Plug-ins
        5. 34.5.5. Partial Bitfile Generation
        6. 34.5.6. Control Channel Security
      6. 34.6. Summary
      7. References
    9. 35. Active Pages: Memory-centric Computation
      1. 35.1. Active Pages
        1. 35.1.1. DRAM Hardware Design
        2. 35.1.2. Hardware Interface
        3. 35.1.3. Programming Model
      2. 35.2. Performance Results
        1. 35.2.1. Speedup over Conventional Systems
        2. 35.2.2. Processor–Memory Nonoverlap
        3. 35.2.3. Summary
      3. 35.3. Algorithmic Complexity
        1. 35.3.1. Algorithms
        2. 35.3.2. Array-Insert
        3. 35.3.3. LCS (Two-dimensional Dynamic Programming)
        4. 35.3.4. Summary
      4. 35.4. Exploring Parallelism
        1. 35.4.1. Speedup over Conventional
        2. 35.4.2. Multiplexing Performance
        3. 35.4.3. Processor Width Performance
        4. 35.4.4. Processor Width versus Multiplexing
          1. Nonactive memory
          2. Active Pages processing time
          3. Partitioning
        5. 35.4.5. Summary
      5. 35.5. Defect Tolerance
      6. 35.6. Related Work
      7. 35.7. Summary
        1. Acknowledgments
      8. References
  11. VI. Theoretical Underpinnings and Future Directions
    1. 36. Theoretical Underpinnings
      1. 36.1. General Computational Array Model
      2. 36.2. Implications of the General Model
        1. 36.2.1. Instruction Distribution
        2. 36.2.2. Instruction Storage
      3. 36.3. Induced Architectural Models
        1. 36.3.1. Fixed Instructions (FPGA)
        2. 36.3.2. Shared Instructions (SIMD Processors)
      4. 36.4. Modeling Architectural Space
        1. 36.4.1. Raw Density from Architecture
        2. 36.4.2. Efficiency
          1. Mismatch in Wsimd
          2. Mismatch in Ninstr
          3. Composite effects
          4. Efficiency of processors and FPGAs
        3. 36.4.3. Caveats
      5. 36.5. Implications
        1. 36.5.1. Density of Computation versus Description
        2. 36.5.2. Historical Appropriateness
        3. 36.5.3. Reconfigurable Applications
      6. References
    2. 37. Defect and Fault Tolerance
      1. 37.1. Defects and Faults
      2. 37.2. Defect Tolerance
        1. 37.2.1. Basic Idea
        2. 37.2.2. Substitutable Resources
        3. 37.2.3. Yield
          1. Perfect yield
          2. Yield with sparing
        4. 37.2.4. Defect Tolerance through Sparing
          1. Testing
          2. Global sparing
          3. Perfect component model
          4. Local sparing
        5. 37.2.5. Defect Tolerance with Matching
          1. Matching formulation
          2. Fine-grained Pterm matching
          3. FPGA component level
      3. 37.3. Transient Fault Tolerance
        1. 37.3.1. Feedforward Correction
          1. Memory
        2. 37.3.2. Rollback Error Recovery
          1. Detection
          2. Recovery
          3. Communications
      4. 37.4. Lifetime Defects
        1. 37.4.1. Detection
        2. 37.4.2. Repair
      5. 37.5. Configuration Upsets
      6. 37.6. Outlook
      7. References
    3. 38. Reconfigurable Computing and Nanoscale Architecture
      1. 38.1. Trends in Lithographic Scaling
      2. 38.2. Bottom-up Technology
        1. 38.2.1. Nanowires
        2. 38.2.2. Nanowire Assembly
        3. 38.2.3. Crosspoints
      3. 38.3. Challenges
      4. 38.4. Nanowire Circuits
        1. 38.4.1. Wired-OR Diode Logic Array
        2. 38.4.2. Restoration
      5. 38.5. Statistical Assembly
      6. 38.6. nanoPLA Architecture
        1. 38.6.1. Basic Logic Block
        2. 38.6.2. Interconnect Architecture
        3. 38.6.3. Memories
        4. 38.6.4. Defect Tolerance
        5. 38.6.5. Design Mapping
        6. 38.6.6. Density Benefits
      7. 38.7. Nanoscale Design Alternatives
        1. 38.7.1. Imprint Lithography
        2. 38.7.2. Interfacing
        3. 38.7.3. Restoration
      8. 38.8. Summary
      9. References