Practical FPGA Programming in C

Book Description

C-based techniques for building high-performance, FPGA-accelerated software applications

Circuits, Devices, and Systems

C-based Techniques for Optimizing FPGA Performance, Design Flexibility, and Time to Market

Foreword written by Clive "Max" Maxfield.

High-performance FPGA-accelerated software applications are in growing demand in fields ranging from communications and image processing to biomedical and scientific computing. This book introduces powerful, C-based parallel-programming techniques for creating these applications, verifying them, and moving them into FPGA hardware.

The authors bridge the chasm between "conventional" software development and the methods and philosophies of FPGA-based digital design. Software engineers will learn to look at FPGAs as "just another programmable computing resource," while achieving phenomenal performance because much of their code is running directly in hardware. Hardware engineers will master techniques that perfectly complement their existing HDL expertise, while allowing them to explore design alternatives and create prototypes far more rapidly. Both groups will learn how to leverage C to support efficient hardware/software co-design and improve compilation, debugging, and testing.

  • Understand when C makes sense in FPGA development and where it fits into your existing processes

  • Leverage C to implement software applications directly onto mixed hardware/software platforms

  • Execute and test the same C algorithms in desktop PC environments and in-system using embedded processors

  • Master new, C-based programming models and techniques optimized for highly parallel FPGA platforms

  • Supercharge performance by optimizing through automated compilation

  • Use multiple-process streaming programming models to deliver truly astonishing performance

  • Preview the future of FPGA computing

  • Study an extensive set of realistic C code examples

Table of Contents

    1. Copyright
      1. Dedication
    2. Prentice Hall Modern Semiconductor Design Series
    3. Foreword
      1. Why is this book of interest to the hardware folks?
      2. And what about the software guys and gals?
      3. So what's the catch?
    4. Preface
      1. C Language for FPGA-Based Hardware Design?
      2. Compelling Platforms for Software Acceleration
      3. The Power to Experiment
      4. How This Book Is Organized
      5. Where This Book Came From
    5. Acknowledgments
    6. 1. The FPGA as a Computing Platform
      1. 1.1. A Quick Introduction to FPGAs
        1. Common FPGA Characteristics
        2. FPGA Programming Technologies
      2. 1.2. FPGA-Based Programmable Hardware Platforms
      3. 1.3. Increasing Performance While Lowering Costs
      4. 1.4. The Role of Tools
        1. An Emphasis on Software-Based Methods
      5. 1.5. The FPGA as an Embedded Software Platform
      6. 1.6. The Importance of a Programming Abstraction
      7. 1.7. When Is C Language Appropriate for FPGA Design?
      8. 1.8. How to Use This Book
    7. 2. A Brief History of Programmable Platforms
      1. 2.1. The Origins of Programmable Logic
        1. New Methods of Design Are Required
        2. Early HDLs Increase Design Abstraction
        3. Larger Devices, More-Complex Programming Tools
      2. 2.2. Reprogrammability, HDLs, and the Rise of the FPGA
      3. 2.3. Systems on a Programmable Chip
        1. Toward Faster System Prototypes and Higher Performance
      4. 2.4. FPGAs for Parallel Computing
        1. Reconfigurable Computing and the FPGA
      5. 2.5. Summary
    8. 3. A Programming Model for FPGA-Based Applications
      1. 3.1. Parallel Processing Models
        1. SISD: The Original Processor Machine Model
        2. The SIMD Machine Model
        3. MIMD Machines and the Transputer
        4. Shared Memory MIMD Architectures
      2. 3.2. FPGAs as Parallel Computing Machines
        1. Adding Soft Processors to the Mix
      3. 3.3. Programming for Parallelism
      4. 3.4. Communicating Process Programming Models
      5. 3.5. The Impulse C Programming Model
      6. 3.6. Summary
    9. 4. An Introduction to Impulse C
      1. 4.1. The Motivation Behind Impulse C
      2. 4.2. The Impulse C Programming Model
      3. 4.3. A Minimal Impulse C Program
        1. The Software Source File: HelloFPGA_sw.c
        2. The Hardware Source File: HelloFPGA_hw.c
      4. 4.4. Processes, Streams, Signals, and Memory
      5. 4.5. Impulse C Signed and Unsigned Datatypes
      6. 4.6. Understanding Processes
        1. Creating Processes
      7. 4.7. Understanding Streams
        1. Stream I/O
      8. 4.8. Using Output Streams
      9. 4.9. Using Input Streams
        1. Checking for End of Stream
        2. Efficient Use of Stream Reads
          1. Method 1 (preferred method)
          2. Method 2 (acceptable method)
          3. Method 3 (less acceptable)
      10. 4.10. Avoiding Stream Deadlocks
        1. Using Nonblocking Stream Reads
        2. Deadlocks and the PIPELINE Pragma
      11. 4.11. Creating and Using Signals
      12. 4.12. Understanding Registers
        1. Controlling Registers from Software Processes
      13. 4.13. Using Shared Memories
      14. 4.14. Memory and Stream Performance Considerations
        1. Micro-Benchmark Introduction
        2. Memory Test Results for the Altera Nios Platform
        3. Memory Test Results for the Xilinx PowerPC Platform
        4. Memory Test Results for the Xilinx MicroBlaze Platform
      15. 4.15. Summary
    10. 5. Describing a FIR Filter
      1. 5.1. Design Overview
      2. 5.2. The FIR Filter Hardware Process
      3. 5.3. The Software Test Bench
        1. The Producer Process
        2. The Consumer Process
      4. 5.4. Desktop Simulation
      5. 5.5. Application Monitoring
        1. Monitoring with Log Windows
      6. 5.6. Summary
    11. 6. Generating FPGA Hardware
      1. 6.1. The Hardware Generation Flow
        1. Optimization and Hardware Generation
      2. 6.2. Understanding the Generated Structure
      3. 6.3. Stream and Signal Interfaces
        1. Stream and Signal Protocols
        2. Streams Used in Write Mode
        3. Streams Used in Read Mode
        4. Signals Used in Post (Write) Mode
        5. Signals Used in Wait (Read) Mode
      4. 6.4. Using HDL Simulation to Understand Stream Protocols
      5. 6.5. Debugging the Generated Hardware
      6. 6.6. Hardware Generation Notes
        1. Instruction-Level Pragmas
        2. Pragma CO PIPELINE
        3. Pragma CO UNROLL
        4. Pragma CO SET StageDelay
        5. Understanding Latency and Rate
        6. Controlling Stage Delays
      7. 6.7. Making Efficient Use of the Optimizers
        1. The Stage Master Optimizer
        2. Instruction Scheduling and Assignments
        3. Impacts of Memory Access
      8. 6.8. Language Constraints for Hardware Processes
        1. No Support for Hardware Function Calls
        2. Integer Math Operations
        3. Shift Operations
        4. Support for Datatypes Is Limited
        5. Limited Support for Pointers
        6. Using Pointers with Multidimensional Arrays
      9. 6.9. Summary
    12. 7. Increasing Statement-Level Parallelism
      1. 7.1. A Model of FPGA Computation
      2. 7.2. C Language Semantics and Parallelism
      3. 7.3. Exploiting Instruction-Level Parallelism
        1. Instruction Scheduling
        2. Pipeline Generation
        3. Optimizer Operation
        4. Expression-Level Optimizations
        5. Optimization Within Basic Blocks
      4. 7.4. Limiting Instruction Stages
        1. Reduce Memory Accesses for Higher Performance
        2. Array Splitting
      5. 7.5. Unrolling Loops
      6. 7.6. Pipelining Explained
        1. Pipeline Rate
      7. 7.7. Summary
    13. 8. Porting a Legacy Application to Impulse C
      1. 8.1. The Triple-DES Algorithm
      2. 8.2. Converting the Algorithm to a Streaming Model
      3. 8.3. Performing Software Simulation
      4. 8.4. Compiling to Hardware
      5. 8.5. Preliminary Hardware Analysis
        1. Initial Results: 10.6X Performance Increase
      6. 8.6. Summary
    14. 9. Creating an Embedded Test Bench
      1. 9.1. A Mixed Hardware and Software Approach
        1. Considering Data Transfer Overhead
      2. 9.2. The Embedded Processor as a Test Generator
        1. A Unit Test Philosophy
        2. Desktop Simulation Versus Embedded Test Benches
        3. Data Throughput and Processor Selection
        4. Moving Test Generators to Hardware
      3. 9.3. The Role of Hardware Simulators
      4. 9.4. Testing the Triple-DES Algorithm in Hardware
        1. Platform Selection
        2. Software and Hardware Algorithm Comparison
      5. 9.5. Software Stream Macro Interfaces
      6. 9.6. Building the Test System
        1. Specifying the Platform Support Package
        2. Generating HDL for the Hardware Process
        3. Creating the Platform Using the Xilinx Tools
        4. Creating a Platform Studio Project and Choosing a Board
        5. Configuring the MicroBlaze Processor
        6. Exporting Files from the Impulse Tools
        7. Importing the Generated Hardware
        8. Adding the 3DES Hardware IP Core
        9. Setting the FSL Bus Parameters and Connections for the Core
        10. Adding and Configuring an OPB Timer Core
        11. Specifying the Peripheral Addresses
        12. Specifying the System Clock Pin
        13. Importing the Application Software
        14. Generating the FPGA Bitmap
        15. Downloading and Running the Application
      7. 9.7. Summary
    15. 10. Optimizing C for FPGA Performance
      1. 10.1. Rethinking an Algorithm for Performance
      2. 10.2. Refinement 1: Reducing Size by Introducing a Loop
      3. 10.3. Refinement 2: Array Splitting
      4. 10.4. Refinement 3: Improving Streaming Performance
      5. 10.5. Refinement 4: Loop Unrolling
      6. 10.6. Refinement 5: Pipelining the Main Loop
        1. Using Stage Master Explorer
        2. The Goal of Pipelining
      7. 10.7. Summary
    16. 11. Describing System-Level Parallelism
      1. 11.1. Design Overview
      2. 11.2. Performing Desktop Simulation
      3. 11.3. Refinement 1: Creating Parallel 8-Bit Filters
      4. 11.4. Refinement 2: Creating a System-Level Pipeline
        1. The DMA Input Process
        2. The Column Generator Process
        3. The Image Filter Process
        4. The Stream to Memory Process
        5. The Configuration Function
      5. 11.5. Moving the Application to Hardware
        1. Generating the FPGA Hardware
        2. Exporting the Generated Files
        3. Creating a New Quartus and SOPC Builder Project
        4. Creating the New Platform Using SOPC Builder
        5. Configuring the FPGA Platform
        6. Adding Nios II Peripherals
          1. The Timer Peripheral
          2. External Flash Memory Interface
          3. External RAM Interface
          4. JTAG UART Interface
          5. External RAM Bus (Avalon Tri-State Bridge)
        7. Adding the Hardware Process Module (img_arch)
        8. Setting Additional CPU Settings
        9. Generating the System
        10. Connecting the Generated System to FPGA Pins
        11. FPGA Pin Assignment
        12. Generating the FPGA Bitmap
        13. Running the Test Application on the Platform
      6. 11.6. Summary
    17. 12. Combining Impulse C with an Embedded Operating System
      1. 12.1. The uClinux Operating System
        1. Combining uClinux and Impulse C
      2. 12.2. A uClinux Demonstration Project
        1. The Impulse C Project Files
        2. Building the Application for the Target Platform
        3. Copying the Sample uClinux Platform Files
        4. Specifying the Platform Support Package
        5. Building the Image Filter Hardware
        6. Exporting the Software and Hardware
        7. Modifying the Sample uClinux Platform
        8. Configuring the Platform with the Image Filter IP Core
        9. Making the FSL Connections
        10. Connecting the Image Filter Clock and Reset Lines
        11. Generating the FPGA Bitmap
        12. Downloading the Kernel Image
        13. Building and Testing the Image Filter Software
        14. Using TFTP to Transfer Files to the Board
        15. Running the Image Filter Program
      3. 12.3. Summary
    18. 13. Mandelbrot Image Generation
      1. 13.1. Design Overview
        1. The Mandelbrot Set
        2. Accuracy Versus Processing Requirements
      2. 13.2. Expressing the Algorithm in C
      3. 13.3. Creating a Fixed-Point Equivalent
      4. 13.4. Creating a Streaming Version
        1. The Impulse C Process
      5. 13.5. Parallelizing the Algorithm
        1. Partitioning the Problem
        2. Output Synchronization
        3. The Configuration Function
      6. 13.6. Future Refinements
      7. 13.7. Summary
    19. 14. The Future of FPGA Computing
      1. 14.1. The FPGA as a High-Performance Computer
        1. Taking Parallelism to Extreme Levels
        2. Many Platforms to Choose From
        3. Taking a Software Approach
      2. 14.2. The Future of FPGA Computing
        1. Bigger FPGAs and Increased System Integration
      3. 14.3. Summary
    20. A. Getting the Most Out of Embedded FPGA Processors
      1. A.1. FPGA Embedded Processor Overview
        1. Soft Versus Hard Processor
        2. Advantages of an FPGA Embedded Processor
          1. Customization
          2. Obsolescence Mitigation
          3. Component and Cost Reduction
          4. Hardware Acceleration
        3. Disadvantages
      2. A.2. Peripherals and Memory Controllers
        1. Peripheral Types
        2. Memory Controllers
      3. A.3. Increasing Processor Performance
        1. Manufacturers' Benchmarks
        2. Performance-Enhancing Techniques
      4. A.4. Optimization Techniques That Are Not FPGA-Specific
        1. Code Manipulation
          1. Optimization Level
          2. Use of Manufacturer-Optimized Instructions
          3. Assembly
          4. Miscellaneous
        2. Memory Usage
          1. Local Memory Only
          2. External Memory Only
          3. Cache External Memory
          4. Partitioning Code into Internal, External, and Cached Memory
      5. A.5. FPGA-Specific Optimization Techniques
        1. Increasing the FPGA's Operating Frequency
          1. Logic Optimization and Reduction
          2. Area and Timing Constraints
        2. Hardware Acceleration
          1. Turn on the Hardware Divider and Barrel-Shifter
          2. Software Bottlenecks Converted to Coprocessing Hardware
      6. A.6. Summary
        1. Have Reasonable Expectations
        2. Optimization Through Experimentation Yields the Best Results
        3. Take Advantage of Superior Flexibility in FPGAs
        4. Acknowledgments
    21. B. Creating a Custom Stream Interface
      1. B.1. Application Overview
      2. B.2. The DS92LV16 Serial Link for Data Streaming
        1. Initializing the Serial Connection
      3. B.3. Stream Interface State Machine Description
        1. State: send_sync
        2. State: send_sync_240 State: send_flags_240
        3. State: send_ack1
        4. State: send_ack2
        5. State: connected
      4. B.4. Data Transmission
        1. Future Refinements
      5. B.5. Summary
    22. C. Impulse C Function Reference
      1. CO_ARCHITECTURE_CREATE
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      2. CO_BIT_EXTRACT
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      3. CO_BIT_EXTRACT_U
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      4. CO_BIT_INSERT
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      5. CO_BIT_INSERT_U
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      6. CO_EXECUTE
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      7. CO_INITIALIZE
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      8. CO_MEMORY_CREATE
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      9. CO_MEMORY_PTR
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      10. CO_MEMORY_READBLOCK
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      11. CO_MEMORY_WRITEBLOCK
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      12. CO_PAR_BREAK
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      13. CO_PROCESS_CONFIG
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      14. CO_PROCESS_CREATE
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      15. CO_REGISTER_CREATE
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      16. CO_REGISTER_GET
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
      17. CO_REGISTER_PUT
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      18. CO_REGISTER_READ
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
      19. CO_REGISTER_WRITE
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
      20. CO_SIGNAL_CREATE
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      21. CO_SIGNAL_POST
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      22. CO_SIGNAL_WAIT
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      23. CO_STREAM_CLOSE
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      24. CO_STREAM_CREATE
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      25. CO_STREAM_EOS
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      26. CO_STREAM_OPEN
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
      27. CO_STREAM_READ
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      28. CO_STREAM_READ_NB
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      29. CO_STREAM_WRITE
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
        5. Notes
      30. COSIM_LOGWINDOW_CREATE
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
      31. COSIM_LOGWINDOW_FWRITE
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
      32. COSIM_LOGWINDOW_INIT
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
      33. COSIM_LOGWINDOW_WRITE
        1. Header File
        2. Description
        3. Arguments
        4. Return Value
    23. D. Triple-DES Source Listings
      1. DES_HW.C
      2. DES.C
      3. DES_SW.C
      4. DES.H
    24. E. Image Filter Listings
      1. IMG_HW.C
      2. IMG_SW.C
      3. IMG.H
    25. F. Selected References