Software Development for Embedded Multi-core Systems: A Practical Guide Using Embedded Intel® Architecture

Book Description

The multicore revolution has reached the deployment stage in embedded systems ranging from small ultramobile devices to large telecommunication servers. The transition from single-core to multicore processors, motivated by the need to increase performance while conserving power, has placed great responsibility on software engineers. In this new embedded multicore era, the toughest task is developing code to support more sophisticated systems. This book gives embedded engineers a solid grounding in the skills required to develop software for multicore processors. The author explores performance analysis in depth and takes a close look at the tools of the trade. Both general multicore design principles and processor-specific optimization techniques are covered. The book details the critical issues in deploying multicore processors in embedded systems, including the Threading Development Cycle, with discussions of analysis, design, development, debugging, and performance tuning of threaded applications. Multiple case studies highlight software development techniques for mobility and energy efficiency and provide practical "how-to" advice on targeting the latest multicore processors. Finally, future trends are discussed, including terascale, speculative multithreading, transactional memory, interconnects, and the software implications of these upcoming architectural developments.

Table of Contents
Chapter 1 - Introduction
Chapter 2 - Basic System and Processor Architecture
Chapter 3 - Multi-core Processors & Embedded
Chapter 4 - Moving To Multi-core Intel Architecture
Chapter 5 - Scalar Optimization & Usability
Chapter 6 - Parallel Optimization Using Threads
Chapter 7 - Case Study: Data Decomposition
Chapter 8 - Case Study: Functional Decomposition
Chapter 9 - Virtualization & Partitioning
Chapter 10 - Getting Ready For Low Power Intel Architecture
Chapter 11 - Summary, Trends, and Conclusions
Appendix I
Glossary
References

* Get up to speed on multicore design! This is the only book to explain software optimization for embedded multicore systems
* Helpful tips, tricks, and design secrets from an Intel programming expert, with detailed examples using the popular x86 architecture
* Covers hot topics including ultramobile devices, low-power designs, Pthreads vs. OpenMP, and heterogeneous cores

Table of Contents

  1. Copyright
  2. Preface
    1. Why This Book?
    2. Intended Audience
  3. Acknowledgments
  4. 1. Introduction
    1. 1.1. Motivation
    2. 1.2. The Advent of Multi-core Processors
    3. 1.3. Multiprocessor Systems Are Not New
    4. 1.4. Applications Will Need to be Multi-threaded
    5. 1.5. Software Burden or Opportunity
    6. 1.6. What is Embedded?
    7. 1.7. What is Unique About Embedded?
    8. Chapter Summary
      1. References
  5. 2. Basic System and Processor Architecture
    1. 2.1. Performance
    2. 2.2. Brief History of Embedded Intel® Architecture Processors
      1. 2.2.1. Intel® 186 Processor
      2. 2.2.2. Intel386™ Processor
        1. 2.2.2.1. 32-Bit Processor
        2. 2.2.2.2. Protected Memory Model
      3. 2.2.3. Intel486™ Processor
        1. 2.2.3.1. Floating Point
        2. 2.2.3.2. Cache Memory
        3. 2.2.3.3. Pipelining
      4. 2.2.4. Intel® Pentium Processor
        1. 2.2.4.1. Superscalar Execution
        2. 2.2.4.2. Performance Monitoring Counters
      5. 2.2.5. The Intel® Pentium III Processor
        1. 2.2.5.1. OOO Execution
        2. 2.2.5.2. Streaming SIMD Extensions
      6. 2.2.6. The Intel Pentium® 4 Processor
        1. 2.2.6.1. Hyper-threading Technology
      7. 2.2.7. The Intel Pentium® M Processor
        1. 2.2.7.1. Power Utilization
      8. 2.2.8. Dual-Core Intel Xeon® Processors LV and ULV and Dual-Core Intel® Xeon® Processor 5100 Series
      9. 2.2.9. Intel® Core™ 2 Duo Processors for Embedded Computing
        1. 2.2.9.1. Intel® 64 ISA
      10. 2.2.10. Quad-Core Intel® Xeon® Processor 5300 Series
    3. 2.3. Embedded Trends and Near Term Processor Impact
      1. 2.3.1. Future 45 nm Embedded Processor
      2. 2.3.2. Intel® Atom™ Processor Core
      3. 2.3.3. Tolapai SOC Accelerator
    4. 2.4. Tutorial on x86 Assembly Language
      1. 2.4.1. X86 Assembly Basics
      2. 2.4.2. Tip #1 – Focus on Small Regions
      3. 2.4.3. Tip #2 – Quickly Identify Source and Destination
      4. 2.4.4. Tip #3 – Learn Basic Registers and Memory References
      5. 2.4.5. Tip #4 – Learn Basic Frequently Used Operations
      6. 2.4.6. Tip #5 – Your Friendly Neighborhood Reference Manual
      7. 2.4.7. Tip #6 – Beware of Compiler Optimization
      8. 2.4.8. Tip #7 – Correlating Disassembly to Source
      9. 2.4.9. Sample Assembly Walkthrough
    5. Chapter Summary
    6. Related Reading
      1. References
  6. 3. Multi-core Processors and Embedded
    1. 3.1. Motivation for Multi-core Processors
    2. 3.2. Multi-core Processor Architecture
      1. 3.2.1. Homogeneous Multi-core and Heterogeneous Multi-core
      2. 3.2.2. Symmetric and Asymmetric Multi-core
    3. 3.3. Benefits of Multi-core Processors in Embedded
    4. 3.4. Embedded Market Segments and Multi-core Processors
      1. 3.4.1. Wireless Telecommunications Infrastructure
      2. 3.4.2. Industrial Control
      3. 3.4.3. Federal (Military, Aerospace, Government)
      4. 3.4.4. Enterprise Infrastructure Security
      5. 3.4.5. In-vehicle Infotainment
      6. 3.4.6. Interactive Clients
      7. 3.4.7. Voice and Converged Communications
      8. 3.4.8. Digital Security Surveillance
      9. 3.4.9. Storage
      10. 3.4.10. Medical
    5. 3.5. Evaluating Performance of Multi-core Processors
      1. 3.5.1. Single-core Performance Benchmark Suites
      2. 3.5.2. Multi-core Performance Benchmarks
      3. 3.5.3. Power Benchmarks
      4. 3.5.4. Estimating Application Performance
      5. 3.5.5. Characterizing Embedded System Performance
      6. 3.5.6. Reviewing Benchmark Data
    6. Chapter Summary
    7. Related Reading
      1. References
  7. 4. Moving to Multi-core Intel Architecture
    1. 4.1. Migrating to Intel Architecture
      1. 4.1.1. 32-Bit Versus 64-Bit Support
        1. 4.1.1.1. 32-Bit x86 ISA
        2. 4.1.1.2. Intel® 64 Instruction Set Architecture
      2. 4.1.2. Endianness: Big to Little
        1. 4.1.2.1. Endianness Assumption
          1. Endian Neutral Code
          2. QuickTransit* Technology
          3. BiEndian Technology
      3. 4.1.3. BIOS and OSes Considerations
        1. 4.1.3.1. Instruction Set Extension Support
        2. 4.1.3.2. Symmetric Multiprocessing (SMP) Support
        3. 4.1.3.3. Advanced Technologies (*Ts)
        4. 4.1.3.4. Basic Input/Output System (BIOS)
        5. 4.1.3.5. Extensible Firmware Interface (EFI)
        6. 4.1.3.6. Desktop/Server OS
        7. 4.1.3.7. Embedded Linux OS
        8. 4.1.3.8. Embedded Windows
          1. Embedded Windows With a Real-time Extension
        9. 4.1.3.9. Embedded OS
        10. 4.1.3.10. Proprietary OS
    2. 4.2. Enabling an SMP OS
      1. 4.2.1. Basic MESI Protocol
      2. 4.2.2. Device Driver and Kernel Programming Considerations
    3. 4.3. Tools for Multi-core Processor Development
      1. 4.3.1. OpenMP
      2. 4.3.2. Automatic Parallelization
      3. 4.3.3. Speculative Precomputation
      4. 4.3.4. Thread Libraries
        1. 4.3.4.1. Pthreads and Win32 Threads
        2. 4.3.4.2. Intel® Threading Building Blocks
        3. 4.3.4.3. Multi-threaded Domain-Specific Libraries
        4. 4.3.4.4. C++ Thread Library
      5. 4.3.5. Graphical Design Tools
      6. 4.3.6. Debugger
        1. 4.3.6.1. Multi-core Aware Debugging
        2. 4.3.6.2. Thread Verification Tools
      7. 4.3.7. Performance Analysis Tools
        1. 4.3.7.1. Profiling
        2. 4.3.7.2. Thread Profiling
    4. Chapter Summary
    5. Related Reading
      1. References
  8. 5. Scalar Optimization and Usability
    1. 5.1. Compiler Optimizations
      1. 5.1.1. General Optimizations
      2. 5.1.2. Advanced Optimizations
        1. 5.1.2.1. Automatic Vectorization
        2. 5.1.2.2. Interprocedural Optimization
        3. 5.1.2.3. Profile-guided Optimization
      3. 5.1.3. Advanced Optimization Options
      4. 5.1.4. Aiding Optimizations
    2. 5.2. Optimization Process
    3. 5.3. Usability
      1. 5.3.1. Diagnostics
      2. 5.3.2. Compatibility
      3. 5.3.3. Compile Time
      4. 5.3.4. PCH Files
      5. 5.3.5. Parallel Build
      6. 5.3.6. Code Size
      7. 5.3.7. Code Coverage
      8. 5.3.8. Optimization Effects on Debug
    4. Chapter Summary
    5. Related Reading
      1. References
  9. 6. Parallel Optimization Using Threads
    1. 6.1. Parallelism Primer
      1. 6.1.1. Thread
      2. 6.1.2. Decomposition
      3. 6.1.3. Scalability
      4. 6.1.4. Parallel Execution Limiters
      5. 6.1.5. Threading Technology Requirements
    2. 6.2. Threading Development Cycle
      1. 6.2.1. Analysis
        1. 6.2.1.1. Benchmark
        2. 6.2.1.2. Tune for Serial Performance
        3. 6.2.1.3. Collect an Execution Time Profile
        4. 6.2.1.4. Collect a Call Graph Profile
        5. 6.2.1.5. Flow Chart Hotspots
        6. 6.2.1.6. Classify Most Frequently Executed Loops
      2. 6.2.2. Design and Implementation
        1. 6.2.2.1. Code Modifications
      3. 6.2.3. Debug
        1. 6.2.3.1. Basic Multi-threaded Debugging
        2. 6.2.3.2. Thread-related Bugs
          1. Data Race
          2. Thread Stall
          3. Deadlock
        3. 6.2.3.3. Logging
        4. 6.2.3.4. Finding Thread-related Bugs
      4. 6.2.4. Tune
        1. 6.2.4.1. Synchronization
        2. 6.2.4.2. Memory Hierarchy
    3. Chapter Summary
    4. Related Reading
      1. References
  10. 7. Case Study: Data Decomposition
    1. 7.1. A Medical Imaging Data Examiner
      1. 7.1.1. Build Procedure
      2. 7.1.2. Analysis
        1. 7.1.2.1. Serial Optimization
        2. 7.1.2.2. Benchmark
        3. 7.1.2.3. Serial Optimization Results
        4. 7.1.2.4. Execution Time Profile
        5. 7.1.2.5. Collect a Call Graph Profile
        6. 7.1.2.6. Flow Chart Hotspots
        7. 7.1.2.7. Classify Most Frequently Executed Loops
      3. 7.1.3. Design and Implement
      4. 7.1.4. Debug
        1. 7.1.4.1. AMIDE Loop #1 Debug
        2. 7.1.4.2. Resolving Debug Issues
      5. 7.1.5. Tune
    2. Chapter Summary
      1. References
  11. 8. Case Study: Functional Decomposition
    1. 8.1. Snort
      1. 8.1.1. Application Overview
      2. 8.1.2. Build Procedure
    2. 8.2. Analysis
      1. 8.2.1. Serial Optimization
      2. 8.2.2. Benchmark
      3. 8.2.3. Serial Optimization Results
      4. 8.2.4. Execution Time Profile
      5. 8.2.5. Call Graph Profile
    3. 8.3. Design and Implement
      1. 8.3.1. Threading Snort
      2. 8.3.2. Code Modifications
      3. 8.3.3. Flow Pinning
      4. 8.3.4. Code Modifications for Flow Pinning
    4. 8.4. Snort Debug
      1. 8.4.1. Filtering Out False Errors
    5. 8.5. Tune
    6. Chapter Summary
      1. References
  12. 9. Virtualization and Partitioning
    1. 9.1. Overview
      1. 9.1.1. Preamble
    2. 9.2. Virtualization and Partitioning
      1. 9.2.1. VMM Architectures
      2. 9.2.2. Virtualization’s Use Models and Benefits
        1. 9.2.2.1. Workload Isolation
        2. 9.2.2.2. Workload Consolidation
        3. 9.2.2.3. Workload Migration
        4. 9.2.2.4. Clustering Over SMP ... and More
        5. 9.2.2.5. Platform Management
      3. 9.2.3. Motivation in Telecom/Embedded
        1. 9.2.3.1. Business Considerations
        2. 9.2.3.2. Real-Time OS Support
        3. 9.2.3.3. High Availability
        4. 9.2.3.4. Break Through the Performance Scaling Barrier
    3. 9.3. Techniques and Design Considerations
      1. 9.3.1. Virtualization’s Challenges on x86 Processors
        1. 9.3.1.1. Privilege Management
        2. 9.3.1.2. Memory Management
        3. 9.3.1.3. Interrupts and Exceptions Management
        4. 9.3.1.4. Missing Support and Future Roadmap
    4. 9.4. Telecom Use Case of Virtualization
      1. 9.4.1. Setup and Configuration BKMs
      2. 9.4.2. Compute and Network I/O Performance
    5. Chapter Summary
    6. Related Reading
      1. References
  13. 10. Getting Ready for Low Power Intel Architecture
    1. 10.1. Architecture
      1. 10.1.1. In-order Execution
        1. Miscellaneous Techniques for Energy Efficient Software
    2. 10.2. Debugging Embedded Systems
      1. 10.2.1. Brief History of Embedded Systems Debugging
      2. 10.2.2. JTAG and Future Trends in Embedded Debugging
      3. 10.2.3. Hardware Platform Bringup
      4. 10.2.4. OS and Device Driver Debugging
        1. 10.2.4.1. Case Study: OS Kernel Debugging
        2. 10.2.4.2. Case Study: Debugging Services and Drivers
      5. 10.2.5. Application Debugging
      6. 10.2.6. Considerations for Multi-core Debugging
        1. 10.2.6.1. JTAG
        2. 10.2.6.2. Debug Handler/Application
    3. Chapter Summary
      1. References
  14. 11. Summary, Trends, and Conclusions
    1. 11.1. Trends
      1. 11.1.1. Processor Trends
      2. 11.1.2. Software Challenges
        1. 11.1.2.1. Intel® QuickAssist Technology
        2. 11.1.2.2. Multicore Association Communication API (MCAPI)
        3. 11.1.2.3. Software Transactional Memory
      3. 11.1.3. Bandwidth Challenges
    2. 11.2. Conclusions
  15. Appendix A
  16. Glossary