32/64-Bit 80x86 Assembly Language Architecture

Book Description

The increasing complexity of programming environments provides a number of opportunities for assembly language programmers. 32/64-Bit 80x86 Assembly Language Architecture attempts to break through that complexity by providing a step-by-step understanding of programming Intel and AMD 80x86 processors in assembly language. This book explains 32-bit and 64-bit 80x86 assembly language programming, including the SIMD (single instruction, multiple data) instruction supersets that bring the 80x86 processor into the realm of the supercomputer; gives insight into the FPU (floating-point unit) built into every Pentium processor; and offers strategies for optimizing code.

Table of Contents

  1. Copyright
  2. Preface
  3. 1. Introduction
    1. Conventions Used in This Book
      1. Companion Code
      2. Image Patterning
      3. Processor Legend
      4. Notes, Tips, and Hints
      5. Pseudo Vec
      6. Pseudo Vec (x86) (3DNow!) (3DNow!+) (MMX) (MMX+) (SSE) (SSE2) (SSE3)
      7. Graphics 101 (x86) (3DNow!) (MMX)
      8. Algebraic Law
      9. I-VU-Q
  4. 2. Coding Standards
    1. Constants
    2. Data Alignment
    3. Stacks and Vectors
      1. 3D Vector (Floating-Point)
      2. 3D Quad Vector (Floating-Point)
    4. Compiler Data
    5. Assertions
    6. Memory Systems
      1. RamTest Memory Alignment Test
      2. Memory Header
      3. Allocate Memory (Malloc Wrapper)
      4. Release Memory (Free Wrapper)
      5. Allocate Memory
      6. Allocate (Cleared) Memory
      7. Free Memory — Pointer Is Set to NULL
    7. Exercises
  5. 3. Processor Differential Insight
    1. Processor Overview
    2. History
    3. The 64-Bit Processor
    4. 80x86 Registers
      1. General-Purpose Registers
      2. REX
      3. Segment/Selector Registers
      4. MMX Registers
      5. XMM Registers
    5. CPU Status Registers (EFLAGS/64-Bit RFLAGS)
      1. LAHF — Load AH Flags
      2. SAHF — Save AH Flags
      3. PUSHF/PUSHFD — Push EFLAGS onto Stack
      4. PUSHFQ — Push RFLAGS onto Stack
      5. POPF/POPFD — Pop EFLAGS from Stack
      6. POPFQ — Pop RFLAGS from Stack
      7. CLC — Clear (Reset) Carry Flag
      8. STC — Set Carry Flag
      9. CMC — Complement Carry Flag
    6. NOP — No Operation
    7. Floating-Point 101
    8. Processor Data Type Encoding
    9. EMMS — Enter/Leave MMX State
    10. FEMMS — Enter/Leave MMX State
    11. Destination/Source Orientations
    12. Big/Little-Endian
    13. Alignment Quickie
    14. (Un)aligned Memory Access
      1. MOV/MOVQ — Move Data
      2. Move (Unaligned)
      3. Move (Aligned)
      4. Misaligned SSE(2) (128-bit)
      5. Aligned SSE(2) (128-bit)
      6. XCHG — Exchange (Swap) Data
    15. System Level Functionality
    16. Indirect Memory Addressing
      1. uint32 OddTable[ ]
      2. LEA — Load Effective Address
    17. Translation Table
      1. XLAT/XLATB — Translation Table Lookup
    18. String Instructions
      1. LODSB/LODSW/LODSD/LODSQ — Load String
      2. REP LODSx
      3. STOSB/STOSW/STOSD/STOSQ — Save String
      4. REP/REPE/REPZ/REPNE/REPNZ — Repeat String
      5. REP STOSx
      6. MOVSB/MOVSW/MOVSD/MOVSQ — Move String
      7. REP MOVSx
      8. CLD/STD — Clear/Set Direction Flag
    19. Special (Non-Temporal) Memory Instructions
      1. MOVNTx — Copy Using Non-Temporal Hint
      2. MOVNTPS — Copy 4×SPFP Using Non-Temporal Hint
      3. MOVNTPD — Copy 2×DPFP Using Non-Temporal Hint
      4. MASKMOVQ/MASKMOVDQU — Copy Selected Bytes
    20. Exercises
  6. 4. Bit Mangling
    1. Boolean Logical AND
      1. Pseudo Vec
        1. Logical Packed AND D=(A∧B)
        2. vmp_pand (x86) 32-bit
        3. vmp_pand (x86) 64-bit
        4. vmp_pand (MMX)
        5. vmp_pand (SSE2) Aligned Memory
        6. vmp_pand (SSE2) Unaligned Memory
    2. Boolean Logical OR
      1. Pseudo Vec
        1. Logical Packed OR D=(A∨B)
    3. Boolean Logical XOR (Exclusive OR)
      1. Pseudo Vec
        1. Logical Packed XOR D=(A⊕B)
      2. NOT — One's Complement Negation
      3. NEG — Two's Complement Negation
      4. ToolBox Snippet — The Butterfly Switch
      5. I-VU-Q
    4. Boolean Logical ANDC
      1. Pseudo Vec
        1. Logical Packed ANDC D=(A∧B′)
    5. Exercises
  7. 5. Bit Wrangling
    1. Logical Left Shifting
      1. SHL/SAL - Shift (Logical/Arithmetic) Left
      2. SHLD — Shift (Logical) Left (Double)
      3. PSLLx — Parallel Shift Left (Logical)
      4. Pseudo Vec
        1. Packed Shift Left Logical 16×8-bit by n:{0...7}
      5. Pseudo Vec (x86)
        1. vmp_psllB (MMX) 16×8-bit Vector
    2. Logical Right Shifting
      1. SHR — Shift (Logical) Right
      2. SHRD — Shift (Logical) Right (Double)
      3. PSRLx — Parallel Shift Right (Logical)
      4. Pseudo Vec
        1. Packed Shift Right Logical 16×8-bit by n:{0...7}
    3. Arithmetic Right Shifting
      1. SAR — Shift (Arithmetic) Right
      2. PSRAx — Packed Shift Right (Arithmetic)
      3. Pseudo Vec
    4. Rotate Left (or n-Right)
      1. ROL — Rotate Left
      2. RCL — Rotate Carry Left
    5. Rotate Right
      1. ROR — Rotate Right
      2. RCR — Rotate Carry Right
    6. Bit Scanning
      1. BSF — Bit Scan Forward
      2. BSR — Bit Scan Reverse
      3. ToolBox Snippet — Get Bit Count
      4. Graphics 101 — Palette Bits
    7. Exercises
  8. 6. Data Conversion
    1. Data Interlacing, Exchanging, Unpacking, and Merging
    2. Byte Swapping
      1. Little-Endian
      2. (Big/Little)-Endian to (Big/Little)-Endian Data Relationship Macros
      3. BSWAP — Byte Swap
      4. PSWAPD — Packed Swap Double Word
    3. Data Interlacing
      1. PUNPCKLBW — Parallel Extend Lower from Byte
      2. PUNPCKHBW — Parallel Extend Upper from Byte
      3. PUNPCKLWD — Parallel Extend Lower from 16-Bit
      4. PUNPCKHWD — Parallel Extend Upper from 16-Bit
      5. PUNPCKLDQ — Parallel Extend Lower from 32-Bit
        1. Also: (Unpack and Interleave Low Packed SPFP)
      6. PUNPCKHDQ — Parallel Extend Upper from 32-Bit
        1. Also: (Unpack and Interleave High Packed SPFP)
      7. MOVSS — Move Scalar (SPFP)
      8. MOVQ2DQ — Move Scalar (1×64-Bit) MMX to XMM
      9. MOVDQ2Q — Move Scalar (1×64-Bit) XMM to MMX
      10. MOVLPS — Move Low Packed (2×SPFP)
      11. MOVHPS — Move High Packed (2×SPFP)
      12. MOVLHPS — Move Low to High Packed (2×SPFP)
      13. MOVHLPS — Move High to Low Packed (2×SPFP)
      14. MOVSD — Move Scalar (1×DPFP)
      15. MOVLPD — Move Low Packed (1×DPFP)
      16. MOVHPD — Move High Packed (1×DPFP)
      17. PUNPCKLQDQ — Parallel Copy Lower (2×64-Bit)
        1. Also: (Unpack and Interleave Low Packed Double-Precision Floating-Point Values)
      18. PUNPCKHQDQ — Parallel Copy Upper (2×64-Bit)
        1. Also: (Unpack and Interleave High Packed Double-Precision Floating-Point Values)
    4. Swizzle, Shuffle, and Splat
      1. PINSRW — Shuffle (1×16-Bit) to (4×16-Bit)
      2. PSHUFW — Shuffle Packed Words (4×16-Bit)
      3. PSHUFLW — Shuffle Packed Low Words (4×16-Bit)
      4. PSHUFHW — Shuffle Packed High Words (4×16-Bit)
      5. PSHUFD — Shuffle Packed Double Words (4×32-Bit)
      6. SHUFPS — Shuffle Packed SPFP Values (4×SPFP)
      7. MOVSLDUP — Splat Packed Even SPFP to (4×SPFP)
      8. MOVSHDUP — Splat Packed Odd SPFP to (4×SPFP)
      9. MOVDDUP — Splat Lower DPFP to Packed (2×DPFP)
      10. SHUFPD — Shuffle Packed DPFP (2×64-Bit)
    5. Data Bit Expansion
      1. CBW Convert Signed AL (Byte) to AX (Word)
      2. CWDE Convert Signed AX (Word) to EAX (DWord)
      3. CDQE Convert Signed EAX (DWord) to RAX (QWord)
      4. MOVSX/MOVSXD — Move with Sign Extension
      5. MOVZX — Move with Zero Extension
      6. CWD — Convert Signed AX (Word) to DX:AX
      7. CDQ — Convert Signed EAX (DWord) to EDX:EAX
      8. CQO — Convert Signed RAX (QWord) to RDX:RAX
      9. PEXTRW — Extract (4×16-bit) into Integer to (1×16)
    6. Data Bit Reduction (with Saturation)
      1. PACKSSWB — Packed Signed int16 to int8 with Saturation
      2. PACKUSWB — Packed uint16 to uint8 with Saturation
      3. PACKSSDW — Packed int32 to int16 with Saturation
    7. Data Conversion (Integer : Float, Float : Integer, Float : Float)
      1. PI2FW — Convert Packed Even int16 to SPFP
      2. CVTDQ2PS — Convert Packed int32 to SPFP
      3. CVTPS2DQ — Convert Packed SPFP to int32
      4. CVTPI2PS — Convert Lo Packed int32 to SPFP
      5. CVTPS2PI — Convert Lo Packed SPFP to int32
      6. CVTSI2SS — Convert Scalar int32 to SPFP
      7. CVTDQ2PD — Convert Even Packed int32 to DPFP
      8. CVTPD2DQ — Convert Packed DPFP to Even int32
      9. CVTPD2PS — Convert Packed DPFP to Lo SPFP
      10. CVTPS2PD — Convert Lo Packed SPFP to DPFP
      11. CVTPD2PI — Convert Packed DPFP to int32
      12. CVTPI2PD — Convert Packed int32 to DPFP
      13. CVTSS2SI — Convert Scalar SPFP to int32/64
      14. CVTSD2SI — Convert Scalar DPFP to Int
      15. CVTSI2SD — Convert Scalar Int to DPFP
      16. CVTSD2SS — Convert Scalar DPFP to SPFP
      17. CVTSS2SD — Convert Scalar SPFP to DPFP
    8. Exercises
  9. 7. Integer Math
    1. General Integer Math
      1. ADD — Add
      2. ADC — Add with Carry
      3. INC — Increment by 1
        1. Thread-Safe Increment
      4. XADD — Exchange and Add
      5. SUB — Subtract
      6. SBB — Subtract with Borrow
      7. DEC — Decrement by 1
        1. Thread-Safe Decrement
    2. Packed Addition and Subtraction
      1. PADDB/PADDW/PADDD/PADDQ Integer Addition
      2. Vector {8/16/32/64}-Bit Integer Addition with Saturation
      3. PSUBB/PSUBW/PSUBD/PSUBQ Integer Subtraction
      4. Vector {8/16/32/64}-Bit Integer Subtraction with Saturation
    3. Vector Addition and Subtraction (Fixed Point)
      1. Pseudo Vec
      2. Pseudo Vec (x86)
        1. vmp_paddB (MMX) 16×8-Bit
        2. vmp_paddB (SSE2) 16×8-Bit
    4. Averages
      1. PAVGB/PAVGUSB — N×8-Bit [Un]signed Integer Average
      2. PAVGW — N×16-Bit [Un]signed Integer Average
    5. Sum of Absolute Differences
      1. PSADBW — N×8-Bit Sum of Absolute Differences
    6. Integer Multiplication
      1. MUL — Unsigned Multiplication (Scalar)
      2. IMUL — Signed Multiplication (Scalar)
    7. Packed Integer Multiplication
      1. PMULLW — N×16-Bit Parallel Multiplication (Lower)
      2. PMULHW/PMULHUW — N×16-Bit Parallel Multiplication (Upper)
      3. PMULHRW — Signed 4×16-Bit Multiplication with Rounding (Upper)
      4. Pseudo Vec (x86)
        1. (MMX Mul Low) 4×16-Bit
        2. (SSE2 Mul Low) 8×16-Bit
      5. PMULUDQ — Unsigned N×32-Bit Multiply Even
      6. PMADDWD — Signed N×16-Bit Parallel Multiplication — ADD
    8. Integer Division
      1. DIV — Unsigned Division
      2. IDIV — Signed Division
    9. Exercises
  10. 8. Floating-Point Anyone?
    1. The Floating-Point Number
      1. FPU Registers
    2. Loading/Storing Numbers and the FPU Stack
      1. FLD — Floating-Point Load
      2. FST/FSTP — FPU Floating-Point Save
      3. FILD — FPU Integer Load
      4. FIST/FISTP/FISTTP — FPU Integer Save
      5. FPU Constants
      6. FXCH
      7. FINCSTP — FPU Increment Stack Pointer
      8. FDECSTP — FPU Decrement Stack Pointer
      9. FWAIT/WAIT
      10. EMMS/FEMMS
      11. FNOP
    3. General Math Instructions
      1. FCHS — FPU Two's Complement ST(0) = – ST(0)
      2. FABS — FPU Absolute Value ST(0) = |ST(0)|
      3. FADD/FADDP/FIADD — FPU Addition D = ST(0) + A
      4. FSUB/FSUBP/FISUB — FPU Subtraction D = ST(0) – A
      5. FSUBR/FSUBRP/FISUBR — FPU Reverse Subtraction D = A – ST(0)
      6. FMUL/FMULP/FIMUL — FPU Multiplication D = ST(0) × A
      7. FDIV/FDIVP/FIDIV — FPU Division D = Dst ÷ Src
      8. FDIVR/FDIVRP/FIDIVR — FPU Reverse Division D = Src ÷ Dst
      9. FPREM — FPU Partial Remainder
      10. FPREM1 — FPU Partial Remainder
      11. FRNDINT — FPU Round to Integer
    4. Advanced Math Instructions
      1. FSQRT — FPU ST(0) Square Root
      2. FSCALE — FPU Scale ST(0) = ST(0) << ST(1)
      3. F2XM1 — FPU ST(0) = 2^ST(0) – 1
      4. FYL2X — FPU ST(0) = y × log2(x)
      5. FYL2XP1 — FPU ST(0) = y × log2(x+1)
      6. FXTRACT — FPU Extract Exponent and Significand
    5. Floating-Point Comparison
      1. FTST — FPU Test If Zero
      2. FCOM/FCOMP/FCOMPP — FPU Ordered CMP FP
      3. FUCOM/FUCOMP/FUCOMPP — FPU Unordered CMP FP
      4. FCOMI/FCOMIP/FUCOMI/FUCOMIP — FPU A ? B and EFLAGS
      5. FICOM/FICOMP — FPU A ? B
      6. FCMOVcc — FPU Conditional Move
      7. FXAM — FPU Examine
    6. FPU BCD (Binary-Coded Decimal)
      1. FBLD — FPU (BCD Load)
      2. FBSTP — FPU (BCD Save and Pop ST(0))
    7. FPU Trigonometry
      1. FPTAN — FPU Partial Tangent
      2. FPATAN — FPU Partial Arctangent
      3. FSINCOS — Sine and Cosine
      4. Pseudo Vec
      5. Pseudo Vec (x86)
        1. AMD-SDK
        2. 3DSMax-SDK
      6. FSIN — FPU Sine
      7. FCOS — FPU Cosine
      8. FSINCOS — FPU Sine and Cosine
        1. vmp_SinCos (3DNow!)
    8. FPU System Instructions
      1. FINIT/FNINIT — FPU Init
      2. FCLEX/FNCLEX — FPU Clear Exceptions
      3. FFREE — FPU Free FP Register
      4. FSAVE/FNSAVE — FPU Save X87 FPU, MMX, SSE, SSE2
      5. FRSTOR — FPU Restore x87 State
      6. FXSAVE — FPU Save x87 FPU, MMX, SSE, SSE2, SSE3
      7. FXRSTOR — FPU Restore x87 FPU, MMX, SSE, SSE2, SSE3
      8. FSTENV/FNSTENV — FPU Store x87 Environment
      9. FLDENV — FPU Load x87 Environment
      10. FSTCW/FNSTCW — FPU Store x87 Control Word
      11. FLDCW — FPU Load x87 Control Word
      12. FSTSW/FNSTSW — FPU Store x87 Status Word
    9. Validating (Invalid) Floating-Point
    10. Exercises
  11. 9. Comparison
    1. TEST — Logical Compare A ∧ B
    2. Indexed Bit Testing
      1. BT — Bit Test
      2. BTC — Bit Test and Complement
      3. BTR — Bit Test and Reset (Clear)
      4. BTS — Bit Test and Set
    3. SETcc — Set Byte on Condition
    4. Comparing Operands and Setting EFLAGS
      1. CMP — Compare Two Operands
      2. COMISS — Compare Scalar SPFP, Set EFLAGS
      3. COMISD — Compare Scalar DPFP, Set EFLAGS
      4. UCOMISS — Unordered Cmp Scalar SPFP, Set EFLAGS
      5. UCOMISD — Unordered Cmp Scalar DPFP, Set EFLAGS
      6. CMPSB/CMPSW/CMPSD/CMPSQ — Compare String Operands
    5. CMP — Packed Comparison
      1. CMPPS/CMPSS/CMPPD/CMPSD — Floating-Point
      2. Packed Compare if Equal to (=)
      3. Packed Compare if Greater Than or Equal (≥)
      4. Packed Compare if Greater Than (>)
    6. Extract Packed Sign Masks
      1. PMOVMSKB — Extract Packed Byte (Sign) Mask
      2. MOVMSKPS — Extract Packed SPFP Sign Mask
      3. MOVMSKPD — Extract Packed DPFP Sign Mask
    7. SCAS/SCASB/SCASW/SCASD/SCASQ — Scan String
      1. REP SCASx
    8. CMOVcc — Conditional Move
    9. CMPXCHG — Compare and Exchange
      1. CMPXCHG8B — Compare and Exchange 64 Bits
      2. CMPXCHG16B — Compare and Exchange 128 Bits
    10. Boolean Operations upon Floating-Point Numbers
      1. ANDPS — Logical AND of Packed SPFP D = A ∧ B
      2. ANDPD — Logical AND of Packed DPFP
      3. Pseudo Vec — (XMM) FABS — FP Absolute A = | A |
      4. Pseudo Vec — (3DNow!) FABS — FP Absolute A = | A |
      5. ORPS — Logical OR of Packed SPFP D = A ∨ B
      6. ORPD — Logical OR of Packed DPFP
      7. XORPS — Logical XOR of Packed SPFP D = A ⊕ B
      8. XORPD — Logical XOR of Packed DPFP
      9. Pseudo Vec — FCHS — FP Change Sign A = – A
        1. Pseudo Vec (XMM)
        2. Pseudo Vec — (3DNow!)
      10. ANDNPS — Logical ANDC of Packed SPFP D = A ∧ (¬B)
      11. ANDNPD — Logical ANDC of Packed DPFP
    11. Min — Minimum
      1. Pseudo Vec
      2. N×8-Bit Integer Minimum
      3. N×16-Bit Integer Minimum
      4. N×SPFP Minimum
      5. 1×SPFP Scalar Minimum
      6. 2×DPFP Minimum
      7. 1×DPFP Scalar Minimum
    12. Max — Maximum
      1. N×8-Bit Integer Maximum
      2. N×16-Bit Integer Maximum
      3. N×SPFP Maximum
      4. 1×SPFP Scalar Maximum
      5. 2×DPFP Maximum
      6. 1×DPFP Scalar Maximum
  12. 10. Branching
    1. Jump Unconditionally
      1. JMP — Jump
        1. Delta JMP
        2. Protected Mode JMP (NEAR)
        3. Protected Mode JMP (FAR)
        4. Real Mode JMP (NEAR)
        5. Real Mode JMP (FAR)
      2. Delta JMP
      3. Protected Mode JMP (NEAR)
      4. Protected Mode JMP (FAR)
      5. Real Mode — NEAR or FAR has Same Opcodes
      6. Protected Mode — NEAR or FAR has Same Opcodes
      7. Real Mode — NEAR or FAR has Same Opcodes
      8. Protected Mode — NEAR or FAR has Same Opcodes
    2. Jump Conditionally
      1. Jcc — Branching
      2. Delta JMP
    3. Branch Prediction
      1. Intel Branch Prediction
      2. Static Branch Prediction
        1. Back-Branch-Taken
        2. Forward-Branch-Not-Taken
        3. Branching Hints
      3. AMD Branch Prediction
      4. Branch Optimization
    4. PAUSE — (Spin Loop Hint)
      1. I-VU-Q
      2. JECXZ/JCXZ — Jump if ECX/CX Is Zero
      3. LOOPcc
      4. LOOP
    5. Pancake Memory LIFO Queue
    6. Stack
      1. PUSH — Push Value onto Stack
      2. POP — Pop Value off Stack
      3. PUSHA/PUSHAD — Push All General-Purpose Registers
      4. POPA/POPAD — Pop All General-Purpose Registers
      5. PUSHFD/PUSHFQ and POPFD/POPFQ
      6. ENTER — Allocate Stack Frame for Procedure ARGS
      7. LEAVE — Deallocate Stack Frame of Procedure ARGS
    7. CALL Procedure (Function)
      1. CALL
        1. Delta CALL
        2. Protected Mode CALL (NEAR)
        3. Protected Mode CALL (FAR)
        4. Real Mode CALL (FAR)
      2. Protected Mode CALL (NEAR)
      3. Protected Mode CALL (FAR)
      4. Real Mode — NEAR or FAR has Same Opcodes
      5. Protected Mode — NEAR or FAR has Same Opcodes
      6. Real Mode — NEAR or FAR has Same Opcodes
      7. Protected Mode — NEAR or FAR has Same Opcodes
      8. RET/RETF — Return
    8. Calling Conventions (Stack Argument Methods)
      1. C Declaration (_CDECL)
      2. Standard Declaration (_STDCALL)
      3. Fast Call Declaration (_FASTCALL)
    9. Interrupt Handling
      1. INT/INTO - Call Interrupt Procedure
      2. IRET/IRETD/IRETQ — Interrupt Return
      3. CLI/STI — Clear (Reset)/Set Interrupt Flag
  13. 11. Branchless
    1. Function y=ABS(x) 'Absolute' D = | A |
    2. Function y=MIN(p, q) 'Minimum'
    3. Function y=MAX(p, q) 'Maximum'
      1. Graphics 101 — Quick 2D Distance
  14. 12. Floating-Point Vector Addition and Subtraction
    1. Floating-Point Vector Addition and Subtraction
      1. Vector Floating-Point Addition
      2. Vector Floating-Point Addition with Scalar
      3. Vector Floating-Point Subtraction
      4. Vector Floating-Point Subtraction with Scalar
      5. Vector Floating-Point Reverse Subtraction
      6. Pseudo Vec
        1. Single-Precision Float Addition
        2. Single-Precision Float Subtraction
        3. Single-Precision Vector Float Addition
        4. Single-Precision Vector Float Subtraction
        5. Single-Precision Quad Vector Float Addition
        6. Single-Precision Quad Vector Float Subtraction
      7. Pseudo Vec (x86)
        1. vmp_VecAdd (3DNow!)
        2. vmp_VecSub (3DNow!)
        3. vmp_QVecAdd (3DNow!)
        4. vmp_VecAdd (SSE) Unaligned
        5. vmp_VecAdd (SSE) Aligned
        6. vmp_QVecAdd (SSE) Aligned
    2. Vector Scalar Addition and Subtraction
      1. Single-Precision Quad Vector Float Scalar Addition
      2. Single-Precision Quad Vector Float Scalar Subtraction
    3. Special — FP Vector Addition and Subtraction
      1. Vector Floating-Point Addition and Subtraction
      2. HADDPS/HADDPD/PFACC — Vector Floating-Point Horizontal Addition
      3. HSUBPS/HSUBPD/PFNACC — Vector Floating-Point Horizontal Subtraction
      4. PFPNACC — Vector Floating-Point Horizontal Add/Sub
    4. Exercises
  15. 13. FP Vector Multiplication and Division
    1. Floating-Point Multiplication
      1. Vector Floating-Point Multiplication
      2. (Semi-Vector) DPFP Multiplication
      3. SPFP Scalar Multiplication
      4. DPFP Scalar Multiplication
      5. Vector (Float) Multiplication — ADD
      6. Pseudo Vec
        1. Single-Precision Float Multiplication
        2. Single-Precision Vector Float Multiplication
        3. Single-Precision Quad Vector Float Multiplication
        4. Single-Precision Quad Vector Float Multiplication-Add
      7. Pseudo Vec (x86)
        1. vmp_VecMul (3DNow!)
        2. vmp_QVecMul (3DNow!)
        3. vmp_QVecMAdd (3DNow!)
        4. vmp_VecMul (SSE)
        5. vmp_VecMul (SSE) Aligned
        6. vmp_QVecMul (SSE) Aligned
        7. vmp_QVecMAdd (SSE) Aligned
    2. Vector Scalar Multiplication
      1. Pseudo Vec
        1. Single-Precision Vector Float Multiplication with Scalar
        2. Single-Precision Quad Vector Float Multiplication with Scalar
      2. Pseudo Vec (x86)
        1. vmp_VecScale (3DNow!)
        2. vmp_VecScale (SSE) Aligned
        3. vmp_QVecScale (SSE) Aligned
      3. I-VU-Q
      4. Graphics 101 — Dot Product
      5. Pseudo Vec
        1. Single-Precision Dot Product
      6. Pseudo Vec (x86)
        1. vmp_DotProduct (3DNow!)
        2. vmp_DotProduct (SSE) Aligned
      7. Graphics 101 — Cross Product
        1. vmp_CrossProduct (3DNow!)
        2. vmp_CrossProduct (SSE) Aligned
    3. Vector Floating-Point Division
      1. (Vector) SPFP Division
      2. (Semi-Vector) DPFP Division
      3. SPFP Scalar Division
      4. DPFP Scalar Division
      5. N×SPFP Reciprocal
      6. 1×SPFP Reciprocal (14-Bit)
        1. vmp_FDiv (3DNow!) Fast Float Division 14-Bit Precision
      7. SPFP Reciprocal (2 Stage) (24-Bit)
        1. vmp_FDiv (3DNow!) Standard Float Division 24-Bit Precision
        2. vmp_FDiv (SSE) Standard Float Division 24-Bit Precision
      8. Pseudo Vec
        1. Single-Precision Vector Float Scalar Division
      9. Pseudo Vec (x86)
        1. vmp_QVecDiv (3DNow!) Fast Quad Float Division 14-Bit Precision
        2. vmp_QVecDiv (3DNow!) Standard Quad Float Division 24-Bit Precision
        3. vmp_QVecDiv (SSE) Standard Quad Float Division 24-Bit Precision
    4. Exercises
  16. 14. Floating-Point Deux
    1. SQRT — Square Root
      1. 1×SPFP Scalar Square Root
      2. 4×SPFP Square Root
      3. 1×DPFP Scalar Square Root
      4. 2×DPFP Square Root
      5. 1×SPFP Scalar Reciprocal Square Root (15-Bit)
      6. Pseudo Vec
        1. (Float) Square Root
      7. Pseudo Vec (x86)
        1. vmp_FSqrt (3DNow!) Fast Float 15-Bit Precision
      8. SPFP Square Root (2 Stage) (24-Bit)
      9. vmp_FSqrt (3DNow!) Standard Float 24-Bit Precision
        1. vmp_FSqrt (SSE) Float Sqrt 24-Bit Precision
        2. Vector Square Root
      10. Pseudo Vec
        1. Vector Square Root
        2. Quad Vector Square Root
      11. Pseudo Vec (x86)
        1. vmp_QVecSqrt (3DNow!) Fast Quad Float SQRT 15-Bit Precision
        2. vmp_QVecSqrt (3DNow!) Quad Float Sqrt 24-Bit Precision
        3. vmp_QVecSqrt (SSE) Float Sqrt 24-Bit Precision
        4. vmp_QVecSqrtFast (SSE) Float Sqrt Approximate
      12. Graphics 101 — Vector Magnitude (aka 3D Pythagorean Theorem)
      13. Pseudo Vec
      14. Pseudo Vec (x86)
        1. vmp_VecMagnitude (3DNow!)
        2. vmp_VecMagnitude (SSE) Aligned
    2. Vector Normalize
      1. Pseudo Vec
      2. Pseudo Vec (x86)
        1. vmp_VecNormalize (3DNow!)
        2. vmp_VecNormalize (SSE) Aligned
  17. 15. Binary-Coded Decimal (BCD)
    1. BCD
      1. DAA — Decimal Adjust AL (After) Addition
      2. DAS — Decimal Adjust AL (After) Subtraction
      3. AAA — ASCII Adjust (After) Addition
      4. AAS — ASCII Adjust AL (After) Subtraction
      5. AAM — ASCII Adjust AX (After) Multiplication
      6. AAD — ASCII Adjust AX (Before) Division
      7. FBLD — FPU (BCD Load)
    2. Graphics 101
      1. ASCII String to Double-Precision Float
      2. ASCII to Double
  18. 16. What CPUID?
    1. CPUID
      1. Standard CPUID EDX-Feature Flags
      2. Intel — Standard CPUID ECX-Feature Flags
      3. Intel — Extended #1 CPUID EDX-Feature Flags
      4. AMD — Extended #1 CPUID EDX-Feature Flags
    2. PIII Serial License
    3. Sample CPU Detection Code
      1. x86 CPU Detect — Bit Flags
      2. x86 CPU Detect — Vendors
      3. CPU Detect — Information
  19. 17. PC I/O
    1. IN — Input from Port
      1. Vertical Sync
    2. OUT — Output to Port
    3. INSx — Input from Port to String
    4. OUTSx — Output String to Port
      1. Serial/Parallel Port for IBM PC
      2. Parallel Port
      3. Parallel Port Dip Switches
      4. Serial Port
  20. 18. System
    1. System "Lite"
    2. System Timing Instructions
      1. RDPMC — Read Performance-Monitoring Counters
      2. RDTSC — Read Time-Stamp Counter
      3. Calculating Processor Speed
      4. 80x86 Architecture
      5. CPU Status Registers (32-Bit EFLAGS/64-Bit RFLAGS)
      6. Protection Rings
      7. Control Registers
        1. (TPR) Task Priority Registers — (CR8)
      8. Debug Registers
    3. Cache Manipulation
      1. Cache Sizes
      2. Cache Line Sizes
      3. PREFETCHx — Prefetch Data into Caches
      4. LFENCE — Load Fence
      5. SFENCE — Store Fence
      6. MFENCE — Memory Fence
      7. CLFLUSH — Flush Cache Line
      8. INVD — Invalidate Cache (WO/Writeback)
      9. WBINVD — Write Back and Invalidate Cache
    4. System Instructions
      1. ARPL — Adjust Requested Privilege Level
      2. BOUND — Check Array Index For Bounding Error
      3. CLTS — Clear Task Switch Flag
      4. HLT — Halt Processor
      5. UD2 — Undefined Instruction
      6. INVLPG — Invalidate TLB
      7. LAR — Load Access Rights
      8. LOCK — Assert LOCK# Signal Prefix
      9. LSL — Load Segment Limit
      10. MOV — Move To/From Control Registers
      11. MOV — Move To/From Debug Registers
      12. STMXCSR — Save MXCSR Register State
      13. LDMXCSR — Load MXCSR Register State
      14. SGDT/SIDT — Save Global/Interrupt Descriptor Table
      15. LGDT/LIDT — Load Global/Interrupt Descriptor Table
      16. SLDT — Save Local Descriptor Table
      17. LLDT — Load Local Descriptor Table
      18. SMSW — Save Machine Status Word
      19. LMSW — Load Machine Status Word
      20. STR — Save Task Register
      21. LTR — Load Task Register
      22. RDMSR — Read from Model Specific Register
      23. WRMSR — Write to Model Specific Register
      24. SWAPGS — Swap GS Base Register
      25. SYSCALL — 64-Bit Fast System Call
      26. SYSRET — Fast Return from 64-Bit Fast System Call
      27. SYSENTER — Fast System Call
      28. SYSEXIT — Fast Return from Fast System Call
      29. RSM — Resume from System Management Mode
      30. VERR/VERW — Verify Segment for Reading
      31. LDS/LES/LFS/LGS/LSS — Load Far Pointer
    5. Hyperthreading Instructions
      1. MONITOR — Monitor
      2. MWAIT — Wait
  21. 19. Gfx 'R' Asm
    1. Setting Memory
    2. Copying Memory
    3. Speed Freak
    4. Graphics 101 — Frame Buffer
    5. Graphics 101 — Blit
      1. Copy Blit
      2. Transparent Blit
    6. Graphics 101 — Blit (MMX)
      1. Graphics Engine — Sprite Layered
      2. Graphics Engine — Sprite Overlay
    7. Graphics 101 — Clipping Blit
  22. 20. MASM vs. NASM vs. TASM vs. WASM
    1. MASM — Microsoft Macro Assembler
      1. REPEAT
      2. WHILE
      3. FOR
    2. Compiler Intrinsics
  23. 21. Debugging Functions
    1. Guidelines of Assembly Development
    2. Visual C++
    3. Tuning and Optimization
    4. Exception Handling — AKA: Dang that 1.#QNAN
      1. Exceptions
        1. Hardware Exception
        2. Software Exception
      2. FPU Versus MMX
    5. Print Output
      1. Float Array Print
      2. Vector Print
      3. Quad Vector Print
      4. Quaternion Print
      5. Matrix Print
      6. Memory Dump
    6. Test Jigs
      1. Matrix Test Fill
      2. Matrix Splat
  24. 22. Epilogue
  25. A. Data Structure Definitions
  26. B. Mnemonics
  27. C. Reg/Mem Mapping
  28. Glossary
  29. References
    1. ASE File Format
    2. Game Development Links
    3. MPEG
    4. SHA-1
    5. Personal Computer