The processor (also called the CPU, for
Central Processing Unit) is the brain of the
PC. It performs all general computing tasks and coordinates tasks
done by memory, video, disk storage, and other system components. The
CPU is a very complex chip that resides directly on the motherboard
of most PCs, but may sometimes reside on a daughtercard that connects
to the motherboard via a dedicated specialized slot.
A processor executes programs—including the operating system
itself and the user applications—all of which perform useful
work. From the processor’s point of view, a program
is simply a group of low-level instructions that it executes more or
less in sequence as it receives them. How efficiently and effectively
the processor executes instructions is determined by its internal
design, also called its
architecture. The CPU
architecture, in conjunction with CPU speed, determines how fast the
CPU executes instructions of various types. The external design of
the processor, specifically its external interfaces, determines how
fast it communicates information back and forth with external cache,
main memory, the chipset, and other system components.
Branch prediction attempts to guess where the program
will jump (branch) next, allowing the
Prefetch and Decode unit
to retrieve instructions and data in
advance so they will already be available when the CPU requests them.
Cache is a small amount of very fast memory that allows
the CPU to retrieve data immediately, rather than waiting for slower
main memory to respond. See Chapter 5.
Buses are the pathways that connect the processor to memory and other
components. For example, modern processors connect to memory via a
dedicated bus called the
front-side bus (FSB) or host bus.
The processor clock coordinates all CPU and memory operations
by periodically generating a time reference signal called a
clock cycle or tick. Clock frequency is specified in
megahertz (MHz), which specifies millions of ticks
per second, or
gigahertz (GHz), which specifies billions of ticks
per second. Clock speed determines how fast
instructions execute. Some instructions require one tick, others
multiple ticks, and some processors execute multiple instructions
during one tick. Ticks per instruction varies according to
processor architecture, its
instruction set, and the specific
instruction.
Complex Instruction Set Computer
(CISC) processors use complex instructions. Each requires many clock cycles
to execute, but accomplishes a lot of work.
Reduced Instruction Set Computer
(RISC) processors use
fewer, simpler instructions. Each takes few ticks but accomplishes
relatively little work.
These differences in efficiency mean that one CPU cannot be directly compared with another purely on clock speed. A 1.4 GHz AMD Athlon, for example, may be faster than a 1.7 GHz Intel Pentium 4, depending on the application. The comparison is complicated because different CPUs have different strengths and weaknesses. For example, the Athlon is generally faster than the Pentium 4 clock-for-clock on both integer and floating-point operations (that is, it does more work per CPU tick), but the Pentium 4 has an extended instruction set that may allow it to run optimized software literally twice as fast as the Athlon. The only safe use of direct clock speed comparisons is within a single family. A 1.2 GHz Tualatin-core Pentium III, for example, is roughly 20% faster than a 1.0 GHz Tualatin-core Pentium III, but even there the relationship is not absolutely linear. And a 1.2 GHz Tualatin-core Pentium III is more than 20% faster than a 1.0 GHz Pentium III that uses the older Coppermine core. Also, even within a family, processors with similar names may substantially differ internally.
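The relationship between clock speed and work done per tick can be sketched as a simple calculation. The following Python is only an illustration, not a benchmark: the function name and the instructions-per-tick figures are hypothetical, chosen to show how a CPU with a slower clock can come out ahead, and real values depend heavily on the application.

```python
def instructions_per_second(clock_hz, instructions_per_tick):
    """Rough throughput estimate: ticks per second times work done per tick."""
    return clock_hz * instructions_per_tick

# Hypothetical figures for illustration only; real values vary by workload.
cpu_a = instructions_per_second(1.4e9, 1.5)  # lower clock, more work per tick
cpu_b = instructions_per_second(1.7e9, 1.1)  # higher clock, less work per tick

print(cpu_a > cpu_b)  # True: the slower-clocked CPU does more work per second

# Within a single family, work per tick is roughly constant, so the ratio
# of clock speeds approximates the speed-up (1.2 GHz versus 1.0 GHz).
same_family_speedup = 1.2e9 / 1.0e9
print(round((same_family_speedup - 1) * 100))  # 20 (percent)
```

The model ignores cache, memory speed, and instruction mix, which is one reason even the within-family relationship is not perfectly linear.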
Clock speeds increase every year, but the laws of physics limit how fast CPUs can run. If designers depended only on faster clock speeds for better performance, CPU performance would have hit the wall years ago. Instead, designers have improved internal architectures while also increasing clock speeds. Recent CPUs run at 500 times the clock speed of the PC/XT’s 8088, but provide 5,000 or more times the performance. Here are some major architectural improvements that have allowed CPUs to continue to get faster every year:
For a given clock speed, the
amount of work done depends on the amount of data processed in one
operation. Early CPUs processed data in 4-bit
(nibble) or 8-bit (byte)
chunks, whereas current CPUs process 32 or 64 bits per operation.
All CPUs work well with integers, but processing floating-point numbers to high precision on a general-purpose CPU requires a huge number of operations. All modern CPUs include a dedicated floating-point unit (FPU) that handles floating-point operations efficiently.
Early CPUs took five ticks to process an instruction—one each
to load the instruction, decode it, retrieve the data, execute the
instruction, and write the result. Modern CPUs use pipelining,
which dedicates a separate stage to each process and allows one full
instruction to be executed per clock cycle.
If one pipeline is good, more are better. Using multiple pipelines
allows multiple instructions to be processed in parallel, an
architecture called superscalar. A superscalar
processor processes multiple instructions per tick.
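The effect of pipelining and of multiple pipelines on tick counts can be sketched with an idealized model. The Python below is a minimal sketch under stated assumptions: the function names are hypothetical, the five stages match the load, decode, retrieve, execute, and write steps described above, and the model assumes no stalls, mispredicted branches, or dependencies between instructions, which real processors do encounter.

```python
import math

def cycles_unpipelined(n_instructions, stages=5):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * stages

def cycles_pipelined(n_instructions, stages=5, pipelines=1):
    """Idealized pipeline: no stalls, branches, or dependencies (an assumption).

    The first group of instructions needs `stages` ticks to pass through the
    pipeline; after that, `pipelines` instructions complete on every tick.
    """
    if n_instructions == 0:
        return 0
    groups = math.ceil(n_instructions / pipelines)
    return stages + groups - 1

n = 1000
print(cycles_unpipelined(n))             # 5000
print(cycles_pipelined(n))               # 1004
print(cycles_pipelined(n, pipelines=2))  # 504
```

For long instruction streams, the single pipeline approaches one instruction per tick, and the two-pipeline (superscalar) case approaches two per tick.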