Store Forwarding Improved

Increased Number of Store Buffers

The number of Store Buffers has been increased from 24 to 32.

Improved Load/Store Scheduling

On the earlier versions of the processor, as well as on the 90nm version, a Store instruction is decoded into two separate μops:

  • The Store Address μop which, when executed, produces the start memory address for the store operation.

  • The Store Data μop which, when executed, produces the data to be stored to memory.

These two μops are dispatched and executed simultaneously.
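The split into two μops can be pictured with a small software model. The following is only an illustrative sketch, not a description of the actual hardware: the names StoreBufferEntry, store_address_uop, store_data_uop, and commit are hypothetical, and the base-plus-offset address calculation is an assumed addressing form. It shows that a store can be written to memory only after both μops have contributed their half.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StoreBufferEntry:
    """One pending store (hypothetical model, not the real buffer layout)."""
    address: Optional[int] = None  # filled in by the Store Address uop
    data: Optional[int] = None     # filled in by the Store Data uop

    def ready(self) -> bool:
        # The store is complete only when both the address and the data exist.
        return self.address is not None and self.data is not None


def store_address_uop(entry: StoreBufferEntry, base: int, offset: int) -> None:
    """Produce the start memory address (base + offset is an assumption)."""
    entry.address = base + offset


def store_data_uop(entry: StoreBufferEntry, value: int) -> None:
    """Produce the data to be stored to memory."""
    entry.data = value


def commit(entry: StoreBufferEntry, memory: Dict[int, int]) -> None:
    """Write the store to the modeled memory once both uops have executed."""
    if not entry.ready():
        raise ValueError("store is missing its address or its data")
    memory[entry.address] = entry.data


if __name__ == "__main__":
    memory: Dict[int, int] = {}
    entry = StoreBufferEntry()
    store_address_uop(entry, base=0x1000, offset=8)
    store_data_uop(entry, value=42)
    commit(entry, memory)
    print(memory)
```

In this model the two helper functions may be called in either order, which mirrors the idea that neither μop depends on the other; only the commit step needs both results.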

On the earlier versions of the Pentium® 4 processor, the Memory Scheduler made no attempt to relate a load to the store on which it depended. As a result, the load could be dispatched before the store and could not be fulfilled ...
