Store Forwarding Improved
Increased Number of Store Buffers
The number of Store Buffers has been increased from 24 to 32.
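To see why more store buffer entries help, consider a minimal Python sketch of store buffer allocation. The function simulate_store_burst, the one-store-per-cycle allocation rate and the fixed drain latency are hypothetical simplifications, not Pentium 4 parameters: each store holds an entry from allocation until it drains, and allocation stalls while every entry is occupied.

```python
from collections import deque


def simulate_store_burst(num_stores: int, capacity: int, drain_latency: int) -> int:
    """Return the number of cycles in which store allocation stalls.

    Hypothetical model: at most one store is allocated per cycle, and each
    store buffer entry is released drain_latency cycles after allocation.
    """
    free_at = deque()  # release cycle of each occupied entry, oldest first
    cycle = 0
    stalls = 0
    allocated = 0
    while allocated < num_stores:
        # Release every entry whose drain time has been reached.
        while free_at and free_at[0] <= cycle:
            free_at.popleft()
        if len(free_at) < capacity:
            free_at.append(cycle + drain_latency)  # allocate an entry
            allocated += 1
        else:
            stalls += 1  # all entries busy: the store must wait
        cycle += 1
    return stalls


for entries in (24, 32):
    # Made-up workload: a burst of 64 stores, each draining after 30 cycles.
    print(entries, simulate_store_burst(num_stores=64, capacity=entries, drain_latency=30))
```

With these made-up numbers, the burst stalls allocation for 12 cycles with 24 entries and not at all with 32, because 32 entries are enough to cover the assumed 30-cycle drain latency.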
Improved Load/Store Scheduling
On earlier versions of the processor, as well as on the 90nm version, a Store instruction is decoded into two separate μops:
The Store Address μop, which, when executed, produces the start memory address for the store operation.
The Store Data μop, which, when executed, produces the data to be stored to memory.
These two μops are dispatched and executed simultaneously.
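The split into two μops can be sketched in Python. The StoreInstr and Uop classes, the register names and the decode_store/execute helpers are hypothetical, chosen only for illustration: the Store Address (STA) μop reads only the address register, and the Store Data (STD) μop reads only the data register.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StoreInstr:
    """A store of the form  mov [base + disp], data  (illustrative encoding)."""
    base: str
    disp: int
    data: str


@dataclass(frozen=True)
class Uop:
    kind: str                 # "STA" = Store Address, "STD" = Store Data
    sources: Tuple[str, ...]  # registers this μop reads
    disp: int = 0             # only used by the STA μop


def decode_store(instr: StoreInstr) -> List[Uop]:
    """Split one store instruction into its two μops."""
    return [
        Uop("STA", (instr.base,), instr.disp),  # produces the start address
        Uop("STD", (instr.data,)),              # produces the data to store
    ]


def execute(uop: Uop, regs: Dict[str, int]) -> int:
    """Execute a μop against a register file and return its result."""
    if uop.kind == "STA":
        return regs[uop.sources[0]] + uop.disp
    if uop.kind == "STD":
        return regs[uop.sources[0]]
    raise ValueError(f"unknown μop kind: {uop.kind}")


regs = {"ebx": 0x1000, "eax": 42}
sta, std = decode_store(StoreInstr(base="ebx", disp=8, data="eax"))
print(hex(execute(sta, regs)), execute(std, regs))  # prints: 0x1008 42
```

Because the two μops share no source operands and neither consumes the other's result, nothing forces one to wait for the other, which is what allows them to be dispatched and executed in the same cycle.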
On the earlier versions of the Pentium® 4 processor, the Memory Scheduler made no attempt to relate a load to the store on which it depended. As a result, the load could be dispatched before the store and could not be fulfilled ...
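The load-to-store relationship described here is the one that store-to-load forwarding must respect. The Python sketch below is a generic, simplified illustration, not the Pentium 4 mechanism: StoreEntry and try_load are hypothetical names, only exact-address matches are handled, and a store whose address is still unknown conservatively blocks the load. A load scans the older stores from youngest to oldest; if the matching store has not yet produced its data, the load cannot be fulfilled yet.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class StoreEntry:
    """One store buffer entry (hypothetical layout), kept in program order."""
    address: Optional[int]  # None until the Store Address μop has executed
    data: Optional[int]     # None until the Store Data μop has executed


def try_load(load_addr: int, older_stores: List[StoreEntry],
             memory: Dict[int, int]) -> Tuple[str, Optional[int]]:
    """Return ("forward", value), ("memory", value) or ("blocked", None)."""
    # The youngest older store to the same address holds the value the load must see.
    for st in reversed(older_stores):
        if st.address is None:
            # Unknown address might alias the load; this sketch waits.
            return ("blocked", None)
        if st.address == load_addr:
            if st.data is None:
                # The store the load depends on has no data yet.
                return ("blocked", None)
            return ("forward", st.data)  # forward directly from the store buffer
    return ("memory", memory.get(load_addr, 0))  # no older store matches


stores = [StoreEntry(0x100, 1), StoreEntry(0x200, None), StoreEntry(0x100, 7)]
print(try_load(0x100, stores, {}))            # prints: ('forward', 7)
print(try_load(0x200, stores, {}))            # prints: ('blocked', None)
print(try_load(0x300, stores, {0x300: 9}))    # prints: ('memory', 9)
```

A scheduler that ignores this check, as the earlier Memory Scheduler did, may dispatch a load in the "blocked" case, where it cannot obtain the correct value.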