Talk:Cache prefetching

Formatting feedback
I'm leaving some feedback here to avoid running into an edit conflict:

--Gryllida (talk) 00:58, 20 October 2016 (UTC)
 * 1) Titles are sentence case. Section titles are too.
 * 2) Images can be right-aligned, instead of centered or inline, by using the "|thumb" argument in the file directive, as illustrated here: CachePrefetching_StreamBuffers.png
 * 3) Image captions:
 * 4) We don't have to put references in image captions because the discussion in text already has a reference. Naming the author is usually enough. (The image source is listed at the file page.)
 * 5) We complete image captions. "X as originally proposed" isn't complete; "X as originally proposed by Y in year Z" is.
 * 6) Equations usually are not bulleted; add a colon and it will look like this:
 * $$\frac{1}{2}=0.5$$.
 * 1) We have a lot of useful references here, which is good. Thanks!

Instruction prefetch, cache prefetching, and prefetch input queue
Should instruction prefetch redirect to prefetch input queue rather than to here? Or should prefetch input queue, in turn, also be merged into here? The "prefetch input queue" is a small instruction cache, so it could be argued that it belongs in here, but it's a smaller and much more specialized type of cache than a typical CPU cache, although CPU cache mentions the 68010's "loop cache", which is another small specialized instruction cache. Guy Harris (talk) 21:58, 4 August 2017 (UTC)


 * Prefetch input queue looks like it has enough content to exist as a separate article. It isn't entirely clear to me if it is only about caching or also CPU features around that as well (is the PIQ the L1 instruction cache? The micro-op cache?), so I'm not really certain what the best structure is. Having instruction prefetch redirect to prefetch input queue seems reasonable to me though. Wingedsubmariner (talk) 23:41, 4 August 2017 (UTC)


 * The 8086 and 8088 had something that the 1979 8086 Family User's Manual calls the "instruction queue", in the Bus Interface Unit. The description is:


 * "In addition, during periods when the EU is busy executing instructions, the BIU "looks ahead" and fetches more instructions from memory. The instructions are stored in an internal RAM array called the instruction stream queue. The 8088 instruction queue holds up to four bytes of the instruction stream, while the 8086 queue can store up to six instruction bytes. These queue sizes allow the BIU to keep the EU [Execution Unit - gh] supplied with pre-fetched instructions under most conditions without monopolizing the system bus.  The 8088 BIU fetches another instruction byte whenever one byte in its queue is empty and there is no active request for bus access from the EU.  The 8086 BIU operates similarly except that it does not initiate a fetch until there are two empty bytes in its queue.  The 8086 BIU normally obtains two instruction bytes per fetch; if a program transfer forces fetching from an odd address, the 8086 BIU automatically reads one byte from the odd address and then resumes fetching two-byte words from the subsequent even addresses."


 * It later says


 * "When a program transfer occurs, the queue no longer contains the correct instruction, and the BIU obtains the next instruction from memory using the new IP and CS values, passes the instruction directly to the EU, and then begins refilling the queue from the new location."


 * which I infer means that it doesn't function as a loop cache, although an instruction with a REP prefix might not have to repeatedly fetch the instruction being repeated from memory, either because it's fetched from the instruction queue (which presumably stops prefetching if that would cause it to flush the repeated instruction from the queue) or because it's decoded once and repeated.
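
 * To make the quoted behaviour concrete, here is a minimal sketch of the queue as the manual describes it. It is a toy model, not a cycle-accurate one: the class and method names are hypothetical, memory is just a bytes object, and a program transfer is simplified to "empty the queue and restart at the new address" (the manual says the first instruction after a transfer is passed directly to the EU, which this sketch does not model).

```python
from collections import deque


class InstructionQueue:
    """Toy model of the 8086/8088 BIU instruction stream queue (hypothetical names)."""

    def __init__(self, capacity, refill_threshold, bytes_per_fetch):
        self.capacity = capacity                  # 4 bytes on the 8088, 6 on the 8086
        self.refill_threshold = refill_threshold  # empty bytes required before a fetch starts
        self.bytes_per_fetch = bytes_per_fetch    # 1 on the 8088, normally 2 on the 8086
        self.queue = deque()
        self.next_addr = 0

    def flush(self, new_addr):
        """Program transfer: queued bytes are no longer valid (simplified)."""
        self.queue.clear()
        self.next_addr = new_addr

    def bus_idle_cycle(self, memory):
        """Call when the EU has no bus request. Returns the number of bytes fetched."""
        empty = self.capacity - len(self.queue)
        if empty < self.refill_threshold:
            return 0
        # A word-wide fetch from an odd address reads a single byte, after which
        # fetching resumes with two-byte words from even addresses.
        odd_start = self.bytes_per_fetch == 2 and self.next_addr % 2 == 1
        count = 1 if odd_start else self.bytes_per_fetch
        for _ in range(count):
            self.queue.append(memory[self.next_addr])
            self.next_addr += 1
        return count

    def take_byte(self):
        """The EU consumes one instruction byte, or gets None if the queue is empty."""
        return self.queue.popleft() if self.queue else None


def make_8088():
    return InstructionQueue(capacity=4, refill_threshold=1, bytes_per_fetch=1)


def make_8086():
    return InstructionQueue(capacity=6, refill_threshold=2, bytes_per_fetch=2)
```

 * With this model, an 8086 queue that has lost one byte does not refetch until a second byte has been consumed, and a flush to an odd address produces one single-byte fetch followed by word fetches. Nothing in it retains already-executed bytes, which is the sense in which it is a queue and not a loop cache.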


 * The 80286 has something that the 1987 80286 Hardware Reference Manual calls the "prefetch queue":


 * "When not performing other bus duties, the Bus Unit "looks ahead" and pre-fetches instructions from memory. When prefetching, the Bus Unit assumes that program execution proceeds sequentially; that is, the next instruction follows the preceding one in memory. When the prefetcher reaches the limit of the code segment, it stops prefetching instructions. If a program transfer causes execution to continue from a new program location, the Bus Unit resets the queue and immediately begins fetching instructions from the new program location. The Bus Unit stores these instructions in a 6-byte prefetch queue to be used later by the Instruction Unit. By prefetching instructions, the BU eliminates the idle time that can occur when the CPU must wait for the next sequential instruction to be fetched from memory."


 * This is separate from the "instruction queue":


 * "The Instruction Unit (IU) receives instructions from the prefetch queue, decodes them, and places these fully-decoded instructions into a 3-deep instruction queue for use by the Execution Unit."
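
 * A rough sketch of that two-stage arrangement, under loud assumptions: the names are hypothetical, every instruction is pretended to be a fixed 2 bytes long (real 80286 instructions vary in length), the Bus Unit is modelled as fetching one byte per step, and a program transfer is assumed to discard the decoded instruction queue as well as the prefetch queue (the quoted text only says the prefetch queue is reset). Only the 6-byte and 3-deep sizes come from the manual.

```python
from collections import deque

PREFETCH_BYTES = 6  # prefetch queue size, from the quoted manual
DECODED_DEPTH = 3   # instruction queue depth, from the quoted manual


class Pipeline286:
    """Toy model: Bus Unit -> prefetch queue -> Instruction Unit -> instruction queue."""

    def __init__(self, code, insn_len=2):
        self.code = code          # bytes of the code segment
        self.insn_len = insn_len  # assumption: fixed-length instructions
        self.fetch_addr = 0
        self.prefetch_q = deque()
        self.decoded_q = deque()

    def bus_unit_step(self):
        """Prefetch sequentially; stop at the limit of the code segment."""
        if len(self.prefetch_q) < PREFETCH_BYTES and self.fetch_addr < len(self.code):
            self.prefetch_q.append(self.code[self.fetch_addr])
            self.fetch_addr += 1

    def instruction_unit_step(self):
        """'Decode' one instruction if raw bytes are available and there is room."""
        if len(self.decoded_q) < DECODED_DEPTH and len(self.prefetch_q) >= self.insn_len:
            raw = bytes(self.prefetch_q.popleft() for _ in range(self.insn_len))
            self.decoded_q.append(raw)

    def execute_step(self):
        """The Execution Unit takes the next decoded instruction, if any."""
        return self.decoded_q.popleft() if self.decoded_q else None

    def jump(self, target):
        """Program transfer: reset the queues and refetch from the target."""
        self.prefetch_q.clear()
        self.decoded_q.clear()  # assumption, see the note above
        self.fetch_addr = target
```

 * The point of the sketch is only that there are two distinct buffers, one holding raw bytes and one holding decoded instructions, and that neither is addressable the way a cache is.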


 * The 80386 is similar; see 2.2 "Code Prefetch Unit" and 2.3 "Instruction Decode Unit" in the 1986 80386 Hardware Reference Manual (which, interestingly enough, says that "The Instruction Decode Unit takes instruction stream bytes from the Prefetch Queue and translates them into microcode.", which sounds a bit as if this was a predecessor of the P6 "decode to uops" model - it doesn't sound like the usual "jumping to a given location in the processor's microcode", it sounds more like "generating the microcode on the fly", but I digress...).


 * The 80486 had an on-chip unified cache in addition to a prefetch queue. The 1989 i486 Microprocessor manual doesn't say much about those queues except to say the prefetch queue is 32 bytes rather than 16 bytes and that "A jump always needs to execute after modifying code to guarantee correct execution of the new instruction.", suggesting that a jump will flush the prefetch queue. I'm guessing, from stuff in the manual, that the prefetch buffer fetches from the on-chip cache, and that those prefetches will cause cache fetches on a cache miss.


 * The 80586^W(first-generation) Pentium has multiple "prefetch buffers", which I'm guessing are just renamed prefetch queues; some do linear prefetches and others do prefetches for predicted branches. The buffers appear to prefetch from the instruction cache, not from main memory, at least as I read "Prefetch Buffers" on page 31 of Pentium Processor System Architecture.  I'm guessing, perhaps incorrectly, that their prefetches will cause cache fetches on a cache miss, so I-cache prefetches are provoked by the prefetch buffers.


 * So those aren't like traditional instruction caches, they're more like prefetch buffers - the 8086 manual even calls it the "instruction stream queue" when it first refers to it, so it's not being spoken of as a random-access memory.


 * The article sounds as if it's mainly describing the x86 instruction queue/instruction stream queue/prefetch queue/prefetch buffer. The 1983 68000 manual says that "The MC68000 uses a two-word tightly-coupled instruction prefetch mechanism to enhance performance.", but they don't give it a name such as "prefetch buffer".  The 1985 68010 manual says the same thing, but it adds a "loop mode" where a very small loop can execute entirely out of what it calls the "prefetch queue".


 * So that's a little bit more like an instruction cache than the x86 prefetch queue, but it's still not a full-blown I-cache. The 68020 has an on-chip I-cache; according to the 1988 M68000 Family Reference, "When the cache is enabled, the subsequent prefetch will find the next 16-bit instruction word is already present in the cache and the related bus cycle is saved.", so the prefetch queue fetches from the I-cache, just as it fetches from caches on the 80486 and Pentium.


 * I haven't checked any documentation for later x86 processors, or for various non-x86/non-68k processors (including pre-microprocessor-era processors), but I suspect that many of the ones that do prefetching have a buffer of some sort, separate from the cache (especially on processors without any caches or microprocessors without any on-chip caches), so we should probably keep the prefetch input queue page, although it might deserve a rename if the most common name isn't "prefetch input queue" (there are non-Wikipedia sources using that term, such as this Intel patent), and should perhaps talk more about non-x86 processors.


 * As for instruction prefetch, any discussion of instruction prefetching probably shouldn't imply that a conventional CPU cache is always involved, as it's not involved in the older x86 and 68k processors. Perhaps "buffer" is a better term, although IBM called the CPU cache in the IBM System/360 Model 85 - which I think was the first system with what we think of as a "CPU cache" between the processor and main memory - a "buffer" rather than a "cache".


 * So this all needs a bit more discussion. Guy Harris (talk) 03:33, 5 August 2017 (UTC)


 * Wow, this is a lot of good research. Is there anything from more modern (or bigger) CPUs? It sounds like it is a kind of instruction caching (or buffering at least) with prefetching, but distinct from other systems for instruction prefetching. Wingedsubmariner (talk) 15:31, 8 August 2017 (UTC)