Talk:Branch predictor

comment
It's not true that all pipelined processors have to do branch prediction -- the processor could just wait on the results of the conditional statement before resuming execution. I know of at least one modern microcontroller that does this. The second paragraph should be changed to say that branch prediction is just the best way to handle conditionals --Ryan Stone 18:08, 17 Nov 2004 (UTC)

Which microcontroller? Iain McClatchie 18:43, 17 Nov 2004 (UTC)

all kinds of branches, or just conditional branches ?
My understanding is that MIPS has a branch delay slot for *every* branch, even unconditional branches.


 * Yes, that's true. That's because irrespective of whether the branch is conditional or not, fetching the instruction at the target address takes some time. Another way the problem could be solved is to watch out for unconditional jump instructions right at the IF (instrn. fetch) stage instead of waiting for the ID (decode) stage & start fetching at the target address, but this increases the time taken by the IF stage & thus increases the time taken per instruction.

Are these "93.5%" numbers *just* for conditional branches, or do they also include unconditional branches ?


 * For unconditional branches, there's no use of predicting whether the branch is taken or not. They're always taken! Hence as far as the branch prediction logic goes, these should be treated as normal instructions.


 * A minor point about "unconditional branch" in MIPS. The `B ' "pseudo"-instruction is actually converted by the assembler to `BEQ $0, $0, '. Since register 0 is equal to register 0, the branch is "unconditional". A MIPS processor could just naïvely treat it as a conditional branch and do branch prediction, or it could decode the instruction specially & avoid branch prediction. Note that the "unconditional jump" instruction `J ' is a real unconditional instruction and not assembler-synthesised. BTW the difference between B and J is that for B (used for nearby branches) the target is specified as a relative offset from the current PC whereas for J (used for "long" jumps) the absolute address of the target is specified. -- Paddu 07:42, 20 Feb 2005 (UTC)


 * For unconditional branches, at least on x86 systems, the target still must be predicted. Things like a "switch" statement in a high level language always branch, but it is not clear to where.:


 * In what way does the third paragraph of the article not address this point? Iain McClatchie 03:48, 15 October 2005 (UTC)


 * The article addresses it fine, but the "these should be treated as normal instructions" comment above is wrong. Branch prediction doesn't have to be done after the instruction is decoded, it can be done on the address alone, in which case you must predict the taken despite the fact it is unconditional.

some comments
I think it isn't made clear early enough in the section that the counters are indexed by the PC.

It might be good to move the local/global/combined predictor sections into a subsection of their own for 2-level predictors.

Good references on 2-level branch predictors: A Comparison of Dynamic Branch Predictors that use Two Levels of Branch History, by Yeh and Patt, 1993 (I've found it online as p257-yeh.pdf) Alternative Implementations of Two-Level Adaptive Branch Prediction, by Yeh and Patt, 1992 (p451-yeh.pdf)

A good (the original?) reference on global-history predictors: Improving the Accuracy of Dynamic Branch Prediction Using Branch Correlation, by Pan, So, and Rahmeh, 1992 (p76-pan.pdf)

A (the?) reference on gskew: Trading Conflict and Capacity Aliasing in Conditional Branch Predictors, by Michaud, Seznec and Uhlig, 1992 (p292-michaud.pdf)

I think a little too much importance is given to the comparison of the "speed" of the various predictor types (the article already mentions the need for overriding predictors, which arise because all interesting predictors are too slow).

It might be worth explaining more why branch prediction is so important. --CTho 02:20, 23 December 2005 (UTC)

Do we need to mention a little bit about perceptron predictors?

question
the article is not clear. "They allow processors to fetch and execute instructions without waiting for a branch to be resolved." Does the processor then discard the executed instructions if the branch is not taken? is this because logic is a lot slower than execution? — Preceding unsigned comment added by 24.7.86.143 (talk • contribs) 05:47, 12 October 2006 (UTC (UTC)


 * Older processors fetched along the predicted path, but didn't execute. Many now do speculative execution where, yes, the results are discarded when that path isn't taken. Gah4 (talk) 20:41, 18 April 2018 (UTC)

Comments in relation to modern high performance MPUs
Great article, it's quite comprehensive. One area that receives less attention than it should is specialized branch predictors. For instance, many of the advances in Intel's newer microarchitectures are adding more and more prediction mechanisms targeted at particular situations:

Indirect branch predictor - also to be added in AMD's new Barcelona core, mechanism to predict the target address of indirect branches. i.e. it picks from the BHT the most likely target address for a given branch Loop detector - mechanism to predict when a loop will be exited

Dkanter 19:38, 13 December 2006 (UTC)

Bimodal branch prediction using a single bit predictor
Actually one can do better than the article says with a single bit! This isn't something that can go in the main article but you might be amused by this. I wrote about jump prediction in 1979 when saving a bit actually meant something. Only changing the bit with a probability improves the percentage of correct predictions. Changing the bit with a very low probability for instance changes the chance of a correct prediction of a repeat loop size 2 from 0% to 50% and for a repeat loop of 3 from 33% to 55%. -Dmcq 21:31, 4 April 2007 (UTC)

Just noticed the business about the Burroughs B4900 storing bits back into the instruction. I rejected that because the instructions of the ICL 2900, the machine I was interested in, were variable length and it was possible for a program with low security to execute code meant for a higher level - so it could potentially gain control of the machine by altering code if the second half of an instruction looked like a branch. It'd be interesting to know if the B4900 was susceptible to that sort of attack! Dmcq 22:18, 9 July 2007 (UTC)


 * There are some Burroughs machines where all the security is in software. The system will only execute object programs generated by the compiler, and the compiler checks that you don't do things that you aren't supposed to do. I don't remember which machines, though. Gah4 (talk) 06:51, 27 June 2016 (UTC)


 * The Burroughs large systems did that; the B4900 was one of the Burroughs Medium Systems, however, so it might not have supported that.


 * However, if the processor stores prediction data back into the instruction in hardware/microcode, and the hardware/microcode is known to only modify the branch prediction bits, and if the prediction bits are only hints that affect performance but don't change the branch target, that feature would be safe. Guy Harris (talk) 07:17, 27 June 2016 (UTC)

Probably did?
The first paragraph of the Section "History" says that a cpu probably did use a branch predictor. This sounds like speculation or even wild guessing.--Henke37 21:09, 28 June 2007 (UTC)


 * http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=63652 would probably have the answer if anyone has access to it... --CTho 02:40, 29 June 2007 (UTC)


 * Yes, based on that article, the Vax 9000 had a branch predictor. "The Branch Prediction unit can handle two outstanding conditional branches but must stall the XBAR [instruction fetch crossbar] when it encounters a third."  If somebody else wants to edit the main page and give that citation, go ahead.  Daedae 20:27, 2 August 2007 (UTC)

Trivial prediction section heading
The article describes a form of static branch prediction as "Trivial prediction." I don't think this is a commonly used term, in fact I have never seen it used outside of this wikipedia entry. I'm going to change that heading to Static Branch prediction, which I believe is the more commonly used term. Also the existing section on static branch prediction is describing a specific type of static branch prediction, that is Backwards Taken/Fowards Not-taken so I think that heading should also be changed. Wikiuser1239 (talk) 03:56, 11 December 2008 (UTC)


 * Some IBM systems do static prediction on BXH and BXLE (branch index high, and branch index less or equal).  The instructions are commonly used at the beginning and end of for loops, such that BXH predicts not to branch, and BXLE predicts to branch.  There is no requirement that they be used that way, though. Gah4 (talk) 20:47, 18 April 2018 (UTC)

what about GCC __builtin_expect( cond, exp ) ??
In C, there is a syntax to help with branch prediction. The command looks like:

if( __builtin_expect( x == y, true ) ) { foo; }

Can someone please explain how this C keyword relates to CPU branch prediction? —Preceding unsigned comment added by 165.125.160.16 (talk) 07:02, 3 May 2009 (UTC)


 * It's not all that strongly related to CPU branch prediction. I don't know exactly what GCC does now but the sort oif things I've come across for this type facility are
 * It can move the less frequently used code out of line so it isn't included in a loop cache.
 * It can avoid loading variables outside of the code that are only used within the infrequent sections.
 * For small functions that don't call anything outside but might call something infrequently inside one can avoid setting up a stack frame unless it is needed.
 * As to CPU branch prediction it could set a static branch prediction or use knowledge of the default assumptions for how branches are dealt with. For instance if the code is moved out of line it might be moved to the end instead of before the beginning if branches forward are assumed not taken but branches back are assumed taken.
 * Anyway there's lots of good things it can do. Knowing the actual frequency from profiling can improve the decisions. Dmcq (talk) 11:58, 3 May 2009 (UTC)

Article rewritten, Oct. 2009
I have rewritten the article almost completely because it looked like a collection of random facts. Hopefully, this has solved many of the earlier issues on this talk page.

I have tried to improve the text to make it easier to understand for non-experts.

I have added an explanation of the 2-level predictor which was completely missing, even though most predictors are based on this principle today.

The history section still needs to be revised and updated. It may be a good idea to move the descriptions of older processors and obsolete prediction methods into the history section.

The word branch is ambiguous: It may mean (1) each of two alternative sections of code that an algorithm can choose between, or (2) the instruction that does the choosing. To avoid confusion, I have chosen to use the work branch only in the first meaning, and use conditional jump for the second meaning. Other Wikipedia articles make no such distinction and use branch in both meanings.

My drawings may need improvement.

Afog (talk) 17:59, 3 October 2009 (UTC)


 * Thanks, looks good and I like the diagrams. I've moved this to the end of the talk page as that's where new comments normally go on talk pages except for any special tags at the top. Dmcq (talk) 08:59, 4 October 2009 (UTC)

B4900 had a cache
It did, really! I participated in the design of the B4900 from concept through debug. I did not design the cache and do not remember the specs, but I know there was one. Any way, I deleted "cacheless" from the description of the B4900. By the way, we referred to changing the opcode as "code modification" which was something that other programs written for Medium Systems did. I don't remember those details either, but it was a design issue. Scottmunroe (talk) 02:12, 4 March 2010 (UTC)

B4900 Cacheless?
I woke up in the middle of the night thinking, "It wasn't really a cache. It was just a fetch buffer." It was only for code, not data. I suppose the function that would differentiate between a cache and a buffer would be whether or not the logic recognized that the required code was already in the buffer and did not read main memory if it was. I don't remember. So, since I am not certain, I will put "cacheless" back in. Scottmunroe (talk) 23:52, 12 March 2010 (UTC)
 * You might be interested in for a reason to do with security for rejecting updating the instructions. Also the ICL 2900 range could also have read only constants in the executable code and jumping to that could change the constants if one implemented something like the B4900. Was more privileged or shared code only executable by going via some protected vector on the B4900 so this couldn't be done or were they not worried by the possibility? Dmcq (talk) 08:07, 13 March 2010 (UTC)

The IBM 360/85 called it buffer instead of cache, but the idea was there. If it is associative, I would call it a cache. The 360/91 loop buffer stores sequential instruction words, but isn't associative, so probably not a cache. Gah4 (talk) 07:04, 27 June 2016 (UTC)

Article needs work
This page presents the details of branch prediction methods without first giving the reader a better idea of what branch prediction itself is. There is no discussion of how it relates to the concept of a hazard and specifically to branch hazards. For the methods that are detailed the order does not make any sense either. The basic concepts need to be presented first, like say branch history table / branch prediction buffer, and then we can present how they are used in real chips later on. These basic branch prediction concepts can illustrated with scenarios on simpler idealized processors, like the one in the opening picture or maybe a little more complicated. Red Bulls Fan (talk) 01:09, 22 October 2010 (UTC)

Critics?
I'm coming over from http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array where I read that prediction is sometimes very harmful, especially for unsorted arrays. So aren't there any specialists who understand that topic and see that branche prediction is the attempt to predict something that often can't be predicted, and therefore must be considered harmful? I won't say that prediction isn't always harmful, but in some cases it is, and therefore the CPU designers should start to think more critically about what others implemented. Right now our processors do predictions no matter how much it costs, and this needs to be changed in the future. This angers me a bit, because it tells me that a lot of people aren't doing their work correctly. Actually, it's a shame. 178.197.235.31 (talk) 10:29, 24 May 2013 (UTC)
 * From what I understand this isn't a case of prediction is harmful, i.e that branch prediction slows things down. It is more of the case that for unpredictable input, branch prediction cannot do a very good job (for obvious reasons) and therefore things slow down. What the author of the question is seeing is not slowdown because of branch prediction, but the absence of branch prediction speedup. Also, most branches can be predicted reasonably well, especially the type of code used in performance sensitive areas (such as tight loops) usually have simple patterns and can therefore be predicted efficiently.--TowerOfBricks (talk) 13:30, 15 July 2013 (UTC)

Dead link
A note that the reference Championship Branch Prediction (13) is a dead link (or at least the server is not responding at the moment). As I do not know exactly what information the original page contained, I don't want to replace it directly. However this page http://www.jilp.org/jwac-2/ looks like a good reference on the subject. — Preceding unsigned comment added by TowerOfBricks (talk • contribs) 13:34, 15 July 2013 (UTC)

External links modified
Hello fellow Wikipedians,

I have just added archive links to 1 one external link on Branch predictor. Please take a moment to review my edit. If necessary, add after the link to keep me from modifying it. Alternatively, you can add to keep me off the page altogether. I made the following changes:
 * Added archive http://web.archive.org/web/20100616183714/http://cava.cs.utsa.edu/pdfs/micro03_dist.pdf to http://cava.cs.utsa.edu/pdfs/micro03_dist.pdf

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at ).

Cheers.—cyberbot II  Talk to my owner :Online 01:50, 19 March 2016 (UTC)


 * I found a non-archived version. Guy Harris (talk) 02:59, 19 March 2016 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Branch predictor. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20080528083720/http://camino.rutgers.edu/cbp2/ to http://camino.rutgers.edu/cbp2/

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 16:28, 24 July 2017 (UTC)


 * Worked, but I replaced it with a reference to a later CBP competition. Guy Harris (talk) 17:39, 24 July 2017 (UTC)

branch misprediction
There is supposed to be discussion of branch misprediction but I don't see it. Seems to be, it should be a redirect here. Gah4 (talk) 20:51, 18 April 2018 (UTC)
 * I redirected rather than merged, as all of the ideas on the branch misprediction was already included on this page, and there were also no references on it. Klbrain (talk) 09:47, 1 May 2019 (UTC)

How to be sure if it is not parallelization?
If there were the circuits of say one hundred Z80 CPUs in one chip and there was a microcode unit wrapped around which analyzed the fetched instructions for a possibility of parallelization and then distributed them equally to the single CPUs and outwardly emulated another CPU, how could you distinguish the weird behaviour, for example when code was self-modifying, from branch prediction?


 * If by "which analyzed the fetched instructions for a possibility of parallelization and then distributed them equally to the single CPUs" you mean that, for conditional branches, it sends "what we do if the branch is taken" to one CPU and "what we do if the branch isn't taken" to another CPU, that's not branch predictions, that's speculative execution in its eager execution form. Guy Harris (talk) 09:06, 22 November 2019 (UTC)

Overriding branch predictor examples
Hi Folks, I made the following edit which was rejected for COI (thanks MrOllie for flagging).

I would suggest retaining the edit and possibly altering the citation.

Intel's Silvermont (microarchitecture) implements a two-level overriding branch predictor. The first level predictor appears to be a next line predictor that controls instruction fetching. The second level predictor controls which instructions are decoded and can override the first predictor.

The reference is to my own site (www.realworldtech.com), which is pretty widely cited in Wiki articles. Other references include Agner Fog's work (https://www.agner.org/optimize/microarchitecture.pdf), which cites me.

I think it's important to have as an example since Silvermont's overriding predictor actually uses both two different types of predictors (as well as different storage for the histories used to predict).

So - do you think the citation is necessary in this situation? If not, maybe we drop to avoid the COI problem? Alternatively, maybe folks are comfortable with it?

TheKanter (talk) 19:56, 17 October 2021 (UTC)

Presetting the Predictor
Are there any CPUs that actually allow the programmer to preset the branch predictor. I have read about optimizations that use execution profile information to set branch prediction, so there must be some sort of external preset possible. Is there a CPU that lets this be done computationally? A Virtual Machine has an execution loop that has as many jump targets as the VM has instructions. Each of these targets ends with a jump to another target which is a direct result of the bytecode that is being interpreted. But the sequence of jumps is known. If the CPU had an instruction that allowed the next jump's target to be set by the code, the VM could execute without ever needing to "predict" a jump. If such a CPU is available, it should be discussed here. 50.206.176.154 (talk) 18:47, 5 April 2022 (UTC)
 * Some IBM z/Architecture microprocessors presumably do so, given the "branch prediction preload" instructions described on pages 743 through 746 of z/Architecture Principles of Operation, Twelfth Edition; those instructions have a register operand, with the register containing the predicted target address, and a memory operand, pointing to the branch instruction, as well as a third 4-bit immediate operand describing the type of branch instruction (length, indication of whether it's a call/return/other branch/EXECUTE instruction (CISC architecture dating back to the 1960's, of course it has an execute instruction :–)).
 * I don't know of any others. It would be interesting to see whether anybody's studied the benefits of explicit indirect ranch prediction and to see why no other major instruction set seems to provide this (including, as far as I know, IBM's own Power ISA). The main places I'd see it being used would be interpreters, as per your note, and method calls if the method implementation differs from instance to instance. Guy Harris (talk) 19:47, 5 April 2022 (UTC)