Talk:Vector processor

Confusing SIMD categorisation
In most of the computer architecture books that I have read, SIMD is categorized as a type of multiprocessing, not as a type of vectorization. My understanding of the meaning of vectorization is an architecture which streams data into an execution unit. That is, it achieves high performance through high temporal utilization of a single functional unit. SIMD achieves high performance through a different axis, that of replication of functional units. For that reason, I believe this article is confusing SIMD as a type of vectorization. Dyl 23:34, 27 December 2005 (UTC)
 * it's not the article itself per-se, it's that some vendors miscategorised their ISA by using the word "Vector" without actually providing features *of* Vector processors. For example some ISAs took traditional Gather-Scatter operations or permute operations from "true" variable-length Vector ISAs, slammed them into fixed-width SIMD instructions then claimed that they'd made a Vector Extension. in other words just because there are *features* lifted from pure Vector processors and jammed into SIMD does not make SIMD itself a Vector Processor. it just massively confuses things. joy. Lkcl (talk) 20:35, 6 June 2021 (UTC)

8086 and family
If it's allowable to use multiple cycles in data processing then do the x86 family, with things like the string operations, fit into this category? --ToobMug 15:47, 26 May 2007 (UTC)

This is one of the best articles on microcomputer architecture I've ever read. Its descriptions are simple enough for a layman like me to understand, and yet lead the casual reader into a wealth of information. I'm sure other technical articles on Wikipedia could do with emulating this style. Fantastic work!

GPUs?
Isn't a shader in a typical ATI or Nvidia GPU a vector processor? They process pixels and color data as vectors. 76.205.122.29 (talk) 18:48, 26 May 2010 (UTC)
 * yes, although typically the pixel colour data is processed / categorised as "sub-vectors": vec2, vec3, vec4. VEC2 would be XY, VEC3 would be RGB or YUV or XYZ, and VEC4 would be ARGB or XYZW. it's made additionally complicated by these sub-vectors sometimes being treated as independent elements within vectors. RVV has this capability, as does SVP64, the Draft Extension to PowerISA i am developing Lkcl (talk) 20:19, 6 June 2021 (UTC)

Difference between Array and Vector processors
Array processors and vector processors are different, aren't they? I think the redirect from Array processor should be disabled and a separate section for Array processor has to be made —Preceding unsigned comment added by 129.217.129.131 (talk) 20:47, 5 January 2011 (UTC)

Tanenbaum, A.S. 1999. Structured Computer Organization. Prentice Hall. makes a difference between array machines and vector machines (I don't have the book here right now, I might remember incorrectly). I just looked into the new edition via Amazon and there Tanenbaum makes a difference between "SIMD processor" and "vector processor". The former have multiple PEs (processing elements) which have local memory, and are controlled by a single instruction stream (example ILLIAC IV). Vector processors on the other hand have vector registers and a single functional unit to operate on all entries in such a register. Tanenbaum cites the Cray-1 as an example. Other examples are SSE, AVX, AltiVec, NEON. It seems hard to find a consistent differentiation in naming the different SIMD hardware. I do find it important, though, to be clear about the differences there are. Mkretz (talk) 11:17, 29 June 2013 (UTC)
 * there is an actual processor which called itself an Array String Processor, by Aspex Microelectronics. I worked for them back in... mmmm... 2003? i think. that was an insane incredible architecture: 4096 2-bit ALUs with left-right connections, you could do 8192 bit addition or bit-shift, or you could break it down to do thousands of smaller (8-bit, 4-bit) Computations. In some specialist algorithms it was a hundred even a THOUSAND times faster than processors of its era.  I added crossreferences to Academic peer-journal papers by its key architects, and to archive.org.  Fascinating anecdotal tidbit: the ASP was an actual serious contender for the Reagan era "Star Wars" Programme! Only the lasers were the bit that let them down :) Yes, Array Processors have actually been manufactured and sold: they are *not* the same thing as "bare" (non-predicate-capable) SIMD processors, they are more like "true" Vector Processors. Lkcl (talk) 13:43, 6 June 2021 (UTC)

Not the real history
It is a distortion of historical events to characterize on-chip simd operations as vector instructions. The simd concept originated with the early work on parallel computers which was both separate from and earlier than the big-iron vector machines. Jfgrcar (talk) 03:23, 29 January 2011 (UTC)


 * Yes, that is not the right characterization, but the article has bigger problems anyway. History2007 (talk) 21:17, 8 July 2011 (UTC)

Quality?
This page needs real clean up. A simple diagram would do a lot, and there are zero refs now. Unless there are objections I will remove the x86 architecture code that has no place in an encyclopedia. I will have to find a nice image to explain the concept. Does anyone have a nice diagram for this? History2007 (talk) 21:17, 8 July 2011 (UTC)
 * A 4-element SIMD extension like SSE isn't a vector processor anyway, so it's irrelevant to this page. A vector processor would be something like an NEC SX-6 or a Cray-2 or some DSPs. 69.54.60.34 (talk) 03:43, 8 September 2011 (UTC)
 * SIMD examples are extremely useful to illustrate starkly and bluntly how truly and horrifically awful SIMD really is, compared to good Vector ISAs. that is not overstated. the incumbent current computing giants have done the world a massive disservice by believing and propagating the SIMD seduction for 30 years.  this is best illustrated in more neutral understated language in the sigarch "SIMD Considered harmful" citation now added to the page, which, on careful reading, is observed to provide stunning statistics such as a 10:1 reduction in the number of instructions executed, and 50% *or greater* savings in the number of instructions needed.  this is a big damn deal that Intel and AMD have a hell of a lot to answer for, and ARM is only just waking up to with the introduction of SVE2. Lkcl (talk) 23:51, 5 June 2021 (UTC)
 * to illustrate how stark this really is i tried compiling the ultra-simple 2-line iaxpy example with x86 gcc 10.3 on godbolt.org with the options "-O3 -march=knl" to allow optimised AVX512. the results? an astounding TWO HUNDRED AND SEVENTY assembler instructions. i mean wtf?!? i won't list them here, but you should be able to use this link https://godbolt.org/z/55Kax4j9f Lkcl (talk) 03:28, 8 June 2021 (UTC)
 * tried the same thing with ARM SVE2 https://godbolt.org/z/nd1aE1vY4 the options given are from the ARM SVE tutorial which are armv8 clang 11 -O3 -march=armv8-a+sve and it's not bad: only 45 instructions. this is however still *double* that of the equivalent RVV number of instructions which can be seen in the sigarch "SIMD considered harmful" link. whoops. Lkcl (talk) 04:16, 8 June 2021 (UTC)
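For readers who do not want to follow the godbolt links, here is a rough Python model of what is being compared. It is only a sketch: it assumes the usual definition of iaxpy (y[i] = a * x[i] + y[i]), and the function names and the lane width of 16 are illustrative choices, not anything taken from the compiler output. The second function models why fixed-width, non-predicated SIMD needs a separate cleanup loop for leftover elements; the real generated assembler contains considerably more than this.

```python
def iaxpy_scalar(a, x, y):
    """Reference loop: y[i] = a * x[i] + y[i] for every element."""
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]
    return y


def iaxpy_fixed_simd(a, x, y, width=16):
    """Model of fixed-width, non-predicated SIMD: full blocks plus a scalar tail.

    `width` is a hypothetical lane count chosen for illustration.
    """
    n = len(x)
    full = n - (n % width)  # number of elements covered by whole blocks
    # Main loop: each iteration stands in for one multiply-add over `width` lanes.
    for base in range(0, full, width):
        xs = x[base:base + width]
        ys = y[base:base + width]
        y[base:base + width] = [a * xi + yi for xi, yi in zip(xs, ys)]
    # Cleanup loop: leftover elements that do not fill a whole block.
    for i in range(full, n):
        y[i] = a * x[i] + y[i]
    return y
```

Both functions should give identical results for any length; the point is only that the second one needs two loops where the first needs one.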

Two Kinds of Vectors
The article notes that such things as AltiVec and SSE are examples of vector processing, and so it's common on current chips.

But if that is the case, then vector processing goes back long before the STAR-100.

Intel's MMX split up a 64-bit word into multiple 32-bit or 16-bit integers.

With a 36-bit word, the Lincoln Laboratories TX-2 was doing the same thing, as was the AN/FSQ-31 and 32 with a 48-bit word. And those two were derived from IBM's SAGE system, which operated on vectors of two 16-bit numbers at once.

The kind of vector processing that a Cray-I did, on the other hand, isn't nearly as common; right now, the only current system of that general kind is the SX-ACE from NEC. — Preceding unsigned comment added by Quadibloc (talk • contribs) 22:33, 7 August 2016 (UTC)
 * Altivec is miscategorised. just because "VSX" has the word "Vector" in it does not make it Vector Processing: VSX and Altivec are pure fixed-length SIMD, with zero predication, and cause programmers to write the most horrendous general-purpose (variable length) assembler. Actual Vector Processing involves having either a VL (Vector Length) register or at the bare minimum some Vector Predicates which allows mask-out of element operations.  NEON, MMX, SSE, Altivec, VSX, these are SIMD.  AVX, AVX512, ARM SVE2, these are predicated SIMD. Cray, RISCV RVV, LibreSOC's SVP64, SX-ACE (which I had not heard of before, thank you for that one i will look it up), these are all Cray-style Variable Length. Lkcl (talk) 23:58, 5 June 2021 (UTC)
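The three categories in the reply above (fixed-length SIMD, predicated SIMD, VL-based Vector) can be sketched as a toy Python model. This is an illustration under assumptions, not a description of any real ISA: the function names are made up, masked-out lanes are assumed to keep the old destination value (some ISAs zero them instead), and elements past VL are likewise assumed to be left undisturbed.

```python
def simd_add(a, b, width=4):
    """Pure fixed-length SIMD: always operates on exactly `width` elements."""
    assert len(a) == width and len(b) == width
    return [ai + bi for ai, bi in zip(a, b)]


def predicated_add(a, b, mask, dest):
    """Predicated SIMD: lanes whose mask entry is 0 keep the old `dest` value."""
    return [ai + bi if m else d for ai, bi, m, d in zip(a, b, mask, dest)]


def vl_add(a, b, vl, dest):
    """Vector-length style: only the first `vl` elements are processed."""
    return [a[i] + b[i] if i < vl else dest[i] for i in range(len(dest))]
```

Under these assumptions, a VL of 3 behaves like a predicate whose first three lanes are enabled, which is why the two mechanisms are listed as alternatives.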

Why isn't the word SIMD in this article?
Aren't SIMD and vector processors largely synonymous? Isn't SIMD usually vector? Isn't vector processing usually SIMD? WorldQuestioneer (talk) 20:10, 13 July 2020 (UTC)


 * There seems to be a somewhat arbitrary distinction between "traditional" vector machines and SIMD machines described in SIMD article:
 * "Vector-processing architectures are now considered separate from SIMD computers, based on the fact that vector computers processed the vectors one word at a time through pipelined processors (though still based on a single instruction), whereas modern SIMD computers process all elements of the vector simultaneously. [some 1998 ref here]"
 * I am no expert on this issue, but this seems... dumb. Like, there's only so much you can do with pipelining and looping on a single ALU, so modern stuff sold as "vector processors" like NEC SX-Aurora TSUBASA uses a bunch of SIMD units too. Some phrasing needs to be added to accommodate this sort of thing. --Artoria2e5 🌉 01:42, 15 July 2020 (UTC)
 * it seems dumb because it's plain wrong. Cray Vector Engines had so many registers and could do so many elements in parallel in a single clock cycle that they had to have external ultra-expensive multi-ported SRAM instead of internal register files. they got away with that because the speed of processing matched speed of memory at the time (a trick that won't work today). Modern Vector Processors actually have predicated SIMD ALU back-ends (called "Lanes"), you just don't get to use them directly because the ISA hides them from you.  the Issue Phase is what chucks variable-length Element operations at the SIMD backends *on your behalf* so you as the programmer don't have to piss about with god-awful stripmining and teardown. Lkcl (talk) 00:07, 6 June 2021 (UTC)
 * SIMD and Cray-style Vector Processing, Vectors are so light-years ahead of SIMD in terms of efficiency and effectiveness it's not even funny. see other comments above. Lkcl (talk) 00:00, 6 June 2021 (UTC)
 * a more direct answer (now in the article) is that modern Vector Processors tend to use SIMD back-ends, fronted by a proper Vector ISA. you as the programmer absolutely do not need to know about this: you use the *Vector* ISA, not a SIMD ISA. those SIMD back-ends have built-in predication masks, which the micro-architecture can use to finish loops by going, "oh, huh, we only have 3 items left to do, and the SIMD units are 8 wide, um, let me do some math, here... that means i have to calculate a predicate mask of 0b00000111 and chuck it at the SIMD ALUs for you". if the Vector operation is itself predicated, that 0b00000111 is simply ANDed with the relevant mask bits. bottom line is that there is absolutely no excuse whatsoever for Intel, AMD and ARM, in the year 2021, 50+ years after Cray Vectors were invented, to be peddling SIMD as if it was doing us a favour. Lkcl (talk) 03:44, 8 June 2021 (UTC)
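The mask arithmetic described in the reply above (3 items left, 8-wide SIMD units, mask 0b00000111, ANDed with any user predicate) can be sketched in a few lines of Python. This is a simplified model, not a description of any particular micro-architecture: the function names are hypothetical, and the user predicate is assumed to be an integer with bit i controlling element i.

```python
def tail_mask(remaining, width=8):
    """Predicate mask with one bit set per element still to be processed."""
    active = min(remaining, width)
    return (1 << active) - 1


def issue_vector_op(n, width=8, user_mask=None):
    """Yield (base, mask) pairs that a hypothetical issue phase sends to the lanes."""
    for base in range(0, n, width):
        mask = tail_mask(n - base, width)
        if user_mask is not None:
            # AND in the bits of the user-supplied predicate covering this block.
            mask &= (user_mask >> base) & ((1 << width) - 1)
        yield base, mask
```

For example, `tail_mask(3)` gives 0b00000111, and an 11-element operation is issued as one full block followed by one block with only the low three lanes enabled.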

Recommendation that importance be set "Top"
the page currently does not have importance set. i recommend it be changed to "top" after a review. basically the page has near zero recognition of the strategic importance of how Vector processing has influenced our lives, in computing. this is actually a cause for some concern, from a sociological and historic perspective. however i am not comfortable setting it myself, would prefer a review. Lkcl (talk) 16:53, 10 June 2021 (UTC)
 * 1) Cray-I supercomputer. says it all.
 * 2) Vector processing as a concept saves so much power, so fewer instructions, so much less absolute hell for programmers it's not funny.  LITERALLY an order of magnitude saving on program size.
 * 3) Vector processing is the basis of every GPU on the planet.  every smartphone, 99% of supercomputers, every Gaming Graphics Card.
 * 4) there are *477* links to this page!

Discernable features
has been quite insistent that vector processors are distinguishable from SIMD and offers a two-point test at the end of the lead for identifying a vector processor. I don't see support for this test in any cited sources. The citations provided are for WP:PRIMARY technical details of individual architectures which is a recipe for WP:SYNTHESIS. We need to reference these assertions to a WP:SECONDARY source like a textbook on processor architecture. ~Kvng (talk) 14:26, 13 June 2021 (UTC)
 * With Vector Processing having been completely forgotten about for nearly 50 years, with only extreme high-end secretive systems actually properly implementing true Vector ISAs, and with even Sony, IBM, Intel *and ARM* completely misleading absolutely everyone including Academics about this, you're simply not going to find anything. You can clearly see the claim since 2003 by Sony / IBM when the Cell Processor first came out that VSX and Altivec are Vectors: Alti-VEC, and VSX VECTOR Scalar Extension.  with such large companies making such false and misleading claims, and those false claims being propagated throughout literature for decades, it might seem difficult to say otherwise.  However the fact remains that a basic first-principles analysis, as well as comprehensive detailed review of available ISAs, both SIMD (VSX NEON SSE), Predicated SIMD (AVX512, SVE2), and "True" Vector ISAs (Cray, SX-Aurora, RVV) clearly shows the difference, namely that they borrowed features from Vector ISAs.  I have spent nearly a week going over this, creating examples in considerable detail, with citations in each example, showing in each example where the features exist.  Thus, although there are no books which "state" this fact, it is a fact that may be logically deduced and concluded.  Whether it's popular, and whether the Marketing Departments of the billion-dollar companies who came up with the false and misleading statements which conflated Vectors with SIMD will like that being pointed out, remains to be seen. Lkcl (talk) 16:09, 13 June 2021 (UTC)
 * ok, so what is a way forward, here. surely there must be other wikipedia pages where the "Marketing" of large billion dollar companies has been ongoing for such a long time (2 decades in this case) that it's permeated pretty much all literature on the subject. if the correct logically-deducible facts are removed from this page it misleads readers and continues to allow billion dollar companies to propagate false marketing. there must be a process or wording by which this can be clearly communicated. what might that be? Lkcl (talk) 16:34, 13 June 2021 (UTC)
 * something which might help with the process of logical reasoning and factual deduction here: look up the definition of a Vector. actually, that's harder than it looks, but you get the idea https://en.wikipedia.org/wiki/Vector_(mathematics_and_physics). question: where in the definition of vector does it say the number of elements is a hard-fixed quantity? SIMD by definition makes the number of elements a hard inalienable unchangeable quantity.  this is a fact.  it is part of the definition of SIMD.  by total contrast, Seymour Cray and other designers of Vector ISAs specifically designed Vector ISAs to be variable length.  this is also a fact.  therefore, it is blindingly obvious by definition - fact - that SIMD != Vector. is that clear enough and simple enough? if so, how is it best worded? Lkcl (talk) 17:01, 13 June 2021 (UTC)
 * ah - got another one for you. look closely at this article: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/coding-for-neon---part-2-dealing-with-leftovers - it starts with this: "In this post, we deal with an often encountered problem: input data that is not a multiple of the length of the vectors you want to process. You need to handle the leftover elements at the start or end of the array - what is the best way to do this on Neon?".  note the wording.  input data is not a multiple of the length of the vectors you want to process.  it then uses explicit fixed-length NEON SIMD instructions vld1.8.  further down, a statement Neon provides loads and stores that can operate on single elements in a vector. Using these, you can load a partial vector containing one element, then in the code fragment below talks about vld1.8  {d0}, [r0]!  @ load eight elements from the array.
 * in other words they're conflating the data itself, which is arrays (Vectors) with the capability of the hardware (which is fixed-length SIMD) and thus giving the reader the completely false and misleading impression by implication and by accidental word-association that the hardware itself is "Vector-capable".
 * now, this makes the article itself no less clear: it's a brilliant well-written article that does its best in the face of the god-awful seductive horror that fixed-length non-predicated SIMD actually is: it describes very clearly and succinctly the work-around techniques called "Fixups" which have to be deployed when the data vector length does not match the fixed hardware length... all the while not mentioning at all that if NEON was *actually* a Vector Processor, none of those god-awful dog's dinner techniques would even be needed. ARM can't exactly go shooting its own contributors, can it?
 * fortunately for ARM (and thank god for the programming community), SVE2 fixed all of that by providing element-level predication as a fundamental part of the SVE2 ISA.
 * by complete contrast to NEON this makes ARM SVE2 capable of properly processing variable-length Vector data. and that's really what this is about. NEON != Vector.  SVE2 ~= Vector. Cray == Vector. Lkcl (talk) 20:27, 13 June 2021 (UTC)
 * and another one. the article is about processors that were designed from the ground up to be "processors that handle large vectors".  general purpose computers which had SIMD added as an afterthought do not qualify as vector processors.  there is already a page on SIMD, and people interested on SIMD should go there and read about it.  if SIMD === Vector Processors, then why on earth does this page exist at all? the answer is: because it's *about* Vector Processors, not about SIMD. that alone tells you that there's a definite difference. Lkcl (talk) 22:57, 13 June 2021 (UTC)
 * i've added an extra set of examples which are based on Cray 1, SX Aurora and RVV "mapreduce" Vector capability. these categorically are not and cannot, by definition, be part of SIMD.  they are however fundamentally part of Vector processors. they're certainly not part of GPU ISAs, although occasionally you will see a short-vector (subvector) dotproduct or 3-vec crossproduct instruction. these however are very specific (vec3 usually) and are in no way general-purpose.  Cray, Aurora and RVV all have general purpose Vector reduction capability, and Cray and Aurora also have iteration (prefix sum, aka pascal's triangle).  SIMD is incapable by design of iteration and reduction and the resultant assembler for doing "horizontal sum" is absolutely awful Lkcl (talk) 15:19, 19 June 2021 (UTC)
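To make the "horizontal sum" and "iteration (prefix sum)" points above concrete, here is a small Python sketch. It is a model under assumptions, not a claim about any specific instruction set: the first function imitates the shuffle-then-add halving sequence commonly used to sum the lanes of one fixed-width register (the power-of-two lane count is an assumption of this sketch), and the second shows the element-to-element dependency that a prefix sum involves. Function names are illustrative only.

```python
def horizontal_sum_tree(lanes):
    """Sum the lanes of one register by repeated halving (shuffle, then add)."""
    width = len(lanes)
    assert width and width & (width - 1) == 0, "lane count must be a power of two"
    while width > 1:
        half = width // 2
        # Each step models one shuffle plus one element-wise add on the low half.
        lanes = [lanes[i] + lanes[i + half] for i in range(half)]
        width = half
    return lanes[0]


def prefix_sum(values):
    """Inclusive prefix sum: each output depends on the previous output."""
    out, running = [], 0
    for v in values:
        running += v
        out.append(running)
    return out
```

An 8-lane register takes three halving steps in this model, whereas a general-purpose Vector reduction would be expressed as a single instruction over however many elements are active.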

deeper problems with all associated articles
https://en.wikipedia.org/wiki/Talk:SIMD#Page_quality_is_awful_(in_the_summary)

there are fundamental problems with the three pages, Vector Processing, SIMD, and SIMT. from the link above it can be seen that there is MASSIVE confusion even from academic coursework and academic literature on this topic.

it also does not help that neither Flynn nor Duncan taxonomy cover SIMT! Even i was not aware in 2004 when working for Aspex that it was a *SIMT* processor not a *SIMD* one because NVIDIA had not coined the phrase, only introducing it in what... 2012? 2016? something like that.

it also does not help that a pure SIMD only processor with zero scalar capability and no scalar registers is ANOTHER class of processor that at the hardware level is virtually indistinguishable from SIMT.

some diagrams are urgently needed here which illustrate these things properly.

given that SIMD is literally the top world hit on google search engines, this is a pretty damn high priority task. how can this be properly given attention and resources? Lkcl (talk) 14:21, 15 June 2021 (UTC)
 * When there is disagreement in sources about something like this, the approach we generally take is to report the different sides of the argument (with citations). Wikipedia is not the decider. ~Kvng (talk) 21:20, 15 June 2021 (UTC)
 * hiya Kvng, i don't have a problem with the citations being used, they are good. the problem is, they're not being read / understood properly because some of them are quite old (1977) i.e. use different terminology from modern computing.  combine that with the complexity of the subject, combine it with the secrecy that NVIDIA, AMD, Intel and ARM engage in where people *cannot find out* what is inside, and combine it with the "circular citation problem" of wikipedia (a misreport gets cited in academia which then is published and is cited by wikipedia....) and we have the situation where several inter-related very important computing topics are badly misrepresenting the fundamentals of computing architecture that is the cornerstone of our modern way of life.  now, i can point out the problem, from the expertise that i have, but i have a hell of a lot to get done.  i am going to need help finding citations.  i can hand-draw diagrams very quickly, but someone else with more time will need to do them in SVG. basically a conversation and collaboration is needed. Lkcl (talk) 01:41, 16 June 2021 (UTC)
 * guy harris kindly found a ref 1977 Flynn paper, Flynn followed up after his initial paper and sub-categorised SIMD. one of those is SIMT! Lkcl (talk)

update to categorisation
after creating a second example based on real-world Vector Processor ISAs i realised my initial deduction of what constitutes a Vector Processor was slightly inaccurate. the two discerning features, and SIMD is incapable of one of them by definition, are:


 * 1) element-level aligned memory LOAD/STORE where SIMD is typically per batch aligned
 * 2) this is the kicker as far as SIMD is concerned: inter-element reduction and iteration.  also known as "Horizontal sum" it is literally impossible for SIMD to perform.

the examples which are not original research one of which was there before i started editing the page show this distinction very clearly.

Predicated SIMD sort-of deserves the moniker "Vector capable" but unfortunately for SIMD the lack of Horizontal Sum and other reduction operations kicks it in the nuts as far as defining Predicated SIMD as "A Vector Processor", to use a colloquial term.

now, there does exist the possibility that some random processor out there may have "SIMD Horizontal Sum" however given the absolutely ridiculous stackoverflow discussions when searching for this i would not hold my breath waiting for Intel to add it. there might be some obscure historic SIMD architectures out there but if they have "Horizontal Sum" i haven't encountered them yet. Lkcl (talk) 16:17, 21 June 2021 (UTC)
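For readers following this thread, here is a minimal Python sketch of the data-flow distinction being drawn above between a lane-wise ("vertical") operation and an inter-element ("horizontal") reduction. Plain lists stand in for vector registers, and the function names are hypothetical, not taken from any ISA. It only illustrates the difference in dependencies; it makes no claim about which real ISAs do or do not provide a reduction instruction.

```python
def lanewise_add(a, b):
    """Vertical operation: result lane i depends only on lane i of each input."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of lanes")
    return [x + y for x, y in zip(a, b)]


def horizontal_sum(v):
    """Horizontal reduction: every element contributes to one scalar result."""
    total = 0
    for x in v:
        total += x  # each step depends on the previous element's partial sum
    return total
```

In the first function no lane ever reads another lane's data; in the second, every element feeds the same accumulator, which is the inter-element path under discussion.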

possible way to express this
Kvng i think i might have a way to express this which stops people from making false and misleading statements but also does not risk "SYNTHESIS".

it involves simply stating that a comparison between typical SIMD ISAs and typical Vector ISAs shows that SIMD ISAs miss two features:
 * 1) no setvl instruction (or REP feature)
 * 2) no iteration and no reduce arithmetic instructions

if understated enough, the fact that these are the discernable differences, combined with the examples, should stop people from making the mistake of thinking "oh, ARM and IBM and Intel marketing material said they do Vectors therefore ARM and IBM and Intel must all be Vector Processors". what's your thoughts on that approach? Lkcl (talk) 00:53, 23 June 2021 (UTC)
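As a rough illustration of the first point (a setvl-style instruction versus a fixed width), the following Python sketch contrasts the two loop shapes. It is an assumption-laden model, not any ISA's actual semantics: `setvl`, `max_vl` and `width` are hypothetical names, lists stand in for memory, and both inputs are assumed to have the same length.

```python
def setvl(remaining, max_vl):
    """Hypothetical stand-in for setvl: grant at most max_vl elements."""
    return min(remaining, max_vl)


def vector_add_setvl(a, b, max_vl=8):
    """Strip-mined loop: the last pass simply runs with a shorter vector length."""
    n, i, out = len(a), 0, []
    while i < n:
        vl = setvl(n - i, max_vl)
        out.extend(a[i + k] + b[i + k] for k in range(vl))
        i += vl
    return out


def simd_add_fixed(a, b, width=4):
    """Fixed-width loop: full batches first, then a separate scalar tail."""
    n, i, out = len(a), 0, []
    while i + width <= n:
        out.extend(a[i + k] + b[i + k] for k in range(width))
        i += width
    while i < n:  # leftover elements that do not fill a whole batch
        out.append(a[i] + b[i])
        i += 1
    return out
```

Both functions compute the same result; the difference is that the fixed-width version needs extra clean-up code for lengths that are not a multiple of the width.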


 * What is your source for this proposed criteria? ~Kvng (talk) 12:54, 23 June 2021 (UTC)
 * i'm going to avoid as much as possible calling it "a criteria". instead making "an observation". there is no one source (or, i do not expect there to be one, i would be delighted if there were).
 * * comprehensive comparisons of Vector ISA candidates(VideoCore IV, Cray, SX Aurora, RVV, STAR100, x86) these all have setvl or a REP feature
 * * Cray, SX Aurora, RVV, these have Reduction and Iteration instructions
 * * no SIMD ISA has those features.
 * this is just a plain and simple fact. x86: no reduction.  MIPS: no reduction.  ARM NEON: no reduction.  ARM SVE2: no reduction. it is just... a... fact.
 * now, based on that observation it can be shown through the examples that it causes SIMD programmers absolute hell, due to the lack of those instructions. there are plenty of online Hell stackoverflow sources for this, and there's going to be better somewhere. could use some help finding some. Lkcl (talk) 18:58, 23 June 2021 (UTC)
 * https://stackoverflow.com/questions/12965377/simd-vs-vector-architectures not bad! pretty damn good answer! Lkcl (talk) 19:05, 23 June 2021 (UTC)
 * https://course.ece.cmu.edu/~ece740/f13/lib/exe/fetch.php?media=seth-740-fall13-module5.1-simd-vector-gpu.pdf slide 34 is really useful, but is an implementation detail. illustrates the difference extremely well though.
 * http://www.inf.ed.ac.uk/teaching/courses/pa/Notes/lecture11-vector.pdf again illustrates Vector Chaining, very nice, but again doesn't go into detail
 * https://medium.com/swlh/risc-v-vector-instructions-vs-arm-and-x86-simd-8c9b17963a31 i thought this was going to be useful but it's not. his article about the M1 is spot-on though.
 * http://cva.stanford.edu/classes/ee482s/scribed/lect11.pdf finally something useful. will study in-depth over next few days.
 * https://stackoverflow.com/questions/59775017/are-rep-instructions-considered-vector-operations illustrates why clarity is important here. people are making the connection about the similarities, but not quite getting it.
 * http://meseec.ce.rit.edu/756-projects/spring2013/2-2.pdf useful diagrams, illustrates some of the simplicity obtained with vector regfiles, also describes chaining. still not completely getting the message across though.
 * https://course.ece.cmu.edu/~ece740/f13/lib/exe/fetch.php?media=onur-740-fall13-module5.1.1-simd-and-gpus-part1.pdf hmm this sort-of gets the point across. duplicate of other slides? mentions Amdahl's Law.
 * http://www.icl.utk.edu/~luszczek/teaching/courses/fall2013/cosc530/cosc530_ch4all6up.pdf finally something about reduction, not perfect, still doesn't actually mention that RVV or SX Aurora actually has reduction.
 * okaay, the significance of the above search sank in overnight, from the early days of Cray-1 being pipelined vs SIMD being "batches". that's the key, *and* it's notable. it may take some diagrams to explain, or just link to some of the slides (multiple instances). Slide 7 from the cmu.edu one is the key. - it produces individual element results which can be used in other computations, where you can see clearly from the Array (SIMD) part of that slide, there's no data path between SIMD Lane 0 and SIMD Lane 1 and SIMD Lane 2 and SIMD Lane 3. Lkcl (talk) 15:16, 24 June 2021 (UTC)
 * https://commons.wikimedia.org/wiki/File:Vliwpipeline.png
 * https://commons.wikimedia.org/wiki/File:Vectorsimdpipeline.png
 * unbelievable. at last. someone talking sense. http://thebeardsage.com/vector-architecture/ the paragraph "Array (SIMD) processors have to wait for all LOADs to complete before all processing can start", this is the key. Lkcl (talk) 14:50, 25 June 2021 (UTC)
 * Simd vs vector.png

Array processors needs to redirect to Flynn's taxonomy
arg after guy kindly found the 1972 flynn paper, strictly speaking the redirect Array Processors should instead be to the (new) subsection in Flynn's taxonomy because Array Processor is a subclass of SIMD. whoops. Lkcl (talk) 05:43, 18 June 2021 (UTC)

TODO find ISAs with array reordering
https://en.wikipedia.org/wiki/Array_data_structure#Compact_layouts

ASPEX's DMA Engine was able to do up to 3 dimensions of reordering. need to find other processors, Mitch Alsup mentioned on comp.arch that some AMD GPUs could do it, must look them up. Lkcl (talk) 22:49, 19 June 2021 (UTC)

This article needs to be rewritten
This article has numerous severe problems. It looks like there is a significant amount of editorializing, improper synthesis, and original research present, given the tone and content of this article, and the fact that the majority of the sources cited are primary sources. The entire article feels like it was written by taking what is found in ARM SVE and the RISC-V Vector Extension and applying it forcibly to vector processors, instead of treating these as just two examples of vector architectures, as is evidenced by all the mentions of SVE and RVV in the article.

Because of this, the article actually misses all of the fundamental theory pertaining to vector processor architecture and the organizations it enables.

For instance, it more or less ignores chaining, even though this technique is quintessential to vector processors. Inexplicably, chaining is hidden away in another article, Chaining (vector processing), even though this technique has no relevance outside of vector processors! A closely related concept, tailgating, is something this article is completely ignorant of.

There is no explicit explanation of how a vector processor could vary the number of operations it performs per cycle on the spatial dimension by varying the number of lanes it has. In fact, the term lane is related to the vector length in a manner that is entirely incorrect in the section, Vector processor! The number of lanes is not the maximum vector length; it is the number of pipelines that operate concurrently to execute one vector instruction. The maximum vector length is the maximum number of elements that fits in one vector register (for register-to-register vector processors).
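The distinction between lanes and vector length can be sketched with a deliberately simplified Python model. It assumes each lane retires one element per cycle and ignores start-up latency, memory stalls and chaining; the function name and parameters are hypothetical and only meant to illustrate the point above.

```python
def element_cycles(vector_length, lanes):
    """Cycles to push vector_length elements through `lanes` parallel pipelines.

    Simplified model: one element per lane per cycle, no start-up overhead.
    """
    if lanes < 1:
        raise ValueError("a vector unit needs at least one lane")
    if vector_length < 0:
        raise ValueError("vector length cannot be negative")
    # Integer ceiling division: a partially filled final cycle still costs a cycle.
    return (vector_length + lanes - 1) // lanes
```

Under this model a 64-element vector instruction takes 64 cycles on one lane and 16 cycles on four lanes. The vector length is the same in both cases; only the number of lanes changes.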

The above discussion leads to another egregious omission: that vector processors can be of two kinds: memory-to-memory and register-to-register. This article has a couple of incidental mentions, neglecting the fact that this is in fact a central issue as to how vector processors work: whether it is memory-to-memory or register-to-register goes a long way to explain the relationship between vector length and the speed-up attained over a scalar processor (and for fun, vector caches can be added to this discussion).

The treatment of vector masks is also fatally flawed. It is viewed entirely through the lens of ARM SVE, where they are called predicate elements (IIRC, I don't have my ARM SVE manual at-hand), despite it being a feature of one of the first standalone vector processors, the CDC STAR-100 from the 1970s, and a feature of most vector processors since. Why this article treats SVE's conception of vector masking as the progenitor is beyond me.
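For anyone unfamiliar with the general idea of a vector mask, independent of any one ISA's naming, a minimal Python sketch follows. It assumes one possible policy, where masked-off elements keep the destination's old value (other designs zero them instead), and the function name is hypothetical.

```python
def masked_add(dest, a, b, mask):
    """Element-wise add under a mask.

    Where the mask element is truthy, the result is a[i] + b[i];
    where it is falsy, the old destination value dest[i] is kept.
    """
    if not (len(dest) == len(a) == len(b) == len(mask)):
        raise ValueError("all operands must have the same length")
    return [x + y if m else d for d, x, y, m in zip(dest, a, b, mask)]
```

For example, with a mask of `[1, 0, 1, 0]` only elements 0 and 2 are updated, which is how a conditional inside a loop body can be expressed without branching per element.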

All of the above shortcomings are central issues in chapter 4 and appendix F of Hennessy and Patterson's Computer Architecture: A Quantitative Approach (5E).

The other problems are too numerous to discuss here. This entire article needs to have the synthesis, running commentary, and original research removed. Then, it needs to be rewritten from scratch based on reliable, secondary sources such as textbooks, monographs, surveys, not the lecture slides, architecture manuals, and research papers that this article uses (and misuses: the research papers are likely being used for their related work sections, which describe the state of the art; this is misuse because the purpose of related work sections in papers is to contextualize the research presented, not to explain, analyze, or survey vector processing).

As a final note, array processors are presented here as synonymous to vector processors. While it is true that some authors use the term to refer to what other authors call vector processors, in many cases where it is used, it is in reference to something else: either SIMD processor arrays (such as the ILLIAC IV) or processors that are designed for processing arrays of data, but have no conception of an array in their architecture (and thus cannot be SIMD processor arrays or vector processors) (such as the Floating Point Systems AP-120B). See R.W. Hockney and C.R. Jesshope's Parallel Computers (2E), sections 1.1.3, 1.1.4, 1.1.6, and figure 1.2. This is a good source because it is from a time when the literature had to be specific about what kind of machine was being discussed, as vector processors, SIMD processor arrays, and "array processors" were in existence. This is in contrast to books from the 2000s and later (which are easy to find on Google), which only mention these topics in passing, without careful analysis.

I feel like if this was explained, then this article would be much more focused on vector processors, and would not have so many digressions as to what GPUs do (which Hennessy and Patterson's Computer Architecture states are related but also distinct from vector processors). HTW217 (talk) 11:42, 20 November 2021 (UTC)

hooray! finally! someone else who knows what the hell they're talking about. you should have seen the mess of fundamental flawed assertions made when i began editing about a year ago: it stated something akin to "A GPU is a Vector Processor therefore SIMD equals Vectors". i did my best, however have not looked at the page for some considerable time. if anyone has since edited it and asserted "Predicate Masks Equals SVE" that was definitely NOT me because i know it to be blatantly false. the Array Processors was a legacy redirect long before i contributed to the page. yes, terminology in this complex area is massively confused, and Multi-billion-dollar Corporations trying to peddle their Packed SIMD processors as "Vector" (even IBM calling Packed SIMD "VSX") is not helping. lastly: if Patterson had not been so shockingly arrogantly rude to me at a Conference i attended i would be much more inclined to listen to what he has to teach. Lkcl (talk) 22:44, 6 May 2022 (UTC)

followup: basically the page was a bit of a mess to start with, and so full of misunderstandings and mistakes, but had such a high pagerank, i had to do... *something*. but it is an incremental process and starting from a "legacy" position if you know what i mean. i'd be more than happy to collaborate.

one thing: there is a cyclic dependency problem here with this page. it had been so wrong for such a long time and has such a high pagerank that its misinformation was starting to leak into online textbook material. this will need to be taken into consideration when editing the page, because by citing those sources which themselves used the misinformation it reinforces the misinformation.

the other thing is that with the entire industry except for NEC pretty much having passed Vector Processing by for almost three decades, finding modern online primary sources is almost impossible. i had to go to a specialist archive site to find the Cray-1 tech manual which was a PDF of scanned images from a carbon-copy old school typewriter!

Lkcl (talk) 09:40, 7 May 2022 (UTC)


Removal of the Aspex Microelectronics ASP from the Supercomputers section
The paragraph describing the Aspex Microelectronics ASP has been removed. The reasons for this removal are as follows:

Associative processors are categorized under the SIMD category in Flynn's taxonomy, but they are architecturally distinct from the vector processors this article is about, which are also categorized as SIMD. Flynn's 1972 paper describing an updated version of the taxonomy makes it quite clear they are distinct (note that the paper refers to vector processors as a pipelined version of the processor array [e.g. ILLIAC IV]).

The paragraph justifies the inclusion of the ASP by stating "[it] categorised itself as "Massive wide SIMD" but had bit-level ALUs and bit-level predication, and so could definitively be considered an array (vector) processor." This is a textbook example of weasel words. The statement also is advancing a case that this processor is a vector processor, which is clearly original research. No reliable secondary or tertiary sources have been presented to support the claim.

The cited paper's abstract describes it as an associative processor with a fine-grained SIMD architecture. The paper is behind a paywall, so I can't (and won't) access it, but there's nothing in the abstract to suggest it could be a vector processor. The company's marketing materials offer no further clarity on this matter. These are all primary sources that contradict the claim that it is a vector processor.

A related issue is the overt exaggeration and promotion leading to falsities. Describing its bit-level ALUs, bit-level predication, 4,096 PEs, and CAMs as an "extreme and rare example of an Array Processor" is patent nonsense. 1-bit PEs are found in the Thinking Machines CM-1. Predication (masking) isn't unique; every array processor since Slotnick's SOLOMON I from the early 1960s has it; when the PEs (ALUs) are 1-bit wide, then it is natural that masking is done on a bit-by-bit basis. That it had 4,096 PEs is unremarkable; the CM-1 could have up to 65,536. That each PE had its own CAM is an intrinsic feature of all associative processors.

The paragraph's inclusion in a section describing vector supercomputers is even more perplexing. Even if it were a vector processor, it is not a vector supercomputer. The cited paper claims the processor will be used in a massive parallel neural network. But this does not appear to have materialized, AFAICT, from Google searches. In fact, Google searches turn up very little relevant material about this company and its processor. It would seem the paragraph overstates their impact. HTW217 (talk) 11:29, 6 January 2022 (UTC)