Talk:CLMUL instruction set

Please clarify what the instruction does. It is evidently some kind of multiply, but how does it differ from a regular multiply?

Flok (talk) 06:03, 10 June 2013 (UTC) --
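
As a rough illustration of the question above (a sketch, not a description of how the hardware is implemented): an ordinary multiply adds the shifted partial products, so carries propagate between bit positions, while a carry-less multiply combines the same partial products with XOR, so no carries occur. The helper name `clmul` is made up for this example, and it works on arbitrary Python integers rather than on the 64-bit operands that PCLMULQDQ takes.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product: XOR together one shifted copy of a per set bit of b.

    Hypothetical helper for illustration only; operands are unbounded
    non-negative Python ints, not 64-bit register halves.
    """
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift  # XOR instead of +, so nothing carries
        b >>= 1
        shift += 1
    return result


# 0b11 * 0b11 with ordinary arithmetic: 0b11 + 0b110 = 0b1001 (a carry occurs)
print(bin(3 * 3))        # 0b1001
# The same partial products combined with XOR: 0b11 ^ 0b110 = 0b101
print(bin(clmul(3, 3)))  # 0b101
```

Read as polynomials with coefficients in GF(2), the second line is (x + 1)(x + 1) = x^2 + 1, since the two middle terms cancel.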

May I add a reference that shows the usefulness of the carry-less multiplication? See below:

http://doi.ieeecomputersociety.org/10.1109/TC.2007.70832

robg 16:54, 22 July 2010 (UTC)

Other applications
Carry-less multiplication also has applications in many other areas, for example in computing CRC codes http://dx.doi.org/10.1109/90.477710 and in fast-forwarding shift registers (as in Gold code generators). Should this article include a section on uses of this instruction? Lauri.pirttiaho (talk) 10:53, 5 February 2011 (UTC)
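
To make the CRC connection concrete, here is a minimal sketch of a CRC as the remainder of a polynomial division over GF(2). The function name `crc_remainder` is an assumption for this example; it works one bit at a time and omits the bit reflection and initial/final XOR values that many real CRC variants add. CLMUL-based implementations instead use carry-less products to process many bits per step, which is not shown here.

```python
def crc_remainder(message: int, generator: int) -> int:
    """Remainder of message * x^n divided by generator, as GF(2) polynomials.

    Hypothetical illustration: bit i of each integer is the coefficient
    of x^i, and n is the degree of the generator polynomial.
    """
    degree = generator.bit_length() - 1
    value = message << degree  # append n zero bits to the message
    while value.bit_length() > degree:
        # Cancel the current leading term by XORing in a shifted generator.
        value ^= generator << (value.bit_length() - 1 - degree)
    return value


# Message 11010011101100 with generator x^3 + x + 1 (binary 1011)
print(format(crc_remainder(0b11010011101100, 0b1011), "03b"))  # 100
```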

Affiliated sources
This article has no third-party sources. It does not establish that this instruction set sees actual use, or that it is actually fast enough for its intended purpose. Q VVERTYVS (hm?) 23:28, 26 January 2015 (UTC)
 * I added an example of actual use from CloudFlare. While I cannot say whether this specific change provided a particular benefit, their fork of pngcrush (which incorporates this instruction in the DEFLATE algorithm as hash_func via _mm_crc32_u32) did improve pngcrush performance for us by ~30% (and probably more for just decode/encode). GreenReaper (talk) 11:36, 4 September 2016 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 2 external links on CLMUL instruction set. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20080407095317/http://softwareprojects.intel.com/avx/ to http://softwareprojects.intel.com/avx/
 * Added archive https://archive.is/20131109140737/http://developer.amd.com/2009/05/06/striking-a-balance/ to http://developer.amd.com/2009/05/06/striking-a-balance/

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 16:39, 28 July 2017 (UTC)

GF 2^k
The article currently says that PCLMULQDQ "performs a carry-less multiplication of two 64-bit polynomials over the finite field GF(2)[X]." However, the documentation here https://www.intel.com/content/dam/develop/external/us/en/documents/clmul-wp-rev-2-02-2014-04-20.pdf says the instruction simply does "Carry-less multiplication of one quadword (8 bytes) of xmm1 by one quadword (8 bytes) of xmm2/m128, returning a double quadword (16 bytes). The immediate byte is used for determining which quadwords of xmm1 and xmm2/m128 should be used." This is not the same as "multiplication over the finite field GF(2)[X]", since the latter also requires reducing the product modulo an irreducible polynomial. The current text suggests that the instruction performs this reduction, which appears to be false. Thomasda (talk) 02:50, 22 January 2023 (UTC)


 * The instruction performs exactly as described. It computes the product of two polynomials whose coefficients (and only the coefficients) are from the field GF(2). Multiplication of polynomials is automatically carry-less, even if the coefficients are natural numbers, for example. No idea why people feel the need to explicitly specify it. Anyhow, arithmetic on polynomials with coefficients in GF(2) is part of the definition of CRC. To compute a CRC, the input is interpreted as a polynomial of arbitrary (but fixed) degree.
 * GF(2^k) is a Galois field constructed by considering polynomials in GF(2)[X] modulo a certain irreducible polynomial of degree k. This is not what PCLMUL is about! In that sense, the Intel documentation must be wrong. If you look at the definition of multiplication of two elements of GF(2^k), you first compute a*b and then take a*b modulo the irreducible polynomial. PCLMUL gives you a*b but not the reduction. Skoehler (talk) 00:33, 15 November 2023 (UTC)
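
To make the distinction in this thread concrete, here is a small sketch that separates the two steps: the plain carry-less product (what the thread says the instruction returns) and the extra reduction modulo an irreducible polynomial that a finite-field multiplication needs. The helper names `clmul` and `poly_mod` are made up for this example. The operands and the modulus x^8 + x^4 + x^3 + x + 1 (0x11B) are taken from the worked example in the AES specification, used here only because the expected values are well known.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of a and b (hypothetical helper, unbounded ints)."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


def poly_mod(value: int, modulus: int) -> int:
    """Remainder of value divided by modulus, both read as GF(2) polynomials."""
    mod_degree = modulus.bit_length() - 1
    while value.bit_length() - 1 >= mod_degree:
        # Cancel the leading term with a suitably shifted copy of the modulus.
        value ^= modulus << (value.bit_length() - 1 - mod_degree)
    return value


product = clmul(0x57, 0x83)         # step 1: polynomial product only
reduced = poly_mod(product, 0x11B)  # step 2: reduction, a separate operation

print(hex(product))  # 0x2b79 (degree 13, wider than either 8-bit operand)
print(hex(reduced))  # 0xc1   (an element of the 256-element field)
```

The unreduced product needs up to twice as many bits as the operands, which matches the quoted description of two quadwords producing a double quadword.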

And other platforms?
Why is this article so Intel-centric? Other architectures certainly offer support for polynomial multiplication over GF(2). ARM, for example, has PMULL and PMULL2. Maybe RISC-V also has such instructions? Skoehler (talk) 00:35, 15 November 2023 (UTC)