Talk:Load-link/store-conditional

Citation for CAS-based lock-free reference counting in the face of a changing object-graph, please? --Chris Purcell 11:55, 19 June 2006 (UTC)

I was just about to post on the talk page asking about this very point, and another one, but your edit got through before mine. Here it is:

Hasn't it been proven that DCAS can be implemented on top of CAS (in constant time and linear-in-threads space)? If so, then the fact that lock-free reference counting in the face of a changing object graph can be implemented with DCAS (as the previous version said) means it can be implemented with CAS, does it not?

On another issue, the previous version of the page had this sentence:


 * Some CPU assumes the address being accessed exclusively be configured write-through mode.

I didn't know what this meant. I rewrote it as this:


 * Some CPUs require the address being accessed exclusively to be configured in write-through mode.

However, because I couldn't figure out the original sentence, I may have misrepresented something. Falcotron 12:07, 19 June 2006 (UTC)

I don't understand the original sentence either, so without a citation it's probably best to remove it in case it's inaccurate. (Your rewriting appears to conserve truth!)

DCAS can be implemented on top of CAS, but only with out-of-line memory management, which means it can't be used to implement reference counting! --Chris Purcell 12:11, 19 June 2006 (UTC)

I thought I recently read a paper showing how DCAS-on-top-of-CAS could be used for refcounting as long as it's intrusive in the objects or the allocator. I may well be misremembering. I'll have to think it through or find the reference. Falcotron 12:36, 19 June 2006 (UTC)

Weak LL/SC vs. CAS
It's my understanding that most of the research on lock-free algorithms with LL/SC is not applicable to (most) weak implementations. And the phrase "... as it breaks many theoretical LL/SC algorithms..." sounds like it backs me up here.


 * True. However, the research is also not applicable to CAS.


 * Yes, but there's separate research on CAS, which is applicable to all real-world CAS implementations.


 * And all CAS ones :)

Further, even algorithms that are not broken are often affected, in that spuriously severed links mean spuriously repeated update attempts. (Whether this has a practical impact on performance, I don't know.) By contrast, the spuriously preserved links of CAS either break an algorithm, or have no effect whatsoever.


 * I'm afraid I have no idea what you mean here. What are "spuriously preserved links of CAS"?


 * Sorry; I was badly stretching a metaphor that I hadn't even made explicit. If a value is changed and restored between the test read and the CAS, it looks like the load and store are "linked," but they're not. Is that clearer?

Anyway, unless there has been recent research on using LL/SC on real-world CPUs that I'm not aware of (which is definitely possible), this is a serious practical restriction to implementing lock-free algorithms with weak LL/SC. This was what I was intending to say (although I apparently didn't say it very well). Falcotron 12:28, 19 June 2006 (UTC)


 * Weak LL/SC on real-world CPUs gives a very effective lock-free implementation of CAS. In fact, they will typically be fair, as well as very unlikely to fail spuriously, since they are implemented with a simple cacheline lock; the only thing likely to cause spurious failure, apart from attempting to nest another LL, is pre-emption. Indeed, modern Intel CPUs actually break CAS down into a microcode LL/SC.


 * If any load from a different address will break a link, isn't an SC more likely to fail spuriously than a CAS? Doesn't this make it harder to predict/guarantee the efficiency of LL/SC algorithms?


 * A weak LL/SC pair implementing a CAS is essentially identical to a CAS on modern hardware, and you can make lock-free progress guarantees under only very minor assumptions about the preemptive nastiness of the OS.


 * Either way, this comment is more applicable to my previous point than to this one: much LL/SC research cannot be applied to most LL/SC-capable CPUs, but all CAS research can be applied to all CAS-capable CPUs. If you're saying that in practice you really can use weak LL/SC for most important algorithms, I'll buy that. Or if I'm just out of date on the LL/SC research, I'll definitely buy that.


 * Yes, in practice you really can use weak LL/SC wherever you can use CAS.


 * So, in conclusion, there is no problem implementing lock-free algorithms with weak LL/SC that's not also a problem for CAS. --Chris Purcell 12:54, 19 June 2006 (UTC)


 * Based on what you're saying, it sounds like it might be worth changing the end of my "Weakness is relative" sentence to say that "most real-world implementations can be used for [many|most] algorithms."


 * If you want to.


 * Anyway, thanks for your patience. Falcotron 14:00, 19 June 2006 (UTC)


 * No problem! Peer review is fun ;) --Chris Purcell 16:43, 19 June 2006 (UTC)

CAS fairness
The article says: The only limitation of weak LL/SC versus CAS is that the latter is usually guaranteed to be fair, while the former is not. This is also true of strong LL/SC.

Fair in what sense? This needs a citation, especially since the instruction set manuals for Itanium, x86-64, and Sparc don't seem to mention fairness. --JeffreyYasskin (talk) 19:22, 16 December 2007 (UTC)

+1 for this one, can someone please clarify? I'll try to guess the reason with the following code:
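The snippet this comment refers to has not survived on the page; the following is only my guess at its shape, written with C11 atomics standing in for LL/SC. On LL/SC hardware the test read on line 2 would be the LL, the compare-exchange on line 5 the SC, and the worry is that if the unrelated load on line 3 always cleared the reservation, the loop would never terminate:

```c
#include <stdatomic.h>

/* Hypothetical reconstruction of the lost snippet. */
int increment_by_neighbor(_Atomic int *counter, _Atomic int *neighbor) {
    int old, add;
    do {                                     /* line 1 */
        old = atomic_load(counter);          /* line 2: LL on LL/SC hardware */
        add = atomic_load(neighbor);         /* line 3: load, different address */
    } while (!atomic_compare_exchange_weak(  /* line 5: SC on LL/SC hardware */
                 counter, &old, old + add));
    return old + add;
}
```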

If the read on line 3 systematically invalidates the reservation, the loop will never end. This is not really a fairness issue, but it may explain the remark in the article. Can someone confirm or deny my guess?

—Preceding unsigned comment added by 90.80.39.41 (talk) 26 May 2010 (UTC)

Due to absence of corroboration of the claim, I've removed it from the page. Additionally, I believe the claim is false: LL/SC has been guaranteed fair on at least some machines (one of the Power chips, would have to search for citation if this went into the article). IIRC CAS is microcoded with LL/SC on modern chips anyway. —Preceding unsigned comment added by Chris Purcell (talk • contribs) 11:33, 15 October 2010 (UTC)

Weak LL/SC vs. Restricted LL/SC
The article calls the usual hardware implementations of LL/SC weak. I have, however, the impression that restricted is more common (e.g. Michael 2004 Hazard pointers: …, Gao and Hesselink 2006 A general lock-free algorithm using CAS, Jayanti and Petrovic 2003 Efficient and practical constructions of LL/SC variables). Further, Jayanti and Petrovic use weak LL/SC in a different sense, where a weak LL is allowed to fail if the subsequent SC would fail (they attribute this use of weak to Anderson and Moir 1995 Universal constructions for large objects). Don’t know whether this handful of papers is representative… Rathgemz (talk) 12:57, 14 October 2010 (UTC)

relevance to c/c++11 atomics
Can anyone confirm the relation of the new C/C++11 standard atomic operations atomic_compare_exchange_weak/strong to LL/SC, and especially the difference from CAS? --Itaj Sherman (talk) 02:42, 18 October 2013 (UTC)

ARM LL/SC implementation?
The article says: The ARM implementation defines platform dependent blocks, ranging from 8 bytes to 2048 bytes, and an LL/SC attempt in any given block fails if there is between the LL and SC a normal memory access inside the same block. The ARM implementation is the strongest and most practical.

I am fairly certain that the word "access" is incorrect (to me, "access" is a broad term, meaning either a load or a store). In fact, I believe real ARM behavior is to fail only if there is a store, and that a load will not generally fail the LL/SC attempt. I base this belief on the ARM Architecture V8-A specification from www.arm.com. From section B.2.10 Synchronization and semaphores: "The corresponding Store-Exclusive instruction succeeds in writing back to memory address x only if no other observer, process, or thread has performed a more recent store to address x." Note the use of the word "store", rather than "access".

(I have NOT edited this correction directly into the article. Would prefer that someone else does so, once they have verified my assertion.)

Article fails to point out major semantic difference wrt. LL/SC and CAS
In particular, it misses that LL/SC works by exploiting interprocessor bus traffic, whereas CAS operates on the basis of atomic access to values alone. Therefore LL/SC isn't vulnerable, and CAS is, to an A-B-A race -- where a value A changes to a value B (possibly passing through a gamut of other intermediate values), then back to the same value A, with the chain of changes going unnoticed despite being significant to the algorithm. This makes LL/SC impossible to emulate using a CAS primitive alone; further runtime and datatype restrictions would be required (e.g. limiting the object of CAS to garbage-collected pointers only), or high-overhead virtualization of other CPUs' write access to the LL'd memory region through the MMU.

Also, more generally LL/SC is a topic of shared-memory multiprocessing (i.e. inclusive of shared memory clustering as in Mosix) rather than multithreading in particular. In the interest of accuracy, the wording should be revised. For references on the semantic difference, you can quote various CPU architectures' programming manuals. --84.250.160.105 (talk) 04:18, 24 October 2017 (UTC)

need to list a disadvantage
The author of rr, the record-and-replay debugger started out of Mozilla, has pointed out that it is impossible to do reliable record-and-replay on architectures with this misfeature. This is what stopped him from supporting ARM. He posted to the RISC-V architecture mailing list pointing out that RISC-V had made the same architectural mistake. 71.46.230.154 (talk) 06:09, 30 May 2019 (UTC)