User:Dahliamalkhi/quorums

Problem Definition
Quorum systems are tools for increasing the availability and efficiency of replicated services. A quorum system for a universe of servers is a collection of subsets of servers, each pair of which intersect. Intuitively, each quorum can operate on behalf of the system, thus increasing its availability and performance, while the intersection property guarantees that operations done on distinct quorums preserve consistency.

The motivation for quorum systems stems from the need to make critical missions performed by machines reliable. The only way to increase the reliability of a service, aside from using intrinsically more robust hardware, is via replication. To make a service robust, it can be installed on multiple identical servers, each of which holds a copy of the service state and performs read/write operations on it. This allows the system to provide information and perform operations even if some machines fail or communication links go down. Unfortunately, replication incurs a cost: the servers must be kept consistent. To enhance the availability and performance of a replicated service, Gifford and Thomas introduced in 1979 [3, 14] the use of votes assigned to each server, such that a majority of the sum of votes is sufficient to perform operations. More generally, quorum systems are defined formally as follows:

Quorum system: Assume a universe $U$ of servers, $|U| = n$, and an arbitrary number of clients. A quorum system $\mathcal{Q} \subseteq 2^U$ is a set of subsets of $U$, every pair of which intersect. Each $Q \in \mathcal{Q}$ is called a quorum.
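
As a minimal sketch of this definition, the following Python function tests the pairwise intersection property on a small, explicitly listed set system. The function name is_quorum_system and the representation of quorums as Python sets are assumptions made for illustration; they do not come from the cited literature.

```python
from itertools import combinations


def is_quorum_system(quorums):
    """Return True if all quorums are non-empty and every pair intersects.

    `quorums` is any iterable of collections of server identifiers
    (a hypothetical representation chosen for this sketch).
    """
    sets = [frozenset(q) for q in quorums]
    if not all(sets):  # an empty "quorum" cannot intersect anything
        return False
    return all(a & b for a, b in combinations(sets, 2))


# The three 2-element subsets of {1, 2, 3} are the majorities of 3 servers.
print(is_quorum_system([{1, 2}, {2, 3}, {1, 3}]))
# Two disjoint sets violate the intersection property.
print(is_quorum_system([{1}, {2}]))
```

The check is quadratic in the number of quorums, so it is only practical for small universes.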

Access Protocol
To demonstrate the usability of quorum systems in constructing replicated services, quorums are used here to implement a multi-writer multi-reader atomic shared variable. Quorums have also been used in various mutual exclusion protocols, to achieve Consensus, and in commit protocols.

In our application, clients perform read and write operations on a variable $x$ that is replicated at each server in the universe $U$. A copy of the variable $x$ is stored at each server, along with a timestamp value $t$. Timestamps are assigned by a client to each replica of the variable when the client writes the replica. Different clients choose distinct timestamps, e.g., by choosing integers with the name of the client appended in the low-order bits.

The read and write operations are implemented as follows.

Write: For a client $c$ to write the value $v$, it queries each server in some quorum $Q$ to obtain a set of value/timestamp pairs $A = \{\langle v_u, t_u \rangle\}_{u \in Q}$; chooses a timestamp $t \in T_c$ greater than the highest timestamp value in $A$, where $T_c$ denotes the timestamps available to $c$; and updates $x$ and the associated timestamp at each server in $Q$ to $v$ and $t$, respectively.

Read: For a client to read $x$, it queries each server in some quorum $Q$ to obtain a set of value/timestamp pairs $A = \{\langle v_u, t_u \rangle\}_{u \in Q}$. The client then chooses the pair $\langle v, t \rangle$ with the highest timestamp in $A$ to obtain the result of the read operation. It writes back $\langle v, t \rangle$ to each server in some quorum $Q'$.

In both read and write operations, each server updates its local variable and timestamp to the received values $\langle v, t \rangle$ only if $t$ is greater than the timestamp currently associated with the variable. The above protocol correctly implements the semantics of a multi-writer multi-reader atomic variable (see Linearizability, Sequential Consistency).
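
The read and write steps above can be sketched as a single-process simulation. This is only an illustration under stated assumptions: the Server class, the write and read functions, and the encoding of timestamps as (counter, client_id) tuples are hypothetical choices, and real deployments would exchange messages over a network and handle failures and concurrency.

```python
class Server:
    """One replica holding a value and its timestamp (hypothetical model)."""

    def __init__(self):
        self.value, self.ts = None, (0, "")

    def query(self):
        return self.value, self.ts

    def update(self, value, ts):
        # Adopt the received pair only if its timestamp is strictly newer.
        if ts > self.ts:
            self.value, self.ts = value, ts


def write(servers, quorum, client_id, value):
    # Phase 1: query the quorum and find the highest timestamp seen.
    highest = max(servers[u].query()[1] for u in quorum)
    # Phase 2: pick a larger timestamp; the client id keeps it distinct
    # from timestamps chosen by other clients (assumed encoding).
    ts = (highest[0] + 1, client_id)
    for u in quorum:
        servers[u].update(value, ts)
    return ts


def read(servers, read_quorum, writeback_quorum):
    # Choose the pair with the highest timestamp among the replies.
    value, ts = max((servers[u].query() for u in read_quorum),
                    key=lambda pair: pair[1])
    # Write the chosen pair back to some quorum before returning.
    for u in writeback_quorum:
        servers[u].update(value, ts)
    return value


servers = [Server() for _ in range(3)]
write(servers, {0, 1}, "a", "x1")
write(servers, {1, 2}, "b", "x2")
print(read(servers, {0, 1}, {0, 2}))
```

In this run the read quorum {0, 1} overlaps the second write quorum {1, 2} at server 1, which is how the reader learns the most recent value even though server 0 still holds the older one.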

Key Results
Perhaps the two most obvious quorum systems are the singleton and the set of majorities, or more generally, the weighted majorities suggested by Gifford [3].

Singleton: The set system $\mathcal{Q} = \{\{u\}\}$ for some $u \in U$ is the singleton quorum system.

Weighted Majorities: Assume that every server $s$ in the universe $U$ is assigned a number of votes $w_s$. Then the set system $\mathcal{Q} = \{Q \subseteq U : \sum_{q \in Q} w_q > \frac{1}{2} \sum_{q \in U} w_q\}$ is a quorum system called Weighted Majorities. When all the weights are the same, it is simply called the system of Majorities.

An example of a quorum system that cannot be defined by voting is the following Grid construction:

Grid: Suppose that the universe of servers is of size $n = k^2$ for some integer $k$. Arrange the universe into a $\sqrt{n} \times \sqrt{n}$ grid, as shown in Figure 1. A quorum is the union of a full row and one element from each row below the full row. This yields the Grid quorum system, whose quorums are of size $O(\sqrt{n})$.

Figure 1: The Grid quorum system of 6 × 6, with one quorum shaded.

Maekawa suggests in [6] a quorum system that has several desirable symmetry properties, in particular that every pair of quorums intersects in exactly one element:

FPP: Suppose that the universe of servers is of size $n = q^2 + q + 1$, where $q = p^r$ for a prime $p$. It is known that a finite projective plane exists for $n$, with $q^2 + q + 1$ pairwise intersecting subsets, each subset of size $q + 1$, and where each element is contained in $q + 1$ subsets. Then the set of finite projective plane subsets forms a quorum system.
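
To make the Majorities and Grid constructions concrete, the sketch below enumerates them for small universes and checks the intersection property by brute force. The function names majorities, grid and pairwise_intersecting are hypothetical, and majorities lists only the minimal majority sets (equal weights), not their supersets.

```python
from itertools import combinations, product


def majorities(n):
    """Minimal majority quorums over servers 0..n-1 (equal weights assumed)."""
    size = n // 2 + 1
    return [frozenset(c) for c in combinations(range(n), size)]


def grid(k):
    """Grid quorums over a k x k universe of (row, column) pairs.

    Each quorum is a full row plus one element from every row below it.
    """
    quorums = []
    for r in range(k):
        full_row = [(r, c) for c in range(k)]
        # One column choice for each of the k - 1 - r rows below row r.
        for cols in product(range(k), repeat=k - 1 - r):
            below = [(r + 1 + i, c) for i, c in enumerate(cols)]
            quorums.append(frozenset(full_row + below))
    return quorums


def pairwise_intersecting(quorums):
    return all(a & b for a, b in combinations(quorums, 2))


g = grid(3)
print(len(g), max(len(q) for q in g), pairwise_intersecting(g))
m = majorities(5)
print(len(m), pairwise_intersecting(m))
```

Two Grid quorums built on different full rows intersect because the one with the higher row contributes an element to every row below it, including the other quorum's full row.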

Voting and Related notions
Since generally it would be senseless to access a large quorum if a subset of it is a quorum, a good definition may avoid such anomalies. Garcia-Molina and Barbara [2] call such well-formed systems coteries, defined as follows:

Coterie: A coterie $\mathcal{Q} \subseteq 2^U$ is a quorum system such that for any two distinct $Q, Q' \in \mathcal{Q}$: $Q \not\subseteq Q'$.

Of special interest are quorum systems that cannot be reduced in size (i.e., that no quorum in the system can be reduced in size). Garcia-Molina and Barbara [2] use the term “dominates” to mean that one quorum system is always superior to another, as follows:

Domination: Suppose that $\mathcal{Q}, \mathcal{Q}'$ are two coteries, $\mathcal{Q} \neq \mathcal{Q}'$, such that for every $Q' \in \mathcal{Q}'$, there exists a $Q \in \mathcal{Q}$ such that $Q \subseteq Q'$. Then $\mathcal{Q}$ dominates $\mathcal{Q}'$. $\mathcal{Q}'$ is dominated if there exists a coterie $\mathcal{Q}$ that dominates it, and is non-dominated if no such coterie exists.

Voting was mentioned above as an intuitive way of thinking about quorum techniques. As it turns out, vote assignments and quorums are not equivalent. Garcia-Molina and Barbara [2] show that quorum systems are strictly more general than voting, i.e., each vote assignment has some corresponding quorum system but not the other way around. In fact, for a system with $n$ servers, there is a double-exponential ($2^{2^{cn}}$) number of non-dominated coteries, and only $O(2^{n^2})$ different vote assignments, though for $n \le 5$, voting and non-dominated coteries are identical.
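
The coterie and domination definitions can be tested mechanically on small examples. The following is a sketch only; the names is_coterie and dominates are hypothetical, and dominates assumes that both arguments are already coteries rather than verifying it.

```python
from itertools import combinations


def is_coterie(quorums):
    """Pairwise intersecting, and no quorum contained in another."""
    sets = [frozenset(q) for q in quorums]
    pairs = list(combinations(sets, 2))
    intersecting = all(a & b for a, b in pairs)
    minimal = all(not a <= b and not b <= a for a, b in pairs)
    return intersecting and minimal


def dominates(first, second):
    """True if coterie `first` dominates coterie `second`.

    Both inputs are assumed to be coteries (not checked here).
    """
    s1 = {frozenset(q) for q in first}
    s2 = {frozenset(q) for q in second}
    return s1 != s2 and all(any(a <= b for a in s1) for b in s2)


print(is_coterie([{1, 2}, {2, 3}]))
print(is_coterie([{1, 2}, {1, 2, 3}]))
print(dominates([{2}], [{1, 2}, {2, 3}]))
```

In the last call, the singleton coterie {{2}} dominates {{1, 2}, {2, 3}} because each quorum of the latter contains the quorum {2}.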

Measures
Several measures of quality have been identified to address the question of which quorum system works best for a given set of servers; among these, load and availability are elaborated on here.

Load

A measure of the inherent performance of a quorum system is its load. Naor and Wool define in [10] the load of a quorum system as the probability of accessing the busiest server in the best case. More precisely, given a quorum system $\mathcal{Q}$, an access strategy $w$ is a probability distribution on the elements of $\mathcal{Q}$; i.e., $\sum_{Q \in \mathcal{Q}} w(Q) = 1$. $w(Q)$ is the probability that quorum $Q$ will be chosen when the service is accessed. Load is then defined as follows:

Load: Let a strategy $w$ be given for a quorum system $\mathcal{Q} = \{Q_1, \ldots, Q_m\}$ over a universe $U$. For an element $u \in U$, the load induced by $w$ on $u$ is $l_w(u) = \sum_{Q_i \ni u} w(Q_i)$. The load induced by a strategy $w$ on a quorum system $\mathcal{Q}$ is $L_w(\mathcal{Q}) = \max_{u \in U} \{l_w(u)\}$. The system load (or just load) on a quorum system $\mathcal{Q}$ is $L(\mathcal{Q}) = \min_w \{L_w(\mathcal{Q})\}$, where the minimum is taken over all strategies.

The load is a best-case definition, and will be achieved only if an optimal access strategy is used, and only in the case that no failures occur. A strength of this definition is that load is a property of a quorum system, and not of the protocol using it. The following theorem was proved in [10] for all quorum systems.

Theorem: Let $\mathcal{Q}$ be a quorum system over a universe of $n$ elements. Denote by $c(\mathcal{Q})$ the size of the smallest quorum of $\mathcal{Q}$. Then $L(\mathcal{Q}) \ge \max\{\frac{1}{c(\mathcal{Q})}, \frac{c(\mathcal{Q})}{n}\}$. Consequently, $L(\mathcal{Q}) \ge \frac{1}{\sqrt{n}}$.

Availability

The resilience $f$ of a quorum system provides one measure of how many crash failures a quorum system is guaranteed to survive.

Resilience: The resilience $f$ of a quorum system $\mathcal{Q}$ is the largest $k$ such that for every set $K \subseteq U$, $|K| = k$, there exists $Q \in \mathcal{Q}$ such that $K \cap Q = \emptyset$.

Note that the resilience $f$ is at most $c(\mathcal{Q}) - 1$, since by disabling the members of the smallest quorum every quorum is hit. It is possible, however, that an $f$-resilient quorum system, though vulnerable to a few failure configurations of $f + 1$ failures, can survive many configurations of more than $f$ failures. One way to measure this property of a quorum system is to assume that each server crashes independently with probability $p$ and then to determine the probability $F_p$ that no quorum remains completely alive. This is known as failure probability and is formally defined as follows:

Failure probability: Assume that each server in the system crashes independently with probability $p$. For every quorum $Q \in \mathcal{Q}$ let $E_Q$ be the event that $Q$ is hit, i.e., at least one element $i \in Q$ has crashed. Let $crash(\mathcal{Q})$ be the event that all the quorums $Q \in \mathcal{Q}$ were hit, i.e., $crash(\mathcal{Q}) = \bigwedge_{Q \in \mathcal{Q}} E_Q$. Then the system failure probability is $F_p(\mathcal{Q}) = \Pr(crash(\mathcal{Q}))$.

Peleg and Wool study the availability of quorum systems in [11]. A good failure probability $F_p(\mathcal{Q})$ for a quorum system $\mathcal{Q}$ has $\lim_{n \to \infty} F_p(\mathcal{Q}) = 0$ when $p < \frac{1}{2}$. Note that the failure probability of any quorum system whose resilience is $f$ is at least $e^{-\Omega(f)}$. Majorities has the best availability when $p < \frac{1}{2}$; for $p = \frac{1}{2}$, there exist quorum constructions with $F_p(\mathcal{Q}) = \frac{1}{2}$; for $p > \frac{1}{2}$, the singleton has the best failure probability $F_p(\mathcal{Q}) = p$, but for most quorum systems, $F_p(\mathcal{Q})$ tends to 1.

The load and availability of quorum systems

Quorum constructions can be compared by analyzing their behavior according to the above measures. The singleton has a load of 1, resilience 0, and failure probability $F_p = p$. This system has the best failure probability when $p > \frac{1}{2}$, but otherwise performs poorly in both availability and load.

The system of Majorities has a load of $\frac{1}{n} \lceil \frac{n+1}{2} \rceil \approx \frac{1}{2}$. It is resilient to $\lfloor \frac{n-1}{2} \rfloor$ failures, and its failure probability is $e^{-\Omega(n)}$. This system has the highest possible resilience and asymptotically optimal failure probability, but poor load.

Grid’s load is $O(\frac{1}{\sqrt{n}})$, which is within a constant factor from optimal. However, its resilience is only $\sqrt{n} - 1$ and it has poor failure probability, which tends to 1 as $n$ grows.

The resilience of an FPP quorum system is $q \approx \sqrt{n}$. The load of FPP was analyzed in [10] and shown to be $L(FPP) = \frac{q+1}{n} \approx \frac{1}{\sqrt{n}}$, which is optimal. However, its failure probability tends to 1 as $n$ grows.

As demonstrated by these systems, there is a tradeoff between load and fault tolerance in quorum systems, where the resilience $f$ of a quorum system $\mathcal{Q}$ satisfies $f \le nL(\mathcal{Q})$. Thus, improving one must come at the expense of the other, and it is in fact impossible to simultaneously achieve both optimally. One might conclude that good load conflicts with low failure probability, which is not necessarily the case. In fact, there exist quorum systems such as the Paths system of Naor and Wool [10] and the Triangle Lattice of Bazzi [1] that achieve asymptotically optimal load of $O(\frac{1}{\sqrt{n}})$ and have close to optimal failure probability for their quorum sizes. Another construction is the CWlog system of Peleg and Wool [12], which has unusually small quorum sizes of $\log n - \log\log n$, and for systems with quorums of this size, has optimal load, $L(CWlog) = O(\frac{1}{\log n})$, and optimal failure probability.
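
For small universes, the load and failure probability measures defined in this section can be evaluated by brute force. The sketch below is an illustration only: the function names are hypothetical, induced_load evaluates $L_w$ for one given strategy (so it yields an upper bound on the system load rather than the minimum over all strategies, which in general requires solving a linear program), and failure_probability enumerates all $2^n$ crash configurations.

```python
from itertools import combinations, product


def induced_load(quorums, strategy):
    """L_w: the largest per-server load under the given strategy."""
    universe = set().union(*quorums)
    return max(sum(w for q, w in zip(quorums, strategy) if u in q)
               for u in universe)


def load_lower_bound(quorums, n):
    """max{1/c(Q), c(Q)/n}, with c(Q) the smallest quorum size."""
    c = min(len(q) for q in quorums)
    return max(1 / c, c / n)


def failure_probability(quorums, n, p):
    """F_p over servers 0..n-1, by enumerating every crash pattern."""
    total = 0.0
    for crashed in product([False, True], repeat=n):
        alive = {u for u in range(n) if not crashed[u]}
        # The system fails when no quorum is completely alive.
        if not any(q <= alive for q in quorums):
            k = sum(crashed)
            total += p ** k * (1 - p) ** (n - k)
    return total


# Majorities over 5 servers with the uniform access strategy.
maj5 = [frozenset(c) for c in combinations(range(5), 3)]
uniform = [1 / len(maj5)] * len(maj5)
print(induced_load(maj5, uniform), load_lower_bound(maj5, 5))
print(failure_probability(maj5, 5, 0.1))
```

For Majorities over 5 servers, the uniform strategy gives a load of about 0.6, which coincides with the lower bound $c(\mathcal{Q})/n = 3/5$, so the uniform strategy is optimal in this case. The failure probability at $p = 0.1$ comes to about 0.00856, the probability that at least 3 of the 5 servers crash.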

Byzantine quorum systems
For the most part, quorum systems were studied in environments where failures may simply cause servers to become unavailable (benign failures). But what if a server may exhibit arbitrary, possibly malicious behavior? Malkhi and Reiter [7] study quorum systems in environments prone to arbitrary (Byzantine) behavior of servers. Intuitively, a quorum system tolerant of Byzantine failures is a collection of subsets of servers, each pair of which intersect in a set containing sufficiently many correct servers to mask out the behavior of faulty servers. More precisely, Byzantine quorum systems are defined as follows:

Masking quorum system: A quorum system $\mathcal{Q}$ is a $b$-masking quorum system if it has resilience $f \ge b$, and each pair of quorums intersect in at least $2b + 1$ elements.

The masking quorum system requirements enable a client to obtain the correct answer from the service despite up to $b$ Byzantine server failures. More precisely, a write operation remains as before; to obtain the correct value of $x$ from a read operation, the client reads a set of value/timestamp pairs from a quorum $Q$ and sorts them into clusters of identical pairs. It then chooses a value/timestamp pair that is returned from at least $b + 1$ servers, and which therefore must have been returned by at least one correct server. The properties of masking quorum systems guarantee that at least one such cluster exists. If more than one such cluster exists, the client chooses the one with the highest timestamp. It is easy to see that any value so obtained was written before, and moreover, that the most recently written value is obtained. Thus, the semantics of a multi-writer multi-reader safe variable are obtained (see Linearizability, Sequential Consistency) in a Byzantine environment.

For a $b$-masking quorum system, the following lower bound on the load holds:

Theorem: Let $\mathcal{Q}$ be a $b$-masking quorum system. Then $L(\mathcal{Q}) \ge \max\{\frac{2b+1}{c(\mathcal{Q})}, \frac{c(\mathcal{Q})}{n}\}$, and consequently $L(\mathcal{Q}) \ge \sqrt{\frac{2b+1}{n}}$.
This bound is tight, and masking quorum constructions meeting it were shown.

Malkhi and Reiter explore in [7] two variations of masking quorum systems. The first, called dissemination quorum systems, is suited for services that receive and distribute self-verifying information from correct clients (e.g., digitally signed values) that faulty servers can fail to redistribute but cannot undetectably alter. The second variation, called opaque masking quorum systems, is similar to regular masking quorums in that it makes no assumption of self-verifying data, but it differs in that clients do not need to know the failure scenarios for which the service was designed. This somewhat simplifies the client protocol and, in the case that the failures are maliciously induced, reveals less information to clients that could guide an attack attempting to compromise the system. It is also shown in [7] how to deal with faulty clients in addition to faulty servers.
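
The client-side selection step of the masking read described in this section can be sketched as follows. The function name masking_read and the list-of-tuples reply format are assumptions for illustration; returning None when no cluster is large enough is a fallback added here, since the text states that a masking quorum system guarantees such a cluster exists.

```python
from collections import Counter


def masking_read(replies, b):
    """Pick the result of a read from one quorum's replies.

    `replies` holds one (value, timestamp) pair per queried server;
    up to `b` of them may come from Byzantine servers.
    """
    clusters = Counter(replies)  # group identical pairs and count them
    # Keep pairs vouched for by at least b + 1 servers, so at least one
    # correct server returned each of them.
    candidates = [pair for pair, count in clusters.items() if count >= b + 1]
    if not candidates:
        return None  # fallback for this sketch only
    # Among those, choose the pair with the highest timestamp.
    return max(candidates, key=lambda pair: pair[1])


replies = [("new", 2), ("new", 2), ("new", 2), ("old", 1), ("forged", 99)]
print(masking_read(replies, b=1))
```

In this example the forged pair carries the highest timestamp but is reported by only one server, so with $b = 1$ it is discarded and the pair ("new", 2) is returned.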

Probabilistic quorum systems
The resilience of any quorum system is bounded by half of the number of servers. Moreover, as mentioned above, there is an inherent tradeoff between low load and good resilience, so that it is in fact impossible to simultaneously achieve both optimally. In particular, quorum systems over $n$ servers that achieve the optimal load of $\frac{1}{\sqrt{n}}$ can tolerate at most $\sqrt{n}$ faults.

To break these limitations, Malkhi et al. propose in [8] to relax the intersection property of a quorum system so that “quorums” chosen according to a specified strategy intersect only with very high probability. They accordingly name these probabilistic quorum systems. These systems admit the possibility, albeit small, that two operations will be performed at non-intersecting quorums, in which case consistency of the system may suffer. However, even a small relaxation of consistency can yield dramatic improvements in the resilience and failure probability of the system, while the load remains essentially unchanged. Probabilistic quorum systems are thus most suitable for use when availability of operations despite the presence of faults is more important than certain consistency. This might be the case if the cost of inconsistent operations is high but not irrecoverable, or if obtaining the most up-to-date information is desirable but not critical, while having no information may have heavier penalties.

The family of constructions suggested in [8] is as follows:

W(n, ℓ): Let $U$ be a universe of size $n$. $W(n, \ell)$, $\ell \ge 1$, is the system $\langle \mathcal{Q}, w \rangle$ where $\mathcal{Q}$ is the set system $\mathcal{Q} = \{Q \subseteq U : |Q| = \ell\sqrt{n}\}$ and $w$ is the access strategy defined by $\forall Q \in \mathcal{Q}, w(Q) = \frac{1}{|\mathcal{Q}|}$.

The probability of choosing according to $w$ two quorums that do not intersect is less than $e^{-\ell^2}$, and can be made sufficiently small by appropriate choice of $\ell$. Since every element is in $\binom{n-1}{\ell\sqrt{n}-1}$ quorums, the load $L(W(n, \ell))$ is $\frac{\ell}{\sqrt{n}} = O(\frac{1}{\sqrt{n}})$. Because only $\ell\sqrt{n}$ servers need be available in order for some quorum to be available, $W(n, \ell)$ is resilient to $n - \ell\sqrt{n}$ crashes. The failure probability of $W(n, \ell)$ is less than $e^{-\Omega(n)}$ for all $p \le 1 - \frac{\ell}{\sqrt{n}}$, which is asymptotically optimal. Moreover, if $\frac{1}{2} \le p \le 1 - \frac{\ell}{\sqrt{n}}$, this probability is provably better than any (non-probabilistic) quorum system.

Relaxing consistency can also provide dramatic improvements in environments that may experience Byzantine failures. More details can be found in [8].

Applications

Just about any fault-tolerant distributed protocol, such as Paxos [5] or consensus [1], implicitly builds on quorums, typically majorities. More concretely, scalable data repositories were built, such as Fleet [9], Rambo [4] and Rosebud [13].

RECOMMENDED READING
[1] C. Dwork, N. Lynch, and L. Stockmeyer, Consensus in the presence of partial synchrony, J. Assoc. Comput. Mach., 35 (1988), pp. 288–323.
[2] H. Garcia-Molina and D. Barbara, How to assign votes in a distributed system, Journal of the ACM, 32.
[3] D. K. Gifford, Weighted voting for replicated data, in Proceedings of the 7th ACM Symposium on Operating Systems Principles, 1979, pp. 150–162.
[4] S. Gilbert, N. Lynch, and A. Shvartsman, Rambo II: Rapidly reconfigurable atomic memory for dynamic networks, June 2003, pp. 259–268.
[5] L. Lamport, The part-time parliament, ACM Transactions on Computer Systems, 16 (1998), pp. 133–169.
[6] M. Maekawa, A √n algorithm for mutual exclusion in decentralized systems, ACM Transactions on Computer Systems, 3 (1985), pp. 145–159.
[7] D. Malkhi and M. Reiter, Byzantine quorum systems, Distributed Computing, 11 (1998), pp. 203–213.
[8] D. Malkhi, M. Reiter, A. Wool, and R. Wright, Probabilistic quorum systems, The Information and Computation Journal, 170 (2001), pp. 184–206.
[9] D. Malkhi and M. K. Reiter, An architecture for survivable coordination in large-scale systems, IEEE Transactions on Knowledge and Data Engineering, 12 (2000), pp. 187–202.
[10] M. Naor and A. Wool, The load, capacity and availability of quorum systems, SIAM Journal of Computing, 27 (1998), pp. 423–447.
[11] D. Peleg and A. Wool, The availability of quorum systems, Information and Computation, 123 (1995), pp. 210–223.
[12] D. Peleg and A. Wool, Crumbling walls: A class of practical and efficient quorum systems, Distributed Computing, 10 (1997), pp. 87–98.
[13] R. Rodrigues and B. Liskov, Rosebud: A scalable Byzantine-fault tolerant storage architecture, in Proceedings of the 18th ACM Symposium on Operating System Principles, 2003.
[14] R. H. Thomas, A majority consensus approach to concurrency control for multiple copy databases, ACM Transactions on Database Systems, 4 (1979), pp. 180–209.