
In computer science, the order-maintenance problem involves maintaining a totally ordered set supporting the following operations:


 * insert(X, Y), which inserts X immediately after Y in the total order;
 * order(X, Y), which determines whether X precedes Y in the total order; and
 * delete(X), which removes X from the set.

The first data structure for order maintenance was given by Dietz in 1982. This data structure supports insertion in $$O(\log n)$$ amortized time and order queries in constant time, but does not support deletion. In 1984 Tsakalidis published a data structure based on BB[α] trees with the same performance bounds that supports deletion in $$O(\log n)$$ time, and showed how indirection can be used to improve insertion and deletion performance to $$O(1)$$ amortized time. In 1987, Dietz and Sleator published the first data structure supporting all operations in worst-case constant time. (Dietz, Paul, and Daniel Sleator. "Two algorithms for maintaining order in a list." In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, pp. 365–372. ACM, 1987. Also available as: Dietz, P., and D. Sleator. [http://www.cs.cmu.edu/~sleator/papers/maintaining-order.html Two algorithms for maintaining order in a list], Carnegie Mellon University Computer Science technical report CMU-CS-88-113, 1988.)

Efficient data structures for order maintenance have applications in many areas, including data structure persistence (Driscoll, James R., Neil Sarnak, Daniel D. Sleator, and Robert E. Tarjan. [http://dx.doi.org/10.1016/0022-0000(89)90034-2 "Making data structures persistent."] Journal of Computer and System Sciences 38, no. 1 (1989): 86–124.), graph algorithms (Eppstein, David, Zvi Galil, Giuseppe F. Italiano, and Amnon Nissenzweig. [http://dx.doi.org/10.1145/265910.265914 "Sparsification—a technique for speeding up dynamic graph algorithms."] Journal of the ACM 44, no. 5 (1997): 669–696.), and fault-tolerant data structures (Aumann, Yonatan, and Michael A. Bender. "Fault tolerant data structures." In Proceedings of the 37th Annual Symposium on Foundations of Computer Science, pp. 580–589. IEEE, 1996.).

=List-Labeling=
A problem related to the order-maintenance problem is the list-labeling problem, in which, instead of the order(X, Y) operation, the solution must maintain an assignment of labels from a universe of integers $$\{1, 2, \ldots, u\}$$ to the elements of the set such that X precedes Y in the total order if and only if X is assigned a lesser label than Y. It must also support an operation label(X) returning the label of any node X. Note that order(X, Y) can be implemented simply by comparing label(X) and label(Y), so that any solution to the list-labeling problem immediately gives one to the order-maintenance problem. In fact, most solutions to the order-maintenance problem are solutions to the list-labeling problem augmented with a level of data structure indirection to improve performance. We will see an example of this below.
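The reduction above is a one-liner. As a minimal sketch (the dictionary of labels here is purely an illustrative stand-in for the label(X) operation):

```python
# Sketch: order(X, Y) derived from a list-labeling solution.
# `labels` is an illustrative stand-in for label(X).
labels = {"a": 10, "c": 15, "b": 20}  # a precedes c precedes b

def order(x, y):
    """Return True iff x precedes y in the total order."""
    return labels[x] < labels[y]

assert order("a", "c")
assert not order("b", "c")
```

Maintaining the labels under insertion is the hard part; the comparison itself is a single machine operation.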

For efficiency, it is desirable for the size $$u$$ of the universe to be bounded as a function of the number of elements $$n$$ stored in the data structure. In the case where $$u$$ is required to be linear in $$n$$, the problem is known as the ''packed-array maintenance'' or ''dense sequential file maintenance'' problem. Consider the elements as entries in a file and the labels as giving the address where each element is stored. Then an efficient solution to the packed-array maintenance problem would allow efficient insertions and deletions of entries into the middle of a file with no asymptotic space overhead (Willard, Dan E. "Maintaining dense sequential files in a dynamic environment." In Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing, pp. 114–121. ACM, 1982. Willard, Dan E. "A density control algorithm for doing insertions and deletions in a sequentially ordered file in a good worst-case time." Information and Computation 97, no. 2 (1992): 150–204.). This usage has recent applications in cache-oblivious data structures and optimal worst-case in-place sorting (Franceschini, Gianni, and Viliam Geffert. "An In-Place Sorting with O(n log n) Comparisons and O(n) Moves." Journal of the ACM 52, no. 4 (2005): 515–537.).

However, under some restrictions, $$\Omega(\log^2 n)$$ lower bounds have been shown on the insertion performance of solutions to the list-labeling problem with linear universes (Dietz, Paul, and Ju Zhang. "Lower bounds for monotonic list labeling." SWAT 90 (1990): 173–180.), whereas, as we will see below, a solution with a polynomial universe can perform insertions in $$O(\log n)$$ time.

=List-Labeling and Binary Search Trees=
List-labeling in a universe of size polynomial in the number $$n$$ of elements in the total order is connected to the problem of maintaining balance in a binary search tree. Note that every node X of a binary search tree is implicitly labeled with an integer that corresponds to its path from the root of the tree. We call this integer the path label of X and define it as follows. Let $$\sigma$$ be the sequence of left and right descents in this path. For example, $$\sigma = (\text{left}, \text{left}, \text{right})$$ if X is the right child of the left child of the left child of the root of the tree. Then X is labeled with the integer whose base-3 representation is given by replacing every occurrence of $$\text{left}$$ in $$\sigma$$ with 0, replacing every occurrence of $$\text{right}$$ in $$\sigma$$ with 2, appending 1 to the end of the resulting string, and padding it on the right with 0s to a fixed length $$h+1$$, where $$h$$ is the height of the tree (the padding makes labels of nodes at different depths comparable as integers). In the previous example, in a tree of height 3, the path label of X is $$0021_3$$, which is equal to 7 in base 10. Note that path labels preserve tree order and so can be used to answer order queries in constant time.
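The path-label computation above can be sketched as follows (the function name and the representation of descents as strings are illustrative choices, not from the original sources):

```python
def path_label(descents, height):
    """Base-3 path label of a BST node: each "left" becomes digit 0, each
    "right" becomes digit 2, a terminal digit 1 is appended, and the string
    is padded on the right with 0s to width height + 1 so that labels of
    nodes at different depths are comparable as integers."""
    s = "".join("0" if d == "left" else "2" for d in descents) + "1"
    return int(s.ljust(height + 1, "0"), 3)

# The example from the text, in a tree of height 3:
assert path_label(["left", "left", "right"], 3) == 7  # 0021 in base 3
# Path labels preserve tree order: left subtree < root < right subtree.
assert path_label(["left"], 3) < path_label([], 3) < path_label(["right"], 3)
```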

If the tree has height $$h$$, then these integers come from the universe $$\{1, 2, \ldots, 3^{h+1}-2\}$$. Then if the tree is self-balancing, so that it maintains a height no greater than $$c\log n$$, the size of the universe is polynomial in $$n$$.

Therefore, the list-labeling problem can be solved by maintaining a balanced binary search tree on the elements in the list augmented with path labels for each node. However, since every rebalancing operation of the tree would have to also update these path labels, not every self-balancing tree data structure is suitable for this application. Note, for example, that rotating a node with a subtree of size $$k$$, which can be done in constant time under usual circumstances, requires $$\Omega(k)$$ path label updates. In particular, if the node being rotated is the root then the rotation would take time linear in the size of the whole tree. With that much time the entire tree could be rebuilt. We will see below that there are self-balancing binary search tree data structures that cause an appropriate number of label updates during rebalancing.

=Data Structure=

The list-labeling problem can be solved with a universe of size polynomial in the number of elements $$n$$ by augmenting a scapegoat tree with the path labels described above. Scapegoat trees do all of their rebalancing by rebuilding subtrees. Since it takes $$\Omega(k)$$ time to rebuild a subtree of size $$k$$, relabeling that subtree in $$O(k)$$ time after it is rebuilt does not change the asymptotic performance of the rebalancing operation.

==Order==
Given two nodes X and Y, order(X, Y) determines their order by comparing their path labels.

==Insert==
Given a new node for X and a pointer to the node Y, insert(X, Y) inserts X immediately after Y in the usual way. If a rebalancing operation is required, the appropriate subtree is rebuilt, as usual for a scapegoat tree. Then that subtree is traversed depth-first and the path labels of each of its nodes are updated. As described above, this takes asymptotically no longer than the usual rebuilding operation.
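The relabeling pass after a rebuild can be sketched as a depth-first traversal. The Node class and function names below are illustrative, not taken from any particular implementation:

```python
class Node:
    """A BST node augmented with a path label (illustrative sketch)."""
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.label = None

def relabel(node, prefix, height):
    """Depth-first relabeling of a rebuilt subtree: O(k) time for k nodes.

    `prefix` is the base-3 digit string of the path from the tree's root
    down to `node`; labels are padded to height + 1 base-3 digits."""
    if node is None:
        return
    node.label = int((prefix + "1").ljust(height + 1, "0"), 3)
    relabel(node.left, prefix + "0", height)
    relabel(node.right, prefix + "2", height)

# Relabel a freshly rebuilt (sub)tree rooted at the tree's root:
root = Node("b"); root.left = Node("a"); root.right = Node("c")
relabel(root, "", 2)
assert root.left.label < root.label < root.right.label
```

Since the rebuild itself already visits every node of the subtree, folding this traversal into it adds only a constant factor.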

==Delete==
Deletion is performed similarly to insertion. Given the node X to be deleted, delete(X) removes X as usual. Any subsequent rebuilding operation is followed by a relabeling of the rebuilt subtree.

==Analysis==
It follows from the argument above about the rebalancing performance of a scapegoat tree augmented with path labels that the insertion and deletion performance of this data structure are asymptotically no worse than in a regular scapegoat tree. Then, since insertions and deletions take $$O(\log n)$$ amortized time in scapegoat trees, the same holds for this data structure.

Furthermore, since a scapegoat tree with parameter $$\alpha$$ maintains a height of at most $$\log_{1/\alpha}n + 1$$, the path labels have size at most $$3^{\log_{1/\alpha}n + 2}-2 \le 9n^{\frac{1}{\log_3(1/\alpha)}}$$. For the typical value $$\alpha=\tfrac{2}{3}$$, the labels are at most $$9n^3$$.
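As a quick numeric sanity check of this bound for $$\alpha = \tfrac{2}{3}$$ (a throwaway verification, not part of the data structure):

```python
import math

# For alpha = 2/3, height <= log_{3/2} n + 1, so labels are at most
# 3**(log_{3/2} n + 2) - 2, which should never exceed 9 * n**3
# (since log_{3/2} 3 ~ 2.71 < 3).
for n in [2, 10, 1000, 10**6]:
    label_bound = 3 ** (math.log(n, 3 / 2) + 2) - 2
    assert label_bound <= 9 * n ** 3
```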

Of course, the order of two nodes can be determined in constant time by comparing their labels, and a closer inspection shows that this holds even for large $$n$$. Specifically, if the word size of the machine is $$w$$ bits, then it can typically address at most $$2^w$$ nodes, so that $$n < 2^w$$. It follows that the universe has size at most $$9\cdot 2^{3w}$$ and that the labels can be represented with a constant number (at most $$3w + 4$$) of bits.

=Lower Bound on List-Labeling=
It has been shown that any solution to the list-labeling problem with a universe polynomial in the number of elements will have insertion and deletion performance no better than $$\Omega(\log n)$$. Then, for list-labeling, the above solution is asymptotically optimal. (Incidentally, this also proves an $$\Omega(\log n)$$ lower bound on the amortized rebalancing time of an insertion or deletion into a scapegoat tree.)

However, this lower bound does not apply to the order-maintenance problem and, as stated above, there are data structures that give worst-case constant time performance on all order-maintenance operations.

=O(1) Amortized Insertion via Indirection=
Indirection is a technique in data structures in which a problem is split into multiple levels of a data structure in order to improve performance. Typically, a problem of size $$n$$ is split into $$n/\log n$$ problems of size $$\log n$$, as in y-fast tries. A similar strategy improves the insertion and deletion performance of the data structure described above to constant amortized time. In fact, this strategy works for any solution of the list-labeling problem with $$O(\log n)$$ amortized insertion and deletion time.

Specifically, instead of storing the $$n$$ elements of the total order directly in the tree, we split them into $$n/\log n$$ contiguous sublists, each of size $$\log n$$. We store a node representing each of these sublists in the tree, in the order in which the sublists occur. For each sublist we maintain a doubly-linked list of its elements, storing with each element a pointer to its sublist's representative in the tree as well as a local integer label, independent of the labels used in the tree. The local labels are used to compare any two elements of the same sublist. Initially, the elements of each sublist are given the local labels $$\log n, 2\log n, \ldots, \log^2 n$$. This uniform labeling will not be maintained strictly, however; instead, the sublists will be periodically rebuilt.
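The initial sublist construction can be sketched as follows (the function name is illustrative, and the sublist size is taken as $$\lfloor\log_2 n\rfloor$$ for concreteness):

```python
import math

def build_sublists(elements):
    """Split n elements into contiguous sublists of size ~log n, giving the
    elements of each sublist the uniform local labels g, 2g, ..., g*g,
    where g ~ log n, so gaps of size g are left for future insertions."""
    g = max(int(math.log2(max(len(elements), 2))), 1)
    return [
        [(elem, (j + 1) * g) for j, elem in enumerate(elements[i:i + g])]
        for i in range(0, len(elements), g)
    ]

sublists = build_sublists(list("abcdefgh"))   # n = 8, so g = 3
assert sublists[0] == [("a", 3), ("b", 6), ("c", 9)]
assert len(sublists) == 3                     # sizes 3, 3, 2
```

Each sublist would then be stored as a doubly-linked list, with one representative node per sublist inserted into the tree.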

==Order==
Given two elements X and Y, we answer order(X, Y) by first checking whether they belong to the same sublist. If so, we simply compare their local labels. Otherwise, we compare the path labels of their sublists' representatives in the tree. In either case this takes constant time.

==Insert==
Given a new element X and a pointer to the element Y, we perform insert(X, Y) by inserting X immediately after Y in the sublist of Y. If X is inserted at the end of the sublist, we give it the local label $$\text{localLabel}(Y) + \log n$$, where $$\text{localLabel}(Y)$$ is the local label of Y; otherwise, if possible, we label it with the floor of the average of the local labels of its two neighbours.

If the local labels of Y and its successor are contiguous, leaving no label available for X, then at least $$\log n$$ elements have been added to the sublist since it was last rebuilt. In that case the sublist is split into sublists of size $$\log n$$, each rebuilt with a uniform local labeling, and representatives of the new sublists are inserted into the tree in its place.
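The local-label assignment on insertion can be sketched as follows; returning False signals the contiguous-labels case, in which the caller must split and rebuild the sublist (the function name and list-of-pairs representation are illustrative):

```python
def local_insert(sublist, y_index, x, spacing):
    """Insert x immediately after position y_index in `sublist`, a list of
    (element, local_label) pairs. `spacing` plays the role of log n.
    Returns False when the neighbouring labels are contiguous, i.e. the
    sublist must be split and rebuilt."""
    _, y_label = sublist[y_index]
    if y_index == len(sublist) - 1:
        new_label = y_label + spacing            # append: localLabel(Y) + log n
    else:
        _, succ_label = sublist[y_index + 1]
        if succ_label - y_label < 2:             # no free label between them
            return False
        new_label = (y_label + succ_label) // 2  # floor of the average
    sublist.insert(y_index + 1, (x, new_label))
    return True

s = [("a", 3), ("b", 6)]
assert local_insert(s, 0, "x", 3)      # lands between: floor((3+6)/2) = 4
assert s == [("a", 3), ("x", 4), ("b", 6)]
assert not local_insert(s, 0, "y", 3)  # labels 3 and 4 are contiguous: split
```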

==Delete==
Given an element X to be deleted, delete(X) is performed simply by removing X from its sublist.

=See Also=
 * Scapegoat tree

=References=

=External Links=
 * Two simplified algorithms for maintaining order in a list - This paper presents a list-labeling data structure with amortized performance bounds that does not explicitly store a tree. The analysis given is simpler than the one given by Dietz and Sleator (1987) for a similar data structure.
 * Open Data Structures - Chapter 8 - Scapegoat trees