User:Ddoty/sandbox

A prefix hash tree (PHT) is a distributed data structure that enables more sophisticated queries over a distributed hash table (DHT). The prefix hash tree uses the lookup interface of a DHT to construct a trie-based data structure that is both efficient (updates are doubly logarithmic in the size of the domain being indexed), and resilient (the failure of any given node in a prefix hash tree does not affect the availability of data stored at other nodes).

PHT-Lookup-Linear
Given a key K, a PHT lookup operation returns the unique leaf node leaf(K) whose label is a prefix of K. Because there are (D + 1) distinct prefixes of K, there are (D + 1) potential candidates; an obvious algorithm is to perform a linear scan of these (D + 1) nodes until the required leaf node is reached. This is similar to a top-down traversal of the trie except that a DHT lookup is used to locate a PHT node given its prefix label. Pseudocode for this algorithm is given below.

Input: A key K  Output: leaf(K)

for ← 0 to D do /*Pi(K) denotes prefix of K of length i */ node ← DHT-LOOKUP(Pi(K)); if (node is a leaf node) then return node ; end return failure;

PHT-Lookup-Binary
linear search can be replaced by binary search on prefix lengths. If the current prefix is an internal node of the PHT, the search tries longer prefixes. Alternatively, if the current prefix is not an internal node of the PHT, the search tries shorter prefixes. The search terminates when the required leaf node is reached. Binary search reduces the number of DHT lookups from(D+1)to⌊log(D+1)⌋+1≈logD. [1]

Input : A key K  Output: leaf(K)

lo ← 0; hi ← D;  while (lo ≤ hi) do mid ← (lo + hi)/2; /*Pmid(K) denotes prefix of K oflength mid */ node ← DHT-LOOKUP(Pmid(K)); if (node is a leaf node) then return node ; else if (node is an internal node) then lo ← mid + 1; else hi ← mid- 1; end end return failure;

Range Query
Given two keys L and H (L ≤ H), a range query returns all keys K contained in the PHT satisfying L ≤ K ≤ H. Range queries can be implemented in a PHT in several ways; we present two simple algorithms.

The first algorithm is to locate leaf(L) using the PHT lookup operation. Now the doubly linked list of threaded leaves is traversed sequentially until the node leaf(H) is reached. All values satisying the range query are retrieved. This algorithm is simple and efficient; it initially requires log D DHT lookups to locate leaf(L). It cannot avoid traversing the remaining nodes to answer the query. The disadvantage of this algorithm is that a sequential scan of the leaf nodes may result in a high latency before the query is completely resolved.

The second algorithm is to parallelize. Using the DHT, locate the node whose label corresponds to the smallest prefix range that completely covers the specified range. If this is an internal node, then re- cursively forward the query onward to those chil- dren which overlap with the specified range. This process continuues until the leaf nodes overlapping with the query are reached. If this is not an inter- nal node, the required range query is covered by a single leaf node, which can be located by binary search. [1]