Reliable Server Pooling

Reliable Server Pooling (RSerPool) is a computer protocol framework for management of and access to multiple, coordinated (pooled) servers. RSerPool is an IETF standard, which has been developed by the IETF RSerPool Working Group and documented in RFC 5351, RFC 5352, RFC 5353, RFC 5354, RFC 5355 and RFC 5356.

Introduction
In the terminology of RSerPool a server is denoted as a Pool Element (PE). In its Pool, it is identified by its Pool Element Identifier (PE ID), a 32-bit number. The PE ID is randomly chosen upon a PE's registration to its pool. The set of all pools is denoted as the Handlespace. In older literature, it may be denoted as Namespace. This denomination has been dropped in order to avoid confusion with the Domain Name System (DNS). Each pool in a handlespace is identified by a unique Pool Handle (PH), which is represented by an arbitrary byte vector. Usually, this is an ASCII or Unicode name of the pool, e.g. "DownloadPool" or "WebServerPool".

Each handlespace has a certain scope (e.g. an organization or company), denoted as Operation Scope. It is explicitly not a goal of RSerPool to manage the global Internet's pools within a single handlespace. Due to the localisation of Operation Scopes, it is possible to keep the handlespace "flat". That is, PHs do not have any hierarchy in contrast to the Domain Name System with its top-level and sub-domains. This constraint results in a significant simplification of handlespace management.

Within an operation scope, the handlespace is managed by redundant Pool Registrars (PR), also denoted as ENRP servers or Name Servers (NS). PRs have to be redundant in order to avoid a PR to become a Single Point of Failure (SPoF). Each PR of an operation scope is identified by its Registrar ID (PR ID), which is a 32-bit random number. It is not necessary to ensure uniqueness of PR IDs. A PR contains a complete copy of the operation scope's handlespace. PRs of an operation scope synchronize their view of the handlespace using the Endpoint Handlespace Redundancy Protocol (ENRP). Older versions of this protocol use the term Endpoint Namespace Redundancy Protocol; this naming has been replaced to avoid confusion with Domain Name System|DNS, but the abbreviation has been kept. Due to handlespace synchronization by ENRP, PRs of an operation scope are functionally equal. That is, if any of the PRs fails, each other PR is able to seamlessly replace it.

Using the Aggregate Server Access Protocol (ASAP), a PE can add itself to a pool or remove it from by requesting a registration or deregistration from an arbitrary PR of the operation scope. In case of successful registration, the PR chosen for registration becomes the PE's Home-PR (PR-H). A PR-H not only informs the other PRs of the operation scope about the registration or deregistration of its PEs, it also monitors the availability of its PEs by ASAP Keep-Alive messages. A keep-alive message by a PR-H has to be acknowledged by the PE within a certain time interval. If the PE fails to answer within a certain timeout, it is assumed to be dead and immediately removed from the handlespace. Furthermore, a PE is expected to re-register regularly. At a re-registration, it is also possible for the PE to change its list of transport addresses or its policy information.

To use the service of a pool, a client — called Pool User (PU) in RSerPool terminology — first has to request the resolution of the pool's PH to a list of PE identities at an arbitrary PR of the operation scope. This selection procedure is denoted as Handle Resolution. For the case that the requested pool is existing, the PR will select a list of PE identities according to the pool's Pool Member Selection Policy, also simply denoted as Pool Policy.

Possible pool policies are e.g. a random selection (Random) or the least-loaded PE (Least Used). While in the first case it is not necessary to have any selection information (PEs are selected randomly), it is required to maintain up-to-date load information in the second case of selecting the least-loaded PE. Using an appropriate selection policy, it is e.g. possible to equally distribute the request load onto the pool's PEs.

After reception of a list of PE identities from a PR, a PU will write the PE information into its local cache. This cache is denoted as PU-side Cache. Out of its cache, the PU will select exactly one PE — again using the pool's selection policy — and establish a connection to it using the application's protocol, e.g. HTTP over SCTP or TCP in case of a web server. Using this connection, the service provided by the server is used. For the case that the establishment of the connection fails or the connection is aborted during service usage, a new PE can be selected by repeating the described selection procedure. If the information in the PU-side cache is not outdated, a PE identity may be directly selected from cache, skipping the effort of asking a PR for handle resolution. After re-establishing a connection with a new PE, the state of the application session has to be re-instantiated on the new PE. The procedure necessary for session resumption is denoted as Failover Procedure and is of course application-specific. For an FTP download for example, the failover procedure could mean to tell the new FTP server the file name and the last received data position. By that, the FTP server will be able to resume the download session. Since the failover procedure is highly application-dependent, it is not part of RSerPool itself, though RSerPool provides far reaching support for the implementation of arbitrary failover schemes by its Session Layer mechanisms.

To make it possible for RSerPool components to configure automatically, PRs can announce themselves via UDP over IP multicast. These announces can be received by PEs, PUs and other PRs, allowing them to learn the list of PRs currently available in the operation scope. The advantage of using IP multicast instead of broadcast is that this mechanism will also work over routers (e.g. LANs inter-connected via a VPN) and the announces will — for the case of e.g. a switched Ethernet — only be heard and processed by stations actually interested in this information. For the case that IP multicast is not available, it is of course possible to statically configure PR addresses.

Implementations
The following implementations are known:
 * RSPLIB Project (open source; is also reference implementation of the IETF RSerPool WG)
 * Motorola
 * Cisco
 * Münster University of Applied Sciences

RFCs

 * - Requirements for Reliable Server Pooling
 * - An Overview of Reliable Server Pooling Protocols
 * - Aggregate Server Access Protocol (ASAP)
 * - Endpoint Handlespace Redundancy Protocol (ENRP)
 * - Aggregate Server Access Protocol (ASAP) and Endpoint Handlespace Redundancy Protocol (ENRP) Parameters
 * - Threats Introduced by Reliable Server Pooling (RSerPool) and Requirements for Security in Response to Threats
 * - Reliable Server Pooling Policies
 * - Reliable Server Pooling MIB Module Definition

Working Group Drafts

 * Architecture for Reliable Server Pooling
 * Comparison of Protocols for Reliable Server Pooling
 * TCP Mapping for Reliable Server Pooling Enhanced Mode
 * Services Provided By Reliable Server Pooling
 * Reliable Server Pooling Sockets API Extensions

Other Drafts

 * Handle Resolution Option for ASAP
 * Least-Used Policy for Reliable Server Pooling
 * Reliable Server Pooling Applicability for IP Flow Information Exchange
 * Applicability of Reliable Server Pooling for Real-Time Distributed Computing
 * Reliable Server Pooling (RSerPool) Bakeoff Scoring
 * Applicability of Reliable Server Pooling for SCTP-Based Endpoint Mobility
 * Takeover Suggestion Flag for the ENRP Handle Update Message
 * The Applicability of Reliable Server Pooling (RSerPool) for Virtual Network Function Resource Pooling (VNFPOOL)
 * Ideas for a Next Generation of the Reliable Server Pooling Framework