Gearman

Gearman is an open-source application framework designed to distribute suitable computing tasks across multiple computers, so that large tasks can be completed more quickly. In some cases, load balancing rather than raw speed may be the main goal; a Web server, for instance, could use Gearman to hand off tasks for which it is not optimized to another computer (which may have a different architecture, run another operating system, or offer a programming language better suited to a particular operation).

It was originally written in Perl by Brad Fitzpatrick. Brian Aker and Eric Day rewrote the framework in C.

How Gearman Works


Gearman assigns each participating computer a role as client, job server, or worker. A worker machine can run multiple instances of the worker role, which allows more powerful computers to complete more portions of a given task. Tasks originate on a client, are transmitted from the client to the job server, and are performed on one or more workers. The completed task's output is then returned, again by way of the job server, to the client where the task originated. Gearman is conceptually related to MapReduce: a worker node can map out parts of its work to other workers and then act as the reducer that combines their results.
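
The flow described above can be illustrated with a toy, in-process model. This is a minimal sketch only: the names ToyJobServer, ToyWorker, submit, grab, and complete are hypothetical and are not part of any Gearman library, and a real deployment would run the three roles as separate processes communicating over the network.

```python
from collections import deque


class ToyJobServer:
    """In-memory stand-in for a job server: queues jobs and hands them out."""

    def __init__(self):
        self.queue = deque()   # pending (job_id, function_name, payload) tuples
        self.results = {}      # job_id -> output reported by a worker
        self.next_id = 0

    def submit(self, function_name, payload):
        """Called by a client; returns a handle used to look up the result."""
        job_id = self.next_id
        self.next_id += 1
        self.queue.append((job_id, function_name, payload))
        return job_id

    def grab(self, abilities):
        """Called by a worker; returns the first queued job it can perform."""
        for job in list(self.queue):
            if job[1] in abilities:
                self.queue.remove(job)
                return job
        return None

    def complete(self, job_id, output):
        """Called by a worker; stores the output for the originating client."""
        self.results[job_id] = output


class ToyWorker:
    """Performs jobs for the function names it has registered."""

    def __init__(self, server, functions):
        self.server = server
        self.functions = functions   # function name -> callable

    def work_once(self):
        job = self.server.grab(self.functions.keys())
        if job is None:
            return False
        job_id, name, payload = job
        self.server.complete(job_id, self.functions[name](payload))
        return True


# Client side: submit a job, let a worker run it, read the result back.
server = ToyJobServer()
worker = ToyWorker(server, {"reverse": lambda text: text[::-1]})
handle = server.submit("reverse", "gearman")
worker.work_once()
result = server.results[handle]
```

In this sketch the client never talks to the worker directly; both the job and its output pass through the server object, mirroring the routing described in the text.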

Gearman coalesces the work sent by clients. If two or more clients request the same job, which the job server detects either by seeing that the same blocks are being sent or by comparing the unique value supplied by each client, the requests are merged so that only one worker performs the job. This is done specifically to avoid the thundering herd problem that commonly follows a cache miss.
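
Coalescing on a client-supplied unique value might be modeled roughly as follows. This is an illustrative sketch, not Gearman's implementation: the class CoalescingQueue and its methods are hypothetical, and the job is keyed on the pair (function name, unique value) as an assumption.

```python
class CoalescingQueue:
    """Toy queue that merges requests sharing a function name and unique value."""

    def __init__(self):
        self.pending = {}   # (function_name, unique) -> clients waiting on that job
        self.runs = 0       # how many times a worker actually did the work

    def submit(self, client, function_name, unique):
        """Register a request; returns True only if it created a new job."""
        key = (function_name, unique)
        is_new = key not in self.pending
        self.pending.setdefault(key, []).append(client)
        return is_new

    def run(self, function_name, unique, func):
        """Perform the job once and deliver the same output to every waiter."""
        clients = self.pending.pop((function_name, unique))
        self.runs += 1
        output = func(unique)
        return {client: output for client in clients}


# Two clients miss the same cache entry at the same time.
queue = CoalescingQueue()
first = queue.submit("client-a", "render_page", "/home")
second = queue.submit("client-b", "render_page", "/home")
delivered = queue.run("render_page", "/home", lambda path: "<html>" + path)
```

Only the first request creates a job; the second attaches to it, so the expensive function runs once while both clients receive the output.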

To mitigate the damage that would be done if a job server (or its network connection) were to fail, clients can be configured with more than one assigned job server; if the first assigned job server fails, another can be transparently substituted.
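
Client-side failover of this kind can be sketched as trying each configured job server in order. The function name connect_first_available and its connect parameter are hypothetical and exist only for illustration; actual client libraries provide their own mechanisms for listing several job servers.

```python
import socket


def connect_first_available(servers, timeout=2.0, connect=socket.create_connection):
    """Return a connection to the first reachable (host, port) in servers.

    The connect argument defaults to socket.create_connection and can be
    replaced with a stub when no real job server is available.
    """
    last_error = None
    for host, port in servers:
        try:
            return connect((host, port), timeout)
        except OSError as exc:
            # This server is unreachable; fall through to the next one.
            last_error = exc
    raise ConnectionError(f"no job server reachable: {last_error}")
```

A caller would pass a list such as [("jobs1.example", 4730), ("jobs2.example", 4730)], where both host names are placeholders; if the first server refuses the connection, the second is tried without the caller having to intervene.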

Gearman implements a protocol that consists of binary packets containing requests and responses; this protocol defines the structure of messages passing between the three parts of a Gearman implementation. By default, the Gearman protocol uses TCP port 4730. It previously operated on port 7003, but this conflicted with the AFS port range and the new port (4730) was assigned by IANA.
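
As an illustration of what such a binary packet might look like, the sketch below builds and parses a job submission request. The layout used here is an assumption not stated in this article and should be checked against the published protocol description: a 12-byte header (a 4-byte magic code, then a big-endian 32-bit packet type and a big-endian 32-bit body size), followed by NUL-separated arguments, with type 7 for a job submission. The helper names encode_packet and decode_packet are hypothetical.

```python
import struct

# Assumed constants; verify against the protocol description before relying on them.
REQ_MAGIC = b"\0REQ"   # magic code for packets sent to the job server
RES_MAGIC = b"\0RES"   # magic code for packets sent by the job server
SUBMIT_JOB = 7         # assumed packet type for submitting a job

HEADER = struct.Struct(">4sII")   # magic, packet type, body size (big-endian)


def encode_packet(magic, packet_type, *args):
    """Join the byte-string arguments with NUL bytes and prepend the header."""
    body = b"\0".join(args)
    return HEADER.pack(magic, packet_type, len(body)) + body


def decode_packet(data, arg_count):
    """Split a packet into (magic, packet_type, args).

    arg_count bounds the split so that a final data argument containing
    NUL bytes is kept in one piece.
    """
    magic, packet_type, size = HEADER.unpack_from(data)
    body = data[HEADER.size:HEADER.size + size]
    return magic, packet_type, body.split(b"\0", arg_count - 1)


# Arguments assumed for a submission: function name, unique value, payload.
packet = encode_packet(REQ_MAGIC, SUBMIT_JOB, b"reverse", b"", b"hello")
decoded = decode_packet(packet, 3)
```

A client would write such a packet to a TCP connection on the job server's port and read response packets in the same framing.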

The name "Gearman" was chosen as an anagram for "Manager", "since it dispatches jobs to be done, but does not do anything useful itself."

Features

 * Job retries
 * Round robin scheduling
 * Coalescence
 * Persistent storage via:
   * libmemcached
   * libdrizzle
   * SQLite
   * MySQL
   * PostgreSQL
   * Tokyo Cabinet
   * Redis (unreleased - currently in development)
   * MongoDB (unreleased - currently in development)

Implementations

 * Gearmand, up to version 1.1.12
 * Gearmand, from version 1.1.13
 * java-gearman-service
 * Gearman::Server
 * TclGearman

Clients

Client libraries are available for C, Perl, Node.js, Python, PHP, Ruby, Java, .NET, JMS, MySQL, PostgreSQL, and Drizzle.

Similar software

 * HAProxy
 * Squid
 * Varnish
 * Træfɪk