Fork–join queue



In queueing theory, a discipline within the mathematical theory of probability, a fork–join queue is a queue where incoming jobs are split on arrival for service by numerous servers and joined before departure. The model is often used for parallel computations or systems where products need to be obtained simultaneously from different suppliers (in a warehouse or manufacturing setting). The key quantity of interest in this model is usually the time taken to service a complete job. The model has been described as a "key model for the performance analysis of parallel and distributed systems." Few analytical results exist for fork–join queues, but various approximations are known.

The situation where jobs arrive according to a Poisson process and service times are exponentially distributed is sometimes referred to as a Flatto–Hahn–Wright model or FHW model.

Definition
On arrival at the fork point, a job is split into N sub-jobs, one for each of the N servers. After service, each sub-job waits until all the other sub-jobs of the same job have also been processed. The sub-jobs are then rejoined and leave the system.

For the fork–join queue to be stable, the input rate must be strictly less than the sum of the service rates at the service nodes.
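The fork and join steps described above can be illustrated with a short simulation. This is a minimal sketch, not a reference implementation: it assumes Poisson arrivals, exponentially distributed service times and first-come-first-served order at every server, and the function name `simulate_fork_join` and its parameters are hypothetical names chosen for this example.

```python
import random


def simulate_fork_join(lam, mus, num_jobs, seed=0):
    """Simulate a fork-join queue and return the response time of each job.

    Assumptions (illustrative only): Poisson arrivals with rate lam,
    exponential service with rate mus[i] at server i, FIFO at each server.
    """
    rng = random.Random(seed)
    arrival = 0.0
    # free_at[i] is the time at which server i has cleared all sub-jobs so far.
    free_at = [0.0] * len(mus)
    response_times = []
    for _ in range(num_jobs):
        arrival += rng.expovariate(lam)
        # Fork: every server receives one sub-job of this job.
        for i, mu in enumerate(mus):
            start = max(arrival, free_at[i])
            free_at[i] = start + rng.expovariate(mu)
        # Join: the job departs when its slowest sub-job has finished.
        response_times.append(max(free_at) - arrival)
    return response_times


if __name__ == "__main__":
    times = simulate_fork_join(lam=0.5, mus=[1.0, 1.0], num_jobs=100_000)
    print("estimated mean response time:", sum(times) / len(times))
```

In this example every service rate is chosen larger than the arrival rate, so that no individual server builds up an unbounded backlog.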

Applications
Fork–join queues have been used to model zoned RAID systems, parallel computations and for modelling order fulfilment in warehouses.

Response time
The response time (or sojourn time) is the total amount of time a job spends in the system.

Distribution
Ko and Serfozo give an approximation for the response time distribution when service times are exponentially distributed and jobs arrive either according to a Poisson process or a general distribution. Qiu, Pérez and Harrison give an approximation method when service times have a phase-type distribution.

Average response time
An exact formula for the average response time is only known in the case of two servers (N=2) with exponentially distributed service times (where each server is an M/M/1 queue). In this situation, the average response time (the mean total time a job spends in the system) is
 * $$\frac{12-\rho}{8\mu(1-\rho)}$$

where
 * $$\rho=\lambda/\mu$$ is the utilization.
 * $$\lambda$$ is the arrival rate of jobs to all the nodes.
 * $$\mu$$ is the service rate across all the nodes.

In the situation where nodes are M/M/1 queues and N > 2, Varki's modification of mean value analysis can also be used to give an approximate value for the average response time.
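The two-server formula can be evaluated directly and compared with the mean response time 1/(μ − λ) of a single M/M/1 queue with the same rates. The following is a small sketch; the function names are hypothetical and chosen only for this example.

```python
def mean_response_two_servers(lam, mu):
    """Mean response time of the N=2 fork-join queue with exponential
    service: (12 - rho) / (8 * mu * (1 - rho)), where rho = lam / mu."""
    rho = lam / mu
    if not 0 <= rho < 1:
        raise ValueError("requires 0 <= lam / mu < 1")
    return (12 - rho) / (8 * mu * (1 - rho))


def mean_response_mm1(lam, mu):
    """Mean response time of a single M/M/1 queue, for comparison."""
    if not 0 <= lam < mu:
        raise ValueError("requires 0 <= lam < mu")
    return 1 / (mu - lam)


if __name__ == "__main__":
    lam, mu = 0.5, 1.0
    print(mean_response_two_servers(lam, mu))  # 2.875
    print(mean_response_mm1(lam, mu))          # 2.0
```

The ratio of the two values is (12 − ρ)/8, so under these assumptions synchronising two servers costs between 37.5% and 50% extra mean response time relative to a single M/M/1 queue, depending on the utilization.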

For general service times (where each node is an M/G/1 queue), Baccelli and Makowski give bounds for the average response time and higher moments of this quantity in both the transient and steady-state situations. Kemper and Mandjes show that for some parameters these bounds are not tight and demonstrate an approximation technique. For heterogeneous fork-join queues (fork-join queues with different service times), Alomari and Menasce propose an approximation based on harmonic numbers that can be extended to cover more general cases such as probabilistic fork, open and closed fork-join queues.

Subtask dispersion
The subtask dispersion, defined to be the range of service times, can be numerically computed and optimal deterministic delays introduced to minimize the range.

Stationary distribution
In general the stationary distribution of the number of jobs at each queue is intractable. Flatto considered the case of two servers (N=2) and derived the stationary distribution for the number of jobs at each queue via uniformization techniques. Pinotsi and Zazanis show that a product form solution exists when arrivals are deterministic as the queue lengths are then independent D/M/1 queues.

Heavy traffic/diffusion approximation
When the servers are heavily loaded (the service rate is only just larger than the arrival rate), the queue length process can be approximated by a reflected Brownian motion which converges to the same stationary distribution as the original queueing process. Under limiting conditions the state space of the synchronisation queues collapses and all queues behave identically.

Join queue distribution
Once jobs are served, the parts are reassembled at the join queue. Nelson and Tantawi published the distribution of the join queue length in the situation where all servers have the same service rate. Li and Zhao consider heterogeneous service rates and the asymptotic behaviour of the distribution.

Networks of fork–join queues
An approximate formula can be used to calculate the response time distribution for a network of fork–join queues joined in series (one after the other).

Split–merge model
A related model is the split–merge model, for which analytical results exist. Here, on arrival a job is split into N sub-tasks which are serviced in parallel. Only when all the sub-tasks have finished service and been rejoined can the next job start. This leads to a longer response time on average. Exact results for the split–merge queue are given by Fiorini and Lipsky.
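Because the next job cannot start until the current one has completely finished, the split–merge system behaves like a single-server queue whose service time is the maximum of the N sub-task times. As a sketch under assumptions not stated above (Poisson arrivals and independent, exponentially distributed sub-task times with a common rate), the mean response time then follows from the Pollaczek–Khinchine formula for the M/G/1 queue. The function name is hypothetical, and this is not the method of Fiorini and Lipsky.

```python
def split_merge_mean_response(lam, mu, n):
    """Mean response time of a split-merge queue, treated as an M/G/1 queue
    whose service time is the maximum of n iid Exp(mu) sub-task times.

    Illustrative assumptions: Poisson arrivals (rate lam), iid exponential
    sub-tasks (rate mu), one job in service at a time.
    """
    # Moments of the maximum of n iid exponentials with rate mu.
    mean_s = sum(1 / k for k in range(1, n + 1)) / mu
    var_s = sum(1 / k**2 for k in range(1, n + 1)) / mu**2
    second_moment = var_s + mean_s**2

    load = lam * mean_s
    if load >= 1:
        raise ValueError("unstable: lam * E[S] must be below 1")

    # Pollaczek-Khinchine mean waiting time, plus the mean service time.
    wait = lam * second_moment / (2 * (1 - load))
    return mean_s + wait


if __name__ == "__main__":
    print(split_merge_mean_response(lam=0.5, mu=1.0, n=2))
```

With λ = 0.5, μ = 1 and N = 2 this gives a mean response time of 5, compared with 2.875 from the exact two-server fork–join formula for the same rates, which illustrates the longer response time of the split–merge model.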

Generalized (n,k) fork-join system
A generalization of the fork-join queueing system is the $$(n,k)$$ fork-join system, where the job exits the system when any $$k$$ out of $$n$$ tasks are served. The traditional fork-join queueing system is a special case of the $$(n,k)$$ system when $$k = n$$. Bounds on the mean response time of this generalized system were found by Joshi, Liu and Soljanin.
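The effect of waiting for only k of the n tasks can be seen in the simplest possible setting: a single job arriving at an empty system, with independent exponentially distributed task times of common rate μ. This is an illustrative assumption that ignores queueing entirely, and it is not the bound derived by Joshi, Liu and Soljanin. In that setting the job finishes at the k-th smallest of n exponential times, whose mean is (1/μ) Σ 1/(n − j) for j = 0, …, k − 1. The function name below is hypothetical.

```python
def mean_kth_of_n_exponentials(n, k, mu):
    """Mean of the k-th smallest of n iid Exp(mu) task times.

    Models one (n, k) job in an otherwise empty system; queueing delay
    is deliberately ignored in this sketch.
    """
    if not 1 <= k <= n:
        raise ValueError("requires 1 <= k <= n")
    return sum(1 / (n - j) for j in range(k)) / mu


if __name__ == "__main__":
    print(mean_kth_of_n_exponentials(n=2, k=2, mu=1.0))  # 1.5
    print(mean_kth_of_n_exponentials(n=2, k=1, mu=1.0))  # 0.5
```

For k = n this reduces to the mean of the maximum of n exponentials, matching the traditional fork-join case for an isolated job.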