Block nested loop

A block-nested loop (BNL) is an algorithm used to join two relations in a relational database.

This algorithm is a variation of the simple nested loop join and joins two relations $$R$$ and $$S$$ (the "outer" and "inner" join operands, respectively). Suppose $$|R| < |S|$$. In a traditional nested loop join, $$S$$ will be scanned once for every tuple of $$R$$. If there are many qualifying $$R$$ tuples, and particularly if there is no applicable index for the join key on $$S$$, this operation will be very expensive.

The block nested loop join algorithm improves on the simple nested loop join by only scanning $$S$$ once for every group of $$R$$ tuples. Here groups are disjoint sets of tuples in $$R$$ and the union of all groups has the same tuples as $$R$$. For example, one variant of the block nested loop join reads an entire page of $$R$$ tuples into memory and loads them into a hash table. It then scans $$S$$, and probes the hash table to find $$S$$ tuples that match any of the tuples in the current page of $$R$$. This reduces the number of scans of $$S$$ that are necessary.

algorithm block_nested_loop_join is for each page pr in R do for each page ps in S do for each tuple r in pr do for each tuple s in ps do if r and s satisfy the join condition then yield tuple 

A more aggressive variant of this algorithm loads as many pages of $$R$$ as can be fit in the available memory, loading all such tuples into a hash table, and then repeatedly scans $$S$$. This further reduces the number of scans of $$S$$ that are necessary. In fact, this algorithm is essentially a special-case of the classic hash join algorithm.

The block nested loop runs in $$O(P_r P_s/M)$$ I/Os where $$M$$ is the number of available pages of internal memory and $$P_r$$ and $$P_s$$ is size of $$R$$ and $$S$$ respectively in pages. Note that block nested loop runs in $$O(P_r+P_s)$$ I/Os if $$R$$ fits in the available internal memory.