User:Prohlep/Locality of reference

In computer science, locality of reference, also known as the principle of locality, is the phenomenon of the same value or related storage locations being frequently accessed. There are two basic types of reference locality. Temporal locality refers to the reuse of specific data and/or resources within relatively short time durations. Spatial locality refers to the use of data elements within relatively close storage locations, where proximity is usually measured in terms of memory addresses. Sequential locality, a special case of spatial locality, occurs when relevant data elements are arranged and accessed linearly; for example, the simple traversal of elements in a one-dimensional array, from the base address to the highest element, exploits the sequential locality of the array in memory.

Locality is merely one type of predictable behavior that occurs in computer systems. Systems that exhibit strong locality of reference are good candidates for performance optimization through techniques such as caching and prefetching for memory, or advanced branch prediction in the processor pipeline.

Locality of reference
Locality of reference, also known as the locality principle, is the phenomenon that the collection of data locations referenced in a short period of time by a running computer consists mainly of relatively well-predictable clusters in a spatial-temporal coordinate space, analogous to spacetime.

Important special cases of locality are temporal, spatial, equidistant, and branch locality.


 * Temporal locality: if a particular memory location is referenced at one point in time, then it is likely that the same location will be referenced again in the near future. There is temporal proximity between adjacent references to the same memory location. In this case it is common to store a copy of the referenced data in a special memory storage that can be accessed faster. Temporal locality is a very special case of spatial locality, namely the case in which the prospective location is identical to the present location.


 * Spatial locality: if a particular memory location is referenced at a particular time, then it is likely that nearby memory locations will be referenced in the near future. There is spatial proximity between the memory locations referenced at almost the same time. In this case it is common to try to estimate how large a neighbourhood around the current reference is worth preparing for faster access.


 * Equidistant locality: halfway between spatial locality and branch locality. Consider a loop accessing memory locations in an equidistant pattern, i.e. a path in the spatial-temporal coordinate space that forms a dotted line. In this case, a simple linear function can predict which location will be accessed in the near future.


 * Branch locality: when there are only a few possible alternatives for the prospective part of the path in the spatial-temporal coordinate space. This is the case when an instruction loop has a simple structure, or when the possible outcomes of a small system of conditional branching instructions are restricted to a small set of possibilities. Branch locality is typically not a form of spatial locality, since the few possibilities can be located far away from each other.

To benefit from the very frequently occurring temporal and spatial kinds of locality, most information storage systems are hierarchical; see below. Equidistant locality is usually supported by the diverse nontrivial increment instructions of processors. For branch locality, contemporary processors have sophisticated branch predictors, and on the basis of these predictions the processor's memory manager tries to collect and preprocess the data of the plausible alternatives.

Globality of reference

If the phenomenon of locality is weak or missing, then we can speak of globality of reference.

In this case, information storage access is fairly random, and the distribution of storage locations accessed within a short time period is more scattered than otherwise. In other words, these locations fail to form a few small clusters in the spatial-temporal coordinate space. The smaller, faster levels of the storage hierarchy therefore do not have enough capacity to cache the data for faster handling. Consequently, this randomness usually yields a performance penalty in terms of speed.

There is more or less nothing to be done about this phenomenon except to know about it, be prepared for it, and avoid it if at all possible; see the sections Reasons for locality and Reasons for globality below.

Fortunately, locality of reference is much more frequent than globality of reference when a computer system is used for average everyday tasks rather than for artificial intelligence; see the section Reasons for globality below. In fact, it is this much higher practical frequency of locality that made locality of reference a recognized phenomenon and an important notion.

Reasons for locality
There are several reasons for locality. These reasons are either goals to achieve or circumstances to accept, depending on the aspect. The reasons below are not disjoint; in fact, the list goes from the most general case to special cases.


 * Predictability: locality is merely one type of predictable behavior in computer systems. Fortunately, many practical problems are decidable, and hence the corresponding program can behave predictably if it is well written.


 * Structure of the program: locality often occurs because of the way in which computer programs are created for handling decidable problems. Generally, related data are stored in nearby locations in storage. One common pattern in computing involves the processing of several items, one at a time. This means that if much processing is done, a single item will be accessed more than once, leading to temporal locality of reference. Furthermore, moving to the next item implies that the next item will be read; hence spatial locality of reference, since memory locations are typically read in batches.


 * Linear data structures: locality often occurs because code contains loops that tend to reference arrays or other data structures by indices. Sequential locality, a special case of spatial locality, occurs when relevant data elements are arranged and accessed linearly; for example, the simple traversal of elements in a one-dimensional array, from the base address to the highest element, exploits the sequential locality of the array in memory. The more general equidistant locality occurs when the linear traversal covers a longer area of adjacent data structures of identical structure and size, and when, in addition, not the whole structures are accessed but only the mutually corresponding elements of each structure. This is the case when a matrix is stored in row-major order, as a longer area of adjacent matrix rows, and the current goal is to access a single column of the matrix. Clearly, the set of memory references needed to access that column sequentially forms a dotted line in the spatial-temporal coordinate space. This kind of behavior is well supported by the instruction sets of microprocessors.

Reasons for globality
What is called predictability in practical life corresponds approximately to decidability in mathematics, universal algebra, and mathematical logic. There is a vast body of negative results in universal algebra showing that most of the interesting mathematical questions are undecidable; hence the corresponding search program must behave unpredictably, and therefore any kind of locality is missing.

Finding new proofs, new ideas, and new concepts is more or less equivalent to doing something similar to the famous Knuth–Bendix completion algorithm, introduced at a conference in 1969 and published as a research paper in 1970. Contrary to its name, it is definitely not an algorithm, since, due to the famous Gödel incompleteness theorems, (1) the time of termination cannot be predicted, and (2) in most cases (nearly 100% of them) the procedure will never terminate. As a consequence, the handling of storage resources, including the path through the instruction space, cannot be predicted. Hence, in this kind of situation, the locality phenomenon is almost completely missing.

Hence, if serious mathematics is to be done in artificial intelligence, it is better not to be satisfied with processors having big caches, because the slower levels of the memory hierarchy really do matter; it is then worthwhile to have genuinely fast memory modules installed on the motherboard.

All of the above is not far from the reality of designing and manufacturing computers, where the mathematical verification of the plans for digital electronics and integrated circuits can run into decidability problems. This is the reason why the industrial Larch Prover, developed in cooperation between Digital Equipment Corporation and the Massachusetts Institute of Technology, is limited to those kinds of reasoning methods which terminate in log-linear time.

Use of locality in general
If, most of the time, a substantial portion of the references aggregate into clusters, and if the shape of this system of clusters can be predicted well, then this can be used for speed optimization. There are several ways to benefit from locality. The common optimization techniques are:


 * to increase the locality of references. This is usually achieved on the software side.


 * to exploit the locality of references. This is usually achieved on the hardware side. Temporal and spatial locality can be capitalized on by hierarchical storage hardware. Equidistant locality can be exploited by the appropriately specialized instructions of processors; this possibility is not only the responsibility of the hardware but of the software as well, whose structure must be suitable for compilation into a binary program that calls the specialized instructions in question. Branch locality is a more elaborate possibility, so more development effort is needed, but there is a much larger reserve for future exploitation in this kind of locality than in all the remaining ones.

Branch locality is the promising direction for contemporary and future developments. There is already a fair number of achievements using this kind of locality; see the diverse branch predictors in contemporary processors, or the sophisticated Code Morphing of the Transmeta processors, such as the Crusoe and Efficeon.

Use of spatial and temporal locality: hierarchical memory
Hierarchical memory is a hardware optimization that takes advantage of spatial and temporal locality and can be used on several levels of the memory hierarchy. Paging obviously benefits from temporal and spatial locality. A cache is a simple example of exploiting temporal locality: it is a specially designed, faster but smaller memory area, generally used to keep recently referenced data and data near recently referenced data, which can lead to potential performance increases. Data in a cache do not necessarily correspond to data that are spatially close in main memory; however, data elements are brought into the cache one cache line at a time. This means that spatial locality is again important: if one element is referenced, a few neighboring elements will also be brought into the cache. Finally, temporal locality plays a role at the lowest level, since results that are referenced very closely together can be kept in the machine registers. Programming languages such as C allow the programmer to suggest that certain variables be kept in registers.

Data locality is a typical memory-reference feature of regular programs (though many irregular memory access patterns exist). It makes the hierarchical memory layout profitable. In computers, memory is divided into a hierarchy in order to speed up data accesses. The lower levels of the memory hierarchy tend to be slower but larger. Thus, a program will achieve greater performance if it uses memory while it is cached in the upper levels of the hierarchy and avoids bringing other data into those levels that would displace data to be used shortly in the future. This is an ideal, and it sometimes cannot be achieved.

Typical memory hierarchy (access times and cache sizes are approximations of typical values used as of 2006 for the purpose of discussion; actual values and actual numbers of levels in the hierarchy vary):
 * CPU registers (8-32 registers) – immediate access (0-1 clock cycles)
 * L1 CPU caches (32 KiB to 128 KiB) – fast access (3 clock cycles)
 * L2 CPU caches (128 KiB to 12 MiB) – slightly slower access (10 clock cycles)
 * Main physical memory (RAM) (256 MiB to 4 GiB) – slow access (100 clock cycles)
 * Disk (file system) (1 GiB to 1 TiB) – very slow (10,000,000 clock cycles)
 * Remote Memory (such as other computers or the Internet) (Practically unlimited) – speed varies

Modern machines tend to read blocks of lower memory into the next level of the memory hierarchy. If this displaces used memory, the operating system tries to predict which data will be accessed least (or latest) and move it down the memory hierarchy. Prediction algorithms tend to be simple to reduce hardware complexity, though they are becoming somewhat more complicated.

Spatial and temporal locality example: matrix multiplication
A common example is matrix multiplication:

```
for i in 0..n
    for j in 0..m
        for k in 0..p
            C[i][j] = C[i][j] + A[i][k] * B[k][j];
```

When dealing with large matrices, this algorithm tends to shuffle data around too much. Since memory is pulled up the hierarchy in consecutive address blocks, in the C programming language it would be advantageous to refer to several memory addresses that share the same row (spatial locality). By keeping the row number fixed, the second index changes more rapidly. In C and C++, this means the memory addresses are used more consecutively. One can see that since j affects the column reference of both matrices C and B, it should be iterated in the innermost loop (this will fix the row iterators, i and k, while j moves across each column in the row). This will not change the mathematical result, but it improves efficiency. By switching the looping order for j and k, the speedup in large matrix multiplications becomes dramatic. (In this case, 'large' means, approximately, more than 100,000 elements in each matrix, or enough addressable memory that the matrices will not fit in the L1 and L2 caches.)

Temporal locality can also be improved in the above example by using a technique called blocking. The larger matrix can be divided into evenly sized sub-matrices, so that the smaller blocks can be referenced (multiplied) several times while in memory.

```
for (ii = 0; ii < SIZE; ii += BLOCK_SIZE)
    for (kk = 0; kk < SIZE; kk += BLOCK_SIZE)
        for (jj = 0; jj < SIZE; jj += BLOCK_SIZE)
            for (i = ii; i < ii + BLOCK_SIZE && i < SIZE; i++)
                for (k = kk; k < kk + BLOCK_SIZE && k < SIZE; k++)
                    for (j = jj; j < jj + BLOCK_SIZE && j < SIZE; j++)
                        C[i][j] = C[i][j] + A[i][k] * B[k][j];
```

The temporal locality of the above solution comes from the fact that a block can be used several times before moving on, so it is moved in and out of memory less often. Spatial locality is improved because elements with consecutive memory addresses tend to be pulled up the memory hierarchy together.