Cost distance analysis

In spatial analysis and geographic information systems, cost distance analysis or cost path analysis is a method for determining one or more optimal routes of travel through unconstrained (two-dimensional) space. The optimal solution is that which minimizes the total cost of the route, based on a field of cost density (cost per linear unit) that varies over space due to local factors. It is thus based on the fundamental geographic principle of Friction of distance. It is an optimization problem with multiple deterministic algorithm solutions, implemented in most GIS software.

The various problems, algorithms, and tools of cost distance analysis operate over an unconstrained two-dimensional space, meaning that a path could be of any shape. Similar cost optimization problems can also arise in a constrained space, especially a one-dimensional linear network such as a road or telecommunications network. Although they are similar in principle, the problems in network space require very different (usually simpler) algorithms to solve, largely adopted from graph theory. The collection of GIS tools for solving these problems are called network analysis.

History
Humans seem to have an innate desire to travel with minimal effort and time. Historic, even ancient, roads show patterns similar to what modern computational algorithms would generate, traveling straight across flat spaces, but curving around mountains, canyons, and thick vegetation.

However, it was not until the 20th century that geographers developed theories to explain this route optimization, and algorithms to reproduce it. In 1957, during the Quantitative revolution in Geography, with its propensity to adopt principles or mathematical formalisms from the "hard" sciences (known as social physics), William Warntz used refraction as an analogy for how minimizing travel cost will make transportation routes change direction at the boundary between two landscapes with very different friction of distance (e.g., emerging from a forest into a prairie). His principle of "parsimonious movement," changing direction to minimize cost, was widely accepted, but the refraction analogy and mathematics (Snell's law) was not, largely because it does not scale well to normally complex geographic situations.

Warntz and others then adopted another analogy that proved much more successful in the common situation where travel cost varies continuously over space, by comparing it to terrain. They compared the cost rate (i.e., cost per unit distance, the inverse of velocity if the cost is time) to the slope of a terrain surface (i.e., elevation change per unit distance), both being mathematical derivatives of an accumulated function or field: total elevation above a vertical datum (sea level) in the case of terrain. Integrating the cost rate field from a given starting point would create an analogous surface of total accumulated cost of travel from that point. In the same way that a stream follows the path of least resistance downhill, the streamline on the cost accumulation surface from any point "down" to the source will be the minimum-cost path. Additional lines of research in the 1960s further developed the nature of the cost rate field as a manifestation of the concept of friction of distance, studying how it was affected by various geographic features.

At the time, this solution was only theoretical, lacking the data and computing power for the continuous solution. Raster GIS provided the first feasible platform for implementing the theoretical solution by converting the continuous integration into a discrete summation procedure. Dana Tomlin implemented cost distance analysis in his Map Analysis Package by 1986, and Ronald Eastman added it to IDRISI by 1989, with a more efficient "pushbroom" cost accumulation algorithm. Douglas (1994) further refined the accumulation algorithm, which is basically what is implemented in most current GIS software.

Cost raster


The primary data set used in cost distance analysis is the cost raster, sometimes called the cost-of-passage surface, the friction image, the cost-rate field, or cost surface. In most implementations, this is a raster grid, in which the value of each cell represents the cost (i.e., expended resources, such as time, money, or energy) of a route crossing the cell in a horizontal or vertical direction. It is thus a discretization of a field of cost rate (cost per linear unit), a spatially intensive property. This cost is a manifestation of the principle of friction of distance.

A number of different types of cost may be relevant in a given routing problem:
 * Travel cost, the resource expenditure required to move across the cell, usually time or energy/fuel.
 * Construction cost, the resources (usually monetary) required to build the infrastructure that makes travel possible, such as roads, pipes, and cables. While some construction costs are constant (e.g., paving material), others are spatially variant, such as property acquisition and excavation.
 * Environmental impacts, the negative effects on the natural or human environment caused by the infrastructure or the travel along it. For example, building an expressway through a residential neighborhood or a wetland would incur a high political cost (in the form of environmental impact assessments, protests, lawsuits, etc.).

Some of these costs are easily quantifiable and measurable, such as transit time, fuel consumption, and construction costs, thus naturally lending themselves to computational solutions. That said, there may be significant uncertainty in predicting the cost prior to implementing the route. Other costs are much more difficult to measure due to their qualitative or subjective nature, such as political protest or ecological impact; these typically require operationalization through the creation of a scale.

In many situations, multiple types of cost may be simultaneously relevant, and the total cost is a combination of them. Because different costs are expressed in different units (or, in the case of scales, no units at all), they usually cannot be directly summed, but must be combined by creating an index. A common type of index is created by scaling each factor to a consistent range (say, [0,1]), then combining them using weighted linear combination. An important part of the creation of an index model like this is Calibration (statistics), adjusting the parameters of the formula(s) to make the modeled relative cost match real-world costs, using methods such as the Analytic hierarchy process. The index model formula is typically implemented in a raster GIS using map algebra tools from raster grids representing each cost factor, resulting in a single cost raster grid.

Directional cost
One limitation of the traditional method is that the cost field is isotropic or omni-directional: the cost at a given location does not depend on the direction of traversal. This is appropriate in many situations, but not others. For example, if one is flying in a windy location, an airplane flying in the direction of the wind incurs a much lower cost than an airplane flying against it. Some research has been done on extending cost distance analysis algorithms to incorporate directional cost, but it is not yet widely implemented in GIS software. IDRISI has some support for anisotropy.

Least-cost-path algorithm
The most common cost distance task is to determine the single path through the space between a given source location and a destination location that has the least total accumulated cost. The typical solution algorithm is a discrete raster implementation of the cost integration strategy of Warntz and Lindgren, which is a deterministic (NP-complete) optimization.


 * 1) Inputs: cost field raster, source location, destination location (most implementations can solve for multiple sources and destinations simultaneously)
 * 2) Accumulation: Starting at the source location compute the lowest total cost needed to reach every other cell in the grid. Although there are several algorithms, such as those published by Eastman and Douglas,  they generally follow a similar strategy. This process also creates, as an important byproduct, a second raster grid usually called the backlink grid (Esri) or movement direction grid (GRASS), in which each cell has a direction code (0-7) representing which of its eight neighbors had the lowest cost.
 * 3) Find a cell that is adjacent to at least one cell that already has an accumulated cost assigned (initially, this is only the source cell)
 * 4) Determine which neighbor has the lowest accumulated cost. Encode the direction from the target to the lowest-cost neighbor in the backlink grid.
 * 5) Add the cost of the target cell (or an average of the costs of the target and neighbor cells) to the neighbor accumulated cost, to create the accumulated cost of the target cell. If the neighbor is diagonal, the local cost is multiplied by $$\sqrt{2}$$
 * 6) The algorithm must also take into account that indirect routes may have lower cost, often using a hash table to keep track of temporary cost values along the expanding fringe of computation that can be reconsidered.
 * 7) Repeat the procedure until all cells are assigned.
 * 8) Drain: In keeping with the terrain analogy, trace the optimal route from the given destination back to the source like a stream draining away from a location. At its most basic, this is accomplished by starting at the destination cell, moving in the direction indicated in the backlink grid, then repeating for the next cell, and so on until the source is reached. Recent software adds some improvements, such as looking across three or more cells to recognize straight lines at angles other than the eight neighbor directions. For example, the r.walk function in GRASS can recognize the "knight's move" (one cell straight, then one cell diagonal) and draw a straight line bypassing the middle cell.

Corridor analysis
A slightly different version of the least-cost path problem, which could be considered a fuzzy version of it, is to look for corridors more than one cell in width, thus providing some flexibility in applying the results. Corridors are commonly used in transportation planning and in wildlife management.

The solution to this problem is to compute, for every cell in the study space, the total accumulated cost of the optimal path between a given source and destination that passes through that cell. Thus, every cell in the optimal path derived above would have the same minimum value. Cells near this path would be reached by paths deviating only slightly from the optimal path, so they would have relatively low cost values, collectively forming a corridor with fuzzy edges as more distant cells have increasing cost values.

The algorithm to derive this corridor field is created by generating two cost accumulation grids: one using the source as described above. Then the algorithm is repeated, but using the destination as the source. Then these two grids are added using map algebra. This works because for each cell, the optimal source-destination path passing through that cell is the optimal path from that cell to the source, added to the optimal path from that cell to the destination. This can be accomplished using the cost accumulation tool above, along with a map algebra tool, although ArcGIS provides a Corridor tool that automates the process.

Cost-based allocation
Another use of the cost accumulation algorithm is to partition space among multiple sources, with each cell assigned to the source it can reach with the lowest cost, creating a series of regions in which each source is the "nearest". In the terrain analogy, these would correspond to watersheds (one could thus call these "cost-sheds," but this term is not in common usage). They are directly related to a voronoi diagram, which is essentially an allocation over a space with constant cost. They are also conceptually (if not computationally) similar to location-allocation tools for network analysis.

A cost-based allocation can be created using two methods. The first is to use a modified version of the cost accumulation algorithm, which substitutes the backlink grid for an allocation grid, in which each cell is assigned the same source identifier of its lowest-cost neighbor, causing the domain of each source to gradually grow until they meet each other. This is the approach taken in ArcGIS Pro. The second solution is to first run the basic accumulation algorithm, then use the backlink grid to determine the source into which each cell "flows." GRASS GIS uses this approach; in fact, the same tool is used as for computing watersheds from terrain.

Implementations
Cost distance tools are available in most raster GIS software:
 * GRASS GIS (often bundled into QGIS), with separate accumulation (r.cost) and drain (r.walk) functions
 * ArcGIS Desktop and ArcGIS Pro, with separate accumulation (Cost Distance) and drain (Cost Path) geoprocessing tools, as well as Corridor generation. Recently, starting with ArcGIS Pro version 2.5, a new set of cost distance tools was introduced, using more advanced algorithms with more flexible options.
 * TerrSet (formerly Idrisi) has several tools, implementing a variety of algorithms to solve different kinds of cost distance problems, including anisotropic (directional) cost.

Applications
Cost distance analysis has found applications in a wide range of geography related disciplines including archeaology and landscape ecology.