Species distribution modelling

Species distribution modelling (SDM), also known as environmental (or ecological) niche modelling (ENM), habitat modelling, predictive habitat distribution modelling, and range mapping, uses computer algorithms to predict the distribution of a species across geographic space and time using environmental data. The environmental data are most often climate data (e.g. temperature, precipitation), but can include other variables such as soil type, water depth, and land cover. SDMs are used in several research areas in conservation biology, ecology and evolution. These models can be used to understand how environmental conditions influence the occurrence or abundance of a species, and for predictive purposes (ecological forecasting). Predictions from an SDM may be of a species’ future distribution under climate change, a species’ past distribution in order to assess evolutionary relationships, or the potential future distribution of an invasive species. Predictions of current and/or future habitat suitability can be useful for management applications (e.g. reintroduction or translocation of vulnerable species, reserve placement in anticipation of climate change).

There are two main types of SDMs. Correlative SDMs, also known as climate envelope models, bioclimatic models, or resource selection function models, model the observed distribution of a species as a function of environmental conditions. Mechanistic SDMs, also known as process-based models or biophysical models, use independently derived information about a species' physiology to develop a model of the environmental conditions under which the species can exist.

The extent to which such modelled data reflect real-world species distributions will depend on a number of factors, including the nature, complexity, and accuracy of the models used and the quality of the available environmental data layers; the availability of sufficient and reliable species distribution data as model input; and the influence of various factors such as barriers to dispersal, geologic history, or biotic interactions, that increase the difference between the realized niche and the fundamental niche. Environmental niche modelling may be considered a part of the discipline of biodiversity informatics.

History
A. F. W. Schimper used geographical and environmental factors to explain plant distributions in his 1898 Pflanzengeographie auf physiologischer Grundlage (Plant Geography Upon a Physiological Basis) and his 1908 work of the same name. Andrew Murray used the environment to explain the distribution of mammals in his 1866 The Geographical Distribution of Mammals. Robert Whittaker's work with plants and Robert MacArthur's work with birds strongly established the role the environment plays in species distributions. Elgene O. Box constructed environmental envelope models to predict the range of tree species. His computer simulations were among the earliest uses of species distribution modelling.

The adoption of generalised linear models (GLMs) made it possible to create more sophisticated and realistic species distribution models. The expansion of remote sensing and the development of GIS-based environmental modelling increased the amount of environmental information available for model-building and made it easier to use.

Correlative SDMs
SDMs originated as correlative models. Correlative SDMs model the observed distribution of a species as a function of geographically referenced climatic predictor variables using multiple regression approaches. Given a set of geographically referenced observed presences of a species and a set of climate maps, an algorithm finds the most likely environmental ranges within which a species lives. Correlative SDMs assume that species are at equilibrium with their environment and that the relevant environmental variables have been adequately sampled. The models allow for interpolation between a limited number of species occurrences.

To be effective, these algorithms need observations not only of species presences but also of absences, that is, locations where the species does not live. Records of absences are typically less common than records of presences, so "random background" or "pseudo-absence" data are often used to fit these models. If records of species occurrences are incomplete, pseudo-absences can introduce bias. Since correlative SDMs are models of a species’ observed distribution, they are models of the realized niche (the environments where a species is found), as opposed to the fundamental niche (the environments where a species can be found, or where the abiotic environment is appropriate for survival). For a given species, the realized and fundamental niches might be the same, but if a species is geographically confined due to dispersal limitation or species interactions, the realized niche will be smaller than the fundamental niche.
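As a minimal sketch of the presence/background approach described above, the following Python fits a logistic regression (a simple GLM) to synthetic data. The single standardised temperature predictor, the sample sizes, and the helper names fit_logistic and predict are assumptions made for illustration; a real analysis would normally rely on an established statistics package and real occurrence records.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit a logistic regression by batch gradient descent.

    X is a list of feature lists, y a list of 0/1 labels.
    Returns [intercept, coefficient_1, ...].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            pred = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = pred - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Relative suitability for one site with feature list x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Synthetic data (assumption): the species favours warmer sites.
rng = random.Random(42)
presences = [[rng.gauss(1.5, 1.0)] for _ in range(60)]    # temperature at presence records
background = [[rng.gauss(0.0, 1.0)] for _ in range(300)]  # random background sample

X = presences + background
y = [1] * len(presences) + [0] * len(background)

weights = fit_logistic(X, y)
warm_site = predict(weights, [2.0])
cold_site = predict(weights, [-2.0])
print(weights, warm_site, cold_site)
```

Because the zeros here are background points rather than confirmed absences, the fitted values are best read as relative suitability, not as probabilities of occurrence.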

Correlative SDMs are easier and faster to implement than mechanistic SDMs, and can make ready use of available data. Because they are correlative, however, they provide little information about causal mechanisms and extrapolate poorly. They will also be inaccurate if the observed species range is not at equilibrium (e.g. if a species has been recently introduced and is actively expanding its range).

Mechanistic SDMs
Mechanistic SDMs were developed more recently. In contrast to correlative models, mechanistic SDMs use physiological information about a species (taken from controlled field or laboratory studies) to determine the range of environmental conditions within which the species can persist. These models aim to directly characterize the fundamental niche, and to project it onto the landscape. A simple model may just identify threshold values outside of which a species cannot survive. A more complex model may consist of several sub-models, e.g. micro-climate conditions given macro-climate conditions, body temperature given micro-climate conditions, fitness or other biological rates (e.g. survival, fecundity) given body temperature (thermal performance curves), resource or energy requirements, and population dynamics. Geographically referenced environmental data are used as model inputs. Because the species distribution predictions are independent of the species’ known range, these models are especially useful for species whose range is actively shifting and not at equilibrium, such as invasive species.
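The simplest mechanistic approach described above can be sketched as a threshold check combined with a thermal performance curve, applied to each cell of an environmental grid. All numbers below (critical temperatures, thermal optimum, rainfall minimum) and the curve shape (Gaussian rise, quadratic decline) are hypothetical values chosen for illustration, not measurements for any real species.

```python
import math

# Hypothetical physiological parameters, as if taken from laboratory trials
CT_MIN = 5.0         # lower critical temperature (deg C)
CT_MAX = 35.0        # upper critical temperature (deg C)
T_OPT = 25.0         # temperature of peak performance (deg C)
SIGMA = 5.0          # controls the width of the rising limb of the curve
MIN_RAIN_MM = 250.0  # minimum annual rainfall tolerated (mm)


def can_persist(temp_c, rain_mm):
    """Threshold model: True if conditions lie inside all physiological limits."""
    return CT_MIN <= temp_c <= CT_MAX and rain_mm >= MIN_RAIN_MM


def thermal_performance(temp_c):
    """Relative performance in [0, 1] as a function of body temperature.

    Gaussian rise up to T_OPT, quadratic decline from T_OPT to CT_MAX,
    and zero at or beyond the critical limits.
    """
    if temp_c <= CT_MIN or temp_c >= CT_MAX:
        return 0.0
    if temp_c <= T_OPT:
        return math.exp(-((temp_c - T_OPT) / (2.0 * SIGMA)) ** 2)
    return 1.0 - ((temp_c - T_OPT) / (CT_MAX - T_OPT)) ** 2


# Geographically referenced inputs: cell -> (mean temperature, annual rainfall)
grid = {
    "cell_A": (12.0, 800.0),
    "cell_B": (25.0, 600.0),
    "cell_C": (38.0, 900.0),  # hotter than CT_MAX
    "cell_D": (22.0, 100.0),  # drier than MIN_RAIN_MM
}

# Project the physiological model onto the landscape
suitability = {
    cell: thermal_performance(t) if can_persist(t, r) else 0.0
    for cell, (t, r) in grid.items()
}
print(suitability)
```

Note that no occurrence records enter the calculation: the prediction depends only on the physiological parameters and the environmental layers.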

Mechanistic SDMs incorporate causal mechanisms and are better for extrapolation and non-equilibrium situations. However, they are more labor-intensive to create than correlative models and require the collection and validation of substantial physiological data, which may not be readily available. The models require many assumptions and parameter estimates, and they can become very complicated.

Dispersal, biotic interactions, and evolutionary processes present challenges, as they are not usually incorporated into either correlative or mechanistic models.

Correlative and mechanistic models can be used in combination to gain additional insights. For example, a mechanistic model could be used to identify areas that are clearly outside the species’ fundamental niche, and these areas can be marked as absences or excluded from analysis.
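One way to picture this combination is a labelling step in which a physiological rule marks clearly unsuitable cells as absences before a correlative model is fitted. The sketch below is illustrative only: the temperature limits, the cell records, and the function names are hypothetical.

```python
# Hypothetical critical temperatures from a mechanistic model (deg C)
CT_MIN, CT_MAX = 5.0, 35.0


def outside_fundamental_niche(temp_c):
    """True if the cell is physiologically unsuitable under the assumed limits."""
    return temp_c < CT_MIN or temp_c > CT_MAX


# Made-up grid cells with a temperature layer and survey results
cells = [
    {"id": 1, "temp_c": 2.0, "presence": False},
    {"id": 2, "temp_c": 18.0, "presence": True},
    {"id": 3, "temp_c": 24.0, "presence": False},
    {"id": 4, "temp_c": 39.0, "presence": False},
    {"id": 5, "temp_c": 30.0, "presence": True},
]


def label_cells(cells):
    """Label each cell for use in a later correlative model fit."""
    labelled = []
    for cell in cells:
        if cell["presence"]:
            label = "presence"
        elif outside_fundamental_niche(cell["temp_c"]):
            label = "absence"   # clearly outside the fundamental niche
        else:
            label = "unknown"   # candidate background / pseudo-absence cell
        labelled.append((cell["id"], label))
    return labelled


labels = label_cells(cells)
print(labels)
```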

Niche modelling algorithms (correlative)
A variety of mathematical methods can be used for fitting, selecting, and evaluating correlative SDMs. Algorithms include "profile" methods such as BIOCLIM and DOMAIN, which are simple statistical techniques that use, for example, environmental distance to known sites of occurrence; "regression" methods (e.g. forms of generalized linear models); and "machine learning" methods such as maximum entropy (MAXENT). An incomplete list of algorithms that have been used for niche modelling includes:

Profile techniques

 * BIOCLIM
 * DOMAIN
 * Ecological niche factor analysis (ENFA)
 * Mahalanobis distance
 * Isodar analysis
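To illustrate what a profile technique does, the sketch below builds a rectilinear climate envelope from presence records only, loosely in the spirit of envelope methods such as BIOCLIM. It is not the published BIOCLIM algorithm: the 5th to 95th percentile bounds, the two predictors, and the presence values are assumptions chosen for illustration.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def fit_envelope(presences, lower_q=5, upper_q=95):
    """Return one (lower, upper) bound per environmental variable."""
    columns = list(zip(*presences))
    return [(percentile(col, lower_q), percentile(col, upper_q)) for col in columns]


def in_envelope(envelope, site):
    """True if every variable at the site lies within its bounds."""
    return all(lo <= value <= hi for (lo, hi), value in zip(envelope, site))


# Made-up presence records: (mean temperature in deg C, annual rainfall in mm)
presences = [(10 + i, 500 + 100 * i) for i in range(11)]

envelope = fit_envelope(presences)
print(envelope)                             # bounds for temperature and rainfall
print(in_envelope(envelope, (15, 1000)))    # inside both ranges
print(in_envelope(envelope, (25, 1000)))    # too warm
print(in_envelope(envelope, (15, 400)))     # too dry
```

No absence or background data are needed, which is what distinguishes profile methods from the regression and machine learning techniques listed here.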

Regression-based techniques

 * Generalized linear model (GLM)
 * Generalized additive model (GAM)
 * Multivariate adaptive regression splines (MARS)
 * Maxlike
 * Favourability Function (FF)

Machine learning techniques

 * MAXENT
 * Artificial neural networks (ANN)
 * Genetic Algorithm for Rule Set Production (GARP)
 * Boosted regression trees (BRT)/gradient boosting machines (GBM)
 * Random forest (RF)
 * Support vector machines (SVM)
 * XGBoost (XGB)

Furthermore, ensemble models can be created from several model outputs to create a model that captures components of each. Often the mean or median value across several models is used as an ensemble. Similarly, consensus models are models that fall closest to some measure of central tendency of all models; consensus models can be individual model runs or ensembles of several models.
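The ensemble and consensus ideas can be sketched in a few lines of Python. The model names and per-cell suitability values below are invented for illustration, and squared distance from the ensemble mean is only one possible way to define "closest to the central tendency".

```python
from statistics import mean, median

# Hypothetical suitability predictions from three models for the same four grid cells
predictions = {
    "glm":    [0.10, 0.40, 0.80, 0.90],
    "maxent": [0.20, 0.50, 0.70, 0.95],
    "rf":     [0.00, 0.90, 0.90, 1.00],
}


def ensemble(predictions, reducer=mean):
    """Combine the models cell by cell with the given reducer (mean or median)."""
    return [reducer(cell_values) for cell_values in zip(*predictions.values())]


def consensus_model(predictions):
    """Name of the single model closest to the ensemble mean.

    Closeness is measured as the sum of squared differences across cells
    (an assumption; other distance measures could be used).
    """
    centre = ensemble(predictions, mean)

    def distance(name):
        return sum((p - c) ** 2 for p, c in zip(predictions[name], centre))

    return min(predictions, key=distance)


mean_ensemble = ensemble(predictions, mean)
median_ensemble = ensemble(predictions, median)
closest = consensus_model(predictions)
print(mean_ensemble, median_ensemble, closest)
```

The median is less sensitive than the mean to a single outlying model, which is visible in the second cell, where one model predicts much higher suitability than the other two.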

Niche modelling software (correlative)
SPACES is an online environmental niche modelling platform that allows users to design and run dozens of the most prominent algorithms in a high-performance, multi-platform, browser-based environment.

MaxEnt is the most widely used method/software; it uses presence-only data and performs well when few presence records are available.

ModEco implements various algorithms.

DIVA-GIS has an easy-to-use implementation of BIOCLIM that is well suited to educational use.

The Biodiversity and Climate Change Virtual Laboratory (BCCVL) is a "one stop modelling shop" that simplifies the process of biodiversity and climate impact modelling. It connects the research community to Australia's national computational infrastructure by integrating a suite of tools in a coherent online environment. Users can access global climate and environmental datasets or upload their own data, perform data analysis across six different experiment types with a suite of 17 different algorithms, and easily visualise, interpret and evaluate the results of the models. Experiment types include: Species Distribution Model, Multispecies Distribution Model, Species Trait Model (currently under development), Climate Change Projection, Biodiverse Analysis and Ensemble Analysis.

Another example is Ecocrop, which is used to determine the suitability of a crop to a specific environment. This database system can also project crop yields and evaluate the impact of environmental factors such as climate change on plant growth and suitability.

Most niche modelling algorithms are available in the R packages 'dismo', 'biomod2' and 'mopa'.

Software developers may want to build on the openModeller project.

The Collaboratory for Adaptation to Climate Change (adapt.nd.edu) has implemented an online version of openModeller that allows users to design and run openModeller experiments in a high-performance, browser-based environment, so that multiple experiments can run in parallel without the limitations of local processor power.