Species diversity

Species diversity is the number of different species that are represented in a given community (a dataset). The effective number of species refers to the number of equally abundant species needed to obtain the same mean proportional species abundance as that observed in the dataset of interest (where all species may not be equally abundant). Meanings of species diversity may include species richness, taxonomic or phylogenetic diversity, and/or species evenness. Species richness is a simple count of species. Taxonomic or phylogenetic diversity is the genetic relationship between different groups of species. Species evenness quantifies how equal the abundances of the species are.

Calculation of diversity
Species diversity in a dataset can be calculated by first taking the weighted average of species proportional abundances in the dataset, and then taking the inverse of this. The equation is:


 * $${}^q\!D={1 \over \sqrt[q-1]{\sum_{i=1}^S p_i p_i^{q-1}}}$$

The denominator equals mean proportional species abundance in the dataset as calculated with the weighted generalized mean with exponent q - 1. In the equation, S is the total number of species (species richness) in the dataset, and the proportional abundance of the ith species is $$p_{i}$$. The proportional abundances themselves are used as weights. The equation is often written in the equivalent form:


 * $${}^q\!D=\left ( {\sum_{i=1}^S p_i^q} \right )^{1/(1-q)}$$

The value of q determines which mean is used. q = 0 corresponds to the weighted harmonic mean, which is 1/S because the $$p_{i}$$ values cancel out, with the result that $${}^0\!D$$ is equal to the number of species or species richness, S. The equation is undefined at q = 1, but the limit as q approaches 1 is well defined:


 * $$\lim_{q \rightarrow 1} {}^q\!D = \exp\left(-\sum_{i=1}^S p_i \ln p_i\right),$$

which is the exponential of the Shannon entropy.
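The two cases above (the general formula and the limit at q = 1) can be combined in one function. The following is a minimal sketch rather than a reference implementation; the function name `hill_number` and the example proportions are assumptions made for illustration.

```python
import math

def hill_number(proportions, q):
    """Effective number of species (diversity of order q)."""
    # Species with zero abundance are absent and contribute nothing.
    p = [x for x in proportions if x > 0]
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("proportional abundances must sum to 1")
    if math.isclose(q, 1.0):
        # Limit as q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    # General case: (sum of p_i^q) raised to the power 1 / (1 - q).
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

# Hypothetical dataset with three species.
p = [0.5, 0.3, 0.2]
for q in (0, 1, 2):
    print(f"q = {q}: diversity = {hill_number(p, q):.4f}")
```

For q = 0 the function returns the species richness (here 3), and the values at q = 1 and q = 2 are smaller because the three species are not equally abundant.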

q = 2 corresponds to the arithmetic mean. As q approaches infinity, the generalized mean approaches the maximum $$p_{i}$$ value. In practice, q modifies species weighting: increasing q increases the weight given to the most abundant species, so fewer equally abundant species are needed to reach the mean proportional abundance. Consequently, large values of q lead to smaller species diversity than small values of q for the same dataset. If all species are equally abundant in the dataset, changing the value of q has no effect, and species diversity at any value of q equals species richness.
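The effect of q can be checked numerically. The sketch below computes diversity at several orders for an uneven and an even dataset; both datasets, the chosen orders and the function name are illustrative assumptions, and the order q = 1 is left out so that only the general formula is needed.

```python
def hill_number(p, q):
    # General formula for the diversity of order q; valid for q != 1.
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

uneven = [0.7, 0.2, 0.1]   # one dominant species
even = [0.25] * 4          # four equally abundant species
orders = [0, 0.5, 2, 5, 50]

uneven_profile = [hill_number(uneven, q) for q in orders]
even_profile = [hill_number(even, q) for q in orders]

for q, d_uneven, d_even in zip(orders, uneven_profile, even_profile):
    print(f"q = {q:>4}: uneven = {d_uneven:.3f}, even = {d_even:.3f}")
```

In the uneven dataset the values fall as q grows and move toward the inverse of the largest proportional abundance (1/0.7), while in the even dataset every order gives the species richness, 4.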

Negative values of q are not used, because then the effective number of species (diversity) would exceed the actual number of species (richness). As q approaches negative infinity, the generalized mean approaches the minimum $$p_{i}$$ value. In many real datasets, the least abundant species is represented by a single individual, and then the effective number of species would equal the number of individuals in the dataset.
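A small numerical check illustrates why negative orders are avoided. This is a sketch on a hypothetical dataset of 100 individuals in which the rarest species is a single individual; the counts and names are assumptions made for illustration.

```python
def hill_number(counts, q):
    # Diversity of order q from raw counts; general formula, valid for q != 1.
    n = sum(counts)
    return sum((c / n) ** q for c in counts if c > 0) ** (1.0 / (1.0 - q))

counts = [50, 30, 19, 1]        # hypothetical: rarest species is a singleton
richness = len(counts)          # 4 species
n_individuals = sum(counts)     # 100 individuals

negative_orders = [-1, -10, -100]
values = [hill_number(counts, q) for q in negative_orders]

for q, d in zip(negative_orders, values):
    print(f"q = {q:>4}: diversity = {d:.1f} (richness = {richness})")
```

Every value exceeds the richness of 4, and as q becomes more negative the result moves toward 100, the number of individuals in the dataset.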

The same equation can be used to calculate the diversity in relation to any classification, not only species. If the individuals are classified into genera or functional types, $$p_{i}$$ represents the proportional abundance of the ith genus or functional type, and $${}^q\!D$$ equals genus diversity or functional type diversity, respectively.

Diversity indices
Often researchers have used the values given by one or more diversity indices to quantify species diversity. Such indices include species richness, the Shannon index, the Simpson index, and the complement of the Simpson index (also known as the Gini-Simpson index).

When interpreted in ecological terms, each of these indices measures a different quantity, and their values are therefore not directly comparable. Species richness quantifies the actual rather than effective number of species. The Shannon index equals $$\log({}^1\!D)$$, that is, the case of q approaching 1, and in practice quantifies the uncertainty in the species identity of an individual that is taken at random from the dataset. The Simpson index equals $$1/{}^2\!D$$, that is, q = 2, and quantifies the probability that two individuals taken at random from the dataset (with replacement of the first individual before taking the second) represent the same species. The Gini-Simpson index equals $$1 - 1/{}^2\!D$$ and quantifies the probability that the two randomly taken individuals represent different species.
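The relationships between the indices and the effective number of species can be sketched in code. The function name, the dictionary keys and the example counts below are assumptions made for illustration, and the natural logarithm is assumed for the Shannon index.

```python
import math

def diversity_indices(counts):
    """Compute common diversity indices from per-species individual counts."""
    n = sum(counts)
    p = [c / n for c in counts if c > 0]
    shannon = -sum(x * math.log(x) for x in p)   # natural-log Shannon index
    simpson = sum(x * x for x in p)              # P(two random draws match)
    return {
        "richness": len(p),
        "shannon": shannon,
        "simpson": simpson,
        "gini_simpson": 1.0 - simpson,
        # Conversions to effective numbers of species:
        "D1": math.exp(shannon),   # diversity of order 1
        "D2": 1.0 / simpson,       # diversity of order 2
    }

# Hypothetical dataset: 100 individuals in three species.
for name, value in diversity_indices([50, 30, 20]).items():
    print(f"{name}: {value:.4f}")
```

Unlike the raw index values, the converted values `D1` and `D2` are on the same scale as richness, so they can be compared with each other directly.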

Sampling considerations
Depending on the purposes of quantifying species diversity, the dataset used for the calculations can be obtained in different ways. Although species diversity can be calculated for any dataset where individuals have been identified to species, meaningful ecological interpretations require that the dataset is appropriate for the questions at hand. In practice, the interest is usually in the species diversity of areas so large that not all individuals in them can be observed and identified to species, so a sample of the relevant individuals has to be obtained. Extrapolation from the sample to the underlying population of interest is not straightforward, because the species diversity of the available sample generally underestimates the species diversity of the entire population. Applying different sampling methods will lead to different sets of individuals being observed for the same area of interest, and the species diversity of each set may be different. When a new individual is added to a dataset, it may introduce a species that was not yet represented. How much this increases species diversity depends on the value of q: when q = 0, each new actual species causes species diversity to increase by one effective species, but when q is large, adding a rare species to a dataset has little effect on its species diversity.
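The dependence on q when a new species is added can be illustrated with a short sketch. The counts are hypothetical: a dataset of 100 individuals in three species, to which one individual of a fourth species is added.

```python
def hill_number(counts, q):
    # Diversity of order q from raw counts; general formula, valid for q != 1.
    n = sum(counts)
    return sum((c / n) ** q for c in counts if c > 0) ** (1.0 / (1.0 - q))

before = [50, 30, 20]      # hypothetical dataset
after = before + [1]       # one individual of a previously unseen species

gains = {}
for q in (0, 2, 5):
    gains[q] = hill_number(after, q) - hill_number(before, q)
    print(f"q = {q}: diversity gain = {gains[q]:.3f}")
```

At q = 0 the gain is exactly one effective species, whereas at q = 2 and q = 5 the single rare individual changes the result only slightly.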

In general, sets with many individuals can be expected to have higher species diversity than sets with fewer individuals. When species diversity values are compared among sets, sampling efforts need to be standardised in an appropriate way for the comparisons to yield ecologically meaningful results. Resampling methods can be used to bring samples of different sizes to a common footing. Species discovery curves and the number of species only represented by one or a few individuals can be used to help in estimating how representative the available sample is of the population from which it was drawn.
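One common way to put samples of different sizes on a common footing is rarefaction by random subsampling, sketched below. The text above does not name a specific resampling method, so this choice, the function name `rarefied_richness`, its parameters and the example counts are all assumptions made for illustration.

```python
import random

def rarefied_richness(counts, sample_size, n_draws=1000, seed=0):
    """Mean species richness of random subsamples of a fixed size."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    # Expand counts into one species label per individual.
    individuals = [sp for sp, c in enumerate(counts) for _ in range(c)]
    if sample_size > len(individuals):
        raise ValueError("sample_size exceeds the number of individuals")
    total = 0
    for _ in range(n_draws):
        # Draw individuals without replacement and count distinct species.
        subsample = rng.sample(individuals, sample_size)
        total += len(set(subsample))
    return total / n_draws

# Hypothetical samples of different sizes, compared at 50 individuals each.
large_sample = [120, 40, 25, 10, 3, 1, 1]   # 200 individuals
small_sample = [30, 12, 6, 2]               # 50 individuals
print(rarefied_richness(large_sample, 50))
print(rarefied_richness(small_sample, 50))
```

Comparing both samples at the size of the smaller one avoids attributing a difference in richness to sampling effort alone.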

Trends
The observed species diversity is affected not only by the number of individuals but also by the heterogeneity of the sample. If individuals are drawn from different environmental conditions (or different habitats), the species diversity of the resulting set can be expected to be higher than if all individuals are drawn from a similar environment. Increasing the area sampled increases observed species diversity both because more individuals get included in the sample and because large areas are environmentally more heterogeneous than small areas.