Molecular ecology

Molecular ecology is a field of evolutionary biology that is concerned with applying molecular population genetics, molecular phylogenetics, and more recently genomics to traditional ecological questions (e.g., species diagnosis, conservation and assessment of biodiversity, species-area relationships, and many questions in behavioral ecology). It is virtually synonymous with the field of "Ecological Genetics" as pioneered by Theodosius Dobzhansky, E. B. Ford, Godfrey M. Hewitt, and others. These fields are united in their attempt to study genetic-based questions "out in the field" as opposed to the laboratory. Molecular ecology is related to the field of conservation genetics.

Methods frequently include using microsatellites to determine gene flow and hybridization between populations. The development of molecular ecology is also closely related to the use of DNA microarrays, which allow for the simultaneous analysis of the expression of thousands of different genes. Quantitative PCR may also be used to analyze gene expression as a result of changes in environmental conditions or different responses by differently adapted individuals.

Molecular ecology uses molecular genetic data to answer ecological questions related to biogeography, genomics, conservation genetics, and behavioral ecology. Studies mostly use data based on deoxyribonucleic acid (DNA) sequences. This approach has been enhanced over a number of years to allow researchers to sequence thousands of genes from a small amount of starting DNA. Allele sizes are another way for researchers to compare individuals and populations, which allows them to quantify the genetic diversity within a population and the genetic similarities among populations.

Bacterial diversity
Molecular ecological techniques are used to study in situ questions of bacterial diversity. This approach stems in part from the fact that many microorganisms are not easily obtainable as cultured strains in the laboratory, which would otherwise allow for identification and characterization. It also stems from the development of the PCR technique, which allows for the rapid amplification of genetic material.

The amplification of DNA from environmental samples using general or group-specific primers leads to a mix of genetic material, requiring sorting before sequencing and identification. The classic technique to achieve this is cloning, which involves incorporating the amplified DNA fragments into bacterial plasmids. Techniques such as temperature gradient gel electrophoresis allow for a faster result. More recently, the advent of relatively low-cost, next-generation DNA sequencing technologies, such as the 454 and Illumina platforms, has allowed exploration of bacterial ecology along continental-scale environmental gradients such as pH, which was not feasible with traditional technology.

Fungal diversity
Exploration of fungal diversity in situ has also benefited from next-generation DNA sequencing technologies. The use of high-throughput sequencing techniques has been widely adopted by the fungal ecology community since the first publication of their use in the field in 2009. Similar to the exploration of bacterial diversity, these techniques have allowed high-resolution studies of fundamental questions in fungal ecology such as phylogeography, fungal diversity in forest soils, stratification of fungal communities in soil horizons, and fungal succession on decomposing plant litter.

The majority of fungal ecology research leveraging next-generation sequencing approaches involves sequencing of PCR amplicons of conserved regions of DNA (i.e. marker genes) to identify and describe the distribution of taxonomic groups in the fungal community in question, though more recent research has focused on sequencing functional gene amplicons (e.g. Baldrian et al. 2012). The locus of choice for a description of the taxonomic structure of fungal communities has traditionally been the internal transcribed spacer (ITS) region of ribosomal RNA genes due to its utility in identifying fungi to genus or species taxonomic levels, and its high representation in public sequence databases. A second widely used locus (e.g. Amend et al. 2010, Weber et al. 2013), the D1-D3 region of 28S ribosomal RNA genes, may not allow the low taxonomic level classification of the ITS, but demonstrates superior performance in sequence alignment and phylogenetics. Also, the D1-D3 region may be a better candidate for sequencing with Illumina sequencing technologies. Porras-Alfaro et al. showed that the accuracy of classification of either ITS or D1-D3 region sequences was largely based on the sequence composition and quality of the databases used for comparison, and poor-quality sequences and sequence misidentification in public databases are a major concern. The construction of sequence databases that have broad representation across fungi and that are curated by taxonomic experts is a critical next step.

Next-generation sequencing technologies generate large amounts of data, and analysis of fungal marker-gene data is an active area of research. Two primary areas of concern are methods for clustering sequences into operational taxonomic units by sequence similarity and quality control of sequence data. Currently, there is no consensus on preferred methods for clustering, and clustering and sequence processing methods can significantly affect results, especially for the variable-length ITS region. In addition, fungal species vary in intra-specific sequence similarity of the ITS region. Recent research has been devoted to the development of flexible clustering protocols that allow sequence similarity thresholds to vary by taxonomic groups, which are supported by well-annotated sequences in public sequence databases.
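The idea of clustering sequences into operational taxonomic units by sequence similarity can be illustrated with a minimal sketch. It is a toy greedy procedure, not the method of any particular tool: the function names, the example sequences, and the 0.85 threshold are all hypothetical, and the identity measure assumes pre-aligned, equal-length sequences, which does not hold for the variable-length ITS region.

```python
def identity(a, b):
    """Fraction of matching positions (assumes aligned, equal-length sequences)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)

def greedy_otus(sequences, threshold):
    """Assign each sequence to the first centroid it matches, else start a new OTU."""
    centroids = []
    clusters = []
    for seq in sequences:
        for index, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[index].append(seq)
                break
        else:
            # No existing centroid is similar enough: seq founds a new OTU.
            centroids.append(seq)
            clusters.append([seq])
    return clusters

# Hypothetical 10-base reads; the threshold is illustrative only.
reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTACGTAC"]
otus = greedy_otus(reads, threshold=0.85)
```

Because the outcome depends on both the threshold and the order in which sequences are processed, even this toy version shows why the choice of clustering method can affect results.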

Extra-pair fertilizations
In recent years, molecular data and analyses have been able to supplement traditional approaches of behavioral ecology, the study of animal behavior in relation to its ecology and evolutionary history. One behavior that molecular data has helped scientists better understand is extra-pair fertilizations (EPFs), also known as extra-pair copulations (EPCs). These are mating events that occur outside of a social bond, such as a monogamous pair bond, and are hard to observe. Molecular data has been key to understanding the prevalence of EPFs and the individuals participating in them.

While most bird species are socially monogamous, molecular data has revealed that less than 25% of these species are genetically monogamous. EPFs complicate matters, especially for male individuals, because it does not make sense for an individual to care for offspring that are not their own. Studies have found that males will adjust their parental care in response to changes in their paternity. Other studies have shown that in socially monogamous species, some individuals will employ an alternative strategy to be reproductively successful since a social bond does not always equal reproductive success.

It appears that EPFs in some species are explained by the good genes hypothesis. In red-backed shrikes (Lanius collurio), extra-pair males had significantly longer tarsi than within-pair males, and all of the extra-pair offspring were males, supporting the prediction that females will bias their clutch towards males when they mate with an "attractive" male. In house wrens (Troglodytes aedon), extra-pair offspring were also found to be male-biased compared to within-pair offspring.

Without molecular ecology, identifying individuals that participate in EPFs and the offspring that result from EPFs would be impossible.

Isolation by distance
Isolation by distance (IBD), like reproductive isolation, results from physical barriers between populations that limit migration and lower gene flow. The shorter the distance between populations, the more likely individuals are to disperse and mate, thus increasing gene flow. The use of molecular data, specifically allele frequencies of individuals among populations in relation to their geographic distance, helps to explain concepts such as sex-biased dispersal, speciation, and landscape genetics.

The Mantel test is an assessment that compares genetic distance with geographic distance and is most appropriate because it does not assume that the comparisons are independent of each other. Three main factors influence the chances of finding a correlation indicating IBD: sample size, metabolism, and taxon. For example, according to one meta-analysis, ectotherms are more likely than endotherms to display greater IBD.
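A permutation-based Mantel test can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not a replacement for an established statistics package: the function names, the five population positions, and the genetic distance values are all hypothetical, and the p-value is one-sided (testing for a positive correlation).

```python
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def pairwise(matrix, order):
    """Upper-triangle entries of a distance matrix after relabeling populations."""
    pairs = combinations(range(len(order)), 2)
    return [matrix[order[i]][order[j]] for i, j in pairs]

def mantel(genetic, geographic, permutations=999, seed=0):
    """Return (r, one-sided p) for the correlation of two distance matrices."""
    rng = random.Random(seed)
    labels = list(range(len(genetic)))
    geo = pairwise(geographic, labels)
    observed = pearson(pairwise(genetic, labels), geo)
    as_extreme = 0
    for _ in range(permutations):
        # Shuffle population labels of one matrix; rows and columns move together.
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if pearson(pairwise(genetic, shuffled), geo) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (permutations + 1)

# Hypothetical data: five populations along a transect (positions in km).
positions_km = [0, 10, 25, 45, 80]
geographic = [[abs(a - b) for b in positions_km] for a in positions_km]
genetic = [
    [0.00, 0.02, 0.05, 0.08, 0.15],
    [0.02, 0.00, 0.03, 0.07, 0.12],
    [0.05, 0.03, 0.00, 0.05, 0.10],
    [0.08, 0.07, 0.05, 0.00, 0.06],
    [0.15, 0.12, 0.10, 0.06, 0.00],
]
r, p = mantel(genetic, geographic)
```

Significance is judged by permuting population labels rather than by a standard correlation test because the pairwise distances share populations and are therefore not independent observations.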

Metapopulation theory
Metapopulation theory dictates that a metapopulation consists of spatially distinct populations that interact with one another on some level and move through a cycle of extinctions and recolonizations (i.e. through dispersal). The most common metapopulation model is the extinction-recolonization model, which explains systems in which spatially distinct populations undergo stochastic changes in population size that may lead to extinction at the population level. Once this has occurred, dispersing individuals from other populations will immigrate and "rescue" the population at that site. Other metapopulation models include the source-sink model (island-mainland model), in which one or more large central populations produce dispersers that move to smaller satellite populations, which have a population growth rate of less than one and could not persist without the influx from the main population.

Metapopulation structure and the repeated extinctions and recolonizations can significantly affect a population's genetic structure. Recolonization by a few dispersers leads to population bottlenecks which will reduce the effective population size (Ne), accelerate genetic drift, and deplete genetic variation. However, dispersal between populations in the metapopulation can reverse or halt these processes over the long term. Therefore, in order for individual sub-populations to remain healthy, they must either have a large population size or have a relatively high rate of dispersal with other subpopulations. Molecular ecology focuses on using tests to determine the rates of dispersal between populations and can use molecular clocks to determine when historic bottlenecks occurred. As habitat becomes more fragmented, dispersal between populations will become increasingly rare. Therefore, subpopulations that may have historically been preserved by a metapopulation structure may start to decline. Using mitochondrial or nuclear markers to monitor dispersal coupled with population Fst values and allelic richness can provide insight into how well a population is performing and how it will perform into the future.

Molecular clock hypothesis
The molecular clock hypothesis states that DNA sequences evolve at roughly the same rate, so the dissimilarity between two sequences can be used to estimate how long ago they diverged from one another. The first step in using a molecular clock is to calibrate it based on the approximate time at which two lineages diverged. The sources usually used to calibrate molecular clocks are fossils or known geological events. The clock is calibrated by dividing the amount of sequence divergence by the estimated time since the sequences diverged; the resulting number is the estimated rate at which molecular evolution is occurring. That rate can then be used to estimate divergence times for other pairs of sequences. The most widely cited molecular clock is a "universal" mtDNA clock of approximately two percent sequence divergence every million years. Although it is referred to as a universal clock, a truly universal clock is not possible because rates of evolution differ among DNA regions. Another drawback of molecular clocks is that they ideally need to be calibrated from a source of data independent of the molecular data. This poses a problem for taxa that do not fossilize or preserve easily, making it almost impossible to calibrate their molecular clocks. Despite these limitations, the molecular clock hypothesis is still used today. The molecular clock has been successful in dating events that occurred up to 65 million years ago.
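The arithmetic behind a strict molecular clock can be sketched as follows: a rate is obtained as sequence divergence divided by a known divergence time, and that rate is then applied to a pair of sequences of unknown age. The function names and the numbers are hypothetical, and the sketch assumes a constant rate with no correction for multiple substitutions at the same site.

```python
def calibrate_rate(divergence, calibration_time_myr):
    """Pairwise sequence divergence per million years, from a dated split."""
    return divergence / calibration_time_myr

def divergence_time_myr(divergence, rate):
    """Estimated time since two sequences diverged, in millions of years."""
    return divergence / rate

# Hypothetical calibration: two lineages differ at 10% of sites and a
# fossil or geological event dates their split to 5 million years ago.
rate = calibrate_rate(0.10, 5.0)

# Apply the calibrated rate to another pair that differs at 6% of sites.
estimated_age = divergence_time_myr(0.06, rate)
```

With these illustrative numbers the calibrated rate is about two percent per million years, matching the "universal" mtDNA figure, and the second pair is estimated to have diverged about three million years ago.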

Mate choice hypotheses
The concept of mate choice explains how organisms select their mates based on two main mechanisms: the good genes hypothesis and genetic compatibility. The good genes hypothesis, also referred to as the sexy son hypothesis, suggests that females will choose a male that produces offspring with increased fitness and genetic viability. Therefore, the mates that are more "attractive" are more likely to be chosen for mating and pass on their genes to the next generation. In species that exhibit polyandry, the females will seek out the most suitable males and re-mate until they have found the best sperm to fertilize their eggs. Genetic compatibility is where mates choose their partner based on the compatibility of their genotypes. The mate doing the selecting must know its own genotype as well as the genotypes of potential mates in order to select the appropriate partner. Genetic compatibility in most instances is limited to specific traits, such as the major histocompatibility complex (MHC) in mammals, because of complex genetic interactions. This behavior is potentially seen in humans. A study looking at women's choice of men based on body odors concluded that the scent of the odors was influenced by the MHC and that these odors influence mate choice in human populations.

Sex-biased dispersal
Sex-biased dispersal, or the tendency of one sex to disperse between populations more frequently than the other, is a common behavior studied by researchers. Three major hypotheses currently exist to help explain sex-biased dispersal. The resource-competition hypothesis proposes that the more philopatric sex (the sex more likely to remain at its natal grounds) benefits during reproduction simply by being familiar with natal ground resources. A second proposal for sex-biased dispersal is the local mate competition hypothesis, which introduces the idea that individuals encounter less mate competition with relatives the farther they disperse from their natal grounds. Finally, the inbreeding avoidance hypothesis suggests that individuals disperse to decrease inbreeding.

Studying these hypotheses can be arduous, since it is nearly impossible to keep track of every individual and its whereabouts within and between populations. To avoid this time-consuming approach, scientists have adopted several molecular ecology techniques to study sex-biased dispersal. One method is the comparison of differences between nuclear and mitochondrial markers among populations. Markers showing higher levels of differentiation indicate the more philopatric sex; that is, the more a sex remains at its natal grounds, the more distinct its markers become among populations, owing to the lack of gene flow with respect to that marker. Researchers can also quantify male-male and female-female pair relatedness within populations to understand which sex is more likely to disperse. Relatedness values that are consistently lower in one sex indicate the dispersing sex. This is because there is more gene flow in the dispersing sex, so its markers are less similar among individuals of the same sex in the same population, which produces a low relatedness value. FST values are also used to understand dispersal behavior by calculating an FST value for each sex. The sex that disperses more displays a lower FST value, which measures levels of inbreeding between the subpopulation and the total population. Additionally, assignment tests can be used to quantify the number of individuals of a certain sex dispersing to other populations. A more mathematical approach to quantifying sex-biased dispersal at the molecular level is spatial autocorrelation, which analyzes the relationship between genetic similarity and geographic distance. A correlation coefficient, or r value, is calculated, and the plot of r against distance indicates whether individuals are more or less related to one another than expected.

Quantitative trait loci
A quantitative trait locus (QTL) is a region of the genome, containing one or more genes, that contributes to variation in a quantitative trait. A quantitative trait is one that is influenced by several different genes as opposed to just one or two. Differentiation in quantitative traits among populations is analyzed using Qst, a measure analogous to Fst that describes how much of the genetic variation in a trait lies among populations rather than within them. Qst is often used to analyze clines. A cline is a change in allele frequency across a geographical distance. This change in allele frequency causes a series of intermediate, varying phenotypes that, when associated with certain environmental conditions, can indicate selection. This selection causes local adaptation, but high gene flow is still expected to be present along the cline.

For example, barn owls in Europe exhibit a cline in their plumage coloration. Their feathers range in coloration from white to reddish-brown across their geographic range from the southwest to the northeast. A study of this cline sought to determine whether the phenotypic variation was due to selection by calculating Qst values across the owl populations. Because high gene flow was still anticipated along the cline, selection was only expected to act upon the QTLs that produce locally adaptive phenotypic traits. This can be determined by comparing the Qst values to Fst (fixation index) values. If the two values are similar and Fst is based on neutral markers, then it can be assumed that the QTLs are also behaving neutrally (that is, they are not under selection or locally adapted). However, in the case of the barn owls the Qst value was much higher than the Fst value. This means that high gene flow was present, keeping the neutral markers similar, as indicated by the low Fst value. Local adaptation due to selection was present as well, in the form of varying plumage coloration, since the high Qst value indicates differences at these non-neutral loci. In other words, this cline of plumage coloration has some sort of adaptive value to the birds.
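The comparison of Qst with Fst can be made concrete with a short sketch. The Qst formula used here is the one commonly given for diploid organisms (among-population additive genetic variance divided by itself plus twice the within-population additive genetic variance); it does not appear in the text above and is an assumption of this example, as are the function names, the tolerance, and the numbers, which are not values from the barn owl study.

```python
def qst(var_between, var_within):
    """Qst for a diploid trait from additive genetic variance components."""
    return var_between / (var_between + 2 * var_within)

def interpret(qst_value, fst_value, tolerance=0.05):
    """Crude comparison of trait differentiation with neutral differentiation."""
    if qst_value > fst_value + tolerance:
        return "divergent selection (local adaptation) suggested"
    if qst_value < fst_value - tolerance:
        return "uniform selection suggested"
    return "consistent with neutral drift"

# Hypothetical values: strong trait differentiation, weak neutral differentiation.
trait_qst = qst(var_between=0.6, var_within=0.2)
verdict = interpret(trait_qst, fst_value=0.02)
```

In practice the comparison relies on confidence intervals for both statistics rather than a fixed tolerance; the fixed value here only keeps the sketch short.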

Fixation indices
Fixation indices are used when determining the level of genetic differentiation between sub-populations within a total population. FST is the symbol used to represent this index when using the formula:
 * $$ F_{ST} = \frac{ H_{T} - H_{S} } { H_{T} } $$

In this equation, HT represents the expected heterozygosity of the total population and HS is the average expected heterozygosity of the sub-populations. Both measures of heterozygosity are calculated at one locus. In the equation, the heterozygosity expected for the total population is compared to the heterozygosity expected within the sub-populations of this total population. Larger FST values imply that the level of genetic differentiation between sub-populations within a total population is greater. The level of differentiation is the result of a balance between gene flow among sub-populations (decreasing differentiation) and genetic drift within these sub-populations (increasing differentiation); however, some molecular ecologists note that it cannot be assumed that these factors are at equilibrium. FST can also be viewed as a way of comparing the amount of inbreeding within sub-populations to the amount of inbreeding for the total population and is sometimes referred to as an inbreeding coefficient. In these cases, higher FST values typically imply higher amounts of inbreeding within the sub-populations. Other factors such as selection pressures may also affect FST values.
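The formula can be worked through for a single locus with a short sketch. It assumes a biallelic locus, sub-populations of equal size, and expected heterozygosity computed as 2pq under Hardy-Weinberg proportions; the function names and allele frequencies are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2pq at a biallelic locus with allele frequency p."""
    return 2 * p * (1 - p)

def fst(allele_freqs):
    """FST = (HT - HS) / HT from per-sub-population frequencies of one allele."""
    n = len(allele_freqs)
    # HS: average expected heterozygosity within sub-populations.
    h_s = sum(expected_heterozygosity(p) for p in allele_freqs) / n
    # HT: expected heterozygosity of the pooled population (equal sizes assumed).
    h_t = expected_heterozygosity(sum(allele_freqs) / n)
    if h_t == 0:
        return 0.0  # locus is fixed everywhere, so there is no differentiation to measure
    return (h_t - h_s) / h_t

no_differentiation = fst([0.5, 0.5])   # identical frequencies
moderate = fst([0.2, 0.8])             # HS = 0.32, HT = 0.5
complete = fst([1.0, 0.0])             # alternative alleles fixed
```

The three calls span the range of the index: identical allele frequencies give 0, frequencies of 0.2 and 0.8 give 0.36, and fixation of alternative alleles gives 1.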

FST values are accompanied by several analogous statistics (FIS, GST, etc.). These additional measures are interpreted in a similar manner to FST values; however, they are adjusted to account for factors that FST may not, such as multiple loci.

Inbreeding depression
Inbreeding depression is the reduced fitness and survival of offspring from closely related parents. Inbreeding is commonly seen in small populations because of the greater chance of mating with a relative due to limited mate choice. Small populations also experience stronger genetic drift, which, together with inbreeding, leads to higher homozygosity at all loci in the population and decreased heterozygosity. The rate of inbreeding is measured by this decrease in heterozygosity. In other words, the rate at which heterozygosity is lost from a population due to genetic drift is equal to the rate at which inbreeding accumulates in the population. In the absence of migration, inbreeding will accumulate at a rate that is inversely proportional to the size of the population.
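The inverse relationship between population size and the accumulation of inbreeding can be illustrated with the standard result for an idealized, randomly mating population of constant effective size Ne, in which heterozygosity declines by a fraction 1/(2Ne) each generation. That model is an assumption of this sketch and is not stated in the text above; the function names and numbers are hypothetical.

```python
def inbreeding_coefficient(generations, effective_size):
    """Accumulated inbreeding F_t = 1 - (1 - 1/(2Ne))^t in an idealized population."""
    per_generation_loss = 1 / (2 * effective_size)
    return 1 - (1 - per_generation_loss) ** generations

def remaining_heterozygosity(initial, generations, effective_size):
    """Heterozygosity left after drift: H_t = H_0 * (1 - F_t)."""
    return initial * (1 - inbreeding_coefficient(generations, effective_size))

# Compare a small and a large isolated population after 20 generations.
small = inbreeding_coefficient(20, effective_size=10)
large = inbreeding_coefficient(20, effective_size=1000)
```

Under these assumptions the small population accumulates inbreeding far faster than the large one, which is the pattern described above for populations without migration.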

There are two ways in which inbreeding depression can occur. The first is through dominance, where beneficial alleles are usually dominant and harmful alleles are usually recessive. The increased homozygosity resulting from inbreeding means that harmful alleles are more likely to be expressed in homozygotes, where their deleterious effects cannot be masked by the beneficial dominant allele. The second mechanism through which inbreeding depression occurs is overdominance, or heterozygote advantage, in which individuals that are heterozygous at a particular locus have a higher fitness than homozygotes at that locus. Inbreeding leads to decreased heterozygosity, and therefore decreased fitness.

Deleterious alleles can be removed from inbred populations by natural selection through genetic purging. As homozygosity increases, less fit individuals will be selected against, and thus those harmful alleles will be lost from the population.

Outbreeding depression
Outbreeding depression is the reduced biological fitness of the offspring of distantly related parents. The decline in fitness due to outbreeding is attributed to a breakup of coadapted gene complexes or favorable epistatic relationships. Unlike inbreeding depression, outbreeding depression emphasizes interactions between loci rather than within them. Inbreeding and outbreeding depression can occur at the same time. The risk of outbreeding depression increases with increased distance between populations. The risk of outbreeding depression during genetic rescue often limits the ability to increase the genetic diversity of a small or fragmented gene pool. Offspring that are intermediate between two or more adapted parental traits can have an adaptation that is less effective than either of the parental adaptations. Several mechanisms influence outbreeding depression: genetic drift, population bottlenecks, differentiation of adaptations, and chromosomal differences that result in sterile offspring. If outbreeding is limited and the population is large enough, selective pressure acting on each generation may be able to restore fitness. However, the population is likely to experience a multi-generational decline in overall fitness, as selection for traits takes multiple generations. Selection acts on outbred generations, using the increased diversity to adapt to the environment. This may result in greater fitness among offspring than in the original parental type.

Conservation units
Conservation units are classifications often used in conservation biology, conservation genetics, and molecular ecology in order to separate and group different species or populations based on genetic variance and significance for protection. Two of the most common types of conservation units are:
 * Management units (MU): Management units are populations that have very low levels of gene flow and can therefore be genetically differentiated from other populations.
 * Evolutionarily significant units (ESU): Evolutionarily significant units are populations that show enough genetic differentiation to warrant their management as distinct units.

Conservation units are often identified using both neutral and non-neutral genetic markers, with each having its own advantages. Using neutral markers during unit identification can provide unbiased estimates of genetic drift and time since reproductive isolation within and among species and populations, while using non-neutral markers can provide more accurate estimates of adaptive evolutionary divergence, which can help determine the potential for a conservation unit to adapt within a certain habitat.

Conservation units allow populations and species that have high or differing levels of genetic variation to be distinguished so that each can be managed individually, and management can ultimately differ based on a number of factors. In one instance, Atlantic salmon located within the Bay of Fundy were given evolutionary significance based on the differences in genetic sequences found among different populations. This detection of evolutionary significance can allow each population of salmon to receive customized conservation and protection based on its adaptive uniqueness in response to geographic location.

Phylogenies and community ecology
A phylogeny is the evolutionary history of a group of organisms; the study of how that history relates to geographic distribution is known as phylogeography. A phylogenetic tree is a tree that shows evolutionary relationships between different species based on similarities and differences among genetic or physical traits. Community ecology can draw on knowledge of evolutionary relationships among coexisting species. Combined with geographic information, phylogenies embrace aspects of both time (evolutionary relationships) and space (geographic distribution). Typically, phylogenetic trees include tips, which represent groups of descendant species, and nodes, which represent the common ancestors of those descendants. If two descendants split from the same node, they are called sister groups. Trees may also include an outgroup, a species outside of the group of interest. The trees depict clades, which are groups of organisms that include an ancestor and all descendants of that ancestor. The maximum parsimony tree is the simplest tree, that is, the one that requires the minimum number of evolutionary changes (steps).
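Counting the steps a tree requires can be sketched with Fitch's small-parsimony algorithm for a single character on a rooted binary tree. This is an illustrative choice of algorithm, not one named in the text above; the nested-tuple tree encoding, the function name, and the four taxa with their character states are hypothetical.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for a rooted binary tree.

    A tree is either a tip name (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change is needed here.
        return shared, left_changes + right_changes
    # Children disagree: one change must have occurred on a branch below this node.
    return left_set | right_set, left_changes + right_changes + 1

# Hypothetical nucleotide at one site in four taxa.
site = {"A": "G", "B": "G", "C": "T", "D": "T"}

tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

steps_1 = fitch(tree_1, site)[1]
steps_2 = fitch(tree_2, site)[1]
```

For this site the first tree requires one change and the second requires two, so the first is the more parsimonious of the two; a full analysis would sum such counts over many characters and search across many candidate trees.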

Phylogenies reflect important historical processes that shape current distributions of genes and species. When two species become isolated from each other, they retain some of the same ancestral alleles, a situation known as allele sharing. Alleles can be shared because of incomplete lineage sorting and hybridization. Lineage sorting is driven by genetic drift and must occur before alleles become species-specific. Over time, some of these alleles will simply be lost, while others may proliferate. Hybridization leads to introgression of alleles from one species to another.

Community ecology emerged from natural history and population biology. Not only does it include the study of the interactions between species, but it also focuses on ecological concepts such as mutualism, predation, and competition within communities. It is used to explain properties such as diversity, dominance, and composition of a community. There are three primary approaches to integrating phylogenetic information into studies of community organization. The first approach focuses on examining the phylogenetic structure of community assemblages. The second approach focuses on exploring the phylogenetic basis of community niche structures. The third approach focuses on adding a community context to studies of trait evolution and biogeography.

Species concepts
Species concepts are the subject of debate in the field of molecular ecology. Since the beginning of taxonomy, scientists have wanted to standardize and perfect the way species are defined. There are many species concepts that dictate how ecologists determine a good species. The most commonly used concept is the biological species concept, which defines a species as groups of actually or potentially interbreeding natural populations that are reproductively isolated from other such groups (Mayr, 1942). This concept is not always useful, particularly when it comes to hybrids. Other species concepts include the phylogenetic species concept, which describes a species as the smallest identifiable monophyletic group of organisms within which there is a parental pattern of ancestry and descent. This concept defines species on the basis of what is identifiable. It would also suggest that until two identifiable groups actually produce offspring, they remain separate species. In 1999, John Avise and Glenn Johns suggested a standardized method for defining species based on past speciation and measuring biological classifications as time dependent. Their method used temporal banding to assign genus, family, and order ranks based on how many tens of millions of years ago the speciation event that resulted in each species took place.

Landscape genetics
Landscape genetics is a rapidly emerging interdisciplinary field within molecular ecology. Landscape genetics relates genetics to landscape characteristics, such as land cover and land use (forests, agriculture, roads, etc.), the presence of barriers and corridors, rivers, and elevation. Landscape genetics addresses how the landscape affects dispersal and gene flow.

A barrier is any landscape feature that prevents dispersal. Barriers for terrestrial species can include mountains, rivers, roads, and unsuitable terrain, such as agricultural fields. Barriers for aquatic species can include islands or dams. Barriers are species-specific; for example, a river is a barrier to a field mouse, while a hawk can fly over a river. Corridors are areas over which dispersal is possible. Corridors are stretches of suitable habitat and can also be man-made, such as overpasses over roads and fish ladders on dams.

Geographic data used for landscape genetics can include data collected by radar mounted on aircraft, land satellite data, marine data collected by NOAA, and any other ecological data. In landscape genetics, researchers often use different analyses to attempt to determine the best way for a species to travel from point A to point B. Least-cost path analysis uses geographic data to determine the most efficient path from one point to another. Circuitscape analysis predicts all the possible paths between point A and point B and the probability of each path's use. These analyses are used to determine the route a dispersing individual is likely to travel.
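Least-cost path analysis can be sketched as a shortest-path search over a grid of resistance values, where each cell holds the cost of moving into it. The example below uses Dijkstra's algorithm with four-directional movement; the grid, its resistance values, and the function name are hypothetical, and the goal cell is assumed to be reachable from the start.

```python
import heapq

def least_cost_path(resistance, start, goal):
    """Return (total cost, path) from start to goal on a grid of entry costs."""
    rows, cols = len(resistance), len(resistance[0])
    best = {start: 0}
    previous = {}
    queue = [(0, start)]
    while queue:
        cost, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if cost > best[cell]:
            continue  # stale queue entry; a cheaper route to this cell was already found
        row, col = cell
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            neighbor = (row + d_row, col + d_col)
            if 0 <= neighbor[0] < rows and 0 <= neighbor[1] < cols:
                new_cost = cost + resistance[neighbor[0]][neighbor[1]]
                if new_cost < best.get(neighbor, float("inf")):
                    best[neighbor] = new_cost
                    previous[neighbor] = cell
                    heapq.heappush(queue, (new_cost, neighbor))
    # Walk back from the goal to recover the route.
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return best[goal], path[::-1]

# Hypothetical landscape: 1 = suitable habitat, 9 = barrier such as a road.
landscape = [
    [1, 1, 9, 1],
    [1, 9, 9, 1],
    [1, 1, 1, 1],
]
cost, route = least_cost_path(landscape, start=(0, 0), goal=(0, 3))
```

In this toy landscape the cheapest route detours along the bottom row at a total cost of 7 rather than crossing the barrier cells directly, which would cost 11.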

Landscape genetics is becoming an increasingly important tool in wildlife conservation efforts. It is being used to determine how habitat loss and fragmentation affect the movement of species. It is also used to determine which species need to be managed and whether to manage subpopulations the same or differently according to their gene flow.