Paleogenomics

Paleogenomics is a field of science based on the reconstruction and analysis of genomic information in extinct species. The field has been spurred by improved methods for extracting ancient DNA (aDNA) from museum artifacts, ice cores, and archeological or paleontological sites, together with next-generation sequencing technologies. It is now possible to detect genetic drift, to trace ancient population migrations and interrelationships, to reconstruct the evolutionary history of extinct plant, animal and Homo species, and to identify phenotypic features across geographic regions. Scientists can also use paleogenomics to compare ancient ancestors with modern-day humans. The rising importance of paleogenomics is evident from the fact that the 2022 Nobel Prize in Physiology or Medicine was awarded to the Swedish geneticist Svante Pääbo (born 1955), who worked on paleogenomics.

Background
Initially, aDNA sequencing involved cloning small fragments into bacteria, which proceeded with low efficiency because of the oxidative damage the aDNA had suffered over millennia. aDNA is difficult to analyze because it is readily degraded by nucleases, although certain environments and postmortem conditions favor its preservation and thus improve isolation and analysis. Extraction and contamination-control protocols were necessary for reliable analyses. With the development of the polymerase chain reaction (PCR) in 1983, scientists could study DNA samples up to approximately 100,000 years old, a limit imposed by the relatively short fragments that could be isolated. Through advances in isolation, amplification, sequencing, and data reconstruction, progressively older samples have become analyzable. Over the past 30 years, high-copy-number mitochondrial DNA has answered many questions, and the advent of next-generation sequencing (NGS) techniques has made it possible to address far more. Moreover, this technological revolution allowed the transition from paleogenetics to paleogenomics.

Challenges and techniques
PCR, second-generation NGS, and various library preparation methods are available for sequencing aDNA, along with many bioinformatics tools. With each of these methods it is important to consider that aDNA can be altered post-mortem. Specific alterations include:
 * Base mutational patterns in sequence data (C->T mutation)
 * Crosslinks
 * Cytosine deamination (increased towards read termini)
 * Depurination
 * Genome fragmentation

The specific patterns and onset of these alterations help scientists estimate a sample's age.

Formerly, scientists diagnosed post-mortem damage using enzymatic reactions or gas chromatography coupled with mass spectrometry; in more recent years they began to detect it by exploiting mutational patterns in sequence data. This strategy makes it possible to identify an excess of C->T mutations following treatment with uracil DNA glycosylase. Nowadays, high-throughput sequencing (HTS) is used to identify depurination (a process that drives post-mortem DNA fragmentation; younger samples show more adenine than guanine), single-strand breaks in the DNA double helix, and abasic sites (left, for example, when the uracil produced by cytosine deamination is excised).
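
The idea of detecting damage from sequence data can be illustrated with a minimal sketch. It assumes that reads have already been aligned to a reference without gaps and are available as (reference, read) string pairs; the function name `c_to_t_profile` and the toy data are hypothetical and are not taken from any published tool. The sketch counts how often a reference C is read as T at each distance from the 5' end of a read, which is where the excess caused by cytosine deamination is expected to be strongest.

```python
from collections import Counter


def c_to_t_profile(pairs, max_pos=10):
    """Return the C->T mismatch frequency at each distance from the 5' end.

    `pairs` is an iterable of (reference, read) strings of equal length,
    assumed to be aligned without gaps (a simplification for illustration).
    """
    ref_c = Counter()    # positions where the reference base is C
    c_to_t = Counter()   # positions where a reference C is read as T
    for ref, read in pairs:
        for pos, (r, q) in enumerate(zip(ref[:max_pos], read[:max_pos])):
            if r == "C":
                ref_c[pos] += 1
                if q == "T":
                    c_to_t[pos] += 1
    # Positions with no reference C are reported as 0.0
    return [c_to_t[p] / ref_c[p] if ref_c[p] else 0.0 for p in range(max_pos)]


# Toy alignments, invented for illustration only
toy_pairs = [
    ("CACGT", "TACGT"),
    ("CCAGT", "TCAGT"),
    ("CAGCT", "CAGCT"),
    ("ACGCA", "ACGCA"),
]
profile = c_to_t_profile(toy_pairs, max_pos=5)
print(profile)
```

In this toy data set the C->T frequency is elevated only at the first position of the reads. Real analyses work on gapped alignments, consider both read ends, and use far more reads.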

A single fragment of aDNA can be sequenced in its full length with HTS. From these data a fragment-length distribution can be built, representing a size decay curve that enables a direct quantitative comparison of fragmentation across specimens from different locations and environmental conditions. From the decay curve it is possible to obtain the median fragment length of a given aDNA sample. This length reflects the level of fragmentation after death, which generally increases with depositional temperature.
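
The comparison described above can be sketched in a few lines. The fragment lengths below are invented, and the function names are hypothetical. The decay rate is computed as the reciprocal of the mean length, which is the maximum-likelihood estimate only under the assumption that lengths follow an exponential distribution; it is a crude illustration, not a substitute for a proper fit to the decay curve.

```python
from collections import Counter
from statistics import median


def length_distribution(lengths):
    """Count how many fragments were observed at each length (in bp)."""
    return dict(sorted(Counter(lengths).items()))


def summarize(lengths):
    """Return (median length, crude per-base decay rate) for one specimen.

    The rate is 1 / mean length, i.e. the maximum-likelihood estimate under
    an assumed exponential length distribution.
    """
    mean_len = sum(lengths) / len(lengths)
    return median(lengths), 1.0 / mean_len


# Invented fragment lengths (bp) for two hypothetical specimens
specimen_a = [30, 35, 40, 40, 50, 60, 80]
specimen_b = [30, 30, 32, 35, 38, 40, 45]

median_a, rate_a = summarize(specimen_a)
median_b, rate_b = summarize(specimen_b)

print(length_distribution(specimen_a))
print(median_a, rate_a)
print(median_b, rate_b)
```

Under these assumptions, the specimen with the shorter median length also has the higher estimated decay rate, which is the kind of between-specimen comparison the decay curve is meant to support.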

Libraries
Two different types of library can be prepared for aDNA sequencing using PCR for genome amplification:
 * Double-stranded aDNA library (dsDNA library)
 * Single-stranded aDNA library (ssDNA library)

The first one is created using the blunt-end approach. This technique uses two different adaptors, which ligate randomly to the ends of the fragment so that it can then be amplified. Fragments that do not carry both adaptors cannot be amplified, which is a source of error. To reduce this error, Illumina T/A ligation was introduced: in this method an A tail is added to the DNA fragments to facilitate the ligation of T-tailed adaptors. This method optimizes the amplification of the aDNA.

To obtain ssDNA libraries, DNA is first denatured with heat. The obtained ssDNA is then ligated to two adaptors in order to generate the complementary strand and finally PCR is applied.

aDNA Enrichment
As aDNA extracts may contain DNA from bacteria or other microorganisms, the process requires enrichment. In order to separate the endogenous and exogenous fractions, various methods are employed:
 * Damaged template enrichment: used when constructing an ssDNA library, because this method targets DNA damage. After Bst polymerase fills the nick, the sample is treated with uracil DNA glycosylase and endonuclease VIII. These enzymes attack the abasic site. The undamaged DNA remains attached to streptavidin-coated paramagnetic beads and can be separated from the sample. This method has been used for samples from late Pleistocene Neanderthals.
 * Extension-free target enrichment in solution: this method is based on target-probe hybridization. It requires DNA denaturation and then uses overlapping tiled probes along the target regions. PCR is then used for DNA amplification, and finally the DNA is linked to a biotinylated adaptor. It is useful for samples of archaic hominin ancestry.
 * Solid-phase target enrichment: in this method, microarrays and real-time PCR are used in parallel with shotgun sequencing screening.
 * Whole-genome enrichment: used for sequencing the entire genome of single individuals. Whole-genome In-Solution Capture (WISC) is used. This method starts with the preparation of a genome-wide RNA probe library from a species with a genome that is closely related to the target genome in the DNA sample.
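
A simple way to judge how well the endogenous fraction has been separated is to compare the proportion of reads that map to the target genome before and after enrichment. The sketch below is illustrative only: the function names and read counts are hypothetical, and it assumes that mapped reads are an acceptable proxy for endogenous DNA.

```python
def endogenous_fraction(mapped_reads, total_reads):
    """Fraction of sequenced reads that map to the target genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


def fold_enrichment(fraction_before, fraction_after):
    """How many times the endogenous fraction increased after enrichment."""
    if fraction_before <= 0:
        raise ValueError("fraction_before must be positive")
    return fraction_after / fraction_before


# Invented read counts for a hypothetical library
before = endogenous_fraction(mapped_reads=10_000, total_reads=1_000_000)
after = endogenous_fraction(mapped_reads=120_000, total_reads=1_000_000)

print(before, after, fold_enrichment(before, after))
```

With these invented counts the endogenous fraction rises from 1% to 12% of reads.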

Diversification of present-day non-African populations and anatomically modern humans
Many studies in different fields have by now led to the conclusion that present-day non-African populations are the result of the diversification, into several different lineages, of an ancestral, well-structured metapopulation that underwent an out-of-Africa expansion, carrying with it a subset of African genetic heritage. In this context, the analysis of ancient DNA has been fundamental for testing previously formulated hypotheses and for providing new insights. First, it has made it possible to narrow down the timing and the structure of this diversification by providing a calibration of the autosomal and mitochondrial mutation rates. Admixture analysis has demonstrated that at least two independent gene flow events occurred between ancestors of modern humans and archaic humans, such as Neanderthal and Denisovan populations, supporting the “leaky replacement” model of Eurasian human population history. According to all these data, the divergence of the non-African human lineages occurred around 45,000 – 55,000 BP. In addition, in many cases ancient DNA has made it possible to track the historical processes that led, over time, to the present-day population genetic structure, which would have been difficult to do relying only on the analysis of present-day genomes. Among the still unresolved questions, some of the most studied are the identity of the first inhabitants of the Americas, the peopling of Europe and the origin of agriculture in Europe.

Phenotypic variation in humans
Analysis of ancient DNA makes it possible to study mutations affecting phenotypic traits that followed changes in environment and human behavior. Migration to new habitats, dietary shifts (following the transition to agriculture) and the building of large communities exposed humans to new conditions that ultimately resulted in biological adaptation.

Skin colour
Migration of humans out of Africa to higher latitudes involved less exposure to sunlight. Since UVB rays are crucial for the synthesis of vitamin D, which regulates calcium absorption and thus is essential for bone health, living at higher latitudes would mean a substantial reduction in vitamin D synthesis. This put a new selective pressure on the skin colour trait, favouring lighter skin colour at higher latitudes. The two most important genes involved in skin pigmentation are SLC24A5 and SLC45A2. Nowadays the “light skin” alleles of these genes are fixed in Europe, but they reached a relatively high frequency only fairly recently (about 5000 years ago). Such a slow depigmentation process suggests that ancient Europeans could have faced the downsides of low vitamin D production, such as musculoskeletal and cardiovascular conditions. Another hypothesis is that pre-agricultural Europeans could have met their vitamin D requirements through their diet (since meat and fish contain some vitamin D).

Adaptation to agricultural diet
One of the major examples of adaptation following the switch to an agricultural diet is the persistence of production of the lactase enzyme in adulthood. This enzyme is essential for digesting the lactose present in milk and dairy products, and its absence leads to diarrhea following the consumption of these products. Lactase persistence is determined predominantly by a single-base mutation in the MCM6 gene, and ancient DNA data show that this mutation became common only within the past 5000 years, thousands of years after the beginning of dairying practices. Thus, even in the case of lactase persistence there is a long delay between the onset of a new habit and the spread of the adaptive allele, so milk consumption may have been restricted to children or to lactose-reduced products.

Another example of a change positively selected by the switch to agriculture is the increase in the number of AMY1 gene copies. AMY1 encodes the starch-digesting enzyme amylase present in saliva, and modern humans have a higher number of gene copies than chimpanzees.

The immune system
The human immune system has undergone intense selection through the millennia, adapting to different pathogen landscapes. Several environmental and cultural changes have imposed a selective pressure on different immune-associated genes. Migrations, for example, exposed humans to new habitats carrying new pathogens or pathogen vectors (e.g. mosquitoes). The switch to agriculture also involved exposure to different pathogens and health conditions, due both to the increased population density and to living close to livestock. However, it is difficult to directly correlate particular ancient genome changes with improved resistance to particular pathogens, given the vastness and complexity of the human immune system. Besides directly studying changes in the human immune system, it is also possible to study the ancient genomes of pathogens, such as those causing tuberculosis, leprosy, plague, smallpox or malaria. For example, researchers have discovered that all strains of Yersinia pestis before 3600 years ago lacked the ymt gene, which is essential for the pathogen to survive in the intestine of fleas. This suggests that in the ancient past plague may have been less virulent than in more recent Y. pestis outbreaks.

A study of ancient DNA supported the view that recent human evolution to resist infection by pathogens also increased inflammatory disease risk in post-Neolithic Europeans over the last 10,000 years, and estimated the nature, strength, and time of onset of selection due to pathogens.

Plants and animals
The genomes of many non-hominin vertebrates, including ancient mammoths, polar bears, dogs and horses, have been reconstructed through aDNA recovery from fossils and from samples preserved at low temperature or high altitude. Mammoth studies are the most frequent because of the abundance of soft tissue and hair preserved in permafrost, and are used to identify the relationship of mammoths to more recent elephants and to trace demographic changes. Polar bear studies are performed to identify the impact of climate change on evolution and biodiversity. Dog and horse studies give insights into domestication. In plants, aDNA has been isolated from seeds, pollen and wood. A correlation has been identified between ancient and extant barley. Another application was the detection of the domestication and adaptation processes of maize, which involved genes for drought tolerance and sugar content.

Challenges and future perspectives
The analysis of ancient genomes of anatomically modern humans has, in recent years, completely revolutionized the study of population migrations, transformation and evolution. Nevertheless, much still remains unknown. The first and most obvious problem with this kind of approach, which is being partially overcome by the continuous improvement of ancient DNA extraction techniques, is the difficulty of recovering well-preserved ancient genomes. This challenge is particularly acute in Africa and Asia, where temperatures are higher than in colder regions of the world. Further, Africa is, among all the continents, the one that harbors the most genetic diversity. Besides DNA degradation, exogenous contamination also limits paleogenomic sequencing and assembly. As no ancient DNA is available from the time and the region inhabited by the original ancestors of present-day non-African populations, little is still known about their structure and location. The second and more important challenge is the recovery of DNA from early modern humans (100,000 – 200,000 BP). These data, together with a larger number of archaic genomes to analyze and with knowledge of the timing and distribution of archaic genetic admixture, will allow scientists to reconstruct the history of our species more easily. In fact, collecting more data about our genetic history will make it possible to track human evolution not only in terms of migrations and natural selection, but also in terms of culture. In the next decade the paleogenomics research field is expected to focus mainly on three topics: the fine-scale definition of past human interactions through denser sampling; an understanding of how these interactions contributed to the agricultural transition, through the analysis of DNA from understudied regions; and, finally, the quantification of the contribution of natural selection to present-day phenotypes.
To interpret all these data, geneticists will need to cooperate with historians, as they have already done with anthropologists and archaeologists.

Bioethics
Bioethics in paleogenomics concerns ethical questions that arise in the study of ancient human remains, due to the complex relationships among scientists, governments and indigenous populations. In addition, paleogenomic studies have the potential to harm community or individual histories and identities, as well as to reveal compromising information about descendants. For these reasons, these kinds of studies remain a sensitive subject. Paleogenomic studies can have negative consequences mainly because of the discrepancies between the articulation of ethical principles and actual practice. In fact, ancestors’ remains are usually considered legally and scientifically as “artifacts”, rather than “human subjects”, which is used to justify questionable behaviors and a lack of engagement with communities. Testing of ancestral remains is therefore used in disputes, treaty claims, repatriation, or other legal cases. Acknowledgement of the importance and sensitivity of this subject is leading towards ethical commitments and guidance applicable to different contexts, in order to preserve the dignity of ancestral remains and avoid ethical issues. Finally, another pioneering area of interest is the so-called “de-extinction” project, which aims to resurrect extinct species, such as the mammoth. This project, which appears to be possible thanks to CRISPR/Cas9 technology, is, however, strongly connected to many ethical issues.