Genetic history of the Iberian Peninsula



The ancestry of modern Iberians (comprising the Spanish and Portuguese) is consistent with the geographical situation of the Iberian Peninsula in the south-west corner of Europe, showing characteristics that are largely typical of Southern and Western Europeans. As is the case for most of the rest of Southern Europe, the principal ancestral origin of modern Iberians is the Early European Farmers who arrived during the Neolithic. The large predominance of Y-chromosome haplogroup R1b, common throughout Western Europe, is also testimony to a sizeable input from various waves of (predominantly male) Western Steppe Herders that originated in the Pontic-Caspian Steppe during the Bronze Age.

Modern Iberians' genetic inheritance largely derives from the pre-Roman inhabitants of the Iberian Peninsula:


 * Pre-Indo-European and Indo-European speaking pre-Celtic groups (Iberians, Lusitani, Vettones, Turdetani, Aquitani, Conii).
 * Celts (Gallaecians, Celtiberians, Turduli and Celtici), who were Latinised after the conquest of the region by the ancient Romans.

There are also minor genetic influences from the Germanic tribes who arrived in the early medieval period. Owing to Iberia's position on the Mediterranean Sea, as with other Southern European countries, there were also contacts with other Mediterranean peoples such as the ancient Phoenicians, Greeks and Carthaginians, who briefly settled along Iberia's eastern and southern coasts, the Sephardi Jewish community, and the Berbers and Arabs who arrived during the Al-Andalus period, all of them leaving some North African and Middle Eastern genetic legacies, particularly in the south and west of the Iberian Peninsula. Similar to Sardinia, Iberia was shielded from settlement from the Middle East and Caucasus region by its western geographic location, and thus has lower levels of Western Asian and Middle Eastern admixture than Italy and Greece; most of this admixture probably arrived in Iberia during historic rather than prehistoric times, especially in the Roman period.

Population genetics: methods and limitations
The foremost pioneer of the study of population genetics was Luigi Luca Cavalli-Sforza. Cavalli-Sforza used classical genetic markers to analyse DNA by proxy. This method studies differences in the frequencies of particular allelic traits, namely polymorphisms from proteins found within human blood (such as the ABO blood groups, Rhesus blood antigens, HLA loci, immunoglobulins, G-6-P-D isoenzymes, among others). Subsequently, his team calculated genetic distances between populations, based on the principle that two populations that share similar frequencies of a trait are more closely related than populations that have more divergent frequencies of the trait.
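The distance calculation described above can be sketched in a few lines. This is a minimal illustration rather than a reconstruction of the team's actual procedure: it uses one commonly cited single-locus form of the chord distance associated with Cavalli-Sforza and Edwards, and the population names and allele frequencies are invented placeholders, not measured values.

```python
import math

def chord_distance(p, q):
    """Single-locus chord distance between two allele-frequency vectors.

    p and q list the frequencies of the same alleles, in the same order,
    in two populations; each vector should sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("frequency vectors must cover the same alleles")
    # The overlap is 1 for identical frequencies and 0 when no allele is shared.
    overlap = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp tiny negative values caused by floating-point rounding.
    return (2 / math.pi) * math.sqrt(max(0.0, 2 * (1 - overlap)))

# Hypothetical ABO allele frequencies (A, B, O); illustrative values only.
pop_x = [0.28, 0.06, 0.66]
pop_y = [0.27, 0.07, 0.66]
pop_z = [0.20, 0.25, 0.55]

d_xy = chord_distance(pop_x, pop_y)  # similar frequencies, small distance
d_xz = chord_distance(pop_x, pop_z)  # divergent frequencies, larger distance
```

Under these made-up inputs, populations X and Y come out closer to each other than X and Z, which is the sense in which similar trait frequencies are read as closer relatedness. In practice, distances would be averaged over many loci.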

Since then, population genetics has progressed significantly and studies using direct DNA analysis are now abundant and may use mitochondrial DNA (mtDNA), the non-recombining portion of the Y chromosome (NRY) or autosomal DNA. MtDNA and NRY DNA share some similar features which have made them particularly useful in genetic anthropology. These properties include the direct, unaltered inheritance of mtDNA and NRY DNA from mother to offspring and father to son, respectively, without the 'scrambling' effects of genetic recombination. We also presume that these genetic loci are not affected by natural selection and that the major process responsible for changes in base pairs has been mutation (which can be calculated).

Whereas Y-DNA and mtDNA haplogroups represent but a small component of a person's DNA pool, autosomal DNA has the advantage of containing hundreds of thousands of examinable genetic loci, thus giving a more complete picture of genetic composition. Descent relationships can only be determined on a statistical basis, because autosomal DNA undergoes recombination. A single chromosome can record a history for each gene. Autosomal studies are much more reliable for showing the relationships between existing populations, but they do not offer the same possibilities for unraveling population histories that mtDNA and NRY DNA studies promise, despite the many complications of the latter.

Analyses of nuclear and ancient DNA


Nuclear DNA analysis shows that Spanish and Portuguese populations are most closely related to other populations of western Europe. There is an axis of significant genetic differentiation along the east–west direction, in contrast to remarkable genetic similarity in the north–south direction. North African admixture, associated with the Islamic conquest, can be dated to the period between c. AD 860–1120.

A study published in 2019 using samples of 271 Iberians spanning prehistoric and historic times proposes the following inflexion points in Iberian genomic history:


 * 1) Mesolithic: hunter-gatherers from the European Steppes of Western Russia, Georgia and Ukraine are the first humans to settle the northwest of the Iberian Peninsula.
 * 2) Neolithic: Neolithic farmers from Anatolia settle the entire Iberian Peninsula.
 * 3) Chalcolithic: Inflow of Central European hunter-gatherers and some gene inflow from sporadic contact with North Africa.
 * 4) Bronze Age: Steppe inflow from Central Europe.
 * 5) Iron Age: Additional Steppe gene flow from Central Europe; the gene pool of the Basque people remains mostly intact from this point on.
 * 6) Roman period: genetic inflow from Central and Eastern Mediterranean. Some additional inflow of North African genes detected in Southern Iberia.
 * 7) Visigothic period: no detectable inflows.
 * 8) Muslim period: Inflow from Northern Africa. Following the Reconquista, there is further genetic convergence between North and South Iberia.

North African influence


A number of studies have focused on ascertaining the genetic impact of historical North African population movements into Iberia on the genetic composition of modern Spanish and Portuguese populations. Initial studies pointed to the Strait of Gibraltar acting more as a genetic barrier than a bridge during prehistoric times, while other studies point to a higher level of recent North African admixture among Iberians than among other European populations, albeit as a result of more recent migratory movements, particularly the Moorish invasion of Iberia in the 8th century.

In terms of autosomal DNA, the most recent study regarding African admixture in Iberian populations was conducted in April 2013 by Botigué et al., using genome-wide SNP data for over 2000 European, Maghreb, Qatari and Sub-Saharan individuals, of whom 119 were Spaniards and 117 Portuguese. It concluded that Spain and Portugal hold significant levels of North African ancestry. Estimates of shared ancestry averaged from 4% in some places to 10% in the general population. The populations of the Canary Islands yielded from 0% to 96% of shared ancestry with North Africans, although the Canary Islands are a Spanish archipelago off the African coast, and thus this result is not representative of the Iberian population. These same estimates did not exceed 2% in other western or southern European populations. However, contrary to past autosomal studies and to what is inferred from Y-chromosome and mitochondrial haplotype frequencies (see below), the study does not detect significant levels of Sub-Saharan ancestry in any European population outside the Canary Islands. Indeed, a prior 2011 autosomal study by Moorjani et al. found Sub-Saharan ancestry in many parts of southern Europe at ranges of between 1-3%: "the highest proportion of African ancestry in Europe is in Iberia (Portugal 4.2±0.3% and Spain 1.4±0.3%), consistent with inferences based on mitochondrial DNA and Y chromosomes and the observation by Auton et al. that within Europe, the Southwestern Europeans have the highest haplotype-sharing with North Africans."

Recent studies show minor relationships between some Iberian regions and North African populations as a result of the Al-Andalus historical period, which in Portugal lasted between the 8th and 12th centuries AD and in southern Spain continued until the late 15th century AD. Iberia is the European region with the most prominent presence of haplogroup E3b of the human Y chromosome (E-M81), of haplogroup U (U6) and of Haplotype Va, and this may be the result of some original common western Mediterranean population. In Portugal, North African Y-chromosome haplogroups (especially those typically North-West African) are at a frequency of 7.1%. Some studies of mitochondrial DNA also find evidence of the North African haplogroup U6, especially in northern Portugal. Although the frequency of U6 is low (4–6%), it was estimated that approximately 27% of the population of northern Portugal had some North African ancestry, as U6 is also not a common lineage in North Africa. According to some studies, the North African and Arab elements in the ancestry of today's Iberians are more than trivial when compared to the basis of pre-Islamic ancestry, and the Strait of Gibraltar seems to function more as a genetic bridge than a barrier.
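The step from a marker frequency of 4–6% to a much larger ancestry estimate can be illustrated with simple proportional reasoning. The following is a minimal sketch of a generic single-marker admixture estimate, not the method of the cited studies: it assumes the marker is absent from the other parental population, and the 18% source-population frequency is a hypothetical value chosen only for illustration.

```python
def single_marker_admixture(freq_admixed, freq_source, freq_other=0.0):
    """Estimate the admixture proportion m from one marker.

    Solves freq_admixed = m * freq_source + (1 - m) * freq_other for m
    and clips the result to the range [0, 1].
    """
    if freq_source == freq_other:
        raise ValueError("marker frequency must differ between parental populations")
    m = (freq_admixed - freq_other) / (freq_source - freq_other)
    return min(1.0, max(0.0, m))

# U6 at about 5% in the admixed population (midpoint of the 4-6% range above).
# The 18% source frequency is a hypothetical value, not a figure from the text.
m = single_marker_admixture(0.05, 0.18)
```

Because the marker is a minority lineage even in the source population, each observed carrier stands in for several non-carrier ancestors from the same source, so the estimated proportion is several times the raw marker frequency.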

However, a study that used different genetic markers reached different conclusions. An autosomal study by Spínola et al. (2005), which analyzed human leukocyte antigen (HLA) genes (inherent in all ancestors in direct paternal and maternal lineages) in hundreds of individuals from Portugal, showed that the Portuguese population has been influenced by other Europeans and North Africans via many ancient migrations. According to the authors, the north and the south of Portugal show a greater similarity to North Africans than the centre of the country, whose people seem closer to other Europeans. The north of Portugal seems to have concentrated, certainly due to the pressure of the Arab expansion, an ancient genetic pool originating from many North African and other European influences over millennia, while southern Portugal shows a North African genetic influence that is probably of recent origin, from the Amazigh people who accompanied the Arab expansion.



According to a study published in the American Journal of Human Genetics in December 2008, 30% of modern Portuguese (23.6% in the north and 36.3% in the south) have DNA that shows male Sephardic Jewish ancestry and 14% (11.8% in the north and 16.1% in the south) have Moorish ancestry. Despite the possible alternative sources for lineages attributed to a Sephardic Jewish origin, these proportions were testimony to the importance of religious conversion (voluntary or forced), shown by historical episodes of social and religious intolerance.



In terms of paternal Y-chromosome DNA, recent studies agree that Iberia has the greatest presence in Europe of the typically Northwest African Y-chromosome haplotype marker E-M81, with an average of 3%, as well as of Haplotype Va. Estimates of Y-chromosome ancestry vary: a 2008 study published in the American Journal of Human Genetics, using 1140 samples from throughout the Iberian Peninsula, attributed 10.6% of the paternal composition of Iberians to North African ancestry. A similar 2009 study of the Y chromosome, with 659 samples from Southern Portugal, 680 from Northern Spain, 37 samples from Andalusia, 915 samples from mainland Italy, and 93 samples from Sicily, found significantly higher levels of North African male ancestry in Portugal, Spain and Sicily (7.1%, 7.7% and 7.5% respectively) than in peninsular Italy (1.7%).

Other studies of the Iberian gene-pool have estimated significantly lower levels of North African ancestry. According to Bosch et al. 2000, "NW African populations may have contributed 7% of Iberian Y chromosomes". A wide-ranging study by Cruciani et al. 2007, using 6,501 unrelated Y-chromosome samples from 81 populations, found that "Considering both these E-M78 sub-haplogroups (E-V12, E-V22, E-V65) and the E-M81 haplogroup, the contribution of northern African lineages to the entire male gene pool of Iberia (barring Pasiegos), continental Italy and Sicily can be estimated as 5.6 percent, 3.6 percent and 6.6 percent, respectively". In general terms, according to Bosch et al. (2007), "...the origins of the Iberian Y-chromosome pool may be summarized as follows: 5% recent NW African, 78% Upper Paleolithic and later local derivatives (group IX), and 10% Neolithic" (H58, H71).

Mitochondrial DNA studies from 2003 agree that the Iberian Peninsula holds higher levels of the typically North African Haplogroup U6, as well as higher frequencies of Sub-Saharan African Haplogroup L in Portugal. High frequencies are largely concentrated in the south and southwest of the Iberian Peninsula; the overall frequency is therefore higher in Portugal (7.8%) than in Spain (1.9%), with a mean frequency for the entire peninsula of 3.8%. There is considerable geographic divergence across the peninsula, with high frequencies observed for Western Andalusia (14.6%), Córdoba (8.3%), Southern Portugal (10.7%) and South West Castile (8%). Adams et al. and other previous publications propose that the Moorish occupation left a minor Jewish, Saqaliba and some Arab-Berber genetic influence, mainly in southern regions of Iberia.

The most recent and comprehensive genomic studies establish that North African genetic ancestry can be identified throughout most of the Iberian Peninsula, ranging from 0% to 11%, but is highest in the south and west, while being absent or almost absent in the Basque Country and northeast.

Current debates revolve around whether U6 presence is due to Islamic expansion into the Iberian Peninsula or prior population movements, and whether Haplogroup L is linked to the slave trade or prior population movements linked to Islamic expansion. A majority of Haplogroup L lineages in Iberia being North African in origin points to the latter. In 2015, Hernández et al. concluded that "the estimated entrance of the North African U6 lineages into Iberia at 10 ky correlates well with other L African clades, indicating that some U6 and L lineages moved together from Africa to Iberia in the Early Holocene while a majority were introduced during historic times."

Y-Chromosome haplogroups
Like other Western Europeans, among Spaniards and Portuguese the Y-DNA Haplogroup R1b is the most frequent, occurring at over 70% throughout most of Spain. R1b is particularly dominant in the Basque Country and Catalonia, occurring at a rate of over 80%. In Iberia, most men with R1b belong to the subclade R-P312 (R1b1a1a2a1a2; as of 2017). The distribution of haplogroups other than R1b varies widely from one region to another.

In Portugal as a whole, R1b haplogroups account for 70%, with some areas in the northwest reaching over 90%.

Although R1b prevails in much of Western Europe, a key difference is found in the prevalence in Iberia of R-DF27 (R1b1a1a2a1a2a). This subclade is found in over 60% of the male population in the Basque Country and 40-48% in Madrid, Alicante, Barcelona, Cantabria, Andalucia, Asturias and Galicia. R-DF27 constitutes well over half of the total R1b in the Iberian Peninsula. Subsequent in-migration by members of other haplogroups and subclades of R1b did not affect its overall prevalence, although this falls to only two thirds of the total R1b in Valencia and the coast more generally. R-DF27 is also a significant subclade of R1b in parts of France and Britain. R-S28/R-U152 (R1b1a1a2a1a2b) is the prevailing subclade of R1b in Northern Italy, Switzerland and parts of France, but it represents less than 5.0% of the male population in Iberia. Ancient samples from the central European Bell Beaker culture, Hallstatt culture and Tumulus culture belonged to this subclade. R-S28/R-U152 is somewhat more significant in Seville, Barcelona, Portugal and the Basque Country, at 10-20% of the total population, but it is represented at frequencies of only 3.0% in Cantabria and Santander, 2.0% in Castille and Leon, 6% in Valencia, and under 1% in Andalusia.

Y-DNA haplogroup frequencies among Sephardic Jews: I1 0%, I2*/I2a 1%, I2 0%, R1a 5%, R1b 13%, G 15%, J2 25%, J*/J1 22%, E1b1b (E-M215) 9%, T 6%, Q 2%.

Haplogroup J, mostly subclades of Haplogroup J-M172 (J2), is found at levels of over 20% in some regions, while Haplogroup E has a general frequency of about 10% – albeit with peaks surpassing 30% in certain areas. Overall, E-M78 (E1b1b1a1 in 2017) and E-M81 (E1b1b1b1a in 2017) both constitute about 4.0% each, with a further 1.0% from Haplogroup E-M123 (E1b1b1b2a1) and 1.0% from unknown subclades of E-M96. (E-M81 is widely considered to represent relatively historical migrations from North Africa).

Mitochondrial DNA


There have been a number of studies about the mitochondrial DNA haplogroups (mtDNA) in Europe. In contrast to Y-DNA haplogroups, mtDNA haplogroups do not show as much geographical patterning, but are more evenly distributed. Apart from the outlying Sami, all Europeans are characterized by the predominance of haplogroups H, U and T. The lack of observable geographic structuring of mtDNA may be due to socio-cultural factors, namely patrilocality and a lack of polyandry.

The subhaplogroups H1 and H3 have been studied in more detail and are thought to be associated with the Magdalenian expansion from Iberia c. 13,000 years ago:


 * H1 encompasses an important fraction of Western European mtDNA, reaching its local peak among contemporary Basques (27.8%) and appearing at a high frequency among other Iberians and North Africans. Its frequency is above 10% in many other parts of Europe (France, Sardinia, British Isles, Alps, large portions of Eastern Europe), and above 5% in nearly all the continent. Its subclade H1b is most common in eastern Europe and NW Siberia. So far, the highest frequency of H1 (61%) has been found among the Tuareg of the Fezzan region in Libya.
 * H3 represents a smaller fraction of the European genome than H1 but has a somewhat similar distribution, with peaks among Basques (13.9%), Galicians (8.3%) and Sardinians (8.5%). Its frequency decreases towards the northeast of the continent, though. Studies have suggested haplogroup H3 is highly protective against AIDS progression.

A 2007 European-wide study including Spanish Basques and Valencian Spaniards found Iberian populations to cluster the furthest from other continental groups, implying that Iberia holds the most ancient European ancestry. In this study, the most prominent genetic stratification in Europe was found to run from the north to the south-east, while another important axis of differentiation runs east–west across the continent. It also found, despite the differences, that all Europeans are closely related.

Spain

 * Frequencies of Y-DNA haplogroups in Spanish regions

Portugal


Excerpts from the Abstract of a study published in 2015:

"[...] In the case of Portugal, previous population genetics studies have already revealed the general portrait of HVS-I and HVS-II mitochondrial diversity, becoming now important to update and expand the mitochondrial region analysed. Accordingly, a total of 292 complete control region sequences from continental Portugal were obtained, under a stringent experimental design to ensure the quality of data through double sequencing of each target region. Furthermore, H-specific coding region SNPs were examined to detail haplogroup classification and complete mitogenomes were obtained for all sequences belonging to haplogroups U4 and U5. In general, a typical Western European haplogroup or Atlantic modal haplotype (AMH) composition was found in mainland Portugal, associated to high level of mitochondrial genetic diversity. Within the country, no signs of substructure were detected. The typing of extra coding region SNPs has provided the refinement or confirmation of the previous classification obtained with EMMA tool in 96% of the cases. Finally, it was also possible to enlarge haplogroup U phylogeny with 28 new U4 and U5 mitogenomes."

The AMH reaches the highest frequencies in the Iberian Peninsula, in Great Britain and Ireland. In the Iberian Peninsula it reaches 70% in Portugal as a whole, with more than 90% in NW Portugal and nearly 90% in Galicia (NW Spain), while the highest value is to be found among the Basques (NE Spain).

The Atlantic modal haplotype (AMH) or haplotype 15 is a Y chromosome haplotype of Y-STR microsatellite variations, associated with the Haplogroup R1b. It was discovered prior to many of the SNPs now used to identify subclades of R1b and references to it can be found in some of the older literature. It corresponds most closely with subclade R1b1a2a1a(1) [L11].

The AMH is the most frequently occurring haplotype amongst human males in Atlantic Europe. It is characterized by the following marker alleles:
 * DYS388 12
 * DYS390 24
 * DYS391 11
 * DYS392 13
 * DYS393 13
 * DYS394 14 (also known as DYS19)
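The marker alleles listed above can be used to check how closely a given Y-STR haplotype matches the AMH. The following is a minimal sketch; the function name `amh_mismatches` and the sample haplotype are hypothetical and introduced only for illustration.

```python
# Atlantic modal haplotype alleles as listed above (DYS394 is also called DYS19).
AMH = {
    "DYS388": 12,
    "DYS390": 24,
    "DYS391": 11,
    "DYS392": 13,
    "DYS393": 13,
    "DYS394": 14,
}

def amh_mismatches(haplotype):
    """Return the markers at which a Y-STR haplotype differs from the AMH.

    `haplotype` maps marker names to repeat counts; "DYS19" is accepted as
    an alias for "DYS394". Each entry of the result maps a marker to a pair
    (observed, expected). Markers missing from the input are reported with
    an observed value of None.
    """
    normalized = {("DYS394" if k == "DYS19" else k): v for k, v in haplotype.items()}
    return {
        marker: (normalized.get(marker), expected)
        for marker, expected in AMH.items()
        if normalized.get(marker) != expected
    }

# Hypothetical sample that differs from the AMH at a single marker.
sample = {"DYS388": 12, "DYS390": 23, "DYS391": 11,
          "DYS392": 13, "DYS393": 13, "DYS19": 14}
diffs = amh_mismatches(sample)
```

An empty result means the haplotype matches the AMH at all six markers; a single entry, as in this sample, indicates a one-step neighbour of the modal haplotype.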