Genetic studies on Gujarati people

The study of the genetics and archaeogenetics of the Gujarati people of India aims to uncover their genetic history. According to the 1000 Genomes Project, "Gujarati" is a general term for people who trace their ancestry to the region of Gujarat, in the northwestern part of the Indian subcontinent, and who speak Gujarati, an Indo-European language. Gujaratis share some genetic features with other ethnic groups of India and differ from them in others.

DNA studies
"Our analysis of haplotype similarity at the SLC24A5 region [which affects skin pigmentation] between the Gujarati Indians also indicated greater degree of sharing with the southern Europeans than with the south Tamil Indians." "One of the GIH subgroups fall outside the main gradient of Indian groups, suggesting that they harbor substantial ancestry that is not a simple mixture of ASI and ANI. A speculative hypothesisis that some Gujarati groups descend from the founders of the “Gurjara Pratihara” empire, which is thought to have been founded by Central Asian invaders in the 7th century A.D. and to have ruled parts of northwest India from the 7-12th centuries. I. Karve noted that endogamous groups with names like “Gurjar” are now distributed throughout the northwest of the subcontinent, and hypothesized that that they likely trace their names to this invading group."
 * According to a 2017 study by European geneticists (Silva et al., 2017), Gujaratis carry predominantly Ancestral North Indian (West Eurasian) ancestry. The study places Gujaratis, Rajasthanis and Pakistanis genetically in Western South Asia. It also notes:
    * The main non-indigenous component in the subcontinent, the Iran/Caucasus/Steppe or Caucasus hunter-gatherer component, exceeds 35% in Gujarat. (This component peaks at ~100% in ancient remains from the Caucasus and Iran, and is found at ~50% in the Bronze Age Yamnaya of the Pontic-Caspian steppe.)
    * Gujaratis appear to be much more genetically diverse than other South Asians.
    * Gujaratis are genetically close to people from Southwest Asia because of their high levels of ANI ancestry, which may have arrived in two waves.
 * According to a 2017 study by geneticists from Mangalore (D'Cunha et al., 2017), phylogenetic analysis separates the Dravidian population (sampled from coastal towns of Karnataka and Kerala) from the Gujarati population. The study notes that, according to recent studies, the data on Gujaratis "may not be representative for Indians on whole, particularly for the Dravidian population of southern India", and that "the Dravidian language-speaking individuals included in this study segregated distinctly into a tight cluster away from Indo-Aryan language speakers." The proximity of Gujaratis to Europeans "supported the existing view of European influences in the genetic composition of north Indian populations".
 * According to a 2015 study by British and Pakistani geneticists (Ayub et al., 2015), Europeans split from South Asians (represented by Gujaratis) around 8,000 years ago, in the Neolithic period, based on the decay of linkage disequilibrium (LD). The study divided humans into five distinct clusters based on allele-frequency differences: (1) Africans, (2) a widespread group including Europeans, Middle Easterners, and South Asians, (3) East Asians, (4) Oceanians, and (5) Native Americans.
 * A 2014 study by Singaporean geneticists (Ali et al., 2014) compared two sub-populations of India, the Indo-European-speaking Gujaratis (labelled North Indians) and the Dravidian-speaking Tamils (labelled South Indians), and found that one of the most apparent differences between them is skin complexion, with North Indians being much fairer, which the study takes to point to gene flow from Europe to North India.
 * A 2013 study by geneticists from California (Pemberton et al., 2013) observed that Gujarati individuals form a cluster distinct from other South Asian groups, which all cluster together. This is "consistent with a neighbor-joining analysis of the combined Asian Indian and CGP data sets that found 100% bootstrap support for a Gujarati grouping." (CGP refers to the Chha Gaam Patels, a Gujarati marriage circle.)
 * A 2012 study by Irish and Portuguese geneticists (Magalhães et al., 2012) identified five large blocks of genetic similarity: 1) Africa, 2) North Africa, Middle East, Europe and Central-South Asia, 3) East Asia, 4) Americas, and 5) Oceania. The largest of these, the North Africa, Middle East, Europe and Central-South Asia (CSA) block, corresponds to the Indo-European continental group.
    * Gujaratis were placed in the CSA block along with Pakistanis, and the population with the highest AMid (a measure of similarity) to Gujaratis was the Pathans.
    * A delayed expansion hypothesis was suggested, according to which Gujaratis descend from an ancestral Eurasian founding population that remained isolated long after the Out-of-Africa dispersal before expanding throughout Eurasia.
    * Gujaratis and Burusho showed the highest AMids to East Asians within the CSA block and were therefore grouped together. The CSA block was divided into two main groups: one with Balochi, Brahui and Makrani, and another in which Burusho and Gujaratis formed a single cluster.
 * According to a 2011 study by European and Indian geneticists (Metspalu et al., 2011), samples from Uttar Pradesh are spread further toward South Indians than Gujarati samples are.
 * According to a 2010 study by Chinese and American geneticists (Xing et al., 2010), Brahmins and Gujaratis show a closer relationship to Europeans, while Dravidian-speaking tribal groups (Irulas from Andhra Pradesh) show a closer relationship to East Asians. Among the Indian populations studied, the largest genetic distance is between Gujaratis and the tribal Irulas.
 * A 2010 study by geneticists from Greece and New York (Paschou et al., 2010) analyzed genetic data from thousands of individuals and investigated human population structure by using Principal Component Analysis, and concluded that the world can be divided into five broad regions: (1) Africa, (2) Europe-Middle East-Central South Asia, (3) East Asia, (4) Oceania, and (5) America. Regarding Gujaratis, the study mentioned that "Gujarati Indians (GIH), originating from Gujarat (the most western state of India and immediately adjacent to Pakistan) are easily placed in Central South Asia where they are classified as Pakistanis".
 * A 2009 study by geneticists from Massachusetts and Hyderabad (Reich et al., 2009) found substantial substructure in the samples of Gujarati Indians from Houston (GIH).
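Several of the studies above, including Paschou et al. (2010), place populations on principal-component axes. The minimal sketch below illustrates the idea on a tiny, entirely hypothetical genotype matrix: SNP columns are mean-centred, and each individual is projected onto the leading eigenvector of the SNP covariance matrix, found here by simple power iteration. The function name `pc1_scores` and the data are illustrative assumptions; published analyses use dedicated software with additional normalisation steps not reproduced here.

```python
def pc1_scores(genotypes, iterations=200):
    """Return each individual's score on the first principal component.

    genotypes: list of rows (individuals), each a list of SNP genotypes coded 0/1/2.
    """
    n_ind = len(genotypes)
    n_snps = len(genotypes[0])
    # Mean-centre every SNP column.
    means = [sum(row[j] for row in genotypes) / n_ind for j in range(n_snps)]
    centred = [[row[j] - means[j] for j in range(n_snps)] for row in genotypes]
    # SNP-by-SNP covariance matrix (the 1/(n-1) factor is omitted; it does not change eigenvectors).
    cov = [[sum(r[a] * r[b] for r in centred) for b in range(n_snps)] for a in range(n_snps)]
    # Power iteration for the leading eigenvector, from an arbitrary non-uniform start vector.
    vec = [1.0 / (j + 1) for j in range(n_snps)]
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(n_snps)) for a in range(n_snps)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    # Project each centred individual onto the leading axis.
    return [sum(r[j] * vec[j] for j in range(n_snps)) for r in centred]


# Hypothetical genotypes: two groups of three individuals with contrasting allele counts.
toy_genotypes = [
    [2, 2, 0, 0],
    [2, 1, 0, 0],
    [2, 2, 0, 1],
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [1, 0, 2, 2],
]
for i, score in enumerate(pc1_scores(toy_genotypes)):
    print(f"individual {i}: PC1 = {score:+.3f}")
```

With two well-separated groups, the first three individuals receive PC1 scores of one sign and the last three the opposite sign; which group is positive is arbitrary.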

Autosomal DNA components
The 1000 Genomes Project collected 117 samples of unrelated Gujarati people living in Houston, Texas (abbreviated GIH). These samples were analyzed in a 2016 study by South African and Indian geneticists (Sengupta et al., 2016), who divided them into two subgroups and used admixture analysis to estimate the proportion of each inferred ancestral component in each subgroup. Based on genetic and geographical affinities, the study placed Gujaratis and Gujarati Brahmins in the northwest Indian subcontinent (supplementary table 1).

A separate study by geneticists from West Bengal (Basu et al., 2016) estimated ancestral proportions for Gujarati Brahmins and classified them as a northwest Indian population.

Most South Asians carry both the Ancestral North Indian (ANI) component, which is closely related to populations of Central Asia, West Asia and Europe, and the Ancestral South Indian (ASI) component, which is restricted to South Asia.

Sub-haplogroup U
Over 33% of the mitochondrial lineages of the population of Gujarat originate from West Eurasia. The mitochondrial DNA (mtDNA) sub-haplogroup U7 is common in Gujarat, where it is found in over 12% of the population, a higher frequency than in Punjab (9%), Pakistan, Iran, Afghanistan or anywhere else. In other populations of India, U7 occurs at frequencies of only 0% to 0.9%. The study by Quintana-Murci et al. (2004) of 34 Gujaratis found sub-haplogroups U2a and U7 (8.8%), followed by U2b (5.9%) and U2c (2.9%).
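With only 34 individuals in the Quintana-Murci et al. (2004) sample, each person accounts for roughly 2.9% of it, so the reported percentages may correspond to very few people. The sketch below assumes (an inference, not something the study summary states) that each figure is a simple count out of 34, and back-calculates the 8.8%, 5.9% and 2.9% values from hypothetical counts of 3, 2 and 1.

```python
SAMPLE_SIZE = 34  # Gujaratis sampled by Quintana-Murci et al. (2004)


def percent_of_sample(count: int, n: int = SAMPLE_SIZE) -> float:
    """Express a head count as a percentage of the sample, rounded to one decimal."""
    return round(100 * count / n, 1)


# Assumed counts: back-calculated guesses, not numbers reported by the study.
for count in (3, 2, 1):
    print(f"{count} of {SAMPLE_SIZE} individuals = {percent_of_sample(count)}%")
```

Under this assumption, the same arithmetic applies to the R*, H, J1, W and N* frequencies reported from that sample.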

Other sub-haplogroups of R
The study by Quintana-Murci et al. (2004) found haplogroups R* (8.8%), H (5.9%) and J1 (2.9%) among Gujaratis.

Other minor haplogroups of N
The study by Quintana-Murci et al. (2004) found haplogroup W in Gujaratis (8.8%) as well as in populations of other northwestern regions such as Punjab and Kashmir. Haplogroup W descends from haplogroup N2. The study also found N* (2.9%) in Gujaratis.

The mtDNA haplogroup HV has been found in at least one Gujarati Patel family; it also occurs in some Tamil Brahmins.

Y chromosome
The Y chromosome DNA (Y-DNA) of some Gujarati males belongs to haplogroup R1a, specifically R-L657 (the L657 subclade). Others carry R-Y874 (the Y874 subclade of R1a1a1), which is also found among Telugu people. Another common finding is haplogroup J2b2 (J-M241). The SNP mutation Y951 has been found in both Gujaratis and Punjabis. Haplogroup C-M356 (C1b, previously called C5a, within which C-P92 is a subclade defined by the P92 mutation) also occurs among some Gujaratis.

A set of 48 biallelic markers on the non-recombining region of the Y chromosome (NRY) was analysed in 284 males representing nine Indo-European-speaking tribal populations of South Gujarat. The genetic structure of the populations revealed that none of these groups was overtly admixed or completely isolated. However, elevated haplogroup diversity and FST values point to greater diversity and differentiation, which suggests that the study groups may have undergone an early demographic expansion. The phylogenetic analysis revealed 13 paternal lineages, of which six haplogroups (C5, H1a*, H2, J2, R1a1* and R2) accounted for a major portion of the Y-chromosome diversity. The higher frequency of these six haplogroups and the pattern of clustering in the populations indicated that their haplogroups overlap with those of West and Central Asian populations. Further analyses of population affiliations showed that both Indo-European-speaking populations and Dravidian-speaking groups of southern India have influenced the tribal groups of Gujarat. Geography was also found to play an important role in the distribution of Y lineages. This implies that although language plays an important role in determining the distribution of Y lineages, the present-day linguistic affiliation of a population in India should be used with caution when reconstructing the country's demographic history.
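The paragraph above relies on two summary statistics, haplogroup diversity and FST. The estimators that study used are not stated here, so the sketch below assumes two common textbook choices: Nei's unbiased haplogroup diversity, h = n/(n-1) * (1 - sum of p_i^2), and Nei's G_ST = (H_T - H_S)/H_T as a simple FST analogue for haplogroup frequencies. The function names and the two population samples are hypothetical.

```python
from collections import Counter


def haplogroup_diversity(sample: list[str]) -> float:
    """Nei's unbiased haplogroup diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two individuals")
    sum_sq = sum((c / n) ** 2 for c in Counter(sample).values())
    return n / (n - 1) * (1 - sum_sq)


def _frequencies(sample: list[str]) -> dict[str, float]:
    n = len(sample)
    return {hg: c / n for hg, c in Counter(sample).items()}


def nei_gst(populations: dict[str, list[str]]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T, weighting each population equally.

    H_S is the mean within-population diversity; H_T is the diversity implied by
    the mean haplogroup frequencies (small-sample corrections omitted).
    """
    tables = [_frequencies(s) for s in populations.values()]
    k = len(tables)
    h_s = sum(1 - sum(p * p for p in t.values()) for t in tables) / k
    haplogroups = set().union(*tables)  # every haplogroup seen in any population
    mean_freq = [sum(t.get(hg, 0.0) for t in tables) / k for hg in haplogroups]
    h_t = 1 - sum(p * p for p in mean_freq)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical samples using haplogroup labels named in the paragraph above.
tribe_a = ["R1a1*"] * 5 + ["H1a*"] * 3 + ["J2"] * 2
tribe_b = ["R1a1*"] * 2 + ["H1a*"] * 6 + ["C5"] * 2
print(round(haplogroup_diversity(tribe_a), 3))
print(round(nei_gst({"A": tribe_a, "B": tribe_b}), 3))
```

G_ST is 0 when populations have identical haplogroup frequencies and 1 when they share no haplogroups at all.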

R1a origins in Gujarat
Following the 2010 discovery that the oldest lineages of the R1a1a branch were concentrated in the Gujarat-Sindh-Western Rajasthan region, it has been suggested that this region may have been the origin of this genetic group. The genetic similarities between North Indians and Eastern Europeans may be due to common ancestry. R1b, the most common haplogroup in Western Europe, occurs at relatively low frequency in India and may have originated in the Persian Gulf. It is possible that these haplogroups stem from two major genetic dispersals from the Persian Gulf-Makran-Gujarat region.

Genetics
Inverted Y chromosome polymorphism in Gujarati Muslims of South Africa
The pericentric inversion of the Y chromosome (inv(Y)) is a rare, hereditary chromosomal heteromorphism. It has no clinical significance and causes no reproductive disadvantage. It was studied in Gujarati Muslims of South Africa by comparing 8 men carrying inv(Y) with 9 men with normal Y chromosomes. The p49a/TaqI and p49a/PvuII haplotypes were determined: all 8 inv(Y) men had identical TaqI and PvuII haplotypes, whereas the 9 normal men showed 7 different TaqI and 8 different PvuII haplotypes. The study concluded that inv(Y) has a common genetic origin in Gujarati Muslims of South Africa, traced to Kholvad, a small village near Surat, and some neighbouring villages. The inversion probably became established through random genetic drift in a reproductively isolated community, maintained by strict endogamy based on religious and linguistic affiliations.

Vitiligo in Gujaratis
Vitiligo is a long-term skin condition in which patches of skin lose their pigment. It affects 1–2% of the world population and 0.5–2.5% of the population of India, but Gujarat and Rajasthan have the highest prevalence (~8.8%). Rasheedunnisa Begum et al. conducted genetic studies of over 1,500 patients from Gujarat and found single-nucleotide polymorphism (SNP) variation in the autosomal DNA of Gujaratis that makes them more prone to vitiligo.


Deletion β° thalassaemia
A study suggested that the Indian deletion type of β° thalassaemia (a 600 bp deletion involving the 3′ end of the β-globin gene) has a single origin, because all 23 patients with it in the study were from either Sindh or Gujarat and had identical haplotypes.

Meckel syndrome
The Leicestershire Perinatal Mortality Survey for the years 1976 to 1982 found a high incidence of Meckel syndrome among Gujarati immigrants.

Archaeogenetics
Impact of cultural marriage practices on genetic variation
In India, caste is a form of social stratification characterized by endogamy. Within a caste, Gujaratis recognize endogamous groups called gol (marriage circles). Each gol contains a small number of exogamous lineages called gotra. A person may not marry within their own gotra or outside their gol. Gujarati Patels practise this "exogamic endogamy", which is also found elsewhere in India. One such gol, the Chha Gaam Patels (CGP), includes people from six villages in the Charotar region of Gujarat. A study found genetic similarities among them and confirmed their patrilocal and patrilineal practices within the group, as well as a low level of female gene flow into the group. The study illustrates the impact of cultural marriage practices on genetic variation in Indian populations.
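The gol and gotra rule described above can be stated as a simple predicate: two people may marry only if they belong to the same gol and to different gotras. The sketch below encodes just that rule; the `Person` record, its field names and the gotra labels are hypothetical illustrations, not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical record; field names are illustrative only."""
    name: str
    gol: str    # endogamous marriage circle
    gotra: str  # exogamous lineage within the gol


def marriage_permitted(a: Person, b: Person) -> bool:
    """'Exogamic endogamy': the same gol is required, the same gotra is forbidden."""
    return a.gol == b.gol and a.gotra != b.gotra


# Hypothetical gol and gotra labels.
p1 = Person("P1", gol="Chha Gaam", gotra="gotra-1")
p2 = Person("P2", gol="Chha Gaam", gotra="gotra-2")
p3 = Person("P3", gol="Chha Gaam", gotra="gotra-1")
p4 = Person("P4", gol="other gol", gotra="gotra-2")
print(marriage_permitted(p1, p2))  # True: same gol, different gotra
print(marriage_permitted(p1, p3))  # False: same gotra
print(marriage_permitted(p1, p4))  # False: different gol
```

Because every permitted marriage stays inside the gol, repeated generations under this rule keep gene flow largely within the marriage circle, which is the pattern the CGP study describes.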

Parsis
In the Parsis, genetic data from mitochondrial DNA (mtDNA), a maternal component, and Y-chromosome DNA (Y-DNA), a paternal component, contrast sharply. Their Y-DNA resembles that of the Iranian population, which supports their historical origin in Iran. However, about 60% of their maternal gene pool derives from South Asian haplogroups, compared with just 7% in Iranians. Parsis have a high frequency (55%) of mtDNA haplogroup M, similar to Indians, whereas it makes up just 1.7% of the combined Iranian sample. Because of the high diversity of their Y-DNA and mtDNA lineages, a strong drift effect is unlikely even though their population was small. The studies suggest a male-mediated migration of Parsi ancestors from Iran to Gujarat, where they admixed with the local female population, ultimately resulting in the loss of Iranian mtDNA.

Dawoodi Bohras
The Dawoodi Bohras of Gujarat show a 47% genetic contribution from West Asia, especially Iran, followed by 30% from Arabia and 23% from the closest Hindu parental populations. This indicates considerable gene flow from West Asia into the community.
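Admixture proportions such as the 47%, 30% and 23% split above are commonly interpreted through a linear mixing model, in which an allele's expected frequency in the admixed population is the weighted average of its frequencies in the source populations. The sketch below assumes that simple model; the study's own estimation method is not described here, and the source allele frequencies in the example are hypothetical.

```python
# Admixture proportions reported above for the Dawoodi Bohras.
BOHRA_PROPORTIONS = {"West Asia": 0.47, "Arabia": 0.30, "Hindu parental": 0.23}


def expected_admixed_frequency(source_freqs: dict[str, float],
                               proportions: dict[str, float] = BOHRA_PROPORTIONS) -> float:
    """Linear mixing model: weighted average of source-population allele frequencies."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("admixture proportions must sum to 1")
    return sum(w * source_freqs[src] for src, w in proportions.items())


# Hypothetical allele frequencies in the three source populations.
example = {"West Asia": 0.60, "Arabia": 0.40, "Hindu parental": 0.20}
print(round(expected_admixed_frequency(example), 3))  # 0.448
```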

Gujarati tribal populations
Four tribal populations (Chaudhari, Vasava, Kotwalia and Gamit) of Surat district in South Gujarat were studied for the distribution of 22 polymorphic blood systems. The studies suggested that the populations are considerably heterogeneous genetically, with small genetic differences among them attributed to genetic variation. They also showed that the Vasava and Kotwalia are closely related genetically, while the Chaudhari and Gamit differ from them.

Genetic distance (Fst) between Gujaratis and other South Asians
A 2016 study, "Population Stratification and Underrepresentation of Indian Subcontinent Genetic Diversity in the 1000 Genomes Project Dataset", shows that Gujaratis (the GIH samples in the 1000 Genomes Project) are closer to the PJL (Punjabis from Lahore) samples (Fst = 0.0036) and ITU (Telugu) samples (Fst = 0.0039) than to the STU (Sri Lankan Tamil) samples (Fst = 0.0044) and BEB (Bengali) samples (Fst = 0.0045).
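Lower Fst means less genetic differentiation, so sorting the quoted values in ascending order recovers the ordering stated above. The sketch below simply ranks the four Fst values reported for the GIH samples; the dictionary and function names are illustrative.

```python
# Pairwise Fst between the GIH samples and other 1000 Genomes South Asian samples,
# as quoted above from the 2016 study.
FST_FROM_GIH = {
    "PJL (Punjabis from Lahore)": 0.0036,
    "ITU (Telugu)": 0.0039,
    "STU (Sri Lankan Tamil)": 0.0044,
    "BEB (Bengali)": 0.0045,
}


def closest_first(fst: dict[str, float]) -> list[str]:
    """Order populations from least to most differentiated (ascending Fst)."""
    return sorted(fst, key=fst.get)


for population in closest_first(FST_FROM_GIH):
    print(f"{population}: Fst = {FST_FROM_GIH[population]:.4f}")
```

All four values are below 0.005, so the differences between these South Asian samples, while consistent in ordering, are small.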