User:Slrubenstein/molecular genetics

Molecular genetics: lineages and clusters
With large amounts of human genetic data now available from many geographically distant human groups, scientists have again started to investigate the relationships between people from different parts of the world. One method is to study DNA molecules that are passed down from mother to child (mtDNA) or from father to son (Y chromosomes); these form molecular lineages and can be informative about prehistoric population migrations. Alternatively, autosomal alleles are investigated in an attempt to understand how much genetic material groups of people share. This work has led to a debate amongst geneticists, molecular anthropologists and medical doctors about the validity of concepts such as "race". Some researchers insist that classifying people into groups based on ancestry may be important from medical and social policy points of view, and claim to be able to do so accurately. Others claim that individuals from different groups share far too much of their genetic material for group membership to have any medical implications. The result has been a renewed scientific debate over the validity of human classification in general and of "race" in particular.

Molecular lineages, Y chromosomes and mitochondrial DNA
Mitochondria are intracellular organelles that contain their own DNA. This mitochondrial DNA (mtDNA) is passed in a direct female line of descent, from a mother to all of her children. Human Y chromosomes are male-specific sex chromosomes: with rare exceptions, a human who carries a Y chromosome develops as a male, so Y chromosomes are passed from father to son. When a mutation arises in mtDNA or in a Y chromosome, it is passed down a specific maternal or paternal line, and because mutations accumulate on these molecules (which largely escape recombination) they can be used to identify specific molecular lineages. These mutations arise from copying mistakes: when the DNA is copied, a single base in the sequence may occasionally be copied incorrectly. Once such single-base differences are carried by a measurable share of a population, they are called single nucleotide polymorphisms (SNPs).
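The idea that lineages are defined by accumulated mutations can be illustrated with a toy model. In this minimal sketch each molecule is represented as a set of mutation labels; every copy inherits all of its parent's mutations (no recombination) and occasionally gains a new one. The function names, the 20% per-copy mutation probability and the generation counts are illustrative assumptions, not empirical values.

```python
import itertools
import random

def copy_molecule(parent, rng, mutation_prob, labels):
    """Copy a uniparental molecule (mtDNA or Y), occasionally adding a mutation.

    `parent` is a frozenset of mutation labels. Every label is inherited
    unchanged (no recombination); with probability `mutation_prob` the copy
    also gains one new, uniquely labelled change standing in for a copying error.
    """
    if rng.random() < mutation_prob:
        return parent | {next(labels)}
    return parent

def descend(start, generations, rng, mutation_prob, labels):
    """Follow one direct maternal (or paternal) line for `generations` copies."""
    molecule = start
    for _ in range(generations):
        molecule = copy_molecule(molecule, rng, mutation_prob, labels)
    return molecule

rng = random.Random(42)
labels = itertools.count(1)   # shared counter, so every new mutation is unique
ancestor = descend(frozenset(), 50, rng, 0.2, labels)
line_a = descend(ancestor, 50, rng, 0.2, labels)
line_b = descend(ancestor, 50, rng, 0.2, labels)

# Both lines carry all of the ancestor's mutations plus private ones of their own.
print((line_a & line_b) == ancestor)   # → True
print(len(line_a - ancestor), len(line_b - ancestor))
```

The mutations the two descendant lines share mark their common lineage, while their private mutations mark sub-lineages. This nesting is roughly how haplogroups are defined, although real studies work from observed sequences rather than labelled mutation events.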



Mitochondrial DNA and Y chromosome research has produced three reproducible observations relevant to race and human evolution.

Firstly, all mtDNA and Y chromosome lineages derive from a common ancestral molecule. For mtDNA this ancestor is estimated to have lived about 140,000 to 290,000 years ago (Mitochondrial Eve), while for Y chromosomes the ancestor is estimated to have lived about 70,000 years ago (Y chromosome Adam). These observations are robust, and the individuals who originally carried these ancestral molecules are the direct female-line and male-line most recent common ancestors (MRCAs) of all living anatomically modern humans. This should not be interpreted as meaning that either was the first anatomically modern human. Nor should we assume that no other modern humans lived at the same time as Mitochondrial Eve or Y chromosome Adam. A more reasonable explanation is that other humans living at the same time did reproduce and pass their genes down to present-day humans, but that their mitochondrial and Y chromosome lineages have been lost over time, probably through random events (for example, a woman who has only sons does not pass on her mtDNA, and a man who has only daughters does not pass on his Y chromosome). It is impossible to know how many such lineages have been lost, or how much they differed from the mtDNA or Y chromosome of our maternal-line and paternal-line MRCAs. The difference in dates between Y chromosome Adam and Mitochondrial Eve is usually attributed to a higher extinction rate for Y chromosome lineages. This is probably because reproductive success varies more among men than among women: a few very successful men produce a great many children, while a larger number of less successful men produce far fewer.
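The loss of lineages through random events, and the effect of uneven reproductive success, can be explored with a small simulation. This is a minimal Wright-Fisher-style sketch, assuming a constant population of one sex in which each individual inherits its marker from one randomly chosen parent. The population size of 50, the five "successful" parents per generation and their tenfold advantage are arbitrary choices, not estimates for real human populations, and the function name is hypothetical.

```python
import random
import statistics

def generations_to_single_lineage(n, rng, n_successful=0, boost=1.0,
                                  max_generations=100_000):
    """Count generations until every individual descends from one founder.

    Toy model of a uniparental marker (mtDNA or Y): a constant population
    of `n` individuals, each inheriting the marker from one randomly chosen
    parent in the previous generation. Each generation, `n_successful`
    randomly chosen parents get `boost` times the usual chance of being
    picked, which raises the variance in offspring number.
    """
    lineages = list(range(n))   # founder label carried by each individual
    for generation in range(1, max_generations + 1):
        weights = [1.0] * n
        for i in rng.sample(range(n), n_successful):
            weights[i] = boost
        lineages = rng.choices(lineages, weights=weights, k=n)
        if len(set(lineages)) == 1:
            return generation
    raise RuntimeError("no single lineage within max_generations")

rng = random.Random(2024)
even = [generations_to_single_lineage(50, rng) for _ in range(100)]
skewed = [generations_to_single_lineage(50, rng, n_successful=5, boost=10.0)
          for _ in range(100)]
print(statistics.mean(even), statistics.mean(skewed))
```

With evenly spread reproduction, all but one founding lineage are eventually lost purely by chance. When a few individuals in each generation leave many more descendants, a single surviving lineage is reached in fewer generations on average, which is the direction of the effect invoked to explain the more recent date for Y chromosome Adam.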

Secondly, mtDNA and Y chromosome work supports a recent African origin for anatomically modern humans, with the ancestors of present-day populations outside Africa leaving the continent between roughly 100,000 and 50,000 years ago.

Thirdly, studies show that specific types of mtDNA or Y chromosome (haplogroups) do not always cluster by geography, ethnicity or race. This implies that multiple lineages were involved in founding modern human populations: many closely related lineages are spread over large geographic areas, and many populations contain distantly related lineages. Keita et al. (2004) say, with reference to Y chromosome and mtDNA studies and their relevance to concepts of "race":

How much are genes shared? Clustering analyses and what they tell us
Human genetic variation is not distributed uniformly throughout the global population. Humans inhabit a vast range, so some populations are separated by great distances (e.g. South America and southern Africa), and this reduces gene flow between them. Environmental selection is also likely to have contributed to some differences between populations, although most genetic differences between populations are now believed to be selectively neutral. The existence of such differences is relevant to discussions of the concept of "race", and some biologists believe that the language of "race" is relevant to describing human genetic variation. It is now possible to estimate, with reasonable accuracy, the continents of origin of an individual's ancestors from genetic data.

Richard Lewontin has claimed that "race" is a meaningless classification because the majority of human variation (~85%) is found within groups, and that two individuals from different "races" are therefore almost as likely to be similar to each other as either is to someone from their own "race". In 2003 A. W. F. Edwards rebutted this argument, claiming that Lewontin's conclusion ignores the fact that most of the information that distinguishes populations is hidden in the correlation structure of the data (how allele frequencies vary together across many loci) rather than in the variation of individual factors taken one at a time (see Infobox: Multi Locus Allele Clusters). Edwards concludes that "It is not true that 'racial classification is ... of virtually no genetic or taxonomic significance' or that 'you can't predict someone's race by their genes'." Researchers such as Neil Risch and Noah Rosenberg have argued that a person's biological and cultural background may have important implications for medical treatment decisions, both for genetic and non-genetic reasons.
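Edwards's point, that weak differences at many loci add up, can be illustrated numerically. The sketch below assumes two hypothetical populations whose allele frequencies differ by only 0.2 at every locus (0.1 above and below a random shared value), independent loci and Hardy-Weinberg proportions. It assigns each simulated individual to the population under which their multilocus genotype is more likely. It illustrates the idea and is not Edwards's own calculation; all names and parameters are illustrative.

```python
import math
import random

def genotype(freqs, rng):
    """Diploid genotype: copies (0, 1 or 2) of the reference allele at each locus."""
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]

def log_likelihood(geno, freqs):
    """Log-probability of a genotype under Hardy-Weinberg proportions, loci independent."""
    return sum(math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1 - p)
               for g, p in zip(geno, freqs))

def misclassification_rate(n_loci, delta, n_people, rng):
    """Share of simulated people assigned to the wrong one of two populations.

    At each locus a shared frequency is drawn between 0.2 and 0.8; population A
    sits `delta` above it and population B `delta` below it (assumes 0 < delta < 0.2).
    """
    base = [rng.uniform(0.2, 0.8) for _ in range(n_loci)]
    freq_a = [b + delta for b in base]
    freq_b = [b - delta for b in base]
    errors = 0
    for own, other in ((freq_a, freq_b), (freq_b, freq_a)):
        for _ in range(n_people):
            g = genotype(own, rng)
            if log_likelihood(g, other) > log_likelihood(g, own):
                errors += 1
    return errors / (2 * n_people)

rng = random.Random(1)
rates = {n: misclassification_rate(n, 0.1, 300, rng) for n in (1, 10, 100)}
print(rates)   # error rate keyed by number of loci
```

With one locus the assignment is wrong a large share of the time even though the frequencies really differ; with a hundred such loci errors become rare, although each locus on its own looks nearly uninformative.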

The results obtained by clustering analyses depend on several factors:
 * The clusters produced are relative, not absolute: each cluster is the product of comparisons between the sets of data collected for the study, so results are highly influenced by sampling strategy. (Edwards, 2003)
 * The geographic distribution of the populations sampled: because human genetic diversity is marked by isolation by distance, populations from geographically distant regions will form much more discrete clusters than those from geographically close regions. (Kittles and Weiss, 2003)
 * The number of genes used: the more genes used in a study, the greater the resolution and therefore the greater the number of clusters that can be identified. (Tang, 2005)

Rosenberg et al.'s (2002) paper "Genetic Structure of Human Populations" in particular was taken up by Nicholas Wade in the New York Times as evidence that genetic studies supported the "popular conception" of race. However, Rosenberg's work used samples from the Human Genome Diversity Project (HGDP), a project that has collected samples from individuals of 52 ethnic groups in various locations around the world. The HGDP has itself been criticised for collecting samples on an "ethnic group" basis, on the grounds that ethnic groups are constructed categories rather than solely natural or biological ones. Scientists such as the molecular anthropologist Jonathan Marks, the geneticists David Serre, Svante Pääbo and Mary-Claire King, and the medical doctor Arno G. Motulsky argue that this is a biased sampling strategy, and that human samples should instead have been collected geographically, i.e. from points on a grid overlaying a map of the world. They maintain that human genetic variation is not partitioned into discrete racial groups (clusters) but is distributed clinally (isolation by distance), a pattern that this biased sampling strategy masks. The existence of allelic clines, and the observation that the bulk of human variation is continuously distributed, has led scientists such as Kittles and Weiss (2003) to conclude that any categorization scheme attempting to partition that variation meaningfully will necessarily create artificial truncations. It is for this reason, Reanne Frank argues, that attempts to allocate individuals to ancestry groupings based on genetic information have yielded varying results that are highly dependent on methodological design.
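The sampling argument can be illustrated with a simple model of a cline. The sketch below assumes hypothetical linear clines along a 0 to 1 transect, with each locus's allele frequency rising or falling from 0.2 to 0.8, and 30 people sampled at each site. It compares the largest genetic step between neighbouring sampling sites with the typical step. The cline shape, sample size, distance measure (mean absolute allele-frequency difference) and all names are illustrative assumptions.

```python
import random
import statistics

N_LOCI = 100
SAMPLE_SIZE = 30  # diploid individuals sampled per site (60 allele copies)

def make_cline(rng):
    """Direction of each hypothetical cline: +1 if frequency rises along the transect, -1 if it falls."""
    return [rng.choice((1, -1)) for _ in range(N_LOCI)]

def site_frequencies(x, directions, rng):
    """Allele frequencies estimated from SAMPLE_SIZE people sampled at position x (0 to 1)."""
    copies = 2 * SAMPLE_SIZE
    freqs = []
    for d in directions:
        p = 0.5 + d * 0.6 * (x - 0.5)   # true frequency, between 0.2 and 0.8
        freqs.append(sum(rng.random() < p for _ in range(copies)) / copies)
    return freqs

def distance(f1, f2):
    """Mean absolute allele-frequency difference across loci."""
    return statistics.fmean(abs(a - b) for a, b in zip(f1, f2))

def gap_ratio(positions, directions, rng):
    """Largest distance between neighbouring sampling sites, relative to the median step."""
    sites = [site_frequencies(x, directions, rng) for x in sorted(positions)]
    steps = [distance(a, b) for a, b in zip(sites, sites[1:])]
    return max(steps) / statistics.median(steps)

rng = random.Random(7)
directions = make_cline(rng)
grid = [i / 10 for i in range(11)]              # evenly spaced sites
clumped = [0.0, 0.05, 0.1, 0.9, 0.95, 1.0]     # only the two ends of the cline
grid_ratio = gap_ratio(grid, directions, rng)
clumped_ratio = gap_ratio(clumped, directions, rng)
print(grid_ratio, clumped_ratio)
```

Evenly spaced sites give steps of similar size, so there is no natural place to cut the cline. Sampling only its two ends produces one large step that looks like a boundary between discrete groups, although none exists in the model.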

In a follow-up paper, "Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure" (2005), Rosenberg et al. maintain that their clustering analysis is robust, but they also agree that there is evidence for clinality (isolation by distance). They also distance themselves from the language of race, and do not use the term "race" in their publications: "The arguments about the existence or nonexistence of 'biological races' in the absence of a specific context are largely orthogonal to the question of scientific utility, and they should not obscure the fact that, ultimately, the primary goals for studies of genetic variation in humans are to make inferences about human evolutionary history, human biology, and the genetic causes of disease."

One of the underlying questions about the distribution of human genetic diversity is the degree to which genes are shared between the observed clusters, and therefore the extent to which membership of a cluster can accurately predict an individual's genetic makeup or susceptibility to disease. This is at the core of Lewontin's argument. Lewontin apportioned human genetic diversity with a diversity statistic analogous to Sewall Wright's fixation index (FST) and estimated that, on average, 85% of human genetic diversity is contained within groups. Are members of the same cluster always more genetically similar to each other than they are to members of a different cluster? Lewontin's argument is that within-group differences are almost as large as between-group differences, and therefore two individuals from different groups are almost as likely to be more similar to each other than they are to members of their own group. Can clusters correct for this finding? In 2004 Bamshad et al. used the data from Rosenberg et al. (2002) to investigate the extent of genetic differences between individuals within continental groups relative to the genetic differences between individuals from different continental groups. They found that although these individuals could be classified very accurately into continental clusters, there was a significant degree of genetic overlap at the individual level.
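The logic of apportioning diversity within and between groups can be shown with a small calculation. The sketch below uses Nei's G_ST, a heterozygosity-based analogue of Wright's FST, rather than Lewontin's own statistic. It weights populations equally, ignores sample-size corrections, and the allele frequencies in the example are invented for illustration.

```python
import statistics

def expected_heterozygosity(p):
    """Chance that two allele copies drawn at random differ (biallelic locus)."""
    return 2 * p * (1 - p)

def apportion_diversity(loci):
    """Return (within, between) shares of diversity, using Nei's G_ST.

    `loci` is a list with one entry per locus; each entry lists the allele
    frequency in every population. H_S is the mean within-population
    heterozygosity, H_T the heterozygosity of the pooled frequencies, both
    averaged over loci, and G_ST = (H_T - H_S) / H_T. Populations are weighted
    equally, no sample-size correction is applied, and H_T must be > 0.
    """
    h_s = statistics.fmean(
        statistics.fmean(expected_heterozygosity(p) for p in pops) for pops in loci)
    h_t = statistics.fmean(
        expected_heterozygosity(statistics.fmean(pops)) for pops in loci)
    between = (h_t - h_s) / h_t
    return 1 - between, between

# One invented locus with allele frequencies 0.3, 0.5 and 0.7 in three populations.
within, between = apportion_diversity([[0.3, 0.5, 0.7]])
print(round(within, 3), round(between, 3))   # → 0.893 0.107
```

Even with frequencies as different as 0.3, 0.5 and 0.7, about 89% of the diversity at this locus lies within populations. A single-locus summary of this kind says nothing about how well many loci taken together can assign individuals, which is the distinction Edwards drew.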

This question was addressed in more detail in a 2007 paper by Witherspoon et al., "Genetic Similarities Within and Between Human Populations", which makes the following observations:


 * Genetic differences between human continental populations account for only a small fraction of the differences between people.
 * Multilocus clusters provide accurate and reproducible results for dividing people into the correct populations.
 * Two individuals from different populations are often more genetically similar to each other than either is to individuals from their own population.

The paper states that "All three of the claims listed above appear in disputes over the significance of human population variation and 'race'" and asks "If multilocus statistics are so powerful, then how are we to understand this [last] finding?"

Witherspoon et al. (2007) attempt to reconcile these apparently contradictory findings, and show that the observed clustering of human populations into relatively discrete groups is a product of using what they call "population trait values". Each individual is compared with the "typical" trait values of several populations and assigned to the population it most resembles overall. The authors therefore argue that clustering analyses cannot necessarily be used to draw inferences about the similarity or dissimilarity of individuals within or between clusters, but only about the similarity of individuals to the "trait values" of a given cluster. The paper measures the rate of misclassification using these trait values and calls it the "population trait value misclassification rate" (CT). It investigates the similarity between individuals using what the authors term the "dissimilarity fraction" (ω): "the probability that a pair of individuals randomly chosen from different populations is genetically more similar than an independent pair chosen from any single population." Witherspoon et al. show that two individuals can be more genetically similar to each other than to the typical genetic type of their own respective populations, and yet both be correctly assigned to their respective populations. An important observation is that the likelihood that two individuals from different populations will be more similar to each other genetically than two individuals from the same population depends on several factors, most importantly the number of loci studied and the distinctiveness of the populations under investigation. For example, when 10 loci are used to compare three geographically disparate populations (sub-Saharan African, East Asian and European), individuals are more similar to members of a different group about 30% of the time.
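The relationship between CT and ω can be explored with a simplified simulation. The sketch below assumes two hypothetical populations whose allele frequencies scatter around shared per-locus values (standard deviation 0.15), uses the sum of absolute genotype differences as the distance between two people, and assigns each person to the population whose mean genotype (a stand-in for the "trait value", twice the allele frequency) is nearest. It is not Witherspoon et al.'s data or exact method, so its numbers will not match the percentages quoted from the paper; only the trends are meant to carry over. All function names are hypothetical.

```python
import random

def make_populations(n_loci, n_pops, spread, rng):
    """Hypothetical allele frequencies: shared base value per locus plus population-specific noise."""
    pops = [[] for _ in range(n_pops)]
    for _ in range(n_loci):
        base = rng.uniform(0.1, 0.9)
        for freqs in pops:
            freqs.append(min(0.99, max(0.01, base + rng.gauss(0, spread))))
    return pops

def sample_people(freqs, n, rng):
    """Diploid genotypes (0, 1 or 2 copies of the reference allele) for n people."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs] for _ in range(n)]

def dist(a, b):
    """Sum of absolute genotype differences between two people."""
    return sum(abs(x - y) for x, y in zip(a, b))

def trait_value_misclassification(pops, people):
    """CT-style rate: share of people nearer (Euclidean) to another population's mean genotype."""
    errors = total = 0
    for k, group in enumerate(people):
        for g in group:
            nearest = min(range(len(pops)),
                          key=lambda j: sum((x - 2 * p) ** 2 for x, p in zip(g, pops[j])))
            errors += nearest != k
            total += 1
    return errors / total

def dissimilarity_fraction(people, n_trials, rng):
    """Monte Carlo omega: chance a between-population pair is more alike than a within pair (ties count half)."""
    score = 0.0
    for _ in range(n_trials):
        k = rng.randrange(len(people))
        within = dist(*rng.sample(people[k], 2))
        a, b = rng.sample(range(len(people)), 2)
        between = dist(rng.choice(people[a]), rng.choice(people[b]))
        score += 1.0 if between < within else 0.5 if between == within else 0.0
    return score / n_trials

rng = random.Random(3)
results = {}
for n_loci in (10, 100, 1000):
    pops = make_populations(n_loci, 2, 0.15, rng)
    people = [sample_people(f, 100, rng) for f in pops]
    results[n_loci] = (trait_value_misclassification(pops, people),
                       dissimilarity_fraction(people, 1000, rng))
    print(n_loci, results[n_loci])   # (CT-style rate, omega)
```

In this setting CT should fall towards zero with fewer loci than ω does. That mirrors the paper's point that correct assignment to a population's trait value can coexist with substantial overlap between individuals.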
If the number of loci is increased to 100, individuals are more genetically similar to members of a different population ~20% of the time, and even with 1,000 loci, ω is ~10%. The authors do state that for these very geographically separated populations it is possible to reduce this statistic to 0% when tens of thousands of loci are used, meaning that individuals will then always be more similar to members of their own population. But the paper notes that humans are not distributed into geographically separated populations, and that omitting intermediate regions may produce a false impression of distinctiveness in human diversity. The paper supports the observation that "highly accurate classification of individuals from continuously sampled (and therefore closely related) populations may be impossible". Furthermore, the results indicate that clustering analyses and self-reported ethnicity may not be good estimates of genetic susceptibility to disease. Witherspoon et al. conclude that: