User:WeijiBaikeBianji/sandbox8

The relationship between race and genetics is relevant to the controversy concerning race. In everyday life many societies classify populations into groups based on phenotypical traits and impressions of probable geographic ancestry and socio-economic status; these are the groups we tend to call "races". Because the patterns of variation of human genetic traits are clinal, with a gradual change in trait frequency between population clusters, it is possible to statistically correlate clusters of physical traits with individual geographic ancestry. The frequencies of alleles tend to form clusters where populations live closely together and interact over periods of time. This is due to endogamy within kin groups and lineages or within national, cultural or linguistic boundaries. This causes genetic clusters to correlate statistically with population groups when a number of alleles are evaluated. Different clines align around the different centers, resulting in more complex variations than those observed comparing continental groups.

For example, if a person has light skin, light hair and blue eyes, a combination of traits that seems to have evolved in Northern Europe and is found at a high frequency there, it is probable that the person has some recent European ancestry. By extension, according to the racial categories in use in North America, that person is likely to be classified by others, and to self-identify, as "white". In a similar way, genetic analysis enables us to determine the geographic ancestry of a person, pinpointing the migrational history of that person's ancestors with a high degree of accuracy, and by inference the probable racial category into which they will be classified in a given society. In that way there is a distinct statistical correlation between gene frequencies and racial categories. However, because all populations are genetically diverse, because there is a complex relation between ancestry, genetic makeup and phenotype, and because racial categories are based on subjective evaluations of the traits, there are no specific genes that can be used to determine a person's race.

Research in genetics offers a means to classify humans that is more precise than broad phenotypically based racial categories, given that genetics can provide a much more complex analysis of individual genetic makeup and geographic ancestry than self-identified membership of a racial category. With a blood transfusion, for example, it is vital to know the genetically determined blood type of the donor and recipient, but it is not helpful to know their respective geographic ancestries. Most physical anthropologists consider race to be primarily a social category that does not correspond significantly with biological variation, but some anthropologists, particularly forensic anthropologists, consider race a useful biological category. They argue that it is possible to determine race from physical remains with a reasonable degree of certainty; what is identified is the geographic phenotype. Medical practitioners also sometimes argue that racial categories can be used successfully as proxies to assess the risk of heritable illnesses that occur with different frequencies among populations of different geographic ancestries. Others argue that this use may be problematic because it risks underestimating the risk for individuals from ethno-racial categories that are not considered high-risk and overestimating the risk in populations that are, resulting in stigmatization.

Genetic variation
Genetic variation arises from mutations, from migration between populations (gene flow) and from the reshuffling of genes through sexual reproduction. Variation is counteracted by natural selection and genetic drift, for example the founder effect, in which a population is founded by a small number of initial founders and hence has a correspondingly small degree of genetic variation. Epigenetic inheritance involves heritable changes in phenotype (appearance) or gene expression caused by mechanisms other than changes in the DNA sequence.

Human phenotypes are highly polygenic (dependent on interaction by many genes) and are influenced by environment as well as genetics.

Nucleotide diversity is based on single mutations, single nucleotide polymorphisms (SNPs). The nucleotide diversity between humans is about 0.1 percent (one difference per one thousand nucleotides between two humans chosen at random). This amounts to approximately three million SNPs (since the human genome has about three billion nucleotides). There are an estimated ten million SNPs in the human population.
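
The arithmetic behind the three-million figure can be checked directly. The short Python sketch below is illustrative only; the constant and function names are hypothetical, and the inputs are the rounded values quoted above rather than measured data.

```python
# Back-of-the-envelope check of the figures quoted in the text.
GENOME_LENGTH = 3_000_000_000   # approximate number of nucleotides in the human genome
NUCLEOTIDE_DIVERSITY = 0.001    # 0.1 percent: one difference per 1,000 nucleotides


def expected_pairwise_differences(genome_length: int, diversity: float) -> int:
    """Expected number of single-nucleotide differences between two random genomes."""
    return round(genome_length * diversity)


if __name__ == "__main__":
    print(expected_pairwise_differences(GENOME_LENGTH, NUCLEOTIDE_DIVERSITY))
```

Note that this counts the differences expected between two randomly chosen individuals; the larger estimate of ten million refers to SNPs segregating anywhere in the whole population.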

Research has shown that non-SNP (structural) variation accounts for more human genetic variation than single nucleotide diversity. Structural variation includes copy-number variation and results from deletions, inversions, insertions and duplications. It is estimated that approximately 0.4 percent of the genomes of unrelated people differ, apart from copy number. When copy-number variation is included, human-to-human genetic variation is estimated to be at least 0.5 percent.

Trait, protein and gene studies
Early classification attempts measured surface traits. Before the discovery of DNA, scientists used blood proteins (the human blood group systems) to study human genetic variation. Research by Ludwik and Hanka Herschfeld during World War I found that the incidence of blood groups A and B differed by region; for example, among Europeans 15 percent were group B and 40 percent group A. Eastern Europeans and Russians had a higher incidence of group B; people from India had the greatest incidence. The Herschfelds concluded that humans comprised two "biochemical races", originating separately. It was hypothesized that these two races later mixed, resulting in the patterns of groups A and B. This was one of the first theories of racial differences to include the idea that human variation did not correlate with genetic variation. It was expected that groups with similar proportions of blood groups would be more closely related, but instead it was often found that groups separated by great distances (such as those from Madagascar and Russia) had similar incidences. Researchers currently use genetic testing, which may involve hundreds (or thousands) of genetic markers or the entire genome.

Structure
Several methods to examine and quantify genetic subgroups exist, including cluster and principal components analysis. Genetic markers from individuals are examined to find a population's genetic structure. While subgroups overlap when examining variants of one marker only, when a number of markers are examined different subgroups have different average genetic structure. An individual may be described as belonging to several subgroups. These subgroups may be more or less distinct, depending on how much overlap there is with other subgroups.

In cluster analysis, the number of clusters to search for (K) is determined in advance; how distinct the clusters are varies. The results obtained from cluster analyses depend on several factors:
 * Studying a large number of genetic markers facilitates finding distinct clusters.
 * Some genetic markers vary more than others, so fewer are required to find distinct clusters. Ancestry-informative markers (AIMs) exhibit substantially different frequencies between populations from different geographical regions. Using AIMs, scientists can determine a person's ancestral continent of origin based solely on their DNA. AIMs can also be used to determine someone's admixture proportions.
 * The more individuals studied, the easier it becomes to detect distinct clusters (statistical noise is reduced).
 * Low genetic variation makes it more difficult to find distinct clusters. Greater geographic distance generally increases genetic variation, making identifying clusters easier.
 * A similar cluster structure is seen with different genetic markers when the number of genetic markers included is sufficiently large. The clustering structure obtained with different statistical techniques is similar. A similar cluster structure is found in the original sample with a subsample of the original sample.
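
The dependence on the number of markers can be illustrated with a toy simulation. The Python sketch below is a simplification under stated assumptions, not the method of any study cited here: it swaps in plain k-means (Lloyd's algorithm) for the model-based programs used in practice, the two populations are synthetic, and every marker is assumed to have allele frequency 0.3 in one population and 0.7 in the other. All function names are hypothetical.

```python
import random


def simulate_genotypes(n_per_pop, n_markers, freq_a=0.3, freq_b=0.7, seed=0):
    """Return (genotypes, labels) for two synthetic populations.

    Each genotype is a list of allele counts (0, 1 or 2), one per marker.
    """
    rng = random.Random(seed)
    genotypes, labels = [], []
    for label, freq in ((0, freq_a), (1, freq_b)):
        for _ in range(n_per_pop):
            # Two allele draws per marker (diploid), each present with probability freq.
            genotypes.append(
                [(rng.random() < freq) + (rng.random() < freq) for _ in range(n_markers)]
            )
            labels.append(label)
    return genotypes, labels


def sq_dist(x, y):
    """Squared Euclidean distance between two genotype vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def k_means(points, k, n_iter=20):
    """Lloyd's algorithm with deterministic farthest-point initialisation.

    K is fixed by the caller, mirroring the "decided in advance" property.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    assignment = []
    for _ in range(n_iter):
        assignment = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centre if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def two_cluster_accuracy(n_markers, n_per_pop=30, seed=0):
    """Fraction of individuals whose cluster matches their simulated population."""
    genotypes, labels = simulate_genotypes(n_per_pop, n_markers, seed=seed)
    assignment = k_means(genotypes, k=2)
    agree = sum(a == b for a, b in zip(assignment, labels)) / len(labels)
    return max(agree, 1 - agree)  # cluster numbering is arbitrary


if __name__ == "__main__":
    for m in (1, 10, 200):
        print(m, two_cluster_accuracy(m))
```

With a single marker the two synthetic populations overlap heavily; with a few hundred markers the same small per-marker difference is enough for the clusters to separate. Real data differ in that allele-frequency differences vary from marker to marker and populations are not discrete.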

Distance
Genetic distance is genetic divergence between species or populations of a species. It may compare the genetic similarity of related species, such as humans and chimpanzees. Within a species, genetic distance measures divergence between subgroups.

Genetic distance correlates significantly with geographic distance between populations, a phenomenon sometimes known as "isolation by distance". Genetic distance may be the result of physical boundaries that restrict gene flow, such as islands, deserts, mountains or forests.

Genetic distance is measured by the fixation index (FST). FST is the correlation of randomly chosen alleles in a subgroup to a larger population. It is often expressed as a proportion of genetic diversity. This comparison of genetic variability within (and between) populations is used in population genetics. The values range from 0 to 1; zero indicates the two populations are freely interbreeding, and one would indicate that two populations are separate.
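
As a minimal sketch of the "proportion of genetic diversity" reading, the Python example below computes FST at a single biallelic locus as (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the pooled population and H_S is the average expected heterozygosity within subpopulations. It assumes equally sized subpopulations and known allele frequencies; the function names are hypothetical, and published estimators differ in how they correct for sample size and combine loci.

```python
def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(subpop_freqs: list[float]) -> float:
    """F_ST = (H_T - H_S) / H_T for equally sized subpopulations at one locus."""
    mean_p = sum(subpop_freqs) / len(subpop_freqs)
    h_total = heterozygosity(mean_p)  # H_T: diversity of the pooled population
    h_within = sum(heterozygosity(p) for p in subpop_freqs) / len(subpop_freqs)  # H_S
    if h_total == 0.0:
        return 0.0  # the locus is fixed everywhere, so there is nothing to partition
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    print(fst([0.5, 0.5]))  # identical frequencies
    print(fst([0.4, 0.6]))  # modest difference
    print(fst([0.0, 1.0]))  # different alleles fixed in each subpopulation
```

Identical allele frequencies give zero, and fixation of different alleles in the two subpopulations gives one, matching the endpoints described above.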

History and geography
Cavalli-Sforza has described two methods of ancestry analysis. Current-population genetic structure does not imply that differing clusters or components indicate only one ancestral home per group; for example, a genetic cluster in the US comprises Hispanics with European, Native American and African ancestry.

Geographic analyses attempt to identify places of origin, their relative importance and possible causes of genetic variation in an area. The results can be presented as maps showing genetic variation. Cavalli-Sforza and colleagues argue that if genetic variations are investigated, they often correspond to population migrations due to new sources of food, improved transportation or shifts in political power. For example, in Europe the most significant direction of genetic variation corresponds to the spread of agriculture from the Middle East to Europe between 10,000 and 6,000 years ago. Such geographic analysis works best in the absence of recent large-scale, rapid migrations.

Historic analyses use differences in genetic variation (measured by genetic distance) as a molecular clock indicating the evolutionary relation of species or groups, and can be used to create evolutionary trees reconstructing population separations.

Validation
Results of genetic-ancestry research are supported if they agree with research results from other fields, such as linguistics or archeology. Cavalli-Sforza and colleagues have argued that there is a correspondence between language families found in linguistic research and the population tree they found in their 1994 study. There are generally shorter genetic distances between populations using languages from the same language family. An exception to this rule is the Sami, who are genetically associated with populations speaking languages from other language families. The Sami speak a Uralic language, but are genetically primarily European. This is argued to have resulted from migration and interbreeding with Europeans while the Sami retained their original language. Agreement also exists between research dates in archeology and those calculated using genetic distance.

Ancestral populations
A 1994 study by Cavalli-Sforza and colleagues evaluated genetic distances among 42 native populations based on 120 blood polymorphisms. The populations were grouped into nine clusters: African (sub-Saharan), Caucasoid (European), Caucasoid (extra-European), northern Mongoloid (excluding Arctic populations), northeast Asian Arctic, southern Mongoloid (mainland and insular Southeast Asia), Pacific islander, New Guinean and Australian, and American (Amerindian). Although the clusters demonstrate varying degrees of homogeneity, the nine-cluster model represents a majority (80 out of 120) of single-trait trees and is useful in demonstrating the historic phylogenetic relationship among these populations.

The greatest genetic distance between two continents is between Africa and Oceania, at 0.2470. Based on physical appearance this is counterintuitive, since indigenous Australians and New Guineans resemble Africans (with dark skin and curly hair). This measure of genetic distance reflects the isolation of Australia and New Guinea since the end of the last glacial maximum, when the continent was isolated from mainland Asia due to rising sea levels. The next-largest genetic distance is between Africa and the Americas, at 0.2260. This is expected, since the longest geographic distance by land is between Africa and South America. The shortest genetic distance, 0.0155, is between European and extra-European Caucasoids. Africa is the most genetically divergent continent, with all other groups more related to each other than to sub-Saharan Africans. This is expected, according to the single-origin hypothesis. Europe has a general genetic variation about one-third that of other continents; the genetic contribution of Asia and Africa to Europe is thought to be two-thirds and one-third, respectively.

Recent studies have been published using an increasing number of genetic markers.

Population structures
Definitions of race are rooted in taxonomic classifications first developed in 18th- and 19th-century Europe. Race has overlapped with a debate about species known as the species problem.

Since the 1960s scientists have understood race as a social construct imposed on phenotypes in culturally determined ways, rather than a biological concept. A 2000 study by Celera Genomics found that human DNA does not differ significantly across populations. Citizens of any village in the world, in Scotland or Tanzania, have 90 percent of the genetic variability humanity has to offer. Only 0.01 percent of genes account for a person's appearance. Biological adaptation plays a role in bodily features and skin type. According to Luigi Luca Cavalli-Sforza, "From a scientific point of view, the concept of race has failed to obtain any consensus; none is likely, given the gradual variation in existence. It may be objected that the racial stereotypes have a consistency that allows even the layman to classify individuals. However, the major stereotypes, all based on skin color, hair color and form, and facial traits, reflect superficial differences that are not confirmed by deeper analysis with more reliable genetic traits and whose origin dates from recent evolution mostly under the effect of climate and perhaps sexual selection".

Group size
Research techniques can detect genetic differences between populations if enough genetic markers are used; for example, Japanese and Chinese East Asian populations have been distinguished. Sub-Saharan Africans have greater genetic diversity than other populations.

Between-group genetics
In 1972, Richard Lewontin performed an FST statistical analysis using 17 markers (including blood-group proteins). He found that the majority of genetic differences between humans (85.4 percent) were found within a population, 8.3 percent were found between populations within a race and 6.3 percent were found to differentiate races (Caucasian, African, Mongoloid, South Asian Aborigines, Amerinds, Oceanians, and Australian Aborigines in his study). Since then, other analyses have found FST values of 6–10 percent between continental human groups, 5–15 percent between different populations on the same continent and 75–85 percent within populations.

While acknowledging Lewontin's observation that humans are genetically homogeneous, A. W. F. Edwards in his 2003 paper "Human Genetic Diversity: Lewontin's Fallacy" argued that information distinguishing populations from each other is hidden in the correlation structure of allele frequencies, making it possible to classify individuals using mathematical techniques. Edwards argued that even if the probability of misclassifying an individual based on a single genetic marker is as high as 30 percent (as Lewontin reported in 1972), the misclassification probability nears zero if enough genetic markers are studied simultaneously. Edwards saw Lewontin's argument as based on a political stance, denying biological differences to argue for social equality.
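
The way a 30 percent per-marker error can shrink toward zero is easiest to see in a deliberately simplified model. The Python sketch below is not Edwards' own calculation, which rests on the correlation structure among loci; it assumes instead that markers are statistically independent, that each one alone points to the wrong population with the same probability, and that an individual is assigned by majority vote. The function name is hypothetical.

```python
from math import comb


def majority_vote_error(per_marker_error: float, n_markers: int) -> float:
    """Probability that more than half of n independent markers point to the wrong population."""
    if n_markers % 2 == 0:
        raise ValueError("use an odd number of markers to avoid ties")
    first_losing = n_markers // 2 + 1  # smallest number of wrong markers that loses the vote
    return sum(
        comb(n_markers, k)
        * per_marker_error ** k
        * (1 - per_marker_error) ** (n_markers - k)
        for k in range(first_losing, n_markers + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 11, 51, 101):
        print(n, majority_vote_error(0.30, n))
```

Under these assumptions the error is 0.30 with one marker and 0.216 with three, and it keeps falling as markers are added, which is the qualitative behaviour Edwards described.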

In The Ancestor's Tale Richard Dawkins devotes a chapter to the subject of race and genetics. After an extensive discussion of race and how the term is not well defined, Dawkins turns to the genetics of race. Dawkins describes the relatively low genetic variation between races, and geneticists' conclusion that race is not an important aspect of a person. These conclusions echo those of Lewontin, and Dawkins characterizes this view as scientific orthodoxy. However, Dawkins felt that reasonable genetic conclusions had been tainted by Lewontin's politics. Dawkins accepted Lewontin's position that our perception of relatively large differences between human races and subgroups, as compared to the variation within these groups, is a biased perception and that human races and populations are remarkably similar to each other, with the largest part by far of human variation being accounted for by the differences between individuals. Dawkins also agreed with Lewontin that racial classification had no social value, and was in fact destructive. Together with Edwards, Dawkins disagreed with Lewontin that this means race is of "virtually no genetic or taxonomic significance" and summarized Edwards' point that however small the racial partition of the total variation may be, if such racial characteristics as there are are highly correlated with other racial characteristics, they are by definition informative, and therefore of taxonomic significance. Dawkins went on to conclude that racial classification informs us about no more than the traits commonly used to classify race: the superficial, external traits like eye shape and skin color.

While acknowledging that FST remains useful, a number of scientists have written about other approaches to characterizing human genetic variation. Long & Kittles (2009) stated that FST failed to identify important variation and that when the analysis includes only humans, FST = 0.119, but adding chimpanzees increases it only to FST = 0.183. Mountain & Risch (2004) argued that an FST estimate of 0.10–0.15 does not rule out a genetic basis for phenotypic differences between groups and that a low FST estimate implies little about the degree to which genes contribute to between-group differences. Pearse & Crandall (2004) wrote that FST figures cannot distinguish between a situation of high migration between populations with a long divergence time, and one of a relatively recent shared history but no ongoing gene flow.

Anthropologists (such as C. Loring Brace), philosopher Jonathan Kaplan and geneticist Joseph Graves have argued that while it is possible to find biological and genetic variation roughly corresponding to race, this is true for almost all geographically distinct populations: the cluster structure of genetic data is dependent on the initial hypotheses of the researcher and the populations sampled. When one samples continental groups, the clusters become continental; with other sampling patterns, the clusters would be different. Weiss and Fullerton note that if one sampled only Icelanders, Mayans and Maoris, three distinct clusters would form; all other populations would be composed of genetic admixtures of Maori, Icelandic and Mayan material. Kaplan therefore concludes that, while differences in particular allele frequencies can be used to identify populations that loosely correspond to the racial categories common in Western social discourse, the differences are of no more biological significance than the differences found between any human populations (e.g., the Spanish and Portuguese).

Self-identification
Jorde and Wooding (2004) wrote that clusters from genetic markers did not correspond to subjects' self-identified race or ethnic group. However, the studies cited were based on relatively few genetic markers and deemed insufficient. In contrast, studies based on a higher number of genetic markers have found more agreement.

A 2005 study by Tang and colleagues used 326 genetic markers to determine genetic clusters. The 3,636 subjects, from the United States and Taiwan, self-identified as belonging to white, African American, East Asian or Hispanic ethnic groups. The study found "nearly perfect correspondence between genetic cluster and SIRE for major ethnic groups living in the United States, with a discrepancy rate of only 0.14 percent".

Paschou et al. (2010) found "essentially perfect" agreement between 51 self-identified populations and the population's genetic structure, using 650,000 genetic markers. Selecting for informative genetic markers allowed a reduction to fewer than 650, while retaining near-total accuracy.

Correspondence between genetic clusters in a population (such as the current US population) and self-identified race or ethnic groups does not mean that such a cluster (or group) corresponds to only one ethnic group. African Americans have an estimated 10–20-percent European genetic admixture; Hispanics have European, Native American and African ancestry. In Brazil there has been extensive admixture between Europeans, Amerindians and Africans, resulting in no clear differences in skin color and relatively weak associations between self-reported race and African ancestry.

Genetic-distance increase
Genetic distances generally increase continually with geographic distance, which makes a dividing line arbitrary. Any two neighboring settlements will exhibit some genetic difference from each other, which could be defined as a race. Therefore, attempts to classify races impose an artificial discontinuity on a naturally occurring phenomenon. This explains why studies on population genetic structure yield varying results, depending on methodology.

Rosenberg and colleagues (2005) have argued, based on cluster analysis, that populations do not always vary continuously and a population's genetic structure is consistent if enough genetic markers (and subjects) are included. "Examination of the relationship between genetic and geographic distance supports a view in which the clusters arise not as an artifact of the sampling scheme, but from small discontinuous jumps in genetic distance for most population pairs on opposite sides of geographic barriers, in comparison with genetic distance for pairs on the same side. Thus, analysis of the 993-locus dataset corroborates our earlier results: if enough markers are used with a sufficiently large worldwide sample, individuals can be partitioned into genetic clusters that match major geographic subdivisions of the globe, with some individuals from intermediate geographic locations having mixed membership in the clusters that correspond to neighboring regions." They also wrote, regarding a model with five clusters corresponding to Africa, Eurasia (Europe, Middle East, and Central/South Asia), East Asia, Oceania, and the Americas: "For population pairs from the same cluster, as geographic distance increases, genetic distance increases in a linear manner, consistent with a clinal population structure. However, for pairs from different clusters, genetic distance is generally larger than that between intracluster pairs that have the same geographic distance. For example, genetic distances for population pairs with one population in Eurasia and the other in East Asia are greater than those for pairs at equivalent geographic distance within Eurasia or within East Asia. Loosely speaking, it is these small discontinuous jumps in genetic distance—across oceans, the Himalayas, and the Sahara—that provide the basis for the ability of STRUCTURE to identify clusters that correspond to geographic regions". 
This applies to populations in their ancestral homes when migrations and gene flow were slow; large, rapid migrations exhibit different characteristics. Tang and colleagues (2004) wrote, "we detected only modest genetic differentiation between different current geographic locales within each race/ethnicity group. Thus, ancient geographic ancestry, which is highly correlated with self-identified race/ethnicity—as opposed to current residence—is the major determinant of genetic structure in the U.S. population".

Number of clusters
Cluster analysis has been criticized because the number of clusters to search for is decided in advance, with different values possible (although with varying degrees of probability). Principal component analysis does not decide in advance how many components to search for, and it has been used in an increasing number of studies.
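
For contrast with cluster analysis, the following Python sketch shows the core of principal component analysis on a small genotype matrix (rows are individuals, columns are markers). It is a bare-bones illustration under stated assumptions: it finds only the first component, uses power iteration on the sample covariance matrix rather than the decompositions used by real software, and all names are hypothetical. The output is a continuous coordinate for each individual rather than membership in one of K preset groups.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Leading eigenvector of the sample covariance matrix, found by power iteration.

    Returns (direction, centered_rows).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero starting vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along the current direction; keep the previous v
        v = [x / norm for x in w]
    return v, centered


def project(rows):
    """Coordinate of each individual along the first principal component."""
    v, centered = first_principal_component(rows)
    return [sum(a * b for a, b in zip(c, v)) for c in centered]


if __name__ == "__main__":
    toy = [[0, 0], [1, 1], [2, 2]]  # made-up allele counts at two markers
    print(project(toy))
```

Because individuals receive coordinates along each component, admixed or geographically intermediate individuals appear between groups instead of being forced into one of them.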

Utility
It has been argued that knowledge of a person's race is limited in value, since people of the same race vary from one another. Witherspoon and colleagues (2007) have argued that when individuals are assigned to population groups, two randomly chosen individuals from different populations can resemble each other more than a randomly chosen member of their own group. They found that many thousands of genetic markers had to be used for the answer to "How often is a pair of individuals from one population genetically more dissimilar than two individuals chosen from two different populations?" to be "never". This assumed three population groups, separated by large geographic distances (European, African and East Asian). The global human population is more complex, and studying a large number of groups would require an increased number of markers for the same answer. They conclude that "caution should be used when using geographic or genetic ancestry to make inferences about individual phenotypes", and "The fact that, given enough genetic data, individuals can be correctly assigned to their populations of origin is compatible with the observation that most human genetic variation is found within populations, not between them. It is also compatible with our finding that, even when the most distinct populations are considered and hundreds of loci are used, individuals are frequently more similar to members of other populations than to members of their own population".
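
The question posed by Witherspoon and colleagues can be explored with a small Monte Carlo experiment. The Python sketch below is a toy model, not a reproduction of their analysis: it assumes two synthetic populations, independent biallelic loci that all have allele frequency 0.4 in one population and 0.6 in the other, and a simple squared-difference distance. All names and parameter values are hypothetical.

```python
import random


def genotype(freq, n_loci, rng):
    """Diploid genotype: allele count (0, 1 or 2) at each of n_loci independent loci."""
    return [(rng.random() < freq) + (rng.random() < freq) for _ in range(n_loci)]


def distance(x, y):
    """Squared allele-count difference summed over loci."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def dissimilarity_fraction(n_loci, freq_1=0.4, freq_2=0.6, n_trials=500, seed=0):
    """Estimate how often a within-population pair is more dissimilar
    than a between-population pair."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        within = distance(genotype(freq_1, n_loci, rng), genotype(freq_1, n_loci, rng))
        between = distance(genotype(freq_1, n_loci, rng), genotype(freq_2, n_loci, rng))
        if within > between:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    for loci in (10, 100, 500):
        print(loci, dissimilarity_fraction(loci))
```

In this toy setting the fraction is substantial with a handful of loci and falls as loci are added, which is consistent with both halves of the quoted conclusion: individuals can be assigned to populations given enough data, yet with limited data members of different populations are often more alike than members of the same one.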

This is similar to the conclusion reached by anthropologist Norman Sauer in a 1992 article on the ability of forensic anthropologists to assign "race" to a skeleton, based on craniofacial features and limb morphology. Sauer said, "the successful assignment of race to a skeletal specimen is not a vindication of the race concept, but rather a prediction that an individual, while alive was assigned to a particular socially constructed 'racial' category. A specimen may display features that point to African ancestry. In this country that person is likely to have been labeled Black regardless of whether or not such a race actually exists in nature".

Race and medicine
There are certain statistical differences between racial groups in susceptibility to certain diseases. Genes change in response to local diseases; for example, people who are Duffy-negative tend to have a higher resistance to malaria. The Duffy-negative phenotype is highly frequent in Central Africa and the frequency decreases with distance from Central Africa, with higher frequencies in global populations with high degrees of recent African immigration. This suggests that the Duffy-negative genotype evolved in sub-Saharan Africa and was subsequently positively selected for in the malaria-endemic zone. A number of genetic conditions prevalent in malaria-endemic areas may provide genetic resistance to malaria, including sickle cell disease, thalassaemias and glucose-6-phosphate dehydrogenase deficiency. Cystic fibrosis is the most common life-limiting autosomal recessive disease among people of European ancestry; a hypothesized heterozygote advantage, providing resistance to diseases earlier common in Europe, has been challenged.

Information about a person's population of origin may aid in diagnosis, and adverse drug responses may vary by group. Because of the correlation between self-identified race and genetic clusters, medical treatments influenced by genetics have varying rates of success between self-defined racial groups. For this reason, some physicians consider a patient's race in choosing the most effective treatment, and some drugs are marketed with race-specific instructions. Jorde and Wooding (2004) have argued that because of genetic variation within racial groups, when "it finally becomes feasible and available, individual genetic assessment of relevant genes will probably prove more useful than race in medical decision making". However, race continues to be a factor when examining groups (such as in epidemiologic research). Some doctors and scientists, such as geneticist Neil Risch, argue that using self-identified race as a proxy for ancestry is necessary to obtain a sufficiently broad sample of different ancestral populations, and in turn to provide health care that is tailored to the needs of minority groups.