User talk:Graban

Human Genome Diversity Project
I have a mathematical problem reconciling the number of populations sampled by the Human Genome Diversity Project - Centre d'Étude du Polymorphisme Humain (HGDP-CEPH), as cited in the various locations indicated below, with the 8 human mitochondrial phylogeny haplogroups presented on pages such as Mitochondrial Eve and the 7 Y-chromosome haplogroups presented on pages such as Y-chromosomal Adam.

8 × 7 = 56; however, wherever the cited articles state the number of populations sampled explicitly, at most 52 populations are sampled in any of the HGDP-CEPH related pages I have come across to date (with one exception, described below). Multiple cited sources (Refs 3, 24, 26, and 27) on the Human genetic clustering page give an explicit number of sampled populations, which varies widely between 15 and 52; however, no cited source uses all 56 expected populations.

One source cited on the Human genetic clustering page lists an undisclosed number of populations, classified simply as geographically distributed individuals, which does not help resolve this perceived problem.

The Foundation Jean-Dausset - CEPH (accessed 2 Jan 2020) comes no closer to answering the question. It states that it possesses "A resource of 1063 lymphoblastoid cell lines (LCLs) from 1050 individuals in 52 world populations ...", but it neither claims to hold a complete set of populations nor explains why only 52 of the mathematically indicated 56 populations have been sampled.

As far as I understand, the extinction of one haplogroup would result in either 49 populations (7 × 7, if one mitochondrial haplogroup becomes extinct) or 48 populations (8 × 6, if one Y-chromosome haplogroup becomes extinct). However, since Human genetic clustering lists at least five cited sources with more than 49 populations sampled (Refs 2, 3, 24, 26, and 27), and the Foundation Jean-Dausset - CEPH has samples from 52 populations, an extinction event cannot explain the difference between the maximum of 52 sampled populations in any cited work or database and the expected number of 56 populations. Thus, from my perspective, there are four missing populations that have not (yet?) been sampled.
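To make the arithmetic behind this reasoning explicit, here is a minimal sketch in Python. It rests on the assumption I am making throughout, namely that each pairing of one mitochondrial haplogroup with one Y-chromosome haplogroup corresponds to exactly one population; the names used (`expected_populations`, `MAX_SAMPLED`, and so on) are purely illustrative.

```python
# Assumption (from the argument above, not from any cited source):
# one population per pairing of a mitochondrial haplogroup with a
# Y-chromosome haplogroup.
MT_HAPLOGROUPS = 8   # count presented on the Mitochondrial Eve page
Y_HAPLOGROUPS = 7    # count presented on the Y-chromosomal Adam page
MAX_SAMPLED = 52     # largest population count in any cited source


def expected_populations(mt: int, y: int) -> int:
    """Number of mt/Y pairings under the one-population-per-pairing assumption."""
    return mt * y


scenarios = {
    "no extinction": expected_populations(MT_HAPLOGROUPS, Y_HAPLOGROUPS),
    "one mitochondrial haplogroup extinct": expected_populations(
        MT_HAPLOGROUPS - 1, Y_HAPLOGROUPS
    ),
    "one Y-chromosome haplogroup extinct": expected_populations(
        MT_HAPLOGROUPS, Y_HAPLOGROUPS - 1
    ),
}

# Compare each expected count with the 52 populations actually sampled.
for label, count in scenarios.items():
    difference = count - MAX_SAMPLED
    print(f"{label}: {count} expected, {difference:+d} relative to {MAX_SAMPLED} sampled")
```

Under that assumption, none of the three scenarios yields 52: the no-extinction case overshoots by four, and both single-extinction cases fall short.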

If someone could point me to either a source that contains all 56 expected populations, or a source with an explicit reason why only 52 of the 56 expected populations exist to be sampled, I would appreciate such edification.

More troubling, one source cited on the Human genetic clustering page contains a superset of 185 populations, through the inclusion of "133 additional African populations and Indian individuals." This population set should only be possible if there are ~13 to 14 mitochondrial haplogroups, and ~13 to 14 Y-chromosome haplogroups (√185 claimed populations, assuming a near equal distribution of male/female haplogroups).
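The square-root estimate can be checked with a short sketch, again assuming one population per mitochondrial/Y-chromosome haplogroup pairing and a near-equal number of haplogroups on each side; the variable names are illustrative only.

```python
import math

CLAIMED_POPULATIONS = 185  # total quoted for the superset described above

# Assumption: equal numbers of mitochondrial and Y-chromosome haplogroups,
# with one population per pairing, so each side needs sqrt(total) haplogroups.
per_side = math.sqrt(CLAIMED_POPULATIONS)
lower, upper = math.floor(per_side), math.ceil(per_side)

print(f"sqrt({CLAIMED_POPULATIONS}) = {per_side:.2f}")
for mt, y in [(lower, lower), (lower, upper), (upper, upper)]:
    print(f"{mt} x {y} = {mt * y}")
```

The nearest whole-number pairings bracket 185 without matching it exactly, which is why I give the haplogroup counts as approximately 13 to 14 on each side.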

I am troubled because I believe the following problems exist either within Tishkoff et al., 2009, or with the way Tishkoff et al. is currently grouped with dissimilar studies.


 * Either the maximum number of populations is defined by the existing number of haplogroups (8 and 7, see above), or the claim of additional populations (185 total) needs to be substantiated through the inclusion of new haplogroups, which is not the case.


 * The authors may have performed some double counting when they state, "HGDP-CEPH plus 133 additional African populations and Indian individuals." Either there are ~20 African and Indian populations and individuals that have been counted twice (once in HGDP-CEPH and once in the 'additional' category), plus ~110 new populations, or there are ~130 new populations that have never before been accounted for in the HGDP-CEPH database. Neither case seems plausible, given all other available information.


 * The authors have misused the term "populations", and perhaps other terminology, in such a manner that the lexicon or taxonomy of this study is incompatible with that of all other studies listed in the comparative table on the page Human genetic clustering.

Could someone who is much more intimately involved in this field please either challenge the claim of "185 populations" made by Tishkoff et al., 2009, or compare Tishkoff et al. to the other studies using identical terminology, lexicon or taxonomy, particularly regarding the definition of "populations", to achieve clarity. Right now, Tishkoff et al. is incorrectly compared to the other studies in the table on the page Human genetic clustering because of the problematic or inconsistent definition of "populations". As such, it makes an insupportable claim about the number of populations according to the taxonomy or lexicon of all other studies in the table and of the HGDP-CEPH database.

Thanks,

Graban (talk) 09:21, 2 January 2020 (UTC)