User:Wobble/sandbox/2

Serious concerns
What is the validity of this article? For one thing, ethnic groups are not biologically valid populations. Is the criterion an ethnic group or a nation? These are not necessarily the same thing. What about groups that have been sampled numerous times from different populations, each with a different frequency of haplogroups/haplotypes? Is it OR to pool samples from different studies? Can we do this when different studies genotype for different SNP mutations (and when the mutations that define haplogroups change constantly)?

I've had a look at the Welsh data and they are not accurate at all. Wilson et al. (2001) is cited as the source for 89% of Welsh samples (Anglesey) being R1b, but Wilson et al. (2001) didn't genotype for R1b. The marker M343, currently used to define R1b, was probably not available at that time, and even if it was, it didn't define R1b then; for a time P25 defined R1b. This group probably represents P(xR1a1a), i.e. a genotyping for 92R7 (haplogroup P) followed by a genotyping of that set for M17 (R1a1a), so this "R1b" is actually "samples +ve for 92R7" minus "samples +ve for M17".

For another thing, the article states that these samples are derived from Anglesey, but that only applies to the samples cited to Wilson et al. (2001). The haplogroup I samples cite Rootsi et al. (2004), but Rootsi cites Capelli et al. (2003) for these samples, and Capelli et al. (2003) collected samples from three Welsh populations: Llangefni, Llanidloes and Haverfordwest. Rootsi et al. (2004) pooled these data to produce a large meta-sample, yet the original three populations were not identical in their proportions of haplogroups. Furthermore, there is an additional paper that sampled from Wales: Weal et al. (2002) sampled from Llangefni and Abergele. These papers genotyped for different Y-SNPs because different SNPs were available at different points in time.
There are two ways we can show these data. The first is to pool all "Welsh" data from the three different studies (Wilson et al. (2001), Weal et al. (2002) and Capelli et al. (2003)) and pretend that the result has some biological meaning. The problem is that in Weal et al. (2002) (and Wilson et al. (2001)) Hg 1 is actually P(xR1a1) (genotyped for 92R7, excluding subgroups of SRY1532.2/SRY10831.2, which defines group R1a1), which is not really R1b, though most of its members probably belong to R1b. Capelli et al. (2003), on the other hand, genotype for M173 (R1) and M17 (R1a1a), which produces haplogroup R1(xR1a1a); again, it's not really R1b, though most of its members probably do belong to R1b. Personally I think it is very close to OR if we claim that these are R1b. Who can defend this? And if we do this, are we then saying that a nation/ethnic group is a biologically valid population? Effectively we would be claiming that this population is homogeneous, but we already know that it is not.

The second way is to display the data separately: a Welsh (Abergele) row, a Welsh (Llangefni) row, a Welsh (Haverfordwest) row, etc. That would be more valid, but then these populations are not ethnic groups, and it doesn't really solve the problem of the haplogroup designation. It would be better to have a list of the SNP markers genotyped rather than a list of haplogroups. After all, haplogroup names change constantly, usually at least once a year at the present rate, but the SNP markers do not change; only the haplogroup that a marker defines changes. So if M17 has been genotyped and is present, it is always there, irrespective of whether M17 defines haplogroup R1a, R1a1 or R1a1a (and it has been used to define all of these groups at different times). Actually this article rather scares me: it's trying to pretend that ethnic groups are biological units, something I think most anthropologists would be very sceptical about.
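To make the pooling concern concrete, here is a minimal sketch in Python. The site names and counts are entirely hypothetical (they are not taken from Wilson, Weal or Capelli); the point is only that a single pooled figure can hide quite different per-site frequencies.

```python
# Hypothetical counts for illustration only; NOT data from any cited paper.
# Each entry: number of samples typed (n) and number carrying some marker.
samples = {
    "Site A": {"n": 80, "carriers": 72},
    "Site B": {"n": 60, "carriers": 42},
    "Site C": {"n": 60, "carriers": 36},
}

# Frequency of the marker at each sampling site, kept separate.
per_site = {site: d["carriers"] / d["n"] for site, d in samples.items()}

# Pooled frequency: all carriers over all samples, as a meta-sample would report.
total_carriers = sum(d["carriers"] for d in samples.values())
total_n = sum(d["n"] for d in samples.values())
pooled = total_carriers / total_n

for site, freq in per_site.items():
    print(f"{site}: {freq:.0%}")
print(f"Pooled: {pooled:.0%}")
```

With these made-up numbers the pooled value sits between the sites and matches none of them, which is exactly the information a single "Welsh" row would throw away.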

So I suggest that we have a rethink. I support having a list of populations (however they are defined in the paper being cited) and the frequency of each genotyped mutation within that population. That would be less confusing, less open to OR and more transparent.
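As a rough illustration of what such a marker-based list might look like, here is a sketch in Python. The class name `MarkerCount`, the helper `nested_exclusion`, the population, the source string and all counts are hypothetical and chosen only for illustration. The helper assumes the second marker was typed only within samples already positive for the first, as in the "92R7 then M17" scheme described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerCount:
    population: str  # as defined in the paper being cited
    source: str      # the citation the counts come from
    marker: str      # the SNP actually genotyped, not a haplogroup name
    tested: int      # samples typed for this marker
    derived: int     # samples positive for this marker

    @property
    def frequency(self) -> float:
        return self.derived / self.tested


def nested_exclusion(parent: MarkerCount, child: MarkerCount) -> int:
    """Samples positive for `parent` but not `child`.

    Assumes `child` was typed only within the parent-positive samples
    of the same population and source.
    """
    if (parent.population, parent.source) != (child.population, child.source):
        raise ValueError("counts come from different sample sets")
    if child.tested != parent.derived:
        raise ValueError("child marker was not typed on the parent-positive set")
    return parent.derived - child.derived


# Hypothetical rows; NOT data from any cited paper.
rows = [
    MarkerCount("Example town", "Hypothetical study", "92R7", tested=50, derived=40),
    MarkerCount("Example town", "Hypothetical study", "M17", tested=40, derived=2),
]

remaining = nested_exclusion(rows[0], rows[1])
# Label by what was typed, so the row stays valid when nomenclature changes.
label = f"{rows[0].marker}+ {rows[1].marker}-"
print(f"{label}: {remaining} of {rows[0].tested}")
```

Because each row records the marker rather than a haplogroup name, nothing needs rewriting when the nomenclature is revised, and counts from different studies cannot be combined silently.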