Wikipedia:Articles for deletion/Y-DNA haplogroups by ethnic groups


 * The following discussion is an archived debate of the proposed deletion of the article below. Please do not modify it. Subsequent comments should be made on the appropriate discussion page (such as the article's talk page or in a deletion review).  No further edits should be made to this page.

The result was keep. Nobody but the nominator is convinced that this violates WP:SYNTH. Fences & Windows 16:58, 26 November 2009 (UTC)

Y-DNA haplogroups by ethnic groups


This is a somewhat unusual request for deletion for a number of reasons. The article is heavily referenced and there is plenty of detail; in fact, a lot of work has gone into this article. However, there are numerous issues that make the existence of this article problematic. I think it is admirable that editors have tried to create a list of haplogroups by ethnic group, but unfortunately the effort has resulted in issues that could violate WP:NOR and in particular WP:SYNTH. It is practically impossible to have a single article that describes the haplogroup profile of every ethnic group in the world.

For those unfamiliar with haplogroups, the article List of R1a frequency by population lists the same information but for just one haplogroup, Haplogroup R1a (Y-DNA). The list is quite long, and the article is about 66 KB, which is relatively large. However, this is just one haplogroup; there are in fact hundreds more like haplogroup R1a. It is therefore practically impossible to cram all such information into one article. Although Y-DNA haplogroups by ethnic groups is nowhere near comprehensive, it is already hard to read, with one needing to scroll up and down to match a percentage with an ethnic group and a haplogroup. Not to mention that hundreds of haplogroups and several ethnic groups are missing. In short, this article is mission impossible. It was feasible in the early days of Y-chromosome genotyping, when there was very little information, but now there is a wealth of data available. I suggest that this project be abandoned and that editors instead focus on particular haplogroups, in a manner similar to List of R1a frequency by population. Wapondaponda (talk) 00:45, 19 November 2009 (UTC)
 * Can you explain your remarks about WP:SYNTH, which you made for example on the HGH page? Hopefully this is not referring to things like 10+2 is 12 or 10 out of 100 is 10%. Are there really cases which go beyond this? If not, then I do not really see the point of the deletion proposal.--Andrew Lancaster (talk) 12:08, 19 November 2009 (UTC)


 * Keep - many editors have been adding to this, and some may be using it as a resource for page improvements. I don't want to trip up other editors. It may not be possible to cram everything into one article, but not all Y-DNA haplogroups have their own list or are abundant enough to deserve a separate list. Muntawanda, don't you have something more important to do than stirring sleeping dogs? PB666 yap 06:22, 19 November 2009 (UTC)
 * Actually, I think this is something important. The article lists just 10 haplogroups in no particular order. Most studies list them in phylogenetic order: haplogroup A, B, DE, CF, D, E, etc. However, some of the basal haplogroups such as A, B, DE, D and E, which are among the most important haplogroups, are not included. Furthermore, there are no East Asian, Southeast Asian, Australasian or Native American ethnic groups included. How exactly is an ethnic group defined for this article? There are something like 800 languages spoken in New Guinea. Furthermore, Pdeitiker, I think you are aware, since you have been creating maps, that some ethnic groups have been sampled multiple times, resulting in different haplogroup frequencies. How would this be incorporated into an already illegible article? In short, in its current form the article isn't going anywhere. Better to delete it; if any information is to be salvaged, editors can try to create more manageable articles, say at the haplogroup level rather than at the global population level. Wapondaponda (talk) 07:13, 19 November 2009 (UTC)
 * I don't want to make that decision, and I don't understand why you are trying to force it. There are things on Wikipedia that are clearly less encyclopedic: lists of smoking jackets worn by ex-famous singers, lists of all kinds of what-nots. What you need to do, before questioning my decision, is ask the people who have added to that list whether they think they might use it at some time, and see how many wiki-folken would be upset if it were trashed. PB666 yap 08:48, 19 November 2009 (UTC)


 * Keep. Very useful, encyclopedic and informative, particularly the ability to click on each haplogroup and see a ranked list of the populations in which it is most strongly found, with link-through citations to the original sources. This is a very valuable data appendix to our main article Human Y-chromosome DNA haplogroup and our increasingly detailed articles on individual haplogroups. Having a central data page is much more maintainable, much more verifiable, and much easier to standardise, police and quality-control than scattering the information across the whole of Wikipedia; it also allows us a comprehensiveness which is simply not appropriate for pages on individual haplogroups. It's also an enormous help for reader verifiability of what we're writing: when a page discusses in general terms where a haplogroup has been found, readers can turn to this page and get the full detail of precisely where studies have been done and what they have reported.
 * Turning to Wapondaponda's specific issues, as for WP:NOR and WP:SYNTH I just don't see it. This article is straightforwardly relaying the data that has been reported; it is not adding any layer of original interpretation, nor synthesising any inferences or conclusions. The other charge has more merit: that this is a Europe-centric listing, which focusses on the haplogroups most common in Europe (indeed the horizontal listing is broadly ordered by their prevalence in Europe) and ignores the most important haplogroups in Africa and Asia. This is a fair comment, and has been brought up on the talk page. The answer, I think, is to create further pages, Worldwide prevalence of the haplogroups most commonly found in Africa and Worldwide prevalence of the haplogroups most commonly found in Asia, and to rename this one something like Worldwide prevalence of the haplogroups most commonly found in Europe. IMO that would be more reader-friendly than widening this page beyond what can reasonably fit in the width of one screen. But it doesn't change the fact that the outcome of this discussion should be keep. AfD is for articles that are irredeemably flawed, where there is no prospect of making them into anything that has a place on Wikipedia. This well-sourced presentation of data from the original sources does not fit into that class. Jheald (talk) 09:14, 19 November 2009 (UTC)


 * Comment. I don't think we should delete it until individual haplogroup lists, like the R1a one, are created. So, since the R1a stuff is safe and sound in its own article, I think it'd be OK to remove the duplicate info from this article. I don't think there is really any rush to delete this. I don't think 'incomplete' or 'disorganised' are reasons to delete outright; they are just reasons why we should try to improve it. So I guess something like this article could be a hub for other lists like the R1a list we've already got and intend to create. I wonder how we go about updating lists like these, though. For example, the R1a list has two "Caucasus - Armenia - Armenians" entries. Why are there two? What happens when more reliable data for "Caucasus - Armenia - Armenians" is found? Do we add it too? Shouldn't there be some kind of limit (would we really need a dozen entries for "Caucasus - Armenia - Armenians")? What kind of criteria do we use to limit what we add to such lists? Publication dates? The size of the samples taken? So I wonder what rules we should use for inclusion of data.--Brianann MacAmhlaidh (talk) 09:40, 19 November 2009 (UTC)
 * It's much more maintainable to keep the data in one place, not put it into the individual haplogroup articles. It also shows what other markers were tested, and what the results were.  If a dozen studies have been done in the Caucasus, with different sampling locations and different results, it is useful for all of them to be accessible.  This is easier when there is a single separate data page for the full detail (cf WP:SUMMARY), and it makes it easier to systematically add new studies.  Jheald (talk) 10:03, 19 November 2009 (UTC)
 * For the record, I support the concept of organizing haplogroup frequencies in a user-friendly manner that enables any user to quickly determine the haplogroup profile of an ethnic group, or the ethnic groups that possess a haplogroup. However, this article in its current form cannot achieve this. I think we can all agree that the amount of haplogroup data that is available cannot fit into a single article and be meaningful. I have recently been looking into just E1b1b lineages, and the data from this haplogroup alone is enough for one article. I would like to hear from those who support keeping this article how this large amount of information can best be handled. Wapondaponda (talk) 10:30, 19 November 2009 (UTC)

I also agree with Brianann's point that there are a lot of issues to consider. For example, sample size varies significantly, from as few as 28 to as many as 2000. Furthermore, many ethnic groups are listed multiple times, e.g. northern Egyptians vs southern Egyptians, and Albanians are listed six times. The result is an apples-and-oranges situation which risks becoming WP:SYNTH. The R1a table does list sample sizes, which I believe is more accurate. Wapondaponda (talk) 10:47, 19 November 2009 (UTC)
 * Please explain how multiple data sets for one population would be WP:SYNTH. I should mention that I have seen difficulties in some cases with overlapping data sets being presented as multiple data sets (for example, when a new article uses old data but adds to it; in the past this was often not clearly mentioned by authors). When I have detected this, I have always just deleted the older version. That seems good enough to me?--Andrew Lancaster (talk) 12:11, 19 November 2009 (UTC)


 * Strong Keep. DNA studies (as far as I have come across them in linguistics) rarely cover the whole planet. I would say that, proportionally, there are probably dozens of studies of marginal groups like the Aynu, Burushaski and Khoisan despite their low population numbers, because they're "interesting" groups on various levels. I doubt there is data for all groups. If we feel a particular group is under-represented, then we could add a request for more data or explain, if that's the case, that it doesn't exist. The things that might want changing are a) alternating background colour for rows and b) listing the same ethnicity in the same cells but with a line break between (e.g. Greeks 1-4). Akerbeltz (talk) 12:09, 19 November 2009 (UTC)
 * I would have to disagree about the whole planet, as all major population groups have been sampled. Part of formulating the Recent African origin of modern humans model involved sampling as many distinct populations as possible. If a single population didn't fit the RAO, then the theory would be disproved. So there is haplogroup information from South America, Polynesia, Australasia and of course China that is not currently in the article. I agree that having complete information is not a prerequisite for an article. However, it is also clear that the current article cannot accommodate more data if it is indeed to be representative of global haplogroup frequencies. The current article only deals with haplogroups that are found in Europe, even though there is a mention of some African and Middle Eastern populations. Several sub-haplogroups currently constitute E1b1b, but they are not listed in the article. Apart from E1b1a and E1b1b, no other subclades of haplogroup E are mentioned. The issue of subclades applies to all the haplogroups mentioned. Also missing are macrohaplogroups A, B, C, D, O, H, F and K, each having several different subclades. Each haplogroup requires two columns, one for the frequency and the other for sample size. So while the current content is only barely manageable, any additional information is likely to cause significant usability issues.


 * In light of the missing information, what should the future of this article be? I agree in principle with Jheald's suggestion that this article should be broken down into smaller articles. The smaller articles can be linked from a main list page, or by a category. My preference would be a breakdown by haplogroup first, because only then would it be viable to create a haplogroup profile by ethnic group. However, it might be easier to cut and paste information on Africa and the Middle East into new articles. Wapondaponda (talk) 14:08, 19 November 2009 (UTC)


 * Keep. We keep general articles as well as specific ones. It's good to have this material together, but it's assumed that those who want more details will go to the detailed articles. There should thus be links to Wikipedia articles on the haplotypes for each group of people, as well as to the haplotypes themselves. For the most part, such articles (or even sections of articles) do not now exist. DGG ( talk ) 18:53, 19 November 2009 (UTC)

Refocusing the discussion
I think there is a general consensus that the information contained in this article is useful and should be included in this encyclopedia. The debate is whether the information should be presented as it is currently or in some other form. The issues that need to be addressed include:
 * Usability: Is it possible to have an article that includes all the major haplogroups and their frequencies in the various ethnic groups around the world, all tabulated in one article?
 * Scope: The current article is actually Y-DNA haplogroups by "European" ethnic groups. The haplogroups of much of India, East Asia, Southeast Asia, Australasia, the Americas and much of Africa are not included in this article. Should we therefore rename this article to reflect its content?
 * At what level in the phylogeny should haplogroups be tabulated? That is, should macrohaplogroups or subclades be listed?
Of course, incomplete information is better than no information. But complete information is better than both incomplete information and no information. Currently this article is woefully incomplete. Wapondaponda (talk) 21:28, 19 November 2009 (UTC)

I have tried to add some information so as to get a feel for how including more of the global haplogroups would affect the article. This version has many of the haplogroups found around the world. The cells aren't properly aligned, because I tried to automate the process and it didn't work out that well. Manually tinkering with a wikitable that size involves a lot of repetitive edits, and the opportunity for mistakes is high. With most of the global haplogroups, the article is about 200 KB, and I haven't added any data or ethnic groups, just haplogroups. Unfortunately, I only have access to a low-bandwidth connection, so editing an article that size takes forever. Wapondaponda (talk) 15:45, 20 November 2009 (UTC)


 * I would think the answers are pretty self-evident: the usability and level question can be answered together by saying you divide into as many haplogroups as editors can agree works. The scope question is more one of "can this article be improved?" Or not?--Andrew Lancaster (talk) 20:52, 20 November 2009 (UTC)