Talk:Haplogroup E-M215/Archive 11

SNPs Tested
Andrew Lancaster is taking out notes on SNPs tested. Please discuss specifically why these notes are being taken out for each publication. Egenetics (talk) 15:18, 30 January 2013 (UTC)

With respect to downstream lineages the other M215/M35 section in the table implies: Egenetics (talk) 15:34, 30 January 2013 (UTC)
 * 1) Trombetta (2011), Cruciani (2004): anything that is E-M35 but NOT V68, V257, M123, V6, M293, V42, V92
 * 2) Semino (2004): anything that is E-M35 but NOT M123, M78, M81
 * 3) Hassan (2008): anything that IS E-M215 but NOT E-M78
 * 4) Wood (2005): anything that is E-M35 but NOT M78, M81


 * Either I am missing something basic here or you are. But at least please stop just reverting and try to understand first! Your comment above shows that you think this is only about your complex footnotes but then you must not be looking at all of the many things you reverted. I see several things just basically wrong in your reverts:-


 * The table I originally proposed only used the Cruciani data set because it is the only data set with a full and consistent testing of those markers. You changed the table by adding a lot of studies that tested very different sets of mutations, but you included a note to say that "?" would be the mark showing readers when an SNP was not tested and so I can accept that. BUT, you have placed ? in very few cells and placed "0.00%" in most of them which seems just wrong to me. 0.00% means there was testing, but there was not such testing in any of the cells where I placed "?" instead of "0.00%". I took my time and checked this in each of those publications you have added.
 * You are showing results for V257 testing which never happened in ANY of the new data you inserted. Again, this is just wrong. M81 testing does not give accurate results for V257 testing, even if it is approximately similar most of the time.
 * Coming to the footnotes, this is simply a question of how to present information. You are moving it all to footnotes, and that is misleading. Information which is one single column should all be defined one single way. People should never need to look at a footnote to understand the BASICS of what they are looking at and please consider that this is also simply not necessary. Any interested reader can see how much M35 is left over from the total M35 which is mentioned in every case, and whichever SNPS are shown as having been tested. The "?" signs can guide them about which SNPs were tested (if you will allow them to be accurate).--Andrew Lancaster (talk) 16:07, 30 January 2013 (UTC)
 * My understanding of the table is to show frequencies of E-M215 >50% in as many populations tested so far as possible, not just Cruciani's dataset. The markers "?" are only for cells where a value for the "EM215/M35+ other" field is reported. In other words, if you have such a value reported then we don't know from the other fields in the table which SNP would be positive or negative; however, by cross-referring with the footnote, you can check what specific SNPs were tested in the publication. If the "EM215/M35+ other" field shows a 0.00% then that means that everything is accounted for in the fields reported in the table, so there is no need to show "?".
 * Yes, that is correct, but we also know that all E-M81 is E-V257, but not all E-V257 is E-M81. The issue, of course, is the Borana, so either a note can be attached to it or some other similar solution can be sought.
 * I don't understand what you are saying about the footnotes. I just want the reader to know which SNPs were tested when he comes across the "EM215/M35+ other" field with a value in it. If you have another simpler and cleaner way to display this information, share it, but do not delete it. Egenetics (talk) 16:27, 30 January 2013 (UTC)
 * No we can not place 0.00% into any cell where there was no test. 0.00% means there was a test. People will not look up the paper to check, because we are saying these markers were tested if we write this. These must be removed, or else we remove all these new sources.
 * Similarly M35+ other should only be given a value when it has really been measured, and that means all the other ones named have been tested. Putting any number in will mean these tests have been done and no one will go looking at footnotes or original papers to check this.
 * Concerning V257 why did you revert the small fix I made???? You have deliberately made the table wrong.
 * If you allow the above fixes, the table already gives all the information needed, and your footnotes would no longer be needed. Your footnotes are not good again because normal readers are being given wrong information which they will supposedly then have to double check to realize that it is wrong. That is not the right way to make a data table.--Andrew Lancaster (talk) 16:57, 30 January 2013 (UTC)
 * What are you talking about? Please be precise. There are a given number of subclades in the table, and these are not ALL the subclades of E-M35. If a given dataset only tests for a subset of these SNPs and all the data fits into a subset of these SNPs, then naturally those subclades that fall outside of the tested subset will have a null value. For example, Pereira (2010) tested for M78, M81 and M123. It also reported E-M35*, which would mean anything that is E-M35 but not M78, M81 or M123. So for the case of the Tuareg from Mali EVERYTHING was either M78 or M81, and NOTHING was reported for E-M35*, which would mean that all the subclades of E-M35 that we have showing on the table that are NOT M78 or M81 will and should have NULL values.
 * If any subclades under M35 have been tested then you can sum the frequencies of those subclades and come up with a MINIMUM value for E-M35. For the special case of E-M215, if M281 has not been tested then the summation would also be a MINIMUM value for E-M215. I think this is straight forward.
 * Concerning V257, I reverted it because we need a solution on what to specify for the Borana first before you do the fix.
 * What wrong information am I EXACTLY giving to normal readers? Again, please be precise. If it is wrong then fix it; don't take out the whole thing. Egenetics (talk) 17:26, 30 January 2013 (UTC)
 * No, a 0.00% value means a test was done on the whole dataset and not one had a positive result. If there was no test done there should be a "-" or an "NA" or nothing or a ?, but never 0.00%. I do not know how to be more precise than saying that every case which you reverted from ? to 0.00% is a case where the authors simply did not report any such testing. You are marking "no test" and "all tested as negative" in exactly the same way! Of course that is misleading.
 * No, it is not straightforward because the definition of "other" is not clear or simple in your format. You are mixing apples and pears. The basic idea of any data table is to avoid things like that. Just put one type of number in each column.
 * Apparently concerning V257 you did not read my edit or see what I did even though you reverted it! I adjusted it to only show M81. I did the same for the Marrakesh Berbers. It makes the table correct, but you made it wrong.
 * I did fix it. You reverted without even understanding what you were reverting. I did not just delete things! I checked every source and I converted everything so it would be correct, and so that apples are in the apples columns.--Andrew Lancaster (talk) 17:49, 30 January 2013 (UTC)
 * If you can not give concrete and precise examples of what you are talking about, there is nothing more that I can add to what I wrote above. So I will ask you once again to give concrete and precise examples with the populations shown in the table. Egenetics (talk) 17:58, 30 January 2013 (UTC)
 * How precise can I be? You know precisely what you reverted or not? Here is a link: . Each and every precise exact specific case where you changed a "?" to a 0.00% is just plain simply wrong and misleading by a factor of 100.0000%. The change from M81 to V257 is also plain wrong, just wrong, with no possible other way of describing it, and you know it. And so on. It is all clear and precise. So please precisely self revert.--Andrew Lancaster (talk) 18:08, 30 January 2013 (UTC)
 * You can be a lot more precise, here was my specific example:
 * Population : Touareg from Mali
 * Source : Pereira et al. (2010)
 * SNPs Tested: M78,M81,M123
 * Reported results :
 * M78 : 9.09%
 * M81 : 81.82%
 * E-M35 (x M78,M81,M123) : 0.00%


 * So from the above information and the remaining fields that we show in the table that are downstream from E-M35, what POSSIBLE values can the following have: M293, V6 and V92.


 * Can they possibly have a value other than ZERO?? If so, please explain the other possibilities that the Tuareg from Mali can have for M293,V6 and V92 for the given dataset-Egenetics (talk) 18:52, 30 January 2013 (UTC)
 * The M81 is an issue that I have already acknowledged. Please address my above question separately, as it is a different issue. Egenetics (talk) 18:56, 30 January 2013 (UTC)
 * You are very demanding. Instead of making me type an inventory of mistakes 3 or 4 times over, if you do not know what you are reverting please just do not revert, and especially do not revert a second time? And before you demand answers to questions, why not make a half effort to show that you read the answers you already got?
 * Let's start with the "separate issue". If you admit you are wrong about V257 versus M81, revert yourself concerning that double revert of yours. If there is an open question, what on earth is it?
 * You have also twice reverted the formatting work I did on the source you added. But I am sure you did not even know you did that, right? But now that I point it out to you, will you please show some decency and revert yourself on that too?
 * Thirdly I will just take your Pereira example, and I will not go through every single cell: In the current table there is no column which is called "E-M35 (x M78,M81,M123)". There is a column for other types of E-M215, and by implication of the way the table is structured this column is for any E-M215 results which are not in any of the other named columns. Readers coming to this article will read it that way. They will not read your mind. So from the information in Pereira we can not fill in this column, and nor can we for example say how much M281 was in the population. So we can not fill in the M281 column. Wherever I placed a "?" in a cell, it was for such a case where we have no clear and obvious way of measuring it as zero or any other value, at least based on the published data.
 * If you do not understand then you should never have made reverts.--Andrew Lancaster (talk) 20:00, 30 January 2013 (UTC)
 * The open question is that the one case of the Borana dataset was E-V257+ M81-, therefore having him sit in a column labeled M81+ would not be correct, unless there is a qualifier/note either for the cell or the column.
 * Yes that was my mistake, see below in my response to Dougweller.
 * That there is no column called "E-M35 (x M78,M81,M123)" is precisely why I had my labels/notes for each reported unclassified value, the same labels you got rid of without explaining why they were 'wrong', as you claimed. Egenetics (talk) 23:18, 30 January 2013 (UTC)
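The core disagreement in this thread, whether an untested marker may be shown as 0.00%, can be illustrated with a minimal sketch. It assumes a hypothetical convention in which `None` means "not tested" and `0.0` means "tested, none found"; the `NT` label and the function name are illustrative only, and the figures are the Tuareg (Mali) values quoted above.

```python
from typing import Optional

# Hypothetical convention: None = marker not tested, 0.0 = tested and absent.
NOT_TESTED_LABEL = "NT"


def format_cell(value: Optional[float]) -> str:
    """Render one frequency cell without conflating 'not tested' with 0.00%."""
    if value is None:
        return NOT_TESTED_LABEL
    return f"{value:.2f}%"


# Tuareg (Mali) figures as quoted in this thread; M293, V6 and V92 were
# not among the SNPs tested, so they are stored as None rather than 0.0.
tuareg_row = {"M78": 9.09, "M81": 81.82, "M293": None, "V6": None, "V92": None}

rendered = {marker: format_cell(value) for marker, value in tuareg_row.items()}
print(rendered)
```

Keeping the two states separate in the data leaves the choice of display symbol ("?", "NA", a dash) as a pure presentation decision, and any extrapolated zero would have to be entered deliberately rather than by default.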

Egenetics, I'm struggling to figure out why you don't seem to understand what Andrew means, for instance "No we can not place 0.00% into any cell where there was no test. 0.00% means there was a test." and "Similarly M35+ other should only be given a value when it has really been measured," seem clear to me and are concrete examples. If I saw 0.00% in a table I would assume there was a test. I've reverted you.

Um - neither of you can remove any information from the article or replace any reverted information right now, as you are both at 3RR or over. Egenetics, I can see you are working hard, but you really need to listen to Andrew (and now to me). I appreciate the fact that you are open about previous IP addresses and would not want you to get blocked (or you either of course, Andrew, but I wouldn't expect you to go over 3RR as Egenetics has done). Dougweller (talk) 20:20, 30 January 2013 (UTC)
 * FYI the article has been on the edge of 3RR for almost a month now.--Andrew Lancaster (talk) 20:39, 30 January 2013 (UTC)


 * Dougweller, I was not aware that I can only make three reverts/day/article; I am now. However, it was frustrating to see Andrew Lancaster just throw away the work I input into the table last night. I had put in the "?" only for those populations that had results reported in the unclassified or "other" field of the table. That means that having a "?" for populations that have non-zero values in the unclassified field is not the same as having a "?" for those that have zero values in the unclassified field, because for the latter the remaining fields that we show in the table are most certainly 0.00%, whether tested or not, but not for the former. Thus, I had put in qualifying notes (SNPs tested) for each case of cells that had non-zero values in the 'other' field, and it was frustrating to see these notes being deleted as well, which is why I opened this section in the talk page.
 * With respect to the other reversions concerning E-M81/V257, although I agreed in principle, I was waiting for some type of solution on what we should specify for the Borana (from Kenya) case, as that would make their results wrong; we need some type of qualifying note or another. And with respect to the formatted reference, that was entirely my fault as I did not even see it when I did the full revert; I can blame that on my relative inexperience in WP editing. Egenetics (talk) 20:59, 30 January 2013 (UTC)


 * 3 reverts a day isn't an entitlement, read WP:3RR carefully - a pattern of 3 reverts a day is usually edit warring and gets an editor blocked. We can use NA or a dash where there was no test. Dougweller (talk) 22:01, 30 January 2013 (UTC)
 * How does using NA or a dash solve the problem I described above? Are you talking about using a NA or a dash for the datasets where no tests AND with zero values reported in the "other" field, while using "?" for those with no tests AND non-zero values reported? OR are you talking about simply using a NA or a dash in place of ALL the "?" that are out there now after your revert? In the case of the latter, I don't understand what it is supposed to solve, In the case of the former it would solve part of the problem, but it would not solve the fact of the "other" column showing values of "?" which would be untrue, the way to solve that would be to enter the reported values along with my labels.Egenetics (talk) 23:33, 30 January 2013 (UTC)
 * I obviously did not throw away any work. I adjusted the presentation of material in an orthodox way so it would not be misleading. It is obvious, for example, that we can not report testing that never happened. Now that I look at this I think I can see what you are trying to do. By adding the zeros for old articles published before the discovery of E-M293 in Henn et al, you then create your mixed bag column at the end, which is apparently trying to carry on this theme of "East Africa". I note in particular how this column relates to what you say about usage number (3) of "Horn of Africa" in discussion above. --Andrew Lancaster (talk) 05:37, 31 January 2013 (UTC)
 * You would think that exercising a rudimentary knowledge on phylogenetics and basic mathematics would have taken care of your repeated paranoia against some 'sinister hidden motive' of mine, first, I was accused of being  a user by the name of Causteau then I was accused of one called Munwandi, now I am purposefully putting zeros into cells to make it 'carry an East African theme', that absolutely doesn't even make sense, it just shows that with all your experience at Wikipedia comes also a tremendous amount of baggage.
 * How many populations from those listed in the table would likely have some significant levels of E-M293 from what we know? The answer is 2, the Maasai and the Iraqw from Tanzania, again of these 2 populations how many did I have zeros entered in the untested variants of E-M35, the answer is 1 and that is for the Iraqw, why? Again, Because ALL their E-M35 variants belonged to E-M78, so we know that none of the other variants listed in the table can have values other than zero. But for the Maasai, since they possibly had other variants, which I had labeled as E-M35(x M78,M81), I left the cells for the other variants with a “?”. Which makes your silly accusation fall flat on its face, my arguments for using East Africa instead of the Horn of Africa throughout the entire article stands absolutely on its own merits, which is stick to the sources and use consistent regional terminology throughout the article that comes from these sources and not using terminology that you wish to use. Egenetics (talk) 14:38, 31 January 2013 (UTC)
 * The above argument is WP:SYNTH based on primary sources or at least borderline. The onus is on you concerning those points, but it never justified knee jerk reverts which removed several different things that you do not even apparently know you removed. Concerning the merits of your arguments aimed at removing all mention of the region formerly known as the Horn of Africa, I sure got the main one: in your own words you are "not interested" in arguing whether your words mean the same as the sources, you "just want to show that your terminology usage is DIFFERENT from what is cited in the primary sources". This was discussed at WP:RSN and the merits of your arguments did not stand. I think I have rarely seen such unanimity on WP. But please respond above, in the section about that issue. Concerning the table, please see below.--Andrew Lancaster (talk) 15:24, 31 January 2013 (UTC)

Some suggestions for Egenetics to respond to:
 * Concerning the "?" symbol I can see how it might also mislead, because at least in a handful of cells we can make fairly obvious extrapolations of what the number would be.
 * One obvious solution is simply to use another symbol such as "-" or an abbreviation such as "NA" (meaning of which, "not tested", to be noted above the table).
 * I personally think there is no need to insert extrapolations in this rather simple table. Remember this article has daughter articles with more data, and indeed nothing is stopping anyone adding more tables to show other things.
 * However we could discuss particular cells where you think extrapolations might be reasonable. But the onus is on you as the proposer of this that you should propose these cases on the talk page. You need to make sure that your extrapolations are considered obvious to other editors, because otherwise we conflict with WP:RS.
 * If we do agree to put in any extrapolations in any cells, I believe they should be formatted and annotated in a distinct and clear way to indicate that they are not test results.
 * There might also be some cases where merging some cells from one row can help present uncertainty and lack of testing in a particular data set. But again I would ask to see it discussed and justified in each case. This could otherwise certainly go wrong. For example consider that not all the papers even test M35 (Hassan).
 * As mentioned before, I do not believe the footnotes you originally proposed in a mixed bag column at the end solve any of these problems for a normal reader. Even a reader who is used to this subject would still have to make guesses about what is intended in each column.--Andrew Lancaster (talk) 09:58, 31 January 2013 (UTC)

PS added after reading the rant above: the idea that extrapolated results should not misleadingly look like real test results is a standard one, and not just some crazy demand I made up. And as far as I am aware avoiding extrapolations altogether is the more approved method. Also, even where extrapolations are obvious, let's say E-M35=E-M78, the meaning of this will be obvious to readers interested in that subject WITHOUT adding lots of fake results and complex footnotes. But anyway there are some ideas above for consideration.--Andrew Lancaster (talk) 15:29, 31 January 2013 (UTC)

Like I explained to Dougweller above, what cells are you putting the NA or dash symbols in ?

I will outline the problem and my proposal to the solution again, there are two questions that need to be asked on how to represent the untested populations for the various E-M35 variants.
 * 1) Do these populations have a zero value reported for their respective E-M35* results?
 * 2) Do these populations have a non-zero value reported for their respective E-M35* results?

Depending on the answer to the above questions, the non-tested E-M35 variant cells need to be addressed with different symbols. For example, if the populations fall in category (1) above they could have an NA, dash or 0%, and if they fall in category (2) above they could have a “?”. Whatever the case, they need to be distinguished from each other depending on the answer to the above questions.

Here below is a concrete example using 2 of the populations that are to be found in the Wood (2005) publication.

Looking at the above, obviously the Iraqw fall into the 1st category above and the Maasai fall into the 2nd, so for anything that is downstream of E-M35 and as is shown in the current table of the article, this is roughly how they could be reported,

Notice I used an NA for case 1 (Iraqw), but anything can be used really, including a 0.00%, which would still be true; the point is to differentiate it from the symbols entered for case 2 (Maasai). Notice also the * I have by the value of the 'other M215+' column of the Maasai. This is where my deleted footnotes belong, because it is absolutely essential for the reader to know what SNPs were tested and what makes them fall in the 'other M215+' group.

The next question would be what values to enter for the E-M215 field for these populations, well it is true the E-M215 SNP was not tested for them, but it is also true that all E-M35 belongs to E-M215, so it would be incorrect to have a “?” in that field too because of the fact that we KNOW that these populations have a MINIMUM of E-M215 that is equivalent to the E-M35 amount reported. This is all I have for right now, I will add some more issues to it later Egenetics (talk) 17:28, 31 January 2013 (UTC)
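The "minimum value" argument made above (summing tested subclades gives a floor for E-M35, and hence for E-M215, since all E-M35 belongs to E-M215) can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the subclades passed in are assumed to be mutually exclusive, and the figures are the Tuareg (Mali) values quoted earlier in this thread rather than the Wood (2005) rows.

```python
from typing import Mapping, Optional


def minimum_parent_frequency(subclades: Mapping[str, Optional[float]]) -> float:
    """Lower bound for the parent clade from mutually exclusive subclades.

    Untested subclades (None) contribute nothing, so the result can only
    understate the true parent frequency, never overstate it.
    """
    tested = [value for value in subclades.values() if value is not None]
    return round(sum(tested), 2)


# Values quoted in this thread; M293 stands in for any untested marker.
tuareg = {"M78": 9.09, "M81": 81.82, "E-M35*": 0.00, "M293": None}

e_m35_minimum = minimum_parent_frequency(tuareg)
# All E-M35 is E-M215, so the same figure is also a floor for E-M215.
e_m215_minimum = e_m35_minimum
print(e_m35_minimum)
```

A figure produced this way is a derived bound, not a test result, so if it is shown in the table it would need to be marked as such.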

In answer to your 2 questions all cells I have marked as "?" are cells where no testing was done. Concerning your table, I do not think that your distinction between NA and ? has any obvious definition that a reader can make sense of. But anyway, how about this way of presenting it, using a format to show extrapolated numbers, and simply uniting cells where the divisions are not known: Or But the problem will be that this type of extrapolation is only possible in a few of the cases where you want to fill in the cells with something. Some of the extrapolations you have wanted to insert are based on "reasonable guesses", not extrapolations, and I think these can NOT be used at all. And as mentioned, I think footnotes are not a way to make the last column work. A table should give its basic information without looking at footnotes. Footnotes are for additional information. I think you have to ask yourself why you want that last column so much at all, and why you want to fill in every cell. None of the table adjustments I have proposed mean any true loss of information except for those two V257* individuals, (it is easy to see that M35=M78 there is nothing left) and I do not think they were the subject here in this part of the article. This part of the article is about total M215 and only where it is most frequent. Why overload it? We can put other information into other tables in other sections. I think you have to remember the context of the table.--Andrew Lancaster (talk) 18:42, 31 January 2013 (UTC)

I think your first option is workable. I like it better than the second one because (a) the label “other M215+” would include the Hassan data and (b) it retains the figures from the E-M35*(x...) data, while the other option does neither of these. If we use the 1st option then the united cells per row would differ per publication. For instance, the above example works for the Wood (2005) publication; for Semino (2004) and Pereira (2010) only the “M35+ M293+, M35+ V92+, M35+ V6+, Other M215+” fields need to be united, since they tested for M123; similarly for Hassan (2008) everything except M78 needs to be united. This would roughly explain what I was trying to relay, but would ultimately lose some precision in explanation due to the footnotes not being present.

The next issue is how to handle the entries in the E-M215 field and, of course, the few oddball E-V257+ M81- cases. Egenetics (talk) 19:22, 31 January 2013 (UTC)


 * I do not follow your explanation very well because won't any option have some problems with some of these sources in reality? For example concerning the two options above, one will work better for some sources, and the other for others. Surely that is inevitable? Personally I propose deleting the last column. Perhaps the information in it is ironically going to be easier for most readers to understand if the column is not even there? (Crazy I know.) But in the meantime I will try to make the table here for the talkpage.
 * On to the second point I do not get why you keep putting off discussion of V257 and M81 as it seems simple to count the V257 men involved and then we just have to decide are we showing M81 or V257. If V257 then we pretty much have to delete all sources except Trombetta et al unless we are going to make this table ridiculous and forget the aim of informing. Perhaps you did not notice that the exact difference is explained in the supplementary table of Trombetta et al? So getting the numbers involves no guessing. This is just a presentation judgement.--Andrew Lancaster (talk) 20:28, 31 January 2013 (UTC)


 * In a nutshell, what I am saying is that the first option retains the most information.


 * I am not putting off the E-V257+ M81- issue; this is not the only thing I'm doing. I don't think getting rid of all those populations just for 2 oddball ones is reasonable. Here is a proposal: leave the column as M81+ and put in 0.00% and 65.52% for the Borana and Marrakesh Berbers respectively, then in the "Other M215+" column put in 14.28% and 3.45% for the same two populations respectively. Egenetics (talk) 22:00, 31 January 2013 (UTC)
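The proposal above can be sketched as a small row builder. It assumes, purely for illustration, that the "Other M215+" value in these two rows consists only of the V257+ M81- cases; the function name and the derived "V257 total" key are hypothetical and are not columns of the article table.

```python
def split_v257_row(m81_pct: float, v257_without_m81_pct: float) -> dict:
    """Keep the column as M81+ and move V257+ M81- cases to 'Other M215+'."""
    return {
        "M81+": m81_pct,
        "Other M215+": v257_without_m81_pct,
        # Every M81 is also V257, so the V257 total is the sum of both parts.
        "V257 total (derived)": round(m81_pct + v257_without_m81_pct, 2),
    }


# Percentages as given in the proposal above.
borana = split_v257_row(0.00, 14.28)
marrakesh_berbers = split_v257_row(65.52, 3.45)
print(borana)
print(marrakesh_berbers)
```

Laid out this way, no V257 result is reported for sources that never tested V257, while the two V257+ M81- cases remain visible in the residual column.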

Other M215 version
Just for now I am going to use "NT" for not tested. Just an idea, but maybe using NA or any common symbol also keeps people from thinking about what the cells mean. Italics, for now, means extrapolated.

Other M35 version
To me these both look over-loaded, but if my aim was to avoid empty cells the second option wins. The big gaps are only now for Hassan in the second version, and in reality that is also the truth: that data holds big mysteries because of those missing markers.--Andrew Lancaster (talk) 21:31, 31 January 2013 (UTC)
 * By the way, it would be good if you would check for errors. This type of work is awkward. At least the way I just did it online.--Andrew Lancaster (talk) 21:37, 31 January 2013 (UTC)


 * Yes the second table (which was the first option in your proposal), is the better option. I will check the numbers for errors Egenetics (talk) 22:05, 31 January 2013 (UTC)


 * The numbers look fine, as I had done the calculations before. There are just a few minor tweaks (format: decimal point, % etc.) and my proposal on how the M81 should be reflected in the Borana and the Berbers that I talked about above and have added below. One other thing: the referencing that you did earlier today by adding the Trombetta (2011) source to all the stand-alone Cruciani (2004) refs for 'consistency' was not really correct. Trombetta (2011) only tested the individuals that had an M35* or M215*, so datasets that did not have these individuals were not retested. As an example, Mozabite Berbers do not need to show Trombetta (2011) as a source; Cruciani (2004) is enough. Egenetics (talk) 22:29, 31 January 2013 (UTC)
 * Borana M78 was wrong; adjusted from 70% to 71.43%. Egenetics (talk) 23:35, 31 January 2013 (UTC)
 * There is one more population that needs to be added from another source, the Nilo-Saharan speaking Datoga from Tanzania. Only E-M35 was tested and they had 19/35 --> 54.28%. The source is Tishkoff (2007), "History of Click-Speaking Populations of Africa Inferred from mtDNA and Y Chromosome Genetic Variation". We don't have this source in the master list so it needs to be added, and I am not sure how to add it to the master list myself. I have, however, added the population in the appropriate row in the table below. Egenetics (talk) 00:30, 1 February 2013 (UTC)
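Since several cells in the table are computed from raw counts, a small helper can keep the rounding consistent. This is only a sketch with a hypothetical function name; it uses the Datoga count quoted above. Note that rounding 19/35 to two decimals gives 54.29, whereas the 54.28% quoted above is the truncated value, so the table would need to pick one convention.

```python
def frequency_percent(carriers: int, sample_size: int) -> float:
    """Convert a carrier count into a percentage rounded to two decimals."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return round(100 * carriers / sample_size, 2)


# Datoga count quoted in this thread: 19 E-M35 carriers out of 35 samples.
datoga_e_m35 = frequency_percent(19, 35)
print(datoga_e_m35)
```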

I need to look again, but I noticed one thing of interest in that Tishkoff paper: None of the E3b mutations (M78, M81, M123, V6) or B2b mutations (P6 or P7) were observed among the Tanzanian samples, with the following exceptions—Datog: one M78+ individual, one V6+ individual; Mbugwe: one M78+ individual. So we can add some columns on that? (Also relevant to discussions of V6 distribution.)--Andrew Lancaster (talk) 03:59, 1 February 2013 (UTC)
 * Another new article! (Our timing to start working on these articles is good?) By my reading, for THIS table, the only relevant new(?) data which is SNP tested and over 50% is Libya and Egypt, and maybe we could add yet another "Morocco"? (Saudi, Yemeni, and Slovak data is STR prediction based, but also not >50% E-M35.) These Genographic Consortium papers are often very frustrating. I can not find any definition of population sizes and actually the paper says Cyprus, Egypt, Lebanon, Libya, Jordan, Morocco, Palestine, Syria and Tunisia are in fact "previously published data from our laboratory" but no papers are named, so maybe the samples are just on yhrd.org or something? For now I will just add the paper to our references. --Andrew Lancaster (talk) 10:28, 1 February 2013 (UTC)
 * We can perhaps use the STR haplotype spreadsheet for population size information, and also for some kind of feeling about the quality of this data. Analysis of the STR table to confirm it is the same population as the SNP reports, which the article says should amount to 2884. But below seems to show some slight variation, and there are also lots of "NA" samples that apparently had no clear SNP reading(?), and that would not be good news. (Failure of the test is likely to be non random with respect to haplogroup.) Another concern is that the populations are so vague. For example Libyan Tuaregs have been tested in a paper. Could that be the Libyan population? I am feeling some doubts about whether to use this paper.--Andrew Lancaster (talk) 12:11, 1 February 2013 (UTC)


 * Good Catch on Tishkoff (2007), The Material and Methods section said that they tested for 15 polymorphisms, but the results table only showed 10 of them.
 * I haven't looked into this new paper closely, but if it has just STR predicted haplogroup assignments and not SNP tested I don't think it is good enough. I will look into it later Egenetics (talk) 12:27, 1 February 2013 (UTC)
 * I think we can use this new paper, the numbers from the STR data sheet (Table S2) work out, i.e. they match with their frequency data (Table S3), that is if we report the sample size as (Total - NA), see below. Also, we would have to assume that the SNP tested is only E-M35.
 * Two concerns: Can you make the total population size line up with this number they give of 2884? And how do you interpret what those NA cases mean?--Andrew Lancaster (talk) 15:52, 1 February 2013 (UTC)
 * No, I am not sure where those extra 50 samples are coming from; maybe a typo? The reported frequencies (at least for E1b1b1) do however match up with what is reported in Table S3 for all the populations listed, so I think the 2884 number may be a typo. There is also the case of the 490 Turks that they do not list in the paper as either coming from other literature or from their 'previously published data'; I am guessing it is from the former.
 * I understand the NA case to mean an SNP defined sample but with <60% probability when the STR haplotype of the sample is run through a predictor . When it comes down to it the only new set of populations that this paper would add to our table are the Libyans and Egyptians, if we add the Cruciani (2007) data I mentioned earlier, that could represent the Egyptians (at least the Southern ones), the only ones missing would then be the Libyans. Egenetics (talk) 16:57, 1 February 2013 (UTC)
 * But I thought these were SNP tested, not predicted? Anyway, I am not strongly opposed or for. It might indeed be a good idea to fill in the patchwork and use it for the Libyans. But what about the Tunisians?--Andrew Lancaster (talk) 17:47, 1 February 2013 (UTC)
 * The Tunisians from Semino (2004) are in the table, although at much less frequency than what this study shows. In case you haven't read my comment above from yesterday, could you also remove the Trombetta (2011) citations you indiscriminately added yesterday to the Cruciani (2004) dataset, because that is wrong, that citation should only be added for a selected few of the Cruciani (2004) dataset, namely for the East Africans and the Marakesh berbers. Egenetics (talk) 18:21, 1 February 2013 (UTC)
 * I thought you had already done it? Anyway, I think you are familiar enough with tables now. Just need to get you using citation templates next. Concerning extra lines I am quite OK with your idea to add just this Libya line and Southern Egyptians from Tishkoff.--Andrew Lancaster (talk) 21:41, 1 February 2013 (UTC)

Egenetics (talk) 14:30, 1 February 2013 (UTC)

Another dataset that can be used is the Southern Egyptians dataset from Cruciani (2007), N=79, all E-M78, 40/79. Egenetics (talk) 14:42, 1 February 2013 (UTC)
 * Well spotted.--Andrew Lancaster (talk) 16:15, 1 February 2013 (UTC)

Should we include language info in the frequency table?
Many or maybe even all papers we are referring to, also mention language family. It is likely interesting to many readers.--Andrew Lancaster (talk) 11:38, 3 February 2013 (UTC)


 * It depends on the level. At the macro level almost all the populations listed are Afroasiatic speakers; segregating them by Afroasiatic branch may be unnecessary, and confusing as well. For instance, what language would most of the North Africans be listed under, Arabic or Berber? Another point is that we really do not have much column space left, considering the width of the current table.
 * My personal preference, if we are going to add any more columns, would be a column specifying the region where the populations come from, as tabulated in the relevant publications, as that would be more informative for the average reader. Egenetics (talk) 16:51, 3 February 2013 (UTC)
 * Your points about the languages make sense. Obviously you already know that I think adding non-standard geneticists' terms for geographical regions would add nothing helpful for a better understanding of either genetics or geography. The names we have are at least clear.--Andrew Lancaster (talk) 19:03, 3 February 2013 (UTC)

stepping back and looking at the article as a whole
Apologies already for a longish "rant". As we have a new editor busy here now I want to give my personal opinion about this article - or at least what worries and disappoints me about the article and the typical types of edits it gets.
 * The most striking problem is that if you read through the article, it looks (even more than an average WP article) "made by a committee".
 * The reason is that most editors who work on this article are VERY interested in certain small issues, and so they insert single words or phrases, and defend those edits hard, but they never look at the contexts they are editing in.
Two examples of what this causes would be the big footnotes in the M78 discussion, and indeed the new paragraph in the lead. Putting aside the question of how right these bits are, it must surely be obvious that there is too much repetition in this article, and sentences have been re-hacked so many times that any normal reader must surely find it hard to even work out what this article is about. In the past, I've made efforts to help normal readers by, for example, trying to explain in normal words what it means to be a common human Y haplogroup. But what happens is you write something like "it is most common in Africa" and then you get so many objections that we end up having a paragraph explaining exactly where it is found, which is already discussed in another section. That is why there is so much repetition in the article. Concerning that lead: I'd have no objection to simply deleting that second paragraph from the lead as it stands right now. It is achieving none of its original aim of helping lay readers. But I would prefer it if we could at least put some sort of "human language" comment to help less informed readers work their way into this subject matter. I will deliberately avoid making any suggestion! More generally I hope that I can encourage ALL editors interested in this article NOT to focus on single words and phrases, but to try to read through the whole narrative and imagine a non-expert reader.--Andrew Lancaster (talk) 20:45, 2 February 2013 (UTC)


 * OK, I will make a proposal, to replace the current second paragraph: E-M215 is thought to have its origins in Africa, and it is still most common in parts of Africa. But in modern populations it is also found in Eurasia, especially in Europe and the Middle East.--Andrew Lancaster (talk) 12:20, 4 February 2013 (UTC)
 * I am not sure exactly what you want to achieve, but I thought the 2nd paragraph was OK before you made a change here. The above proposal seems very broad to me. If there is a standard way haplogroups are introduced in Wikipedia, i.e. substructure, distribution, etc., then we should follow it, but I doubt such a standard has ever been created, so we should follow the basic standard procedures for how they are generally introduced in genetics papers. Egenetics (talk) 14:55, 4 February 2013 (UTC)
 * But we are not writing a genetics paper, and our article has quite a different format (no abstract in WP for example, and no sections on origins, distribution etc in published articles). And anyway do they have a "standard procedure"? There are actually no published articles about one haplogroup. I do not understand what is so complex or controversial about basic editing priorities like these:
 * Avoid repetition.
 * Make the lead broad and easy to understand. Put details in the body.
 * Are you really opposed to those sorts of things or do I misunderstand? Concerning what is normal on WP genetics articles, keep in mind that a lot of experienced Wikipedians feel uncomfortable with the whole genre of these articles. It has been suggested more than once that they should be deleted. My little rant above is my understanding of one of the biggest problems: too many of the editors are not interested in writing a readable encyclopedia article which any person with a reasonable education can read.--Andrew Lancaster (talk) 16:58, 4 February 2013 (UTC)

Are These Pages Regressing?
The E1b1a and E1b1b pages are now full of the more archaic labels E3a-M2, E-V38, and here E-M215. How can anyone tell that E3a-M2 and E-M215 are directly related to each other, if they are not E1b1a and E1b1b? The old terms are confusing. I would like to suggest that all the terms used are E1b1a and E1b1b, with a clarification at the top of the page that E3a-M2 and E-M215 have been replaced by E1b1a and E1b1b. Now isn't that a lot tidier? MrSativa (talk) 19:10, 1 August 2015 (UTC)
 * Eupedia has a much more concise and accessible overview of E1b1b, excuse me, E-M215. http://www.eupedia.com/europe/Haplogroup_E1b1b_Y-DNA.shtml MrSativa (talk) 20:02, 2 August 2015 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 15 external links on Haplogroup E-M215 (Y-DNA). Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20150402102624/http://journals.plos.org/plosone/article?id=10.1371%2Fjournal.pone.0096074%2F to http://journals.plos.org/plosone/article?id=10.1371%2Fjournal.pone.0096074%2F
 * Added archive https://archive.is/20121205040152/http://mbe.oxfordjournals.org/cgi/content/full/msm049/DC1?maxtoshow=&HITS=10&hits=10&RESULTFORMAT=&fulltext=cruciani&searchid=1&FIRSTINDEX=0&resourcetype=HWCIT to http://mbe.oxfordjournals.org/cgi/content/full/msm049/DC1?maxtoshow=&HITS=10&hits=10&RESULTFORMAT=&fulltext=cruciani&searchid=1&FIRSTINDEX=0&resourcetype=HWCIT
 * Added archive https://web.archive.org/web/20080406180100/http://hpgl.stanford.edu/publications/EJHG_2004_v12_p855.pdf to http://hpgl.stanford.edu/publications/EJHG_2004_v12_p855.pdf
 * Added archive https://web.archive.org/web/20090304100010/http://dirkschweitzer.net/E3b-papers/Hassan-Sudan-2008-AJPA.pdf to http://dirkschweitzer.net/E3b-papers/Hassan-Sudan-2008-AJPA.pdf
 * Added archive https://web.archive.org/web/20090305052142/http://dirkschweitzer.net/E3b-papers/KingAHG-08-72-205.pdf to http://dirkschweitzer.net/E3b-papers/KingAHG-08-72-205.pdf
 * Added archive https://web.archive.org/web/20120216123633/http://hpgl.stanford.edu/publications/AJHG_2004_v74_p000-0130.pdf to http://hpgl.stanford.edu/publications/AJHG_2004_v74_p000-0130.pdf
 * Added archive https://web.archive.org/web/20120216124036/http://hpgl.stanford.edu/publications/AJHG_2004_v74_errata.pdf to http://hpgl.stanford.edu/publications/AJHG_2004_v74_errata.pdf
 * Added tag to http://www.snp-y.org/files/fc197f8127f9bda2e22c6d314bb08ddb9fb887ff.onofri2006.pdf
 * Added archive https://web.archive.org/web/20130528022914/http://peer.ccsd.cnrs.fr/docs/00/51/83/10/PDF/PEER_stage2_10.1038/ejhg.2010.21.pdf to http://peer.ccsd.cnrs.fr/docs/00/51/83/10/PDF/PEER_stage2_10.1038%252Fejhg.2010.21.pdf
 * Added archive https://web.archive.org/web/20080506041100/http://www.ajhg.org/AJHG/abstract/S0002-9297(07)63221-2 to http://www.ajhg.org/AJHG/abstract/S0002-9297(07)63221-2
 * Added archive https://web.archive.org/web/20031125151213/http://hpgl.stanford.edu/publications/Science_2000_v290_p1155.pdf to http://hpgl.stanford.edu/publications/Science_2000_v290_p1155.pdf
 * Added archive https://web.archive.org/web/20060315210510/http://hpgl.stanford.edu/publications/AJHG_2002_v70_p265-268.pdf to http://hpgl.stanford.edu/publications/AJHG_2002_v70_p265-268.pdf
 * Added archive https://web.archive.org/web/20090305052141/http://publishing.royalsociety.org/media/proceedings_b/papers/RSPB20063627.pdf to http://publishing.royalsociety.org/media/proceedings_b/papers/RSPB20063627.pdf
 * Added archive https://web.archive.org/web/20110721091053/http://hpgl.stanford.edu/publications/AHG_2001_v65_p43.pdf to http://hpgl.stanford.edu/publications/AHG_2001_v65_p43.pdf
 * Added archive https://web.archive.org/web/20120218090544/http://download.cell.com/AJHG/pdf/PIIS0002929708005478.pdf?intermediate=true to http://download.cell.com/AJHG/pdf/PIIS0002929708005478.pdf?intermediate=true
 * Added archive https://web.archive.org/web/20080626054459/http://mbe.oxfordjournals.org/content/vol22/issue10/images/large/molbiolevolmsi185f04_ht.jpeg to http://mbe.oxfordjournals.org/content/vol22/issue10/images/large/molbiolevolmsi185f04_ht.jpeg

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 20:52, 29 October 2017 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 2 external links on Haplogroup E-M215 (Y-DNA). Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20150721164227/http://ubm.opus.hbz-nrw.de/volltexte/2015/4075/pdf/doc.pdf to http://ubm.opus.hbz-nrw.de/volltexte/2015/4075/pdf/doc.pdf
 * Added archive https://web.archive.org/web/20130730041839/http://www.cell.com/AJHG/image/S0002-9297(08)00592-2?imageId=gr1&imageType=large to http://www.cell.com/AJHG/image/S0002-9297(08)00592-2?imageId=gr1&imageType=large
 * Added tag to http://www.ebc.ee/EVOLUTSIOON/publications/Shen2004.pdf

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 11:00, 17 January 2018 (UTC)