Talk:Genome-wide association study

Difference between humans
Note: the familiar statement that humans differ from each other by only about 0.1 % seems inaccurate because the two haplotypes of a diploid human genome differ by about 0.5 %, and that was shown for an individual with European ancestry through both parents (PMID: 17803354). Differences might be greater with greater geographical variation in ancestry. Daniel haft (talk) 19:39, 22 April 2009 (UTC)

The statement seems correct for SNPs only and is also true for comparisons between a Han Chinese genome and the two European genomes (Nature 456, 60-65 (6 November 2008) | doi:10.1038/nature07484, see supplementary figure 2), while the 0.5% is a more complex measure (all heterozygous bases, i.e., SNP, multi-nucleotide polymorphisms [MNP], indels, [complex variants + putative alternate alleles + CNV]/genome size; [2,894,929 + 939,799 + 10,000,000]/2,809,547,336). 192.38.114.138 (talk) 13:14, 20 July 2009 (UTC)
 * This might be the reason for the difference. --hroest 07:40, 18 May 2010 (UTC)
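As a rough check of the arithmetic quoted above, the following sketch recomputes both percentages from the three counts and the genome size given in the comment. Reading the first term as the SNP-only count is an assumption based on the order of the list in that comment; the variable names are illustrative only.

```python
# Numbers copied from the comment above; which variant class each term
# represents is an assumption (first term read as the SNP-only count).
first_term = 2_894_929      # assumed: heterozygous SNP bases
second_term = 939_799       # assumed: other small variants (MNP, indels, ...)
third_term = 10_000_000     # assumed: bases covered by CNV
genome_size = 2_809_547_336


def percent_of_genome(bases, size=genome_size):
    """Return the given number of bases as a percentage of the genome size."""
    return 100.0 * bases / size


snp_only = percent_of_genome(first_term)
all_variants = percent_of_genome(first_term + second_term + third_term)

print(f"First term only: {snp_only:.2f} %")      # about 0.10 %
print(f"All three terms: {all_variants:.2f} %")  # about 0.49 %
```

Under that reading, the first term alone reproduces the familiar figure of about 0.1 %, and the full sum gives roughly 0.5 %, which is consistent with the explanation that the two numbers measure different things.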

Hypothesis free
I'm concerned about the statement that GWAS are "hypothesis-free." It's quite the opposite: every SNP tested is a hypothesis, and any analysis must be subject to corrections for multiple-hypothesis testing. —Preceding unsigned comment added by Uri.laserson (talk • contribs) 05:01, 15 October 2009 (UTC)
 * They are hypothesis-free compared to a single-gene study: there is no a priori hypothesis about which part of the genome might be involved, and all parts are tested. This does not mean that no hypotheses are tested, but rather that a flat prior is assumed. Maybe a reformulation would alleviate that... --hroest 07:40, 18 May 2010 (UTC)
 * I'm concerned by the statement as well, and I'll change it. I have heard "hypothesis-free" being used before, but it is not the right term and it is not commonly used. There is very much a hypothesis behind it, namely that testing all SNPs will reveal which are associated. Correct words would be "unbiased" vs "candidate-driven". --LasseFolkersen (talk) 20:01, 25 October 2011 (UTC)
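To illustrate the multiple-hypothesis point raised in this thread, here is a minimal sketch of a Bonferroni correction, one common way of correcting for many tests. The number of tests (one million), the significance level, and the SNP names and p-values are all made-up assumptions for illustration, not values from any study discussed here.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Assumed values for illustration only.
alpha = 0.05
n_tests = 1_000_000  # hypothetical number of SNPs tested

threshold = bonferroni_threshold(alpha, n_tests)

# Hypothetical SNP identifiers and p-values (not real data).
p_values = {
    "rs_example_1": 3e-9,
    "rs_example_2": 2e-6,
    "rs_example_3": 0.04,
}

# Keep only the SNPs whose p-value passes the corrected threshold.
significant = [snp for snp, p in p_values.items() if p < threshold]

print(f"Corrected threshold: {threshold:.1e}")
print("Passing SNPs:", significant)
```

With these assumed inputs, a p-value of 0.04 would pass an uncorrected 0.05 level but falls far short of the corrected threshold, which is the sense in which every SNP tested counts as its own hypothesis.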

Claim that noncoding are almost certainly NOT causal
I reverted the recent edit that claims that noncoding SNPs are almost certainly NOT causal. That is not true. I agree that for most GWAS hits it is still not completely clear which particular SNP in a block is the actual causal variant, but for the ones where it is known (e.g. the SORT1 locus) it is noncoding SNPs that are ultimately causal. --LasseFolkersen (talk) 09:31, 22 November 2011 (UTC)

Under construction
I'm starting a major revision of this article. So far I have expanded the background and methods sections. I will get back to "genes identified" and "limitations". Help and suggestions welcome! --LasseFolkersen (talk) 14:40, 6 December 2011 (UTC)

I have tried to keep all the good bits that were already in the text (even though I rewrote some of them). One exception however: "...these [ARMD] genes can predict half the risk of ARMD between siblings, and it is among the most successful examples of GWAS.". I think this sentence is highly useful, but the wording of prediction of half the risk between siblings is a bit confusing in comparison to the more commonly used proportion of heritability explained (which I also use later on). So apologies for removing a good bit - I just think the article will be less confusing without it --LasseFolkersen (talk) 08:38, 7 December 2011 (UTC)

Ok, I'm pretty much done with the expansions I planned. I only need to clean up the citations. CitationBot has problems, but I'll try it again later. Then some grammar and prose check-ups, and then I'll have a couple of colleagues read it through. Other comments welcome! --LasseFolkersen (talk) 12:16, 7 December 2011 (UTC)

Peer review updates
I have now had time to go through the reviews found on the request for peer review page. The changes to the main text have already been committed, and here is the point-by-point response.

From User:Cryptic C62

1) Except for the introduction sentence, all "GWAS" have now been replaced with GWA study/ies. This choice was made for easier pluralization.

2) "From 2005" was changed to "published in 2005"

3) "Today" was replaced with the template, as this is definitely a statement that will age quickly.

4) "Surprisingly" was deleted

5) I have added the following summary of limitations section to the lead section "Several GWA studies have received criticism for omitting important quality control steps, rendering the findings invalid. Largely this type of criticism can and is overcome in more modern publications. However, the fundamental methodology still have opponents."

From User:The Rambling Man

1) ok, I have changed the caption to follow the correct way of captioning things here.

2) ok, I have removed the full stop after the caption

3) ok, the User:Cryptic C62 suggestions have been included

4) I think anybody who reads an article called genome-wide association study will think less of it, if it expands DNA to deoxyribonucleic acid. "DNA" is so routinely used that the whole spectrum from science journals to popular newspaper articles assumes knowledge of it and leaves it out of acronym lists.

5) The SNP acronym is already expanded to single-nucleotide polymorphisms at first occurrence. Could you perhaps be more specific about what you are aiming at?

6) ok - I use the Schena 1995 reference because that's the landmark microarray publication.

7) you might be right that "unity" is a more formal way. I didn't change it now though, because my aim was to lose as few readers to formality as possible. Feel free to change it yourself if you think it is important.

8) ok. I have moved it. It's a pity that the article doesn't have a lead figure though.

9) ok, I have now used the correct format for dashes.


 * That should address all the comments in this peer review. The only thing I miss in this article now is a good lead picture. Reviewer 2 asked for the previous lead image, the Manhattan plot, to be moved to the methods section, and I can't think of a better picture to illustrate GWAS. Suggestions, anyone? --LasseFolkersen (talk) 15:28, 21 December 2011 (UTC)

Late peer-review related comments
Apologies for the lateness, but I was invited to contribute to the peer review through my talkpage.

Lead:
 * I think the first sentence needs rewording; specifically "to see if any variant is associated with a trait" is unclear.
 * I also don't think genetic variant should link to SNP; it's not the only possible genetic variant (as the dab shows), and if we're talking about SNPs it should just say SNP; it seems SNP GWAS are the focus of the rest of the lead. Admittedly other genetic variants can be used for GWAS, so I suggest either a more general reference to genetic variation or removing the link... or we decide the entire article is about SNP GWAS, with a brief mention of the alternatives.
 * The lead is pretty hefty. I think it could be cut down, for example the details of the first GWAS (number of case/controls, outcome) aren't needed here. The lead should summarise the main article, so this detail should have been repeated in Results anyway (I suggest it's just moved there).
 * The second lead paragraph is also bloated, sentences like "The results are then read into computers, where they can be analyzed" could be removed.
 * Couple of word repeats in: "...genetic variants ... if any variant is..."; "Typically ... and typically...".

Body: (Prelim)
 * Has there been any debate on the use (which I note is consistent) of GWA studies? Personally, I would prefer GWAS, as that's the term I'm most familiar with, but if this has been decided by prior consensus then ignore me by all means.
 * edit: I should have read the above comments more closely!
 * In 'Background', the sentence "In addition to the conceptual framework other enabling factors made GWA studies possible" could be reworded for clarity, or preferably merged into the following sentence as it is being used as a verbose conjunction.
 * I'm not convinced WikiGWA deserves a mention in 'Results', if it is to be kept it needs a third-party source.
 * In 'Clinical applications', the final paragraph is of a dubiously-notable specific example published in Bioessays, I suggest it is removed.
 * "SNPs associated with diseases are currently numbered in the thousands" requires a citation or should be removed, presumably some of these associations are exceptionally weak so if it remains it should be further qualified.
 * The limitation section has wording issues, for example: "Although these issues can be taken care of, it is not always done" sounds like synthesis and adds little. The following sentence also needs attention. They both could be removed and the citation kept with the previous statement.
 * There's also a quote without a citation directly following it (I've marked as such).

Overall, the citations are mostly present when needed (I have not investigated the references themselves yet), the blue-linking is about right and the language is not overly technical. My criticisms are that sometimes the language could be tighter and that I feel this article is missing some content; I suggest adding examples which convey the power and findings of these studies (failing that, a CC-licensed publisher could supply some nice diagrams, e.g. PLoS Genet.). Jebus989 ✰ 23:43, 4 January 2012 (UTC)

Point-by-point response for lead:

1) Could you come up with more suggestions on what is unclear? Is it the use of the word trait? Or association? Instead of trait I could use "property" or "characteristic" or something like that. Association is more difficult to get around, but not impossible if I just use more words. How about "(WGA study, or WGAS), is an examination of many common genetic variants in different individuals to see if they are more or less common depending on the characteristics of the individuals."

2) Changed genetic variants to point to genetic variants (thanks for pointing that out). About the focus of the article on SNPs and/or "other variant types" (CNVs, etc): a CNV study is also a GWA study, but the vast majority of all GWA studies are on SNPs. So I kept the focus on that, while still mentioning the more general genetic variants. Hence "Typically SNPs are investigated".

3) Sure. I moved the sample size info to results.

4) Ok. I removed a lot of it. The bits about risk-SNPs being outside coding regions were not even in the main text so I removed it also. I could probably fill in a little about that in the main later. TODO

5) The variants-variants repeat is in the same sentence you comment on in 1). See there. As for typical-typical... I would argue for keeping it this way. The reason is that it is not only SNPs and it is not only major diseases; only the vast majority of studies are. But if we don't write it with a "typical" disclaimer, all the people in the non-major-disease/other-phenotype/CNV groups will conclude that the text is wrong.

Point-by-point response for prelim body:

1) yes, 'GWAS' is a bit more widely used historically, but I feel usage is turning towards GWA study/ies, because it is so much easier for the writer to indicate pluralization. And so it also makes for a more readable Wikipedia article.

2) I think I understand what you mean - how about "In addition to the conceptual framework, the following technological factors enabled GWA studies: One was the advent...."? Then it is more connected, and writing "technological" emphasizes that these are requirements apart from just the idea of testing for many associations. Or did I misunderstand this comment?

3) you are right - it is not very noteworthy yet. I think their manuscript for it must still be in review (I'm not associated with that group, by the way). In contrast, the most noteworthy attempt (the PNAS article reference, with http://www.genome.gov/gwastudies) is so badly lacking in updates that it seems wrong to let it stand as the sole source. What would you think about just removing the web-page reference but leaving the sentence "however, more updated resources might be found..."?

4) I'll extend this SORT1 example rather than leave it as it is. The Bioessays article was just a review covering all the research on it, which is much more heavyweight. I'll do better and expand that, because I think this is one of the most notable examples, at least among cardiovascular GWA studies. Certainly more so than the Peginterferon link which was here previously, but which I've heard very little about outside of Wikipedia. TODO

5) Ok, for now I added the reference from the lead that was already used for the same statement. You are undoubtedly right that several of these are of less than strong significance, but finding out exactly how many is probably original research.

6) I reworded it to "Ignoring these correctible issues has been cited as contributing to a general sense of problems with the GWA methodology", because I think these are important comments from some of the heavyweights in the field (Joe Pickrell etc. in the reference). Did this address your concern?

7) I don't know. This factory-science sentence was here before I started, so I left it in order not to upset previous authors. I'll just contract it into the cousin-frankel reference, which was definitely a verifiable and much-discussed opinion piece.

That's it for now. I know I left some to-do loose ends, but it's getting late so I'll get back to them later. --LasseFolkersen (talk) 16:40, 5 January 2012 (UTC)


 * To reply to 1), I meant it in the sense that it could make GWAS seem like a genome-wide functional annotation exercise, as if we're looking to assign any variant any function. I am unable to suggest a clearer sentence, however, and it's a minor criticism which is cleared up by the following sentence. I think everything else I agree with, good work!  Jebus989 ✰ 09:53, 9 January 2012 (UTC)


 * Ok, now I have also cleared up the few to-do's I had postponed and added some PLoS Genetics figures that look a bit more professional. I also gave a custom-made figure to explain the methods a try. I'm not sure if it adds anything though --LasseFolkersen (talk) 10:08, 16 January 2012 (UTC)

Undo of first GWAS study
After some research I reverted the changes on first GWA study to the 2005 CFH study by Klein et al. In addition I added "the first successful GWA study". As far as I could see, the results from earlier studies such as the one by Ozaki et al were never reproduced successfully.--LasseFolkersen (talk) 11:17, 7 August 2012 (UTC)
 * Thanks for explaining. Blue Rasberry (talk) 12:49, 7 August 2012 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Genome-wide association study. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20111205231931/http://www.omim.org:80/ to http://www.omim.org

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 08:19, 9 January 2017 (UTC)

"Phenome-wide association study" listed at Redirects for discussion
An editor has asked for a discussion to address the redirect Phenome-wide association study. Please participate in the redirect discussion if you wish to do so. Pam D  19:58, 18 February 2020 (UTC)

Addition of Agricultural Application
I added another application of GWAS: its agricultural application. I cited two articles to support the idea that GWAS is useful for developing new pathogen-resistant cultivars and improving yields. Would anybody be willing to review it for me? MaggieKKK (talk) 00:27, 10 December 2020 (UTC)

Addition of Conservation applications
I added a new section called Conservation applications to provide information about GWA studies as a tool for genetic diversity research and conservation. Aharveypdx (talk) 02:58, 21 March 2023 (UTC)