Talk:Single-nucleotide polymorphism

Wiki Education Foundation-supported course assignment
This article was the subject of a Wiki Education Foundation-supported course assignment, between 24 March 2020 and 29 April 2020. Further details are available on the course page. Student editor(s): Grahamla44.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 09:24, 17 January 2022 (UTC)

Wiki Education Foundation-supported course assignment
This article was the subject of a Wiki Education Foundation-supported course assignment, between 11 January 2021 and 21 April 2021. Further details are available on the course page. Student editor(s): CNArmstrong.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 09:24, 17 January 2022 (UTC)

Wiki Education Foundation-supported course assignment
This article was the subject of a Wiki Education Foundation-supported course assignment, between 23 August 2021 and 10 December 2021. Further details are available on the course page. Student editor(s): Enigmadxx.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 09:24, 17 January 2022 (UTC)

"Single nucleotide variation"
What happened to the page for this? Not all SNVs are SNPs, since polymorphism implies a greater plurality than a spontaneous mutation that may not generally occur in the population. Additionally, SNVs are the superset of SNPs that also includes single nucleotide indels. — Preceding unsigned comment added by 118.209.225.12 (talk) 05:28, 19 June 2013 (UTC)
 * Agreed. I don't think the definition of SNV is correct, that it is necessarily not germline.  There's also no reference provided.  Here are some answers I've found.


 * "a SNP is when an aberration is expected at the position for any member in the species – for example, a well characterized allele. A SNV on the other hand is when there is a variation at a position that hasn’t been well characterized – for example, when it is only seen in one individual. It is really all a question of frequency of occurrence" (https://www.biostars.org/p/9397/)
 * "A polymorphism is something which is seen in atleast 1% of a population. An SNP is a single nucleotide change(substitution) which we see in more than 1% members of a population. A single nucleotide variation is just a variation in a single nucleotide without any limitations of frequency." (https://www.quora.com/Genomics-What-is-the-difference-between-an-SNP-and-an-SNV)
 * Multiple types of SNVs (includes definitions for "germline SNVs" and "somatic SNVs"): http://cbcb.umd.edu/omics/Talks/Moult-Omics.pdf
 * I will update the definition on the page.
 * Traversc (talk) 21:30, 13 January 2017 (UTC)
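
To make the frequency-based reading quoted above concrete, here is a minimal sketch in Python. It assumes a biallelic site and the 1% cutoff mentioned in the Quora quote; the function names and the allele counts are hypothetical and only illustrate the convention, they are not taken from any source cited in this thread.

```python
def minor_allele_frequency(allele_counts):
    """Return the frequency of the less common allele at a biallelic site.

    allele_counts maps each allele (e.g. "C", "T") to the number of
    chromosomes on which it was observed.
    """
    total = sum(allele_counts.values())
    if total == 0 or len(allele_counts) != 2:
        raise ValueError("expected non-empty counts for exactly two alleles")
    return min(allele_counts.values()) / total


def classify_variant(allele_counts, threshold=0.01):
    """Label a site "SNP" if its minor allele frequency meets the threshold, else "SNV"."""
    maf = minor_allele_frequency(allele_counts)
    return "SNP" if maf >= threshold else "SNV"


common = {"C": 950, "T": 50}   # hypothetical counts, minor allele frequency 0.05
rare = {"C": 1999, "T": 1}     # hypothetical counts, minor allele frequency 0.0005
print(classify_variant(common), classify_variant(rare))
```

Under this reading every SNP is also an SNV, and the label depends entirely on the chosen threshold and on which sample the counts come from.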

SNV is an awfully confusing term. The frequency definitions make no sense. I realize its usage has crept into some fields of study - so we have to live with it - but it is a specific type of SNP. A conversation about rare alleles, Identity by descent (IBD) and Identity by state (IBS) should be added. Those three concepts are older than SNV and provide clarity. Yutgoyun (talk) 12:54, 11 October 2020 (UTC)

There is certainly confusion between the terms SNP and SNV in the genealogy communities. They tend to prefer to use SNP exclusively. But to be clear, a SNP is a particular type of SNV, not the other way around. A SNP is a SNV that is common enough to merit the prefix "poly" (which means "many") before the root "morph" (which means "change"). Usage hasn't changed over the years, but only recently have scientists begun to be able to inexpensively assay the rarer SNVs that are not SNPs. So the word SNV is showing up in the literature more. Jaredroach (talk) 22:40, 26 October 2020 (UTC)

Initial uses of "SNV" in early 2010s focused on somatic substitutions. However, because "variation" literally sounds more general than "polymorphism", researchers who were not aware of the history started to use "SNV" for germline substitutions as well. Probably the majority of researchers nowadays think all SNPs are SNVs. It is unfortunate but I don't see the trend can be reversed at this point. Heng Li (bio) (talk) 18:55, 18 April 2021 (UTC)

"fair use"
I'm new, and before charging into anything, I'd like to understand better what is considered "fair use" in the Wiki world. To me, this article exceeds "fair use" guidelines, as the information is straight off of this page: http://www.ornl.gov/sci/techresources/Human_Genome/faq/snps.shtml The source is not directly credited, with only an oblique attribution. Is this considered legit and consistent with Wiki guidelines?

Regards --Daffyd 10:55, 27 October 2005 (UTC)


 * "Fair use" usually refers to a specific legal definition within copyright law. The content in question is provided by the US Government and is not copyrighted at all, so legally there's nothing wrong with copying it verbatim. Not that it's polite. --Mike Lin 17:27, 27 October 2005 (UTC)


 * Thanks for the clarification, makes sense. --Daffyd 17:55, 27 October 2005 (UTC)


 * It is unethical to use the material without crediting the source. If I had run across this in a student paper, the student would be charged with plagiarism.  What is even worse is that the definition is wrong and misleading.  The author(s) clearly don't know the meaning of polymorphism, and confuse cause and effect (e.g., "...a SNP might change the nucleotide sequence...." or "SNPs are generally considered to be a form of point mutation....").Ted 01:26, 20 January 2006 (UTC)


 * The only problem there is that the author(s) get(s) SNP confused with what would be true of a UEP. Nagelfar 01:16, 14 March 2007 (UTC)

In science, sources must *always* be cited. In addition, it is important not to copy things when you do not understand them fully. Generally, if you don't understand the material well enough to write your own description, you shouldn't just copy the information here. Thank you for contributing, but more care should be exercised, since some of the information was not interpreted correctly and so was misrepresented. I corrected a few of the errors, but it would take a while to fix them all, so I will try to return when I have more time. But I wonder, how many readers came to this page and walked away with wrong information in the interim? Ed 27 Jan 06

Folks, Wikipedia is not a scientific publication. The purpose of an encyclopedia is not the claim and exposition of our original work, but merely the accurate presentation of fact. The article in question is in the public domain and, though flawed, was written for this same purpose, so its copying here, from a reasonably trustworthy source, is in fact appropriate.

It is a separate issue that the original article was itself written with somewhat sloppy language, and certainly we will improve upon it.

Finally, I'd point out that it is quite rare for encyclopedia articles (even those on scientific subjects) to cite sources beyond the level of "for further reading", so many Wikipedia articles are already quite extraordinary in this respect.

--Mike Lin 07:10, 29 January 2006 (UTC)

Plagiarism is a problem in all areas, not just scientific publications. As a stub, reproductions of material from other sources is OK if it is properly cited. Wikipedia is an academic publication in the broad sense of both 'academic' and 'publication'. Ted 13:46, 30 January 2006 (UTC)


 * Daffyd, in answer to your question, yes, this is considered "legit and consistent" because the source website is linked in the "References" section, and the use does not violate copyright law. -- Reinyday, 17:44, 30 January 2006 (UTC)

Correction: allele
Quoting from this article - "For example, two sequenced DNA fragments from different individuals, AAGCCTA to AAGCTTA, contain a difference in a single nucleotide. In this case we say that there are two alleles : C and T."

This definition of allele is incorrect. C and T comprise a nucleotide pair. DNA molecules are chains of nucleotide pairs. Many nucleotide pairs are not located within a genetic locus. Here is the definition of allele from the glossary section of the Human Genome Project Information website:

Allele: Alternative form of a genetic locus; a single allele for each locus is inherited from each parent (e.g., at a locus for eye color the allele might result in blue or brown eyes).

Lgfree 01:52, 17 June 2006 (UTC)lgfree

The rub is that SNPs are not related to loci in the classical sense. SNPs can be located within (expressed) genes (possibly giving rise to alleles) or between genes -- it makes no difference for the SNP. Or, if you want, because we are simply looking at single nucleotides, every nucleotide is a possible "marker locus." In that case, differences between alternate forms of that single nucleotide marker locus take on the four possible nucleotides. I'd hate to put something like that in the lead paragraph. Maybe it can be explained better somewhere else in the article. Ted 02:40, 17 June 2006 (UTC)

Thinking along these lines, it may be better to use the word variant, rather than allele. gringer 06:12, 9 July 2007 (UTC)
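
For the example quoted at the top of this section, a short Python sketch shows what "a difference in a single nucleotide" means operationally. It assumes the two fragments are already aligned and of equal length; the function name is hypothetical.

```python
def single_nucleotide_differences(seq_a, seq_b):
    """Return (position, base_a, base_b) for each mismatch between equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# The two fragments from the quoted article text; positions are 0-based.
diffs = single_nucleotide_differences("AAGCCTA", "AAGCTTA")
print(diffs)  # [(4, 'C', 'T')]
```

The result names a position and two observed bases, which is neutral on whether those bases are called "alleles" or "variants".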

Definition query
I'm not sure if this statement is correct: "Almost all common SNPs have only two alleles". My understanding is that there are only two possible options (alleles) at a single nucleotide locus. If I'm wrong, would it be useful to have an example of a SNP that has more than two alleles?

Buzwad 11:59, 22 March 2007 (UTC)


 * They're quite difficult to find (mostly because most high-throughput genotyping assays don't work well with non-dimorphic SNPs), but they do exist. Here's a link to one that seems to have a fairly large amount of validation: http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=1977161 gringer 01:17, 6 July 2007 (UTC)
 * Except I've just noticed that the reported frequencies do not include A/T, so it may not be a "real" SNP of this nature. Regardless, it indicates that people are allowing for the possibility of more than two alleles at a single nucleotide locus. gringer 01:23, 6 July 2007 (UTC)


 * rs332 is a tri-allelic small indel, close enough to being a snp for the ncbi (dbsnp rs332). It is also the cause of 50% of Cystic Fibrosis cases. Cariaso 16:48, 19 August 2007 (UTC)
 * rs3091244 is a legitimate triallelic snp ncbi rs3091244 Cariaso 16:09, 10 October 2007 (UTC)


 * This is contradictory: "This is simply the lesser of the two allele frequencies for single-nucleotide polymorphisms."

' Almost all ... have only two' vs. 'the lesser of the two alleles for single-nucleotide polymorphisms' seems to be contradictory. —Preceding unsigned comment added by 98.207.80.224 (talk) 17:57, 17 February 2009 (UTC)
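
The tension noted above can be illustrated with a small Python sketch. It assumes one common convention, in which the minor allele frequency at a site with more than two alleles is taken as the frequency of the second most common allele; the genotype calls below are hypothetical and do not describe any rs number mentioned in this section.

```python
from collections import Counter


def allele_frequencies(calls):
    """Return (allele, frequency) pairs, most common allele first."""
    counts = Counter(calls)
    total = len(calls)
    return [(allele, n / total) for allele, n in counts.most_common()]


def minor_allele_frequency(calls):
    """Frequency of the second most common allele (0.0 if only one allele is seen)."""
    freqs = allele_frequencies(calls)
    return freqs[1][1] if len(freqs) > 1 else 0.0


# Hypothetical tri-allelic site: 70 C, 25 T and 5 A calls.
calls = ["C"] * 70 + ["T"] * 25 + ["A"] * 5
print(allele_frequencies(calls))
print(minor_allele_frequency(calls))
```

For a site with exactly two alleles this reduces to "the lesser of the two allele frequencies", so the two statements only conflict at sites with three or more alleles.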

SNP vs Single Nucleotide Polymorphism
In most academic publications, abbreviations are acceptable in the text as long as they have been defined previously. Given the short form (SNP, pronounced "snip") is significantly quicker to say than the expanded form (Single nucleotide polymorphism), and the abbreviation has been stated in the first sentence, wouldn't it be reasonable to use SNP for subsequent uses in this article? I notice that there have been two instances in the revision history where there has been a change from the abbreviated form to the expanded form (or vice versa). I'm mostly mentioning this because 65.40.126.255 has changed the text back to the expanded form without noticing the previous history comment by 82.139.86.128 on 17 April.

gringer 01:56, 12 July 2007 (UTC)


 * I would say it is ok to use "SNP" once the abbreviation has been stated in the first sentence. -- fnielsen (talk) 08:24, 4 September 2008 (UTC)

Possible vandalism?
I just reverted this because it was an anonymous IP's only, unexplained edit which changed a word to its opposite. Could someone make sure it was vandalism? 128.240.229.65 (talk) 15:50, 10 January 2008 (UTC)

Thanks. The idea of a bot which randomly changes words to their antonyms will now give me nightmares. In this case I agree with the anonymous editor. This 1% definition seems to have crept in due to the hapmap project. The largest hapmap populations were 120 people, so if a variation occurred in less than 1% of a population it was unlikely to be discovered during the hapmap. There doesn't seem to be any basis for the 1% rule, except the costs and limitations of the technologies of the day. A single nucleotide change which occurs for 1 person in 1000 should still be considered a snp. Cariaso (talk) 05:50, 30 January 2008 (UTC)
 * "a minor allele frequency of less than or equal to 1%" would probably have been overlooked
 * "a minor allele frequency of greater than or equal to 1%" is definitely a snp.
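
As a rough check on the sample-size argument above, here is a back-of-the-envelope sketch in Python. It assumes chromosomes are drawn independently from a large population and takes the figure of 120 people from the comment above; it models only whether an allele appears at least once in the sample, not whether an assay would reliably call it. The function name is hypothetical.

```python
def detection_probability(allele_frequency, n_individuals, ploidy=2):
    """Probability that an allele appears at least once in a random sample.

    Assumes independent draws of chromosomes from a large population.
    """
    n_chromosomes = ploidy * n_individuals
    return 1.0 - (1.0 - allele_frequency) ** n_chromosomes


for freq in (0.05, 0.01, 0.001):
    p = detection_probability(freq, n_individuals=120)
    print(f"frequency {freq}: P(seen at least once) = {p:.3f}")
```

Under these assumptions an allele at 1% is still seen at least once in most samples of this size, while one at 0.1% is missed most of the time, so the chance of discovery falls off steeply below the 1% mark.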

Missense mutation
How does this relate to the "missense mutation" article. This layman needs clarification. —Preceding unsigned comment added by 212.24.80.93 (talk) 12:32, 11 August 2008 (UTC)


 * This concept was not explained and should be: A missense mutation is a nonsynonymous mutation. I have now made a section titled "Types of SNPs" and expanded it with an infobox. There is room for improvement. -- fnielsen (talk) 08:28, 4 September 2008 (UTC)

At the risk of stating the obvious, SNPs are most often the result of a point mutation in one of an ancestor's germline cells.

The genomic distribution of SNPs is not homogenous; SNPs usually occur in non-coding regions more frequently than in coding regions or, in general, where natural selection is acting and fixating the allele of the SNP that constitutes the most favorable genetic adaptation.

There is an interesting question why, in general, SNPs resulting from both synonymous (silent) and missense mutations should "usually occur in non-coding regions more frequently than in coding regions". Is it linked to variance of point mutation rates in euchromatin and heterochromatin (nucleobase tautomerization, their spontaneous or oxidative deamination, or pyrimidine dimerization?), the varying effectiveness of DNA repair mechanisms (like base excision repair), their combination, or perhaps something else? One could guess that rather not because of natural selection or recombination, in the case of synonymous mutations not affecting the rate of splicing errors. There is a possibility that a more generalized process, similar to repeat induced point-mutation (or RIP) could account for the disparity between coding and non-coding regions mentioned above. 80.240.162.190 (talk) 12:08, 30 December 2012 (UTC)

More Info
I see that there is nothing said about how many SNPs have currently been mapped in humans. I would like to see a short table of species and how many SNPs are currently known in each... (This would need to be date stamped though.) See: dbSNP summary

--Jahibadkaret (talk) 16:22, 3 September 2008 (UTC)

Reference for 4Qk10 SNP?
This article mentions a SNP called 4Qk10 that should be common in Ashkenazi Jews and rare in Cubans. I can't find any references to back this up - neither dbSNP nor Google can clarify this. Does anyone have a reference for this?

Stinusl (talk) 12:05, 25 November 2008 (UTC)


 * Agreed. That name does not follow common patterns for naming snps. Furthermore while Ashkenazi populations are well researched, this is far less common for Cubans. The statement was added by an IP with no previous history of edits. I encourage the next interested party to remove the text. Cariaso (talk) 07:50, 26 November 2008 (UTC)


 * done. Cariaso (talk) 20:46, 28 November 2008 (UTC)

hyphen
I believe in the more traditional way of using hyphens. They got lax about teaching it so long ago that even most professors no longer use it habitually, but magazines and newspapers do, so everybody still understands it, and that makes it possible to save it, and it's worth saving for reasons noted below.

In some cases it disambiguates and adherence to a single style is preferable. One may, for example, omit a question mark at the end of many questions and still expect the reader to recognize them as questions, but in some cases the question mark conveys substantial information, and adhering to a single style, that of always including it, is therefore better than adhering to a style that always excludes it.

This is about polymorphisms involving a single nucleotide, NOT about "nucleotide polymorphisms" that are single. In this case, such a disambiguation may be like adding a question mark to a sentence that people would already have recognized as a question, but adhering to the style helps maintain the habit so that it's there for cases where it helps more than that.

See also hyphen for more on this.

That is why I changed the article's title, adding the hyphen. Michael Hardy (talk) 19:38, 28 December 2008 (UTC)

a SNP / an SNP
Grammar is not my forte but I feel this change is not correct. When I read the text out loud I hear "a snip" not "an S-N-P". Cariaso (talk) 15:53, 22 April 2009 (UTC)
 * I too hear "snip", and corrected/changed it. -- fnielsen (talk) 20:00, 22 April 2009 (UTC)

The word Polymorphism
Molecular biologists use the word "polymorphism" quite differently from the rest of biologists. By the most common biological meaning of the word, polymorphism refers to the existence of clearly differentiated phenotypic groups in the same population of a species. Molecular biologists use this word to talk about DNA sequence changes, regardless of the phenotype. I think this can be very confusing to the general reader, so the terminological distinction should be clearly noted in the article. This also applies to all the articles with the "molecular meaning" of polymorphism. It's already done in the Polymorphism article.--Earrnz (talk) 01:48, 4 May 2009 (UTC)


 * "Phenotype" is variable at the molecular level, but I think molecular biologists might be misusing the phrase SNP also. A SNP is by definition a polymorphism, not a mutation. People with SNPs may not appear any different on the outside, but in reality the "phenotype" for that gene may be different, in that it has an extra amino acid added to the catalytic site, etc., in such a way that its biological activity is different. We don't really have the technology yet to elucidate whether SNPs make such subtle, yet selectable differences in allelic phenotypes -- for example, a SNP that reduced the catalytic activity of an enzyme to 90%.


 * As far as "polymorphisms" are concerned, I think the article needs to better define this point. The average human cell has hundreds of thousands of random single-nucleotide mutations. These should not be confused with SNPs, which are true polymorphisms. In other words, to be a SNP it has to occur in >1% of the population. Different races have variations of SNPs, but within an ethnic group, most SNPs are consistent. The implications of being polymorphisms as opposed to random mutations are very important, and I think the article needs to add the "greater than 1%" part of the definition in order to make this clear.   Johnny California

Example Section
Firstly, the Example section needs work. Secondly, what do we consider a "valid example" of a SNP? Annotated SNPs? Notable SNPs?.. Geno-Supremo (talk) 16:17, 21 May 2009 (UTC)

Grammar error in first sentence
A single-nucleotide polymorphism (SNP, pronounced snip) is a DNA sequence variation occurring when a single nucleotide — A, T, C, or G — in the genome (or other shared sequence) that differs between members of a species (or between paired chromosomes in an individual).

Should this read:

A single-nucleotide polymorphism (SNP, pronounced snip) is a DNA sequence variation occurring when a single nucleotide — A, T, C, or G — in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual).

?

Number of SNPs in TAS2R38 (Examples section)
The reference I snarfed from the TAS2R38 article for the PTC tasting and number of SNPs only mentions 3 SNPs, not 6, but it could be that it's only talking about SNPs that affect PTC tasting, not all the SNPs in that gene. (Assuming there's a difference.) Being mostly clueless, I won't touch that bit. The Crab Who Played With The Sea (talk) 03:56, 24 March 2010 (UTC)

Dr. Steve Ligget
http://en.wikipedia.org/w/index.php?title=Single-nucleotide_polymorphism&diff=prev&oldid=358109401. This unsourced statement makes the first sentence even harder to read. Proposed for rapid removal. Cariaso (talk) 00:57, 5 May 2010 (UTC)
 * Google suggests the name should be Stephen B. Liggett (we appear to have no article). I agree that it is just noise in the lead, and since the lead is supposed to be a summary of the article (which has no mention of the name), I support removal. An edit summary might be "unsourced, see talk". Johnuniq (talk) 01:58, 5 May 2010 (UTC)

The picture
I would consider changing the colours of the picture. I don't like the two strands having the same colour as adenine and thymine nucleotides.

212.126.224.100 (talk) 10:24, 27 August 2010 (UTC)

Definition in first line of the article is too narrow
Quote: "A single-nucleotide polymorphism (SNP, pronounced snip; plural snips) is a DNA sequence variation occurring when a single nucleotide — A, T, C or G — in the genome (or other shared sequence) differs between members of a biological species or paired chromosomes in a human"

The end part of this definition is awkward "differs between members of a biological species or paired chromosomes in a human" sounds like SNPs are only defined as differences between paired chromosomes if they are in humans. Also humans are biological species so it's an awkward distinction to have here in the definition. It's ambiguous at best...

Picture should be changed
For someone who is trying to figure out SNP genotyping, the picture caused me a lot of confusion. It shows a C/A polymorphism; however, C/A is rare because the usual pairs are C/T and G/A. The picture also does not match up with the example in the first paragraph (a CT SNP). I think that it would be better to use the picture found on this page:

http://www.isogg.org/wiki/Single-nucleotide_polymorphism

71.93.120.216 (talk) 01:26, 10 April 2015 (UTC)AmateurBehavioralGeneticist


 * So sorry about that. The article has several issues and needs a rewrite. I'll set aside some time this weekend to sort it. SW3 5DL (talk) 03:34, 8 May 2015 (UTC)


 * I've now updated the image to G/A. gringer (talk) 13:28, 8 May 2021 (UTC)

Common SNP
Quite a few papers and books define common SNP to be what this article (and Nature) define as just SNP. So I'm guessing they use a wider definition of SNP. 86.127.104.145 (talk) 05:27, 14 October 2017 (UTC)

Forensics
The information regarding use of SNPs vs STR fingerprinting for forensic use in identifying criminal suspects and crime victims is significantly out of date; see https://en.wikipedia.org/wiki/Genetic_genealogy#Law_enforcement and https://en.wikipedia.org/wiki/Forensic_genealogy Pseudonymous Cognomen (talk) 04:11, 4 August 2019 (UTC)

Linkage disequilibrium
I think the punctuation in this part of the article should be revised:

"LD is affected by two parameters: 1) The distance between the SNPs [the larger the distance, the lower the LD]. 2) Recombination rate [the lower the recombination rate, the higher the LD].[7]".

Instead, it should read:

LD is affected by two parameters: the distance between the SNPs (the larger the distance, the lower the LD) and the recombination rate (the lower the recombination rate, the higher the LD).[7]

In addition, although this statement is accurate to a degree, several other factors also affect linkage disequilibrium and are worth noting, including gene flow, mutation, natural selection, and genetic drift within the population. The source I provided to support my suggested edit is also a more recent article (2008) than the 2001 article cited as source [7]. Grahamla44 (talk) 00:52, 7 February 2020 (UTC)

The punctuation revision sounds good. But actually, should it be expanded a little? Along the lines you suggest, because distance and recombination are not enough.

For example, students (and I) sometimes get confused when there are SNPs snp1, snp2, snp3, snp4, ... ordered by position, where snp1 and snp3 are in high LD and snp2 and snp4 are in high LD, but the other combinations are not. This happens often when using r² as the LD metric, but it is hard to see why when the text is written as above, with only distance and recombination rate as factors. Yinwang888 (talk) 05:41, 11 February 2020 (UTC)

You are correct. I modified the grammar to indicate that there are other factors that affect LD. Jaredroach (talk) 22:49, 26 October 2020 (UTC)
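
The pattern described in this thread (snp1 with snp3 and snp2 with snp4 in high LD, other pairs not) can be reproduced with a toy Python sketch. It assumes phased haplotypes coded as strings of "0" and "1", and computes r² as the squared correlation between the alleles at two sites; the four haplotypes are hypothetical and chosen only to show that r² need not fall off with distance.

```python
def r_squared(haplotypes, i, j):
    """Squared correlation (r^2) between the alleles at sites i and j.

    haplotypes is a list of equal-length strings of "0"/"1" characters,
    one string per phased chromosome.
    """
    n = len(haplotypes)
    p_i = sum(h[i] == "1" for h in haplotypes) / n
    p_j = sum(h[j] == "1" for h in haplotypes) / n
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ij - p_i * p_j  # linkage disequilibrium coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d * d / denom


# Hypothetical sample: sites 0 and 2 always agree, sites 1 and 3 always agree,
# and the two pairs vary independently of each other.
haplotypes = ["0000", "0101", "1010", "1111"]

for i, j in [(0, 1), (0, 2), (1, 3), (2, 3)]:
    print(f"snp{i + 1}-snp{j + 1}: r^2 = {r_squared(haplotypes, i, j):.2f}")
```

In this sample the adjacent pairs have r² of 0 while the interleaved pairs have r² of 1, which distance and recombination rate alone would not predict.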

The definition of SNP should not require a frequency threshold
This topic keeps recurring on this talk page, but without detailed explanations. I am providing a bit of history with support from the literature. One of the first uses of SNP is documented in PMID:7937143. It has no mention of a frequency threshold. The dbSNP paper in 1999 and the human genome in 2001 did not apply a frequency threshold because at that time we simply hadn't sequenced enough samples to measure frequency. When the 1000 Genomes paper was published in 2015, we had enough samples to measure frequency, but we didn't set a threshold, which is apparent in the abstract of the 1000 Genomes paper. If we required a frequency threshold in the definition of SNP, all these landmark papers would be invalidated. Conversely, the origin of the frequency requirement is mysterious. I couldn't find papers supporting the claim.

It is also impractical to apply a frequency threshold to the definition of SNP. The culprit is "the population" in the Nature definition: which population? Is it a population in a local area, in a continent or in the whole world? Allele frequencies can vary a lot depending on the scope we are looking at. Sometimes an allele of high frequency in, say, Africans can be entirely missing from Europeans. Then we wouldn't be able to generally tell if a substitution is a SNP if we had a frequency requirement.

-- Heng Li (bio) (talk) 19:25, 18 April 2021 (UTC)


 * The correct approach to searching for the literature date is to find the origin of the word "polymorphism". It dates at least back to 1945. For example, Ford writes, "Polymorphism is the occurrence together in the same habitat of two or more distinct forms of a species in such proportions that the rarest of them cannot be maintained by recurrent mutation." (https://doi.org/10.1111/j.1469-185X.1945.tb00315.x).
 * Since the word "polymorphism" has a well-defined meaning, as does the word "variant", there doesn't seem to be overriding incentive to redefine polymorphism to mean all variants.
 * I agree with you that the definition of population is context dependent. That in turn means that the definition of SNP is context dependent. For example, if a SNP that is present in a population today completely disappears (or drops below the frequency threshold) from the population in the future, it will no longer be a SNP.
 * Among the several citations mentioning a SNP threshold is Brookes (1999), "The essence of SNPs": "Single nucleotide polymorphisms (SNPs) are an abundant form of genome variation, distinguished from rare variations by a requirement for the least abundant allele to have a frequency of 1% or more."
 * Landmark papers will not be invalidated, no matter the outcome of this discussion. Jaredroach (talk) 22:42, 18 April 2021 (UTC)


 * I am not sure if "polymorphism" is well-defined. It has been used inconsistently in the literature. Even papers requiring a threshold don't actually apply it. For example, Brookes (1999) defined SNPs with a frequency but many studies cited in the review included rare substitutions because these studies wouldn't have enough data to ascertain the frequency down to 1%. What we can do now is to acknowledge the inconsistency. Anyway, your edits are more accurate. Thanks. Heng Li (bio) (talk) 15:34, 23 April 2021 (UTC)
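
Both sides of this thread agree that "the population" is context dependent. A minimal Python sketch of that point follows; it assumes the 1% cutoff debated above, and the scope names and frequencies are hypothetical.

```python
def is_snp(minor_allele_frequency, threshold=0.01):
    """Apply a frequency cutoff; the 1% default is the convention debated above."""
    return minor_allele_frequency >= threshold


# Hypothetical minor allele frequencies for one substitution, by sampling scope.
frequency_by_scope = {
    "population_A": 0.12,
    "population_B": 0.0,
    "pooled": 0.006,
}

labels = {scope: is_snp(freq) for scope, freq in frequency_by_scope.items()}
print(labels)
```

The same substitution passes the cutoff in one scope and fails it in the others, so any threshold-based definition has to state which population it refers to.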

Wiki Education assignment: PHMD 2040 Service - Learning Spring 2023
— Assignment last updated by JustinxLane (talk) 19:04, 2 June 2023 (UTC)

Citation Added
Citation added regarding the redundancy of the genetic code, and citation added supporting the difference between types of SNP mutations. Statistic about SNPs within the population updated, accompanied by an appropriate reference and citation. --Monoclonalantibodies (talk) 03:15, 22 March 2023 (UTC)