Talk:Non-coding DNA/Archive 1

New Scientist article
There's an article in this week's New Scientist, "'Junk' DNA gets credit for making us who we are", which should be useful for references. Richerman (talk) 02:11, 21 March 2010 (UTC)

Junk DNA
Regarding this paragraph: "A large amount of sequence in these genomes falls under no existing classification other than "junk". For example, one experiment removed 0.1% of the mouse genome with no detectable effect on the phenotype. This result suggests that the removed DNA was largely nonfunctional."

I object to the suggested conclusion that the removed DNA was largely nonfunctional. Although I would not go to the opposite extreme and declare that all junk DNA is functional unless proven otherwise, it is very difficult to exclude all potential functions experimentally.

Even if the 0.1% removed DNA is not normally functional, it may still provide survival value in unusual situations and natural environments. For example:

1. Part of the removed noncoding DNA may cause increased production of an enzyme that detoxifies a protein in some food the mature female mouse can tolerate but is lethal to a fetus. Since this toxic protein was not in the lab chow, this possibility was not excluded by the experiment.
2. Perhaps the removed noncoding DNA may cause altered behavior during exposure to environmental stresses that would cause infertility in mice that fail to alter their behavior. This would not be detected by protocols that do not include such stresses, or prevent such altered behavior, or do not test for effects on fertility.
3. Perhaps some of the noncoding DNA may provide antigen-like templates that enhance antibody immunity to pathogens that mice frequently encounter in a garbage pit, but not in a clean lab.
4. Similarly, perhaps some of the noncoding DNA provided antigen-like templates that cause colon cells to recognize normal flora as friendly and exempt them from immune attack.

I'm sure you can think of dozens of possibilities that were not excluded experimentally. Greensburger (talk) 05:30, 25 April 2010 (UTC)

Merge request
Set up merge request. "Junk DNA" is a non-scientific throwaway term that should be avoided. All that can be said is that this stuff does not get translated into proteins and in many - though quite likely not even "most" - cases, apparently also not transcribed into RNA, hence, it is non-coding. Anything beyond that is not based on evidence (though lack of a function is hard to prove in the current state of knowledge and research technology), and while some apparently randomly-evolving sequences indeed do not seem to have any function at all, the question of whether this stuff might have some function has only been seriously researched since maybe 2000; essentially it was Gregory's C-value enigma papers that kicked it off - and in astoundingly many cases where research is being conducted, non-coding DNA does appear to have some sort of function, namely in gene regulation (it appears that the largest part of the "regulatory" code of the genome is contained in some way or another in noncoding sequences). See also Non-coding RNA, Science: 632-635, and the recent reviews by Biemont/Vieira and Willingham/Gingeras, and recent Takifugu and gulper eel publications. A possible way to merge would be to copy/paste, replace "junk" with "noncoding" as appropriate and add an intro statement like: "Noncoding DNA, colloquially also referred to as 'junk DNA' (but see below), ..." Dysmorodrepanis 12:15, 4 December 2006 (UTC)


 * It was discussed before, on Talk:Junk DNA, and most people disagreed. You should reply to the arguments there if you have a new idea. --Baldzac 15:14, 4 December 2006 (UTC)

The regions of the code we're talking about are so vast, that some will always be described as "junk". This term was coined by the leading researchers, and is just as "scientific" as any other term. Do not merge. —Preceding unsigned comment added by 205.118.81.24 (talk) 21:56, 23 October 2007 (UTC)
 * "Coined" yes, "used today" no. Do a PubMed search and

Dysmorodrepanis (talk) 10:29, 12 December 2007 (UTC) -> see Talk:Junk DNA


 * Junk DNA refers to the 98% of DNA of unknown function, non-coding DNA is a subset of junk DNA.

Speh 12:29, 16 January 2008 (UTC)

Simply because a portion of DNA does not code for anything (non-coding) does not mean that it has no known function (junk DNA). Enhancers, promoters, operators, and many other sequences that code for nothing (produce no RNA) have functions: they increase or decrease the rate at which genes are expressed, depending on which proteins are binding, but these sequences themselves do not code for anything. On the other hand, junk DNA could code for something; it could code for an RNA that hasn't yet been discovered or a protein that similarly hasn't been discovered. Junk DNA simply indicates that the DNA has no known function; non-coding is a statement that the DNA has some function that doesn't produce RNA or protein. The line, I would have to admit, is hazy, but I think that there's enough of a line that I would recommend not merging. —Preceding unsigned comment added by 12.216.237.235 (talk) 05:06, 25 February 2008 (UTC)


 * Wow. How can Wikipedia get something so wrong for such a long time? Wiki politics will ensure that it'll never be corrected either. Non-coding DNA is entirely different from junk DNA. Merging these articles is simply wrong and misleading. Get it fixed. —Preceding unsigned comment added by 88.104.237.107 (talk) 10:57, 17 June 2010 (UTC)

For example, one experiment removed 0.1% of the mouse genome...
1% changed to 0.1%. The paper abstract says "We deleted two large non-coding intervals, 1,511 kilobases and 845 kilobases in length, from the mouse genome". That is about 2.36 Mbp in total. The mouse genome length is ca. 2.5-2.7 Gbp ( http://www.informatics.jax.org/mgihome/other/mouse_facts1.shtml ), thus only about 0.1% was deleted. 89.251.107.8 (talk) 16:18, 29 August 2010 (UTC)
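
The arithmetic behind the 0.1% figure can be checked with a short sketch. The interval lengths are taken from the quoted abstract and the genome-size range from the comment above; the function name is hypothetical and exists only for this illustration.

```python
def deleted_fraction_percent(deleted_kb, genome_gbp):
    """Return the deleted DNA as a percentage of the whole genome.

    deleted_kb: lengths of the deleted intervals, in kilobases.
    genome_gbp: assumed total genome length, in gigabases.
    """
    deleted_bp = sum(deleted_kb) * 1_000          # kilobases -> bases
    genome_bp = genome_gbp * 1_000_000_000        # gigabases -> bases
    return 100 * deleted_bp / genome_bp


# Interval lengths quoted from the paper abstract (kb).
intervals_kb = [1511, 845]

# Both ends of the genome-size range given in the comment (Gbp).
for genome_gbp in (2.5, 2.7):
    pct = deleted_fraction_percent(intervals_kb, genome_gbp)
    print(f"{genome_gbp} Gbp genome: {pct:.3f}% deleted")
```

At both ends of the assumed range the result is just under 0.1%, which supports the corrected figure rather than the earlier 1%.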

non-coding RNA
The article defines non-coding DNA as DNA which does not code for protein. If you are following this definition, then you have made some rather significant omissions (e.g. tRNA, snRNA, rRNA, telomerase RNA, etc). —Preceding unsigned comment added by 24.65.85.66 (talk) 02:53, 16 October 2010 (UTC)


 * Non-coding RNA has its own article...Not all non-coding DNA produces RNA (or non-coding RNA), so they're separate topics. Then again, most of the topics you list are discussed here, with links to individual articles. Did you bother reading the page? -- Scientizzle 03:55, 16 October 2010 (UTC)

Junk Dna reins exposed:
The nice thing about scanning articles is that sometimes something really sticks. Ergo:

In a revealing experiment, a mouse was bioengineered to be deficient in one essential growth factor. It was kept alive by the inclusion of the missing 'element' in its diet. After three generations an offspring produced AN ANCIENT FORM of this 'missing' enzyme that had and has THE ABILITY TO PRODUCE THE MISSING COFACTOR {Q. and this came from where? A. Junk Dna}.

When this can happen for this 'one [isolated] example', the same applies to all of the other 50,000 enzyme systems; meaning that the purpose of the content of Junk Dna is now clear, and that the phrase non-coding is at best erroneous, and at worst misleading. ~ Best Regards Betaclamp (talk) 06:38, 16 January 2011 (UTC)


 * Your conclusion is erroneous on a few levels. First, it is a hasty generalization to extrapolate the results of one study on one enzyme to the genetics of every enzyme. [Also, I couldn't find a citation for this study...care to share?] Secondly, while much of what is often called "junk DNA" (erroneously, and most commonly by the lay press who won't let the term die) is actually genetic information of as-yet-undetermined function, some of it does likely meet any reasonable definition of "junk". There are sequences (e.g., pseudogenes) that are without present function but can serve as the raw materials for future evolution--they might be imagined as the junk that was just thrown into the recycle bin and may serve future use! A huge portion of the genome, however, is made up of viral and mobile elements that, for the organism harboring said genome, very often do fit within the definition of DNA that provides "little specificity and conveys little or no selective advantage to the organism", as stated in the article.
 * In reply to "the phrase non-coding is at best erroneous, and at worst misleading", I have to object strongly. There's a rather simple delineation between a coding sequence and a non-coding sequence: the former produces a protein product, the latter does not. Many examples exist of important gene products that lack a protein endpoint (see noncoding RNA). Through mutation, a non-coding sequence can become a coding sequence (particularly those pseudogenes that have had an inactivating mutation reversed). "Non-coding" should not be inferred to mean "never capable of resulting in a coding sequence following selective pressure" and I don't think such an interpretation actually flows from the material on this page...if so, please point out the offending passages. -- Scientizzle 16:27, 17 January 2011 (UTC)

20% versus 80%
"About 80 percent of the nucleotide bases in the human genome may be transcribed" but the reference article claims that about 90% of the transcribed yeast genome is non-coding, which translates to (100%-98%)/(100%-90%) = 20% of the human genome may be transcribed, not 80%. Mollwollfumble (talk) 16:37, 5 April 2011 (UTC)

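The calculation in the comment above can be written out as a minimal sketch. It reproduces the commenter's arithmetic only, under two assumptions that are the commenter's and not established facts: that all protein-coding DNA is transcribed, and that the yeast proportion (about 90% of transcribed sequence being non-coding) carries over to the human genome. The function name is hypothetical.

```python
def implied_transcribed_percent(noncoding_genome_pct, noncoding_transcribed_pct):
    """Percentage of the genome that is transcribed, as implied by the comment.

    Assumes (as the comment does) that every coding base is transcribed,
    so coding DNA makes up the coding share of the transcribed sequence.
    """
    coding_genome_pct = 100 - noncoding_genome_pct            # e.g. 100 - 98 = 2
    coding_transcribed_pct = 100 - noncoding_transcribed_pct  # e.g. 100 - 90 = 10
    # transcribed% * (coding_transcribed% / 100) = coding_genome%
    return 100 * coding_genome_pct / coding_transcribed_pct


# Figures used in the comment: 98% of the genome non-coding,
# 90% of transcribed sequence non-coding (yeast figure, assumed to transfer).
print(implied_transcribed_percent(98, 90))
```

With those inputs the sketch gives the 20% stated in the comment; changing either assumption changes the result, so it says nothing about which figure is correct for the human genome.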
ENCODE project
This page is probably 'junk' now lol needs some serious revision re ENCODE project....

Jinx69 (talk) 03:28, 7 September 2012 (UTC)


 * Roger that. Quoting from ENCODE Project Writes Eulogy for Junk DNA; Elizabeth Pennisi; Science 7 September 2012: Vol. 337 no. 6099 pp. 1159-1161 DOI: 10.1126/science.337.6099.1159:

"This week, 30 research papers, including six in Nature and additional papers published by Science, sound the death knell for the idea that our DNA is mostly littered with useless bases. A decadelong project, the Encyclopedia of DNA Elements (ENCODE), has found that 80% of the human genome serves some purpose, biochemically speaking. “I don't think anyone would have anticipated even close to the amount of sequence that ENCODE has uncovered that looks like it has functional importance,” says John A. Stamatoyannopoulos, an ENCODE researcher at the University of Washington, Seattle."

and

"ENCODE drives home, however, that there are many “genes” out there in which DNA codes for RNA, not a protein, as the end product. The big surprise of the pilot project was that 93% of the bases studied were transcribed into RNA; in the full genome, 76% is transcribed. ENCODE defined 8800 small RNA molecules and 9600 long noncoding RNA molecules, each of which is at least 200 bases long. Thomas Gingeras of Cold Spring Harbor Laboratory in New York has found that various ones home in on different cell compartments, as if they have fixed addresses where they operate. Some go to the nucleus, some to the nucleolus, and some to the cytoplasm, for example. “So there's quite a lot of sophistication in how RNA works,” says Ewan Birney of the European Bioinformatics Institute in Hinxton, U.K., one of the key leaders of ENCODE"


 * Someone who understands the topic well enough to accurately preserve the historical sense of Junk DNA while correcting this article needs to have a go... Cheers - Williamborg (Bill) 17:50, 10 September 2012 (UTC)
 * I removed a good portion of this because its presentation mistakenly conflates "junk DNA" with noncoding DNA. I will continue to work in relevant pieces of the ENCODE findings. -- Scientizzle 20:44, 10 September 2012 (UTC)
 * Appears to walk the middle while resolving some of the misimpressions this article created.

Keep up the good work - Williamborg (Bill) 04:22, 12 September 2012 (UTC)

Addition of Silencers
I added Silencers as a transcription factor site. I formatted it along with the other regulatory region sections. Clucaj (talk) 01:58, 26 April 2013 (UTC)

Misinterpretation of ENCODE?
I know nothing about the subject, but reading online I see many claims that the popular media's representation of ENCODE's discoveries is skewed. The article's current version seems to mirror the media's. Can someone knowledgeable in this subject review this? Ratzd'mishukribo (talk) 20:12, 10 September 2012 (UTC)


 * Working on it. Many editors have conflated noncoding DNA with "Junk DNA". The former describes all DNA sequences that do not encode a protein product, whether the sequences have a known function or not. The latter is largely a media-driven term that refers to the ever-decreasing proportion of the genome with no known--and no likely--function in the genetic regulation of an organism. There still is "junk DNA" to be sure: endogenous retroviruses, for example, are "junk" by any reasonable interpretation of the word. That the vast majority of noncoding DNA has known or likely regulatory functions, though, isn't really a surprise to modern geneticists. -- Scientizzle 20:43, 10 September 2012 (UTC)
 * From Talk:ENCODE, a good link discussing this problem: Most of what you read was wrong: how press releases rewrote scientific history. Repeating myths may make good stories, but it breeds confusion. See the ENCODE news, John Timmer, Ars Technica, September 10 2012 -- Scientizzle 21:27, 10 September 2012 (UTC)
 * Another similar link: Special AMA: The Encyclopedia of DNA Elements (ENCODE) Consortium on reddit. I like this quote from one of the ENCODE researchers, Michael Hoffman, task group chair (large-scale behavior) and a lead analyst (genomic segmentation): "Part of the problem is that there are multiple definitions of 'biological function' and 'junk DNA.' Things that are 'functional' under some definitions are 'junk' under others. It's especially worth noting that the definition of 'junk DNA' used most often by the public and even by most scientists is different from the original definition used by evolutionary biologists. Under one definition ('reproducible biochemical activity'), the ENCODE Project Consortium found that 80% of the genome had function. If you use a definition based on looking only at regions of the genome under purifying selection, you might get as little as 5%. I feel like these are upper and lower bounds and that any other definition will be somewhere in the middle depending on what their threshold for 'function' is. TL;DR: Between 5% and 80%, depending on how you define function." This was in response to a question from Larry Moran, who has been very critical of some of the ENCODE reporting. -- Scientizzle 21:38, 10 September 2012 (UTC)

There are two sections at Talk:ENCODE rather relevant to this discussion. -- Scientizzle 14:07, 11 September 2012 (UTC)


 * Thanks! Ratzd'mishukribo (talk) 17:56, 11 September 2012 (UTC)
 * ENCODE's theories have been debunked. They made the error of failing to appreciate the crucial difference between "junk DNA" and "garbage DNA".

http://gbe.oxfordjournals.org/content/early/2013/02/20/gbe.evt028.short?rss=1 QuentinUK (talk) 12:24, 24 February 2013 (UTC)


 * Transcribing DNA to RNA consumes energy. Moving the transcriber proteins along the chromosomes consumes energy. Opening and closing the DNA double (or sometimes triple) helix consumes energy. Large amounts of "noise" activity are energetically absurd. And what about the missing inheritance? The argument used against ENCODE, that there would be too many fatal mutations if most DNA were not unnecessary, thus only shows that there must be function-sensitive genetic correction mechanisms multiple times more efficient than natural selection (genetic diseases cured by the organisms themselves by natural self-splicing outnumbering those eliminated by natural selection several to one).

95.209.69.116 (talk) 16:27, 6 May 2013 (UTC)

Unsourced statement in introduction
The Encyclopedia of DNA Elements (ENCODE) project[1] reported in September 2012 that over 80% of DNA in the human genome "serves some purpose, biochemically speaking".[2] This claim has, however been criticized by some in the genomics community.

No source. 184.153.187.119 (talk) 02:39, 15 October 2012 (UTC)


 * Now it seems the lead has been replaced with this:
 * "This conclusion however is strongly criticized by other scientists.[4][5]"
 * "other scientists" suggests criticism is widespread, when it looks like there is only a small handful of dissenters.
 * The ENCODE Project was researched by many more scientists than the 7 or 8 authors of the article linked to. There are endless examples of biochemical function in diverse non-coding regions all throughout the scientific literature. This antiquated idea that most or any of the genome is still 'junk' seems to be much more of a fringe view, and thus it is questionable whether their views hold enough weight to be mentioned in the article. 184.153.187.119 (talk) 16:35, 24 February 2013 (UTC)
 * My impression is that there are large numbers of scientists who think that the ENCODE report overstated its results to some degree. They showed that over 80% of the genome is transcribed, but in the absence of evidence that the resulting RNA is biologically active, they didn't really show that all of this serves some purpose. Looie496 (talk) 16:51, 24 February 2013 (UTC)
 * See http://gbe.oxfordjournals.org/content/early/2013/02/20/gbe.evt028.short?rss=1, full paper, free download, for a source QuentinUK (talk) 18:33, 24 February 2013 (UTC)
 * It seems that there are only a few scientists who disagree. I think the statement should be rephrased because it suggests wide-spread criticism when that's not the case. The criticism seems to come from the six scientists in the article above. ENCODE had 442 scientists working on the project. So how is a response from 6 scientists suddenly deserving of more weight? It certainly needs to be rephrased. --86.21.101.169 (talk) 14:30, 15 March 2014 (UTC)

Historical revisionism
There is a significant amount of bias and revisionism in statements about junk DNA.

"Initially, a large proportion of noncoding DNA had no known biological function and was therefore sometimes referred to as "junk DNA", particularly in the lay press. However, it has been known for decades that many noncoding sequences are functional. "


 * 1) This was not "particularly in the lay press". I clearly remember scientific journals and biologists calling it junk DNA.
 * 2) Decades? Too imprecise. I think it is only since 2000 that people have started to reevaluate the concept of Junk DNA.

"The term is used mainly in popular science and in a colloquial way in scientific publications and it has occasionally been suggested that its connotations may have delayed interest in the biological functions of noncoding DNA."

Again, this article is being revisionist and misleading. Back in the 1980s everyone believed that it was junk DNA and thus would not bother to really investigate it. There would be no research money for doing something like this. It is misleading to characterise this as something that non-scientists are doing to come up with a cute word that doesn't reflect the science community. It was indeed the term used by geneticists. Lehasa (talk) 14:14, 24 July 2014 (UTC)
 * Regarding your first point, "particularly in the lay press" and "I saw it one or more times in a journal" are not mutually exclusive. Decades is imprecise, I agree, but the article cites (for example) this 1980 paper, which discusses a type of function for ncDNA; I expect 3.5 decades falls within most people's expectation for decades. I don't doubt the term "junk DNA" has in the past been, and continues to be, used by geneticists. benmoore 18:50, 24 July 2014 (UTC)

Gregory and Comings
Since functional non-coding DNA was described more than a decade before junk DNA was used for the first time, we can confidently say that no one knowledgeable has ever intentionally equated all non-coding DNA with junk DNA. Comings made it very clear in 1972 that he considered junk DNA a (80+%) subset of non-coding DNA (a pretty good estimate!) and I cannot figure how Gregory came to the conclusion that he used it for all non-coding DNA in that article. Quoting Gregory here is quote mining and gives undue weight to one person's misreading. Ramos1990 wants to further imply that Gregory himself equated non-coding DNA with junk DNA in his 2005 book, by skipping "junk DNA [is] an inappropriate moniker for noncoding DNA in general" and quoting something later in that paragraph: "However, dismissing it as no more than "junk" in the pejorative sense of "useless" or "wasteful" does little to advance the understanding of genome evolution. For this reason, the far less loaded term "noncoding DNA" is used throughout this chapter and is recommended in preference to "junk DNA" for future treatments of the subject". He merely recommended avoiding the term. Try replacing "non-coding DNA" with "bird" and "junk DNA" with "chicken": "However, dismissing it as no more than "chicken" in the pejorative sense of "yellow-belly" or "milquetoast" does little to advance the understanding of fowl psychology. For this reason, the far less loaded term "bird" is used throughout this chapter and is recommended in preference to "chicken" for future treatments of the subject." Oddly enough, Gregory changed his mind on using the term and co-wrote an article "The case for junk DNA" in 2014 (currently reference 10). Afasmit (talk) 21:05, 10 June 2015 (UTC)
 * Attribution resolves the matter because the source explicitly says it and it puts the weight on the person who made the claim. There is no undue weight issue here. You did not quote the whole phrase, which included the context, when you thought that I skipped something. Gregory does credit Ohno's early usage of Junk DNA to be on pseudogenes and because pseudogenes are only a small part of the genome he says it is "inappropriate" to use the term "junk DNA" in that context. He says "Not only is "junk DNA" an inappropriate moniker for noncoding DNA in general because of the minority status of pseudogenes within the genome sequences, but it also has the unfortunate consequence of instilling a strong a priori assumption of total nonfunction." This only refers to Junk DNA as defined by Ohno here. What is quite relevant and clear is what Gregory says at the beginning of the section, "In this era of front-page genome biology, the term "junk DNA" has become a popular descriptor for noncoding DNA among both scientists and the informed laity." (p.29) which highlights the current understanding of what Junk DNA generally refers to and his own recommendation at the end of the whole section "for future treatments of the subject." (p.31) to use the less loaded term "noncoding DNA" instead of "Junk DNA" (whole phrase already quoted). Why would he be explicitly recommending this term switching at the end of the junk DNA section if he meant something else, as apparently you are thinking? Furthermore, even in Gregory's new article on the case for junk DNA he does specify that junk DNA referred to noncoding DNA: "These sequences [pseudogenes] are what Ohno initially referred to as “junk” [13], although the term was quickly extended to include many types of noncoding DNA [15]." On top of that, others have understood Junk DNA to be referring to noncoding DNA. Nessa Carey, who has written a whole book on Junk DNA (Junk DNA: A Journey Through the Dark Matter of the Genome. Columbia University Press. 2015), is a recent example.
 * Early scientists used junk DNA in different ways (e.g. Ohno and Comings had different views) and since it was often done in very informal and sloppy ways, there are bound to be some differences in how later researchers like Gregory or Nessa or others interpret the earliest uses of the term. If all you are concerned about is the "all noncoding DNA", then perhaps we can change the wording to "the majority of noncoding DNA" or "most noncoding DNA" or simply to "generally noncoding DNA". We as editors can only follow what the reliable sources say on the issue and attribute to who made the claim. What you or I think is wrong in what a source says is irrelevant to Wikipedia because Wikipedia is an encyclopedia and it includes diverse viewpoints, even ones one dislikes or disagrees with. It is not our position to psychologize why a source says what we don't like. The fact that it says it and is reliable, per Wikipedia's requirements, is enough. Now, unless another source does the psychologizing, then that can be cited to offer other perspectives. Mayan1990 (talk) 02:15, 11 June 2015 (UTC)

Disagreement regarding junk DNA
Yesterday Ramos1990 made an edit to the Junk DNA section with edit summary "better attribute"; the edit was reverted with edit summary "Revert typical POV edit". I am reverting back, because it is clear that the edit by Ramos1990 is not a "POV edit", but rather improves the neutrality of the section by providing explicit attribution for views that are not universally held. Looie496 (talk) 13:41, 9 November 2015 (UTC)

Biochemistry professors who still use "junk DNA"
Dr. Larry Moran still claims (Jan 20, 2016) that most DNA is junk DNA. Sadly, he harshly criticizes another biochemist who is actually looking into whether some non-coding DNA is really useless. He actually seems to contradict himself. Maybe someone can clarify it. (i) "No knowledgeable scientist ever claimed that all noncoding DNA was junk" vs (ii) "I'm very confident that it won't change the fact that 90% of our genome is junk."

It's really surprising that after all we know about the functions of non-coding DNA as well as the results of ENCODE, there are still scientists whose thinking hasn't evolved along with our discoveries. The only reason that I'm actually posting this to the Talk page is as an illustration that there are still a number of scientists (well, at least one) who are still on the junk-DNA bandwagon. Lehasa (talk) 18:04, 24 January 2016 (UTC)
 * Larry is stating the near-consensus opinion of people who study molecular evolution. I've glanced over his text and I don't see anything that amongst his colleagues (which includes me) would be considered controversial. If there is a fault, his choice of picture may have been intended to make her look lightweight. He even gave her the benefit of the doubt with her "88% of changes to the genetic code that correlated with the disease were found in the junk DNA.", which, as one commenter pointed out, are variants usually just linked to the mutations actually having an impact on the disease.
 * There is no contradiction in the obviously true statement "No knowledgeable scientist ever claimed that all noncoding DNA [i.e. 98%+ of the genome] was junk" and "I'm very confident that it won't change the fact that 90% of our genome is junk.", though I personally wouldn't be surprised if the true number is as little as 87% or so. Much of the confusion lies in the unfortunate term "non-coding DNA", which doesn't mean that it doesn't code for functional information, but only that it doesn't code for a protein sequence. As another commenter pointed out, Graham (and probably you as well) simply conflate "non-coding DNA" with "junk DNA". It is quite likely that a scientist zapped over from the 1970s would be surprised that as much as 8-10% of non-protein-coding DNA is functional, but Ohno's concept does not suffer from that.
 * Though faulted for poor wording and emphasis in the abstract, the ENCODE project paper never claimed the demise of junk DNA. That was a painful misinterpretation of an editorial staff writer who wrote "ENCODE Project Writes Eulogy for Junk DNA" accompanying the paper. There is no "junk-DNA bandwagon", but there clearly is a bandwagon playing the tune that our entire genome is functional. The wagon is largely populated with creationists (incl. intelligent designers), which you'll find as always joining the linked discussion. They're hedging their bets though, as one dropped the notion that any amount of junk DNA isn't a surprise to him since "we might expect that in our fallen state"... Afasmit (talk) 02:31, 25 January 2016 (UTC)
 * Thanks. I'm still not entirely clear about all of this. So non-coding DNA is not junk-DNA (it could make tRNA for example) - correct? So why does Moran say "I'm very confident that it won't change the fact that 90% of our genome is junk.", instead of saying that 90% of our genome is *non-coding* ? How can you prove that something doesn't have a function? If someone 100 years ago found a cellphone in the desert and its battery was dead, it would look like a non-functioning rectangle of plastic and glass. They would claim that it is non-functioning. It was the same with many of our glands and the spleen - before we knew their function, people thought they were useless. So I don't understand how someone can claim that they KNOW that some part of the genome has no function and is junk. One test that it would have to pass is just remove all of the junk DNA from an organism and see if it is still viable. Has anyone done this to prove their theories? Lehasa (talk) 23:46, 25 January 2016 (UTC)
 * You're correct that non-coding DNA ≠ junk-DNA. And saying "90% of our genome is non-coding" would not convey the same meaning, as "non-coding" is almost always interpreted as "non-protein-coding".
 * There are many lines of evidence that make non-functionality of most of the genome the obvious null-hypothesis. This means that you have to prove that it is functional, because non-functionality is expected; for body organs it is the other way around; why would one presume them to be useless? Among the lines of evidence are the mutational load problem, the C-value paradox and the fact that we have shown that most of our genome is made up of the decaying carcasses of transposable elements (bits of DNA that survive by constantly making copies of themselves elsewhere in a genome). Your cellphone analogy is apt in this respect, though someone finding a cellphone 100 years ago (sic) would have another thought altogether probably;-). It's like the litter that transposable elements leave behind; it had a function once (not for the desert/cell, but for the person/transposable element who'd dropped it there), but now is just taking up space until it erodes away and looks like any other bit of desert. It's junk as far as the desert is concerned.
 * Experiments removing large chunks of genomes have indeed been done, both in the lab and, more convincingly, in nature. The lab experiments cannot remove all junk DNA, because we don't know yet where all the functional bits are (they're scattered throughout). And there is also the problem that when people showed, for example, that mice artificially lacking huge chunks of presumed non-functional DNA appeared perfectly healthy, they couldn't prove that the removed DNA doesn't give a wild-type mouse just a slight edge under some circumstances. On the other hand, nature, having more time and less empathy, has done very thorough experiments. We can point at creatures like the pufferfish or bladderwort that have genomes ~10 times smaller than a few dozen million years ago (all closely related species are larger, indicating that that is the ancestral state; we're reconstructing a lot of those ancestral genomes in silico these days). Also, the number of 8-15% functional DNA in our genome is deduced from looking at where mutations have accumulated at random (i.e. without any indication of selection for or against them). These studies have been done very carefully, for example taking into account that some functionality arose after our speciation from chimpanzee, and 15% functionality really is the very upper bound. Afasmit (talk) 04:47, 26 January 2016 (UTC)

Minor editing help needed

 * 1) I don't know how to add "et al" to my citation in the Genetic switches section. I am not that good with references. Can someone fix it?
 * 2) Lead paragraph: "a term that has elicited strong responses over the years" -- this is meaningless. What sort of responses? Obviously some pro and some con. It would be better to say that "junk-DNA" is on the whole being replaced with "non-coding DNA" although the former term is still used a lot.
 * 3) Grammar: opening paragraphs: "and making the complexity of species." This is poor grammar. How do you make a complexity of one or more species. It should be rewritten to say what it is intended to. Unfortunately, I don't really know, so I can't help.

Thanks. Lehasa (talk) 23:53, 21 February 2016 (UTC)


 * I can fix the reference, but I'm not comfortable with the sentence it is used for: For example a long non-coding RNA (lncRNA) molecule was recently found to assist in preventing breast cancer, by stopping a genetic switch from getting stuck. The phrase "stopping a genetic switch from getting stuck" is incomprehensible as written. Looie496 (talk) 00:03, 22 February 2016 (UTC)
 * Yes. Good point. Maybe someone who understands the genetics better than I do could fix it. It's just really cool that an lncRNA is able to affect the occurrence of breast cancer. Lehasa (talk) 00:50, 23 February 2016 (UTC)
 * I removed that sentence 3 "and making the complexity of species" before seeing your criticism. It seemed garbled to me too. Indeed, perhaps it's bad form to say, but it looks like something inserted by a creationist. In fact the whole article smacks of attempts to big-up ENCODE's implications from such a perspective. I also removed a sentence that argued that ENCODE had changed our view of what fraction was protein coding. Of course, it didn't. In general, I find the whole article somewhat bitty. Too many editors, with too many perspectives. Yes, it's Wikipedia, I know! 163.156.213.193 (talk) 13:07, 2 March 2016 (UTC)
 * I clarified one of the meanings of "complexity of species". Some of the sources note that noncoding DNA contributes to the sophistication of any species since many eukaryotes have 70-90% noncoding DNA. Another point they make is that noncoding DNA is quite involved in complex networks of isolated genetic activities that affect the organism overall, such as in the case of the breast cancer example mentioned earlier in the thread. Mayan1990 (talk) 19:11, 2 March 2016 (UTC)
 * I'm not convinced that there is sufficient evidence that the 'sophistication of a species' and '70-90% noncoding' are correlated. There is no doubt that complex phenotypes involve complex regulation, and complex regulation involves noncoding DNA, but there is no reason as yet to suppose that the entirety of the noncoding fraction, or even the bulk of it, is involved in this task. 80.42.190.156 (talk) 20:17, 2 March 2016 (UTC)
 * You may not be convinced but that is what the sources discuss, one of which is Nessa Carey's book on the topic of Junk DNA and others too such as Kevin Morris's text. On top of that no one is making a claim on "entirety of the noncoding fraction, or even the bulk of it". That is a quantitative claim not found in the article and in fact the functions and lack thereof are still being debated in research (probably will not end for a very long time considering the size of genomes and the amount of activity in cells per organism - not an easy task at all). I have noticed that many seem to read more into the article than is necessary, especially in light of older methods (comparative genomics) yielding lower estimates while newer methods (empirical and chemical) are yielding different estimates. In wikipedia, we can only cite what the relevant sources say on the issue.Mayan1990 (talk) 20:39, 2 March 2016 (UTC)
 * If you are going to insert the phrase "complexity of species" you need to define it. It still seems so vague or ill-defined as to be meaningless to me. Lehasa (talk) 21:50, 3 March 2016 (UTC)  P.S. I do think that the article is slowly improving. Yay!
 * "Indeed, perhaps it's bad form to say, but it looks like something inserted by a creationist. In fact the whole article smacks of attempts to big-up ENCODE's implications from such a perspective." I would just comment that who says something is not really relevant to the content. On the other hand, it is really important to check for NPOV and do one's best to guard against bias and hidden agenda. About the ENCODE stuff, I need to look at it more, but it seems that ENCODE really will force some changes in how people combine evolution and genetics. It's not a small experiment, it's a surprising and potentially revolutionary finding. Lehasa (talk) 21:57, 3 March 2016 (UTC)

Functionality
A woman is able to bring forth a male or female child. The ability to conceive male children remains in next generations, regardless of whether male or female children have been conceived in between. So obviously, the human genome (of both men and women) contains the data needed to build both males and females. Males and females also have different Sexual_characteristics, which also includes variations of organs both genders possess (for instance men have larger lungs, larger hearts and smaller breasts than women).

So, there needs to be some sort of "genetic switch" that determines whether the human that is being built in the womb will become a male or a female. The switch would then order the 23rd chromosome to become XX or XY and will then also trigger certain organs to become the "male version" or the "female version" thereof (for instance the male version of the lungs would be the larger lungs, the female version would be smaller lungs).

Has any research been done on all this, and has this "sex trigger gene" been identified already ? (is it NR3C4 ?) Also, where is the "zipped data" (ie male or female version of lungs) found in the genome ? My guess is that it's in the Noncoding DNA (so the "junk DNA" -dna code not bound to any protein and thus inactive).

This would allow us to explain the function of just up to 1% of the noncoding DNA (since the other gender's version of the DNA (=coding DNA) is also but 1%, so logically, the female version of this same DNA (noncoding) would be similar in size). For references, see here. Still it's a start.

I also assume that the (many) genes that are responsible for the creation of the male organs (penis, scrotum, ...) and the female organs (vulva, labia, ...) would be located (only) in the 23rd chromosome, but that the organs that both sexes possess (breasts, lungs, heart) would be on any other chromosome (not on the 23rd chromosome). I also seem to think that rather than that all genes that make up an organ would be located next to each other (and on the same chromosome), they can actually be found over many chromosomes.

Another thought I'm having is that, if the above is true, stripping all noncoding dna would cause an organism no longer able to conceive both sexes (rather only just one). This sex will always be female I assume (as only women can conceive and the woman can then replicate only her own dna).

KVDP (talk) 08:08, 24 June 2017 (UTC)
 * Um, this isn't a forum... Chiswick Chap (talk) 10:45, 25 June 2017 (UTC)
 * I agree that this belongs more properly on the Science Reference Desk, but the answer, briefly, is that the sex trigger switch is the Y chromosome. The Y chromosome is very small in comparison to the others, and all it really contains is a set of switches that instruct the other chromosomes to build a male rather than female body -- the actual code for both types of bodies is distributed throughout the other chromosomes. Any basic book on genetics will explain all this in more detail. Looie496 (talk) 16:55, 25 June 2017 (UTC)

Can someone fix citation No6?
The citation goes to pdf that is not available. The article is available free at http://www.sciencedirect.com/science/article/pii/S0960982212011542 Can somebody fix it? I'm not that good with the source editing. Thanks HlTo CZ (talk) 14:21, 13 July 2017 (UTC)


 * Done. To see how the edit was accomplished please use the View history tab when on the Article page (not this talk page) and from there click Compare selected revisions. WurmWoode  T   22:46, 1 August 2017 (UTC)

odd wordings
In the lead para, it says "a rising percentage is being shown to have regulatory functions". I think a less clumsy choice for the word "rising" would be "increasing." But more to the point, the sentence is not clear what the rising/ increasing is relative to. I *think* it means that it's been increasing in recent years relative to what scientists thought previously, but I'm not sure enough to make an edit.

Likewise, about 2/3 of the way down, there's a sentence "The significance of noncoding DNA mutations in cancer was explored in April 2013.[49]" I'm sure it's nice that this work was done in that month, but surely there's a better way to explain the work. At the very least, the names of the researchers would be appropriate, but better would be some kind of summary of the results. Again, I'm not enough of a biochemist (read: not at all) to know how to word the results. Mcswell (talk) 19:07, 14 December 2018 (UTC)

Appropriate references
A lot has been written about junk DNA but not all of it is high quality scientific information. I propose that we drop the reference to Nessa Carey's book because it does not contribute to the discussion. It is deeply flawed and extremely biased. We should not be referencing material that is scientifically incorrect. Here's a review of Carey's book.

https://www.nature.com/articles/520615a

We should also be cautious about referencing articles by science writers who are not experts on the subject. This includes newspaper articles and commentaries in major science journals. I'm thinking especially of articles by Elizabeth Pennisi in Science. She is not a scientist and she is certainly not an expert on genomes and junk DNA. Much of what she writes is either wrong or misleading. If you agree with her point of view then you should be able to find papers in the scientific literature that support your view. — Preceding unsigned comment added by Genome42 (talk • contribs) 23:41, 14 May 2022 (UTC)
 * Sorry but that is not how wikipedia works. Wikipedia is not a source for truth or what you think is true. It is a collection of multiple views published under reliable sources (publishers that do regular editorial oversight like peer review). "Science" is a scientific journal and "Columbia University Press" is an academic publisher. Both of the works of Pennisi and Carey are reliable sources per WP:RS since they meet the criteria. If you find other sources then they can be juxtaposed by attribution such as "According to Carey...." and the "According to [person with a different view] ....." and leave it at that. Nathaniel Comfort's piece is about the metaphorical discussion of junk DNA in a few books. But that is more about the debate on junk DNA than about a study on junk DNA. Is there a branch in genomics on junk DNA specifically? Is it just a philosophical name game among scientists? Scientists have different ways to understand these things - so there are multiple opinions on it in reliable sources like Science, Nature, etc. Hope this helps.Ramos1990 (talk) 23:54, 14 May 2022 (UTC)
 * This discussion illustrates the danger of having Wikipedia edited anonymously. How is the reader to judge the balance between the opinions of unknown editors with unknown qualifications and those of scientists who know what they are talking about: Joe Felsenstein, Larry Moran, Nathaniel Comfort and others, including, most notably, Sydney Brenner? You say "both of the works of Pennisi and Carey are reliable sources..." Well, Pennisi is primarily a journalist and Carey is primarily an administrator with extremely few original publications: do you really think their views carry as much weight as those of Sydney Brenner? In summary, I don't agree that Genome42's contributions should be censored. I've looked at the history, and I don't see any cogent reasons advanced for reverting these. Athel cb (talk) 07:21, 15 May 2022 (UTC)
 * There is no censorship on this end. Censorship is usually when someone removes a reliable source. I was restoring them, not removing them since both have been there for years with no issues. That claims above that Nessa is not a scientist or is unscientific is easily falsified by looking her up. She even did stuff for the Royal Society of Chemistry, so it looks like so she is not some amateur. And the complaint on Pennisi also is weird since she does work for the journal Science and has a relevant background. If she was as bad as the editor above claims, why has she been doing such extensive work for Science since 1996? The news pieces from such journals get strong editorial oversight from experts in the field. Here is a similar source cited in the article right now. No one seems to be complaining about this one or its author Ewen Callaway. You cannot pick and choose. Both of these are reliable sources and they are secondary, which is what wikipedia recommends over primary sources when possible. Ramos1990 (talk) 09:22, 15 May 2022 (UTC)
 * "That claims above that Nessa is not a scientist or is unscientific is easily falsified by looking her up. She even did stuff for the Royal Society of Chemistry, so it looks like so she is not some amateur." I did look her up before making my first comment, and I see no reason to change my view. As for the "stuff for the Royal Society of Chemistry" that she did, maybe you should alert Web of Science to this so that it can be listed. Regardless of that, can you cite the reference here? Probably you are referring to the book Epigenetics for Drug Discovery that she edited, but did not otherwise contribute to.  Athel cb (talk) 09:48, 15 May 2022 (UTC)

@Ramos1990 Your reference to the article by Ewen Callaway in New Scientist reveals how strongly you are biased against junk DNA. He says, “In recent years, researchers have recognised that non-coding DNA, which makes up about 98 per cent of the human genome, plays a critical role in determining whether genes are active or not and how much of a particular protein gets churned out.”

Really? Do you honestly believe that intelligent molecular biologists have only discovered regulatory sequences “in recent years”? That’s just nonsense. We’ve been studying them since the 1970s. There are reliable scientific sources that should be included in Wikipedia references and then there’s garbage like the Ewen Callaway article that should not.

BTW, you are wrong about your claim that Pennisi’s articles “get strong editorial oversight from experts in the field.” If that were true we would not be in the mess that we’re in now with respect to junk DNA. Genome42 (talk) 13:27, 15 May 2022 (UTC)
 * That is an example of a source already in the article. I did not put it in. You may not like it, but science news articles are a secondary source and are considered reliable in wikipedia when they are published by publisher with editorial oversight like New Scientist and of course Science. Sometimes they are used because they quote experts in them and provide a mini review on the topic, albeit imperfectly, etc. Please read the policy on what constitutes a reliable source WP:SOURCETYPES. It is pretty broad. Wikipedia prefers secondary sources like books and reviews since primary sources like research papers have issues. It says " Articles should rely on secondary sources whenever possible. For example, a paper reviewing existing research, a review article, monograph, or textbook is often better than a primary research paper."Ramos1990 (talk) 19:00, 15 May 2022 (UTC)
 * I get that part. But not all secondary sources are reliable or accurate so we need to be careful which ones we choose so we don't mislead our readers. It's OUR responsibility to examine these sources carefully and filter out the ones (such as the Ewen Callaway article) that are obviously not good science. When one of us identifies such an article you shouldn't just dismiss the criticism on the grounds that it's been there for several years and none of the historical editors recognized the problem. Genome42 (talk) 20:26, 15 May 2022 (UTC)
 * You say that "a paper reviewing existing research, a review article, monograph, or textbook is often better than a primary research paper." Maybe, but so far you've come up with a journalistic article in Science, a popular book and now an article in a popular magazine. Where are the serious reviews in serious publications (like Annual Reviews of Biochemistry) that support your position? In any case, might doesn't make right: as Joe Felsenstein commented yesterday in Larry's blog "Unfortunately there are many more genomicists and molecular biologists [than experts in molecular evolution], so the vote is still heavily against junk DNA." Can you cite just one expert in molecular evolution who agrees with you? Sydney Brenner was asked in an interview in 2009 if he was ready to make a public confession that he was wrong about junk: he wrote back saying "I am prepared to reduce it from 96% to 95.8%". OK, that was before what Joe calls "the 2012 ENCODE disaster", but I don't think that he revised his view before he died in 2019. Athel cb (talk) 09:40, 16 May 2022 (UTC)
 * Not me saying it. It is the wikipedia policy saying it. I merely quoted it. The piece that Genome42 mentioned above from Nathaniel Comfort, who is a historian of medicine by the way, and is not a molecular evolutionist or molecular biologist or genomics researcher, still meets the criteria for a reliable source too. It has editorial oversight from a notable journal. There is room for multiple sources and multiple views, not just one. The threshold for sources is usually whether the source has some editorial oversight, like some degree of fact checking mechanism. This is in contrast to self-published sources which do not have such editorial mechanisms in place, like blogs or self published books. Ramos1990 (talk) 12:32, 16 May 2022 (UTC)
 * @Ramos1990
 * I think you are still missing one of the main points in this discussion. It's a good idea to back up facts with an appropriate reference but, as you point out, there is a lot of controversy about junk DNA. Just because an opinion is published in a credible publication doesn't mean that it's a scientific fact.
 * This article, and many others on Wikipedia, tilt strongly toward skepticism about junk DNA and opponents of junk DNA (such as Nessa Carey and Elizabeth Pennisi) are referenced as though they are stating scientific facts. The article by Ewen Callaway in New Scientist is a perfect example. It appears in a section on "Regulating gene expression" in the following sentence "Some non-coding DNA sequences determine the expression levels of various genes, both those that are transcribed to proteins and those that themselves are involved in gene regulation (Callaway 2010)." That's just a sneaky way of getting a paper opposing junk DNA into the Wikipedia article. There's no way that the Callaway article is a primary source for the statement that genomes contain regulatory sequences!
 * We are not discussing the location where reliable sources are published. We all know the difference between blogs and scientific journals. What we are discussing is how to recognize whether a source is reliable (Carey, Pennisi, Callaway aren't) and whether unreliable sources should be treated as scientific facts. Genome42 (talk) 14:07, 16 May 2022 (UTC)

Forensic anthropology
The section on "forensic anthropology" is irrelevant, extremely biased, and scientifically inaccurate. I propose to delete it within a few days unless there are rational objections. — Preceding unsigned comment added by Genome42 (talk • contribs) 13:45, 16 May 2022 (UTC)
 * I agree. This section consists mainly of irrelevant chit-chat.  Athel cb (talk) 09:39, 17 May 2022 (UTC)

Regulating gene expression
This entire section with all of its short subsections (operators, enhancers, silencers, etc.) seems irrelevant in an article on non-coding DNA. The main point is covered in the short section on "Promoters and regulatory elements" under "Types of non-coding DNA sequences." This is not the place to discuss regulation in any detail. All we need to do is explain the various fractions of non-coding DNA and cover the individual fractions in separate Wikipedia articles. That works for regulatory elements, centromeres, telomeres, origins of replication, noncoding genes, scaffold attachment regions, introns, pseudogenes, and transposons but for some strange reason there's resistance to having a separate article on junk DNA even though every scientist agrees that junk DNA exists. (There's debate over the amount of junk DNA but not over its existence.)

I propose to delete this section unless someone comes up with a rational reason for keeping it.Genome42 (talk) 16:02, 16 May 2022 (UTC)


 * Yes. The main problem with this section that reads like an excerpt from a textbook is that it is irrelevant to the topic of the article. It can be replaced by a reference to a suitable textbook.  Athel cb (talk) 09:51, 17 May 2022 (UTC)

Move section on 'Fraction of non-coding genomic DNA'
I propose moving the section on "Fraction of non-coding DNA" to below the section on "Types of non-coding DNA sequences." This will make it easier to discuss the fraction of the genome devoted to known functional non-coding DNA and the fraction that is either unknown DNA (dark matter) or junk DNA. Right now, the "Fraction" section ignores all of the known functional non-coding DNA and implies that differences in genome size are entirely due to non-coding DNA. While this is not inaccurate, the differences are actually only due to a fraction of non-coding DNA; namely, that fraction that may be junk DNA. This inadvertently perpetuates the myth that non-coding DNA is equivalent to junk DNA.

It will be easier to explain the issue of genome size AFTER describing all of the functional non-coding DNA such as regulatory sequences and noncoding genes. We can explain that differences in genome size in eukaryotes are possibly due to transposon-related sequences and intron sequences and not to coding regions or functional non-coding DNA.

If there are no objections, I will move the section in a few days and then we can edit it for accuracy and relevance. Genome42 (talk) 15:26, 17 May 2022 (UTC)

"Knowledgeable scientists" and lack of neutrality
My concern is with these excerpts from the article: "However, knowledgeable scientists have known for decades that many noncoding sequences are functional....No knowledgeable scientist ever said that all noncoding DNA was junk....The general consensus among knowledgeable scientists is that a large percentage of the human genome is junk DNA. Naturally, this junk DNA is all noncoding DNA but that does not mean that all noncoding DNA is junk."

The phrase "knowledgeable scientists" appears 3 times. Is it appropriate? What does that mean? I think this is meant to refer to an argument between certain scientists or factions of scientists. This approach seems to violate Wikipedia's neutrality principle.

Also, consider especially the sentence "No knowledgeable scientist ever said that all noncoding DNA was junk." First, is that factually correct? I'm not sure about that. I believe some very esteemed research biologists have said that all or virtually all of noncoding DNA had no function. And, again, who are the "knowledgeable scientists" who've never said this, and who are the un-knowledgeable scientists who have said it? This needs to be made explicit, or the whole contentious issue should be dropped altogether.

And focus on this sentence from the article: "Naturally, this junk DNA is all noncoding DNA but that does not mean that all noncoding DNA is junk." This seems to be an argument that the writer is having with the Intelligent Design theorists. If the writer wants to add a section about how Intelligent Design promoters have used and/or abused the non-coding DNA issue to promote Intelligent Design and to attack mainstream scientific evolutionary theories, perhaps the writer should create a new section in the article to address that. As it is, I think this violates the neutrality principle of Wikipedia.Credidimus (talk) 02:58, 25 November 2013 (UTC)


 * It has been known for a long long time that there are almost always regions directly upstream from a start codon where transcription factors can bind. Those constitute functional noncoding DNA, and anybody who didn't know that would certainly not be a "knowledgeable scientist" in any meaningful sense at least regarding genetics.  So I don't think the statements are inaccurate.  But I agree that "knowledgeable scientist" is not a very good term. Looie496 (talk) 15:25, 15 March 2014 (UTC)

I totally agree with the lack of neutrality pointed out here. Don't just sweep this under the rug. Lehasa (talk) 14:16, 24 July 2014 (UTC)


 * I think the term "knowledgeable scientists" is appropriate. What would be the alternatives to distinguish them from other professionals working in the field of genomic research who are unaware of or willfully ignoring evolution theory and thus denying the consequences of evolution on the genome level? You just cannot have it both ways. Either you accept the existence of junk DNA or you have to officially state that evolution theory got it all wrong for at least 50 years. Osteonectin (talk) 05:30, 18 May 2022 (UTC)

Splitting proposal
I propose that junk DNA be split back to a separate page called junk DNA since these are different concepts and junk DNA has enough material to be its own page.

For example, non-coding DNA simply refers to any part of the genome that cannot be coded into protein; junk DNA is non-functional, yes, but not non-coding because it can be coded into junk RNA and junk proteins.

— Preceding unsigned comment added by GospodinovD (talk • contribs) 14:39, September 7, 2021 (UTC)


 * I don't know if it should be split, but I agree there should be a separate page for junk DNA. There's plenty material for that. Haddarr (talk) 23:49, 23 January 2022 (UTC)
 * Junk DNA should definitely have its own entry. We need to squelch the idea that junk DNA and noncoding DNA are synonyms or were ever thought of as synonyms by scientists who were up-to-date in molecular biology in the 1960s and 1970s.
 * The whole idea of "noncoding DNA" is actually a bit silly. We never talk about "non-centromeric DNA" or "non-origin DNA" or "non-regulatory DNA." The term "noncoding DNA" is probably a relic of the time when some non-expert scientists thought that the only important DNA in a genome was the DNA in open reading frames.
 * There's powerful evidence that 90% of our genome is junk and this evidence needs to be explained in a Wikipedia article. Genome42 (talk) 14:51, 13 May 2022 (UTC)
 * I do not know if there really is enough material to split the article since no one really seems to have expanded the junk DNA section here. It is hard to even find papers on the topic. It is a colloquial and sloppy term that even genomics researchers acknowledge is very problematic and therefore not usually used in research. Junk can be useful and still be junky or junk can be discarded material which was once useful or junk can be literally made into something useful. It is ambiguous and its coinage was colloquial in the 1960s. Ramos1990 (talk) 19:54, 14 May 2022 (UTC)
 * I am Larry Moran. You can look at the discussions about me above. I know quite a bit about junk DNA and I can assure you that there is enough material to warrant its own page.
 * This article on non-coding DNA already has lots of stuff on junk DNA and that's a problem since it confuses readers about the distinction between noncoding DNA and junk DNA.
 * There are hundreds of papers on junk DNA so the amount of material isn't a problem.
 * It's not true that all genomics experts dismiss junk DNA. In fact, many of the acknowledged experts on the evolution of genomes agree that most of our DNA is junk and it's fair to say that almost all the experts on molecular evolution favor junk DNA.
 * The term junk DNA is still widely used today and it has a well-defined meaning. All of the scientific evidence supports the idea that most of our genome is junk and that needs to be explained in its own article. As you can see, even the material in this article is strongly supportive of junk DNA. The view that 90% of our genome is junk has far more support than the view that most of our genome is functional and that will become clear if we can write an unbiased article on the subject. The problem here is that several of the editors, including you, don't seem to be very knowledgeable about the subject. Genome42 (talk) 23:25, 14 May 2022 (UTC)
 * There is no way to verify who you are on wikipedia. Many people claim to be famous people here so that is not an argument that is valid or carries any weight on wikipedia. And merely claiming it is not a reason for anyone to believe what you are saying either. On top of that if you really are Larry Moran then there are conflict of interest issues where you cannot push your POV on an article. Especially since there are other viewpoints on the matter, for instance Carey and Pennisi whom you want to get rid of and censor out of the article. Ramos1990 (talk) 23:36, 14 May 2022 (UTC)

Just as a side note, there was a "Junk DNA" page a decade ago and it was decided to merge it with this article. FYI. Ramos1990 (talk) 00:13, 15 May 2022 (UTC)
 * You have clearly stated your bias above: you don't believe in junk DNA. Does that mean you have a conflict of interest because you are pushing your POV in the article? It's clear to me that you and the other editors are tilting against junk DNA even though you try to appear neutral. I'm trying to correct that tilt but you guys are blocking me because you think you have reached a "consensus" that your view is correct. That's not fair.
 * I will edit the article to make it clear that the definition Nessa Carey uses is not accepted by any credible scientist and you guys can try and defend your position. Similarly, if we are ever allowed to write a separate article on junk DNA, I will demonstrate why Pennisi is misrepresenting the history of the field.
 * If you really are Larry Moran, then you have an identifiable POV and there are policies on that. I have no issue with junk DNA, but you seem to have a sole purpose of censoring opposing views from reliable sources which would violate WP:NPOV which is another policy on wikipedia. It could result in another edit war and get you blocked for WP:disruptive editing. You already violated WP:3RR and if you keep this "I'm going to push my view in no matter what" attitude you will not like the consequences of behavioral deviance. It is frowned upon on wikipedia. This is not a blog where you can post whatever you like.
 * I am trying to compromise with you here. Other editors reverted everything you wrote. I restored some of it because I am assuming good faith and thought some of it was useful. Don't ruin it.
 * At the moment Nessa Carey's definition is not even used in this article. She is just grouped with other sources on other general claims supported by other sources so not sure what relevance there is targeting her in this article. Pennisi is only referenced once on a general claim too. So not sure what is the issue. Ramos1990 (talk) 01:34, 15 May 2022 (UTC)
 * Is there any evidence for the statement that "Many people claim to be famous people here"? If there is, you should be able to give one or two examples. I don't remember ever seeing one.  Athel cb (talk) 07:27, 15 May 2022 (UTC)
 * Usually they are sockpuppets. Here is one who tried to be Tom Cruise. Blocked. Multiple Bill Clintons. All blocked. What I meant by that was that impersonation occurs quite a bit. It happens to wiki editors too. Even when a celebrity is confirmed to be who they are, they tend to stay away from related articles to avoid conflict of interest issues. You cannot be the President and dictate to other editors what belongs or does not belong on presidential pages. Ramos1990 (talk) 10:04, 15 May 2022 (UTC)
 * OK, I confess that I hadn't thought about nutters who claim to be Bill Clinton, etc., but I'm surprised that you put Genome42's statement that he is Larry Moran on the same level, as everything Genome42 has written here agrees, both in content and style, with what Larry Moran has written (on his blog and elsewhere).  Athel cb (talk) 08:01, 16 May 2022 (UTC)
 * @Ramos1990
 * Let me see if I understand your position. Are you saying that knowledgeable experts should not be allowed to edit Wikipedia pages because their perspective on the science might be in conflict with the biases and prejudices of the non-experts who are currently controlling the content of the page?

Are you threatening to block me because you think my POV will compromise the scientific accuracy of the article? That’s ridiculous. You will have plenty of opportunity to challenge the accuracy of the material that I post just as I’m challenging the accuracy of the current article on non-coding DNA and the POV of the editors who are censoring my revisions. Genome42 (talk) 13:41, 15 May 2022 (UTC)
 * Not saying contributing is not allowed. But that acting like one is a sole authority or the only expert on this page or topic and dictating to other editors and calling everyone else you disagree with non-experts can become an issue because for one you have no evidence that everyone else has no expertise. Such comments show bad faith and bad behavior. By the way this article looks, many people through the years with knowledgeable expertise wrote most of it little by little since it is pretty technically written already. But about the COI stuff, imagine if the Pope or Richard Dawkins were editors and constantly edited atheism and religion pages 'the way they wanted it', for example. Lots of issues would emerge like personal beef with others that does not belong on wikipedia. Just saying be careful, assume good faith on others, and be willing to compromise with others. That is why wikipedia has policies everyone has to abide by to ensure articles are democratic, collaborative, constructive, and not individual or authority based. Experts disagree with each other all the time too. No one is above anyone here.Ramos1990 (talk) 18:27, 15 May 2022 (UTC)

Let's get back to the main topic. Is there anyone here who objects to creating a separate page for junk DNA? If you object, please explain why because it seems to me that we really need such a page in order to explain to viewers what the main issues are in the controversy. We need some place to put all the evidence showing that 90% of the human genome is junk and to explain why many scientists reject this evidence.Genome42 (talk) 20:18, 15 May 2022 (UTC)
 * I looked at pubmed and searched for "junk dna" to see how prominent this topic even is. It seems the term is declining in usage in the scientific literature (see the "results by year"). This is despite all of the abundant media coverage it still gets. I would say that if the usage in the scientific literature was rising then perhaps it would be a good idea, but the reverse is happening. I see an increasing number of papers calling for abandoning the term altogether too. Just an FYI, one of the original reasons for the merge of the junk DNA to this article was that it was causing too much confusion and edit warring as a separate page. When merged you could have the general article on noncoding DNA without the fireworks and a section isolating the controversies coming from it rather than having 2 pages on the same topic with the Junk DNA article mixing controversy with general information on noncoding DNA. Ramos1990 (talk) 21:28, 15 May 2022 (UTC)
 * Are you serious? Do you really believe that the debate is over and junk DNA doesn't exist just because the opinions you prefer to read are against junk? You don't seem to be knowledgeable about this topic. I can help you get up to speed. Read these articles on my blog.
 * sandwalk.blogspot.com/2021/11/whats-in-your-genome-2021.html
 * sandwalk.blogspot.com/2013/11/stop-using-term-noncoding-dna-it-doesnt.html
 * sandwalk.blogspot.com/2018/04/required-reading-for-junk-dna-debate.html
 * sandwalk.blogspot.com/2013/07/five-things-you-should-know-if-you-want.html
 * The last one is particularly important; it lists five things you should know if you want to participate in the junk DNA debate.
 * Also, you seem to be genuinely confused about the difference between junk DNA and noncoding DNA. Think of it this way. Genomes can be divided into centromeric DNA and non-centromeric DNA and the junk is located in the non-centromeric DNA. Does that mean we should have an article on non-centromeric DNA where we discuss junk? We can also split the genome between regulatory DNA and non-regulatory DNA but I don't see you calling for an article on non-regulatory DNA where we discuss junk DNA.
 * The only reason why you favor discussing junk DNA in an article on non-coding DNA is because you think that junk DNA was once defined as non-coding DNA and this article will prove that some non-coding DNA has a function - therefore it is not all junk. That's an extremely biased, and incorrect, view. No knowledgeable scientist ever defended the claim that all noncoding DNA was junk. Do you think we didn't know about noncoding genes, regulatory sequences, and origins of replication back in the 1960s?
 * Genomes can be separated into functional DNA and junk DNA and that's where the debate is. The non-coding DNA fraction is a heterogeneous mixture of functional elements and junk DNA and it's very confusing to mix them. An article on junk DNA will discuss all of the various functional regions of the genome and how common they are in the human genome. We will see that if you add them all up you only get to about 5% of the genome. The article will discuss the evidence for junk DNA and the arguments against claims for abundant function. None of that is appropriate in an article on non-coding DNA.
 * It's easy for me to see why there was "edit warring" over a junk DNA article. It's because many of the editors here are opposed to junk DNA so they try to suppress the legitimate scientific debate. You need to recognize that what you are doing here is expressing a very personal and biased opinion about the topic of junk DNA and you are using your position to start edit wars in order to censor any views in favor of junk DNA. Genome42 (talk) 14:49, 16 May 2022 (UTC)
 * Why do you make all of these assumptions about junk DNA and me? I have no issues with it. Nor have I opposed adding sources on it anywhere. In fact I have been constantly saying that multiple views from multiple sources are acceptable on here. It's the policy on wikipedia WP:NPOV to represent "all the significant views that have been published by reliable sources on a topic." That includes pros and cons on junk DNA. If you have lots of content try making a draft of a Junk DNA article on your "sandbox". If it is substantial and adheres to WP:NPOV by showing multiple sides of the controversy to a decent degree at least, then I would support making it its own article again. The draft does not need to be long or massive but it should show some potential as a stand alone article. The original wikipedia junk DNA article is here from 2012. It was not bad and it did contain some nuance. Perhaps piggy back on it?
 * But please read WP:OR and WP:SYN for how sources are supposed to be used. In essence the sources are the ones that speak, not us.Ramos1990 (talk) 16:47, 16 May 2022 (UTC)
 * Thank-you for the link to the old version. It was absolutely horrible and by saying that it was "not bad" you reveal your bias. There is no nuance or fairness in that article since it begins with ...
 * "The term was introduced in 1972 by Susumu Ohno, but is somewhat outdated (as of 2008), being used mainly in popular science and in a colloquial way in scientific publications. If DNA does not seem to have a function now it may have had a function in the past or may be discovered to have a function in the future. The term 'dark matter' is increasingly being adopted to refer to this apparently functionless DNA."
 * I could easily edit that old version but if I publish it, I think that would affect all the changes that were made in the non-coding DNA article. Is that correct?
 * If so, I will create a new version and publish it. Do you know who "approves" it for publication? I don't know what happens after that. Can you and the other junk DNA skeptics just delete it once I've published it? Genome42 (talk) 18:49, 16 May 2022 (UTC)
 * I can only answer from my own experience. I don't know exactly what the policy is on who does the approving. I had to submit my first Wikipedia article (on Alberto Sols) for approval. It took about two months and I don't know who did the approving. Once past that step I was able to create articles (not many, so far) without getting specific approval. Given the recalcitrance to accept your proposals in this Talk discussion I fear you're unlikely to get an easy ride.  Athel cb (talk) 10:00, 17 May 2022 (UTC)
 * @Athel cb, @Genome42, @Afasmit: I'm just catching up on this discussion but can offer some info that may help:
 * "Approval" for new pages that start off as Draft:XYZ is done by volunteer Wikipedia editors at WP:Articles for creation. They'll be looking for a few things:
 * Is it notable (not a problem here, clearly notable)
 * Are there sufficient references (there are plenty of reliable sources so shouldn't be a problem)
 * Is it a neutral coverage of the topic that gives proportional due weight to competing positions (i.e. if 85% of the literature supports the existence of junk DNA, then approx 85% of the article should be dedicated to presenting that info).
 * Is the language clear (this is where it'll be important to lay out the differences between protein-coding vs RNA-coding vs binding site vs telomeres/centromeres vs selfish genetic element vs zero function of any sort)
 * for subjects like this they'd usually reach out to the subject-specialist wikiprojects for feedback (e.g. WP:Molecular Biology).
 * So, I think this topic should be fine to go through that process, starting out at Draft:Junk DNA. When the draft is ready for the main encyclopedia, there's a button that says "Submit the draft for review".
 * Separately, there is also an independent process that you can opt for as well / instead, that is more like standard academic peer review, through WikiJournal of Science. After external peer review it'd be copied over to Wikipedia to evolve like any other article. Examples:
 * Grhl gene family: journal copy, Wikipedia copy
 * TIM barrels: journal copy, Wikipedia copy
 * Gene structure: journal copy, Wikipedia copy
 * Hope that info helps the process. T.Shafee(Evo & Evo)talk 02:49, 20 May 2022 (UTC)

Fix the introduction
The current version has a complicated paragraph about ENCODE results in the introduction. This is not the right place for that material; it is confusing and badly explained.

I propose moving that paragraph to the section on the ENCODE Project. It can be moved to the new article on junk DNA when we create it. — Preceding unsigned comment added by Genome42 (talk • contribs) 23:30, 14 May 2022 (UTC)
 * The lead is supposed to summarize the contents of the article and since the junk stuff is mentioned there, then ENCODE stuff is there too. Perhaps both the junk and ENCODE paragraphs can be shortened. Let me try to nuance the intro to diminish the discussion on Junk DNA and ENCODE. The rest of the article seems to be missed from the intro and the Junk and ENCODE section is not most of the article so it probably should be trimmed.Ramos1990 (talk) 23:53, 14 May 2022 (UTC)


 * The main problem here is confusion about the difference between non-coding DNA and junk DNA. I think this article should be about describing non-coding DNA and junk DNA should have a separate entry. (I'm working on it.) Of course junk DNA will get a mention here but I think it should only be to explain the false idea that non-coding DNA and junk DNA are equivalent. Putting any mention of junk DNA in the introduction just serves to perpetuate the idea that there's some important connection between the two terms other than the fact that junk DNA is not DNA that codes for protein. It would be like discussing junk DNA in an article on non-regulatory DNA or non-centromeric DNA.


 * Discussions about the ENCODE results should not survive in this article because that discussion will be in the junk DNA article. Beside being inappropriate, it's way too complicated to cover here. Similarly, any discussion of evidence for junk DNA will be in the junk DNA article and it's inappropriate to bring it up here, especially if it just mentions one aspect of that debate such as the C-value enigma. That's also far too complicated to bring up in an article on non-coding DNA and we will have no opportunity to present the counter-arguments of Mattick and his followers. — Preceding unsigned comment added by Genome42 (talk • contribs) 17:13, 20 May 2022 (UTC)
 * I agree with you. Since the article will be split into two again, then this article should be more about general non-controversial information on non-coding DNA and not involve much on junk DNA debates since they are indeed quite complicated. I agree that mentioning stuff about function or non-function or junk DNA or ENCODE in the lead or in most of the article has indeed perpetuated a tangled, and erroneous, understanding that non-coding DNA and junk DNA are equivalent among wikipedia editors and probably readers too. They are not the same. I will remove these from the introduction at this time. The junk DNA article, once written, can absorb the details and references of such debates and can be wikilinked if there is a desire to do so. Ramos1990 (talk) 02:46, 21 May 2022 (UTC)

C-value enigma
I deleted the following sentences from the introduction because they are irrelevant and scientifically incorrect.


 * The amount of ncDNA can vary greatly between even closely related species, therefore there is not a clear relationship between "organism complexity" and genome-size, an observation known as the C-value enigma. This fact adds further evidence that major portions of ncDNA in larger genomes are non-functional.

The term "C-value enigma" was coined by my colleague Ryan Gregory to replace the old term "C-value paradox." Ryan argued that the existence of large eukaryotic genomes is no longer a paradox because we know the answer - it's due to differences in the amount of junk DNA. However, there are still questions about why genome sizes differ significantly and there are various explanations being explored. This is the C-value enigma (Gregory, 2005; Elliott and Gregory, 2015).

The fact that genome size doesn't correlate with perceived complexity is better known as the "G-value paradox," a term created by Hahn and Wray (2002) and exploited by John Mattick. I call this the "Deflated Ego Problem."


 * sandwalk.blogspot.com/2018/11/deflated-egos-and-g-value-paradox

This is only a small part of the arguments surrounding junk DNA and it doesn't deserve any special attention in the introduction to an article on non-coding DNA. — Preceding unsigned comment added by Genome42 (talk • contribs) 22:00, 20 May 2022 (UTC)


 * Fine to remove mention "C value" I suppose -- it is a bit of a distraction. However the remaining introductory sentences only discuss clearly functional ncDNA elements. Do you not think it is important to mention that the majority of ncDNA (esp. in large genome eukaryotes) is likely to be non-functional (i.e. junk DNA)? I think it's important here to not perpetuate the idea that lots of ncDNA is functional. Paul (talk) 05:02, 21 May 2022 (UTC)


 * Thanks for adding mention of non-functional elements back to the intro -- this looks good. I suggest trying to make smaller edits rather than whole-sale deletion of sections, and avoid inflammatory phrases like "irrelevant, scientifically incorrect, and extremely biased". Those sorts of edits are likely to get quickly rolled back. Paul (talk) 19:57, 22 May 2022 (UTC)

Evidence of functionality
This section is irrelevant, scientifically incorrect, and extremely biased.

The main body of the article describes all sorts of functional DNA sequences that fall under the non-coding DNA category but this section begins with "Some non-coding DNA sequences must have some important function." It seems to be arguing against some ill-defined claim that non-coding DNA is all junk but nobody is making such a claim and no reasonable scientist ever defended such a ridiculous idea. We certainly don't make a claim like that in the article.

The section on "Evidence from Polygenic Scores and GWAS" is particularly strange since it seems to be pointing out the obvious as if there was some controversy. The section includes the following statement, "Individual differences between humans are clearly affected in a significant way by non-coding genetic loci, which is strong evidence for functional effects." Nobody is surprised that human traits might be affected by changes in regulatory sequences or noncoding genes so what's the point? This is just anti-junk bias expressed as a strawman argument.

I propose to delete this entire section in a few days unless someone can come up with a rational explanation for why it should be retained.Genome42 (talk) 16:19, 16 May 2022 (UTC)


 * I agree. This section starts out by assuming its conclusion: "Some non-coding DNA sequences must have some important biological function". That does not belong in a section supposedly about Evidence. There is no attempt to discuss alternative ideas in a balanced way. Nonetheless, by deleting this and other sections you run the risk of being labelled a vandal and blocked from further editing; maybe you don't mind that. Athel cb (talk) 09:47, 17 May 2022 (UTC)
 * I don't see how I can be labelled a "vandal" if I announce what I propose to do and ask for input. If nobody objects then I assume there's general agreement with my proposal. Genome42 (talk) 15:28, 17 May 2022 (UTC)
 * We'll see! I've been trying to discover the biological expertise of the editors who think they can dismiss the opinions of people like you, Sydney Brenner, Dan Graur, Nick Matzke and Joe Felsenstein as having no merit. Ramos1990 seems to be mainly interested in religious topics like the historicity of Jesus, but not much in biology; and Qzd, mainly interested in tracking down vandals, and trappist the monk (no user page, so who knows) are not all that different. I was wondering what they thought of the Onion Test: to my surprise there is a Wikipedia article about it, of which the first third gives a reasonably accurate description, and the last two thirds deal with what it calls "Criticism".  Athel cb (talk) 16:49, 17 May 2022 (UTC)
 * It's interesting that two of my blog posts are cited in that article. I don't think Ramos1990 would be happy about that!
 * It's also interesting that the article devotes so much attention to the views of an Intelligent Design Creationist. I wonder how that fits into the discussion about appropriate references? Genome42 (talk) 18:49, 17 May 2022 (UTC)
 * Yes. I was amazed at the attention given to Jonathan Wells.  Athel cb (talk) 06:46, 18 May 2022 (UTC)
 * Please don't start an edit war over this section. The entire premise of the section is to show that some non-coding DNA has a function but that is obvious from the list at the top where we describe noncoding genes, regulatory sequences, centromeres, telomeres, and origins of replication. Those fractions are clearly functional and, once more, we've known that for 50 years.


 * The whole point of this section ("Evidence of functionality") was to refute the false claim that non-coding DNA is equivalent to junk DNA. It was probably inserted by editors who were arguing against junk DNA by referring to obvious functional parts of the genome that don't code for amino acids but that point becomes moot in light of the descriptions above. Further debate about the amount of junk DNA will be included in a separate article on junk DNA. It does not belong here. This is not the place where we debate junk DNA but it IS the place where we correct misinformation about non-coding DNA and that's why this section has to go.


 * Headbomb disagrees but it's not clear to me whether they are knowledgeable about the subject or just being overly protective about material that has been published in Wikipedia. This is the place where Headbomb should defend the scientific value of including duplicated evidence of function and other information that is irrelevant and misleading. Genome42 (talk) 13:55, 23 May 2022 (UTC)

Evidence from Polygenic Scores and GWAS
This section presents evidence that some non-coding DNA has a function but that isn't disputed. Several functional regions of non-coding DNA are described in the first part of the article. This section appears to be a hangover from earlier arguments against junk DNA. It seems as though the original intent was to show that mutations in regions such as noncoding genes and regulatory sequences could have an effect on the phenotype of an organism. Surely this is obvious and doesn't need explanation unless you once held the false belief that all noncoding DNA was junk?

As it stands now, the section looks like a strawman argument and, besides, it is far too complicated for an article of this type.

I propose to delete the section unless someone can come up with a scientific justification for keeping it in a general article on non-coding DNA. Genome42 (talk) 14:27, 23 May 2022 (UTC)


 * The section should be restored. Non-coding DNA has always been an important consideration in the design and analysis of genotyping arrays and GWAS studies. In turn, most (typically 90%) of GWAS hits are in non-coding regions outside the exome and hence help to inform the possible roles of non-coding regions with respect to the phenotypes they associate with. The literature bears out both of those assertions--just search for GWAS non-coding DNA in GScholar and you will find many papers and many secondary, reliable sources (per WP:RS). Hence, as far as the mainstream genetics community is concerned, GWAS studies (and the closely related polygenic risk scores) are an important part of the non-coding DNA story and including a section discussing them is of due weight.
 * More broadly, a hyperbolic dismissal of a fairly well cited section as "irrelevant, scientifically incorrect, and extremely biased" without reliable sources to back up those assertions convinces no one of its truth. It just comes across as academic bombast--good for blog clickbait, but terrible for encyclopedic articles. The way of Wikipedia is to summarize content from reliable sources in a neutral and balanced way. If there are reliable sources that claim GWAS and PRS studies are irrelevant to the study of non-coding DNA and say nothing useful about it, we could include those in the section to provide a balanced summary of scientific thought about the topic. But even if such sources exist, they would not negate the impact that GWAS and PRS have had on the topic of non-coding DNA. -- 04:39, 24 May 2022 (UTC)
 * Mark, you have a point that there may be a reason to retain some discussion of GWAS but if so it needs corrections. Focus on "Individual differences between humans are clearly affected in a significant way by non-coding genetic loci, which is strong evidence for functional effects." It is scientifically (and logically) incorrect. Associations! Correlation is not causation. The very point of GWAS is to discover associations that may well be non-causative, non-functional, but are nonetheless diagnostically useful. Asserting that associations cause differences is scientific malpractice.
 * I suggest adding a statement emphasizing that GWAS studies have demonstrated that even DNA that is apparently not functional respective to a cell's or organism's biology may still yield useful diagnostic information. 146.115.157.23 (talk) 05:20, 24 May 2022 (UTC)


 * Relevance: I think the point of this section was to demonstrate that there are functional regions within the noncoding fraction. This seems rather pointless to me since we have already explained that this fraction contains at least 5,000 genes and genes are functional. We have also described regulatory sequences as part of this fraction and nobody doubts that regulatory sequences are functional. So, what's the point of this section?


 * Biased: The reason I think this section is biased is because it argues that non-coding DNA has a function and the only people I've ever heard make this argument are people who are against junk DNA. They usually make the false claim that all non-coding DNA was thought to be junk DNA and then by pointing out that some non-coding DNA has a function (duh!) they think they're making an argument against junk DNA. Unless someone can show me another reason for trying to prove that genes and regulatory sequences have a function, I think we should remove this section.


 * Scientifically incorrect: The article implies that the associations are causative and this is not necessarily true. Lots of these associations are links to SNPs in junk DNA and the SNP may or may not be the cause of the phenotype/disease. Furthermore, even if the actual mutation causes the phenotype it may not be in a functional region of the genome. Mutations in junk DNA can cause genetic disease, for example. Thus, if the entire point of the section is to prove that there are functional regions in the non-coding fraction then the premise that the argument is based on is wrong. That's why the section is scientifically incorrect, or as stated above, 'scientific malpractice.' Genome42 (talk) 15:43, 24 May 2022 (UTC)
 * Genome42, respective to Relevance. I largely agree. Except, what was should not restrain what will be. Is there a proper role for a GWAS mention in this article? Maybe, just not the one that's present.
 * Biased? Certainly. Properly written, the bias is gone. Incorrect? Yes, as is. Necessarily? No. So strip it to valid factual statements. GWAS has identified regions of non coding DNA that are associated with diagnostically useful regions of non coding DNA. Some of these regions are functional, and some are non-functional respective to biology. (It is a mistake to label these regions "functional" owing just to diagnostic utility; that's an abuse of the term "functional".) One almost wants to insert the XKCD cartoon about correlation and causation but that's perhaps over-the-top. Summary: the worst of the incorrect is falsely inferring biological function through association so don't do that.
 * Sidenote: the section on ENCODE is far worse. It makes me retch. It conflates signal in a high-throughput screening assay with biological function. It's a foundational logical error and people debate it based upon references? 146.115.157.23 (talk) 02:17, 25 May 2022 (UTC)
 * Edit: GWAS has identified regions of non coding DNA that are associated with diagnostically useful regions of non coding DNA. 146.115.157.23 (talk) 02:26, 25 May 2022 (UTC)
 * Agreed with Mark viking's well thought out points on how it is relevant to non-coding DNA and reiteration of how Wikipedia works, and with part of 146's comments. The section is indeed relevant, per Google Scholar. I got many hits too so there are lots of reliable sources available on it. If a section has reliable sources, instead of wholesale deleting things, consider rewording parts of the section while avoiding violating WP:OR and WP:SYN and sticking to the claims only made in the sources. Usually this reduces perceived bias. Some sources make clear statements that oppose one's views on the matter but that is not a reason to remove it or alter their views with our own interpretations of them. Here we have to tolerate different views found in the relevant literature AS IS precisely because such diverse views exist in the literature like that. And of course, what one scientist sees as accurate, another scientist sees as inaccurate - which is why we even have so much debate among scientists in the literature in the first place, especially after ENCODE until today (e.g. Mattick vs Doolittle). Best stance is neutral tone and presenting both sides like Mark viking and Wikipedia NPOV policy state. Ramos1990 (talk) 03:35, 25 May 2022 (UTC)
 * Oh BS. There are indeed sources, but they are of poor quality. This is the worst of Wikipedia. At some level, one has to put competence ahead of bombast.
 * Myriad sources that promote nonsense, that correlation equals causation, aren't valid just because there are many of them. Scientific integrity matters. The world is not flat even if many people perceive it that way. 146.115.157.23 (talk) 07:46, 25 May 2022 (UTC)
 * I've read all the comments and I just don't see the relevance. What is the point? We know that GWAS identifies associations that lie in coding DNA so are we just trying to show that it also identifies associations that lie outside of coding regions? If so, why? Wikipedia already has a very good article on Genome-wide association study so if people think it's important to mention GWAS here then why not just link to that page?
 * Ramos1990 says, "The section is indeed relevant per google scholar." What does that mean? Are you referring to the fact that there's an association between non-coding DNA and GWAS on Google Scholar? If so, that's not a sufficient reason to include GWAS in this article. There's an even more powerful connection between non-coding DNA and Neutral Theory but that doesn't mean we need to discuss Neutral Theory in this article. I still need to see a good explanation of why GWAS is relevant. Genome42 (talk) 13:02, 25 May 2022 (UTC)
 * I edited the section to make it less biased and scientifically accurate. I still don't think it's particularly relevant so I would prefer to delete it entirely. Genome42 (talk) 14:38, 25 May 2022 (UTC)

Uses/Evolution
The section on "Uses" has a single subsection called "Evolution." This section has three parts.

1. Reference to a nineteen-year-old talk.origins post showing that pseudogenes can be old and can be found in related species. The talk.origins howlers (I am one) refer to this as the Ken Miller argument.
 * https://sandwalk.blogspot.com/2015/11/what-do-pseudogenes-teach-us-about.html

2. More focus on pseudogenes including a 22-year-old reference showing that pseudogenes evolve at the neutral rate - therefore they are junk. The main point seems to be that pseudogenes can give rise to new genes and the reference is to the infamous Balakirev and Ayala (2003) paper that questions whether pseudogenes are junk or functional DNA. There are probably a few examples of pseudogenes that have secondarily acquired a new function, in which case they are no longer pseudogenes, but the idea that a significant fraction of the 15,000 pseudogenes is anything but junk has been thoroughly refuted.
 * https://sandwalk.blogspot.com/2020/01/are-pseudogenes-really-pseudogenes.html

3. A study showing that de novo genes can arise from non-coding regions. This is correct, but the rate of de novo gene formation is likely to be quite low (one per million years) so it's not a big deal in the junk DNA controversy.
 * https://sandwalk.blogspot.com/2019/10/the-evolution-of-de-novo-genes.html

I don't think this section is relevant. There is a separate Wikipedia article on pseudogenes that discusses the points raised in #1 and #2 and there's also a separate article on de novo genes that covers #3.

I suggest we remove this section. — Preceding unsigned comment added by Genome42 (talk • contribs) 15:09, 25 May 2022 (UTC)

Junk DNA and non-coding DNA
The confusion about non-coding DNA and junk DNA will be addressed in the new junk DNA article but I thought people here might like a preview. Here's the current draft.

---

There is considerable confusion in the popular press and in the scientific literature about the distinction between non-coding DNA and junk DNA. One example was recently published in American Scientist.


 * Close to 99 percent of our genome has been historically classified as noncoding, useless “junk” DNA. Consequently, these sequences were rarely studied.

Another example can be found in a recent book.


 * When it was first discovered, the nongenic DNA was sometimes called—somewhat derisively by people who didn't know better—"junk DNA" because it had no obvious utility, and they foolishly assumed that if it wasn't carrying coding information it must be useless trash.

There are many other examples. The common theme is that the original proponents of junk DNA thought that all noncoding DNA was junk. This is often used as a prelude to reports of functional noncoding DNA as though that refutes the idea of junk DNA.

These statements are incorrect. The existence of functional non-coding DNA elements such as noncoding genes, regulatory sequences, origins of replication, and centromeres was well known in the late 1960s when the idea of junk DNA was being proposed. While it may be possible to comb through the scientific literature to find someone who equated non-coding DNA with junk DNA, it was not an important part of the debate in the late 1960s and early 1970s. The leading proponents of junk DNA such as Susumu Ohno, Masatoshi Nei, Sydney Brenner, Francis Crick, Thomas Jukes, Motoo Kimura, and Tomoko Ohta were well aware of functional non-coding DNA and they never would have defended the idea that all non-coding DNA was junk. Also, they were not "foolish;" they based their arguments on what they thought was scientific evidence for junk DNA and not an argument from ignorance. There is still considerable debate over the merits of those arguments and whether most of the human genome is junk but confusion about the distinction between non-coding DNA and junk DNA should not be part of that discussion. Hopefully, this article will help to put an end to the false claim that all non-coding DNA was ever thought to be junk.

Junk DNA is defined as DNA that can be deleted from the genome without affecting the fitness of the individual or the species. It is DNA that is not currently subject to purifying selection. Some fraction of non-coding DNA, especially pseudogenes, introns, intergenic regions, and fragments of transposons, will undoubtedly contain mostly junk DNA but the amount of junk DNA is debated within the scientific community.


 * In its current form this draft does not conform to Wikipedia style. For example, the first paragraph should begin "Junk DNA is ..." It may be worth spending some time reviewing Help:Your_first_article. You have front-loaded this with material that would typically be in a "Controversies" section late in an article, which should still be written from a Neutral point of view. Try to keep in mind that Wikipedia is an encyclopedia. Paul (talk) 03:32, 26 May 2022 (UTC)


 * It is a good start. However, there are quite a few issues with the contents of the draft. Drafter needs to find sources that make these claims specifically. Most of this draft is WP:OR and reads like an original essay or original thesis. The policy states “Wikipedia articles must not contain original research. The phrase "original research" (OR) is used on Wikipedia to refer to material—such as facts, allegations, and ideas—for which no reliable, published sources exist. This includes any analysis or synthesis of published material that serves to reach or imply a conclusion not stated by the sources. To demonstrate that you are not adding original research, you must be able to cite reliable, published sources that are directly related to the topic of the article, and directly support the material being presented. The prohibition against original research means that all material added to articles must be verifiable in a reliable, published source, even if not already verified via an inline citation. The verifiability policy says that an inline citation to a reliable source must be provided for all quotations, and for anything challenged or likely to be challenged—but a source must exist even for material that is never challenged.”


 * Drafter would need to find sources that explicitly make statements on the draft. If no source can be found to make such statements, then the statement would have to be removed or adjusted according to what actual sources claim. Here are some examples of statements that need explicit sourcing because they are very specific claims:


 * “While it may be possible to comb through the scientific literature to find someone who equated non-coding DNA with junk DNA it was not an important part of the debate in the late 1960s and early 1970s. The leading proponents of junk DNA such as Susumu Ohno, Masatoshi Nei, Sydney Brenner, Francis Crick, Thomas Jukes, Motoo Kimura, and Tomoko Ohta were well aware of functional non-coding DNA and they never would have defended the idea that all non-coding DNA was junk. Also, they were not "foolish;" they based their arguments on what they thought was scientific evidence for junk DNA and not an argument from ignorance.”


 * “Junk DNA is defined as DNA that can be deleted from the genome without affecting the fitness of the individual or the species. It is DNA that is not currently subject to purifying selection.”


 * The first part is also WP:SYNTHESIS. Here is what synthesis is per wikipedia “Do not combine material from multiple sources to reach or imply a conclusion not explicitly stated by any source. Similarly, do not combine different parts of one source to reach or imply a conclusion not explicitly stated by the source. If one reliable source says A and another reliable source says B, do not join A and B together to imply a conclusion C not mentioned by either of the sources. This would be improper editorial synthesis of published material to imply a new conclusion, which is original research.”


 * Juxtaposing the Mortola source and the McHughen source and coming up with a new conclusion like “There are many other examples. The common theme is that the original proponents of junk DNA thought that all noncoding DNA was junk. This is often used as a prelude to reports of functional noncoding DNA as though that refutes the idea of junk DNA. These statements are incorrect.” is synthesis.


 * Also personal views or opinions on a subject such as the following are not allowed per WP:FORUM:


 * “…but confusion about the distinction between non-coding DNA and junk DNA should not be part of that discussion. Hopefully, this article will help to put an end to the false claim that all non-coding DNA was ever thought to be junk.”


 * The article has to be neutral and based on what the sources say, not what any wikieditor says or thinks is “correct” on the topic. Consider attribution too.


 * Once drafter cleans up the draft and aligns it with what reliable sources actually state, the draft will be looking good. Ramos1990 (talk) 03:51, 26 May 2022 (UTC)
 * @Paul
 * Thank you for your comments. The section is going to appear at the end of the Junk DNA article after I've explained the evidence for junk DNA and the correct history of the idea as it was developed in the late 1960s and early 1970s. I was there as a graduate student in a well-connected lab from 1968-1974 so I remember those debates and I knew many of the players.
 * I'm not sure if this section really qualifies as a "controversy" since there's no evidence whatsoever that the main proponents of junk DNA were ever trying to attribute all non-coding DNA to junk. In fact, there was considerable debate over what fraction was devoted to regulatory elements (e.g. Britten & Kohn, 1968) but there was no debate over the EXISTENCE of regulatory elements in the non-coding fraction. Similarly, there was debate over what fraction of the repetitive DNA was due to multiple copies of ribosomal RNA genes and/or tRNA genes but no debate over the EXISTENCE of these noncoding genes.
 * It's quite difficult to be 'neutral' when you are refuting misinformation. What do you suggest? Genome42 (talk) 19:21, 26 May 2022 (UTC)


 * @Ramos1990
 * Thank you for the information on Wikipedia. I realize that you interpret those rules very strictly and this can be helpful in some circumstances. I am especially intrigued by your insistence on the following rule: "To demonstrate that you are not adding original research, you must be able to cite reliable, published sources that are directly related to the topic of the article, and directly support the material being presented."
 * I will do my best. Meanwhile, instead of worrying about any future article, perhaps you could help me by applying that rule (and others) to the current article? It still needs a lot of fixing. Here are some citations that you could check out for me if you have time.
 * 1. The second sentence of the section on "Fraction of non-coding genomic DNA" says, "it was originally suggested that over 98% of the human genome does not encode protein sequences, including most sequences within introns and most intergenic DNA ..." (Elgar and Vavouri, 2008). I read that paper but I can't find any reliable evidence to support the claim as it stands. It is approximately correct to say that coding DNA takes up only a few percent of the genome but shouldn't we have a citation to an actual calculation? The last time I checked, the actual amount of coding DNA was between 0.9% and 1.0%. That means that 99% of the human genome is noncoding. I'm busy with other problems, so could you check the literature and put in the correct number and the correct citation since I know you are concerned about accuracy and citations?
 * 2. The same section mentions the genome of the bladderwort and says that it has "only 3% non-coding DNA and 97% of coding DNA." That claim is also made in the figure legend. The reference is to a popular science article in a journal on design but that link no longer works. The research paper (Ibarra-Laclette et al. 2013) says that about two-thirds of the bladderwort genome is non-coding DNA and that's ten-fold more than what it says in the article. Could you confirm that for me and make the correction?
 * 3. Also in that section it says, "The term 'junk DNA' has been questioned on the grounds that it provokes a strong a priori assumption of total non-functionality and some have recommended using more neutral terminology such as 'non-coding DNA' instead" and it cites Ryan Gregory's article in the 2005 book that he edited. But in his 2014 paper with my colleague Alex Palazzo he says, "Today, 'junk DNA' is often used in the broad sense of referring to any DNA sequence that does not play a functional role in development, physiology, or some other organism-level capacity." This definition of junk DNA as DNA that is totally non-functional is the dominant definition and has been published in dozens of papers. Do you think the statement in the current article is the sort of thing that should be included in an encyclopedia if we know that the person being quoted has changed his mind and we know that it misrepresents the view of junk DNA supporters?
 * Also, referring to the Palazzo and Gregory paper (which I helped edit) they say,
 * "It has now become something of a cliché to begin both media stories and journal articles with the simplistic claim that most or all noncoding DNA was 'long dismissed as useless junk.' The implication, of course, is that current research is revealing function in much of the supposed junk that was unwisely ignored as biologically uninteresting by past investigators. Yet, it is simply not true that potential functions for noncoding DNA were ignored until recently. In fact, various early commenters considered the notion that large swaths of the genome were nonfunctional to be “repugnant,” and possible functions were discussed each time a new type of nonprotein-coding sequence was identified (including pseudogenes, transposable elements, satellite DNA, and introns) ...."
 * What are the Wikipedia rules for citing something like this? It's an informed opinion (just like mine) but does that get an okay from you as long as it's in the scientific literature? Genome42 (talk) 20:12, 26 May 2022 (UTC)


 * I would certainly classify the quotes from & as controversial. Which I suspect is the point of why they were selected. I can appreciate that you have a personal connection to the researchers, which will make it a challenge to write in Neutral point of view style. I am a fan of the work of Ohno, Ohta, Palazzo & Gregory -- I reference their work regularly and have my students read their key papers -- which is why I'm here. It is worth having a crack at, it is what makes Wikipedia so useful. Having domain knowledge is certainly very useful for finding sources and knowing how to construct the articles, but with that comes the challenge of adhering to the recommended style.
 * BTW, Ohta gave a great seminar for the Asia-Pacific Genetics last Tuesday. She is very inspiring. --Paul (talk) 02:05, 27 May 2022 (UTC)
 * Genome42, great to hear you will try to follow rules. Sticking to what the sources say will help prevent or reduce issues and will still allow editors to share those views on the pages. Sticking only to statements that are found directly in the sources gives better grounding and is less prone to be challenged because they are verifiable per WP:V and makes content in the article transparent to significant degree to any reader. To your questions:
 * 1. Yes. The number should be specified in the source and not rely on wikieditor calculations. The second sentence you refer to seems to be supported in the source on page 345 where it says “In both genomes, however, the vast majority of the DNA is non–protein coding (>98% in human, > 90% in Fugu)…” (p.345). It says greater than 98%. I found a few secondary sources like the Library of Medicine which states 99%. I also found some textbooks on it which concur with greater than 98% from science direct such as "Noncoding DNA makes up about 98.5% of the total DNA." Another one there by the Encyclopedia of Evolutionary Biology just says 99%. Others I found were "It is known that over 98% of the human genome is non-coding." So I think saying "98-99%" or "greater than 98%" and adding 1 source to support the 99% would be better to cover the range commonly found in the literature, no?
 * 2. Sure thing. The dead link was a news post on it breaking a world record. It probably can be updated to a Science news article which is a more academic source and it mentions the 3% is noncoding DNA explicitly. The Ibarra-Laclette et al. 2013 paper is not clear on making that 3% noncoding DNA claim explicitly. The closest I see in there is "Intergenic sequence contraction in the U. gibba genome is particularly apparent in the paucity of repetitive DNA and mobile elements (Supplementary Table 8). Whereas repetitive DNA accounts for 10–60% of most plant genomes, in U. gibba it only amounts to 3%, including 569 mobile elements (Supplementary Information section 2)." which would require us to either "synthesize" (problematic) from this and interpret repetitive DNA = noncoding DNA or just mention that "3% of its genome is repetitive DNA" and leave it to the reader to decipher this. I see that "repetitive DNA" may not always be clearly associated with non-coding DNA, let alone non-functional DNA, in the literature. The secondary Science news source on the other hand does make the connection explicit, short and simple "The finding, published online today in Nature, overturns the notion that this repetitive, non-coding DNA, popularly called "junk" DNA, is necessary for life." and links to the Ibarra-Laclette et al. 2013 paper. I will try the switch from dead news link to the Science news link.
 * 3. Hmm.. the current wording is mostly accurate to the 2005 source (p.30-31) where Gregory does note that it does give off the a priori assumption of total nonfunction and recommends using the more neutral term "noncoding DNA" over "junk DNA" in future treatments of the subject. The only thing wrong in the statement in the wiki article is that he does not "question" junk DNA, he merely sees it as unhelpful in that it ambiguates rather than clarifies research since it has pejorative baggage to it. I can correct that right now. Now in terms of the 2014 paper on Gregory, I don't see him retracting or changing his mind on what he wrote in 2005 on that point. He never cites his 2005 book in the 2014 piece either. It seems he just updated his definition of "junk DNA" from pseudogenes in (2005 page 30) to "any DNA sequence that does not play a functional role in development, physiology, or some other organism-level capacity". So I think these are 2 different claims. I see no issue with providing Gregory's 2014 definition in that section because the only one that is there is the one from the 1960s-1970s. I will include it right now.
 * In terms of the draft question you had, criticisms are certainly allowed but you have to make sure the sources make the explicit criticism themselves and avoid SYNTHESIS. A good practice would be to attribute in cases where sources disagree with each other, so that the views fall back on the sources' voice and not Wikipedia's voice. So I think you can reword your draft along the lines of Gregory 2014 like
 * "According to Palazzo and Gregory, media stories and journal articles have often offered simplistic historical narratives of "junk DNA" that give erroneous impressions on the matter, such as that early leading proponents of junk DNA based their views on ignorance, or that noncoding regions were always dismissed as "junk" since the 1960s by the early proponents, or that only recently have researchers been finding function in some parts of noncoding DNA. Palazzo and Gregory argue that all of this is false because early proponents of junk DNA, since the 1960s, were repulsed by the notion that large swaths of the genome were nonfunctional and instead they often discussed possible functions when investigating a new type of nonprotein-coding sequence such as pseudogenes, transposable elements, satellite DNA, and introns. Palazzo and Gregory also state that the concept of junk DNA was based on known details about genome size variability, the mechanism of gene duplication and mutational degradation, and population genetics theory, all of which are still valid today based on observational and theoretical grounds." (Palazzo and Gregory)
 * Perhaps trim a little, but you get the idea. You can add a history section to the junk DNA article where such content can be expanded too. It seems there is lots of history on this topic. Hope this helps. Ramos1990 (talk) 08:05, 27 May 2022 (UTC)
 * @Paul
 * The emphasis on neutrality can easily get out-of-hand. In the case of a genuine scientific controversy, it's important to cover both sides but it's not necessarily important to cover them equally. With respect to the controversy over junk DNA this page (Non-coding DNA) did not do a very good job of being neutral so let's try and fix it. Eventually, all of the material on junk DNA and ENCODE will be moved to the new article dedicated to that controversy.
 * Some things are not part of a genuine scientific controversy. For example, the claim that early proponents of junk DNA argued that all non-coding DNA is junk is not really controversial. The people making that claim never provide any scientific or historical evidence to back it up because it's not true. If there was evidence, then I agree that it should be presented and we could review the history to find out which scientists might have thought that non-coding genes and regulatory sequences were junk. In the absence of any evidence of a genuine scientific controversy, it is the responsibility of Wikipedia to correct the misinformation that's out there. That's what we are here for.
 * In this case, the misinformation is so pervasive that the non-experts who are editing this page are confused about whether the claim has some merit. That makes it difficult to address it without appearing to be not neutral. I think I can handle it in the junk DNA article so hopefully we can put a stop to further spread of this ridiculous myth. Genome42 (talk) 17:07, 27 May 2022 (UTC)
 * @Ramos1990
 * Thanks for trying to fix #1 and #2. I think I can find the latest results on the amount of coding DNA in the human genome so I'll fix the citations when I have time. I'm pretty sure I can fix the explanation in #2 once I get a chance to re-read the paper. Give me a few days. Genome42 (talk) 17:07, 27 May 2022 (UTC)

Bladderwort reference
The bladderwort section is wrong and needs to be fixed. Gimme a day or so. The current reference to a news item in ScienceShot [1] must be deleted since it spreads incorrect information. It says, "Only 3% of this aquatic plant's DNA is not part of a known gene, new research shows. In contrast, only 2% of human DNA is part of a gene." That's ridiculous. Genes occupy about 45% of the human genome.

It's really, really important that we use reliable citations - see the discussion above on "Appropriate references." Genome42 (talk) 21:11, 27 May 2022 (UTC)


 * Hi, I think you made some WP:SYNTHESIS on the Ibarra 2013 source since you mention things not mentioned explicitly in the source. For instance, "It has roughly the same number of genes as other plants but the total amount of coding DNA comes to 35% of the genome - a much higher percentage than most complex eukaryotes. The remainder of the genome (65% non-coding DNA) consists largely of introns and functional elements. " is not found there, except the first part on it having a typical number of genes relative to other plants. "Eukaryotes", "35%", "65%" or even "65 noncoding-DNA" are not in the source. "Introns" is only mentioned once and "functional" is only found twice, but neither is used to enumerate how much of it constitutes noncoding DNA. Also, nothing in the source states "strongly suggests that the missing DNA was non-functional, or junk DNA." The terms "non-functional" and "junk DNA" are not even found in the source and no claim in the source resembles such a statement either. Perhaps what you wrote is true, but this is mostly a wikieditor synthesizing and coming up with claims not made by the sources themselves.
 * The reason why the policy exists is because it prevents wikieditors from over or under interpreting sources and twisting out or extrapolating claims that the sources do not make. The expertise is in the sources, not us wikieditors, even if we think we "know" other details. The sources have to make the claims, arguments, connections, and points, not us. Readers cannot verify who makes particular edits on the wikipedia pages or the trustworthiness of any wikieditor. We merely find and cite according to the limits of the sources since those are traceable to the readers. Otherwise, what is it to stop any wikieditor from twisting all sources' claims to what they do not say? For instance, what if an editor used this same source and said the opposite of Ibarra 2013? It opens a Pandora's box of misinformation and errors by anonymous wikieditor opinions - all of which are not considered reliable sources. Ramos1990 (talk) 21:11, 28 May 2022 (UTC)


 * @Ramos1990
 * Thank-you for your comment. The original article said that 3% of the bladderwort genome was non-coding DNA and this was repeated in the figure legend. Why didn't you object to that claim since it is clearly wrong? I don't see any evidence that you carefully scrutinized the statements in this article until I started correcting it. Why the double standard?
 * The supplemental data in the Ibarra-Laclette article gives all the data necessary to calculate the amount of coding DNA; then all you have to do is subtract that from the genome size to get the amount of non-coding DNA. I was under the impression that you and the other editors wanted to include that information (amount of non-coding DNA) in the article because you mentioned it twice. I'm happy to delete the entire section on the bladderwort genome if that's what you want but the whole point of the paragraph seems to be to show that there is considerable variation in the amount of non-coding DNA in different species so what's wrong with specifically mentioning that point? The entire section is an attempt to synthesize the information in the scientific literature in order to make a point that no one article makes.
 * The expertise in writing encyclopedia articles comes from the expertise and authority of the authors and the role of editors is to make sure that information is accurate. That hasn't happened in this article (and many others on similar subjects) because it was full of misinformation. I don't know why the previous editors failed to live up to the standards of a good encyclopedia but let's put that behind us and concentrate on fixing the problem.
 * The whole point of an encyclopedia article is to synthesize the information so that it's readable and informative to the general audience. Check out the Wikipedia article on the Battle of Waterloo, for example. It is not littered with citations and it has not been chopped into pieces by editors who all want to have their say on some nitpicky point. The article on the Gene is a scientific example of an article that has a great deal of what you call "synthesis" and that's because it was written by someone who knows the subject and doesn't have to cite every little bit of common knowledge information. The article on Evolution is another example of the sort of article we should be writing - it has a lot of "synthesis" with only peripheral citations that don't really support every word and phrase in the article. Genome42 (talk) 15:19, 29 May 2022 (UTC)
 * Hi Genome42, I checked that source a few years back when the original (now dead link) was active and believe the original wording was consistent with the source at that time. Otherwise I would have reworded so that the source is represented correctly. I looked at the supplementary data and believe that that is where you got some of the numbers but the wikieditorial calculating and extracting novel claims not in Ibarra 2013 is actually synthesis. The Science news report, on the other hand, was more consistent with your wording and there would have been no synthesis (just have to remove the Eukaryotes stuff or find a source for that point). The Science source did the "analysis" and came up with a conclusion actually similar to what you wrote. We could merely cite that and it would be following wikipedia rules. No synth, reliable secondary source from Science, verifiable, etc. We are limited by what the sources say. I will adjust the wording per the source.
 * You can make any point you want, but specific claims or a number, have to be backed up by a source making such a number or claim. If no source can be found to make a claim you wish to include in the article, then it does not belong on wikipedia. Essentially. Otherwise who is to stop the Pandora's box of the original research and synthesis (source manipulation)? It is a double edged sword but it is a form of quality control to prevent edit wars over wikieditor differences in opinion. The expertise comes from reliable sources as defined by wikipedia, not the actual wikieditors. Any "expertise" of any wikieditors comes in mainly in finding sources (knowledgeable people know where to look) that mention the claim you think is missing from the article and to cite the source making that claim on wikipedia.
 * I hear you on the quality of many articles you see on wikipedia. It certainly is true that many are choppy and part of that is that wikieditors come and go and disappear and many just dump their original research or change what the sources claim and not enough editors have the time to read and double check new additions or old ones. All editors are volunteers so it is time-consuming and unpaid labor and the sheer number of edits being monitored is overwhelming. I try to clean up some of these articles, but I also am short on time. Nonetheless, the articles you mentioned have editors constantly removing or correcting inaccurate wording or content per the sources if you look at the history of those. It's not perfect, but it is being done - usually an editor will just correct the wording to better align with the source. I checked a few statements from one of the articles you linked and the ones I checked seem to align with the sources (some even provide quotes) so no synthesis from what I saw. Just extended extraction from the sources. If you find a case of synthesis, then please correct it per the source. A great article that is sprinkled with sources on almost every other sentence is Bacteria.
 * The point is that since you want to improve the numerous articles and since you are making lots of changes to this one for example, you should improve them while being consistent with Wikipedia rules to significantly reduce the chances of other editors reverting what you added and also because it will make you look like a good editor who follows the rules. Literally any original research or synthesis is extremely vulnerable to be removed without question by random editors. Properly sourced content on the other hand can be defended, restored, or corrected.Ramos1990 (talk) 19:59, 29 May 2022 (UTC)
 * Ramos1990 says,
 * "If you find a case of synthesis, then please correct it per the source. A great article that is sprinkled with sources on almost every other sentence is Bacteria."
 * Are you being sarcastic? The first three paragraphs of that article contain all kinds of 'facts' that are not cited. The only citation comes at the end and that's an obscure 2008 reference. You will have your hands full trying to make that article live up to your Wikipedia standards.
 * I don't have a big problem with the introduction to the Bacteria article. Most of the statements are correct and the information corresponds to what I think should be in an encyclopedia. I'm surprised that it measures up to the standards that you are trying to impose here. However, I do have serious problems with other parts of the article, especially the section on 'Origin and early evolution.' Some of the statements in that section (and the figure) correspond to statements made in the citations but they are wrong or misleading. The Three Domain Hypothesis is dead. Eukaryotes and Archaea do not form separate domains that are distinct from Bacteria. Eukaryotes arose from a fusion between a proteobacterium (within Bacteria) and (probably) a member of the Asgard family of Archaea (i.e. within Archaea). We've known for decades that the Three Domain Hypothesis is wrong and the article needs to be fixed to reflect this knowledge. (The correct 'circle of life' diagram is shown later on in the article under 'Classification' but the text is still wrong.)
 * With reference to the statement that the bladderwort genome had only 3% non-coding DNA ( = 97% coding), you said,
 * " I checked that source a few years back when the original (now dead link) was active and believe the original wording was consistent with the source at that time. Otherwise I would have reworded so that the source is represented correctly."
 * This is the problem. It's true that the statement in the article mirrored what was said in the citation but that doesn't make it factually correct. It's ridiculous to say that 97% of a eukaryotic genome is coding DNA and you should have recognized right away that this is wrong. We have an obligation to post correct scientific information in an encyclopedia and not to propagate misinformation just because it appears in what you define as a "reliable" source. You try to excuse your behavior by saying,
 * "The expertise comes from reliable sources as defined by wikipedia, not the actual wikieditors. Any "expertise" of any wikieditors comes in mainly in finding sources (knowledgeable people know where to look) that mention the claim you think is missing from the article and to cite the source making that claim on wikipedia."
 * The key to your statement is the phrase "knowledgeable people know where to look." That requires a large amount of expertise on the part of the authors of Wikipedia articles. Those experts have to figure out which sources defined by Wikipedia are "reliable" and which ones aren't. From my perspective, a great many wikieditors don't have this expertise.
 * We need to fix this article and many others. I value your experience in editing so we could work effectively together as a team as long as you are willing to recognize my experience and knowledge of the scientific content. Genome42 (talk) 17:46, 30 May 2022 (UTC)
 * I was not being sarcastic. And the rules are not optional and if synthesis is seen or original research is seen then editors can remove such content. Restoring such content does not look good on the editor who does such a thing. You cannot use the general imperfection of wikipedia as a basis to introduce original research or synthesis. We do not have enough editors to constantly review articles comprehensively so they try to do it little by little. Otherwise, anyone else can do the same and remove your stuff and replace it with their uncited original research or source manipulation (synthesis) because they think your edits are misinformation or whatever. Which is why WP:FORUM states "Wikipedia is not a place to publish your own thoughts and analyses or to publish new information."
 * I think you have lots of knowledge, but you have to use it following wikipedia rules. Wikipedia is not a blog or personal website or essay to say whatever you want. It is limited to what published sources have said on the topic (see my quote of the policy in the section above). It's pretty clear. I understand you wish to educate on wikipedia, but it has to be through finding reliable sources making the claims, not you stating what a source does not state.
 * It seems you are more focused on the contents of what sources themselves say and the truthfulness, so for that you may want to look at . See under the No Original Research section there.
 * Your observations about the Bacteria article are understandable, and that type of complaint on inaccuracy of an article content applies to pretty much every article on wikipedia. It is not written by any one editor. It is written by many editors through many years little by little. No article lives up to the personal standard of any editor. It should be obvious that wikipedia is not a reliable source for anything because editors change this stuff all the time.
 * I will remove the Bladderwort reference seeing that you were okay with it being removed and that the way it is currently there is WP:SYNTHESIS.Ramos1990 (talk) 19:40, 30 May 2022 (UTC)
 * I added back the bladderwort section including all the citations that I could find, especially the updated sequence results. I also added citations from the New York Times and from one of the best science writers in the world (Ed Yong). According to your criteria, the material can't be deleted unless you can prove that the citations don't match what I wrote.
 * I strongly recommend that you take your editorial skills over to the article on Utricularia gibba. It gets the implications right (most non-coding DNA is junk) but it also contains the following sentences.
 * "The main difference between other plant genomes and that of U. gibba is a drastic reduction in non-coding DNA. Only 3% of the plant's DNA is not part of a gene or material that controls those genes, in contrast to human DNA which is 98.5% non-coding."
 * The citation is to the same popular press article in "Design and Trends" that used to be referenced in this article. You will want to edit that now that you know it is misinformation, right?  Genome42 (talk) 20:47, 30 May 2022 (UTC)
 * Cool. News sources are usually ok because they have some editorial oversight, but blogs usually do not. If something a blog says is found in better sources, then those can be cited instead of the blog (without making claims or conclusions not found in them). Keep in mind that wikipedia is not a blog fest or a promoter of original thought. Rarely are they citable. Keep in mind that I do not have the time to look at all the articles and references or cross references or monitor who adds what to where since there are just quite a lot of editors and edits throughout many articles each day and I only do what I can. No editor can catch everything or do so much time consuming detective work. See WP:WINARS.Ramos1990 (talk) 20:59, 30 May 2022 (UTC)
 * Ed Yong is a Pulitzer Prize winning science journalist. He is ten times more reliable than the author of the New York Times article and 100 times more reliable than the author who wrote the stupid article in the design magazine that you allowed before. It is up to those who are knowledgeable about the subject to judge the reliability of citations and you have to exercise your judgement instead of blindly following some "rule." Look where that got you in the past.
 * Ed is quoting Ryan Gregory who is one of the world's leading authorities on genomes. Why in the world would you object to that? Is it because you don't like what he's saying? Genome42 (talk) 22:51, 30 May 2022 (UTC)
 * Here are the Wikipedia criteria for citing blogs. Turns out that you (Ramos1990) don't even know the rules you are trying to enforce.


 * "Being able to reasonably verify who wrote the blog is necessary to being able to source it as a primary source. The blog should meet one of the following criteria:


 * The blog is part of a credible site: a news agency, magazine, or other company; and the blog or postings are clearly identified as belonging to the named individual.
 * The blog is part of a notable and credible special interest site and the blog or postings are clearly identified as belonging to the named individual.
 * The blog is part of a site owned by the person(s) in question, and is established as their own words.
 * The blog is clearly identified on a credible site as belonging to that person(s). For example John Smith's biography on www.examplenewscompany.com identifies that he keeps a blog at livejournal and provides a link or other identifying method."


 * Relevance
 * Establishing relevance to the article in question is necessary for citation. The individual should meet one of the following criteria:


 * The individual is the subject of the article;
 * The individual is a verifiable employee of the company which is the subject of the article;
 * The individual is a prominent individual in the industry or field which is the subject of the article;
 * The individual is a widely-acknowledged expert on the subject of the article. Genome42 (talk) 22:59, 30 May 2022 (UTC)
 * The stuff you cited is not the policy. It is an inactive page and only retained for historical purposes.
 * The policy actually states "Anyone can create a personal web page, self-publish a book, or claim to be an expert. That is why self-published material such as books, patents, newsletters, personal websites, open wikis, personal or group blogs (as distinguished from newsblogs, above), content farms, Internet forum postings, and social media postings are largely not acceptable as sources."
 * It's his blog (no longer active since 2015 or so). There is no editorial oversight on what he says. I personally don't have any issues with what he says, but again you really need to understand the "reliable sources" policy. Reliability, in wikipedia terms, is not over the content of the source, whether what it says is true or false. Wikipedia does not vouch for the truth or falsehood of claims made in any source. Reliability is over editorial oversight of the source - does it have peer review or editorial oversight such as a degree of fact checking. The New York Times piece, and even the Science news piece you didn't like had such editorial oversight in place.
 * Please understand that this is all linked to original research, which is prohibited in wikipedia. If what he says is found in published sources like a news organization story, journal, book, reference work, etc then those can be used instead. If what he says is only found in his blog, then it does not belong on wikipedia. Ramos1990 (talk) 23:35, 30 May 2022 (UTC)
 * I updated the bladderwort section using citations from university press releases. These press releases are edited and supervised so they must meet the Wikipedia criteria even though they are often as unreliable as newspaper articles.
 * I suggest you check out citation #47, which is a reference to Dan Graur's personal blog. You will want to expunge it because it doesn't measure up to the standards you are policing. I'm sure you don't want to be accused of preferential treatment where you treat some editors differently than others, although I suspect that ship has already sailed.Genome42 (talk) 18:17, 31 May 2022 (UTC)

Untranslated regions
I'm trying to cover all of the non-coding DNA so I added a short section on 5'-UTRs and 3'-UTRs. This is standard textbook stuff so I don't think it requires a lot of citations. This is an encyclopedia entry and the authors are expected to be authorities on the subject matter.

Nevertheless, I added citations to three textbooks. Unfortunately my most recent copy of the Alberts text is from 1994 and my latest copy of Genes is from 2004. I threw out all my old biochemistry textbooks when I retired but I kept a copy of my own book from 2012 so I cited it. If anyone has more recent copies of textbooks please check to see if they cover UTRs and cite them. Genome42 (talk) 14:11, 5 June 2022 (UTC)