User:Genome42/sandbox

=Junk DNA=

Junk DNA is DNA that does not have a function, so it is important to define "function." Proponents of junk DNA define functional DNA as DNA that is currently under purifying selection. This is the definition used by Dan Graur in his textbook "Molecular and Genome Evolution."


 * "Functional DNA refers to any segment in the genome whose selected-effect function is that for which it was selected and/or by which it is maintained. Most functional sequences are maintained by purifying selection."

This definition of function is called the maintenance function. It follows that nonfunctional DNA, or junk DNA, is any segment in the genome that is NOT maintained by purifying selection. Many similar definitions have been published, but they all share the idea that junk DNA is DNA that does not have a function, meaning that it is not under purifying (negative) selection.

How much of the human genome is junk?
Most of this article is about the human genome but the arguments for function and junk apply to other genomes.

The data on functional and nonfunctional DNA elements in the human genome is covered in many other articles so this is just a brief summary.


 * Genes (main article: Genes). There are approximately 20,000 protein-coding genes in the human genome. The number of noncoding genes is disputed, with values ranging from about 5,000 to more than 100,000.

Arguments against junk DNA
Some scientists are convinced that junk DNA does not exist. For example, Peter Larsen declared in 2018 that,


 * "There is no such thing as 'junk DNA.' Indeed, a suite of discoveries made over the past few decades have put to rest this misnomer and have identified many important roles that so-called junk DNA provides to both genome and function."

This is a widely held point of view although most of these authors don't explain why obvious examples of junk DNA, such as pseudogenes and broken bits of transposons, don't qualify as junk DNA.

Mutation load
The idea that some species carry excess DNA started with the realization that the expected number of mutations per generation would lead to extinction if the entire genome consisted of functional DNA. This is known as the mutation load or genetic load argument.

By the late 1960s it was apparent that most mutations in human DNA had to be harmless, so only a small percentage of the genome could be devoted to genes and other functional elements. The connection between the mutation load argument and junk DNA appeared in the paper by Susumu Ohno in 1972 where he said,


 * "All in all, it appears that the calculations made by Muller, Kimura and others are not far off the mark and that at least 90% of our genomic DNA is 'junk' or 'garbage' of various sorts."


.... see "Gene: Mutation" ....

Most mutations are due to DNA replication errors. The DNA replication complex is highly accurate: newly replicated DNA has only about one error for every 10 billion base pairs replicated (10⁻¹⁰ per bp per replication), although estimates in various publications range from 10⁻⁹ to 10⁻¹¹. The overall replication error rate is the product of (1) the intrinsic error rate of the polymerization reaction, (2) the fraction of those errors that escape proofreading, and (3) the fraction of the remaining errors that escape repair enzymes following DNA replication.
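The three-stage product can be sketched in a few lines of Python. The per-stage values below are illustrative assumptions, not figures from this article: an intrinsic polymerization error rate of 10⁻⁵, with 1 in 100 errors escaping proofreading and 1 in 100 escaping post-replication repair. The function name is hypothetical.

```python
def overall_error_rate(intrinsic: float,
                       escape_proofreading: float,
                       escape_repair: float) -> float:
    """Errors per bp per replication that survive all three stages.

    intrinsic           -- error rate of the polymerization reaction alone
    escape_proofreading -- fraction of those errors NOT fixed by proofreading
    escape_repair       -- fraction of the remaining errors NOT fixed by repair
    """
    for p in (intrinsic, escape_proofreading, escape_repair):
        if not 0.0 <= p <= 1.0:
            raise ValueError("each input must be a probability between 0 and 1")
    return intrinsic * escape_proofreading * escape_repair


# Illustrative (assumed) stage values, not measurements.
rate = overall_error_rate(1e-5, 1e-2, 1e-2)
print(f"{rate:.0e} errors per bp per replication")  # 1e-09 errors per bp per replication
```

Tightening any one stage by a factor of 10 (for example, repair letting through 1 in 1,000 errors instead of 1 in 100) moves the result to 10⁻¹⁰, the value used in this article.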

The extraordinary accuracy of DNA replication means that mutations will be rare in unicellular organisms with small genomes, such as bacteria. However, when mutations occur they will likely affect genes since genes take up a large part of the bacterial genome. Such mutations have a good chance of being deleterious.

The overall DNA replication error rate applies to every cell division in multicellular organisms. In species with large genomes, this means a much greater chance that a mutation is passed on to the daughter cells in various tissues. In humans, for example, an overall error rate of 10⁻¹⁰ means that there will be 0.62 mutations every time a cell divides (assuming diploid cells and a haploid genome size of 3.1 × 10⁹ bp). Spontaneous somatic mutations are responsible for many human diseases, including cancer.
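The 0.62 figure is simple arithmetic: error rate × haploid genome size × 2 copies. The sketch below reproduces it and, under the added assumption (not made in the article) that mutations arise independently and therefore follow a Poisson distribution, estimates how often a single division produces at least one new mutation. The function name is hypothetical.

```python
import math


def mutations_per_division(error_rate_per_bp: float,
                           haploid_genome_bp: float,
                           ploidy: int = 2) -> float:
    """Expected number of new mutations each time a cell copies its genome."""
    return error_rate_per_bp * haploid_genome_bp * ploidy


expected = mutations_per_division(1e-10, 3.1e9)
# Assumption (not from the article): mutations arise independently,
# so the count per division is Poisson-distributed with this mean.
p_at_least_one = 1 - math.exp(-expected)

print(round(expected, 2))        # 0.62
print(round(p_at_least_one, 2))  # 0.46
```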

In multicellular species, the mutation rate per generation can be calculated from the DNA replication error rate if the number of cell divisions in the germline is known. It can also be observed directly by sequencing the genomes of both parents and their offspring. These two values agree in humans, leading to an estimate of about 100 new mutations in every newborn baby. Mutation rates can also be calculated by comparing the genome sequences of two closely related species, such as humans and chimpanzees, and these phylogenetic rates are roughly the same as those obtained by the other two methods.
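The relationship between these estimates is simple arithmetic. The sketch below uses the article's figures (about 100 new mutations per newborn, a 3.1 × 10⁹ bp haploid genome, and 0.62 mutations per cell division) and works backwards to the number of germline divisions those figures imply. It is a deliberate simplification: it assumes every germline division has the same error rate and lumps the maternal and paternal lineages together. The function names are hypothetical.

```python
HAPLOID_GENOME_BP = 3.1e9
DIPLOID_GENOME_BP = 2 * HAPLOID_GENOME_BP


def per_bp_rate_per_generation(new_mutations_per_child: float) -> float:
    """Convert new mutations per newborn into a per-bp, per-generation rate."""
    return new_mutations_per_child / DIPLOID_GENOME_BP


def implied_germline_divisions(new_mutations_per_child: float,
                               mutations_per_division: float) -> float:
    """Germline cell divisions needed to produce the observed number of mutations."""
    return new_mutations_per_child / mutations_per_division


print(f"{per_bp_rate_per_generation(100):.1e}")      # 1.6e-08
print(round(implied_germline_divisions(100, 0.62)))  # 161
```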

The phylogenetic rate measures only the neutral mutation rate, because mutations in functional DNA are mostly removed by purifying selection before they can become fixed. The agreement of the three estimates therefore means that most of the human and chimpanzee genomes are evolving at the neutral rate, an observation that's consistent with the idea that most of the genome is junk.
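The reasoning can be made explicit with a deliberately simple model (an assumption for illustration, not a published method): suppose a fraction f of the genome is so strongly constrained that mutations there never become fixed, while the rest is neutral and accumulates substitutions at the mutation rate. The genome-wide phylogenetic rate is then (1 − f) times the directly measured rate, so close agreement between the two leaves little room for a large f. The function names are hypothetical.

```python
def expected_phylogenetic_rate(direct_rate: float, constrained_fraction: float) -> float:
    """Genome-wide substitution rate if constrained sites never fix mutations."""
    if not 0.0 <= constrained_fraction <= 1.0:
        raise ValueError("constrained_fraction must be between 0 and 1")
    return direct_rate * (1.0 - constrained_fraction)


def implied_constrained_fraction(phylogenetic_rate: float, direct_rate: float) -> float:
    """Invert the model: the fraction of sites that must be constrained."""
    return 1.0 - phylogenetic_rate / direct_rate


# Relative rates (direct rate set to 1.0) for three hypothetical constrained fractions.
for f in (0.1, 0.5, 0.8):
    rel = expected_phylogenetic_rate(1.0, f)
    print(f"{f:.0%} constrained -> relative phylogenetic rate {rel:.1f}")
```

Under this model, if 80% of the genome were functional, the phylogenetic rate would be only one-fifth of the rate measured directly in parents and offspring.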

Genes occupy about 45% of the human genome, so in every newborn child there will be approximately 45 new mutations in genes and 55 new mutations elsewhere. If a large fraction of those mutations were deleterious, the human species could not survive such a mutation load (genetic load). This reasoning led one of the founders of population genetics, J.B.S. Haldane, and Nobel laureate Hermann Muller to predict in the late 1940s that only a small percentage of the human genome contains functional DNA elements that can be destroyed by mutation.

In 1966 Muller reviewed these predictions and concluded that the human genome could only contain about 30,000 genes, based on the known mutation rate and the number of deleterious mutations that the species could tolerate. Similar predictions were made by other leading experts in molecular evolution, who concluded that the human genome could not contain more than 40,000 genes and that less than 10% of the genome was functional. These predictions were confirmed with the publication of the human genome sequence.
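The load argument can be sketched with a standard population-genetics approximation that is not spelled out in this article: if the number of new deleterious mutations per newborn is Poisson-distributed with mean U, and the effects of mutations multiply, the mean fitness of the population relative to a mutation-free genome is roughly e^(−U), so the population needs a reproductive excess of roughly e^U offspring per individual just to stay the same size. The fraction of hits in functional DNA that are deleterious (0.2 below) is a hypothetical value chosen for illustration, as are the function names.

```python
import math


def deleterious_per_newborn(new_mutations: float,
                            functional_fraction: float,
                            deleterious_given_functional: float) -> float:
    """U: expected new deleterious mutations per newborn."""
    return new_mutations * functional_fraction * deleterious_given_functional


def mean_relative_fitness(U: float) -> float:
    """Approximate mean fitness, exp(-U), assuming Poisson mutations with multiplicative effects."""
    return math.exp(-U)


def required_offspring_per_individual(U: float) -> float:
    """Reproductive excess needed to keep the population size constant, about exp(U)."""
    return 1.0 / mean_relative_fitness(U)


# 100 new mutations per newborn (from the text); 0.2 is a hypothetical value.
for functional in (0.1, 0.8):
    U = deleterious_per_newborn(100, functional, 0.2)
    print(f"{functional:.0%} functional: U = {U:.0f}, "
          f"offspring needed per individual ~ {required_offspring_per_individual(U):.3g}")
```

With these assumptions, a genome that is 10% functional gives U = 2 and a reproductive excess of about 7 offspring per individual, while one that is 80% functional gives U = 16 and a requirement of several million.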


Several hundred thousand human genomes have been sequenced, making it possible to analyze the regions that are subject to purifying selection, that is, sequences in which new variants are rare in the population because such mutations are deleterious and are removed by selection. The results show that only a small percentage of the genome (less than 10%) seems to be functional by this criterion. Less than half of the sites subject to purifying selection lie within genes, and these are concentrated in coding regions, the regions specifying functional non-coding RNAs, and intron splice sites. Other sites subject to purifying selection include regulatory sequences.
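One simple way to express this idea is to compare the number of variant sites observed in a region across many sequenced genomes with the number expected if mutations there were neutral; a strong shortfall suggests purifying selection. The sketch below is a hypothetical illustration, not the method of any particular study: the expected counts are assumed to come from a separate neutral mutation model, the 0.5 threshold is arbitrary, and real analyses add statistical tests and corrections for local mutation rate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    name: str
    observed_variants: int
    expected_variants: float  # from a neutral mutation model (assumed to be given)


def depletion(region: Region) -> float:
    """1 - observed/expected: near 0 for neutral DNA, approaching 1 under strong purifying selection."""
    if region.expected_variants <= 0:
        raise ValueError("expected_variants must be positive")
    return 1.0 - region.observed_variants / region.expected_variants


def flag_constrained(regions: List[Region], threshold: float = 0.5) -> List[str]:
    """Names of regions whose variation is depleted by at least `threshold`."""
    return [r.name for r in regions if depletion(r) >= threshold]


# Hypothetical counts for illustration only.
regions = [
    Region("hypothetical_exon", observed_variants=12, expected_variants=60.0),
    Region("hypothetical_intergenic", observed_variants=95, expected_variants=100.0),
]
print(flag_constrained(regions))  # ['hypothetical_exon']
```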

Selfish DNA
It's important to note that selfish DNA is functional DNA although its function resides at the level of the gene and not the individual. This means that selfish DNA is not junk DNA and the two terms are not synonyms.

Molecular evolution
The early proponents of junk DNA were well aware of the controversy they were initiating and how it would affect those whose standard view of evolution was restricted to natural selection. For example, Thomas Jukes wrote the following in a letter to Francis Crick in 1979.


 * "I am sure that you realize how frightfully angry a lot of people will be if you say that much of the DNA is junk. The geneticists will be angry because they think that DNA is sacred. The Darwinian evolutionists will be outraged because they believe every change in DNA that is accepted in evolution is necessarily an adaptive change. To suggest anything else is an insult to the sacred memory of Darwin."

=DRAFT SECTION=

Junk DNA stub for Non-Coding DNA
Junk DNA is DNA that has no biologically relevant function, such as pseudogenes and fragments of once-active transposons. Bacterial genomes have very little junk DNA, but some eukaryotic genomes may have a substantial amount of junk DNA. The exact amount of nonfunctional DNA in humans and other species with large genomes has not been determined, and there is considerable controversy in the scientific literature. See the article on Junk DNA for more information.

The nonfunctional DNA in bacterial genomes is mostly located in the intergenic fraction of non-coding DNA but in eukaryotic genomes it may also be found within introns (see Introns). It's important to note that there are many examples of functional DNA elements in non-coding DNA (see above) and there are no scientists who claim that all non-coding DNA is junk.

Origin of introns (Feb. 25, 2023)
(Added to INTRON on Feb. 25, 2023)

The current view is that following the formation of the first eukaryotic cell, group II introns from the bacterial endosymbiont invaded the host genome. In the beginning these self-splicing introns excised themselves from the mRNA precursor but over time some of them lost that ability and their excision had to be aided in trans by other group II introns. Eventually a number of specific trans-acting introns evolved and these became the precursors to the snRNAs of the spliceosome. The efficiency of splicing was improved by association with stabilizing proteins to form the primitive spliceosome.

Remove Prokaryotic Cell Diagram


The prokaryotic cell diagram was created by and inserted into this article on May 29, 2020.

The label under the diagram says, "A label diagram explaining the different parts of a prokaryotic genome," but the diagram does not explain the different parts of a prokaryotic genome. Instead, it labels the different parts of a prokaryotic cell, and some of those labels are incorrect.

The prokaryotic cell diagram stacks on top of the Genetics series banner, which should normally be at the top of the article.

I propose deleting the prokaryotic cell diagram in a few days unless anyone objects.

Are introns mostly junk?
The size of individual introns is not conserved in closely related species, and the average length of introns varies widely, from less than 100 nucleotides in some species to several thousand nucleotides or more in plants and mammals. As a general rule, intron length correlates with genome size, suggesting that expansions and contractions of genome size affect introns and intergenic regions equally. Thus, the arguments for junk DNA apply to introns as well as to the rest of the genome.

Highly repetitive DNA
Highly repetitive DNA consists of short stretches of DNA that are repeated many times in tandem (one after the other). The repeat segments are usually between 2 bp and 10 bp but longer ones are known. Highly repetitive DNA is rare in prokaryotes but common in eukaryotes, especially those with large genomes. It is sometimes called satellite DNA.

Most of the highly repetitive DNA is found in centromeres and telomeres (see above), and most of it is functional, although some might be redundant. The other significant fraction resides in short tandem repeats (STRs; also called microsatellites) consisting of short stretches of a simple repeat such as ATC. There are about 350,000 STRs in the human genome; they are scattered throughout the genome and have an average length of about 25 repeats.

Variations in the number of STR repeats can cause genetic diseases when they lie within a gene, but most of these regions appear to be non-functional junk DNA in which the number of repeats can vary considerably from individual to individual. This variability is why these length differences are used extensively in DNA fingerprinting.
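How repeat-length differences are used in DNA fingerprinting can be illustrated with a toy comparison. Each person carries two alleles at each STR locus, so a profile can be written as a pair of repeat counts per locus. The locus names, repeat counts, and function names below are hypothetical, and real forensic work uses standardized loci and match-probability statistics rather than a simple yes/no comparison.

```python
from typing import Dict, Tuple

# Hypothetical profile format: locus name -> repeat counts on the two alleles.
Profile = Dict[str, Tuple[int, int]]


def genotype(alleles: Tuple[int, int]) -> Tuple[int, ...]:
    """Order-independent genotype: (12, 15) and (15, 12) are the same."""
    return tuple(sorted(alleles))


def profiles_match(a: Profile, b: Profile) -> bool:
    """True if the two profiles have identical genotypes at every shared locus."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("profiles have no loci in common")
    return all(genotype(a[locus]) == genotype(b[locus]) for locus in shared)


# Hypothetical loci and repeat counts, for illustration only.
crime_scene = {"LOCUS_A": (12, 15), "LOCUS_B": (8, 8), "LOCUS_C": (21, 24)}
suspect_1 = {"LOCUS_A": (15, 12), "LOCUS_B": (8, 8), "LOCUS_C": (24, 21)}
suspect_2 = {"LOCUS_A": (12, 15), "LOCUS_B": (8, 8), "LOCUS_C": (21, 23)}

print(profiles_match(crime_scene, suspect_1))  # True
print(profiles_match(crime_scene, suspect_2))  # False
```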

Untranslated regions
Standard biochemistry and molecular biology textbooks describe non-coding nucleotides in mRNA located between the 5' end of the mRNA and the translation initiation codon. These regions are called 5'-untranslated regions or 5'-UTRs. Similar regions called 3'-untranslated regions (3'-UTRs) lie between the stop codon and the 3' end of the mRNA. The 5'-UTRs and 3'-UTRs are very short in bacteria, but they can be several hundred nucleotides long in eukaryotes. They contain short elements that control the initiation of translation (5'-UTRs) and transcription termination (3'-UTRs), as well as regulatory elements that may control mRNA stability, processing, and targeting to different regions of the cell.
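A simplified sketch of how an mRNA sequence divides into a 5'-UTR, a coding region, and a 3'-UTR, assuming that translation starts at the first AUG and ends at the first in-frame stop codon. Real start-codon selection also depends on the surrounding sequence context, and some mRNAs use other start codons, so this is an illustration only; the function name and the example sequence are hypothetical.

```python
from typing import Tuple

STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna: str) -> Tuple[str, str, str]:
    """Return (5'-UTR, coding region including the stop codon, 3'-UTR)."""
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input
    start = mrna.find("AUG")               # first AUG taken as the start codon
    if start == -1:
        raise ValueError("no AUG start codon found")
    for pos in range(start, len(mrna) - 2, 3):  # step through codons in frame
        if mrna[pos:pos + 3] in STOP_CODONS:
            end = pos + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon found")


utr5, cds, utr3 = split_mrna("GGCAUCCAUGGCUUAAGCAAUAAAGC")
print(utr5, cds, utr3)  # GGCAUCC AUGGCUUAA GCAAUAAAGC
```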

Defining the genome
It's very difficult to come up with a precise definition of "genome." It usually refers to the DNA (or sometimes RNA) molecules that carry the genetic information in an organism, but sometimes it is difficult to decide which molecules to include in the definition. For example, bacteria usually have one or two large DNA molecules (chromosomes) that contain all of the essential genetic material, but many also contain smaller extrachromosomal plasmids that carry important genetic information. The definition of 'genome' commonly used in the scientific literature is usually restricted to the large chromosomal DNA molecules in bacteria.

Eukaryotic genomes are even more difficult to define because almost all eukaryotic species contain nuclear chromosomes plus extra DNA molecules in the mitochondria. In addition, algae and plants have chloroplast DNA. Most textbooks make a distinction between the nuclear genome and the organelle (mitochondria and chloroplast) genome so when they speak of, say, the human genome they are only referring to the genetic material in the nucleus. This is the most common usage of 'genome' in the scientific literature.

Most eukaryotes are diploid, meaning that there are two copies of each chromosome in the nucleus, but the 'genome' refers to only one copy of each chromosome. Some eukaryotes have distinct sex chromosomes, such as the X and Y chromosomes of mammals, so the technical definition of the genome must include both types of sex chromosome. The standard human reference genome, for example, consists of one copy of each of the 22 autosomes plus one X chromosome and one Y chromosome.

Conflicting definitions of 'gene'
There are many different ways to use the term "gene" based on different aspects of inheritance, selection, biological function, or molecular structure, but most of the definitions fall into two categories: the Mendelian gene and the molecular gene (Orgogozo et al. 2016).

The Mendelian gene is the classical gene of genetics: an abstract unit of inheritance responsible for a heritable trait. This is the gene described in "The Selfish Gene" (Dawkins 1976). More thorough discussions of this version of a gene can be found in the articles on Genetics and Gene-centered view of evolution. This article focuses on the molecular gene, the gene that's described in terms of DNA sequence. There are many different definitions of this gene, some of which are misleading or incorrect. The idea that there are two kinds of molecular genes, those that encode proteins and those that produce functional RNAs, dates back to the late 1950s, when Jacob and Monod speculated that regulatory genes might produce repressor RNAs.

This idea of two kinds of genes is still part of the definition of a gene in most textbooks. For example,


 * "The primary function of the genome is to produce RNA molecules. Selected portions of the DNA nucleotide sequence are copied into a corresponding RNA nucleotide sequence, which either encodes a protein (if it is an mRNA) or forms a 'structural' RNA, such as a transfer RNA (tRNA) or ribosomal RNA (rRNA) molecule. Each region of the DNA helix that produces a functional RNA molecule constitutes a gene."


 * "We define a gene as a DNA sequence that is transcribed. This definition includes genes that do not encode proteins (not all transcripts are messenger RNA). The definition normally excludes regions of the genome that control transcription but are not themselves transcribed. We will encounter some exceptions to our definition of a gene - surprisingly, there is no definition that is entirely satisfactory."


 * "A gene is a DNA sequence that codes for a diffusible product. This product may be protein (as is the case in the majority of genes) or may be RNA (as is the case of genes that code for tRNA and rRNA). The crucial feature is that the product diffuses away from its site of synthesis to act elsewhere."

The important parts of such definitions are: (1) that a gene corresponds to a transcription unit; (2) that genes produce both mRNA and noncoding RNAs; and (3) regulatory sequences control gene expression but are not part of the gene itself. However, there's one other important part of the definition and it is emphasized in Kostas Kampourakis' book "Making Sense of Genes."


 * "Therefore in this book I will consider genes as DNA sequences encoding information for functional products, be it proteins or RNA molecules. With 'encoding information,' I mean that the DNA sequence is used as a template for the production of an RNA molecule or a protein that performs some function."

The emphasis on function is essential because there are stretches of DNA that produce non-functional transcripts and they don't qualify as genes. These include obvious examples such as transcribed pseudogenes as well as less obvious examples such as junk RNA produced as noise due to transcription errors. In order to qualify as a true gene, by this definition, one has to prove that the transcript has a biological function.

Early speculations on the size of a typical gene were based on high resolution genetic mapping and on the size of proteins and RNA molecules. A length of 1500 base pairs seemed reasonable at the time (1965). This was based on the idea that the gene was the DNA that was directly responsible for production of the functional product. The discovery of introns in the 1970s meant that many eukaryotic genes were much larger than the size of the functional product would imply. Typical mammalian protein-coding genes, for example, are about 62,000 base pairs in length (transcribed region) and since there are about 20,000 of them they occupy about 35-40% of the mammalian genome (including the human genome).

In spite of the fact that both protein-coding genes and noncoding genes have been known for more than 50 years, there are still a number of textbooks, websites, and scientific publications that define a gene as a DNA sequence that specifies a protein. In other words, the definition is restricted to protein-coding genes. Here's an example from a recent article in American Scientist.


 * What Is a Gene, Really?


 * ... to truly assess the potential significance of de novo genes, we relied on a strict definition of the word "gene" with which nearly every expert can agree. First, in order for a nucleotide sequence to be considered a true gene, an open reading frame (ORF) must be present. The ORF can be thought of as the "gene itself"; it begins with a starting mark common for every gene and ends with one of three possible finish line signals. One of the key enzymes in this process, the RNA polymerase, zips along the strand of DNA like a train on a monorail, transcribing it into its messenger RNA form. This point brings us to our second important criterion: A true gene is one that is both transcribed and translated. That is, a true gene is first used as a template to make transient messenger RNA, which is then translated into a protein.

This restricted definition is so common that it has spawned many recent articles that criticize this "standard definition" and call for a new expanded definition that includes noncoding genes. However, this so-called "new" definition has been around for more than half a century and it's not clear why some modern writers are ignoring noncoding genes.

There are exceptions to the standard definition of a gene; for example, some viruses have an RNA genome. Another important exception concerns bacterial operons, where a contiguous stretch of DNA containing multiple protein-coding regions is transcribed into one large mRNA. Scientists usually refer to each of the coding regions as a separate gene in this case. The only significant controversy over the definition of a gene is whether to include the regulatory sequences that control transcription of the gene. The general consensus among scientists is that regulatory elements control the expression of a gene but are not part of the gene.

Repeat sequences, transposons and viral elements
Virus DNA

There are two main types of viruses, DNA viruses and RNA viruses. Some eukaryotic RNA viruses are called retroviruses because their RNA genome is reverse-transcribed ('retrotranscribed') into DNA as part of the life cycle. Viruses that infect prokaryotes are called bacteriophages or phages.

Sometimes the viral genome can become incorporated into the host genome, either as part of the normal life cycle or by accident. The viral sequence will then be passed on to daughter cells following DNA replication and cell division. If the insertion occurs in the germ line of multicellular species then the viral genome will be inherited in the next generation and the viral DNA may become fixed in the genome by random genetic drift.

The viral genome usually contains virus-specific genes that are transcribed and translated, which means that this DNA doesn't qualify as 'non-coding' in the strictest sense of the word, but, with some exceptions, the viral DNA evolves at the neutral rate of evolution so it soon becomes non-functional and qualifies as junk DNA. The exceptions include a few retroviral genes that have secondarily become essential in the life of the host.

DNA viruses and their degenerate descendants occupy about 3-4% of the human genome and RNA virus fragments take up about 9%. Viral DNAs have inserted into introns and also the spaces between genes (intergenic DNA). Since introns take up a substantial portion of the genome, the viral DNA elements are about equally distributed between introns and intergenic DNA.



Transposons and retrotransposons are mobile genetic elements. Retrotransposon repeated sequences, which include long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs), account for a large proportion of the genomic sequences in many species. Alu sequences, classified as a short interspersed nuclear element, are the most abundant mobile elements in the human genome. Some examples have been found of SINEs exerting transcriptional control of some protein-encoding genes.

Endogenous retrovirus sequences are the product of reverse transcription of retrovirus genomes into the genomes of germ cells. Mutation within these retro-transcribed sequences can inactivate the viral genome.

Over 8% of the human genome is made up of (mostly decayed) endogenous retrovirus sequences, as part of the over 42% fraction that is recognizably derived from retrotransposons, while another 3% can be identified as the remains of DNA transposons. Much of the remaining half of the genome, which currently has no explained origin, is expected to have originated from transposable elements that were active so long ago (> 200 million years) that random mutations have rendered them unrecognizable. Genome size variation in at least two kinds of plants is mostly the result of retrotransposon sequences.

Protein-coding genes
The human genome contains somewhere between 19,000 and 20,000 protein-coding genes. These genes contain an average of 10 introns and the average size of an intron is about 6 kb (6,000 bp). This means that the average size of a protein-coding gene is about 62 kb and these genes take up about 40% of the genome.

Exon sequences consist of coding DNA and untranslated regions (UTRs) at either end of the mature mRNA. The total amount of coding DNA is about 1-2% of the genome.

Many people divide the genome into coding and non-coding DNA based on the idea that coding DNA is the most important functional component of the genome. About 98-99% of the human genome is non-coding DNA.
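The gene-size arithmetic in this section can be checked directly. The sketch below uses only the averages stated in the text (20,000 genes, 10 introns of about 6 kb each, 62 kb per gene, a 3.1 × 10⁹ bp genome); the exon total per gene is simply what is left over and is an inference from those averages, not a measured value.

```python
# Averages quoted in the text; real gene lengths are highly skewed, so these are rough.
HAPLOID_GENOME_BP = 3.1e9
PROTEIN_CODING_GENES = 20_000
INTRONS_PER_GENE = 10
MEAN_INTRON_BP = 6_000
MEAN_GENE_BP = 62_000

intron_bp_per_gene = INTRONS_PER_GENE * MEAN_INTRON_BP  # 60,000 bp of introns
exon_bp_per_gene = MEAN_GENE_BP - intron_bp_per_gene    # remainder left for exons (UTRs + coding)
gene_fraction = PROTEIN_CODING_GENES * MEAN_GENE_BP / HAPLOID_GENOME_BP

print(f"introns per gene: {intron_bp_per_gene:,} bp")               # 60,000 bp
print(f"exons per gene (UTRs + coding): {exon_bp_per_gene:,} bp")   # 2,000 bp
print(f"fraction of genome in protein-coding genes: {gene_fraction:.0%}")  # 40%
```

Because some genes overlap and average lengths hide a wide spread, a single figure like 40% is best read as an upper-end approximation of the share of the genome occupied by protein-coding genes.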

Biochemical activity
Another criterion that has been used to estimate functional elements is biochemical activity. Biochemical activity includes whether a given locus is transcribed or whether it binds a transcription factor.

In a series of papers published in 2012 the Encyclopedia of DNA Elements (ENCODE) project reported that detectable biochemical activity was observed in regions covering at least 80% of the human genome. These conclusions were promoted by a publicity campaign announcing the demise of junk DNA. The ENCODE conclusions were challenged in a series of publications over the next few years. The challengers suggested that many transcripts are spurious transcripts that do not necessarily come from functional regions of the genome. They also suggested that many transcription factor binding sites are nonfunctional sites that occur by chance in large genomes.
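The challengers' point that transcription factor binding sites occur by chance in large genomes follows from simple probability. A rough sketch, assuming random sequence with equal base frequencies and exact matches to one specific motif; real binding sites are degenerate and real genomes are not random, so the numbers are only indicative, and the function name is hypothetical.

```python
def expected_chance_matches(genome_bp: float, motif_length: int,
                            both_strands: bool = True) -> float:
    """Expected exact matches to one specific motif in random DNA.

    Assumes independent positions with equal base frequencies (25% each).
    For palindromic motifs, set both_strands=False to avoid double counting.
    """
    positions = genome_bp - motif_length + 1
    per_position = 0.25 ** motif_length
    strands = 2 if both_strands else 1
    return positions * per_position * strands


for k in (6, 8, 10):
    print(f"{k}-bp motif: ~{expected_chance_matches(3.1e9, k):,.0f} chance matches")
```

For a 3.1 × 10⁹ bp genome this gives roughly 1.5 million chance matches for a 6-bp motif, about 95,000 for an 8-bp motif, and about 6,000 for a 10-bp motif, so a factor that recognizes a short motif can be expected to find many binding sites that arose by chance.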

The challengers argued that biochemical activity is not a reliable indicator of function and in 2014 the ENCODE researchers agreed with the challengers and abandoned their claim that 80% of the human genome was functional. They also presented evidence for junk DNA that was missing in their 2012 papers. The most recent attempt to define function using biochemical activity focuses on identifying which transcripts have a function and which transcription factor binding sites are true regulatory sequences. One way of distinguishing between true functional biochemical activity and spurious nonfunctional biochemical activity is to look for evidence of sequence conservation or purifying selection. Opponents of junk DNA argue that biochemical activity detects functional regions of the genome that are not identified by sequence conservation or purifying selection.

Kellis et al. (2014)
According to the ENCODE researchers, the genetic approach looks at the phenotypic effects of mutations in order to identify functional regions of the genome. They maintain that the genetic approach is the "gold standard" for defining function. We know that there can be mutations in non-functional (junk) DNA that cause genetic diseases, for example by creating spurious splice sites, so the genetic approach cannot be a definitive criterion for identifying function.

The question is whether there are clear examples where the genetic approach identifies functional elements of the genome that are not under purifying selection. In the absence of such examples, purifying selection is the only reasonable criterion.

The ENCODE researchers recognize that sequence conservation is indicative of purifying selection and they seem to implicitly accept that the definitive criterion is purifying selection and not just sequence conservation. They highlight technical difficulties in detecting sequence conservation but they don't discuss methods of detecting purifying selection.

They point out that human-specific elements will not be conserved but they fail to mention that they will still be subject to purifying selection, which is the preferred definition of function for that very reason.

They conclude that "absence of conservation cannot be interpreted as evidence for the lack of function." There are two problems with that statement. First, it's not conservation that defines function; it's purifying selection. Second, the goal is to identify function and not to provide evidence that a given region of DNA does not have a function (proving the negative). What we need is solid evidence that there are functional regions of the genome that are not under purifying selection so we can use another criterion to identify function if such a criterion exists.

The ENCODE researchers say that the biochemical approach identifies "candidate" functional elements. This is correct as long as it is understood to mean that it detects only a subset of true functional elements. The important point is that the biochemical approach by itself does not identify 'actual' functional elements but only 'possible' (candidate) functional elements. This is a retraction of their 2012 claim that all sequences with biochemical activity are functional.

They now point out that regions exhibiting biochemical function "are not always deterministic evidence of function, but can occur stochastically." This is exactly the point made by critics of their 2012 claim that 80% of the genome is functional.

The ENCODE researchers have conceded that not all regions of biochemical activity are functional but in order for biochemical activity to be a useful addition in identifying function there would have to be examples of true functional elements with biochemical activity that are not under purifying selection. Otherwise, the purifying selection definition supersedes biochemical activity in all cases.

In the context of the junk DNA debate, it is important to identify functional regions of the genome whether or not we know the exact type of function the region specifies. The ENCODE researchers state that "Our results reinforce the principle that each approach [genetic, biochemical, evolutionary] provided complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease." Unfortunately, they have not provided a single example where biochemical activity or the genetic approach identifies function (i.e. not junk) in the absence of purifying selection, but there are examples where the genetic and biochemical approaches identify regions that are not functional and are assumed to be junk. It's difficult to see why all three approaches are said to be complementary.

=References=

For references with author credit

For references without author credit


Citing a symposium volume.

Link to subsections within an article. Junk DNA section of Non-coding_DNA

This is the first citation to Alberts et al. 1994 textbook. This is the second citation.

Shortened footnote template (sfn). Refers to the first reference in the list that corresponds to the same author name and date (e.g. Gould (2002) pp. 1-10)

Alberts et al. (1994) (textbook)

Amaral et al. (2023) (human genome catalogue)

Abascal et al. (2018)

Besenbacher et al. (2019) (mutation rates in great apes)

Bishop (1974)

Britten and Davidson (1969)

Britten and Kohne (1968)

Brown (2018) (Genomes 4)

Brown (2018) (Genomes 4: Chapt. 12 Transcriptomics)

Brunet and Doolittle (2014)

Brzović and Šustar (2020)

Casane et al. (2015)

Cavalier-Smith (1978)

Cavalier-Smith (1980)

Cavalier-Smith (1991) (introns)

Christmas et al. (2023)

Comings (1972) (book)

Comings (1972) (book review)

Coyne (2009)

Crick (1978)(introns)

Dawkins (1976) (The Selfish Gene)

Dawkins and Wong (2016)

De Parseval and Heidmann (2005) (ERVs)

Doolittle (1978)(introns)

Doolittle (1991) (origin of introns)

Doolittle (2013)

Doolittle and Sapienza (1980)

Doolittle et al. (2014)

Dover (1980)

Dover and Doolittle (1980)

Dukler et al. (2022) (genetic load)

Echols and Goodman (1991) (DNA replication)

Eddy (2012)

Eddy (2013)

Elliott et al. (2014)

ENCODE (2012)

ENCODE cartoon

ENCODE EMBL video

ENCODE The Guardian video

ENCODE Maher blog (2012)

Ensembl Homo sapiens

Francis and Wörheide (2017) (50% genes)

Galeota-Sprung et al. (2020)

Gericke and Hagberg (2007) (gene definitions)

Germain et al. (2014)

Gil and Latorre (2012) (junk DNA in bacteria)

Gilbert (1978)(introns)

Gilbert (1985)(introns)

Gould (2002)

Graur (2016) (textbook)

Graur (2017)

Graur et al. (2013)

Graur et al. (2015)

Gregory (2005)

Gymrek et al. (2016) (STRs)

Haldane (1949)

Halldorsson et al. (2022) (genetic load)

Häsler et al. (2007) (Alus not junk)

Hatje et al. (2019)

Haerty and Ponting (2014)

Hopkin (2009) (gene definition)

Hoyt et al. (2022) (T2T sequence)

Hubé and Francastel (2015) (introns)

Irimia and Roy (2014) (origin of introns)

Jain (1980)

Jensen (2001) (orthologs and paralogs)

Jensen et al. (2013) (pervasive transcription)

Johnson (2019) (ERVs)

Judson (1996) (The Eighth Day of Creation)

Jukes (1979) (letter to Crick)

Kampourakis (2017)

Keightley (2012) (mutation rates)

Kimura (1968)

Kimura and Ohta (1971)

King and Jukes (1969)

Kirchberger et al. (2020) (bacterial genomes)

Kronenberg et al. (2018) (great ape genomes)

Kunkel (2009) (DNA replication)

Lander et al. (2001) (human genome)

Larsen (2018)

Lewin (1974)

Lewin (1974b)

Lewin (1974c)(Cell editorial)

Lewin (2004) (Genes VIII)

Leypold and Speicher (2021) (sequence conservation)

Linquist (2022)

Linquist et al. (2020)

Lynch (2016) (~100 mutations per newborn)

Lynch et al. (2016) (mutation rate)

Mattick (2023)

Mattick (2023b)

Mattick and Dinger (2013)

McHughen (2020)

Moorjani et al. (2016) (primate molecular clock)

Moran et al. (2012) (Principles of Biochemistry)

Morange (2014) (junk DNA controversy)

Morange (2020)(intron history)

Mortola and Long (2021) (gene definition/birth)

Muller (1950)

Muller, H.J. (1966)

Nachman (2004) (mutation rate history)

Neil and Fairbrother (2019) (intron function)

Nelson et al. (2004)

Nowak and Waclaw (2017) (review of mutations cause cancer)

Niu and Jiang (2013)

O'Brien (1973)

Ohno (1972) (So much 'Junk' DNA)

Ohno (1972) (Genetic Simplicity)

Ohno (1972) (regulatory sequences)

Ohno and Yomo (1991)

Ohta (1973)

Ohta (1998)

Ohta and Kimura (1971) (30,000 genes)

Omenn et al. (2020)

Orgel and Crick (1980)

Orgel, Crick and Sapienza (1980)

Orgogozo et al. (2016) (Mendelian vs Molecular Gene)

Palazzo and Gregory (2014)

Palazzo and Kejiou (2022) (molecular biologists)

Palazzo and Lee (2015)

Pearson (2006) (gene definition)

Pennisi (2007) (gene definition)

Pennisi (2012)

Piovesan et al. (2019)

Piovesan et al. (2019) (length and weight of the human genome)

Ponicsan et al. (2010) (SINE function)

Ponting (2017)

Ponting and Hardison (2011)

Ponting and Haerty (2022)

Ségurel et al. (2014) (mutation rates)

Scally (2016) (human mutation rate)

Scally and Durbin (2012) (human mutation rate)

Sharp (1991) ("Five easy pieces")

Sverdlov (2017) (junk RNA)

Sweet (2022)(junk DNA history thesis)

Thomas (1971) (C-value Paradox)

Yu et al. (2002) (minimal introns not junk)

van Bakel et al. (2011) (pervasive transcription)

Wade and Grainger (2018) (spurious transcription)

Walters et al. (2009) (SINE functions)

Watson (1965) (Molecular Biology of the Gene)

Wong et al. (2000) (are introns junk?)

Zhou et al. (2021) (DNA replication)

Press release: Yale 2012