User:Synpath/sandbox2

Definitions
The term "gene" is used in many different ways, depending on which aspect of genes is emphasized: their inheritance, selection, biological function, or molecular structure. Most definitions, however, fall into one of two categories: the Mendelian gene or the molecular gene.

The molecular gene is any sequence of DNA that codes for a functional RNA product. These functional products include mRNA, which may be translated into protein; tRNA, used in translation; rRNA, necessary for the construction of ribosomes; and a host of other functional products from non-coding DNA. This definition can be expanded to include phenomena such as alternative splicing, trans-splicing, viral RNA genomes, or bacterial operons where appropriate, or simplified to focus on the most common relation of one gene to one protein. Regulatory regions controlling the expression of genes may be included under these definitions or classified as gene-associated regions.

Before the role of DNA in heredity was discovered, the gene was an abstract notion: the smallest, indivisible unit relevant to inheritance. This is known as the 'Mendelian gene' and is the classical definition of a gene used in genetics and in the gene-centered view of evolution popularized by Richard Dawkins' book The Selfish Gene. Such a definition is useful in fields where the molecular mechanisms governing how genes function are not directly relevant, such as population genetics.

Text below is copied from the Gene article (11-03-2023)

Definitions
The term "gene" is used in many different ways, depending on which aspect of genes is emphasized: their inheritance, selection, biological function, or molecular structure. Most definitions, however, fall into one of two categories: the Mendelian gene or the molecular gene.

The Mendelian gene is the classical gene of genetics and it refers to any heritable trait. This is the gene described in "The Selfish Gene." More thorough discussions of this version of a gene can be found in the articles on Genetics and Gene-centered view of evolution.

This article focuses on the molecular gene: the gene that is described in terms of DNA sequence. There are many different definitions of this gene, some of which are misleading or incorrect.

The very first edition of the textbook "Molecular Biology of the Gene" (1965) described two kinds of molecular gene: protein-coding genes and those that specified functional RNA molecules such as ribosomal RNA and tRNA (noncoding genes). But the idea of two kinds of genes dates back to the late 1950s when Jacob and Monod speculated that regulatory genes might produce repressor RNAs.

This idea of two kinds of genes is still part of the definition of a gene in most textbooks. For example,


 * "The primary function of the genome is to produce RNA molecules. Selected portions of the DNA nucleotide sequence are copied into a corresponding RNA nucleotide sequence, which either encodes a protein (if it is an mRNA) or forms a 'structural' RNA, such as a transfer RNA (tRNA) or ribosomal RNA (rRNA) molecule. Each region of the DNA helix that produces a functional RNA molecule constitutes a gene."


 * "We define a gene as a DNA sequence that is transcribed. This definition includes genes that do not encode proteins (not all transcripts are messenger RNA). The definition normally excludes regions of the genome that control transcription but are not themselves transcribed. We will encounter some exceptions to our definition of a gene - surprisingly, there is no definition that is entirely satisfactory."


 * "A gene is a DNA sequence that codes for a diffusible product. This product may be protein (as is the case in the majority of genes) or may be RNA (as is the case of genes that code for tRNA and rRNA). The crucial feature is that the product diffuses away from its site of synthesis to act elsewhere."

The important parts of such definitions are: (1) that a gene corresponds to a transcription unit; (2) that genes produce both mRNA and noncoding RNAs; and (3) that regulatory sequences control gene expression but are not part of the gene itself. However, there is one other important part of the definition, and it is emphasized in Kostas Kampourakis' book "Making Sense of Genes."


 * "Therefore in this book I will consider genes as DNA sequences encoding information for functional products, be it proteins or RNA molecules. With 'encoding information,' I mean that the DNA sequence is used as a template for the production of an RNA molecule or a protein that performs some function."

The emphasis on function is essential because there are stretches of DNA that produce non-functional transcripts and they don't qualify as genes. These include obvious examples such as transcribed pseudogenes as well as less obvious examples such as junk RNA produced as noise due to transcription errors. In order to qualify as a true gene, by this definition, one has to prove that the transcript has a biological function.

Early speculations on the size of a typical gene were based on high-resolution genetic mapping and on the size of proteins and RNA molecules. A length of 1500 base pairs seemed reasonable at the time (1965). This was based on the idea that the gene was the DNA that was directly responsible for production of the functional product. The discovery of introns in the 1970s meant that many eukaryotic genes were much larger than the size of the functional product would imply. Typical mammalian protein-coding genes, for example, are about 62,000 base pairs in length (transcribed region), and since there are about 20,000 of them, they occupy about 35-40% of the mammalian genome (including the human genome).
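The 35-40% figure can be checked with simple arithmetic. The following is a rough sketch, not a precise measurement: the gene length and gene count come from the paragraph above, while the genome size of about 3.1 billion base pairs is an assumption added here for illustration, and the function name is hypothetical.

```python
# Back-of-the-envelope check of the genome fraction occupied by protein-coding genes.
MEAN_GENE_LENGTH_BP = 62_000      # from the text: typical transcribed region
PROTEIN_CODING_GENES = 20_000     # from the text: approximate gene count
GENOME_SIZE_BP = 3_100_000_000    # assumption: approximate haploid human genome size


def genome_fraction(mean_gene_length_bp, gene_count, genome_size_bp):
    """Return the fraction of the genome covered, ignoring any overlap between genes."""
    return mean_gene_length_bp * gene_count / genome_size_bp


fraction = genome_fraction(MEAN_GENE_LENGTH_BP, PROTEIN_CODING_GENES, GENOME_SIZE_BP)
print(f"{fraction:.0%}")  # prints "40%"
```

A slightly larger assumed genome size gives a value nearer the lower end of the quoted range.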

In spite of the fact that both protein-coding genes and noncoding genes have been known for more than 50 years, there are still a number of textbooks, websites, and scientific publications that define a gene as a DNA sequence that specifies a protein. In other words, the definition is restricted to protein-coding genes. Here's an example from a recent article in American Scientist.


 * ... to truly assess the potential significance of de novo genes, we relied on a strict definition of the word "gene" with which nearly every expert can agree. First, in order for a nucleotide sequence to be considered a true gene, an open reading frame (ORF) must be present. The ORF can be thought of as the "gene itself"; it begins with a starting mark common for every gene and ends with one of three possible finish line signals. One of the key enzymes in this process, the RNA polymerase, zips along the strand of DNA like a train on a monorail, transcribing it into its messenger RNA form. This point brings us to our second important criterion: A true gene is one that is both transcribed and translated. That is, a true gene is first used as a template to make transient messenger RNA, which is then translated into a protein.

This restricted definition is so common that it has spawned many recent articles that criticize this "standard definition" and call for a new expanded definition that includes noncoding genes. However, this so-called "new" definition has been around for more than half a century and it's not clear why some modern writers are ignoring noncoding genes.
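The open reading frame criterion in the quoted definition can be illustrated with a short sketch. It assumes the standard genetic code (start codon ATG; stop codons TAA, TAG, and TGA) and scans only the given strand; the function name is hypothetical. A gene for a noncoding RNA such as tRNA or rRNA need not contain any ORF, which is why a test of this kind captures only protein-coding genes.

```python
START_CODON = "ATG"                  # assumption: standard genetic code
STOP_CODONS = {"TAA", "TAG", "TGA"}  # the three stop signals of the standard code


def find_first_orf(dna):
    """Return the first ORF (start codon through stop codon) on the given strand, or None."""
    dna = dna.upper()
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != START_CODON:
            continue
        # Read codon by codon, in frame with the start codon, until a stop codon appears.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                return dna[start:pos + 3]
    return None


print(find_first_orf("CCATGGCTTGATT"))  # prints "ATGGCTTGA"
```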

Although some definitions can be more broadly applicable than others, the fundamental complexity of biology means that no definition of a gene can capture all aspects perfectly. For example, definitions explicitly focusing on DNA omit viral RNA genomes. Another important exception concerns bacterial operons, which are transcribed into single large mRNAs containing multiple protein-coding regions, with each coding region typically referred to as a separate gene. The only significant controversy over the broad molecular definition of a gene is whether to include the regulatory sequences that control transcription of the gene. The general consensus among scientists is that regulatory elements control the expression of a gene but are not part of the gene.

Functional definitions
Defining exactly what section of a DNA sequence comprises a gene is difficult. Regulatory regions of a gene such as enhancers do not necessarily have to be close to the coding sequence on the linear molecule because the intervening DNA can be looped out to bring the gene and its regulatory region into proximity. Similarly, a gene's introns can be much larger than its exons. Regulatory regions can even lie on entirely different chromosomes and operate in trans, coming into contact with target genes on another chromosome.

Early work in molecular genetics suggested the concept that one gene makes one protein. This concept (originally called the one gene-one enzyme hypothesis) emerged from an influential 1941 paper by George Beadle and Edward Tatum on experiments with mutants of the fungus Neurospora crassa. Norman Horowitz, an early colleague on the Neurospora research, reminisced in 2004 that "these experiments founded the science of what Beadle and Tatum called biochemical genetics. In actuality they proved to be the opening gun in what became molecular genetics and all the developments that have followed from that". The one gene-one protein concept has been refined since the discovery of genes that can encode multiple proteins by alternative splicing and of coding sequences split into short sections across the genome whose mRNAs are concatenated by trans-splicing.

A broad operational definition is sometimes used to encompass the complexity of these diverse phenomena, where a gene is defined as a union of genomic sequences encoding a coherent set of potentially overlapping functional products. This definition categorizes genes by their functional products (proteins or RNA) rather than their specific DNA loci, with regulatory elements classified as gene-associated regions.
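The idea of a gene as a "union of genomic sequences" can be made concrete with a small sketch. It assumes each functional product is described by a list of half-open (start, end) coordinates on a single chromosome; the product names, coordinates, and function names below are hypothetical and chosen only for illustration.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals into a sorted union."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gene_footprint(products):
    """Union of all genomic segments used by a gene's functional products."""
    segments = [seg for segs in products.values() for seg in segs]
    return merge_intervals(segments)


# Hypothetical coordinates for two overlapping products of one gene.
products = {
    "isoform_a": [(100, 200), (300, 400), (500, 600)],
    "isoform_b": [(100, 200), (350, 450)],
}
print(gene_footprint(products))  # prints [(100, 200), (300, 450), (500, 600)]
```

Under this view the gene is identified by its set of products, and the genomic footprint is derived from them rather than fixed in advance as a single locus.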