G-value paradox

The G-value paradox arises from the lack of correlation between the number of protein-coding genes among eukaryotes and their relative biological complexity. The microscopic nematode Caenorhabditis elegans, for example, is composed of only about a thousand somatic cells but has about the same number of genes as a human. Researchers suggest that the resolution of the paradox may lie in mechanisms such as alternative splicing and complex gene regulation, which make the genes of humans and other complex eukaryotes relatively more productive.

DNA and biological complexity
The lack of correlation between the morphological complexity of eukaryotes and the amount of genetic information they carry has long puzzled researchers. The sheer amount of DNA in an organism, measured by the mass of DNA present in the nucleus or by the number of constituent nucleotide pairs, varies by several orders of magnitude among eukaryotes and is often unrelated to an organism's size or developmental complexity. One amoeba has 200 times more DNA per cell than humans, and even insects and plants within the same genus can vary dramatically in their quantity of DNA. This mismatch, known as the C-value paradox, troubled genome scientists for many years.

Eventually, researchers recognized that not all DNA contributes directly to the production of proteins or to other biological functions. Susumu Ohno popularized the phrase "junk DNA" to describe these nonfunctional stretches of DNA. They include introns, sequences that are transcribed but spliced out of the precursor mRNA and thus are not translated into proteins; transposable elements, mobile fragments of DNA, most of which are nonfunctional in humans; and pseudogenes, nonfunctional DNA sequences that originated from functional genes. The share of the human genome that may be considered "junk" remains controversial. Estimates of the functional share reach as low as 8% and as high as 80%, with one researcher arguing that there is a fixed ceiling of 25% imposed by the genome's genetic load. (Prokaryotes, which have little "junk" DNA by comparison, exhibit a fairly close relationship between genome size and biological functionality.)

In any case, the assumption was that once the C-value paradox was swept away and the focus shifted to the number of protein-coding genes, the anticipated correlation between genetic information and biological complexity in eukaryotes would emerge. Instead, the G-value paradox simply picked up where the C-value paradox left off, because the discrepancy persisted when comparisons were narrowed to just protein-coding genes.

G-value paradox
Estimates of the number of protein-coding genes in the human genome reached upwards of 100,000 prior to the Human Genome Project, but have since dwindled to as low as 19,000 following completion of that massive sequencing effort and subsequent refinements. By comparison, the microscopic water flea Daphnia pulex has about 31,000 genes; the nematode C. elegans about 19,700; the fruit fly (Drosophila melanogaster) about 14,000; the zebrafish (Danio rerio) about 26,000; and the small flowering plant Arabidopsis thaliana about 27,000. Plants in general tend to have more genes than other eukaryotes. One explanation is their higher incidence of gene and whole-genome duplication and their retention of those additional genes, due in part to their development of a large collection of defensive secondary metabolites.
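The comparison above is easier to see when the quoted figures are ranked side by side. The following is a minimal Python sketch that uses only the approximate gene counts given in this section; the dictionary labels and the function names are illustrative choices, not part of any standard dataset or library.

```python
# Approximate protein-coding gene counts as quoted in the text above.
# The human figure is the low-end estimate mentioned there.
HUMAN = "Homo sapiens (human)"

GENE_COUNTS = {
    "Daphnia pulex (water flea)": 31_000,
    "Arabidopsis thaliana (flowering plant)": 27_000,
    "Danio rerio (zebrafish)": 26_000,
    "Caenorhabditis elegans (nematode)": 19_700,
    HUMAN: 19_000,
    "Drosophila melanogaster (fruit fly)": 14_000,
}


def rank_by_gene_count(counts):
    """Return (species, count) pairs sorted from most to fewest genes."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


def ratio_to_human(counts, human_key=HUMAN):
    """Express each species' gene count as a multiple of the human count."""
    human = counts[human_key]
    return {species: n / human for species, n in counts.items()}


if __name__ == "__main__":
    ratios = ratio_to_human(GENE_COUNTS)
    for species, n in rank_by_gene_count(GENE_COUNTS):
        print(f"{species:<42}{n:>8,d}  ({ratios[species]:.2f}x human)")
```

Ranked this way, humans fall below a water flea, a small weed and a zebrafish, and only slightly below a nematode, which is the discrepancy the paradox describes.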

The apparent disconnect between the number of genes in a species and its biological complexity was dubbed the G-value paradox. While the C-value paradox unraveled with the discovery of massive sequences of noncoding DNA, resolution of the G-value paradox appears to rest on differences in genome productivity. Humans and other complex eukaryotes simply may be able to do more with what they have, genetically speaking.

Among the mechanisms cited for this greater productivity are more sophisticated transcriptional controls, multifunctional proteins, more interaction between protein products, and alternative splicing and post-translational modifications, which may produce several protein products from the same genetic raw material. In addition, thousands of non-coding RNAs that are transcribed from DNA but not translated into protein have emerged as important regulators of gene expression and development in humans and other eukaryotes. They include short RNA sequences, such as microRNAs (miRNAs), small interfering RNAs (siRNAs) and Piwi-interacting RNAs (piRNAs), as well as long non-coding RNAs (lncRNAs) that may regulate gene expression at different stages of development. Some researchers suggest that the focus should now shift from the number of genes to gene interactions and the network of genetic regulatory mechanisms that allow genes to support a variety of biological activities. These transitions have taken analysis of genetic complexity from the C-value to the G-value to what some refer to as the I-value, a measure of the total information contained in a genome.
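To illustrate how alternative splicing can multiply the output of a single gene, the sketch below enumerates the transcripts of a toy gene in which each optional ("cassette") exon is independently included or skipped. This is a deliberately simplified assumption: real splicing is constrained and regulated, so the count is an upper bound for the toy model rather than a biological prediction. The exon labels and the function name are hypothetical.

```python
from itertools import product


def enumerate_isoforms(first_exon, cassette_exons, last_exon):
    """List every transcript obtainable by independently including or
    skipping each cassette exon, keeping the first and last exons fixed.

    Toy model only: n cassette exons give at most 2**n isoforms.
    """
    isoforms = []
    for choices in product([False, True], repeat=len(cassette_exons)):
        included = [exon for exon, keep in zip(cassette_exons, choices) if keep]
        isoforms.append([first_exon, *included, last_exon])
    return isoforms


if __name__ == "__main__":
    # Hypothetical gene with two constitutive and three cassette exons.
    for isoform in enumerate_isoforms("E1", ["E2", "E3", "E4"], "E5"):
        print("-".join(isoform))
```

Under this assumption a gene with three cassette exons yields up to eight distinct transcripts, so modest differences in splicing complexity can outweigh large differences in raw gene count.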

Defining complexity
One of the challenges in the long debate over the mismatch between genome size and biological complexity has been ambiguity in defining complexity. Is it the number of cell types in an organism, the sophistication of its nervous system or the number of different proteins it produces? By some definitions, the greater complexity of humans compared to other organisms may be illusory. Even once complexity is defined, some researchers argue that complexity in function does not necessarily require the same complexity in process. Evolution is not a paragon of efficiency; it travels a crooked path that in some species leads to a more cumbersome genome than is necessary.