E-box
An E-box (enhancer box) is a DNA response element found in some eukaryotes that acts as a protein-binding site and has been found to regulate gene expression in neurons, muscles, and other tissues. Its specific DNA sequence, CANNTG (where N can be any nucleotide), with a palindromic canonical sequence of CACGTG, is recognized and bound by transcription factors to initiate gene transcription. Once the transcription factors bind to the promoters through the E-box, components of the general transcription machinery, such as RNA polymerase, can bind to the promoter and facilitate transcription of DNA into mRNA.
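Because the E-box is defined by a short consensus, it can be located in a DNA string with a simple pattern search. The Python sketch below is illustrative only, not a published tool: the names `find_eboxes` and `revcomp` are hypothetical. It reports every CANNTG hexamer, including overlapping ones, and flags the canonical CACGTG.

```python
import re

# CANNTG inside a lookahead so that overlapping E-boxes are all reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(seq: str) -> list[tuple[int, str, bool]]:
    """Return (0-based start, hexamer, is_canonical) for every E-box."""
    seq = seq.upper()
    return [(m.start(), m.group(1), m.group(1) == "CACGTG")
            for m in EBOX.finditer(seq)]


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper()[::-1].translate(str.maketrans("ACGT", "TGCA"))


s = "ttCACGTGaaCAGCTGtt"
print(find_eboxes(s))                # [(2, 'CACGTG', True), (10, 'CAGCTG', False)]
print(len(find_eboxes(revcomp(s))))  # 2
```

The reverse complement of CANNTG is again CANNTG, so scanning one strand already finds sites on both strands. `revcomp` is included only to check that property.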

Discovery
The E-box was discovered in 1985 through a collaboration between the laboratories of Susumu Tonegawa and Walter Gilbert, as a control element in the immunoglobulin heavy-chain enhancer. They found that a 140-base-pair region of this tissue-specific transcriptional enhancer element was sufficient to produce different levels of transcription enhancement in different tissues and sequences. They suggested that proteins made by specific tissues acted on these enhancers to activate sets of genes during cell differentiation.

In 1989, David Baltimore's lab discovered the first two E-box binding proteins, E12 and E47. These proteins bind to immunoglobulin enhancers and can form heterodimers with other proteins through their basic helix-loop-helix (bHLH) domains. In 1990, another E-protein, ITF-2A (later renamed E2-2Alt), was discovered that binds to immunoglobulin light-chain enhancers. Two years later, the third E-box binding protein, HEB, was discovered by screening a cDNA library from HeLa cells. A splice variant of E2-2 was discovered in 1997 and was found to inhibit the promoter of a muscle-specific gene.

Since then, researchers have established that the E-box affects gene transcription in several eukaryotes and have found E-box binding factors that recognize E-box consensus sequences. In particular, several experiments have shown that the E-box is an integral part of the transcription-translation feedback loop that makes up the circadian clock.

The link between E-box-regulated genes and the circadian clock was discovered in 1997, when Hao, Allen, and Hardin (Department of Biology at Texas A&M University) analyzed rhythmicity in the period (per) gene in Drosophila melanogaster.

Binding
E-box binding proteins play a major role in regulating transcriptional activity. These proteins usually contain the basic helix-loop-helix protein structural motif, which allows them to bind directly to DNA as dimers. This motif consists of two amphipathic α-helices separated by a short sequence of amino acids that forms one or more β-turns. Hydrophobic interactions between these α-helices stabilize dimerization. In addition, each bHLH monomer has a basic region that mediates recognition between the monomer and the E-box (the basic region interacts with the major groove of the DNA). bHLH proteins that recognize different E-box variants, such as CAGCTG versus CACGTG, carry different sets of basic residues.

Relative Position of CTRR and E-Box

E-box binding is modulated by Zn2+ in mice. A CT-rich region (CTRR) located about 23 nucleotides upstream of the E-box is important for E-box binding, transactivation (an increased rate of gene expression), and transcription of circadian genes by the BMAL1/NPAS2 and BMAL1/CLOCK complexes.
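This positional relationship can be expressed as a simple sequence check. In this hypothetical sketch, a CT-rich region is assumed to be a window of `width` bases centered about `offset` = 23 nucleotides upstream of the E-box whose C+T fraction reaches `threshold`. The window width and threshold are illustrative assumptions, not values from the study.

```python
def ct_fraction(seq: str, start: int, end: int) -> float:
    """Fraction of C or T bases in seq[start:end]."""
    window = seq[start:end].upper()
    if not window:
        return 0.0
    return sum(base in "CT" for base in window) / len(window)


def has_upstream_ctrr(seq: str, ebox_start: int, offset: int = 23,
                      width: int = 10, threshold: float = 0.7) -> bool:
    """Hypothetical test for a CT-rich window centered `offset` nt upstream
    of an E-box that begins at index `ebox_start` (0-based)."""
    start = ebox_start - offset - width // 2
    if start < 0:
        return False  # not enough upstream sequence to evaluate
    return ct_fraction(seq, start, start + width) >= threshold


ct_rich = "A" * 5 + "CTTCCTCTTC" + "A" * 18 + "CACGTG"
ct_poor = "A" * 5 + "GAGGAGGAGG" + "A" * 18 + "CACGTG"
print(has_upstream_ctrr(ct_rich, ct_rich.find("CACGTG")))  # True
print(has_upstream_ctrr(ct_poor, ct_poor.find("CACGTG")))  # False
```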

The binding specificity of different E-boxes is essential to their function: E-boxes with different functions have different numbers and types of binding factors.

The consensus sequence of the E-box is usually CANNTG; however, there are other E-boxes with similar sequences, called noncanonical E-boxes. These include, but are not limited to:


 * GGCCACGTGACC sequence found within MYC (c-Myc)
 * CACGTT sequence located 20 bp upstream of the mouse Period2 (PER2) gene, which regulates its expression
 * CAGCTT sequence found within the MyoD core enhancer
 * CACCTCGTGAC sequence in the proximal promoter region of human and rat APOE, which encodes a protein component of lipoproteins.

Role in the circadian clock
In 1997, Hao, Allen, and Hardin found a circadian transcriptional enhancer within a 69 bp DNA fragment upstream of the per gene. Depending on PER protein levels, the enhancer drove high levels of mRNA transcription in both LD (light-dark) and DD (constant darkness) conditions. The enhancer was found to be necessary for high-level gene expression but not for circadian rhythmicity. The E-box also acts as a target of the BMAL1 and CLOCK complex in the mammalian circadian clock.

Drosophila Clock
Based on these properties, a model for the transcription-translation feedback loop (TTFL) was proposed in Drosophila. First, CLOCK protein forms a heterodimer with another bHLH-PAS protein called CYCLE, and this heterodimer binds E-boxes, activating per and tim transcription. As PER and TIM accumulate in the nucleus, they repress per and tim transcription via non-PAS-dependent interactions between the C-terminus of PER and CLOCK. These interactions between PER and CLOCK-CYCLE are thought to reduce the ability of CLOCK-CYCLE heterodimers to bind E-boxes.

Mammalian Clock
Through biochemical assays in vertebrates, researchers have proposed a carefully staged set of interactions that occur at the E-box. First, BMAL1 and CLOCK bind to each other through their bHLH-PAS domains to form a complex, which has been shown to interact with the histone acetyltransferases p300 and CREB-binding protein (CBP). Binding of this complex to the E-box results in chromatin remodeling and recruitment of the RNA polymerase II machinery necessary for transcription. Large macromolecular complexes containing the proteins PER and CRY then assemble on top of the complexes occupying the E-box and uncouple the DNA-bound BMAL1-CLOCK heterodimer from its accompanying acetyltransferase. The assembly of these large complexes and the binding of PER and CRY to the heterodimer are essential to effectively stop transcription of the per and cry genes, because they uncouple the heterodimer from the acetyltransferase and the E-box. This rhythmic alternation of transcriptional activation and inhibition produces the circadian oscillations in cells.

Sensitivity of the E-box
The E-box plays an important role in circadian genes; so far, nine E/E'-box-controlled circadian genes have been identified: PER1, PER2, BHLHB2, BHLHB3, CRY1, DBP, Nr1d1, Nr1d2, and RORC. Because the E-box is connected to several circadian genes, the genes and proteins associated with it may be "crucial and vulnerable points in the (circadian) system."

The E-box promoter element is subject to both positive and negative regulation, creating a transcription-translation feedback loop (TTFL). The E-box is positively regulated by the transcription factors BMAL1, CLOCK, and NPAS2, which bind to the E-box and promote transcription of Per and Cry mRNAs in mammals. It is negatively regulated by PER1-3, CRY1-2, and DEC1-2; CRY and PER are hypothesized to autoregulate their own expression by repressing the heterodimeric bHLH activator complex. Although positive and negative regulators each have their own circadian expression pattern, the peak times of positive regulators are generally in antiphase to those of negative regulators, resulting in delayed negative feedback.
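Delayed negative feedback can be illustrated with a generic toy model. This is a swapped-in textbook construction, not a model of the actual clock: a single product x represses its own production through a Hill term that sees x only after a delay tau, dx/dt = beta / (1 + (x(t - tau)/K)^n) - gamma * x. All parameter values and function names below are hypothetical. With a sufficiently long delay the model oscillates; holding the repressor at a fixed high level (loosely analogous to constitutive overexpression of a negative regulator) removes the oscillation.

```python
def simulate_delayed_repression(beta=10.0, K=1.0, n=4, gamma=1.0, tau=3.0,
                                dt=0.01, t_end=100.0, clamp=None):
    """Euler integration of the toy delay model
        dx/dt = beta / (1 + (r / K)**n) - gamma * x,
    where r = x(t - tau) is the delayed repressor level. If `clamp` is a
    number, r is held at that value instead (feedback loop broken)."""
    lag = int(round(tau / dt))      # delay expressed in time steps
    steps = int(round(t_end / dt))
    x = [0.1] * (lag + 1)           # constant history up to t = 0
    for _ in range(steps):
        r = clamp if clamp is not None else x[-1 - lag]
        dxdt = beta / (1.0 + (r / K) ** n) - gamma * x[-1]
        x.append(x[-1] + dt * dxdt)
    return x[lag:]                  # trajectory from t = 0 onward


def count_peaks(series, skip=0):
    """Count strict local maxima after discarding the first `skip` points."""
    s = series[skip:]
    return sum(1 for i in range(1, len(s) - 1) if s[i - 1] < s[i] > s[i + 1])


free = simulate_delayed_repression()
clamped = simulate_delayed_repression(clamp=5.0)
print(count_peaks(free, skip=5000), count_peaks(clamped, skip=5000))
```

The first 5000 steps (50 time units) are skipped so that only the settled behavior is counted: the free-running model keeps producing peaks, while the clamped one relaxes monotonically to a constant level.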

The impact of the E-box in this clock gene regulatory network is most clearly seen in perturbation experiments, where disrupting E-box regulation abolishes circadian rhythms. When E-box activity is perturbed by overexpression of the cry1 gene, both a Per2-promoter-driven reporter gene (Per2-dLuc) and a Bmal1-promoter-driven reporter gene (Bmal1-dLuc) lose circadian rhythms. Disruption of BMAL1, a positive regulator of E-box-mediated transcription, also results in complete behavioral arrhythmicity in mice. Disruption of CLOCK alone does not abolish behavioral rhythmicity, but CLOCK and NPAS2 double-knockout mice are arrhythmic. PER and CRY play critical roles in this network by closing the negative feedback loop of E-box regulation: mice lacking both per1 and per2, negative regulators of E-box-mediated transcription, lose circadian rhythmicity, and mice in which both cry1 and cry2 are disrupted are also arrhythmic.

E-box Involvement in Diverse Circadian Output Proteins
The E-box is among the top five transcription factor binding motifs associated with circadian phase and is found in most tissues. A total of 320 E-box-controlled genes have been found in the SCN (suprachiasmatic nucleus), liver, aorta, adrenal gland, WAT (white adipose tissue), brain, atria, ventricle, prefrontal cortex, skeletal muscle, BAT (brown adipose tissue), and calvarial bone. One E-box-controlled gene important for the generation and maintenance of circadian rhythms is arginine vasopressin (AVP), an output of the SCN circadian pacemaker. The AVP promoter contains an E-box to which BMAL1/CLOCK can bind, promoting AVP transcription. Furthermore, studies show that the vasopressin receptor 1A mediates interactions between AVP and vasoactive intestinal polypeptide (VIP). VIP neurons are essential for circadian coordination between cells in the SCN, allowing the neurons to communicate with each other and synchronize to the same circadian rhythm, which depends on the rhythmic binding of BMAL1 and CLOCK to the E-box.

Circadian E-box-like elements
E-box-like CLOCK-related elements (EL-boxes; GGCACGAGGC) are also important in maintaining circadian rhythmicity in clock-controlled genes. Like the E-box, the EL-box can mediate BMAL1/CLOCK-induced transcription, leading to expression of EL-box-containing genes such as Ank, DBP, and Nr1d1. However, the EL-box and the E-box differ. Suppression by DEC1 and DEC2 is stronger at the E-box than at the EL-box. Furthermore, HES1, which binds a different consensus sequence (CACNAG, known as the N-box), suppresses transcription through the EL-box but not through the E-box.
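The consensus sequences mentioned here (the E-box CANNTG and the N-box CACNAG) can be compared with a small helper that turns IUPAC nucleotide codes into regular expressions. This is an illustrative sketch; the helper names are hypothetical, and only the standard IUPAC single-letter codes are assumed.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Compile an IUPAC consensus into a regex that also finds overlapping hits."""
    body = "".join(IUPAC[code] for code in consensus.upper())
    return re.compile(f"(?=({body}))")


def match_positions(consensus: str, seq: str) -> list[int]:
    """0-based start positions of `consensus` on the given strand of `seq` only."""
    return [m.start() for m in consensus_to_regex(consensus).finditer(seq.upper())]


el_box = "GGCACGAGGC"
print(match_positions("CANNTG", el_box))  # E-box consensus: []
print(match_positions("CACNAG", el_box))  # N-box consensus: [2]
```

By plain string matching, the EL-box sequence contains no CANNTG E-box but does contain a CACNAG match (CACGAG). Unlike CANNTG, the N-box is not its own reverse complement, so a full scan would also search the opposite strand. The same helper handles degenerate consensus sequences such as the CAC(G/A)TG site bound by Myc/MAX, written CACRTG in IUPAC notation.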

Both non-canonical E-boxes and E-box-like sequences are crucial for circadian oscillation. Recent research has led to the hypothesis that a canonical or non-canonical E-box followed by an E-box-like sequence, with a 6-base-pair interval between them, is a necessary combination for circadian transcription. In silico analysis also suggests that such an interval exists in other known clock-controlled genes.
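The proposed arrangement (an E-box, then exactly 6 arbitrary base pairs, then an E-box-like element) can be searched for with a single regular expression. This is a hypothetical sketch: the E-box-like element is supplied by the caller because the text does not give its consensus, and the example uses the CACGAG core of the EL-box purely as a stand-in.

```python
import re


def find_tandem(seq: str, first: str, second: str, spacer: int = 6) -> list[int]:
    """Start positions where regex `first` is followed by exactly `spacer`
    arbitrary bases and then regex `second` (same strand, 0-based)."""
    pattern = re.compile(f"(?=({first}[ACGT]{{{spacer}}}{second}))")
    return [m.start() for m in pattern.finditer(seq.upper())]


EBOX = "CA[ACGT]{2}TG"  # CANNTG
EBOX_LIKE = "CACGAG"    # illustrative stand-in only (EL-box core)

six = "TT" + "CACGTG" + "A" * 6 + "CACGAG" + "TT"
five = "TT" + "CACGTG" + "A" * 5 + "CACGAG" + "TT"
print(find_tandem(six, EBOX, EBOX_LIKE))   # [2]
print(find_tandem(five, EBOX, EBOX_LIKE))  # []
```

Fixing the spacer length in the pattern is what distinguishes this from simply finding both elements somewhere in the same promoter: shifting the second element by one base is enough to lose the match.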

Role of proteins which bind to E-boxes
There are several proteins that bind to the E-box and affect gene transcription.

CLOCK and ARNTL
The CLOCK-ARNTL (also called CLOCK-BMAL1) complex is an integral part of the mammalian circadian cycle and vital in maintaining circadian rhythmicity.

Knowing that binding of this complex to the E-box activates transcription of the per gene through its promoter region, researchers discovered in 2002 that DEC1 and DEC2 (bHLH transcription factors) repress the CLOCK-BMAL1 complex through direct interaction with BMAL1 and/or competition for E-box elements. They concluded that DEC1 and DEC2 are regulators of the mammalian molecular clock.

In 2006, Ripperger and Schibler discovered that the binding of this complex to the E-box drove circadian DBP transcription and chromatin transitions (a change from chromatin to facultative heterochromatin). It was concluded that CLOCK regulates DBP expression by binding to E-box motifs in enhancer regions located in the first and second introns.

dCLK (JRK) and CYC
Among different organisms, there exists a conserved intracellular timing loop in which the E-box remains a vital part in producing circadian rhythms. As with the CLOCK-ARNTL complex in the mammalian circadian clock, binding of the dCLK-CYC complex to the E-box begins transcription of the period (per) and timeless (tim) genes, which produces the Drosophila circadian rhythm. Through the rhythmic binding of the complex, per and tim transcription is regulated, allowing PER and TIM proteins to be produced. These proteins are later phosphorylated by a kinase in the transcription-translation feedback loop and eventually repress the binding of dCLK and CYC to the E-box, shutting the loop down.

MYC (c-Myc, an oncogene)
MYC (c-Myc), a gene that codes for the transcription factor Myc, is important in regulating mammalian cell proliferation and apoptosis.

In 1991, researchers tested whether c-Myc could bind DNA by dimerizing it with E12. Dimers of the resulting chimeric protein, E6, bound an E-box element (GGCCACGTGACC) that is also recognized by other HLH proteins. Expression of E6 suppressed c-Myc function, which showed a link between the two.

In 1996, it was found that Myc heterodimerizes with MAX and that this heterodimeric complex could bind to the CAC(G/A)TG E-box sequence and activate transcription.

In 1998, it was concluded that the function of c-Myc depends upon activating transcription of particular genes through E-box elements.

MYOD1 (MyoD)
MyoD belongs to the Mrf bHLH family, and its main role is in myogenesis, the formation of muscle tissue. Other members of this family include myogenin, Myf5, Myf6, Mist1, and Nex-1.

When MyoD binds to the E-box motif CANNTG, muscle differentiation and expression of muscle-specific proteins are initiated. By deleting various parts of recombinant MyoD, researchers concluded that MyoD uses overlapping elements to bind both the E-box and the tetraplex structures of the promoter sequences of the muscle-specific genes α7 integrin and sarcomeric mitochondrial creatine kinase (sMtCK).

MyoD regulates HB-EGF (heparin-binding EGF-like growth factor), a member of the EGF (epidermal growth factor) family that stimulates cell growth and proliferation. HB-EGF plays a role in the development of hepatocellular carcinoma, prostate cancer, breast cancer, esophageal cancer, and gastric cancer.

MyoD can also bind to noncanonical E-boxes of MyoG and regulate its expression.

MyoG (Myogenin)
MyoG belongs to the MyoD transcription factor family. Binding of MyoG to E-boxes is necessary for neuromuscular synapse formation, and an HDAC-Dach2-myogenin signaling pathway in skeletal muscle gene expression has been identified. Decreased MyoG expression has been observed in patients with muscle wasting.

MyoG and MyoD have also been shown to be involved in myoblast differentiation. They act by transactivating the cathepsin B promoter and inducing its mRNA expression.

TCF3 (E47)
E47 is produced by alternative splicing of E2A, using E47-specific bHLH-encoding exons. Its role is to regulate tissue-specific gene expression and differentiation. Several kinases have been associated with E47, including 3pk and MK2; these two kinases form a complex with E47 and reduce its transcriptional activity. CKII and PKA have also been shown to phosphorylate E47 in vitro.

Like other E-box binding proteins, E47 binds to the CANNTG sequence in the E-box. In homozygous E2A knockout mice, B-cell development stops before the D-J rearrangement stage and the B cells fail to mature. E47 has been shown to bind either as a heterodimer (with E12) or, more weakly, as a homodimer.

Binding to the E-box
Although the structural basis for how BMAL1/CLOCK interacts with the E-box is unknown, recent research has shown that the bHLH domains of BMAL1/CLOCK are highly similar to those of other bHLH-containing proteins, such as Myc/Max, which have been crystallized with E-boxes. It is surmised that specific bases are necessary to support this high-affinity binding. Furthermore, the sequence constraints on the region around the circadian E-box are not fully understood: E-boxes are believed to be necessary but not sufficient, since randomly spaced E-boxes in a genetic sequence do not by themselves produce circadian transcription. Recent research involving the E-box has aimed to find more binding proteins and to discover more mechanisms that inhibit binding.

A study published April 4, 2013, by researchers at Harvard Medical School concluded that the nucleotides on either side of an E-box influence which transcription factors can bind to the E-box itself. These nucleotides determine the 3-D spatial arrangement of the DNA strand and restrict the size of the transcription factors that can bind. The study also found differences in binding patterns between DNA in vivo and in vitro.
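One simple way to look at such flanking effects in sequence data is to tabulate each E-box together with the bases immediately around it. The sketch below is hypothetical: `ebox_contexts` is an illustrative name, the two-base flank width is an arbitrary default, and it only groups sites by sequence context; it does not predict which factor binds.

```python
import re
from collections import Counter


def ebox_contexts(seq: str, flank: int = 2) -> Counter:
    """Count each CANNTG E-box together with `flank` bases on each side.
    Sites too close to either end to have full flanks are skipped."""
    seq = seq.upper()
    contexts = Counter()
    for m in re.finditer(r"(?=(CA[ACGT]{2}TG))", seq):
        s = m.start()
        if s - flank < 0 or s + 6 + flank > len(seq):
            continue
        contexts[(seq[s - flank:s], seq[s:s + 6], seq[s + 6:s + 6 + flank])] += 1
    return contexts


# Two copies of the same CACGTG core in different flanking contexts.
print(ebox_contexts("GGCCACGTGACC" + "TTTCACGTGTTT"))
```

Here both sites share the CACGTG core but are counted as separate entries, ("GC", "CACGTG", "AC") and ("TT", "CACGTG", "TT"), which is the distinction a core-only search would miss.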

Effects of the E-box on Circadian Rhythms
A recent study from Uppsala University in Sweden implicates the AST2-RACK1 complex in inhibiting binding of the BMAL1-CLOCK complex to the E-box. Studying the role of astakine-2 (AST2) in melatonin-induced circadian regulation in crustaceans, the researchers found that AST2 is necessary to inhibit binding between the BMAL1-CLOCK complex and the E-box. They also found that melatonin secretion regulates AST2 expression, and hypothesized that this inhibition of E-box binding affects the clock in any animal that has AST2.

A 2016 study in rats also highlighted the ability of melatonin to shift the phase of the SCN, so that the rats begin activity either earlier (phase advance) or later (phase delay). This effect is controlled by E-box-mediated transcription of the per1 and per2 genes. When melatonin binds a melatonin receptor, a signal transduction pathway activates protein kinase C (PKC), which can induce transcription to advance the clock. When melatonin does not advance the clock, it does not induce transcription. Thus, the E-box plays an important role in how circadian rhythms are altered in the SCN.

Researchers at the Medical School of Nanjing University found that the expression amplitude of FBXL3 (F-box/leucine-rich repeat protein 3) is regulated via an E-box. By studying FBXL3-deficient mice, they found that FBXL3 regulates feedback loops in circadian rhythms by affecting circadian period length.

A 2016 review highlighted that DEC1 generally binds to E-boxes to suppress target genes via histone deacetylase (HDAC). DEC1 mutants lacking the basic region, as well as the basic-region point mutant R65A, have a dominant-negative effect on CLOCK/BMAL1 transactivation. Other researchers have also shown that a DEC1 mutant lacking the basic region does not bind to E-boxes. This mutant was found to suppress metastasis of breast cancer cells, whereas DEC1 overexpression induced metastasis in these cells.

DEC2 also suppresses target genes through E-boxes. However, researchers have shown that the DEC2 basic-region point mutant R57A has little effect on CLOCK/BMAL2 transactivation. Further research is needed to clarify how DEC1 and DEC2 affect circadian rhythms through E-boxes.