Anti-CRISPR

Anti-CRISPR (Anti-Clustered Regularly Interspaced Short Palindromic Repeats, or Acr) is a group of proteins found in phages that inhibit the normal activity of CRISPR-Cas, the immune system of certain bacteria. CRISPR consists of genomic sequences found in prokaryotic organisms that derive from bacteriophages which previously infected the cell; these sequences are used to defend the cell against further viral attacks. Anti-CRISPR proteins arose through evolution in phages and allow them to avoid having their genomes destroyed by the prokaryotic cells they infect.

Before this protein family was discovered, acquiring mutations was the only known way for phages to escape CRISPR-Cas-mediated destruction, by reducing the binding affinity between the phage DNA and the CRISPR machinery. However, bacteria can retarget mutant bacteriophages through a process called "priming adaptation". As far as researchers currently know, therefore, anti-CRISPR is the most effective way for phages to survive throughout the infection of bacteria.

History
Anti-CRISPR systems were first observed in Pseudomonas aeruginosa prophages, which disabled the type I-F CRISPR–Cas system found in some strains of this bacterium. Analysis of the genomic sequences of these phages revealed genes encoding five different anti-CRISPR proteins (Acrs): AcrF1, AcrF2, AcrF3, AcrF4 and AcrF5. None of these proteins disrupted the expression of Cas genes or the assembly of CRISPR molecules, so it was hypothesized that these type I-F inhibitors acted directly on CRISPR–Cas interference.

Further investigation supported this hypothesis with the discovery of four more proteins (AcrE1, AcrE2, AcrE3 and AcrE4), which were shown to impede the type I-E CRISPR–Cas system of Pseudomonas aeruginosa. Furthermore, the genes encoding these type I-E inhibitors were located very close to the type I-F anti-CRISPR genes in the same group of phages, suggesting that both types of proteins work together. However, these first nine proteins shared no common sequence motifs, which made it difficult to identify new anti-CRISPR protein families.

Later, it was found that phages producing these proteins also encode a putative transcriptional regulator, Aca1 (anti-CRISPR-associated 1), whose gene lies very close to the anti-CRISPR genes. This regulator is thought to control anti-CRISPR gene expression during the phage infection cycle, so anti-CRISPR proteins and Aca1 appear to function together as a single mechanism.

Subsequent studies identified a sequence similar to that of Aca1, leading to the discovery of Aca2, a new family of Aca proteins. The genes located next to aca2 revealed five new families of type I-F anti-CRISPR proteins: AcrF6, AcrF7, AcrF8, AcrF9 and AcrF10. These proteins were not restricted to Pseudomonas aeruginosa phages; they were also present in other members of the Pseudomonadota (formerly Proteobacteria).

In 2016, bioinformatic tools led to the discovery of the AcrIIC1, AcrIIC2 and AcrIIC3 protein families in Neisseria meningitidis genomes, where they had been introduced by earlier phage infections. These were the first inhibitors of a type II CRISPR–Cas system to be found; specifically, they inhibit type II-C CRISPR–Cas9 (Cas9 being the nuclease widely used to edit the genomes of human cells). A year later, a study confirmed the presence of type II-A CRISPR–Cas9 inhibitors (AcrIIA1, AcrIIA2, AcrIIA3 and AcrIIA4) in Listeria monocytogenes, introduced by infecting bacteriophages. Two of these proteins, AcrIIA2 and AcrIIA4, were shown to inhibit the type II-A CRISPR–Cas system of Streptococcus pyogenes.

This research has led to the discovery of 21 different anti-CRISPR protein families, although other inhibitors probably exist given the rapid mutation of phages. More research is therefore needed to unravel the complexity of anti-CRISPR systems.

Types
Anti-CRISPR genes can be found in different regions of phage genomes: among the capsid genes, among the tail genes and at the genome ends. Moreover, many MGEs (mobile genetic elements) carry two or even three Acr genes in a single operon, which suggests that these genes may have been exchanged between MGEs.

Like all proteins, Acr proteins are produced by transcription and translation of their genes. Because each anti-CRISPR protein inhibits a specific CRISPR-Cas system, Acr proteins are classified by the type of system they inhibit. Although relatively few anti-CRISPR proteins have been discovered, the families found so far are described below.

So far, genes encoding anti-CRISPR proteins have been found in myophages, siphophages, putative conjugative elements and pathogenicity islands.

Attempts to find genetic features shared by the regions surrounding anti-CRISPR genes have been unsuccessful. Nevertheless, an aca gene is often observed immediately downstream of anti-CRISPR genes.

The first Acr protein families to be discovered were AcrF1, AcrF2, AcrF3, AcrF4 and AcrF5. These inhibitors are mainly found in Pseudomonas phages, which can infect Pseudomonas aeruginosa strains that possess a type I‑F CRISPR–Cas system. In a later study, the AcrE1, AcrE2, AcrE3 and AcrE4 protein families were found to inhibit the type I‑E CRISPR–Cas system of Pseudomonas aeruginosa.

Later on, AcrF6, AcrF7, AcrF8, AcrF9 and AcrF10 protein families, which were also able to inhibit type I‑F CRISPR–Cas, were found to be very common in Pseudomonadota MGEs.

The first inhibitors of a type II CRISPR–Cas system were then discovered: AcrIIC1, AcrIIC2 and AcrIIC3, which block the type II‑C CRISPR–Cas9 activity of Neisseria meningitidis.

Finally, AcrIIA1, AcrIIA2, AcrIIA3 and AcrIIA4 were found. These protein families have the ability to inhibit the type II‑A CRISPR–Cas system of Listeria monocytogenes.

Acr proteins are named as follows: first the type of system inhibited, then a number identifying the protein family, and finally the source of the specific anti-CRISPR protein. For example, AcrF9Vpa is active against the type I-F CRISPR–Cas system, was the ninth anti-CRISPR described for this system, and is encoded in an integrated MGE in a Vibrio parahaemolyticus genome.
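The naming convention can be illustrated with a small parser that splits a name into its parts. This is a hypothetical sketch, not an official tool: the regular expression, the `parse_acr_name` function and the `AcrName` type are invented for this example, and the rule that a name without a roman numeral (such as AcrF9Vpa or AcrE1) refers to a type I system is an assumption based on the early names used in this article.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: "Acr" + optional roman numeral (CRISPR type) + subtype letter
# + family number + optional source abbreviation (capital letter, then lowercase).
ACR_NAME = re.compile(
    r"^Acr(?P<roman>[IVX]*)(?P<subtype>[A-Z])(?P<family>\d+)(?P<source>[A-Z][a-z]+)?$"
)


class AcrName(NamedTuple):
    system: str            # inhibited CRISPR-Cas system, e.g. "I-F" or "II-C"
    family: int            # protein family number within that system
    source: Optional[str]  # organism or element abbreviation, if present


def parse_acr_name(name: str) -> AcrName:
    """Split an Acr name into system, family number and source."""
    match = ACR_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised Acr name: {name!r}")
    # Assumption: early names (AcrF9Vpa, AcrE1) omit the roman numeral and mean type I.
    roman = match["roman"] or "I"
    return AcrName(f"{roman}-{match['subtype']}", int(match["family"]), match["source"])


print(parse_acr_name("AcrF9Vpa"))
# AcrName(system='I-F', family=9, source='Vpa')
```

Names written with an explicit type, such as AcrIIC1 or AcrIF10, go through the same pattern; the source part is optional because most names in this article omit it.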

Structure
As described above, there is a wide range of anti-CRISPR proteins, but few have been studied in depth. One of the best studied and best defined is AcrIIA4, which inhibits Cas9 and thus blocks the type II-A CRISPR-Cas system of Streptococcus pyogenes.

AcrIIA4
[[File:AcrIIA4.jpg|thumb|Structure of AcrIIA4 rendered with the UCSF Chimera software from its PDB file. Colours mark the four types of secondary structure: blue for β-strands, red for α-helices, orange for the 3₁₀ helix and grey for loops. The PDB file contains the 20 lowest-energy (and thus most stable) models superposed, one of which was selected at random for this figure.]]

The structure of AcrIIA4 was solved using nuclear magnetic resonance (NMR); the protein contains 87 residues and has a molecular weight of 10.182 kDa. AcrIIA4 contains:


 * Three antiparallel β-strands (residues 16–19, 29–33 and 40–44) that form a β-sheet. Together they comprise 14 residues, 16.1% of the total.
 * Three α-helices (residues 2–13, 50–59 and 68–85), comprising 40 residues.
 * One 3₁₀ helix between the first (β1) and second (β2) β-strands, spanning residues 22 to 25. Together with the α-helices, the helical regions comprise 44 residues, 50.6% of the protein.
 * Loops joining the different secondary structures.
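The percentages in the list above follow from the residue ranges. A short calculation, using only the ranges given in the list (the variable and function names are illustrative), shows that the α-helices alone account for 40 residues, and that adding the four residues of the 3₁₀ helix gives 44, which is where the 50.6% figure comes from.

```python
# Inclusive residue ranges taken from the AcrIIA4 description above.
TOTAL_RESIDUES = 87
BETA_STRANDS = [(16, 19), (29, 33), (40, 44)]
ALPHA_HELICES = [(2, 13), (50, 59), (68, 85)]
HELIX_3_10 = [(22, 25)]


def count_residues(ranges):
    """Number of residues covered by non-overlapping inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


def percent(n):
    """Share of the whole protein, rounded to one decimal place."""
    return round(100 * n / TOTAL_RESIDUES, 1)


BETA_RESIDUES = count_residues(BETA_STRANDS)                    # 4 + 5 + 5 = 14
ALPHA_RESIDUES = count_residues(ALPHA_HELICES)                  # 12 + 10 + 18 = 40
HELICAL_RESIDUES = ALPHA_RESIDUES + count_residues(HELIX_3_10)  # 40 + 4 = 44

print(percent(BETA_RESIDUES), percent(ALPHA_RESIDUES), percent(HELICAL_RESIDUES))
# 16.1 46.0 50.6
```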

The secondary structures are well defined, with the three α-helices packed against the three β-strands. Notably, a hydrophobic core lies between the β3 strand and the α2 and α3 helices, formed by a cluster of aromatic side chains held together by non-covalent interactions such as pi stacking. Moreover, because AcrIIA4 is an acidic protein, negatively charged residues are concentrated in the loops between β3 and α2 and between α2 and α3, and in the first part of α3. These residues may play an important role in inhibiting Cas9, as their negative charges might mimic the phosphates of nucleic acids.

AcrF1
AcrF1 may not have been studied as extensively as AcrIIA4, but its structure is well described. It inhibits the type I-F CRISPR-Cas system of Pseudomonas aeruginosa, and Maxwell et al. solved its 3D structure using NMR.

The protein contains 78 residues. Its structure consists of two antiparallel α-helices and a β-sheet of four antiparallel β-strands. The β-sheet lies opposite the α-helical part, and together they enclose a hydrophobic core of 13 amino acids. Turns also occur in several parts of the protein, for instance joining the β-strands.

Three surface residues are important for AcrF1 activity: two tyrosines (Y6 and Y20) and a glutamic acid (E31). Replacing Y20 or E31 with alanine (the Y20A and E31A mutations) decreases the activity of the protein 100-fold, and the Y6A mutation decreases it 10⁷-fold.

The combination of structural elements in AcrF1 appears to be unusual: when Maxwell et al. conducted a DALI search for structurally similar proteins, they found no informative similarities.

Avoiding destruction of the phage DNA
The principal function of anti-CRISPR proteins is to interact with specific components of CRISPR-Cas systems, such as the effector nucleases, to prevent the phage DNA from being bound or cleaved, and thus destroyed.

When a phage introduces its DNA into a prokaryotic cell, the cell usually detects a sequence known as the "target", which activates the CRISPR-Cas immune system. However, if an earlier sequence (upstream of the target) encodes Acr proteins, these are produced before the target sequence is recognised. In this way, the CRISPR-Cas system is blocked before it can respond, and the phage escapes destruction.

In type I systems, the process starts with the transcription of the CRISPR locus into crRNAs (CRISPR RNAs). The crRNAs combine with Cas proteins to form a ribonucleoprotein complex called Cascade, which surveys the cell for sequences complementary to the crRNA. When such a sequence is found, the Cas3 nuclease is recruited to Cascade and the phage target DNA is cleaved. The anti-CRISPR proteins AcrF1 and AcrF2, however, interact with Cas7f and Cas8f-Cas5f, respectively, preventing Cascade from binding the phage DNA. AcrF3, in turn, binds Cas3 and thereby prevents cleavage of the target.
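The sequence of steps above can be sketched as a toy decision procedure. This is a deliberately simplified, hypothetical model: the function names and return labels are invented, the target search simply looks for the spacer on either DNA strand, and PAM recognition, Cascade assembly and priming are ignored. It only shows at which step each of AcrF1, AcrF2 and AcrF3 acts.

```python
from __future__ import annotations

# Watson-Crick complement for a DNA string written 5'->3'.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def type_if_interference(spacer: str, phage_dna: str, acrs: set[str]) -> str:
    """Toy outcome of type I-F interference against one phage genome."""
    # 1. Cascade (crRNA + Cas proteins) searches the phage DNA for a match to its
    #    spacer; both strands are checked and PAM requirements are ignored.
    if spacer not in phage_dna and spacer not in reverse_complement(phage_dna):
        return "no_target"
    # 2. AcrF1 (binds Cas7f) or AcrF2 (binds Cas8f-Cas5f) stop Cascade binding the target.
    if acrs & {"AcrF1", "AcrF2"}:
        return "binding_blocked"
    # 3. Cascade recruits Cas3 to cut the target; AcrF3 bound to Cas3 prevents this.
    if "AcrF3" in acrs:
        return "cleavage_blocked"
    return "cleaved"


phage = "ATGCGTACGTTAGC"
print(type_if_interference("CGTACGTT", phage, set()))      # cleaved
print(type_if_interference("CGTACGTT", phage, {"AcrF2"}))  # binding_blocked
```

The order of the checks mirrors the order of the biological steps: an inhibitor of binding makes the Cas3 step irrelevant, which is why AcrF1 and AcrF2 are tested before AcrF3.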

[[File:Phage cooperation against CRISPR immunity.png|thumb|290x290px|Phage-phage cooperation: initial phage infections may be unable to overcome CRISPR immunity, but phage-phage cooperation progressively boosts Acr production and host immunosuppression. This makes the host cell more vulnerable to reinfection and finally allows a second phage to infect successfully and spread. ''Based on a representation found in the 17th reference.'']]

Most Acr genes are located next to anti-CRISPR-associated (aca) genes, which encode proteins with a helix-turn-helix DNA-binding motif. Aca genes are conserved, and researchers use them to identify Acr genes, although the function of the proteins they encode is not fully clear. The Acr-associated promoter drives high levels of Acr transcription immediately after the phage DNA is injected into the bacterium; afterwards, Aca proteins repress this transcription. Without this repression, constant transcription of the gene would be lethal to the phage, so Aca activity is essential for its survival.

Phage-phage cooperation
Bacteria with CRISPR-Cas systems have been shown to remain partially immune to Acr. Consequently, initial phage infections may be abortive and fail to overcome CRISPR immunity, but phage-phage cooperation can progressively boost Acr production and immunosuppression. This may make the host cell more vulnerable to reinfection and finally allow a second phage to infect successfully and spread. Such cooperation creates an epidemiological tipping point: depending on the initial density of Acr-phages and the strength of CRISPR/Acr binding, the phages are either eliminated or trigger a phage epidemic in which the number of bacteriophages is amplified.

If the initial phage density is high enough, the density of immunosuppressed hosts reaches a critical point at which successful infections outnumber unsuccessful ones, and an epidemic begins. If this point is not reached, the phages go extinct and the immunosuppressed hosts return to their initial state.
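The tipping point can be illustrated with a deliberately minimal toy model. This is a hypothetical sketch, not the published epidemiological model: it assumes that the fraction of infections that succeed (because the host was already immunosuppressed by earlier Acr-phage infections) grows with phage density as v / (v + k), and that each successful infection releases `burst` new phages. The parameters `burst`, `k` and `cap` are invented for illustration; `k` loosely stands in for how strongly residual CRISPR immunity resists Acr. The per-round growth factor burst · v / (v + k) exceeds 1 only above the density k / (burst − 1), so starting densities below that value die out and those above it grow.

```python
def next_density(v: float, burst: float = 10.0, k: float = 90.0, cap: float = 1e6) -> float:
    """Phage density after one round of infection in a toy tipping-point model."""
    # Assumption: the fraction of infections that succeed, because the host was
    # already immunosuppressed by earlier Acr-phage infections, rises with density v.
    success = v / (v + k)
    # Each successful infection releases `burst` phages; growth is capped.
    return min(burst * v * success, cap)


def threshold(burst: float = 10.0, k: float = 90.0) -> float:
    """Density at which the per-round growth factor burst * v / (v + k) equals 1."""
    return k / (burst - 1)


def simulate(v0: float, rounds: int = 50, **params: float) -> float:
    """Iterate the model for a fixed number of infection rounds."""
    v = v0
    for _ in range(rounds):
        v = next_density(v, **params)
    return v


print(threshold())           # 10.0
print(simulate(5.0) < 1e-6)  # True: below the threshold the phages go extinct
print(simulate(20.0))        # 1000000.0: above it, an epidemic up to the cap
```

The threshold is an unstable equilibrium: a density exactly at k / (burst − 1) stays put, but any small deviation is amplified in one direction or the other, which is what makes it a tipping point rather than a gradual transition.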

Phage immune evasion
It has become clear that Acr proteins play an important role in phage immune evasion, but it is still unclear how anti-CRISPR proteins can be synthesised quickly enough to overcome the host's CRISPR-Cas system, which can destroy the phage genome within minutes of infection.

Mechanisms
Of all the anti-CRISPR proteins discovered so far, mechanisms have been described for only 15. These mechanisms fall into three types: crRNA loading interference, DNA binding blockage and DNA cleavage prevention.

CrRNA loading interference
The crRNA (CRISPR RNA) loading interference mechanism has mainly been associated with the AcrIIC2 protein family, which blocks Cas9 activity by preventing the correct assembly of the crRNA‐Cas9 complex.

DNA binding blockage
AcrIIC2 is not the only Acr that blocks DNA binding; 11 other Acr families do so as well. Among them, AcrIF1, AcrIF2 and AcrIF10 act on different subunits of the Cascade effector complex of the type I‐F CRISPR‐Cas system, preventing DNA from binding to the complex.

Furthermore, AcrIIC3 prevents DNA binding by promoting the dimerization of Cas9, while AcrIIA2 mimics DNA, blocking the PAM-recognition residues of Cas9 and thereby preventing the recognition and binding of dsDNA (double-stranded DNA).

DNA cleavage prevention
AcrE1, AcrIF3 and AcrIIC1 can prevent target DNA cleavage. X-ray crystallography showed that AcrE1 binds to the CRISPR-associated nuclease Cas3. Likewise, biochemical and structural analysis showed that AcrIF3 binds Cas3 as a dimer, preventing the recruitment of Cas3 to the Cascade complex. Finally, biochemical and structural studies of AcrIIC1 found that it binds the active site of the HNH endonuclease domain of Cas9, which prevents DNA cleavage and traps Cas9 in an inactive but DNA-bound state.
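The three mechanism types can be summarised as a lookup table. The sketch below records only the Acr families named in this section (9 of the 15 with described mechanisms) together with their targets as stated in the text; the table and the `group_by_mechanism` helper are illustrative, not a standard database or API. AcrIIC2, which the text also counts among the DNA-binding blockers, is listed under crRNA loading interference, the mechanism it is mainly associated with.

```python
from __future__ import annotations

from collections import defaultdict

# Acr families and mechanisms named in this section (not an exhaustive catalogue).
# AcrIF1-AcrIF3 correspond to AcrF1-AcrF3 elsewhere in the article.
ACR_MECHANISMS = {
    "AcrIIC2": ("crRNA loading interference", "Cas9 (crRNA-Cas9 assembly)"),
    "AcrIF1": ("DNA binding blockage", "Cascade subunit Cas7f"),
    "AcrIF2": ("DNA binding blockage", "Cascade subunits Cas8f-Cas5f"),
    "AcrIF10": ("DNA binding blockage", "Cascade"),
    "AcrIIC3": ("DNA binding blockage", "Cas9 (promotes dimerization)"),
    "AcrIIA2": ("DNA binding blockage", "Cas9 PAM-recognition residues (DNA mimic)"),
    "AcrE1": ("DNA cleavage prevention", "Cas3"),
    "AcrIF3": ("DNA cleavage prevention", "Cas3 (binds as a dimer)"),
    "AcrIIC1": ("DNA cleavage prevention", "Cas9 HNH domain active site"),
}


def group_by_mechanism(table: dict[str, tuple[str, str]]) -> dict[str, list[str]]:
    """Collect Acr family names under each mechanism, keeping table order."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for acr, (mechanism, _target) in table.items():
        groups[mechanism].append(acr)
    return dict(groups)


for mechanism, members in group_by_mechanism(ACR_MECHANISMS).items():
    print(f"{mechanism}: {', '.join(members)}")
```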

Reducing CRISPR-Cas9 off-target cuts
AcrIIA4 is one of the proteins that inhibit CRISPR-Cas9, the system used to edit the genomes of mammalian cells. Adding AcrIIA4 to human cells prevents Cas9 from engaging its DNA targets, reducing its ability to cut DNA. However, several studies have concluded that adding it in small amounts after genome editing has taken place reduces the number of off-target cuts made by Cas9, making the whole system much more precise.

Avoiding ecological consequences
One of the main aims of CRISPR-Cas9 technology is to eradicate diseases, some of which are spread by vectors such as mosquitoes, for example by means of gene drives. Anti-CRISPR proteins can impede a gene drive, whose uncontrolled spread could have uncertain and potentially catastrophic consequences for ecosystems.

Detect presence of Cas9 in a sample
AcrIIC1 can be used to determine whether a bacterium synthesises Cas9, and therefore uses CRISPR-Cas9, or to detect accidental or unauthorised use of this system. Because AcrIIC1 binds Cas9, it has been used in a centrifugal microfluidic platform designed to detect Cas9 and determine its catalytic activity.

Phage therapy
Antibiotic resistance is a growing public health problem, driven by the misuse of antibiotics. Phage therapy uses phages to infect bacteria; phages are much more specific than antibiotics and cause fewer side effects. Acrs could inhibit the CRISPR-Cas systems of some bacteria, allowing therapeutic phages to infect bacterial cells without being attacked by the bacterial immune system.