User:Nielsrca

= Past Contributions =

CRISPR Cas9 Genome Engineering
Genome editing with the CRISPR-Cas9 system is carried out with a Type II CRISPR system. When used for genome editing, this system includes Cas9, CRISPR RNA (crRNA), and trans-activating crRNA (tracrRNA), along with an optional DNA repair template. The template is used in Homology Directed Repair (HDR); when no template is supplied, the break is typically repaired by Non-Homologous End Joining (NHEJ).



Major Components of the CRISPR Cas9 System
When using the CRISPR Cas9 system for genome engineering, a plasmid is often created and used to transfect the cells that one wants to edit. The main components of this plasmid are displayed in the image and listed in the table above. The crRNA needs to be designed for each specific application, as this is the sequence that guides Cas9 to its binding site on the cell's DNA; it must therefore be specific and bind only where editing is desired. The repair template also needs to be designed for each application, as it must overlap with the sequences on either side of the cut and code for the insertion sequence.

One or more crRNAs and the tracrRNA can be packaged together to form a single-guide RNA (sgRNA). This sgRNA can be joined with the Cas9 gene in a plasmid that is then transfected into cells (see image for overview).



CRISPR Cas9 Structure
CRISPR Cas9 is a widely used system for genome editing due to its high degree of fidelity and relatively simple construction. Its specificity depends primarily on two factors: the CRISPR target sequence and the Protospacer Adjacent Motif (PAM). The CRISPR target sequence is 20 bases long and is found as a part of each CRISPR locus in the crRNA array. Typically a crRNA array will have multiple unique CRISPR target sequences. Cas9 proteins select the correct location on the host's genome by using the CRISPR target sequence to base pair with the host DNA. The CRISPR target sequence is not part of the Cas9 protein and as a result is customizable and can be independently synthesized. The PAM sequence on the host genome, on the other hand, is recognized by the protein structure of Cas9, which generally cannot be easily modified to recognize a different sequence. However, this is not overly limiting, as the PAM is short and not very specific (e.g., the SpCas9 PAM sequence is 5'-NGG-3', which is found roughly every 8 to 12 base pairs in the human genome).
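The two specificity factors described above can be illustrated with a short Python sketch that scans a DNA string for 20-base candidate target sequences immediately followed by an SpCas9 PAM (5'-NGG-3'). The function name `find_spcas9_targets` and the example sequence are hypothetical, and the sketch only scans the strand it is given; it is a simplified illustration, not a replacement for a dedicated sgRNA design tool.

```python
import re


def find_spcas9_targets(seq, protospacer_len=20):
    """Return (start, target, pam) for every site followed by an NGG PAM.

    Only the strand passed in is scanned (an assumption of this sketch);
    a fuller search would also scan the reverse complement.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical example sequence: a 20-base target followed by the PAM "AGG".
example = "ACGTACGTACGTACGTACGT" + "AGG" + "TTT"
hits = find_spcas9_targets(example)
```

Each hit pairs a customizable 20-base target with the fixed PAM that Cas9 itself recognizes, which mirrors the division of roles described in the paragraph.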

Once these have been assembled into a plasmid and transfected into cells, the Cas9 protein, with the help of the crRNA, finds the correct sequence in the host cell's DNA and, depending on the Cas9 variant, creates a single or double strand break in the DNA. Properly spaced single strand breaks in the host DNA can trigger homology directed repair, which is less error prone than the non-homologous end joining that typically follows a double strand break. Providing a DNA repair template allows for the insertion of a specific DNA sequence at an exact location within the genome. The repair template should extend 40 to 90 base pairs beyond the Cas9-induced DNA break. The goal is for the cell's HDR process to use the provided repair template and thereby incorporate the new sequence into the cell's genome. Once incorporated, this new sequence is part of the cell's genetic material and will be found in its daughter cells.
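As a rough illustration of the repair template layout described above, the following Python sketch assembles a template from the sequence flanking the break plus the desired insertion. The function `build_repair_template`, its parameters, and the toy sequences are hypothetical; the only constraint taken from the text is that the template extends 40 to 90 base pairs beyond the break on each side.

```python
def build_repair_template(genome, cut_index, insert, arm_len=60):
    """Return left homology arm + insert + right homology arm.

    cut_index is the position of the Cas9-induced break in `genome`
    (the break falls between cut_index - 1 and cut_index).
    """
    if not 40 <= arm_len <= 90:
        raise ValueError("arm_len should be between 40 and 90 bases")
    if cut_index - arm_len < 0 or cut_index + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the homology arms")
    left_arm = genome[cut_index - arm_len:cut_index]
    right_arm = genome[cut_index:cut_index + arm_len]
    return left_arm + insert + right_arm


# Hypothetical toy genome with a break between the A-run and the C-run.
toy_genome = "A" * 50 + "C" * 50
template = build_repair_template(toy_genome, cut_index=50, insert="GGG", arm_len=40)
```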

There are many online tools available to aid in designing effective sgRNA sequences for a new CRISPR Cas9 plasmid (e.g., http://tools.genome-engineering.org).

Signed Networks
Protein-protein interactions are not monolithic events; a protein often modifies another protein in such a way that the modified protein is either ‘activated’ or ‘repressed’. While there are other options (proteins can form complexes with one another, physically degrade or modify each other, transport a protein into or out of a compartment, etc.), the main protein-protein interactions that are commonly represented in network diagrams involve activation or repression.

Standard protein-protein interaction networks (directed or undirected) generally only indicate that two proteins interact and do not include details about what type of interaction is occurring. Signed networks are one way of adding more useful information to network diagrams. They are often expressed by labeling each interaction as either positive or negative. A positive interaction is one where the interaction results in one of the proteins being activated. Conversely, a negative interaction results in one of the proteins being inactivated.
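One simple way to hold such a signed network in code is a mapping from directed edges to a sign. The Python sketch below is a minimal illustration; the protein names, the `SIGNED_EDGES` dictionary, and the `regulators` helper are all hypothetical.

```python
# Hypothetical toy network: keys are (source, target) pairs,
# values are +1 (activating) or -1 (inactivating).
SIGNED_EDGES = {
    ("KinaseA", "ProteinB"): +1,
    ("PhosphataseC", "ProteinB"): -1,
    ("ProteinB", "ProteinD"): +1,
}


def regulators(edges, target, sign):
    """Return the sorted sources that act on `target` with the given sign."""
    return sorted(src for (src, dst), s in edges.items() if dst == target and s == sign)


activators_of_b = regulators(SIGNED_EDGES, "ProteinB", +1)
inhibitors_of_b = regulators(SIGNED_EDGES, "ProteinB", -1)
```

An unsigned network would only record that KinaseA and PhosphataseC both interact with ProteinB; the sign is what distinguishes the activator from the inhibitor.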

Protein-protein interaction networks are often constructed from lab experiments such as yeast two-hybrid screens and affinity purification followed by mass spectrometry. However, these methods do not reveal what type of interaction is present, so on their own they cannot be used to attribute signs to the network diagrams.

RNA Interference Screens
RNA Interference (RNAi) screens (repression of individual proteins between transcription and translation) are one method that can be used to assign signs to protein-protein interactions. Individual proteins are repressed and the resulting phenotypes are analyzed. A correlating phenotypic relationship (i.e., where the inhibition of either of two proteins results in the same phenotype) indicates a positive, or activating, relationship. Phenotypes that do not correlate (i.e., where the inhibition of either of two proteins results in two different phenotypes) indicate a negative, or inactivating, relationship. If protein A is dependent on protein B for activation, then the inhibition of either protein A or B will result in the cell losing the service that is provided by protein A, and the phenotypes will be the same for the inhibition of either A or B. If, however, protein A is inactivated by protein B, then the phenotypes will differ depending on which protein is inhibited (inhibit protein B and it can no longer inactivate protein A, leaving A active; inhibit A, however, and there is nothing for B to inactivate, so the phenotype changes). Multiple RNAi screens need to be performed in order to reliably assign a sign to a given protein-protein interaction. Vinayagam et al., who devised this technique, state that a minimum of nine RNAi screens are required, with confidence increasing as one carries out more screens.
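The reasoning above can be sketched in Python as a simple vote over screens. This is a simplified illustration and not the scoring function published by Vinayagam et al.: the encoding of phenotypes as +1, -1, or 0 (no phenotype), the majority vote, and the function name `infer_sign` are all assumptions made for this example. The nine-screen minimum is applied here to screens in which both knockdowns produced a phenotype, which is also an assumption.

```python
def infer_sign(phenotypes_a, phenotypes_b, min_screens=9):
    """Infer an interaction sign from paired RNAi screen phenotypes.

    Each list holds one value per screen: +1 or -1 for the direction of the
    phenotype when that protein is knocked down, 0 for no phenotype.
    Returns "+", "-", or None when the evidence is insufficient or tied.
    """
    if len(phenotypes_a) != len(phenotypes_b):
        raise ValueError("both proteins need a phenotype for every screen")
    # Keep only screens in which both knockdowns produced a phenotype.
    informative = [(a, b) for a, b in zip(phenotypes_a, phenotypes_b) if a != 0 and b != 0]
    if len(informative) < min_screens:
        return None
    agree = sum(1 for a, b in informative if a == b)
    disagree = len(informative) - agree
    if agree > disagree:
        return "+"  # correlated phenotypes: activating relationship
    if disagree > agree:
        return "-"  # anti-correlated phenotypes: inactivating relationship
    return None
```

For example, nine screens in which both knockdowns always give the same phenotype yield "+", while five such screens yield None because the minimum has not been reached.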

Function prediction methods
...

Structure-based methods
...

Computational Solvent Mapping


One of the challenges involved in protein function prediction is discovery of the active site. This is complicated by certain active sites not being formed (in effect, not existing) until the protein undergoes conformational changes brought on by the binding of small molecules. Most protein structures have been determined by X-ray crystallography, which requires a purified protein crystal. As a result, existing structural models are generally of a purified protein and as such lack the conformational changes that occur when the protein interacts with small molecules.

Computational Solvent Mapping utilizes probes (small organic molecules) that are computationally ‘moved’ over the surface of the protein searching for sites where they tend to cluster. Multiple different probes are generally applied with the goal being to obtain a large number of different protein-probe conformations. The generated clusters are then ranked based on the cluster’s average free energy. After computationally mapping multiple probes, the site of the protein where relatively large numbers of clusters form typically corresponds to an active site on the protein.
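The ranking and counting steps described above can be sketched in Python. The sketch assumes the probe placement and clustering have already been done, and that each cluster has been assigned to a labeled surface site; the site labels, probe names, energy values, and the function `consensus_sites` are hypothetical, and real mapping tools work on 3D coordinates rather than labels.

```python
from collections import Counter


def consensus_sites(clusters, top_n=5):
    """Count how many low-energy probe clusters fall on each site.

    clusters: iterable of (probe, site, mean_free_energy) tuples.
    For every probe the clusters are ranked by average free energy
    (lowest first) and the top_n are kept; sites are then ordered by
    how many retained clusters they collected.
    """
    by_probe = {}
    for probe, site, energy in clusters:
        by_probe.setdefault(probe, []).append((energy, site))
    counts = Counter()
    for ranked in by_probe.values():
        for _energy, site in sorted(ranked)[:top_n]:
            counts[site] += 1
    return counts.most_common()


# Hypothetical clusters for three probes.
example_clusters = [
    ("ethanol", "pocket1", -5.0),
    ("ethanol", "surfaceX", -1.0),
    ("isopropanol", "pocket1", -4.5),
    ("isopropanol", "surfaceY", -0.5),
    ("acetone", "pocket1", -3.0),
]
ranking = consensus_sites(example_clusters, top_n=1)
```

In this toy input every probe's lowest-energy cluster lands on the same site, so that site would be the candidate active site.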

This technique is a computational adaptation of ‘wet lab’ work from 1996. It was discovered that ascertaining the structure of a protein while it is suspended in different solvents and then superimposing those structures on one another produces data where the organic solvent molecules (that the proteins were suspended in) typically cluster at the protein’s active site. This work was carried out in response to the realization that water molecules are visible in the electron density maps produced by X-ray crystallography. The water molecules interact with the protein and tend to cluster at the protein's polar regions. This led to the idea of immersing the purified protein crystal in other solvents (e.g., ethanol, isopropanol, etc.) to determine where these molecules cluster on the protein. The solvents can be chosen based on what they approximate, that is, what molecule the protein may interact with (e.g., ethanol can probe for interactions with the amino acid serine, isopropanol for threonine, etc.). It is vital that the protein crystal maintains its tertiary structure in each solvent. This process is repeated for multiple solvents, and the resulting data can then be used to try to determine potential active sites on the protein. Ten years later this technique was developed into an algorithm by Clodfelter et al.

Protein Sequence Analysis: Ensembles


Proteins are often thought of as relatively stable structures that have a set tertiary structure and experience conformational changes as a result of being modified by other proteins or as part of enzymatic activity. However, proteins have varying degrees of stability, and some of the less stable variants are intrinsically disordered proteins. These proteins exist and function in a relatively 'disordered' state, lacking a stable tertiary structure. As a result, they are difficult to describe in a standard protein structure model that was designed for proteins with a fixed tertiary structure. Conformational ensembles have been devised as a way to provide a more accurate and 'dynamic' representation of the conformational state of intrinsically disordered proteins. They attempt to represent the various conformations of an intrinsically disordered protein within an ensemble file (the type found at the Protein Ensemble Database).

Protein ensemble files are a representation of a protein that can be considered to have a flexible structure. Creating these files requires determining which of the various theoretically possible protein conformations actually exist. One approach is to apply computational algorithms to the protein data in order to try and determine the most likely set of conformations for an ensemble file.

There are multiple methods for preparing data for the Protein Ensemble Database, and they fall into two general methodologies: pool and molecular dynamics (MD) approaches (diagrammed in the figure). The pool-based approach uses the protein’s amino acid sequence to create a massive pool of random conformations. This pool is then subjected to further computational processing that calculates a set of theoretical parameters for each conformation based on its structure. Conformational subsets from this pool whose average theoretical parameters closely match known experimental data for the protein are selected.
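The selection step of the pool-based approach can be illustrated with a small Python sketch. It assumes each conformation in the pool has already been reduced to a single theoretical parameter (for instance a predicted size measure) and searches exhaustively for the subset whose average best matches one experimental value. The function `select_ensemble` and the numbers are hypothetical, and exhaustive search is only feasible for toy pools; practical tools use many parameters and heuristic search.

```python
from itertools import combinations


def select_ensemble(pool_params, experimental_value, subset_size):
    """Return indices of the subset whose mean parameter is closest to experiment.

    pool_params: one theoretical parameter value per conformation in the pool.
    """
    if not 0 < subset_size <= len(pool_params):
        raise ValueError("subset_size must be between 1 and the pool size")

    def mismatch(indices):
        mean = sum(pool_params[i] for i in indices) / subset_size
        return abs(mean - experimental_value)

    best = min(combinations(range(len(pool_params)), subset_size), key=mismatch)
    return list(best)


# Hypothetical pool of four conformations and an experimental average of 35.0.
chosen = select_ensemble([10.0, 20.0, 30.0, 40.0], experimental_value=35.0, subset_size=2)
```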

The molecular dynamics approach takes multiple random conformations at a time and tests all of them against experimental data. Here the experimental data serve as constraints placed on the conformations (e.g., known distances between atoms). Only conformations that remain within the limits set by the experimental data are accepted. This approach often applies large amounts of experimental data to the conformations, which is a very computationally demanding task.
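The acceptance rule described above can be sketched in Python as a filter on distance limits. This is a simplification: in an actual MD protocol the limits are usually applied while the simulation runs, whereas this sketch checks finished conformations afterwards. The data layout (a dictionary of atom name to 3D coordinates), the restraint format, and the function names are hypothetical.

```python
import math


def satisfies_restraints(coords, restraints):
    """True if every atom pair is within its experimentally known maximum distance.

    coords: dict mapping atom name -> (x, y, z).
    restraints: iterable of (atom_i, atom_j, max_distance) tuples.
    """
    return all(
        math.dist(coords[i], coords[j]) <= max_distance
        for i, j, max_distance in restraints
    )


def filter_conformations(conformations, restraints):
    """Keep only the conformations that stay within all distance limits."""
    return [c for c in conformations if satisfies_restraints(c, restraints)]


# Hypothetical two-atom conformations and one distance limit of 5.0 units.
compact = {"CA1": (0.0, 0.0, 0.0), "CA2": (3.0, 4.0, 0.0)}    # distance 5.0
extended = {"CA1": (0.0, 0.0, 0.0), "CA2": (6.0, 8.0, 0.0)}   # distance 10.0
accepted = filter_conformations([compact, extended], [("CA1", "CA2", 5.0)])
```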