User:Renxin.zju/tSMS technology



True Single Molecule Sequencing (tSMS) is a massively parallel DNA sequencing technology. Massively parallel sequencing has revolutionized many fields of biology by allowing the generation of sequence information on an unprecedented scale. Most of the sequence data generated thus far have been obtained with amplification-based sequencing systems, but single-molecule, non-amplification-based approaches are now possible, including at the scale of resequencing whole human genomes.

tSMS is based on the Helicos™ Genetic Analysis System, which consists of multiple components that work together as an integrated system. DNA fragments are hybridized in place on disposable glass flow cells using the Helicos Sample Loader, in which the temperature can be adjusted for optimal hybridization. Each of the 25 channels on a standard flow cell can be addressed individually for addition of sample and for any other needed sample preparation steps. Once the flow cells have been loaded with sample, they are inserted into the HeliScope™ Sequencing System along with all the reagents necessary for sequencing by synthesis and imaging. The Sequencing System then runs for as long as necessary, with images processed in real time by the HeliScope Analysis Engine, which builds sequence reads from the images of each physical location. Once the run is complete, the images have been processed, and strand formation is finished, the data are downloaded to a compute cluster for reference alignment or assembly as needed.

I Shearing
The Helicos Genetic Analysis System is capable of sequencing nucleic acids over a very broad range of template lengths, from several nucleotides to several thousand nucleotides, without the need for size selection in most situations. However, the yield of sequences per unit mass depends on the number of 3’ hydroxyl ends, so relatively short templates are sequenced more efficiently than long ones. If starting with nucleic acids longer than 1000 nt, it is generally advisable to shear them to an average length of 100 to 200 nt so that more sequence information can be generated from the same mass of nucleic acids. For double-stranded DNA, the standard Helicos protocol for shearing employs a Covaris Adaptive Focused Acoustic instrument, which allows good control of fragment size and, if used at the recommended power settings, produces 3’ ends compatible with terminal transferase tailing. Commercially available enzymatic fragmentation approaches, such as the Nextera system (Epicentre) and NEBNext dsDNA Fragmentase (New England Biolabs), are also compatible with standard sample preparation techniques. For some applications, rather than shearing, it is desirable to cleave with restriction endonucleases or other specific cutters. In other cases, as with DNA from most ChIP, FFPE, and ancient or degraded samples, shearing is unnecessary because the starting material is already short enough that further cleavage is not beneficial.
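The dependence of yield on fragment length can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes double-stranded DNA, an approximate average mass of 650 g/mol per base pair, and two 3’ ends per fragment. The function name is hypothetical and is not part of any Helicos software.

```python
# Rough estimate of the 3' ends available in a double-stranded DNA sample.
# Assumption (not from the protocol): ~650 g/mol per base pair of dsDNA.
AVG_MW_PER_BP = 650.0  # g/mol, approximate


def three_prime_ends_pmol(mass_ng: float, mean_length_bp: float) -> float:
    """Return picomoles of 3' ends, counting two 3' ends per dsDNA fragment."""
    if mass_ng < 0 or mean_length_bp <= 0:
        raise ValueError("mass must be non-negative and length must be positive")
    # ng -> g is 1e-9, mol -> pmol is 1e12, so the net factor is 1e3.
    fragments_pmol = mass_ng * 1000.0 / (mean_length_bp * AVG_MW_PER_BP)
    return 2.0 * fragments_pmol


if __name__ == "__main__":
    unsheared = three_prime_ends_pmol(100.0, 1000.0)
    sheared = three_prime_ends_pmol(100.0, 150.0)
    print(f"100 ng at 1000 bp: {unsheared:.2f} pmol of 3' ends")
    print(f"100 ng at  150 bp: {sheared:.2f} pmol of 3' ends")
    print(f"fold increase from shearing: {sheared / unsheared:.1f}")
```

Under these assumptions the number of ends scales inversely with mean fragment length, which is why shearing the same mass of DNA to 100 to 200 nt gives several times more sequenceable molecules.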

II Purification
After shearing, the sample must be size-selected and purified to remove very small fragments; otherwise the sequencing yield will be reduced. This is done with SPRI beads using the Agencourt AMPure XP system.

III Tailing
DNA samples are hybridized to a primer immobilized on a flow cell for sequencing, so it is usually necessary to generate a nucleic acid with an end compatible with hybridization to that surface. The primer sequence attached to the flow cell surface could, in theory, be any sequence that can be synthesized, but, in practice, the standard commercially available flow cell carries oligo(dT)50. To be compatible with the oligo(dT)50 primer on the flow cell surface, it is necessary to generate a poly(dA) tail of at least 50 nt at the 3’ end of the molecule to be sequenced. Because the fill and lock step will fill in excess As but not excess Ts, it is desirable for the A tail to be at least as long as the oligo(dT) on the surface. Generation of a 3’ poly(dA) tail can be accomplished with a variety of different ligases or polymerases. If there is sufficient DNA to measure both mass and average length, it is possible to determine the proper amount of dATP to add to generate poly(dA) tails 90 to 200 nucleotides long. To generate tails of this length, it is first necessary to estimate how many 3’ ends there are in the sample and then to use the right ratio of DNA, dATP, and terminal transferase to obtain the optimal size range of tails.
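The relationship between the number of 3’ ends and the amount of dATP can be sketched as simple stoichiometry. The example below is a deliberately idealized assumption, not the Helicos protocol: it supposes that every dATP molecule is incorporated and that the nucleotides are distributed evenly over all 3’ ends. The function name is hypothetical, and real reactions should follow the ratios given in the protocol in use.

```python
from typing import Tuple


def datp_range_pmol(
    ends_pmol: float, min_tail_nt: int = 90, max_tail_nt: int = 200
) -> Tuple[float, float]:
    """Return the (low, high) pmol of dATP matching the target tail lengths.

    Idealized model: dATP consumed = number of 3' ends x tail length.
    """
    if ends_pmol < 0:
        raise ValueError("ends_pmol must be non-negative")
    if not 0 < min_tail_nt <= max_tail_nt:
        raise ValueError("tail lengths must satisfy 0 < min <= max")
    return ends_pmol * min_tail_nt, ends_pmol * max_tail_nt


if __name__ == "__main__":
    low, high = datp_range_pmol(2.0)  # e.g. a sample with 2 pmol of 3' ends
    print(f"dATP for 90 to 200 nt tails: {low:.0f} to {high:.0f} pmol")
```

The point of the sketch is only that the dATP amount must be scaled to the number of ends rather than to the DNA mass alone; the same mass of shorter fragments needs proportionally more dATP.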

IV Blocking
If the tailed DNA targeted for sequencing were hybridized to the flow cell directly after tailing, it would have a free 3’ hydroxyl that could be extended in the sequencing reaction just like the surface-bound primer, potentially confusing the sequence determination. Thus, prior to sequencing, it is also necessary to block the 3’ ends of the molecules to be sequenced. Any 3’ end treatment that makes the molecule unsuitable for extension can be used. Typically, tailed molecules are blocked using terminal transferase and a dideoxynucleotide, but any treatment that leaves a 3’ phosphate or other modification that prevents extension can be similarly effective.

I Sample loading
tSMS is carried out on a glass flow cell with 25 channels, which can hold the same or different samples. The system can be run with either one or two flow cells at a time. In the standard configuration, each channel is equivalent and holds approximately 8 μl. Samples are generally loaded in a larger volume (usually 20 μl or more) to ensure even hybridization along the length of the flow cell. Samples are inserted into the flow cell via the sample loader included with the overall system. Each channel is individually addressable, and sample is applied using a vacuum. Hybridization to the flow cell is typically carried out at 55 °C for 1 hr.

II Fill and Lock
Generally, samples for sequencing are prepared in such a way that the poly(A) tail is longer than the oligo(dT)50 on the surface of the flow cell. To avoid sequencing the unpaired A residues, a fill and lock treatment is needed. After hybridization, the temperature is lowered to 37 °C, and then dTTP and Virtual Terminator™ nucleotides corresponding to dATP, dCTP, and dGTP are added along with DNA polymerase. Virtual Terminator nucleotides incorporate opposite the complementary base and prevent further incorporation because of the chemical structure appended to the nucleotide. Thus, all of the unpaired dAs present in the poly(A) tail are filled in with dTTP. The hybridized molecule is locked in place when the polymerase encounters the first non-A residue and inserts the appropriate Virtual Terminator nucleotide. Because every DNA molecule should now have a dye attached, an image taken at this point will include all molecules capable of nucleotide incorporation. However, because the label could correspond to any base, no sequence information is obtained at this stage. Thus, for most molecules, sequencing commences with the second base of the original molecule.
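The fill and lock logic described above can be sketched as a small simulation. This is a toy model under stated assumptions, not Helicos software: the template is given as the bases the polymerase meets after the 3’ end of the surface primer (unpaired tail As first, then the insert), incorporation is assumed to be error-free, and the names `fill_and_lock` and `FillLockResult` are hypothetical.

```python
from typing import NamedTuple, Optional

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


class FillLockResult(NamedTuple):
    thymines_filled: int            # unlabeled dTTPs added opposite unpaired As
    lock_nucleotide: Optional[str]  # terminator added at the first non-A base
    first_sequenced_index: int      # template index where sequencing starts


def fill_and_lock(template_ahead: str) -> FillLockResult:
    """Simulate fill and lock on the template bases ahead of the primer."""
    filled = 0
    for base in template_ahead.upper():
        if base == "A":
            filled += 1  # dTTP carries no terminator, so extension continues
            continue
        # The first non-A base receives a terminator nucleotide (A, C, or G),
        # which stops extension and locks the duplex in place.
        return FillLockResult(filled, COMPLEMENT[base], filled + 1)
    return FillLockResult(filled, None, filled)  # only As: nothing to lock


if __name__ == "__main__":
    print(fill_and_lock("AAAAAGTCA"))
```

In this model the lock nucleotide is labeled but uninformative, since all three terminators are present at once; the first informative base is the one at `first_sequenced_index`.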

III Sequencing
To sequence the hybridized DNAs, it is first necessary to cleave off the fluorescent dye and terminator moieties present on the Virtual Terminator nucleotides. The current generation of nucleotides is synthesized with a disulfide linkage that can be rapidly and completely cleaved. Following cleavage, the now-separated fluorescent dyes are washed away, and then new polymerase and a single fluorescent nucleotide are added. After excitation of the fluorescent moiety by the system laser, another image is taken, and, on a standard sequencing run, this cyclic process is repeated 120 times. The number of sequencing cycles is user-adjustable and can be modified to suit the required run time and read length. During a standard run, two 25-channel flow cells are used, with each flow cell alternating between the chemistry cycle and the imaging cycle.
 * Chemistry Cycle


 * Imaging cycle

During the imaging process, four lasers illuminate 1100 fields of view (FOV) per channel, with pictures taken by four CCD (charge-coupled device) cameras via a confocal microscope. Though single molecules are visualized, multiple photon emissions are registered for each molecule, with the time spent at each FOV dependent on the brightness of the dye in the particular nucleotide as well as on camera speed and detection efficiency. At present, the imaging process is the rate-determining step, and run time could be reduced at the expense of throughput by reducing the number of FOV per channel. Similarly, improvements in camera technology or improved dyes could reduce the run time by lowering the amount of time spent on each image.
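The cyclic chemistry described in this section, in which one labeled terminator nucleotide is offered per cycle, can be illustrated with a toy simulation. It is a sketch under assumptions rather than a model of the instrument: incorporation is taken to be perfect, each terminator allows at most one base per cycle, and the nucleotide addition order "CTAG" is an assumed parameter that can be changed. The function name `simulate_read` is hypothetical.

```python
from itertools import cycle, islice

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_read(template_ahead: str, n_cycles: int = 120, order: str = "CTAG") -> str:
    """Return the read built after n_cycles of single-nucleotide addition.

    template_ahead lists template bases in the order the polymerase meets them.
    """
    read = []
    pos = 0
    for nucleotide in islice(cycle(order), n_cycles):
        if pos >= len(template_ahead):
            break  # template fully copied
        if COMPLEMENT[template_ahead[pos]] == nucleotide:
            # The terminator stops extension after one base, so even a
            # homopolymer advances by a single position per matching cycle.
            read.append(nucleotide)
            pos += 1
    return "".join(read)


if __name__ == "__main__":
    print(simulate_read("GATC", n_cycles=4))
    print(simulate_read("CC", n_cycles=8))
```

Because a base is added only in cycles where the offered nucleotide matches, the read length after a fixed number of cycles varies with the template sequence, which is why reads from a 120-cycle run are much shorter than 120 nt.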



Anticipated results
For a standard 120-cycle, 1100-field-of-view run, 12,000,000 to 20,000,000 reads that are 25 nucleotides or longer and align to the reference genome should be expected from each channel, for a total of up to 1,000,000,000 aligned reads and 35 Gb of sequence from each run. For degraded or low-quantity samples, yields will be lower.
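The run totals follow from the per-channel figures by simple multiplication, as the short check below shows. It uses only the numbers quoted above (two flow cells of 25 channels each); the mean read length it derives is an implied figure obtained by dividing the quoted upper bounds, not a measured value, and the function name is hypothetical.

```python
CHANNELS_PER_FLOW_CELL = 25
FLOW_CELLS_PER_RUN = 2


def run_totals(
    reads_per_channel_low: int = 12_000_000,
    reads_per_channel_high: int = 20_000_000,
    total_bases: int = 35_000_000_000,
):
    """Return (low total reads, high total reads, implied mean read length)."""
    channels = CHANNELS_PER_FLOW_CELL * FLOW_CELLS_PER_RUN
    total_low = channels * reads_per_channel_low
    total_high = channels * reads_per_channel_high
    # Implied mean length if the upper read count yields the quoted 35 Gb.
    implied_mean_length = total_bases / total_high
    return total_low, total_high, implied_mean_length


if __name__ == "__main__":
    low, high, mean_len = run_totals()
    print(f"aligned reads per run: {low:,} to {high:,}")
    print(f"implied mean read length: {mean_len:.0f} nt")
```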

The Advantages of True Single Molecule Sequencing
The tSMS technology is a novel approach to DNA sequencing and genetic analysis and offers significant advantages. Helicos offers the first universal genetic analysis platform that does not require amplification. Pursuing a single-molecule sequencing strategy simplifies the DNA sample preparation process, avoids PCR-induced bias and errors, simplifies data analysis, and tolerates degraded samples.

Time Considerations
Starting with full-length genomic DNA, sample preparation to the point of having samples ready for loading can be done in 1 day. All steps involved in flow cell preparation and loading the flow cells and reagents onto the sequencing system take ∼5 hr. A standard, two-flow-cell, 1100 FOV run takes ∼8 days to complete, but shorter runs can be accomplished with a single flow cell or fewer FOVs.
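A rough way to reason about shorter runs is to assume that, because imaging is the rate-determining step, run time scales linearly with the total number of fields of view imaged (flow cells × FOV per channel). That linear model is an assumption made here for illustration; it ignores fixed chemistry and overhead time, so real run times may differ. The function name is hypothetical.

```python
STANDARD_RUN_DAYS = 8.0   # quoted for two flow cells at 1100 FOV per channel
STANDARD_FOV = 1100
STANDARD_FLOW_CELLS = 2


def estimate_run_days(fov_per_channel: int = 1100, flow_cells: int = 2) -> float:
    """Estimate run time assuming it is proportional to total FOV imaged."""
    if fov_per_channel <= 0 or flow_cells not in (1, 2):
        raise ValueError("need a positive FOV count and one or two flow cells")
    scale = (fov_per_channel * flow_cells) / (STANDARD_FOV * STANDARD_FLOW_CELLS)
    return STANDARD_RUN_DAYS * scale


if __name__ == "__main__":
    for fov, cells in [(1100, 2), (550, 2), (1100, 1)]:
        days = estimate_run_days(fov, cells)
        print(f"{cells} flow cell(s), {fov} FOV: about {days:.1f} days")
```

Under the same assumption, throughput falls by the same factor as run time, so halving the FOV count trades roughly half the reads for roughly half the run time.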