User:Shirish25

What is Bowtie?

Bowtie is an ultrafast, memory-efficient short read aligner geared toward quickly aligning large sets of short DNA sequences (reads) to large genomes. It aligns 35-base-pair reads to the human genome at a rate of 25 million reads per hour on a typical workstation. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: for the human genome, the index is typically about 2.2 GB (for unpaired alignment) or 2.9 GB (for paired-end or colorspace alignment). Multiple processors can be used simultaneously to achieve greater alignment speed. Bowtie can also output alignments in the standard SAM format, allowing Bowtie to interoperate with other tools supporting SAM, including the SAMtools consensus, SNP, and indel callers. Bowtie runs on the command line under Windows, Mac OS X, Linux, and Solaris.
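The paragraph above says Bowtie keeps its footprint small by using a Burrows-Wheeler index. The toy Python sketch below shows the core idea: build the BWT of a text, then count exact matches of a pattern by "backward search" over it, without storing the text's positions explicitly. It is an illustration only. It sorts rotations naively (quadratic memory), while Bowtie's real index builder is far more memory-efficient. Alphabet handling and function names here are assumptions.

```python
# Toy Burrows-Wheeler transform plus FM-index backward search (exact matching only).
# Naive rotation sorting is used for clarity; real index builders avoid it.


def bwt(text: str) -> str:
    """Return the BWT of text, which must end with a single '$' sentinel."""
    assert text.endswith("$") and text.count("$") == 1
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def build_fm(last: str):
    """Build C (number of smaller characters) and Occ (prefix counts) tables."""
    alphabet = sorted(set(last))
    c_table, total = {}, 0
    for c in alphabet:
        c_table[c] = total
        total += last.count(c)
    occ = {c: [0] * (len(last) + 1) for c in alphabet}
    for i, ch in enumerate(last):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return c_table, occ


def count_occurrences(pattern: str, last: str) -> int:
    """Count exact occurrences of pattern by walking it right to left."""
    c_table, occ = build_fm(last)
    lo, hi = 0, len(last)  # half-open range of sorted rotations matching so far
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    last = bwt("ACGTACGTTACG$")
    print(count_occurrences("ACG", last))  # 3
```

Each step of the backward search narrows a range of sorted rotations using only the two small tables. That range-narrowing step is also what an aligner can extend with mismatch-tolerant backtracking.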

Bowtie also forms the basis for other tools, including TopHat, a fast splice junction mapper for RNA-seq reads; Cufflinks, a tool for transcriptome assembly and isoform quantitation from RNA-seq reads; Crossbow, a cloud-computing software tool for large-scale resequencing data; and Myrna, a cloud-computing tool for calculating differential gene expression in large RNA-seq datasets.

If you use Bowtie for your published research, please cite the Bowtie paper.

What isn't Bowtie?

Bowtie is not a general-purpose alignment tool like MUMmer, BLAST or Vmatch. Bowtie works best when aligning short reads to large genomes, though it supports arbitrarily small reference sequences (e.g. amplicons) and reads as long as 1024 bases. Bowtie is designed to be extremely fast for sets of short reads where (a) many of the reads have at least one good, valid alignment, (b) many of the reads are relatively high-quality, and (c) the number of alignments reported per read is small (close to 1).

Bowtie does not yet report gapped alignments; this is future work.

Obtaining Bowtie

You may download either Bowtie sources or binaries for your platform from the Download section of the SourceForge project site. Binaries are currently available for Intel architectures (i386 and x86_64) running Linux, Windows, and Mac OS X.

Building from source

Building Bowtie from source requires a GNU-like environment that includes GCC, GNU Make and other basics. It should be possible to build Bowtie on a vanilla Linux or Mac installation. Bowtie can also be built on Windows using Cygwin or MinGW. We recommend TDM's MinGW Build. If using MinGW, you must also have MSYS installed.

To build Bowtie, extract the sources, change to the extracted directory, and run GNU make (usually with the command make, but sometimes with gmake) with no arguments. If building with MinGW, run make from the MSYS command line.

To support the -p (multithreading) option, Bowtie needs the pthreads library on POSIX platforms such as Linux; on Windows it will try to use native Win32 threads. By default, Bowtie uses a spinlocking mechanism for thread synchronization, since spinlocking is generally much faster. If you need to avoid spinlocking, Bowtie can be compiled with the EXTRA_FLAGS=-DNO_SPINLOCK make parameter (e.g. make EXTRA_FLAGS=-DNO_SPINLOCK).

The bowtie aligner

bowtie takes an index and a set of reads as input and outputs a list of alignments. Alignments are selected according to a combination of the -v/-n/-e/-l options (plus the -I/-X/--fr/--rf/--ff options for paired-end alignment), which define which alignments are legal, and the -k/-a/-m/-M/--best/--strata options, which define which and how many legal alignments should be reported.
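The two-stage selection described above (first decide which alignments are legal, then decide which of those to report) can be sketched as below. This is a simplified, hypothetical model, not Bowtie's code. A plain mismatch ceiling stands in for the -v/-n legality options. The -m rule drops a read that has more than m reportable alignments, and the -k rule reports at most k. Ordering by fewest mismatches mimics --best. Without --best, Bowtie does not guarantee that order and samples randomly among valid alignments.

```python
from typing import List, NamedTuple, Optional


class Alignment(NamedTuple):
    ref_offset: int
    mismatches: int


def select_reported(alignments: List[Alignment], max_mismatches: int,
                    k: int = 1, m: Optional[int] = None) -> List[Alignment]:
    """Hypothetical sketch: filter legal alignments, then choose which to report."""
    # Stage 1 ("legal"): a simple mismatch ceiling stands in for -v/-n/-e/-l.
    legal = [a for a in alignments if a.mismatches <= max_mismatches]
    # Stage 2, modeled on -m: suppress the read if it has more than m alignments.
    if m is not None and len(legal) > m:
        return []
    # Modeled on -k: report at most k, fewest mismatches first (as --best would).
    return sorted(legal, key=lambda a: (a.mismatches, a.ref_offset))[:k]
```

For example, with legal hits at 0 and 2 mismatches, `k=2` reports both, best first. Adding `m=1` suppresses the read entirely.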

By default, Bowtie enforces an alignment policy similar to Maq's default quality-aware policy (-n 2 -l 28 -e 70). See the -n alignment mode section of the manual for details about this mode. But Bowtie can also enforce a simpler end-to-end k-difference policy (e.g. with -v 2). See the -v alignment mode section of the manual for details about that mode. The -n alignment mode and the -v alignment mode are mutually exclusive.

Bowtie works best when aligning short reads to large genomes (e.g. human or mouse), though it supports arbitrarily small reference sequences and reads as long as 1024 bases. Bowtie is designed to be very fast for sets of short reads where (a) many reads have at least one good, valid alignment, (b) many reads are relatively high-quality, and (c) the number of alignments reported per read is small (close to 1). These criteria are generally satisfied in the context of modern short-read analyses such as RNA-seq, ChIP-seq, other types of -seq, and mammalian resequencing. You may observe longer running times in other research contexts.

If bowtie is too slow for your application, try some of the performance-tuning hints described in the Performance Tuning section below.

Alignments involving one or more ambiguous reference characters (N, -, R, Y, etc.) are considered invalid by Bowtie. This is true only for ambiguous characters in the reference; alignments involving ambiguous characters in the read are legal, subject to the alignment policy. Ambiguous characters in the read mismatch all other characters. Alignments that "fall off" the reference sequence are not considered valid.
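The validity rules above fit in a small function. In this minimal sketch, an alignment is rejected if it falls off the reference or overlaps an ambiguous reference character. An ambiguous read character simply counts as a mismatch against any base. The exact ambiguous-character set (IUPAC codes plus '-') and the uppercase-only input are assumptions for illustration.

```python
from typing import Optional

# Assumption: IUPAC ambiguity codes plus '-' count as ambiguous reference characters.
AMBIGUOUS_REF = set("NRYKMSWBDHV-")


def mismatches_if_valid(read: str, reference: str, offset: int) -> Optional[int]:
    """Return the mismatch count for read placed at offset, or None if invalid."""
    if offset < 0:
        return None  # falls off the left end of the reference
    window = reference[offset:offset + len(read)]
    if len(window) < len(read):
        return None  # falls off the right end of the reference
    if any(c in AMBIGUOUS_REF for c in window):
        return None  # ambiguous reference character: alignment is invalid
    # Ambiguous read characters (e.g. N) mismatch every reference base.
    return sum(1 for r, g in zip(read, window) if r not in "ACGT" or r != g)
```

The returned count would then be judged by whichever alignment policy (-n or -v) is in force.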

The process by which bowtie chooses an alignment to report is randomized in order to avoid "mapping bias" - the phenomenon whereby an aligner systematically fails to report a particular class of good alignments, causing spurious "holes" in the comparative assembly. Whenever bowtie reports a subset of the valid alignments that exist, it makes an effort to sample them randomly. This randomness flows from a simple seeded pseudo-random number generator and is deterministic in the sense that Bowtie will always produce the same results for the same read when run with the same initial "seed" value (see --seed option).
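The paragraph above describes sampling that is random yet reproducible. The sketch below illustrates that property with Python's seeded `random.Random`: the same global seed and the same read always yield the same subset. Deriving a per-read generator from the seed and the read name is a hypothetical scheme chosen for illustration. Bowtie's own generator and seeding details are not reproduced here.

```python
import random
from typing import List, Sequence


def sample_alignments(valid: Sequence[int], k: int, seed: int,
                      read_name: str) -> List[int]:
    """Pick up to k of a read's valid alignment offsets, reproducibly."""
    if len(valid) <= k:
        return list(valid)  # nothing to choose between: report them all
    # Hypothetical per-read seed: combine the global seed with the read name,
    # so reads get different draws while reruns with the same seed match.
    rng = random.Random(f"{seed}:{read_name}")
    return rng.sample(list(valid), k)
```

Calling it twice with the same `seed` and `read_name` returns the same offsets. Changing either argument generally changes the draw.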

In the default mode, bowtie can exhibit strand bias. Strand bias occurs when input reference and reads are such that (a) some reads align equally well to sites on the forward and reverse strands of the reference, and (b) the number of such sites on one strand is different from the number on the other strand. When this happens for a given read, bowtie effectively chooses one strand or the other with 50% probability, then reports a randomly selected alignment for that read from among the sites on the selected strand. This tends to overassign alignments to the sites on the strand with fewer sites and underassign to sites on the strand with more sites. The effect is mitigated, though it may not be eliminated, when reads are longer or when paired-end reads are used. Running Bowtie in --best mode eliminates strand bias by forcing Bowtie to select one strand or the other with a probability that is proportional to the number of best sites on the strand.
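The arithmetic behind strand bias can be made concrete with exact fractions. Take a read with 1 best site on the forward strand and 3 on the reverse. Under the 50/50 strand choice described above, the lone forward site is reported with probability 1/2 and each reverse site with 1/6. Choosing the strand in proportion to its site count, as --best does, gives every site 1/4. The sketch below computes these values. It assumes both strands have at least one best site, per condition (a).

```python
from fractions import Fraction
from typing import Tuple


def per_site_probability(n_fwd: int, n_rev: int,
                         best: bool) -> Tuple[Fraction, Fraction]:
    """Probability that one given forward site, and one given reverse site,
    is the alignment reported for a read with equally good hits on both strands."""
    if n_fwd < 1 or n_rev < 1:
        raise ValueError("sketch assumes best sites on both strands")
    if best:
        # --best: strand chosen in proportion to its number of best sites.
        p_fwd_strand = Fraction(n_fwd, n_fwd + n_rev)
    else:
        # Default mode: each strand chosen with probability 1/2.
        p_fwd_strand = Fraction(1, 2)
    p_rev_strand = 1 - p_fwd_strand
    # A site is then drawn uniformly from the chosen strand.
    return p_fwd_strand / n_fwd, p_rev_strand / n_rev
```

`per_site_probability(1, 3, best=False)` gives `(Fraction(1, 2), Fraction(1, 6))`, while `best=True` gives `(Fraction(1, 4), Fraction(1, 4))`.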

Gapped alignments are not currently supported, but support is planned for a future release.

The -n alignment mode

When the -n option is specified (which is the default), bowtie determines which alignments are valid according to the following policy, which is similar to Maq's default policy.

1. Alignments may have no more than N mismatches (where N is a number 0-3, set with -n) in the first L bases (where L is a number 5 or greater, set with -l) on the high-quality (left) end of the read. The first L bases are called the "seed".

2. The sum of the Phred quality values at all mismatched positions (not just in the seed) may not exceed E (set with -e). Where qualities are unavailable (e.g. if the reads are from a FASTA file), the Phred quality defaults to 40.

The -n option is mutually exclusive with the -v option.
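The two -n criteria can be written as a single predicate. This minimal sketch takes mismatch positions as 0-based offsets from the high-quality (left) end. Defaults mirror -n 2 -l 28 -e 70, and missing qualities default to 40 as for FASTA input. It uses raw qualities and ignores Maq-style rounding and the search itself. Parameter names are illustrative, not Bowtie's internals.

```python
from typing import List, Optional


def passes_n_policy(mismatch_positions: List[int], quals: Optional[List[int]],
                    read_len: int, max_seed_mms: int = 2, seed_len: int = 28,
                    max_qual_sum: int = 70) -> bool:
    """Sketch of the -n policy (-n -> max_seed_mms, -l -> seed_len, -e -> max_qual_sum)."""
    if not 0 <= max_seed_mms <= 3 or seed_len < 5:
        raise ValueError("-n must be 0-3 and -l must be at least 5")
    if quals is None:
        quals = [40] * read_len  # no qualities (e.g. FASTA): Phred defaults to 40
    # Criterion 1: mismatches inside the seed (first seed_len bases).
    seed_mms = sum(1 for p in mismatch_positions if p < seed_len)
    if seed_mms > max_seed_mms:
        return False
    # Criterion 2: quality sum over all mismatched positions, seed or not.
    return sum(quals[p] for p in mismatch_positions) <= max_qual_sum
```

With default settings and FASTA input, one mismatch costs 40 and passes, while two cost 80 and exceed -e 70.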

If there are many possible alignments satisfying these criteria, Bowtie gives preference to alignments with fewer mismatches and where the sum of the Phred quality values at mismatched positions (criterion 2) is smaller. When the --best option is specified, Bowtie guarantees the reported alignment(s) are "best" in terms of these criteria (criterion 1 has priority), and that the alignments are reported in best-to-worst order. Bowtie is somewhat slower when --best is specified.

Note that Maq internally rounds base qualities to the nearest 10 and rounds qualities greater than 30 to 30. To maintain compatibility, Bowtie does the same. Rounding can be suppressed with the --nomaqround option.
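The rounding rule above can be sketched as a one-line function: round to the nearest 10, then cap at 30. How exact ties (5, 15, 25) are resolved is not stated above, so this sketch assumes they round up.

```python
def maq_round(q: int) -> int:
    """Round a Phred quality to the nearest 10, capped at 30 (ties round up: assumption)."""
    return min(30, ((q + 5) // 10) * 10)


if __name__ == "__main__":
    print([maq_round(q) for q in [2, 17, 26, 38]])  # [0, 20, 30, 30]
```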

Bowtie is not fully sensitive in -n 2 and -n 3 modes by default. In these modes Bowtie imposes a "backtracking limit" to limit effort spent trying to find valid alignments for low-quality reads unlikely to have any. This may cause bowtie to miss some legal 2- and 3-mismatch alignments. The limit is set to a reasonable default (125 without --best, 800 with --best), but the user may decrease or increase the limit using the --maxbts and/or -y options. -y mode is relatively slow but guarantees full sensitivity.

The -v alignment mode

In -v mode, alignments may have no more than V mismatches, where V may be a number from 0 through 3 set using the -v option. Quality values are ignored. The -v option is mutually exclusive with the -n option.

If there are many legal alignments, Bowtie gives preference to alignments with fewer mismatches. When the --best option is specified, Bowtie guarantees the reported alignment(s) are "best" in terms of the number of mismatches, and that the alignments are reported in best-to-worst order. Bowtie is somewhat slower when --best is specified.
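In -v mode, legality depends only on the end-to-end mismatch count; qualities are ignored. This minimal sketch scores candidate reference windows by Hamming distance and keeps those within V. It returns them best-to-worst, which is the order --best guarantees. Representing candidates as an offset-to-window dictionary is an assumption for illustration; Bowtie finds candidates through its index rather than receiving them.

```python
from typing import Dict, List, Tuple


def hamming(read: str, ref_window: str) -> int:
    """Count mismatches in an end-to-end (ungapped) alignment."""
    if len(read) != len(ref_window):
        raise ValueError("end-to-end alignment requires equal lengths")
    return sum(1 for a, b in zip(read, ref_window) if a != b)


def v_mode_best(read: str, candidates: Dict[int, str],
                v: int) -> List[Tuple[int, int]]:
    """Return (offset, mismatches) for legal -v alignments, best first."""
    if not 0 <= v <= 3:
        raise ValueError("-v accepts 0 through 3")
    scored = [(off, hamming(read, win)) for off, win in candidates.items()]
    legal = [(off, mm) for off, mm in scored if mm <= v]
    # Fewest mismatches first; offset breaks ties so the order is stable.
    return sorted(legal, key=lambda t: (t[1], t[0]))
```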

Strata

In the -n alignment mode, an alignment's "stratum" is defined as the number of mismatches in the "seed" region, i.e. the leftmost L bases, where L is set with the -l option. In the -v alignment mode, an alignment's stratum is defined as the total number of mismatches in the entire alignment. Some of Bowtie's options (e.g. --strata and -m) use the notion of "stratum" to limit or expand the scope of reportable alignments.
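The two stratum definitions above differ only in which mismatches are counted. The sketch below computes both. It then keeps only the alignments in the lowest stratum, which is roughly the effect of --strata (in Bowtie, --strata is used together with --best). Mismatch positions are 0-based offsets from the left end. The dictionary-of-alignments input is a hypothetical format.

```python
from typing import Dict, List


def stratum(mismatch_positions: List[int], mode: str, seed_len: int = 28) -> int:
    """-n mode: mismatches within the seed; -v mode: all mismatches."""
    if mode == "n":
        return sum(1 for p in mismatch_positions if p < seed_len)
    if mode == "v":
        return len(mismatch_positions)
    raise ValueError("mode must be 'n' or 'v'")


def best_stratum_only(alignments: Dict[str, List[int]], mode: str,
                      seed_len: int = 28) -> List[str]:
    """Keep only the alignments whose stratum is the lowest one present."""
    strata = {name: stratum(mms, mode, seed_len) for name, mms in alignments.items()}
    if not strata:
        return []
    best = min(strata.values())
    return sorted(name for name, s in strata.items() if s == best)
```

With `{"a": [3, 40], "b": [40, 45], "c": [1, 2]}`, -n mode (seed length 28) places "b" alone in stratum 0. In -v mode, all three fall into stratum 2.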