User:Wuiastate

Bioinformatics log

1. Install Velvet for de novo genome assembly

>make

>make 'MAXKMERLENGTH=71' 'LONGSEQUENCES=1'

./velveth /media/F6861BE3861BA2E3/assemle 41,67,s -short -fastq /media/F6861BE3861BA2E3/Orhan_solexa/VDL3902.fastq

./velvetg output_directory/

./velvetg /media/F6861BE3861BA2E3/assemle_55 -min_contig_lgth 100 -exp_cov auto

Useful parameters for velvetg

Standard options:

-cov_cutoff <floating-point|auto> : removal of low coverage nodes AFTER tour bus or allow the system to infer it (default: no removal)

-read_trkg <yes|no> : tracking of short read positions in assembly (default: no tracking)

-min_contig_lgth <integer> : minimum contig length exported to contigs.fa file (default: hash length * 2)

-amos_file <yes|no> : export assembly to AMOS file (default: no export)

-exp_cov <floating point|auto> : expected coverage of unique regions or allow the system to infer it (default: no long or paired-end read resolution)

Advanced options:

-unused_reads <yes|no> : export unused reads in UnusedReads.fa file (default: no)

Output:

directory/contigs.fa		: fasta file of contigs longer than twice hash length

directory/stats.txt		: stats file (tab-spaced) useful for determining appropriate coverage cutoff

directory/LastGraph		: special formatted file with all the information on the final graph

directory/velvet_asm.afg	: (if requested) AMOS compatible assembly file

Evaluate the number of nodes, N50, maximum node length and total bases for each k-mer length.
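The comparison across k-mer lengths can be scripted. A minimal sketch, not part of Velvet: `contig_stats` is a hypothetical helper that reads a contigs.fa file and prints the contig count, total bases, longest contig and N50. It measures contig sequences in bases rather than graph nodes, so its numbers may differ from the summary velvetg prints.

```shell
# contig_stats FILE
# Hypothetical helper (not shipped with Velvet): summarise a FASTA file of contigs.
contig_stats() {
    # Step 1: emit one sequence length per FASTA record.
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1" |
    # Step 2: longest contig first.
    sort -nr |
    # Step 3: N50 is the length of the contig at which the running sum
    # first reaches half of the total assembly size.
    awk '{ lens[NR] = $1; total += $1 }
         END {
             if (NR == 0) { print "contigs=0 total=0 max=0 n50=0"; exit }
             half = total / 2; run = 0
             for (i = 1; i <= NR; i++) {
                 run += lens[i]
                 if (run >= half) { n50 = lens[i]; break }
             }
             printf "contigs=%d total=%d max=%d n50=%d\n", NR, total, lens[1], n50
         }'
}
```

Usage, assuming one output directory per k-mer named like the assemle_55 directory above: `for d in /media/F6861BE3861BA2E3/assemle_*; do echo "$d"; contig_stats "$d/contigs.fa"; done`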

http://qiime.org/tutorials/processing_illumina_data.html  (processing illumina reads)

1. cat BS0H_NoIndex_L001_R1_001.fastq BS0H_NoIndex_L001_R1_002.fastq >BSOH.fastq

2. Original Illumina FASTQ files: /media/TnSeq_data/SampleBSOH_FQ

3. Check whether the FASTQ file is in standard Sanger format:

~/Documents$ perl fastqFormatDetect.pl /media/TnSeq_data/SampleBSOH_FQ/BS0H_NoIndex_L001_R1_001.fastq

This file is sanger format

4. Run maq

6. Reference location: /media/F6861BE3861BA2E3/reference/CP001876.bfa

7. xiaoleideng@ubuntu:~/Documents/Scripts_Sanger$ maq fasta2bfa /media/F6861BE3861BA2E3/reference/CP001876.fna /media/F6861BE3861BA2E3/reference/CP001876.bfa

-- 1 sequences have been converted.

xiaoleideng@ubuntu:~/Documents/Scripts_Sanger$ maq fasta2bfa /media/F6861BE3861BA2E3/reference/CP001877.fna /media/F6861BE3861BA2E3/reference/CP001877.bfa

-- 1 sequences have been converted.
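The Sanger-format check above relies on fastqFormatDetect.pl, whose source is not shown in this log. As a rough cross-check, the sketch below guesses the quality offset from the lowest quality character in the file. `fastq_offset_guess` is a hypothetical helper, and it assumes plain four-line FASTQ records. Characters below ASCII 59 only occur in Phred+33 (Sanger) files; a minimum of 64 or more suggests a Phred+64 encoding; anything in between is reported as unknown.

```shell
# fastq_offset_guess FILE
# Hypothetical helper: guess the FASTQ quality offset from the smallest
# quality character. Assumes four lines per record (no wrapped sequences).
fastq_offset_guess() {
    awk 'BEGIN {
             # Lookup table from printable character to ASCII code.
             for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
             min = 127
         }
         NR % 4 == 0 {
             # Every fourth line is a quality string.
             n = length($0)
             for (j = 1; j <= n; j++) {
                 c = substr($0, j, 1)
                 if ((c in ord) && ord[c] < min) min = ord[c]
             }
         }
         END {
             if (min < 59) print "phred33"
             else if (min >= 64 && min < 127) print "phred64"
             else print "unknown"
         }' "$1"
}
```

Usage: `fastq_offset_guess /media/TnSeq_data/SampleBSOH_FQ/BS0H_NoIndex_L001_R1_001.fastq` should print phred33 for a Sanger-format file. A file with only high-quality bases can still be misjudged, so treat the answer as a hint.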

8. xiaoleideng@ubuntu:~$ maq fastq2bfq /media/TnSeq_data/SampleBSOH_FQ/BS0H_NoIndex_L001_R1_001.fastq /media/F6861BE3861BA2E3/readsbfq/BS0H_NoIndex_L001_R1_001.bfq

-- finish writing file '/media/F6861BE3861BA2E3/readsbfq/BS0H_NoIndex_L001_R1_001.bfq'

-- 4000000 sequences were loaded.

9. xiaoleideng@ubuntu:~$ maq map /media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_017.map /media/F6861BE3861BA2E3/reference/CP001876.bfa /media/F6861BE3861BA2E3/readsbfq/BS0H_NoIndex_L001_R1_017.bfq

-- maq-0.7.1
[ma_load_reads] loading reads...
[ma_load_reads] set length of the first read as 50.
[ma_load_reads] 894536*2 reads loaded.
[ma_longread2read] encoding reads... 1789072 sequences processed.
[ma_match] set the minimum insert size as 51.
[match_core] Total length of the reference: 1635045 ..........
[maq_indel_pe] the indel detector only works with short-insert mate-pair reads.
[match_data2mapping] 735556 out of 1789072 raw reads are mapped with 0 in pairs.
-- (total, isPE, mapped, paired) = (894536, 0, 735556, 0)


10. xiaoleideng@ubuntu:~$ maq mapmerge /media/F6861BE3861BA2E3/maqmap/BS0H.map \
/media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_001.map \
/media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_002.map \
/media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_003.map \
/media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_004.map \
/media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_005.map \
/media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_006.map \
/media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_007.map \
/media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_008.map \
/media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_009.map \
/media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_010.map \
/media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_011.map \
/media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_012.map \
/media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_013.map \
/media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_014.map \
/media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_015.map \
/media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_016.map \
/media/F6861BE3861BA2E3/maqmap/BS0H_NoIndex_L001_R1_017.map
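Typing the 17 chunk names by hand is error-prone. A minimal sketch of generating them instead: `map_parts` is a hypothetical helper (not part of maq) that prints the zero-padded .map file names for a directory, a file prefix and a chunk count.

```shell
# map_parts DIR PREFIX N
# Hypothetical helper: print DIR/PREFIX_001.map ... DIR/PREFIX_<N>.map,
# one per line, with the chunk number zero-padded to three digits.
map_parts() {
    i=1
    while [ "$i" -le "$3" ]; do
        printf '%s/%s_%03d.map\n' "$1" "$2" "$i"
        i=$((i + 1))
    done
}
```

Usage, relying on the paths containing no spaces so that the shell splits the list into separate arguments: `dir=/media/F6861BE3861BA2E3/maqmap; maq mapmerge "$dir/BS0H.map" $(map_parts "$dir" BS0H_NoIndex_L001_R1 17)`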


11. xiaoleideng@ubuntu:~$ maq mapview -bN /media/F6861BE3861BA2E3/maqmap/BS0H.map >/media/F6861BE3861BA2E3/maqmap/BS0H.aln.txt

12. Change the paths in insertsite.pl

# open and read in the mapview file
my $mapview = "/media/F6861BE3861BA2E3/maqmap/BS0H.aln.txt";
open (MAPVIEW, "<$mapview") or die ("Couldn't open $mapview:$!\n");

# open a file for output to be read into
my $outfile = "/media/F6861BE3861BA2E3/maqmap/BS0H.aln.plot";
open (OUTFILE, ">$outfile") or die ("Couldn't open $outfile:$!\n");

# open a file to store summary
my $summary = "/media/F6861BE3861BA2E3/maqmap/BS0H.aln.summary";
open (SUMMARY, ">$summary") or die ("Couldn't open $summary:$!\n");

my @plot = 0;

# Ty2 length = 4791961; pHCM1 length = 218160
my $seqlen;
if ($mapview =~ m/BS0H.aln.txt/) { $seqlen = 1635045; }
elsif ($mapview =~ m/pHCM1/) { $seqlen = 218160; }
elsif ($mapview =~ m/SL1344/) { $seqlen = 4878012; }

13. xiaoleideng@ubuntu:~/Documents/Scripts_Sanger$ perl insertsite.pl /media/F6861BE3861BA2E3/maqmap/BS0H.aln.txt

Finish. The output is /media/F6861BE3861BA2E3/maqmap/BS0H.aln.plot.
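insertsite.pl itself is not reproduced in this log, so the following is only a rough sanity check on its input, not a replacement for it. The sketch assumes the mapview text is tab-separated with the mapped position in the third column, and it ignores strand. `insertion_counts` is a hypothetical helper that counts reads per position and drops positions outside 1..SEQLEN.

```shell
# insertion_counts FILE SEQLEN
# Hypothetical helper: print "position<TAB>read count" for each position
# at which a read maps. Assumes column 3 of FILE is the mapped position;
# strand (column 4) is ignored.
insertion_counts() {
    awk -F '\t' -v seqlen="$2" '
        $3 >= 1 && $3 <= seqlen { n[$3]++ }
        END { for (p in n) printf "%d\t%d\n", p, n[p] }' "$1" |
    sort -n
}
```

Usage with the reference length used in the script above: `insertion_counts /media/F6861BE3861BA2E3/maqmap/BS0H.aln.txt 1635045 | head`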


14. xiaoleideng@ubuntu:/media/F6861BE3861BA2E3/reference$ dos2unix CP001876.txt

dos2unix: converting file CP001876.txt to Unix format ...


15. Change freqInsertionPerGene.pl

# my $tag = "pHCM1";
my $antn;
if ($tag =~ m/Ty2/i){ $antn = "/media/F6861BE3861BA2E3/reference/CP001876.txt";

xiaoleideng@ubuntu:~/Documents/Scripts_Sanger$ perl freqInsertionPerGene.pl

Please enter a .plot file to be processed: /media/F6861BE3861BA2E3/maqmap/BS0H.aln.plot
Please enter the Salmonella DNA this refers to (Ty2, pHCM1 or SL1344): Ty2
Saved as /media/F6861BE3861BA2E3/maqmap/BS0H.aln.freq_inserts.txt

BWA_Samtools

xiaoleideng@ubuntu:/media/F6861BE3861BA2E3/Orhan_solexa/BWA_samtools$ bwa index -p IA3902genome IA3902genome.fasta

xiaoleideng@ubuntu:/media/F6861BE3861BA2E3/Orhan_solexa/BWA_samtools$ bwa aln IA3902genome /media/F6861BE3861BA2E3/Orhan_solexa/VDL3902.fastq >IA3902genome.bwa

xiaoleideng@ubuntu:/media/F6861BE3861BA2E3/Orhan_solexa/BWA_samtools$ bwa samse IA3902genome IA3902genome.bwa /media/F6861BE3861BA2E3/Orhan_solexa/VDL3902.fastq >VDL3902_IA3902.sam