SEA-PHAGES

SEA-PHAGES stands for Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science; it was formerly called the National Genomics Research Initiative. It was the first initiative launched by the Howard Hughes Medical Institute (HHMI) Science Education Alliance (SEA), under its director Tuajuanda C. Jordan, in 2008 to improve the retention of science, technology, engineering, and mathematics (STEM) students. SEA-PHAGES is a two-semester undergraduate research program administered by Graham Hatfull's group at the University of Pittsburgh and HHMI's Science Education Division. Students at more than 100 universities nationwide conduct authentic individual research that includes both a wet-bench laboratory component and a bioinformatics component.

Curriculum
During the first semester of the program, classes of about 18 to 24 undergraduate students work under the supervision of one or two university faculty members and a graduate student assistant, all of whom have completed two week-long training workshops. Each student isolates and characterizes their own bacteriophage from local soil samples, using a specific bacterial host. Once students have isolated a phage, they can classify it by visualizing it in electron microscope (EM) images. The students also extract and purify the phage DNA, and one sample is sent for sequencing so that a genome is ready for the second semester's curriculum.

The second semester is devoted to annotating the genome the class sent for sequencing. Students work together to evaluate each gene's start and stop coordinates, its ribosome-binding site, and the possible function of the protein it encodes. Once the annotation is complete, it is submitted to GenBank, the DNA sequence database of the National Center for Biotechnology Information (NCBI). If time remains in the semester, or if the class's DNA could not be sequenced, the class can request from the University of Pittsburgh a genome file that has been sequenced but not yet annotated. In addition to the laboratory and bioinformatics skills they acquire, students have the opportunity to publish their work in academic journals and to attend the national SEA-PHAGES conference in Washington, D.C., or a regional symposium.

PhagesDB
All of the details about each student's phage are made public by entering them into the online database PhagesDB, expanding the knowledge of the SEA-PHAGES community as a whole.

Starterator
Starterator generates a report that compares the start sites called for genes in the same Pham across annotated phage genomes and other drafts, which helps students suggest an appropriate start for the auto-annotated genes in their actinobacteriophage genome. It is not usually the primary evidence for calling a gene start, because its suggestion is not always supported by information from other programs, and its start-stop coordinates do not always match those of the gene called by DNA Master.
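The core idea behind a Starterator-style comparison can be illustrated by tallying how many genomes in a Pham support each candidate start. This is a minimal sketch, not Starterator's code: it assumes each genome's start call has already been mapped onto a shared alignment coordinate, and the function name and phage names are hypothetical.

```python
from collections import Counter


def consensus_start(start_calls: dict[str, int]) -> tuple[int, int, int]:
    """Return (most common start, genomes supporting it, total genomes).

    start_calls maps a genome name to the start it called for a gene in a
    shared Pham, expressed as a position in a common alignment (an assumption
    of this sketch). Ties go to the start encountered first.
    """
    if not start_calls:
        raise ValueError("no start calls to compare")
    counts = Counter(start_calls.values())
    start, support = counts.most_common(1)[0]
    return start, support, len(start_calls)


# Hypothetical calls from four annotated genomes and one student draft.
calls = {"PhageA": 12, "PhageB": 12, "PhageC": 30, "MyDraft": 30, "PhageD": 12}
start, support, total = consensus_start(calls)
print(f"start {start}: {support}/{total} genomes")  # start 12: 3/5 genomes
```

A majority count like this is only one line of evidence, which is consistent with the text's point that Starterator is rarely the primary basis for a start call.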

Local BLAST and BLASTp
These tools compare the amino acid sequence of a gene product against other sequenced or annotated phage genomes in the database, helping students in the SEA-PHAGES community predict the starts and functions of their proteins.
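Because BLASTp queries are protein sequences, a gene's nucleotide sequence must first be translated. Annotation tools normally do this automatically; the sketch below only shows the step, assuming the input runs from the called start codon through the stop codon. It uses the standard codon assignments (bacterial table 11 shares them and differs only in which codons may act as starts), and treating only ATG, GTG, and TTG as starts is a simplification. The function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
START_CODONS = {"ATG", "GTG", "TTG"}  # common bacterial starts (simplified)


def translate(gene: str) -> str:
    """Translate a start-to-stop gene into a one-letter protein sequence."""
    gene = gene.upper()
    if len(gene) % 3 != 0:
        raise ValueError("gene length must be a multiple of 3")
    protein = []
    for i in range(0, len(gene), 3):
        codon = gene[i:i + 3]
        if i == 0 and codon in START_CODONS:
            protein.append("M")  # an alternative start codon is read as Met
        else:
            protein.append(CODON_TABLE.get(codon, "X"))  # X for ambiguous bases
    return "".join(protein).rstrip("*")  # drop the terminal stop


print(translate("GTGAAACGTTAA"))  # MKR
```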

GeneMark
GeneMark generates a report showing the coding potential of a genome in each of its six possible reading frames (three on each strand), so that the likelihood that a region is a real gene can be assessed during annotation.
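GeneMark's coding-potential scores come from statistical models of gene sequence, which are beyond a short example. The sketch below only makes the "six reading frames" concrete: it scans three frames on each strand and lists ATG-to-stop open reading frames. Considering only ATG starts and the `min_codons` cutoff are assumptions for illustration, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def six_frame_orfs(genome: str, min_codons: int = 30):
    """Yield (strand, frame, start, end) for ATG-to-stop ORFs in six frames.

    start/end are 0-based, half-open coordinates on the forward strand;
    frame is the offset (0, 1, 2) on the strand that was scanned.
    """
    genome = genome.upper()
    n = len(genome)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    end = i + 3  # include the stop codon
                    if (end - start) // 3 >= min_codons:
                        if strand == "+":
                            yield strand, frame, start, end
                        else:  # map reverse-strand hits back to forward coordinates
                            yield strand, frame, n - end, n - start
                    start = None


print(list(six_frame_orfs("ATGAAATAA", min_codons=3)))  # [('+', 0, 0, 9)]
```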

DNA Master
DNA Master is a free Windows program that uses GLIMMER, GeneMark, Aragorn, and tRNAscan-SE to auto-annotate a genome uploaded as a FASTA-format file. Because the auto-annotation relies on only a few programs, whose bundled versions may be older than the online versions, each suggested gene must be confirmed through student annotation. The annotations go through several rounds of peer review before they are accepted for review by PhagesDB experts, after which they can be submitted to GenBank.
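A FASTA file is plain text: each record starts with a header line beginning with ">", followed by one or more sequence lines. The parser below is a minimal sketch for illustration, not DNA Master's own import code; the function name is hypothetical.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, upper-casing the bases."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks).upper()
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)  # sequence may be wrapped over many lines
    if header is not None:
        records[header] = "".join(chunks).upper()
    return records


print(read_fasta(">MyPhage draft\nATGC\nttaa\n"))  # {'MyPhage draft': 'ATGCTTAA'}
```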

GLIMMER
GLIMMER and GeneMark are used by DNA Master to predict gene starts by assessing the probability of coding in each of the six reading frames and the strength of ribosome binding site (RBS) signals. GLIMMER and GeneMark often agree during auto-annotation, but when they give different starts, the discrepancy must be resolved during manual annotation; GLIMMER is currently the more up-to-date program and is usually used for the final start coordinate.
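To illustrate how an RBS signal can help choose between candidate starts, the sketch below scores the region upstream of each start against a Shine-Dalgarno consensus. GLIMMER itself uses trained statistical models; this swaps in a simple count of matching bases against an assumed consensus `AGGAGG` and a hypothetical 20-nt upstream window. Function names are hypothetical.

```python
SD_CONSENSUS = "AGGAGG"  # assumed Shine-Dalgarno core motif


def best_rbs_match(seq: str, start: int, window: int = 20) -> tuple[int, int]:
    """Return (score, position) of the best consensus match upstream of start.

    score is the number of bases matching SD_CONSENSUS; the motif must end
    before the start codon. Returns (-1, -1) if the window is too short.
    """
    seq = seq.upper()
    k = len(SD_CONSENSUS)
    lo = max(0, start - window)
    best = (-1, -1)
    for pos in range(lo, start - k + 1):
        score = sum(a == b for a, b in zip(seq[pos:pos + k], SD_CONSENSUS))
        if score > best[0]:
            best = (score, pos)
    return best


def rank_starts(seq: str, candidates: list[int], window: int = 20) -> list[int]:
    """Order candidate starts from strongest to weakest upstream RBS match."""
    return sorted(candidates, key=lambda s: best_rbs_match(seq, s, window)[0], reverse=True)


seq = "TTTTAGGAGGTTTTTATG" + "C" * 20 + "ATG"
print(rank_starts(seq, [38, 15]))  # [15, 38]
```

In real annotation, RBS strength is weighed together with coding potential and similarity evidence rather than used alone.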

Aragorn
Aragorn is used by DNA Master, and an online version can be used to cross-check the calls made by the software. It identifies tRNA and tmRNA genes in a genome by searching for sequences that would fold into the characteristic cloverleaf secondary structure. Although it is considered very accurate given how quickly it produces results, it can miss tRNAs that fall outside its search parameters.

tRNAscan-SE
tRNAscan-SE lets students identify possible tRNA genes that Aragorn may have missed, because it includes detection of unusual tRNA homologues, although both programs have sensitivities of 99-100%. Rather than relying on a single search method, tRNAscan-SE combines the results of three independent tRNA prediction programs: tRNAscan, EufindtRNA, and a tRNA covariance model search.
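Sensitivity here is the fraction of genuine tRNA genes that a program detects, TP / (TP + FN); it says nothing about false positives. The counts in the example are hypothetical.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of real tRNA genes detected: TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no real tRNA genes in the test set")
    return true_positives / total


# A program that finds 99 of 100 known tRNA genes.
print(f"{sensitivity(99, 1):.0%}")  # 99%
```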

Phamerator
Phamerator displays the genes of a phage genome, and their similarity to genes in other selected phage genomes, as rectangles colored according to the phamily (Pham) each gene is grouped into. Students can then view, compare, save, and print color-coded genome maps during their annotations. Possible insertions or deletions can be seen in the connecting lines drawn between the selected phage genomes. The nucleotide and protein sequences can also be accessed through the program; however, its starts and stops do not always match those in DNA Master, so these sequences may be incorrect.
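Pham membership can be pictured as single-linkage grouping: genes connected by any chain of sufficiently similar pairs end up in the same Pham. The sketch below shows only that grouping step with a union-find structure. It assumes the similar pairs have already been computed by pairwise protein comparison (the thresholds Phamerator applies are not modeled), and the gene and function names are hypothetical.

```python
def group_into_phams(genes: list[str], similar_pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Single-linkage grouping: genes linked by any chain of pairs share a Pham."""
    parent = {g: g for g in genes}

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    phams: dict[str, set[str]] = {}
    for g in genes:
        phams.setdefault(find(g), set()).add(g)
    return list(phams.values())


genes = ["PhageA_gp1", "PhageB_gp1", "PhageC_gp2", "PhageA_gp2", "PhageB_gp3"]
pairs = [("PhageA_gp1", "PhageB_gp1"), ("PhageB_gp1", "PhageC_gp2"), ("PhageA_gp2", "PhageB_gp3")]
for pham in group_into_phams(genes, pairs):
    print(sorted(pham))
```

Single linkage means PhageA_gp1 and PhageC_gp2 share a Pham here even though they were never compared directly, because both are similar to PhageB_gp1.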

NCBI BLAST and HHpred
These online programs are used to predict protein functions by comparing amino acid or nucleotide sequences against sequences from all sequenced genomes, not just those of phages. HHpred detects homology between a query and proteins whose functions have been determined in any organism. If the protein matches one identified in another sequence, a computer-generated structure may also be provided to visualize how its amino acid chain might fold.