User:Dlugose/sandbox

This is the sandbox of Dan Dlugose.

Pangenomics studies which genomic features vary and which stay the same among related organisms, and how that variation relates to their adaptation to the environment. More precisely, it is the study of the entire genomic repertoire of a given species, or of a phylogenetic clade when the group under study spans multiple species as defined by systematics. By definition, the gene content of a pangenome is divided into three groups: core genes (shared by all genomes), dispensable genes (present in more than one, but not all, genomes), and strain- (or isolate-) specific genes.
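As a rough illustration of the three-way split, the sketch below sorts gene families into core, dispensable, and strain-specific groups by counting how many genomes contain each one. It assumes the genes have already been clustered into gene families and that each genome is given as a set of family IDs; the strain names, gene names, and the function `classify_pangenome` are hypothetical and chosen only for the example.

```python
from collections import Counter


def classify_pangenome(genomes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Split gene families into core, dispensable and strain-specific sets.

    `genomes` maps a (hypothetical) strain name to the set of gene-family
    IDs found in that strain. Assumes at least two genomes.
    """
    n = len(genomes)
    # Count in how many genomes each gene family occurs.
    counts = Counter(g for genes in genomes.values() for g in genes)
    core = {g for g, c in counts.items() if c == n}             # in every genome
    dispensable = {g for g, c in counts.items() if 1 < c < n}   # in some, not all
    specific = {g for g, c in counts.items() if c == 1}         # in exactly one
    return {"core": core, "dispensable": dispensable, "strain_specific": specific}


# Hypothetical toy data: three strains with made-up gene-family IDs.
example = {
    "strain_A": {"dnaA", "gyrB", "capA", "toxX"},
    "strain_B": {"dnaA", "gyrB", "capA"},
    "strain_C": {"dnaA", "gyrB", "pilY"},
}
result = classify_pangenome(example)
print(result)
```

Here `dnaA` and `gyrB` come out as core, `capA` as dispensable, and `toxX` and `pilY` as strain-specific. Real analyses must first decide which genes count as "the same" family across genomes, a step this sketch simply takes as given.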

Pangenomics relies on mathematical and computational tools to address the challenges of efficiently storing and querying large pangenomic data sets.
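One simple way to picture the storage problem is a gene presence/absence matrix. The minimal sketch below assumes a hypothetical in-memory layout in which each gene family is stored as a single integer bitmask (bit i set when genome i carries the gene), so that "which genomes carry this gene?" and "which genes do these genomes share?" become bitwise operations. The class `PresenceAbsenceMatrix` and its methods are illustrative names, not a standard pangenome format or library API.

```python
class PresenceAbsenceMatrix:
    """Gene presence/absence matrix stored as one bitmask per gene family.

    Bit i of a gene's mask is set when genome i carries that gene. This is a
    hypothetical, illustrative layout, not a standard pangenome file format.
    """

    def __init__(self, genome_names: list[str]):
        self.genome_names = list(genome_names)
        self.index = {name: i for i, name in enumerate(self.genome_names)}
        self.masks: dict[str, int] = {}

    def add(self, genome: str, gene: str) -> None:
        # Set the bit for this genome in the gene's mask.
        self.masks[gene] = self.masks.get(gene, 0) | (1 << self.index[genome])

    def genomes_with(self, gene: str) -> list[str]:
        # Decode the bitmask back into genome names, in input order.
        mask = self.masks.get(gene, 0)
        return [name for name, i in self.index.items() if (mask >> i) & 1]

    def shared_by(self, genomes: list[str]) -> set[str]:
        # Genes whose mask contains every bit of the query subset.
        query = 0
        for name in genomes:
            query |= 1 << self.index[name]
        return {gene for gene, mask in self.masks.items() if (mask & query) == query}


# Hypothetical toy data.
m = PresenceAbsenceMatrix(["strain_A", "strain_B", "strain_C"])
for genome, genes in {
    "strain_A": ["dnaA", "gyrB", "capA", "toxX"],
    "strain_B": ["dnaA", "gyrB", "capA"],
    "strain_C": ["dnaA", "gyrB", "pilY"],
}.items():
    for gene in genes:
        m.add(genome, gene)

print(m.genomes_with("capA"))  # → ['strain_A', 'strain_B']
print(m.shared_by(["strain_A", "strain_B"]))
```

Packing each row into bits keeps the matrix compact and makes subset queries a single AND per gene; large-scale pangenome tools use far more elaborate structures, which this toy layout does not attempt to represent.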

History

The term pan-genome was introduced in 2005 in the article "Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial 'pan-genome'". The article discussed methods for comparing the genomes of continuously varying strains of group B Streptococcus (GBS). The authors wrote: "To fully explore gene variability within the GBS species, we determined the complete genome sequence of the type Ia strain A909 and draft genome sequences (8× sequence coverage) of five additional strains, representing the five major serotypes. Comparative analysis of the six newly sequenced genomes and the two genomes already available in the databases suggests that a bacterial species can be described by its 'pan-genome' (pan, from the Greek word παν, meaning whole), which includes a core genome containing genes present in all strains and a dispensable genome composed of genes absent from one or more strains and genes that are unique to each strain. Surprisingly, unique genes were still detected after eight genomes were sequenced, and mathematical extrapolation predicts that new genes will still be found after sequencing many more strains. Thus, the genomes of multiple, independent isolates are required to understand the global complexity of bacterial species."