Auxiliary metabolic genes

Auxiliary metabolic genes (AMGs) are found in many bacteriophages but originated in bacterial cells. AMGs modulate host cell metabolism during infection so that the phage can replicate more efficiently. For instance, bacteriophages that infect the abundant marine cyanobacteria Synechococcus and Prochlorococcus (cyanophages) carry AMGs that have been acquired from their immediate hosts as well as from more distantly related bacteria. Cyanophage AMGs support a variety of functions, including photosynthesis, carbon metabolism, and nucleic acid synthesis and metabolism. AMGs also have broader ecological impacts beyond their hosts, including an influence on biogeochemical cycling.

Classes
AMGs perform diverse functions, including some in pathways not involved in metabolism, despite what the name suggests. They are categorized into two classes based on their presence in the Kyoto Encyclopedia of Genes and Genomes (KEGG). AMGs do not encompass metabolic genes involved in typical viral functions, such as nucleotide and protein metabolism, since those genes directly achieve viral reproduction rather than augmenting host function to enhance it indirectly.

Class I
Class I AMGs encode components of cellular metabolic pathways and are found in KEGG. In particular, these genes are found in photosynthesis and carbon metabolism. psbA, which encodes the photosystem II reaction center protein D1, is an almost ubiquitous photosynthetic AMG in Synechococcus and Prochlorococcus cyanophages. Genes for the photosynthetic machinery of other reaction centers and of electron transport are also found in many viruses infecting phototrophs. Collectively, phages encode nearly all genes involved in carbon metabolism. In particular, viruses redirect host metabolism to increase dNTP biosynthesis for viral genome replication. For example, glgA can induce starvation by converting glucose-6-phosphate to glycogen, forcing the host to compensate by deriving ribulose-5-phosphate from glyceraldehyde-3-phosphate and fructose-6-phosphate.

Class II
Class II AMGs encode peripheral functions absent from the KEGG metabolic pathways. These include genes typically involved in transport and assembly. Major representatives of this class are involved in balancing TCA cycle intermediates. Additionally, genes for the acquisition of biogenic elements other than carbon, such as phosphate uptake governed by pstS, are prevalent in this class. Because there is no reference database for Class II AMGs, they are identified with lower confidence.
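
The two-class scheme described above amounts to a lookup: an AMG is Class I if its gene is assigned to a KEGG metabolic pathway and Class II otherwise. The following Python sketch illustrates that rule on a small hand-written table. The table, its pathway labels, and the function name classify_amg are illustrative assumptions; they are not KEGG data and not part of any annotation tool.

```python
# Toy lookup: gene name -> metabolic pathways the gene is assigned to.
# Hand-written for illustration only; a real analysis would consult KEGG.
TOY_METABOLIC_PATHWAYS = {
    "psbA": {"Photosynthesis"},
    "glgA": {"Starch and sucrose metabolism"},
    "pstS": set(),  # phosphate uptake: no metabolic pathway in this toy table
}


def classify_amg(gene, pathway_lookup=TOY_METABOLIC_PATHWAYS):
    """Return 'Class I', 'Class II', or None for a gene missing from the lookup."""
    if gene not in pathway_lookup:
        return None  # cannot classify a gene we have no record for
    # Class I: present in at least one metabolic pathway; Class II: in none.
    return "Class I" if pathway_lookup[gene] else "Class II"


for gene in ("psbA", "glgA", "pstS"):
    print(gene, classify_amg(gene))
```

Returning None for an unknown gene, rather than defaulting to Class II, keeps "absent from the metabolic pathways" separate from "not looked up at all".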

Abundance
The retention of AMGs in viral genomes is governed by natural selection and has been refined through co-evolution of viruses with their hosts. As such, AMGs that improve a virus's ability to infect a host and reproduce will be more abundant. AMG abundance is largely dictated by the lifestyle of the virus, the environmental conditions surrounding it, and host characteristics.

Lifestyle
Lytic and lysogenic viruses have different lifestyles, which affects which AMGs they acquire. Lytic viruses tend to use AMGs to repurpose host cell metabolism and steal nutrients when cell density is high. Therefore, AMGs related to metabolism and transport are found more abundantly in lytic viruses. Lytic viruses also carry a more diverse set of AMGs than lysogenic viruses, in part due to their larger host range and higher infection frequency. Temperate viruses, on the other hand, may employ AMGs to improve host fitness and virulence, because they often persist longer in the cell as a prophage. Gene density in these viruses is higher than in their lytic counterparts. Higher rates of horizontal gene transfer (HGT) in lysogenic viruses allow for more AMG transfer but also lower overall gene diversity.

Photosynthesis capacity has also been correlated to AMG diversity. Aphotic viral communities possess greater AMG diversity than those in the photic zone.

Environmental conditions
AMGs for pathways that utilize nutrients scarce in the local environment are generally more abundant in viral genomes. In marine environments, where nutrients are limited relative to sediments and the water is exposed to strong ultraviolet stress, AMGs can confer fitness advantages on both host and virus. Between sunlit and dark ocean waters, AMGs in distinct pathways are unequally distributed, reprogramming host energy production and viral replication according to the available nutrients. In sedimentary environments, carbon and sulfur metabolism AMGs are typically more prevalent, aiding competition with other organisms for the abundant resources.

Host factors
A virus's host range determines which hosts it can acquire AMGs from. Additionally, the abundance of a host around a virus affects how likely the virus is to acquire genes from that host. Virus populations increasingly adopt lytic lifestyles as bacterial production increases. Because of the strong evolutionary connection between viruses and their hosts, AMG acquisition mirrors the host's own adaptation to its environment over time.

Synechococcus and Prochlorococcus are the most abundant picocyanobacteria, accounting for up to 50% of primary production in the marine environment. As such, many AMGs characterized have been discovered in phages of these host systems.

Identification
DRAM-v is the standard for AMG annotation of metagenome-assembled genomes (MAGs) identified as viruses. DRAM-v searches the following databases for AMGs that match the input MAGs: Pfam, KEGG, UniProt, CAZy, MEROPS, VOGDB, and NCBI Viral RefSeq. KEGG can then be referenced to classify annotated AMGs through VIBRANT.

Cellular contamination
Since AMGs originate in hosts, distinguishing host and viral genes is critical for their study. This is not easily achieved, as cultivating virus-host systems in a laboratory setting is challenging, if possible at all. Additionally, cellular sequences cannot be completely filtered out before entry into bioinformatic pipelines: cellular gene transfer agents and membrane vesicles share so many properties with viruses that they cannot be distinguished from them at this step of analysis. The extent to which they have contaminated existing viral databases is unknown. For some genes, such as cyanophage photosynthesis genes, the host and viral versions differ, which eases computational distinction. The most definitive way developed so far to determine gene origin is the identification of taxonomically informative genes colocalized on the same assembled contig. ViromeQC can report contamination for the dataset overall, and DRAM-v assigns a confidence score for the AMG being on a viral MAG. Viral identification is most commonly performed with VIBRANT, VirSorter2, DeepVirFinder, and CheckV.
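
The colocalization idea can be illustrated with a minimal Python sketch. It assumes each gene on a contig has already been labeled "viral", "cellular", or "unknown" by some upstream annotation step, and it treats a candidate AMG as well supported only when taxonomically informative viral genes occur on both sides and no cellular marker lies nearby. The labels, the window size, and the function name flanked_by_viral_genes are hypothetical; this is not the scoring scheme used by DRAM-v or any other tool.

```python
def flanked_by_viral_genes(labels, amg_index, window=3):
    """Check the genes within `window` positions of a candidate AMG.

    labels: list of per-gene labels in contig order, each one of
            "viral", "cellular", or "unknown" (hypothetical scheme).
    Returns True only if a viral gene is found both upstream and
    downstream and no cellular marker gene is inside the window.
    """
    upstream = labels[max(0, amg_index - window):amg_index]
    downstream = labels[amg_index + 1:amg_index + 1 + window]
    has_cellular = "cellular" in upstream or "cellular" in downstream
    return "viral" in upstream and "viral" in downstream and not has_cellular


# Candidate AMG at position 2 in each toy contig.
viral_contig = ["viral", "unknown", "candidate", "viral", "unknown"]
mixed_contig = ["cellular", "cellular", "candidate", "unknown"]
print(flanked_by_viral_genes(viral_contig, 2))  # True
print(flanked_by_viral_genes(mixed_contig, 2))  # False
```

A candidate at the very end of a contig has no neighbors on one side, so this sketch reports it as unsupported rather than guessing.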

Genomic context
AMGs are not randomly distributed throughout genomes. Research is ongoing to determine which genes most commonly surround specific AMGs. Hyperplastic regions, including the region between genes g15 and g18, have been identified as locales where multiple AMGs have been inserted. Possible AMG contexts can be divided into locally collinear blocks (LCBs), which are homologous regions shared by multiple viruses without rearrangements. Individual AMGs have been found in as few as one and as many as 14 LCBs. Those found in more diverse contexts have also appeared at variable locations within the LCB.
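
Counting the number of distinct contexts per AMG is a simple tally once each occurrence has been assigned to an LCB. The Python sketch below assumes the input is a list of (AMG, genome, LCB identifier) records; the record layout, the example identifiers, and the function name lcbs_per_amg are hypothetical and the data are invented for illustration.

```python
from collections import defaultdict


def lcbs_per_amg(occurrences):
    """Count the distinct LCBs in which each AMG is observed.

    occurrences: iterable of (amg, genome, lcb_id) tuples (hypothetical layout).
    """
    contexts = defaultdict(set)
    for amg, _genome, lcb_id in occurrences:
        contexts[amg].add(lcb_id)  # a set ignores repeats of the same LCB
    return {amg: len(lcbs) for amg, lcbs in contexts.items()}


# Invented example records.
records = [
    ("psbA", "phage_A", "LCB1"),
    ("psbA", "phage_B", "LCB1"),
    ("pstS", "phage_A", "LCB2"),
    ("pstS", "phage_C", "LCB7"),
]
print(lcbs_per_amg(records))  # {'psbA': 1, 'pstS': 2}
```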

Acquisition mechanisms
Horizontal gene transfer (HGT) from host to virus allows AMGs to be acquired. Gene transfer from eukaryotic hosts to viruses occurs about twice as frequently as virus-to-host gene transfer, due to a higher number of viral recipients than donors. The vast majority of gene transfer occurs in double-stranded DNA viruses, since they have large and flexible genomes, have co-evolved with eukaryotes, and have wide host breadth. Additionally, unicellular hosts transfer genes more commonly.

Transcriptional regulation
AMGs may influence gene expression by modulating the activity of transcription factors, which control the rate at which specific genes are transcribed into mRNA, thereby impacting the levels of corresponding proteins involved in metabolic pathways.

Enzyme modulation
Certain AMGs encode proteins that directly interact with enzymes involved in metabolic reactions. This interaction can either enhance or inhibit enzyme activity, leading to changes in the rate of metabolic flux through specific pathways.

Signaling pathways
AMGs may be integrated into cellular signaling pathways, influencing the transmission of signals related to energy status, nutrient availability, or stress. By modulating these signaling pathways, AMGs can indirectly regulate metabolic processes.

Biogeochemical cycling
AMGs have a large impact on biogeochemical cycles in multiple environments through nutrient degradation, mineralization, transportation, assimilation, and transformation. By enhancing the metabolic capabilities of their hosts, bacteriophages contribute to the recycling of organic matter, influencing the availability of nutrients for other organisms in the ecosystem. Lytic viruses in particular have been shown to increase ammonium oxidation, nitric oxide reduction, nitrification, and denitrification, balancing nutrient levels in nitrogen-polluted environments. Nutrient-enriched wetlands contain AMGs related to sulfur transport and metabolism. AMG modification of host processes is therefore a means, in addition to the viral shunt, by which viruses can directly impact biogeochemical cycles.

Community structure
The ability of AMGs to modulate the metabolic capacities of their hosts can influence the abundance and distribution of specific microbial taxa. In turn, this shapes the overall composition of microbial communities, with potential cascading effects on higher trophic levels.

Adaptation to environment
AMGs play a crucial role in microbial adaptation to environmental changes. In extreme environments, AMGs can encode components of alternate energy pathways, such as subunits of dissimilatory sulfite reductase. The ability of viruses to confer new metabolic traits on their hosts enhances the resilience of microbial communities facing shifts in temperature, nutrient availability, or other environmental stressors. AMGs can also serve as a genetic pool that shapes the evolution of their hosts.