Clinical metagenomic sequencing

Clinical metagenomic next-generation sequencing (mNGS) is the comprehensive analysis of microbial and host genetic material (DNA or RNA) in clinical samples from patients by next-generation sequencing. It uses the techniques of metagenomics to identify and characterize the genomes of bacteria, fungi, parasites, and viruses directly from clinical specimens, without prior knowledge of a specific pathogen. The capacity to detect all potential pathogens in a sample makes mNGS a potent tool in the diagnosis of infectious disease, especially when more directed assays, such as PCR, fail. Its limitations relate to clinical utility, laboratory validity, sense and sensitivity, cost and regulatory considerations.

Outside of clinical medicine, similar work is done to identify genetic material in environmental samples, such as ponds or soil.

Laboratory workflow
A typical mNGS workflow consists of the following steps:
 * Sample acquisition: the most commonly used samples for metagenomic sequencing are blood, stool, cerebrospinal fluid (CSF), urine, and nasopharyngeal swabs. Among these, blood and CSF are the cleanest, with the least background noise, while the others are expected to contain large numbers of commensal and/or opportunistic organisms and thus have more background noise. Samples should be collected with great care because specimens can be contaminated during collection and handling; for example, CSF obtained by lumbar puncture may be contaminated during the procedure, and surgical biopsies may be contaminated while being handled.
 * RNA/DNA extraction: the DNA and RNA of the sample are extracted using an extraction kit. Because the pathogen nucleic acid in noisier samples is overwhelmed by the RNA/DNA of other organisms, choosing a kit that extracts only RNA or only DNA is a more specific and convenient approach when there is strong prior suspicion about the genome composition of the pathogen. Commercially available kits include, for example, the RNeasy PowerSoil Total RNA kit (Qiagen), RNeasy Minikit (Qiagen), MagMAX Viral Isolation kit (ABI) and Viral RNA Minikit (Qiagen).
 * Optimization strategies for library preparation: because of high levels of background noise in metagenomic sequencing, several target enrichment procedures have been developed that aim to increase the probability of capturing pathogen-derived transcripts and/or genomes. Generally there are two main approaches that can be used to increase the amount of pathogen signal in a sample: negative selection and positive enrichment.


 * 1) Negative selection (background depletion or subtraction) targets and eliminates the host and microbiome genomic background while aiming to preserve the nucleic acid derived from the pathogens of interest. The genomic background can be degraded through broad-spectrum digestion with nucleases, such as DNase I for DNA background, or by removing abundant RNA species (rRNA, mtRNA, globin mRNA) with sequence-specific RNA depletion kits. CRISPR-Cas9-based approaches can also be used, for example to target and deplete human mitochondrial RNA. In general, however, subtraction approaches lose some of the targeted pathogen genome, because recovery during cleanup can be poor.
 * 2) Positive enrichment is used to increase pathogen signal rather than reducing background noise. This is commonly done through hybridization-based target capture by probes, which are used to pull out nucleic acid of interest for downstream amplification and sequencing. Panviral probes have been shown to successfully identify diverse types of pathogens in different clinical fluid and respiratory samples, and have been used for sequencing and characterization of novel viruses. However, the probe approach includes extra hybridization and cleanup steps, requiring higher sample input, increasing the risk of losing the target, and increasing the cost and hands-on time.
 * High-throughput sequencing: all the nucleic acid fragments of the library are sequenced. The sequencing platform is chosen depending on factors such as the laboratory's research objectives, personal experience and skill levels. So far, the Illumina MiSeq system has been the most commonly used platform for infectious disease research, pathogen surveillance, and pathogen discovery in research and public health. The instrument is compact enough to fit on a laboratory bench, has a fast runtime compared with similar platforms, and has a strong user support community. With further improvements in nanopore technology, additional error reduction and software stabilization, the Oxford Nanopore MinION may become an excellent addition to the arsenal of sequencing technologies for routine surveillance, especially in smaller laboratories with limited resources. For instance, the MinION was successfully used in the ZiBRA project for real-time Zika virus surveillance of mosquitoes and humans in Brazil, and in Guinea for real-time surveillance during the Ebola outbreak. In general, laboratories with limited resources use the Illumina MiSeq or iSeq, the Ion Torrent PGM, or the Oxford Nanopore MinION, while laboratories with substantial resources prefer the Illumina NextSeq or NovaSeq, the PacBio Sequel, or the Oxford Nanopore PromethION. Moreover, for pathogen sequencing, controls are of fundamental importance for ensuring mNGS assay quality and stability over time: PhiX is used as a sequencing control, and the other controls include a positive control, an additional internal control (e.g., spiked DNA or another known pathogen) and a negative control (usually a water sample).
 * Bioinformatic analysis: whereas sequencing itself has become widely accessible and more user friendly, the data analysis and interpretation that follow still require specialized bioinformatics expertise and appropriate computational resources. The raw data from a sequencing platform are usually cleaned, trimmed, and filtered to remove low-quality and duplicate reads. Host genome/transcriptome reads are then removed to decrease background noise (e.g., host and environmental reads) and increase the proportion of pathogen reads; this step also shortens downstream analysis time. Further background noise is removed by mapping sample reads to the reads from the negative control, which eliminates contaminating reads such as those associated with the reagents or the sample storage medium. The remaining reads are usually assembled de novo to produce long stretches of sequence called contigs. The contigs are identified taxonomically by matching them to the genomes and sequences in nucleotide or protein databases; various versions of BLAST are most commonly used for this. Bacterial organisms can also be characterized in more detail when the depth and breadth of coverage are sufficient for genetic characterization. Gene calling can be performed in a variety of ways, including with RAST or with NCBI services at the time of full genome submission. Results of multiple annotation tools can be compared for accuracy and completeness and, if necessary, merged using BEACON. For characterization of antibiotic resistance genes, the Resistance Gene Identifier from the Comprehensive Antibiotic Resistance Database (CARD) is commonly used. To characterize virulence factor genes, ShortBRED offers analyses with a customized database from the Virulence Factor Database.
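
The read-cleaning part of the bioinformatic step can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not a production pipeline: the sequences, the function names (`filter_reads`, `subtract_background`) and the thresholds are all invented for illustration, and exact k-mer sharing stands in for the read aligners that real pipelines use for host and negative-control subtraction.

```python
# Toy sequences invented for this sketch; they are not real genomes.
HOST = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
PATHOGEN = "TTGACCGATCCGGTTAACCGGATATCGCGTATTACGCAA"
REAGENT = "CTCTTCCTCTTTCCTTCTCCCTTTCTCCTTCCTCTTCTC"


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_reads(reads, min_mean_quality=20, min_length=20):
    """Keep reads that are long enough, of good mean quality, and not exact duplicates."""
    seen, kept = set(), []
    for seq, quals in reads:
        if len(seq) < min_length or seq in seen:
            continue
        if sum(quals) / len(quals) < min_mean_quality:
            continue
        seen.add(seq)
        kept.append(seq)
    return kept


def subtract_background(reads, background, k=12, max_shared=0.5):
    """Drop reads that share more than max_shared of their k-mers with background."""
    background_kmers = set()
    for seq in background:
        background_kmers |= kmers(seq, k)
    kept = []
    for seq in reads:
        read_kmers = kmers(seq, k)
        shared = len(read_kmers & background_kmers) / len(read_kmers) if read_kmers else 0.0
        if shared <= max_shared:
            kept.append(seq)
    return kept


good = [35] * 30  # Phred scores of a high-quality 30-base read
raw_reads = [
    (HOST[0:30], good),
    (HOST[5:35], good),
    (PATHOGEN[0:30], good),
    (PATHOGEN[0:30], good),       # exact duplicate
    (PATHOGEN[5:35], [10] * 30),  # low quality
    (PATHOGEN[9:39], good),
    (REAGENT[2:32], good),        # reagent contaminant
]
cleaned = filter_reads(raw_reads)                            # quality and duplicate filter
non_host = subtract_background(cleaned, [HOST])              # host subtraction
candidates = subtract_background(non_host, [REAGENT[0:35]])  # negative-control subtraction
print(len(raw_reads), len(cleaned), len(non_host), len(candidates))
```

The reads left in `candidates` are the ones that would go on to assembly and taxonomic identification. The order of the steps mirrors the workflow described above; the 50% k-mer threshold is an arbitrary choice for the sketch.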

Applications in infectious diseases
One way to detect pathogens is to detect part of their genome by metagenomic next-generation sequencing (mNGS), which can be targeted or untargeted.

Targeted
Targeted analysis enriches for selected genes or genomic regions of the organisms of interest. As a result, the sensitivity for the targeted microorganisms usually increases, but at the cost of limiting the range of identifiable pathogens.

Untargeted
The untargeted analysis is a metagenomic shotgun sequencing approach. All the DNA and/or RNA in the sample is sequenced using universal primers. The resulting mNGS reads can be assembled into partial or complete genomes. These genome sequences make it possible to monitor hospital outbreaks, which facilitates infection control and public health surveillance. They can also be used for subtyping (identifying a specific genetic variant of a microorganism).
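
As a rough illustration of how assembled genomes can support outbreak monitoring, the sketch below counts single-nucleotide differences between aligned consensus sequences and lists the sample pairs that fall under a distance threshold. The sample names, the ten-base sequences and the threshold are hypothetical, and real investigations work with whole-genome alignments and organism-specific cutoffs.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count differing positions between two aligned sequences, ignoring N and gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-"}
    return sum(1 for x, y in zip(a, b) if x != y and x not in skip and y not in skip)


def linked_pairs(genomes, max_snps):
    """Return (sample, sample, distance) for pairs at most max_snps apart."""
    pairs = []
    for s1, s2 in combinations(sorted(genomes), 2):
        distance = snp_distance(genomes[s1], genomes[s2])
        if distance <= max_snps:
            pairs.append((s1, s2, distance))
    return pairs


# Hypothetical aligned consensus sequences from four patients.
genomes = {
    "patient_A": "ACGTACGTAC",
    "patient_B": "ACGTACGTAT",
    "patient_C": "ACGAACTTNC",
    "patient_D": "TCGAACTTGG",
}
print(linked_pairs(genomes, max_snps=1))
```

Positions with an ambiguous base (`N`) are skipped rather than counted as differences, which avoids inflating distances for genomes that were only partially recovered.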

Untargeted mNGS is the most promising approach for analysing clinical samples and providing a comprehensive diagnosis of infections. Various groups have validated mNGS assays in Clinical Laboratory Improvement Amendments (CLIA)-certified laboratories for syndromes such as meningitis or encephalitis, sepsis and pneumonia. This method can be very helpful in settings where no specific infectious etiology is suspected. For example, in patients with suspected pneumonia, identifying the underlying infectious etiology, as in COVID-19, has important clinical and public health implications.

The traditional approach is to formulate a differential diagnosis on the basis of the patient's history, clinical presentation, imaging findings and laboratory testing. mNGS offers a different route to diagnosis: it is promising because a comprehensive spectrum of potential causes (viral, bacterial, fungal and parasitic) can be identified by a single assay.

Below are some examples of the application of metagenomic sequencing to the diagnosis of infectious diseases.

Diagnosis of meningitis and encephalitis
The traditional approach to diagnosing infectious diseases is challenged by neuroinflammatory diseases, by the lack of diagnostic tests for rare pathogens, and by the limited availability and volume of central nervous system (CNS) samples, which require invasive procedures to obtain. Owing to these problems, metagenomic next-generation sequencing (NGS) has been proposed as an alternative, because it can identify a broad range of pathogens in a single test.

Some studies have evaluated the clinical usefulness of metagenomic NGS for diagnosing neurologic infections in parallel with conventional microbiologic testing. The highest diagnostic yield resulted from a combination of metagenomic NGS of CSF and conventional testing, including serologic testing and testing of sample types other than CSF. Neurologic infections remain undiagnosed in a proportion of patients despite conventional testing.

The results of metagenomic NGS can be valuable even when concordant with the results of conventional testing, not only providing reassurance that the conventionally obtained diagnosis is correct but also potentially detecting or ruling out coinfections, especially in immunocompromised patients.

Study of antimicrobial resistance
Antimicrobial resistance is currently detected mainly by antibiotic sensitivity testing (AST), but several studies have shown that bacterial resistance is encoded in the genome and is transferred by horizontal gene transfer (HGT), so sequencing methods are being developed to ease the identification and characterization of those genomes and metagenomes. The following methods are currently available to detect antimicrobial resistance:
 * Antibiotic sensitivity testing: an advantage of this method is that it provides information for patient treatment. A disadvantage is that it is only useful for cultivable bacteria and requires trained personnel.
 * Sequencing methods: advantages of these methods over AST are that they are rapid and sensitive, that they work both on bacteria that grow on artificial media and on those that do not, and that they permit comparative studies across several organisms. The choice of sequencing method depends on the type of sample: assembly-based methods are used for a genomic sample, whereas read-based methods are preferable for a metagenomic sample.
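
The idea behind a read-based method can be sketched in a few lines of Python: unassembled reads are matched directly against a reference resistance gene, and the fraction of the gene they cover (breadth of coverage) indicates whether the gene is likely present. This is a simplified illustration, not how any particular tool works; the reference sequence is invented, and exact k-mer matching replaces the alignment that real read-based tools perform.

```python
def kmer_index(reference, k):
    """Map each k-mer of the reference to the positions where it starts."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def breadth_of_coverage(reference, reads, k=8):
    """Fraction of reference bases covered by at least one exact k-mer match from the reads."""
    index = kmer_index(reference, k)
    covered = [False] * len(reference)
    for read in reads:
        for j in range(len(read) - k + 1):
            for start in index.get(read[j:j + k], ()):
                for pos in range(start, start + k):
                    covered[pos] = True
    return sum(covered) / len(reference)


# Invented 39-base stand-in for a resistance gene; not a real sequence.
toy_gene = "TTGACCGATCCGGTTAACCGGATATCGCGTATTACGCAA"
reads = [toy_gene[0:20], toy_gene[15:39], "CTCTTCCTCTTTCCTTCTCC"]
print(breadth_of_coverage(toy_gene, reads))
```

A caller would typically report the gene only above some breadth threshold, which is a judgment call that depends on read length, sequencing depth and how closely related the reference genes are.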

Metagenomic sequencing methods have provided better results than genomic methods because they yield fewer false negatives. Within metagenomic sequencing, functional metagenomics is a powerful approach for characterizing resistomes: a metagenomic library is generated by cloning the total community DNA extracted from a sample into an expression vector, and this library is assayed for antimicrobial resistance by plating on selective media that are lethal to the wild-type host. The selected inserts from the surviving recombinant, antimicrobial-resistant host cells are then sequenced, and the resulting sequences are assembled and annotated (PARFuMS).

Functional metagenomics has enabled the discovery of several new antimicrobial resistance mechanisms and their related genes; one example is the recently discovered mechanism of tetracycline resistance by tetracycline destructases. It is important to record not only the antimicrobial resistance gene sequence and mechanism but also the genomic context, host bacterial species and geographic location (metagenome).

Pandemic preparedness
Potentially dangerous pathogens such as ebolaviruses and coronaviruses, as well as the closest genetic relatives of unknown pathogens, could be identified immediately, prompting further follow-up. mNGS is anticipated to play a role in future pandemic preparedness and could serve as the earliest surveillance system available to detect outbreaks of unknown etiology and to respond in a timely manner.

Clinical microbiome analyses
The use of mNGS to characterize the microbiome has made possible the development of bacterial probiotics administered as pills, for example as a treatment for Clostridium difficile-associated diseases.

Human host response analyses
Studying gene expression makes it possible to characterize many infections, for example infections due to Staphylococcus aureus, Lyme disease, candidiasis, tuberculosis and influenza. This approach can also be used for cancer classification.

RNA-seq analysis has many other applications, such as identifying novel or underappreciated host–microbial interactions directly from clinical samples, making an indirect diagnosis on the basis of a pathogen-specific human host response, and discriminating infectious from noninfectious causes of acute illness.

Clinical utility
Most of the metagenomics outcomes data generated so far consist of case reports, which belie the increasing interest in diagnostic metagenomics.

Accordingly, this approach has penetrated little into the clinical microbiology laboratory: making a diagnosis with metagenomics is still essentially useful only in the context of case reports, not for true daily diagnostic purposes.

As of 2018, cost-effectiveness modelling of metagenomics in the diagnosis of fever of unknown origin concluded that, even after limiting the cost of diagnostic metagenomics to $100–1000 per test, it would require 2.5–4 times the diagnostic yield of computed tomography of the abdomen and pelvis to be cost neutral, and cautioned against a ‘widespread rush’ to deploy metagenomic testing.

Furthermore, when potential novel infectious agents are discovered, usually only the positive results are published even though the vast majority of sequenced cases are negative, which results in very biased information. Besides, most of the metagenomics-based discovery work that preceded the diagnostic work even mentioned the known agents detected while screening unsolved cases for completely novel causes.

Laboratory validity
To date, most published testing has been run in an unvalidated, unreportable manner. The ‘standard microbiological testing’ that samples are subjected to before metagenomics is variable and has not included reverse transcription-polymerase chain reaction (RT-PCR) testing for common respiratory viruses or, routinely, 16S/ITS PCR testing.

Given the relative costs of validating and performing metagenomic versus 16S/ITS PCR testing, the latter is considered the easier and more efficient option. A potential exception is blood, where the huge amount of 16S sequence available makes clean cutoffs for diagnostic purposes problematic.

Furthermore, almost all of the organisms detected by metagenomics for which there is an associated treatment, and which would thus be truly actionable, are also detectable by 16S/ITS testing (or 16S/ITS-NGS). This calls into question the utility of metagenomics in many diagnostic cases.

A key requirement for laboratory validity is the use of reference standards and controls when performing mNGS assays. They are needed to ensure the quality and stability of the technique over time.

Sense and sensitivity
In clinical microbiology laboratories, quantitation of microbial burden is considered a routine function because it is associated with the severity and progression of disease. Good quantitation requires a highly sensitive technique.

Whereas interfering substances are a common problem in clinical chemistry and PCR diagnostics, the degree of interference from host nucleic acids (for example, in tissue biopsies) or nonpathogen nucleic acids (for example, in stool) in metagenomics is a new twist. In addition, because the human genome is so much larger than microbial genomes, interference can occur at low levels of contaminating material.
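
The effect of genome size can be made concrete with a small calculation. The sketch below assumes that reads are sampled in proportion to the amount of DNA each organism contributes, and it uses round approximate genome sizes (about 6.2 billion base pairs for a diploid human cell, about 5 million for a bacterium); extraction bias, RNA and other real-world effects are ignored, and the cell counts are hypothetical.

```python
def dna_fractions(sources):
    """sources: {name: (copies, genome_size_bp)} -> {name: share of total DNA}."""
    mass = {name: copies * size for name, (copies, size) in sources.items()}
    total = sum(mass.values())
    return {name: m / total for name, m in mass.items()}


# Hypothetical sample with equal numbers of human and bacterial cells.
sample = {
    "human": (1000, 6.2e9),      # approximate diploid human genome
    "bacterium": (1000, 5.0e6),  # approximate bacterial genome
}
fractions = dna_fractions(sample)
print(f"bacterial share of DNA: {fractions['bacterium']:.4%}")
```

Under these assumptions, a bacterium present at one cell per human cell still contributes less than 0.1% of the DNA, which is why even small amounts of host material can dominate a sequencing run.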

Another sensitivity challenge for clinical metagenomics is the diagnosis of coinfections in which high-titer pathogens are present: these can bias results because they may disproportionately soak up reads and make it difficult to distinguish the less abundant pathogens.

In addition to the issues with interfering substances, accurate quantitation and sensitivity are essential in diagnostics in particular, because a confused result can affect a third party, the patient. For this reason, practitioners currently have to be keenly aware of the index-swapping issues associated with Illumina sequencing, which can cause traces of reads to be assigned to incorrectly barcoded samples.
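
One simple way to guard against index swapping is to compare each detection with the read counts for the same organism in the other samples on the run. The sketch below flags a detection for review when its count is no more than a small fraction of the highest count seen elsewhere. The function name, the sample and organism labels, and the 1% default are assumptions made for illustration; an appropriate threshold would have to be established during assay validation.

```python
def flag_possible_crosstalk(counts, swap_rate=0.01):
    """counts: {sample: {organism: reads}}. Return the (sample, organism) pairs to review."""
    flagged = set()
    for sample, organisms in counts.items():
        for organism, reads in organisms.items():
            highest_elsewhere = max(
                (other.get(organism, 0) for name, other in counts.items() if name != sample),
                default=0,
            )
            # A few reads next to a very strong signal elsewhere may be misassigned.
            if 0 < reads <= swap_rate * highest_elsewhere:
                flagged.add((sample, organism))
    return flagged


# Hypothetical per-sample read counts from one sequencing run.
run = {
    "S1": {"virus_X": 500000},
    "S2": {"virus_X": 40, "virus_Y": 1200},
    "S3": {"virus_Y": 3},
}
print(sorted(flag_possible_crosstalk(run)))
```

A flagged detection is not necessarily false; the flag only marks results that deserve confirmation, for example by repeat testing with different indexes.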

Since metagenomics has typically been used on patients for whom every other test to date has been negative, questions surrounding analytical sensitivity have been less germane. However, because ruling out infectious causes is one of the more important roles for clinical metagenomics, it is essential to sequence deeply enough to achieve adequate sensitivity. One way forward could be the development of novel library preparation techniques.
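
How deep is "deep enough" can be estimated with a simple model. The sketch below treats the number of pathogen reads as Poisson distributed with mean equal to the total number of reads times the pathogen's share of the library. This is an idealized assumption: it ignores library complexity, duplicates and contamination, and the function names and example fraction are hypothetical.

```python
import math


def detection_probability(total_reads, pathogen_fraction, min_reads=1):
    """Probability of observing at least min_reads pathogen reads (Poisson approximation)."""
    lam = total_reads * pathogen_fraction
    p_fewer = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(min_reads))
    return 1.0 - p_fewer


def reads_needed(pathogen_fraction, target_probability=0.95):
    """Smallest read count expected to give at least one pathogen read with the target probability."""
    return math.ceil(-math.log(1.0 - target_probability) / pathogen_fraction)


# Hypothetical pathogen making up one read per million in the library.
fraction = 1e-6
depth = reads_needed(fraction)
print(depth, detection_probability(depth, fraction))
```

Under this model, a pathogen at one read per million needs roughly three million reads for a 95% chance of seeing a single read, and considerably more if several reads are required to call a detection. The same relation shows why enrichment or host depletion, which raise the pathogen fraction, can substitute for sequencing depth.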

Cost considerations
As of 2018, the Illumina monopoly on high-quality next-generation sequencing reagents has meant that the sequencing reagents alone cost more than FDA-approved syndromic testing panels. Additional direct costs of metagenomics, such as extraction, library preparation, and computational analysis, also have to be considered. In general, metagenomic sequencing is most useful and cost efficient for pathogen discovery when at least one of the following criteria is met:


 * 1) the identification of the organism is not sufficient (one desires to go beyond discovery to produce data for genomic characterization),
 * 2) a coinfection is suspected,
 * 3) other simpler assays are ineffective or will take an inordinate amount of time,
 * 4) environmental samples are being screened for previously undescribed or divergent pathogens.