Zoonotic origins of COVID-19

SARS-CoV-2, the causative agent of COVID-19, was first introduced to humans through zoonosis (transmission of a pathogen to a human from an animal), and a zoonotic spillover event is the origin of COVID-19 that is considered most plausible by the scientific community. Human coronaviruses, including SARS-CoV-2, are zoonotic pathogens that are often acquired through spillover infection from animals.

Background
Previous emergence of SARS-CoV-1 and MERS-CoV showed that Betacoronaviruses represent a risk for emergence of diseases threatening to humans. Increased awareness due to the 2002–2004 SARS outbreak motivated research into the potential for other coronavirus outbreaks and the animal reservoirs which could lead to them. Bats, in particular, are known to harbor persistent populations of coronaviruses, and under conditions of persistent infection, coronaviruses tend to accumulate mutations that allow their receptor binding domains to interact with cross-species orthologs of the target receptors. Based on serological and molecular studies, Chinese horseshoe bats were identified as the most likely reservoir for SARS-CoV-1. Bats were also a likely reservoir for the related betacoronavirus MERS-CoV, though the evidence for this is less conclusive than the role of camels as a reservoir for MERS. By 2010, in vitro experiments had confirmed that modifications to the spike protein receptor binding domain could enable human infection by several SARS-related coronaviruses. Virologists Rachel Graham and Ralph S. Baric at that time wrote, "that the question of emergence of another pathogenic human coronavirus from bat reservoirs might be more appropriately expressed as 'when' than as 'if'."

Origin in bats
The likely origin of SARS-CoV-2 from bats aligns with the origins of other coronaviruses in its genus Betacoronavirus, subgenus Sarbecovirus. Several bat species have special cellular mechanisms to resist proinflammatory cytokines associated with Betacoronavirus virulence. Spike proteins in SARS-related coronaviruses also coevolve with bat ACE2 receptors in an evolutionary arms race. Bats, along with their viruses, have large overlapping geographic ranges in Southeast Asia, and there is a particularly great concentration and diversity of bat-related coronaviruses in Southern and Southwest China. The most similar known viruses to SARS-CoV-2 include bat coronaviruses RpYN06, with 94.5% identity, and RmYN02, with 93% identity. RaTG13 was not the direct progenitor of SARS-CoV-2. Temmam et al. found no serological evidence for exposure to BANAL-52 among bat handlers and guano collectors in the area of Laos where it was sampled. Lytras et al. wrote that "SARS-CoV-2 can be unambiguously traced to horseshoe bats". They estimated that SARS-CoV-2's most recent common ancestors with RmYN02 and RaTG13 date to 40 and 50 years ago, respectively.
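The identity percentages quoted above are pairwise comparisons of aligned sequences. As a minimal sketch of how such a figure can be computed, the following function counts matching positions between two pre-aligned sequences. The function name, the choice to skip gapped columns, and the toy sequences are illustrative assumptions; the cited studies may have used different alignment tools and gap conventions.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, not real viral sequence data.
toy_a = "ATGTTTGTTTTTCTTGTT"
toy_b = "ATGTTTATTTTC-TTGTT"
print(round(percent_identity(toy_a, toy_b), 1))
```

A whole-genome figure can hide regional differences, so the same function is often applied to individual genes or sliding windows, which is one way a virus can be close to SARS-CoV-2 overall yet distant in the spike protein.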

Novel features of SARS-CoV-2
The SARS-CoV-2 spike protein has an insertion of amino acids between its S1 and S2 subunits. Among sarbecoviruses, only SARS-CoV-2 and RmYN02 have such an insertion, suggesting differences in reservoir species, intermediate hosts, or evolutionary pathway. The receptor binding motif is the portion of SARS-CoV-2 that is most diverged from RaTG13. The SARS-CoV-2 receptor binding domain is more similar to those of pangolin coronaviruses. Viruses including BANAL-52 isolated from bats in Laos showed high similarity to the SARS-CoV-2 receptor binding domain in amino acid residues but less than 76.4% nucleotide identity across the spike protein. The observed binding of N-acetylneuraminic acid by the N-terminal domain (NTD) of the spike protein, and the loss of that binding through mutation of the corresponding sugar binding pocket in emergent variants of concern, have suggested a potential role for transient sugar binding in the zoonosis of SARS-CoV-2, consistent with prior evolutionary proposals.

Within genus Betacoronavirus, furin cleavage sites are common in subgenera Merbecovirus and Embecovirus. Furin cleavage sites have independently evolved six times in different clades of Betacoronavirus. Furin cleavage sites also evolved in Alphacoronaviruses and Gammacoronaviruses independently of Betacoronavirus.

Furin cleavage contributes strongly to the transmissibility and pathogenicity of SARS-CoV-2 in humans. SARS-CoV-2 variants lacking the furin cleavage site are transmissible between humans, but much less effectively. The furin cleavage site is sometimes described as "polybasic" on account of its particular motif of basic amino acids. The SARS-CoV-2 furin cleavage site shares amino acid identity with a furin cleavage site of the human ENaC α subunit. The human ENaC site is identical only to that of a few great apes and Pipistrellus kuhlii.
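To illustrate what "polybasic" means in practice, the sketch below scans a protein fragment for the minimal furin recognition pattern R-X-X-R (arginine, any two residues, arginine). The function name is hypothetical, the first fragment is the widely reported sequence context around the SARS-CoV-2 S1/S2 junction, and the second fragment is a made-up comparison with the four inserted residues removed. A pattern match only flags a candidate site; it does not show that furin actually cleaves there.

```python
import re

# Minimal furin recognition pattern: R-X-X-R, with cleavage after the last R.
# The lookahead allows overlapping candidate sites to be reported.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def find_polybasic_sites(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position within the fragment, matched residues)."""
    return [
        (match.start() + 1, match.group(1))
        for match in FURIN_MINIMAL.finditer(protein.upper())
    ]


# Fragment around the S1/S2 junction as commonly reported for SARS-CoV-2.
fragment_with_site = "TNSPRRARSVAS"
# Hypothetical fragment lacking the inserted residues (illustration only).
fragment_without_site = "TNSRSVAS"

print(find_polybasic_sites(fragment_with_site))
print(find_polybasic_sites(fragment_without_site))
```

Positions are counted within the supplied fragment, not within the full spike protein.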

SARS-CoV-2 is also distinct among human coronaviruses for having a single intact ORF8 gene rather than "a" and "b" subunits.

Processes of host adaptation
SARS-CoV-2 has an expanded host range compared to SARS-CoV-1 and MERS-CoV. SARS-CoV-2, SARS-CoV-1, and MERS-CoV are generalist viruses, not specifically adapted to humans, meaning they have the potential to spill over to many species and establish new natural reservoirs after adaptive evolutionary changes.

Within a single host, a variety of single-nucleotide variations arise through random mutations and genetic drift, giving rise to viral quasispecies. SARS-CoV-2 mutates at a slower rate than is typical of RNA viruses. The main host-derived driver of mutation is editing by APOBEC proteins. Negative selection by host immune processes causes convergent evolution towards immune escape. Persistence of infection is correlated with quasispecies diversity, but the direction of causality is unknown.

Actual host susceptibility can differ significantly from in silico predictions. The process of host adaptation has been studied in humanized mice as well as by generating mouse-adapted viral strains through serial passage. Wild-type mice are only weakly susceptible to the original SARS-CoV-2 strain.

Recombination
Compared to other single-stranded RNA viruses, coronaviruses have an increased tendency to undergo genetic recombination, which allows them to exchange genetic material with close relatives co-infecting the same organism. The origin of SARS-CoV-1 is believed to involve multiple recombination events. Recombination between strains of SARS-CoV-1 is common. Inferred phylogenies of SARS-CoV-2-related coronaviruses might be explained by recombination. Recombination between various SARS-CoV-2 variants of concern has been reported.

Lytras et al. identified the SARS-CoV-2 spike open reading frame as a recombination hotspot. They speculated that the SARS-CoV-2 genome may involve repeated recombination events overprinting regions that were already themselves products of recombination. Temmam et al. wrote that, because bat viruses have limited diffusion in other mammals, the co-infection required for recombination in those mammals was unlikely. They therefore considered it more likely that the furin cleavage site arose in a bat reservoir prior to spillover.

Selection
After cross-species transmission of a virus, rapid evolution and positive selection are expected. Several studies found only weak signs of adaptive evolution early in the COVID-19 pandemic. Kang et al. wrote that SARS-CoV-2 had exhibited relatively little genetic variation by 2021. Tai et al. wrote that population expansion rather than positive selection explained the mutation frequency spectrum during the early pandemic. Cagliani et al. wrote that the SARS-CoV-2 genome overall shows evidence of "strong to moderate" purifying selection. Accessory open reading frames, especially ORF8, showed weak to neutral selection. The general lack of positive selection during the early outbreak of SARS-CoV-2 contrasted with the evolutionary course of SARS-CoV-1.

Strong evidence of positive selection was found, however, in the spike protein S1 subunit, which contains the receptor binding domain.

Genome instability in the spike protein is typical of coronaviruses in general, favoring the production of numerous spike protein variants. The receptor binding domain is a significant factor in host tropism, or the variety of species a virus can infect. Coronavirus adaptation to a new host often requires mutations in the receptor binding domain. Kang et al. identified a single nucleotide polymorphism relative to RaTG13 in the spike protein, consistent among all of more than 180,000 SARS-CoV-2 samples, affecting glycosylation of the receptor binding domain. Using a reverse genetics system to generate an ancestral-like mutant, they confirmed that the putative ancestral form of this SNP was much less transmissible in human cells.
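The distinction between purifying and positive selection discussed in this section rests on comparing nucleotide changes that alter the encoded amino acid (nonsynonymous) with those that do not (synonymous). The following sketch classifies codon differences between two codon-aligned coding sequences using the standard genetic code. It is a simplified illustration: the function name and toy sequences are assumptions, it counts differing codons rather than individual substitutions, and it does not normalize by the number of synonymous and nonsynonymous sites as a formal dN/dS analysis would.

```python
from itertools import product

# Standard genetic code, in the conventional T, C, A, G ordering of each base.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_differences(ref: str, alt: str) -> dict[str, int]:
    """Count codons that differ synonymously or nonsynonymously."""
    if len(ref) != len(alt) or len(ref) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and of equal length")
    counts = {"synonymous": 0, "nonsynonymous": 0}
    for i in range(0, len(ref), 3):
        codon_ref = ref[i:i + 3].upper()
        codon_alt = alt[i:i + 3].upper()
        if codon_ref == codon_alt:
            continue
        if codon_ref not in CODON_TABLE or codon_alt not in CODON_TABLE:
            continue  # assumption: codons with gaps or ambiguous bases are skipped
        if CODON_TABLE[codon_ref] == CODON_TABLE[codon_alt]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


# Hypothetical toy sequences: TTT -> TTC keeps phenylalanine,
# GGA -> GAA changes glycine to glutamate.
toy_ref = "ATGTTTGGA"
toy_alt = "ATGTTCGAA"
print(classify_codon_differences(toy_ref, toy_alt))
```

An excess of nonsynonymous over synonymous change in a region, after proper normalization, is the kind of signal interpreted as positive selection, while a deficit is interpreted as purifying selection.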

Host proteases
The majority of sarbecoviruses are not dependent on ACE2 for cell entry. Guo et al. divide the sarbecoviruses into four clades, the first being ACE2-using respiratory viruses including SARS-CoV-1, SARS-CoV-2, and WIV1. Relative to clade 1, clades 3 and 4 have a one-residue deletion in the receptor binding domain and diminished ability to use ACE2. Clade 2 has two deletions and does not interact with ACE2. Clades 2 through 4 are more difficult to isolate or propagate in cell culture, and consequently have been less studied. ACE2-independent infection by sarbecoviruses depends on high levels of trypsin, a digestive protease, in the host environment. Trypsin may compensate for other missing or incompatible host proteases. The ubiquity of furin compared to trypsin would allow the gain of a furin cleavage site to expand viral tissue tropism. Guo et al. identified clade 1 and 2 sarbecoviruses in Rhinolophus fecal samples, suggesting that both types naturally replicate in the bat digestive tract. Bats with SARS-like infections in the wild showed no viral replication in respiratory droplets. Fecal-oral transmission is an alternative route for some respiratory viruses. No higher incidence of sarbecoviruses has been reported in workers who come into direct contact with guano. Temmam et al. found that BANAL-236, a SARS-CoV-2-related virus isolated from bats in Laos, acts as an enteric virus in macaques.

Timing of spillover
In contrast to SARS-CoV-1, the initial outbreak of SARS-CoV-2 was reported in only one city. Chinese epidemiological surveillance did not report any other pneumonia outbreaks in autumn 2019. Graham and Baric wrote that in the case of SARS-CoV-1, virus populations circulated and adapted to civets and humans over the course of two years before the recognized outbreak.

Reverse zoonosis
Transmission of SARS-CoV-2 from humans to animals, known as reverse zoonosis or anthroponosis, is possible. Reverse zoonotic or experimental infection with SARS-CoV-2 has been reported in 31 animal species. It is not believed that wildlife plays a significant role in the ongoing circulation of SARS-CoV-2 among humans. SARS-CoV-2 variants adapted to mink were found co-circulating among humans and farmed mink. Mink are the only animals in Europe or North America to have experienced widespread SARS-CoV-2 outbreaks. Transmission from humans to mink has occurred multiple times, in most cases not resulting in a sustained mink outbreak. Strong evidence of positive selection was seen in mink after spillover, concentrated in the receptor binding motif.

Transmission back to humans has been documented for mink, hamsters, and cats. Transmission to humans from deer is suspected. Transmission of SARS-CoV-2 among domestic cats has been confirmed.

Li et al. wrote that SARS-CoV-2 may be less able than SARS-CoV-1 to pass from humans to animals.

Hypothesized intermediate hosts
In the outbreak of SARS-CoV-1, palm civets, raccoon dogs, ferret badgers, red foxes, domestic cats, and rice field rats were possible vectors. Graham and Baric wrote that human and civet infections likely stemmed from an unknown common progenitor. Patrick Berche wrote that the emergences of SARS-CoV-1 and MERS-CoV appeared to be sequential processes involving intermediate hosts, co-infections, and recombination. In contrast with the rapid identification of animal hosts for SARS-CoV-1 and MERS-CoV, no direct animal source for SARS-CoV-2 has been found. Holmes et al. wrote that no intermediate host has been identified most likely because the right animal has not yet been tested. Frutos et al. proposed that rather than a discrete spillover event, SARS-CoV-2 arose in accordance with a circulation model, involving repeated horizontal transfer among humans, bats, and other mammals without establishing significant reservoirs in any of them until the pandemic.

Pangolins have been considered a possible reservoir of SARS-CoV-2. Pangolins are sometimes sold in wet markets in China, where they are considered a culinary delicacy and a component of traditional medicine. The highest sequence similarity to the SARS-CoV-2 spike receptor binding domain was found in a coronavirus infecting Sunda pangolins in Guangdong province. Pangolins are frequently smuggled to China. Lytras et al. wrote that, consistent with a lack of reported infections of pangolins in Malaysia, they were likely infected after being trafficked into China.

The SARS-CoV-2 receptor binding domain shares more synonymous substitutions with a pangolin-CoV than with RaTG13. The binding potential of SARS-CoV-2 to pangolin ACE2, however, is very low. The receptor binding domain of SARS-CoV-2 has been hypothesized to originate from recombination of the relevant portion of pangolin-CoV with an RaTG13-like virus.

Deer mice are highly susceptible to SARS-CoV-2, making them potential reservoir or intermediate hosts. Raccoon dogs could be capable of transmitting SARS-CoV-2 to other animals under farm-like conditions. Transmission between raccoon dogs was shown under laboratory conditions. White-tailed deer are a potential vector for SARS-CoV-2.

Wet market hypothesis
Spillover has been proposed to have occurred at one or more wet markets in China. Wild and semi-wild game animals are commonly traded and consumed in China, and this practice has expanded in recent decades. The close contact among animals in unsanitary conditions creates the potential for diseases to thrive. In the 2002 outbreak in Guangdong, most living animals in the markets showed serological evidence of exposure to SARS-CoV-1. Many early cases were associated with the Huanan seafood market in Wuhan. The first documented COVID-19 infection was in a worker at a seafood stall in the Huanan market. Sequences from that patient have not been published; however, SARS-CoV-2 belonging to lineage B/L was detected in environmental samples from the patient's stall. No bats or pangolins were reported to be at the market between May 2017 and November 2019. Raccoon dogs, which are hypothesized to be susceptible to SARS-CoV-2, were at the market. A survey by He et al. in 2021 identified 102 mammal-affecting viruses in Chinese game animals, 65 of which were identified then for the first time. They did not find any SARS-like viruses, but did find evidence for the transmission of a MERS-like virus from bats to hedgehogs.

Between January 1 and March 30, 2020, epidemiologists from China CDC collected 457 samples from animal matter, including carcasses and feces, and 923 environmental samples. All animal samples tested negative for SARS-CoV-2. SARS-CoV-2 was found in 73 environmental samples. Live virus was isolated from three samples, two of which came from stalls belonging to known patients. No significant association was found between environmental virus titer and the type of product sold at particular stalls. SARS-CoV-2 was identified in all four sewer wells in the market. Overground drainage from the surrounding area concentrated under the market.

One environmental sample contained lineage A/S SARS-CoV-2, while all others belonged to lineage B/L. The sample containing lineage A/S also contained genetic evidence of humans and livestock, but no wild animals. Overall, wildlife including raccoon dogs was detected at very low levels and mostly associated with negative samples. Liu et al. wrote that there was not enough information to determine the origin of the virus. In particular, they wrote that the evidence does not prove the presence of an infected raccoon dog or the occurrence of multiple zoonotic spillovers at the market as proposed by Pekar et al.

Origins of variants
The World Health Organization defines a variant of concern as a variant with evidence of increased transmissibility, severity, or immune escape. All variants of concern so far evolved independently from the original strain, rather than from each other.

The evolution of the Omicron variant appeared to be more rapid than that of other variants. Theories for the origin of the Omicron variant include long-term circulation among humans outside areas where genetic surveillance was performed, mutation in an immunocompromised individual, or adaptation in an animal species after reverse zoonosis. Under the last of these theories, the Omicron variant has been proposed to have originated in mice.