2015 evidence for an ancestral association of human coronavirus 229e with bats

44
1 Evidence for an ancestral association of human coronavirus 229E with bats 1 2 Victor Max Corman 1,2# , Heather J. Baldwin 3,4# , Adriana Fumie Tateno 1,5# , Rodrigo Melim 3 Zerbinati 1,5 , Augustina Annan 6 , Michael Owusu 6 , Evans Ewald Nkrumah 6 , Gael Darren 4 Maganga 7 , Samuel Oppong 8 , Yaw Adu-Sarkodie 8 , Peter Vallo 3,9 , Luiz Vicente Ribeiro 5 Ferreira da Silva Filho 5,10 , Eric M. Leroy 7,11 , Volker Thiel 12,13 , Lia van der Hoek 14 , Leo L. M. 6 Poon 15 , Marco Tschapka 3,16 , Christian Drosten 1,2* , Jan Felix Drexler 1,2* 7 8 1: Institute of Virology, University of Bonn Medical Centre, Bonn, Germany 9 2: German Centre for Infection Research (DZIF), partner site Bonn-Cologne, Germany 10 3: Institute of Evolutionary Ecology and Conservation Genomics, University of Ulm, Ulm, 11 Germany 12 4: Department of Biological Sciences, Macquarie University, Sydney, NSW 2109, Australia 13 5: Laboratório de Virologia (LIMHC-FMUSP), Instituto de Medicina Tropical de São Paulo, 14 Universidade de São Paulo 15 6: Kumasi Centre for Collaborative Research in Tropical Medicine (KCCR), Kumasi, Ghana 16 7: Centre International de Recherches Médicales de Franceville, Franceville, Gabon 17 8: Kwame Nkrumah University of Science and Technology, Kumasi, Ghana 18 9: Institute of Vertebrate Biology, Academy of Sciences of the Czech Republic, v.v.i., Brno 19 10: Instituto da Criança, Hospital das Clínicas da FMUSP, São Paulo, Brazil 20 11: Institut de Recherche pour le Développement, UMR 224 (MIVEGEC), IRD/CNRS/UM1, 21 Montpellier, France 22 12: Department of Infectious Diseases and Pathobiology, Vetsuisse Faculty Bern, University 23 of Bern, Switzerland 24 13: Federal Department of Home Affairs, Institute of Virology and Immunology, Bern and 25 Mittelhäusern, Switzerland 26 JVI Accepted Manuscript Posted Online 16 September 2015 J. Virol. doi:10.1128/JVI.01755-15 Copyright © 2015, American Society for Microbiology. All Rights Reserved.

Upload: others

Post on 11-Sep-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

1

Evidence for an ancestral association of human coronavirus 229E with bats 1 2 Victor Max Corman1,2#, Heather J. Baldwin3,4#, Adriana Fumie Tateno1,5#, Rodrigo Melim 3 Zerbinati1,5, Augustina Annan6, Michael Owusu6, Evans Ewald Nkrumah6, Gael Darren 4 Maganga7, Samuel Oppong8, Yaw Adu-Sarkodie8, Peter Vallo3,9, Luiz Vicente Ribeiro 5 Ferreira da Silva Filho5,10, Eric M. Leroy7,11, Volker Thiel12,13, Lia van der Hoek14, Leo L. M. 6 Poon15, Marco Tschapka3,16, Christian Drosten1,2*, Jan Felix Drexler1,2* 7 8 1: Institute of Virology, University of Bonn Medical Centre, Bonn, Germany 9 2: German Centre for Infection Research (DZIF), partner site Bonn-Cologne, Germany 10 3: Institute of Evolutionary Ecology and Conservation Genomics, University of Ulm, Ulm, 11 Germany 12 4: Department of Biological Sciences, Macquarie University, Sydney, NSW 2109, Australia 13 5: Laboratório de Virologia (LIMHC-FMUSP), Instituto de Medicina Tropical de São Paulo, 14 Universidade de São Paulo 15 6: Kumasi Centre for Collaborative Research in Tropical Medicine (KCCR), Kumasi, Ghana 16 7: Centre International de Recherches Médicales de Franceville, Franceville, Gabon 17 8: Kwame Nkrumah University of Science and Technology, Kumasi, Ghana 18 9: Institute of Vertebrate Biology, Academy of Sciences of the Czech Republic, v.v.i., Brno 19 10: Instituto da Criança, Hospital das Clínicas da FMUSP, São Paulo, Brazil 20 11: Institut de Recherche pour le Développement, UMR 224 (MIVEGEC), IRD/CNRS/UM1, 21 Montpellier, France 22 12: Department of Infectious Diseases and Pathobiology, Vetsuisse Faculty Bern, University 23 of Bern, Switzerland 24 13: Federal Department of Home Affairs, Institute of Virology and Immunology, Bern and 25 Mittelhäusern, Switzerland 26

JVI Accepted Manuscript Posted Online 16 September 2015J. Virol. doi:10.1128/JVI.01755-15Copyright © 2015, American Society for Microbiology. All Rights Reserved.

Page 2: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

2

14: Department of Medical Microbiology, Academic Medical Center of the University of 27 Amsterdam, Amsterdam, The Netherlands 28 15: School of Public Health, The University of Hong Kong, Hong Kong, Special 29 Administrative Region, People’s Republic of China 30 16: Smithsonian Tropical Research Institute, Balboa, Panama 31 32 #These authors contributed equally 33 *Addresses for correspondence: 34 Professor Dr. Christian Drosten, Institute of Virology, University of Bonn Medical Centre, 35 53127 Bonn, Germany, Tel.: +49-228-287-11055/Fax: +49-228-287-14433, e-mail: 36 [email protected] 37 Professor Dr. Jan Felix Drexler, Institute of Virology; University of Bonn Medical Centre, 38 53127 Bonn, Germany, Tel.: +49-228-287-11697/Fax: +49-228-287-14433, e-mail: 39 [email protected] 40 41 Running head: HCoV-229E-related bat coronaviruses 42 Abstract word count: 247 43 Text Body word count: 4,372 44 45 One-sentence summary: HCoV-229E-related bat coronaviruses are genetically highly 46 diversified and suggest HCoV-229E acquired major genomic deletions upon host switching, 47 potentially involving camelids as intermediate hosts. 48 49 Keywords: Africa, Coronavirus, Bats, Camelids, HCoV-229E, Zoonoses 50

Page 3: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

3

Abstract 51 We previously showed that close relatives of human coronavirus (HCoV)-229E exist in 52 African bats. The small sample and limited genomic characterizations prevented further 53 analyses so far. Here, we tested 2,087 fecal specimens from 11 bat species sampled in Ghana 54 for HCoV-229E-related viruses by RT-PCR. Only hipposiderid bats tested positive. To 55 compare the genetic diversity of bat viruses and HCoV-229E, we tested historical isolates and 56 diagnostic specimens sampled globally over 10 years. Bat viruses were five- to sixfold more 57 diversified than HCoV-229E in RNA-dependent RNA polymerase (RdRp) and Spike genes. In 58 phylogenetic analyses, HCoV-229E strains were monophyletic and not intermixed with 59 animal viruses. Bat viruses formed three large clades in close and more distant sister 60 relationship. A recently described 229E-related alpaca virus occupied an intermediate 61 phylogenetic position between bat and human viruses. According to taxonomic criteria, 62 human, alpaca and bat viruses form a single CoV species showing evidence for multiple 63 recombination events. HCoV-229E and the alpaca virus showed a major deletion in the Spike 64 S1 region compared to all bat viruses. Analyses of four full genomes from 229E-related bat 65 CoVs revealed an eighth open reading frame (ORF8) located at the genomic 3’-end. ORF8 66 also existed in the 229E-related alpaca virus. Re-analysis of HCoV-229E sequences showed a 67 conserved transcription regulatory sequence preceding remnants of this ORF, suggesting its 68 loss after acquisition of a 229E-related CoV by humans. These data suggested an evolutionary 69 origin of 229E-related CoVs in hipposiderid bats, hypothetically with camelids as 70 intermediate hosts preceding the establishment of HCoV-229E. 71 72 Importance 73 The ancestral origins of major human coronaviruses (HCoV) likely involve bat hosts. Here, 74 we provide conclusive genetic evidence for an evolutionary origin of the common cold virus 75 HCoV-229E in hipposiderid bats by analyzing a large sample of African bats and 76

Page 4: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

4

characterizing several bat viruses on a full genome level. Our evolutionary analyses show that 77 animal and human viruses are genetically closely related, can exchange genetic material and 78 form a single viral species. We show that the putative host switches leading to the formation 79 of HCoV-229E were accompanied by major genomic changes including deletions in the viral 80 spike glycoprotein gene and loss of an open reading frame. We re-analyze a previously 81 described genetically related alpaca virus and discuss the role of camelids as potential 82 intermediate hosts between bat and human viruses. The evolutionary history of HCoV-229E 83 likely shares important characteristics with that of the recently emerged highly pathogenic 84 MERS-Coronavirus. 85 86

Page 5: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

5

Introduction 87 Coronaviruses (CoV) are enveloped viruses with a single-stranded, positive-sense contiguous 88 RNA genome of up to 32 kilobases. The subfamily Coronavirinae contains four genera 89 termed Alpha-, Beta-, Gamma- and Deltacoronavirus. Mammals are predominantly infected 90 by alpha- and betacoronaviruses, while gamma- and deltacoronaviruses mainly infect avian 91 hosts (1, 2). 92 93 Four human coronaviruses (HCoVs) termed HCoV-229E, -NL63, -OC43 and -HKU1 94 circulate in the human population and mostly cause mild respiratory disease (3). HCoV-229E 95 is frequently detected in up to 15% of specimens taken from individuals with respiratory 96 disease (4-6). Although HCoV-229E can be detected in fecal specimens, HCoVs generally 97 don’t seem to play a role in acute gastroenteritis (7-9). Severe respiratory disease with high 98 case-fatality rates is caused by severe acute respiratory syndrome (SARS)-CoV and Middle 99 East respiratory syndrome (MERS)-CoV which emerged recently. HCoV-229E and HCoV-100 NL63 belong to the genus Alphacoronavirus, while HCoV-OC43, HCoV-HKU1, SARS- and 101 MERS-CoV belong to the genus Betacoronavirus (1, 10). 102 103 In analogy to major human pathogens including Ebola virus, rabies virus, mumps virus and 104 hepatitis B and C viruses (11-16), the evolutionary origins of SARS- and MERS-CoV were 105 traced back to bats (17-22). The genetic diversity of bat CoVs described over the last decade 106 exceeds the diversity in other mammalian hosts (2). This has led to speculations on an 107 evolutionary origin of all mammalian CoVs in bat hosts (23). Bats share important ecological 108 features potentially facilitating virus maintenance and transmission, such as close contact 109 within large social groups, longevity, and the ability of flight (13, 24). 110 111

Page 6: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

6

How humans become exposed to remote wildlife viruses is not always clear (25). Human 112 infection with SARS-CoV and MERS-CoV was likely mediated by peri-domestic animals. 113 For SARS-CoV, the suspected source of infection were carnivores (26). Preliminary evidence 114 suggested that these carnivore hosts may also have adapted SARS-CoV for human infection 115 (27). For MERS-CoV, camelids are likely intermediate hosts, supported by circulation of 116 MERS-CoV in camel herds globally and for prolonged periods of time (28-30). Whether 117 MERS-CoV only recently acquired the capacity to infect humans in camelids is unclear. 118 The evolutionary origins of HCoV-229E are uncertain. In 2007, a syndrome of severe 119 respiratory disease and sudden death was recognized in captive alpacas from the U.S. (31) and 120 an alphacoronavirus genetically closely related to HCoV-229E was identified as the causative 121 agent (32). 122 In 2009, we detected viruses in fecal specimens from 5 of 75 hipposiderid bats from Ghana 123 and showed that these bat viruses were genetically related to HCoV-229E by characterizing 124 their partial RNA-dependent RNA polymerase (RdRp) and Nucleocapsid genes (33). Lack of 125 specimens containing high CoV RNA concentrations so far prevented a more comprehensive 126 characterization of those bat viruses to further address their relatedness to HCoV-229E. Here, 127 we tested more than 2,000 bats from Ghana for CoVs related to HCoV-229E. We describe 128 highly diversified bat viruses on a full genome level and analyze the evolutionary history of 129 HCoV-229E and the genetically related alpaca CoV. 130 131 132

Page 7: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

7

Materials and Methods 133 Bat and human sampling 134 Bats were caught in the Ashanti region, central Ghana, during 2009-2011 as described 135 previously (21). Archived anonymized respiratory specimens derived from patients sampled 136 between 2002-2011 were obtained from Hong Kong/China, Germany, The Netherlands, 137 Brazil and Ghana. 138 139 RNA purification, coronavirus detection and characterization 140 RNA was purified from approximately 20 mg of fecal material suspended in 500 µL 141 RNAlater stabilizing solution using the MagNA Pure 96 system (Roche Penzberg, Germany). 142 Elution volumes were 100 µL. Testing for CoV RNA was done using a real time RT-PCR 143 assay designed to allow detection of HCoV-229E and all genetically related bat CoVs known 144 from our pilot study (33). Oligonucleotide sequences were CoV229Elike-F13948m 145 TCYAGAGAGGTKGTTGTTACWAAYCT, CoV229Elike-P13990m FAM (6-146 Carboxyfuorescein)-TGGCMACTTAATAAGTTTGGIAARGCYGG-BHQ1 (Black Hole 147 Quencher 1) and CoV229Elike-R14138m CGYTCYTTRCCAGAWATGGCRTA. Testing 148 used the SSIII RT-PCR Kit (Life Technologies, Karlsruhe, Germany) with the following 149 cycling protocol in a LightCycler 480 (Roche, Penzberg, Germany): 20 min. at 50 ºC for 150 reverse transcription, followed by 3 min. at 95 ºC and 45 cycles of 15 sec. at 95 ºC, 10 sec. at 151 58 ºC and 20 sec. at 72 ºC. CoV quantification relied on cRNA in vitro transcripts generated 152 from TA-cloned peri-amplicons using the T7-driven Megascript (Life technologies, 153 Heidelberg, Germany) kit as described previously (34). Partial RdRp gene sequences from 154 real time RT-PCR-positive specimens were obtained as described previously (18). Full CoV 155 genomes and Spike gene sequences were generated for those specimens containing highest 156 CoV RNA concentrations using sets of nested RT-PCR assays (primers available upon 157 request) located along the HCoV-229E genome and designed to amplify small sequence 158

Page 8: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

8

islets. Sequence islets were connected by bridging long-range nested PCR using strain-159 specific primers (available upon request) and the Expand High Fidelity kit (Roche) on cDNA 160 templates generated with the Superscript III reverse transcriptase (Life Technologies). 161 162 Phylogenetic analyses 163 Bayesian phylogenetic reconstructions were made using MrBayes V3.1 (35) under 164 assumption of a GTR+G+I nucleotide substitution model for partial RdRp sequences and the 165 WAG amino acid substitution model for translated open reading frames (ORFs). Two million 166 generations were sampled every 100 steps, corresponding to 20,000 trees of which 25% were 167 discarded as burn-in before annotation using TreeAnnotator V1.5 and visualization using 168 FigTree V1.4 from the BEAST package (36). Neighbor-joining phylogenetic reconstructions 169 were made using MEGA5.2 (37) and a percentage nucleotide distance model, the complete 170 deletion option and 1,000 bootstrap replicates. Genome comparisons were made using 171 MEGA5.2 (37); SSE V1.1 (38) and recombination analyses were made using SimPlot V3.5 172 (39). 173 174 175

Page 9: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

9

Results 176 Specimens from 2,087 bats belonging to 11 species were available for PCR testing. Table 1 177 provides details on the overall sample composition and detection rates in individual bat 178 species. Only bats belonging to the family Hipposideridae tested positive in 81 of 1,853 179 specimens (4.4%). All positive-testing bats had been morphologically identified in the field as 180 either Hipposideros cf. ruber or H. abae. Those were the most abundant species within the 181 sample. No HCoV-229E-related RNA was detected in the 17 available specimens from H. 182 jonesi and H. cf. gigas. 183 184 An 816 nucleotide (nt) fragment from the RdRp gene was obtained from 41 of the 81 positive 185 specimens (GenBank accession nos. KT253259-KT253299). This fragment was used for 186 further analysis as the 816 nt sequence yields improved resolution in inference of phylogeny 187 as compared to shorter sequences derived from RT-PCR screening of field-derived samples 188 (2). To expand the available genomic data for HCoV-229E, the 816 nt RdRp fragment was 189 also sequenced from 23 HCoV-229E strains from patients sampled between 2002-2011 in 190 China, Germany, The Netherlands, Brazil, and Ghana. In addition, the 816 nt RdRp fragment 191 was sequenced from two historical HCoV-229E strains isolated in 1965 and the 1980ies (40) 192 (GenBank accession nos. KT253300-KT253323). In analogy to the official taxonomic 193 designation SARS-related CoV including human SARS-CoV and related CoVs from other 194 animals (1), we hereafter restrict usage of the term HCoV-229E to the human virus and refer 195 to the animal viruses as 229E-related CoV. Figure 1A shows a Bayesian phylogeny of the 196 partial RdRp gene. The bat virus diversity we observed in our pilot study (represented by 197 viruses Buoyem344 and Kwamang19) was expanded greatly. A phylogenetically basal virus 198 termed Kwamang8 obtained within our pilot study was not detected again, although the 199 present study contained specimens from the same cave and bat species. All human strains 200 occupied an apical phylogenetic position and were not intermixed with any of the animal 201

Page 10: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

10

viruses. The recently described alpaca 229E-related CoV (32) clustered with two viruses 202 obtained from hipposiderid bats in a parallel study from our groups in the Central African 203 country Gabon (41). The two Gabonese bat-associated viruses differed from the alpaca 229E-204 related CoV by only 3.2% nucleotide content within the RdRp fragment. Hipposiderid bat 205 CoVs were neither sorted by sampling sites, nor by their host species in their RdRp genes. 206 Overall, bat 229E-related CoVs sampled over 3 years differed up to 13.5% in their nt and 207 3.3% in their amino acid (aa) sequences. Although the HCoV-229E dataset used for 208 comparison was sampled over 50 years, the human-associated viruses showed 5-10fold less 209 genetic diversity than bat viruses with only 1.4% nt and 0.7% aa variation. Because of the 210 small sequence variation in HCoV-229E, Figure 1A contains only nine representative HCoV-211 229E strains. The neighbor-joining phylogeny shown in Figure 1B represents the high 212 sequence identity between all HCoV-229E strains determined in this study. 213 214 To analyze to which extent bat 229E-related CoV show genetic variation, the Spike gene 215 encoding the viral glycoprotein was characterized from 15 representative bat viruses (labeled 216 with a triangle in Figure 1A). Figure 1C shows a Bayesian phylogenetic tree of the bat 229E-217 related CoV Spike gene sequences and HCoV-229E full Spike sequences sampled over 50 218 years. The bat viruses formed three genetically diverse lineage, of which two phylogenetically 219 basal lineages contained bat viruses only. These lineages were sorted according to their 220 sampling sites Kwamang (abbreviated KW) and Akpafu Todzi (abbreviated AT). A third 221 lineage contained closely related bat viruses obtained from three different sample sites 222 separated by several hundred kilometers (Buoyem, Kwamang and Forikrom) (21). These data 223 suggested co-circulation of different Spike gene lineages within sampling sites as well as the 224 existence of separate lineages between sites. However, the small number of viruses 225 characterized from the phylogenetically basal bat clades 1 and 2 implies that caution should 226 be taken in assertions on geographically separated Spike gene lineages. The alpaca 229E-227

Page 11: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

11

related CoV and all HCoV-229E strains clustered in apical phylogenetic position compared to 228 the bat viruses. The most closely related bat viruses from lineage 1 differed from HCoV-229E 229 by 8.4-13.7%. The two other bat virus lineages were less related to HCoV-229E with 30.6-230 33.0% aa sequence distance. 231 232 Topologies of the Bayesian phylogenetic reconstructions of RdRp and Spike genes from bats 233 and the alpaca were not congruent, compatible with past recombination events across animal 234 229E-related CoVs. The high similarity of the RdRp gene of human HCoV-229E strains did 235 not allow comparisons of the RdRp-based with the Spike-based topology. To further 236 investigate the genomic relationships of bat 229E-related CoVs and HCoV-229E, the full 237 genomes were determined directly from fecal specimens from four representative bat viruses 238 (labeled with circles in Figures 1A and C). Figure 2A shows that bat 229E-related CoV 239 genomes comprise 28,014-28,748 nt, which exceeds the length of known HCoV-229E strains 240 by 844-1,479 nt. As shown in Figure 2B, HCoV-229E and all bat viruses were closely related 241 within the putative ORF1ab. This allowed the delineation of non-structural proteins (nsp) 1-242 16 for all bat viruses in analogy to HCoV-229E. Table 2 provides details on length and 243 cleavage sites of the predicted nsp. Sequence identity in seven concatenated nsp is used by the 244 International Committee for the Taxonomy of Viruses (ICTV) for CoV species designation 245 (1). As shown in Table 3, the four fully sequenced bat viruses showed translated aa sequence 246 identities of 93.3-97.1% with HCoV-229E. This was well above the 90% threshold 247 established by the ICTV, indicating all bat 229E-related CoVs and HCoV-229E form a single 248 species. Bat virus Kwamang8, which formed a phylogenetically basal sister-clade to the other 249 bat viruses and HCoV-229E, could not be sequenced on a full genome level. The aa sequence 250 of the partial RdRp gene of Kwamang8 differed by only 3.3% from other bat viruses and 251 HCoV-229E. Based upon previous comparisons of CoV RdRp sequences for tentative species 252 delineation (2, 18), Kwamang8 forms part of the same species as the other bat viruses and 253

Page 12: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

12

HCoV-229E. This CoV species would also include the recently described alpaca 229E-related 254 CoV (32), which showed 96.9-97.2% aa sequence identity with HCoV-229E and 94.2-97.8% 255 with the bat viruses in the seven concatenated nsp domains. 256 257 As shown in Figure 2A, all seven open reading frames (ORFs) known from HCoV-229E 258 were found in bat 229E-related CoVs in the sequence ORF1a/1b-Spike-ORF4-Envelope-259 Membrane-Nucleocapsid. Amino acid identities between predicted ORFs of the bat viruses 260 and HCoV-229E ranged from the 67.2-91.6% described above for the translated Spike genes 261 to 88.3-94.6% (ORF1ab), with bat virus lineage 1 consistently showing highest aa sequence 262 identities. Table 4 provides details for all sequence comparisons. 263 We looked for additional support for the existence of these predicted ORFs by analyzing the 264 sequence context at their 5’-termini. This is because in CoVs, ORFs are typically preceded by 265 highly conserved transcription regulatory sequence (TRS) elements (42). All putative ORFs 266 from bat-229E related CoVs showed high conservation of the typical HCoV-229E TRS core 267 sequence UCU C/A AACU and adjacent bases. Table 5 provides details on all putative TRS 268 elements within bat 229E-related CoV genomes. 269 270 Figure 3A shows Bayesian phylogenetic trees reconstructed for all individual ORFs. The 271 alpaca 229E-related CoV clustered in intermediate position between HCoV-229E and the bat 272 viruses in the ORF1ab and Spike, but with bat viruses only in Membrane, Envelope, 273 Nucleocapsid, and ORF4. The divergent topologies again suggested recombination events in 274 229E-related CoVs. To find further evidence for recombination events and identify genomic 275 breakpoints, 229E-related CoVs were analyzed by bootscanning. As shown in Figure 3B, 276 bootscanning supported multiple recombination events involving HCoV-229E, bat 229E-277 related CoVs and the alpaca 229E-related CoV. Major recombination breakpoints occurred 278 within the ORF1ab and the beginning of the Spike gene, compatible with previous analyses of 279

Page 13: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

13

CoV recombination patterns (2) and the divergent topologies between the RdRp and Spike 280 genes noted above. Bootscanning also suggested a potential genomic breakpoint within the 281 Spike gene, mapping to the borders of the S1 (associated with for receptor binding) and S2 282 domains (associated with membrane fusion). This would be consistent with previous evidence 283 supporting intra-Spike recombination events in bat-associated CoVs (43). To obtain further 284 support for potential intra-Spike recombination events, separate phylogenetic reconstructions 285 for the S1 and the S2 domains were made. As shown in Figure 3B, these phylogenetic 286 reconstructions supported recombination events involving the alpaca 229E-related CoV and 287 HCoV-229E, but not the bat 22E-related CoVs. In the S1 domain, the alpaca 229E-related 288 CoV clustered with clinical HCoV-229E strains, while the HCoV-229E reference strain inf-1 289 isolated in 1962 clustered in phylogenetically basal sister relationship. Only in the S2 domain, 290 the intermediate position of the alpaca compared to bat and human 229E-related CoVs noted 291 before in comparisons of the full Spike was maintained. These data may hint at recombination 292 events between HCoV-229E and the alpaca virus and further supported genetic compatibility 293 between these two viruses belonging to one CoV species. 294 295 Three major differences existed between HCoV-229E, the alpaca 229E-related CoV and the 296 bat 229E-related CoVs. The first of these differences occurred in the putative ORF4. Similar 297 to HCoV-229E strains characterized from clinical specimens, a contiguous ORF4 existed in 298 all bat viruses that was 156-164 aa residues longer than the alpaca 229E-related CoV ORF4. 299 Re-analysis of the putative ORF4 sequence of the alpaca 229E-related CoV showed that this 300 apparently shorter ORF4 was due to an insertion of a single cytosine residue at position 181. 301 Without this putative insertion, the alpaca 229E-related CoV ORF4 showed the same length 302 as homologous ORFs in bat 229E-related CoVs and HCoV-229E. Since the HCoV-229E 303 ORF4 is known to accumulate mutations in cell culture (40), the apparently truncated ORF in 304 the alpaca 229E-related CoV isolate may thus not occur in vivo. The extended ORF4 of the 305

Page 14: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

14

alpaca 229E-related CoV would be most closely related to bat viruses from clade 1 with 5.5% 306 aa sequence distance, compared to at least 8.8% distance from HCoV-229E strains. 307 308 The second difference was a considerably longer S1 portion of the bat 229E-related CoV 309 Spike genes compared to HCoV-229E. Figure 4 shows that the three bat lineages contained 310 185-404 additional aa residues upstream of the putative receptor binding domain (44, 45) 311 compared to HCoV-229E. Bat lineage 1 which was phylogenetically most closely related to 312 HCoV-229E carried the smallest number of additional aa residues. Of note, the alpaca 229E-313 related CoV was identical to HCoV-229E in the number of aa residues within this region of 314 the Spike gene. 315 316 The third major difference was the existence of an additional putative ORF downstream of the 317 Nucleocapsid gene in all bat viruses. Non-homologous ORFs of unknown function 318 downstream the Nucleocapsid occur in several alpha- and betacoronaviruses, including Feline 319 infectious peritonitis virus (FIPV), Transmissible gastroenteritis virus of swine (TGEV), 320 Rhinolophus bat CoV HKU2, Scotophilus bat CoV 512, Miniopterus bat CoV HKU8 (23), the 321 Chaerephon bat CoVs BtKY22/BtKY41, the Cardioderma bat CoV BtKY43 (46) and bat 322 CoV HKU10 from Chinese Hipposideros and Rousettus species (47). In the genus 323 Betacoronavirus, only Bat CoV HKU9 from Rousettus and the genetically related Eidolon bat 324 CoV BtKY24 (46) carry additional ORFs at this genomic position. No ORF in the 3’-terminal 325 genome region is known from HCoV-229E. The alpaca 229E-related CoV contains an ORF at 326 this position termed ORFX by Crossley et al. (32). In analogy to consecutive numbers used to 327 identify HCoV-229E ORFs, we refer to this ORF as ORF8 hereafter. The putative TRS 328 context preceding ORF8 was conserved in all bat 229E-related CoV and in the alpaca 229E-329 related CoV, suggesting that a corresponding subgenomic mRNA8 may exist. The 3’-UTR of 330 bat 229E-related CoVs immediately followed the putative ORF8. This was supported by the 331

Page 15: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

15

existence of a conserved octanucleotide sequence and highly conserved stem elements 332 forming part of the pseudo-knot typically located at the 5’-end of alphacoronavirus 3’-UTRs 333 (48). As shown in Figure 5, HCoV-229E shows a high degree of sequence conservation 334 compared to bat 229E-related CoVs and the alpaca 229E-related CoV in this genomic region, 335 including a highly conserved putative TRS. Bioinformatic analyses (49-51) provided evidence 336 for the presence of two transmembrane domains in the predicted proteins 8 of the alpaca and 337 the genetically related bat 229E-related viruses. This may imply a role of the predicted protein 338 8 in coronaviral interactions with cellular or viral membranes. 339 As shown in Figure 5, one of the bat 229E-related CoV lineages represented by virus KW2E-340 F56 contained a highly divergent ORF8. In protein BLAST comparisons, the KW2E-F56 341 ORF8 showed limited similarity to the putative ORF7b of HKU10 and to the putative ORF8 342 located upstream the Nucleocapsid of a Nigerian Hipposideros betacoronavirus termed Zaria 343 bat CoV (47, 52). This may hint at cross-genus recombination events between different 344 hipposiderid bat CoVs in the past. However, overall aa sequence identity between these bat 345 CoV ORFs was very low with maximally 28.2%. As shown in Figure 6, only the central part 346 of these ORFs contained a stretch of 46 more conserved aa residues showing up to 39.1% 347 sequence identity and 47.8% similarity (Blosum62 matrix). The origin and function of the 348 divergent ORF8 thus remain to be determined. 349 350 351

Page 16: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

16

Discussion 352 We characterize highly diverse bat CoVs on a full genome level and show that these viruses 353 form one species together with HCoV-229E and a recently described virus from alpacas (32). 354 We analyze the genomic differences between human, bat and alpaca 229E-related CoVs to 355 elucidate potential host transitions during the formation of HCoV-229E. 356 357 A major difference between bat 229E-related CoVs and HCoV-229E was the Spike deletion 358 in HCoV-229E compared to the bat viruses. Interestingly, the bat 229E-related CoV lineage 1 359 which was phylogenetically most related to HCoV-229E also carried the smallest number of 360 additional aa residues. Most chiropteran CoVs are restricted to the gastrointestinal tract, 361 whereas HCoVs mainly replicate in the respiratory tract (2). The Spike deletion in HCoV-362 229E compared to ancestral bat viruses is thus noteworthy, since deletions in this protein have 363 been associated with changes in coronaviral tissue tropism. This is best illustrated by TGEV, 364 whose full-length Spike variants are associated with a dual tropism for respiratory and enteric 365 tract, whereas the deleted variant termed porcine respiratory CoV (PRCV) mainly replicates 366 in the respiratory tract (53). One could hypothesize that adaptation of bat 229E-related CoV 367 lineage 1 to both non-chiropteran hosts and to respiratory transmission may have been easier 368 compared to the other bat 229E-related CoV lineages. 369 Because the exact aa residues of the HCoV-229E RBD conveying cell entry are not known, it 370 is difficult to predict whether the bat viruses may interact with the HCoV-229E cellular 371 receptor Aminopeptidase N (45) or its Hipposideros homologue. Characterization of this bat 372 molecule and identification of permissive cell culture systems may allow initial susceptibility 373 experiments for chimeric viruses. Of note, although the alpaca 229E-related CoV was 374 successfully isolated (32), no data on receptor usage and cellular tropism are available so far 375 (2, 53). 376

Page 17: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

17

Another major difference was the existence of an ORF8 downstream the Nucleocapsid gene 377 in bat 229E-related viruses and the detection of putative sequence remnants of this ORF in 378 HCoV-229E. Hypothetically, deterioration of ORF8 in HCoV-229E could have occurred due 379 to loss of gene function in human hosts after zoonotic transmission from bats or intermediate 380 hosts. This may parallel gradual deletions in the SARS-CoV accessory ORF8 during the 381 human epidemic compared to bat SARS-related CoVs (54) and is consistent with 382 characterizations of HCoV-229E clinical strains showing high variability of this genomic 383 region (55). 384 385 The virus-host association between 229E-related CoVs and the bat genus Hipposideros is 386 strengthened by our virus detections in Hipposideros species in Ghana and in Gabon (41), 387 which is separated from Ghana by about 1,800 km. The observed link between 229E-related 388 alphacoronaviruses and hipposiderid bats is paralleled by the detections of genetically closely 389 related betacoronaviruses in different Hipposideros species from Ghana, Nigeria, Thailand 390 and Gabon (33, 41, 52, 56), suggesting restriction of these CoVs to hipposiderid bat genera. 391 Due to their proofreading capacity, CoVs show evolutionary rates of 10E-5 to 10E-6 392 substitutions per site per replication cycle, which is much slower than rates observed for other 393 RNA viruses (57, 58). Our data thus suggest a long evolutionary history of 229E-related 394 CoVs in Old World hipposiderid bats that greatly exceeds that of HCoV-229E in humans, 395 confirming previous hypotheses from our group (33). 396 397 The putative role of the alpaca 229E-related CoV in the formation of HCoV-229E is unclear. 398 Our data enable new insights into the evolutionary history of HCoV-229E. First, the alpaca 399 229E-related CoV contained an intact ORF8 which was genetically related to the homologous 400 gene in bat 229E-related CoVs. Second, genes of the alpaca CoV clustered either with bat 401 viruses only or in intermediate position between bat viruses and HCoV-229E. Because the 402

Page 18: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

18

alpaca 229E-related CoV showed the same deletion in its Spike gene as HCoV-229E 403 compared to bat 229E-related CoVs, it may be possible that alpacas represent a first host 404 switch from bats followed by a second inter-host transfer from alpacas to humans. The 405 relatedness of the alpaca 229E-related CoV to older HCoV-229E strains rather than to 406 contemporary ones reported by Crossley et al. would be compatible with this scenario (32). 407 However, the alpaca 229E-related CoV was reported only from captive animals in the U.S. 408 and whether this virus is indeed endemic in New World alpacas is unclear. Additionally, the 409 apparent intra-Spike recombination event may speak against a role of the alpaca virus as the 410 direct ancestor of HCoV-229E. Further analyses will be required to confirm this putative 411 recombination event, ideally including additional sequence information from old HCoV-229E 412 strains. Furthermore, a hypothetical direct transfer of Old World bat viruses to New World 413 alpacas appears geographically unfeasible. It would be highly relevant to investigate Old 414 World camelids for 229E-related CoVs that may have been passed on to captive alpacas and 415 that may represent direct ancestors of HCoV-229E. 416 Additional constraints to consider in the hypothetical role of camelids for the evolutionary 417 history of 229E-related CoVs is the time and place of putative host switches from bats. 418 Camels were likely introduced to Africa not earlier than 5,000 years ago from the Arabian 419 Peninsula (59, 60) and could not possibly come into direct contact with West African H. cf. 420 ruber or H. abae of the Guinean savanna. The majority of CoV species seems to be confined 421 to host genera (2). Therefore, it may be possible that 229E-related CoV transmission was 422 mediated through closely related species like H. tephrus, which occurs in the Sahel zone and 423 comes into contact to populations of H. cf. ruber distantly related to those from the Guinean 424 savanna (61). This bat species should be analyzed for 229E-related CoVs together with other 425 genera of the family Hipposideridae, like Asellia or Triaenops, which are desert-adapted bats 426 sharing their habitat with camelids both in Arabia and Africa and may harbor genetically 427 related CoVs. An important parallel to this evolutionary scenario is the role of camelids for 428

Page 19: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

19

the emerging MERS-CoV (30, 62), whose likely ancestors also occur in bats (20, 21). 429 However, we cannot rule out that the alpaca 229E-related CoV and HCoV-229E represent 430 two independent zoonotic acquisitions from 229E-related CoVs existing in hipposiderid bats 431 and potentially yet unknown intermediate hosts. 432 433 The existence of different serotypes in the expanded 229E-related CoV species is unclear. 434 CoV neutralization is mainly determined by antibodies against the S protein, and particularly 435 the S1 domain (63). The phylogenetic relatedness of the S1 domains from the alpaca 229E-436 related CoV and HCoV-229E suggests that these viruses form one serotype. The most closely 437 related bat 229E-related CoV lineage showed 8.4% aa sequence distance in the translated 438 Spike gene from HCoV-229E. This was comparable to the 7.8-18.6% aa distance between 439 FIPV, TGEV und canine CoV, which belong to one CoV species (Alphacoronavirus 1) and 440 for which cross-neutralization was observed (64). The about 30% Spike aa sequence distance 441 between the other bat 229E-related lineages and HCoV-229E were comparable to the distance 442 between HCoV-NL63 and HCoV-229E, which form two different serotypes (65). HCoV-443 229E thus likely forms one serotype that includes the alpaca 229E- and potentially the most 444 closely related bat 229E-related lineage, while the other bat 229E-related lineages may form 445 different serotypes. In our study, lack of bat sera and absence of bat 229E-related CoV 446 isolates prevented serological investigations. The generation of pseudotyped viruses carrying 447 bat 229E-related Spike motifs may allow future serological studies. Of note, our joint analyses 448 of Ghanaian patients with respiratory disease in this study and previous work from our group 449 investigating Ghanaian villagers (66) showed that Ghanaians were infected with the globally 450 circulating HCoV-229E, whereas no evidence of bat 229E-related CoV infecting humans was 451 found. If serotypes existed in 229E-related CoVs, serologic studies may thus aid to elucidate 452 putative exposure of humans and potential camelid intermediate hosts to these bat viruses. 453

Page 20: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

20

It should be noted that throughout Africa, bats are consumed as wild game (67) and humans 454 frequently live in close proximity of bat caves (68), including usage of bat guano as fertilizer 455 and drinking water from these caves (21). These settings potentially facilitate the exposure of 456 humans and their peri-domestic animals, including camelids, to these previously remote bat 457 viruses. 458 In summary, HCoV-229E may be a paradigmatic example of the successful introduction of a 459 bat CoV into the human population, possibly with camelids as intermediate hosts. 460 461 Acknowledgement 462 We thank Monika Eschbach-Bludau, Sebastian Brünink, Tobias Bleicker, Fabian Ebach and 463 Thierno Diawo Dallo at the Institute of Virology, Bonn, for technical assistance and Ebenezer 464 Kofi Badu, Priscilla Anti, Olivia Agbenyega, Florian Gloza-Rausch, Stefan Klose and 465 Thomas Kruppa for their help during the organization and conducting of field work. For all 466 capturing, sampling, and exportation of bat specimens, we obtained permission from the 467 respective countries’ authorities. This study was supported by the European Union FP7 468 projects EMPERIE (contract number 223498) and ANTIGONE (contract number 278976) 469 and by the German Research Foundation (DFG grants DR 772/3-1, KA1241/18-1 and DR 470 772/7-1, TH 1420/1-1). 471

Page 21: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

21

References 472 473 1. de Groot RJ, Baker, S.C., Baric, R., Enjuanes, L., Gorbalenya, A.E., Holmes, 474

K.V., Perlman, S., Poon, L., Rottier, P.J.M.,, Talbot PJ, Woo, P.C.Y. and 475 Ziebuhr, J. 2012. Family Coronaviridae, p 806-820. In King A, Adams M, Carstens 476 EB, Lefkowitz EJ (ed), Virus taxonomy: classification and nomenclature of viruses: 477 ninth report of the International Committee on Taxonomy of Viruses. 478 Elsevier/Academic Press, Amsterdam; Boston. 479

2. Drexler JF, Corman VM, Drosten C. 2014. Ecology, evolution and classification of 480 bat coronaviruses in the aftermath of SARS. Antiviral Res 101:45-56. 481

3. van der Hoek L. 2007. Human coronaviruses: what do they cause? Antivir Ther 482 12:651-658. 483

4. Walsh EE, Shin JH, Falsey AR. 2013. Clinical impact of human coronaviruses 229E 484 and OC43 infection in diverse adult populations. J Infect Dis 208:1634-1642. 485

5. Berkley JA, Munywoki P, Ngama M, Kazungu S, Abwao J, Bett A, Lassauniere 486 R, Kresfelder T, Cane PA, Venter M, Scott JA, Nokes DJ. 2010. Viral etiology of 487 severe pneumonia among Kenyan infants and children. JAMA 303:2051-2057. 488

6. Gaunt ER, Hardie A, Claas EC, Simmonds P, Templeton KE. 2010. Epidemiology 489 and clinical presentations of the four human coronaviruses 229E, HKU1, NL63, and 490 OC43 detected over 3 years using a novel multiplex real-time PCR method. J Clin 491 Microbiol 48:2940-2947. 492

7. Esper F, Ou Z, Huang YT. 2010. Human coronaviruses are uncommon in patients 493 with gastrointestinal illness. J Clin Virol 48:131-133. 494

8. Paloniemi M, Lappalainen S, Vesikari T. 2015. Commonly circulating human 495 coronaviruses do not have a significant role in the etiology of gastrointestinal 496 infections in hospitalized children. J Clin Virol 62:114-117. 497

Page 22: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

22

9. Risku M, Lappalainen S, Rasanen S, Vesikari T. 2010. Detection of human 498 coronaviruses in children with acute gastroenteritis. J Clin Virol 48:27-30. 499

10. de Groot RJ, Baker SC, Baric RS, Brown CS, Drosten C, Enjuanes L, Fouchier 500 RA, Galiano M, Gorbalenya AE, Memish Z, Perlman S, Poon LL, Snijder EJ, 501 Stephens GM, Woo PC, Zaki AM, Zambon M, Ziebuhr J. 2013. Middle East 502 Respiratory Syndrome Coronavirus (MERS-CoV); Announcement of the Coronavirus 503 Study Group. J Virol 87:7790-7792. 504

11. Drexler JF, Geipel A, Konig A, Corman VM, van Riel D, Leijten LM, Bremer 505 CM, Rasche A, Cottontail VM, Maganga GD, Schlegel M, Muller MA, Adam A, 506 Klose SM, Carneiro AJ, Stocker A, Franke CR, Gloza-Rausch F, Geyer J, Annan 507 A, Adu-Sarkodie Y, Oppong S, Binger T, Vallo P, Tschapka M, Ulrich RG, 508 Gerlich WH, Leroy E, Kuiken T, Glebe D, Drosten C. 2013. Bats carry pathogenic 509 hepadnaviruses antigenically related to hepatitis B virus and capable of infecting 510 human hepatocytes. Proc Natl Acad Sci U S A 110:16151-16156. 511

12. Drexler JF, Corman VM, Muller MA, Lukashev AN, Gmyl A, Coutard B, Adam 512 A, Ritz D, Leijten LM, van Riel D, Kallies R, Klose SM, Gloza-Rausch F, Binger 513 T, Annan A, Adu-Sarkodie Y, Oppong S, Bourgarel M, Rupp D, Hoffmann B, 514 Schlegel M, Kummerer BM, Kruger DH, Schmidt-Chanasit J, Setien AA, 515 Cottontail VM, Hemachudha T, Wacharapluesadee S, Osterrieder K, 516 Bartenschlager R, Matthee S, Beer M, Kuiken T, Reusken C, Leroy EM, Ulrich 517 RG, Drosten C. 2013. Evidence for novel hepaciviruses in rodents. PLoS Pathog 518 9:e1003438. 519

13. Drexler JF, Corman VM, Muller MA, Maganga GD, Vallo P, Binger T, Gloza-520 Rausch F, Rasche A, Yordanov S, Seebens A, Oppong S, Adu Sarkodie Y, 521 Pongombo C, Lukashev AN, Schmidt-Chanasit J, Stocker A, Carneiro AJ, Erbar 522 S, Maisner A, Fronhoffs F, Buettner R, Kalko EK, Kruppa T, Franke CR, Kallies 523

Page 23: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

23

R, Yandoko ER, Herrler G, Reusken C, Hassanin A, Kruger DH, Matthee S, 524 Ulrich RG, Leroy EM, Drosten C. 2012. Bats host major mammalian 525 paramyxoviruses. Nat Commun 3:796. 526

14. Leroy EM, Kumulungui B, Pourrut X, Rouquet P, Hassanin A, Yaba P, Delicat 527 A, Paweska JT, Gonzalez JP, Swanepoel R. 2005. Fruit bats as reservoirs of Ebola 528 virus. Nature 438:575-576. 529

15. Calisher CH, Childs JE, Field HE, Holmes KV, Schountz T. 2006. Bats: important 530 reservoir hosts of emerging viruses. Clin Microbiol Rev 19:531-545. 531

16. Quan PL, Firth C, Conte JM, Williams SH, Zambrana-Torrelio CM, Anthony 532 SJ, Ellison JA, Gilbert AT, Kuzmin IV, Niezgoda M, Osinubi MO, Recuenco S, 533 Markotter W, Breiman RF, Kalemba L, Malekani J, Lindblade KA, Rostal MK, 534 Ojeda-Flores R, Suzan G, Davis LB, Blau DM, Ogunkoya AB, Alvarez Castillo 535 DA, Moran D, Ngam S, Akaibe D, Agwanda B, Briese T, Epstein JH, Daszak P, 536 Rupprecht CE, Holmes EC, Lipkin WI. 2013. Bats are a major natural reservoir for 537 hepaciviruses and pegiviruses. Proc Natl Acad Sci U S A 110:8194-8199. 538

17. Wacharapluesadee S, Lumlertdacha B, Boongird K, Wanghongsa S, Chanhome 539 L, Rollin P, Stockton P, Rupprecht CE, Ksiazek TG, Hemachudha T. 2005. Bat 540 Nipah virus, Thailand. Emerg Infect Dis 11:1949-1951. 541

18. Drexler JF, Gloza-Rausch F, Glende J, Corman VM, Muth D, Goettsche M, 542 Seebens A, Niedrig M, Pfefferle S, Yordanov S, Zhelyazkov L, Hermanns U, 543 Vallo P, Lukashev A, Muller MA, Deng H, Herrler G, Drosten C. 2010. Genomic 544 characterization of severe acute respiratory syndrome-related coronavirus in European 545 bats and classification of coronaviruses based on partial RNA-dependent RNA 546 polymerase gene sequences. J Virol 84:11336-11349. 547

19. Ge XY, Li JL, Yang XL, Chmura AA, Zhu G, Epstein JH, Mazet JK, Hu B, 548 Zhang W, Peng C, Zhang YJ, Luo CM, Tan B, Wang N, Zhu Y, Crameri G, 549

Page 24: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

24

Zhang SY, Wang LF, Daszak P, Shi ZL. 2013. Isolation and characterization of a 550 bat SARS-like coronavirus that uses the ACE2 receptor. Nature 503:535-538. 551

20. Ithete NL, Stoffberg S, Corman VM, Cottontail VM, Richards LR, Schoeman 552 MC, Drosten C, Drexler JF, Preiser W. 2013. Close relative of human Middle East 553 respiratory syndrome coronavirus in bat, South Africa. Emerg Infect Dis 19:1697-554 1699. 555

21. Annan A, Baldwin HJ, Corman VM, Klose SM, Owusu M, Nkrumah EE, Badu 556 EK, Anti P, Agbenyega O, Meyer B, Oppong S, Sarkodie YA, Kalko EK, Lina 557 PH, Godlevska EV, Reusken C, Seebens A, Gloza-Rausch F, Vallo P, Tschapka 558 M, Drosten C, Drexler JF. 2013. Human betacoronavirus 2c EMC/2012-related 559 viruses in bats, Ghana and Europe. Emerg Infect Dis 19:456-459. 560

22. Woo PC, Wang M, Lau SK, Xu H, Poon RW, Guo R, Wong BH, Gao K, Tsoi 561 HW, Huang Y, Li KS, Lam CS, Chan KH, Zheng BJ, Yuen KY. 2007. 562 Comparative analysis of twelve genomes of three novel group 2c and group 2d 563 coronaviruses reveals unique group and subgroup features. J Virol 81:1574-1585. 564

23. Woo PC, Lau SK, Huang Y, Yuen KY. 2009. Coronavirus diversity, phylogeny and 565 interspecies jumping. Exp Biol Med (Maywood) 234:1117-1127. 566

24. Luis AD, Hayman DTS, O'Shea TJ, Cryan PM, Gilbert AT, Pulliam JRC, Mills 567 JN, Timonin ME, Willis CKR, Cunningham AA, Fooks AR, Rupprecht CE, 568 Wood JLN, Webb CT. 2013. A comparison of bats and rodents as reservoirs of 569 zoonotic viruses: are bats special? Proceedings of the Royal Society B: Biological 570 Sciences 280. 571

25. Wolfe ND, Dunavan CP, Diamond J. 2007. Origins of major human infectious 572 diseases. Nature 447:279-283. 573

26. Guan Y, Zheng BJ, He YQ, Liu XL, Zhuang ZX, Cheung CL, Luo SW, Li PH, 574 Zhang LJ, Guan YJ, Butt KM, Wong KL, Chan KW, Lim W, Shortridge KF, 575

Page 25: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

25

Yuen KY, Peiris JS, Poon LL. 2003. Isolation and characterization of viruses related 576 to the SARS coronavirus from animals in southern China. Science 302:276-278. 577

27. Graham RL, Baric RS. 2010. Recombination, reservoirs, and the modular spike: 578 mechanisms of coronavirus cross-species transmission. J Virol 84:3134-3146. 579

28. Chu D, Poon L, Gomaa MM, Shehata MM, Perera R, Abu Zeid D, El Rifay AS, 580 Siu LY, Guan Y, Webby RW, Ali MA, Peiris M, Kayali G. 2014. MERS 581 Coronaviruses in Dromedary Camels, Egypt Emerg Infect Dis 20. 582

29. Muller MA, Corman VM, Jores J, Meyer B, Younan M, Liljander A, Bosch BJ, 583 Lattwein E, Hilali M, Musa BE, Bornstein S, Drosten C. 2014. MERS coronavirus 584 neutralizing antibodies in camels, Eastern Africa, 1983-1997. Emerg Infect Dis 585 20:2093-2095. 586

30. Reusken CB, Haagmans BL, Muller MA, Gutierrez C, Godeke GJ, Meyer B, 587 Muth D, Raj VS, Smits-De Vries L, Corman VM, Drexler JF, Smits SL, El Tahir 588 YE, De Sousa R, van Beek J, Nowotny N, van Maanen K, Hidalgo-Hermoso E, 589 Bosch BJ, Rottier P, Osterhaus A, Gortazar-Schmidt C, Drosten C, Koopmans 590 MP. 2013. Middle East respiratory syndrome coronavirus neutralising serum 591 antibodies in dromedary camels: a comparative serological study. Lancet Infect Dis 592 13:859-866. 593

31. Crossley BM, Barr BC, Magdesian KG, Ing M, Mora D, Jensen D, Loretti AP, 594 McConnell T, Mock R. 2010. Identification of a novel coronavirus possibly 595 associated with acute respiratory syndrome in alpacas (Vicugna pacos) in California, 596 2007. J Vet Diagn Invest 22:94-97. 597

32. Crossley BM, Mock RE, Callison SA, Hietala SK. 2012. Identification and 598 characterization of a novel alpaca respiratory coronavirus most closely related to the 599 human coronavirus 229E. Viruses 4:3689-3700. 600

Page 26: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

26

33. Pfefferle S, Oppong S, Drexler JF, Gloza-Rausch F, Ipsen A, Seebens A, Muller 601 MA, Annan A, Vallo P, Adu-Sarkodie Y, Kruppa TF, Drosten C. 2009. Distant 602 relatives of severe acute respiratory syndrome coronavirus and close relatives of 603 human coronavirus 229E in bats, Ghana. Emerg Infect Dis 15:1377-1384. 604

34. Drexler JF, Kupfer B, Petersen N, Grotto RM, Rodrigues SM, Grywna K, 605 Panning M, Annan A, Silva GF, Douglas J, Koay ES, Smuts H, Netto EM, 606 Simmonds P, Pardini MI, Roth WK, Drosten C. 2009. A novel diagnostic target in 607 the hepatitis C virus genome. PLoS Med 6:e31. 608

35. Ronquist F, Huelsenbeck JP. 2003. MrBayes 3: Bayesian phylogenetic inference 609 under mixed models. Bioinformatics 19:1572-1574. 610

36. Drummond AJ, Suchard MA, Xie D, Rambaut A. 2012. Bayesian phylogenetics 611 with BEAUti and the BEAST 1.7. Mol Biol Evol 29:1969-1973. 612

37. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: 613 molecular evolutionary genetics analysis using maximum likelihood, evolutionary 614 distance, and maximum parsimony methods. Mol Biol Evol 28:2731-2739. 615

38. Simmonds P. 2012. SSE: a nucleotide and amino acid sequence analysis platform. 616 BMC Res Notes 5:50. 617

39. Lole KS, Bollinger RC, Paranjape RS, Gadkari D, Kulkarni SS, Novak NG, 618 Ingersoll R, Sheppard HW, Ray SC. 1999. Full-length human immunodeficiency 619 virus type 1 genomes from subtype C-infected seroconverters in India, with evidence 620 of intersubtype recombination. J Virol 73:152-160. 621

40. Dijkman R, Jebbink MF, Wilbrink B, Pyrc K, Zaaijer HL, Minor PD, Franklin 622 S, Berkhout B, Thiel V, van der Hoek L. 2006. Human coronavirus 229E encodes a 623 single ORF4 protein between the spike and the envelope genes. Virol J 3:106. 624

Page 27: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

27

41. Maganga GD, Bourgarel M, Vallo P, Dallo TD, Ngoagouni C, Drexler JF, 625 Drosten C, Nakoune ER, Leroy EM, Morand S. 2014. Bat distribution size or shape 626 as determinant of viral richness in african bats. PLoS One 9:e100172. 627

42. Thiel V, Herold J, Schelle B, Siddell SG. 2001. Infectious RNA transcribed in vitro 628 from a cDNA copy of the human coronavirus genome cloned in vaccinia virus. J Gen 629 Virol 82:1273-1281. 630

43. Corman VM, Ithete NL, Richards LR, Schoeman MC, Preiser W, Drosten C, 631 Drexler JF. 2014. Rooting the phylogenetic tree of middle East respiratory syndrome 632 coronavirus by characterization of a conspecific virus from an African bat. J Virol 633 88:11297-11303. 634

44. Breslin JJ, Mork I, Smith MK, Vogel LK, Hemmila EM, Bonavia A, Talbot PJ, 635 Sjostrom H, Noren O, Holmes KV. 2003. Human coronavirus 229E: receptor 636 binding domain and neutralization by soluble receptor at 37 degrees C. J Virol 637 77:4435-4438. 638

45. Hofmann H, Simmons G, Rennekamp AJ, Chaipan C, Gramberg T, Heck E, 639 Geier M, Wegele A, Marzi A, Bates P, Pohlmann S. 2006. Highly conserved 640 regions within the spike proteins of human coronaviruses 229E and NL63 determine 641 recognition of their respective cellular receptors. J Virol 80:8639-8652. 642

46. Tao Y, Tang K, Shi M, Conrardy C, Li KS, Lau SK, Anderson LJ, Tong S. 2012. 643 Genomic characterization of seven distinct bat coronaviruses in Kenya. Virus Res 644 167:67-73. 645

47. Lau SK, Li KS, Tsang AK, Shek CT, Wang M, Choi GK, Guo R, Wong BH, Poon 646 RW, Lam CS, Wang SY, Fan RY, Chan KH, Zheng BJ, Woo PC, Yuen KY. 647 2012. Recent transmission of a novel alphacoronavirus, bat coronavirus HKU10, from 648 Leschenault's rousettes to pomona leaf-nosed bats: first evidence of interspecies 649

Page 28: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

28

transmission of coronavirus between bats of different suborders. J Virol 86:11906-650 11918. 651

48. Williams GD, Chang RY, Brian DA. 1999. A phylogenetically conserved hairpin-652 type 3' untranslated region pseudoknot functions in coronavirus RNA replication. J 653 Virol 73:8349-8355. 654

49. Yachdav G, Kloppmann E, Kajan L, Hecht M, Goldberg T, Hamp T, 655 Honigschmid P, Schafferhans A, Roos M, Bernhofer M, Richter L, Ashkenazy H, 656 Punta M, Schlessinger A, Bromberg Y, Schneider R, Vriend G, Sander C, Ben-657 Tal N, Rost B. 2014. PredictProtein--an open resource for online prediction of protein 658 structural and functional features. Nucleic Acids Res 42:W337-343. 659

50. Krogh A, Larsson B, von Heijne G, Sonnhammer EL. 2001. Predicting 660 transmembrane protein topology with a hidden Markov model: application to 661 complete genomes. J Mol Biol 305:567-580. 662

51. Kelley LA, Mezulis S, Yates CM, Wass MN, Sternberg MJ. 2015. The Phyre2 web 663 portal for protein modeling, prediction and analysis. Nat Protoc 10:845-858. 664

52. Quan PL, Firth C, Street C, Henriquez JA, Petrosov A, Tashmukhamedova A, 665 Hutchison SK, Egholm M, Osinubi MO, Niezgoda M, Ogunkoya AB, Briese T, 666 Rupprecht CE, Lipkin WI. 2010. Identification of a severe acute respiratory 667 syndrome coronavirus-like virus in a leaf-nosed bat in Nigeria. MBio 1. 668

53. Sanchez CM, Gebauer F, Sune C, Mendez A, Dopazo J, Enjuanes L. 1992. 669 Genetic evolution and tropism of transmissible gastroenteritis coronaviruses. Virology 670 190:92-105. 671

54. Anonymous. 2004. Molecular evolution of the SARS coronavirus during the course 672 of the SARS epidemic in China. Science 303:1666-1669. 673

Page 29: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

29

55. Farsani SM, Dijkman R, Jebbink MF, Goossens H, Ieven M, Deijs M, 674 Molenkamp R, van der Hoek L. 2012. The first complete genome sequences of 675 clinical isolates of human coronavirus 229E. Virus Genes 45:433-439. 676

56. Gouilh MA, Puechmaille SJ, Gonzalez JP, Teeling E, Kittayapong P, 677 Manuguerra JC. 2011. SARS-Coronavirus ancestor's foot-prints in South-East Asian 678 bat colonies and the refuge theory. Infect Genet Evol 11:1690-1702. 679

57. Eckerle LD, Becker MM, Halpin RA, Li K, Venter E, Lu X, Scherbakova S, 680 Graham RL, Baric RS, Stockwell TB, Spiro DJ, Denison MR. 2010. Infidelity of 681 SARS-CoV Nsp14-exonuclease mutant virus replication is revealed by complete 682 genome sequencing. PLoS Pathog 6:e1000896. 683

58. Eckerle LD, Lu X, Sperry SM, Choi L, Denison MR. 2007. High fidelity of murine 684 hepatitis virus replication is decreased in nsp14 exoribonuclease mutants. J Virol 685 81:12135-12144. 686

59. Mburu DN, Ochieng JW, Kuria SG, Jianlin H, Kaufmann B, Rege JE, Hanotte 687 O. 2003. Genetic diversity and relationships of indigenous Kenyan camel (Camelus 688 dromedarius) populations: implications for their classification. Anim Genet 34:26-32. 689

60. Epstein H, Mason IL. 1971. The origin of the domestic animals of Africa. Africana 690 Pub. Corp. 691

61. Vallo P, Guillén-Servent A, Benda P, Pires DB, Koubek P. 2008. Variation of 692 mitochondrial DNA in the Hipposideros caffer complex (Chiroptera: Hipposideridae) 693 and its taxonomic implications. Acta Chiropterologica 10:193-206. 694

62. Perera RA, Wang P, Gomaa MR, El-Shesheny R, Kandeil A, Bagato O, Siu LY, 695 Shehata MM, Kayed AS, Moatasim Y, Li M, Poon LL, Guan Y, Webby RJ, Ali 696 MA, Peiris JS, Kayali G. 2013. Seroepidemiology for MERS coronavirus using 697 microneutralisation and pseudoparticle virus neutralisation assays reveal a high 698

Page 30: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

30

prevalence of antibody in dromedary camels in Egypt, June 2013. Eurosurveillance 699 18:8-14. 700

63. Meyer B, Drosten C, Muller MA. 2014. Serological assays for emerging 701 coronaviruses: challenges and pitfalls. Virus Res 194:175-183. 702

64. Horzinek MC, Lutz H, Pedersen NC. 1982. Antigenic relationships among 703 homologous structural polypeptides of porcine, feline, and canine coronaviruses. 704 Infect Immun 37:1148-1155. 705

65. Hofmann H, Pyrc K, van der Hoek L, Geier M, Berkhout B, Pohlmann S. 2005. 706 Human coronavirus NL63 employs the severe acute respiratory syndrome coronavirus 707 receptor for cellular entry. Proc Natl Acad Sci U S A 102:7988-7993. 708

66. Owusu M, Annan A, Corman VM, Larbi R, Anti P, Drexler JF, Agbenyega O, 709 Adu-Sarkodie Y, Drosten C. 2014. Human coronaviruses associated with upper 710 respiratory tract infections in three rural areas of ghana. PLoS One 9:e99782. 711

67. Mickleburgh S, Waylen K, Racey P. 2009. Bats as bushmeat: a global review. Oryx 712 43:217-234. 713

68. Anti P, Owusu M, Agbenyega O, Annan A, Badu EK, Nkrumah EE, Tschapka 714 M, Oppong S, Sarkodie YA, Drosten C. 2015. Human–Bat Interactions in Rural 715 West Africa. Emerg Infect Dis 21. 716

69. da Silva Filho LV, Zerbinati RM, Tateno AF, Boas LV, de Almeida MB, Levi JE, 717 Drexler JF, Drosten C, Pannuti CS. 2012. The differential clinical impact of human 718 coronavirus species in children with cystic fibrosis. J Infect Dis 206:384-388. 719

70. Chibo D, Birch C. 2006. Analysis of human coronavirus 229E spike and 720 nucleoprotein genes demonstrates genetic drift between chronologically distinct 721 strains. J Gen Virol 87:1203-1208. 722 723 724

Page 31: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

31

Figure legends 725 Figure 1. Phylogenetic relationships of the genus Alphacoronavirus, HCoV-229E strains 726 and the novel bat viruses 727 A, Bayesian phylogeny of an 816 nucleotide RdRp gene sequence fragment corresponding to 728 positions 13,891-14,705 in HCoV-229E prototype strain inf-1 (GenBank accession no. 729 NC002645) using a GTR+G+I substitution model. SARS-coronavirus (CoV) was used as an 730 outgroup. Viruses with additional sequence information generated in this study were marked 731 with circles (full genome) or marked with triangles (Spike gene). Bat viruses detected in our 732 previous studies from Ghana (33) and Gabon are given in cyan (41). B, Neighbour-joining 733 phylogeny of the same RdRp gene fragment with a nucleotide percentage distance substitution 734 model and the complete deletion option. The tree was rooted against HCoV-NL63. Viruses 735 were coloured according to their origin. C. Bayesian phylogeny of the full Spike gene of bat 736 229E-related CoVs, the alpaca 229E-related CoV and HCoV-229E strains identified with 737 GenBank accession numbers and year of isolation, using a WAG amino acid substitution 738 model and HCoV-NL63 as an outgroup. The novel bat 229E–related CoVs are shown in 739 boldface and red. Branches leading to the outgroup were truncated for graphical reasons as 740 indicated by slashed lines. Values at nodes show support of grouping from posterior 741 probabilities or 1,000 bootstrap replicates (only values above 0.7 were shown). 742 743 Figure 2. Genome organization of 229E-related coronaviruses and relationships between 744 viruses from bats and humans 745 A, 229E-related CoV genomes represented by black lines; ORFs are indicated by grey arrows. 746 Locations of transcription-regulatory core sequences (TRS) are marked by black dots. HCoV-747 NL63 is shown for comparison. B, Similarity plots generated using SSE V1.1 (38) using a 748 sliding window of 400 and a step size of 40 nucleotides (nt). The HCoV-229E prototype 749 strain inf-1 was used with animal viruses identified in the legend. 750

Page 32: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

32

751 Figure 3. Bayesian phylogenies of major open reading frames and recombination 752 analysis of HCoV-229E and related animal viruses 753 A, Phylogenies were calculated with a WAG amino acid substitution model. The novel bat 754 viruses are shown in red. The alpaca CoV is shown in cyan. Filled circles, posterior 755 probability support exceeding 0.95, scale bar corresponds to genetic distance. Details on the 756 origin of HCoV-229E strain VFC408 which was generated for this study can be retrieved 757 from (69). Branches leading the outgroup HCoV-NL63 were truncated for graphical reasons. 758 B, Bootscan analysis using the Jukes-Cantor algorithm with a sliding window of 1,500 and a 759 step size of 300 nt. The HCoV-220E inf-1 strain was used with animal 229E-related viruses as 760 identified in the legend. C. Phylogenies of the S1 and S2 subunit were calculated according to 761 A. One representative HCoV-229E strain was selected per decade according to (70); GenBank 762 accession nos. DQ243974, DQ243964, DQ243984, DQ243967. 763 764 Figure 4. Amino acid sequence alignment of the 5’-end of the Spike gene of HCoV-229E 765 and related animal viruses 766 Amino acid alignment of the first part of the Spike gene of 229E-related CoVs including four 767 bat 229E-related CoVs, the alpaca 229E-related CoV and the HCoV-229E inf-1 strain. 768 Conserved amino acid residues are marked in black, sequence gaps are represented by 769 hyphens. 770 771 Figure 5. Nucleotide sequence alignment of the genomic 3’-end of HCoV-229E and 772 related animal viruses 773 Nucleotide alignment of the genome region downstream the Nucleocapsid gene including 774 four bat 229E-related CoV, the alpaca 229E-related CoV and representative HCoV-229E full 775 genomes identified with GenBank accession number or strain name. Dots represent identical 776

Page 33: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

33

nucleotides, hyphens represent sequence gaps. Grey bars above alignments indicate open 777 reading frames and the beginning of the poly-A tail. The putative start and stop codon of 778 ORF8 is labelled lime green, the corresponding putative TRS element is marked blue. The 779 conserved genomic sequence elements and the highly conserved stem elements forming part 780 of the pseudo-knot (PK) were marked with grey and purple background. 781 782 Figure 6. Amino acid sequence alignment of the putative ORF8 from a bat 229E-related 783 coronavirus and closest hits from two other hipposiderid bat coronaviruses 784 Conserved amino acid residues between sequence pairs are highlighted in color according to 785 amino acid properties, sequence gaps are represented by hyphens. The central domain 786 showing higher sequence similarity between compared viruses is boxed for clarity. The 229E-787 related alphacoronavirus KW2E-F56 from a Hipposideros cf. ruber detected in this study is 788 given in red, the alphacoronavirus HKU10 originated from a Chinese H. pomona, the 789 betacoronavirus Zaria originated from a Nigerian H. gigas. 790 791

Page 34: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

0.2

partial RdRp

HCoV/SARS/Tor2/Hom sap/CAN/2003/AY274119

BtCoV/GKwam19/Hip spec/GHA/2008/FJ710046

BtCoV/GKwam8/Hip sp/GHA/2008/FJ710045

AlpacaCoV/CA08-1/Vic pac/USA/2008/JQ410000

HCoV/NL63/Hom sap/NED/1988/AY518894

BtCoV/344/Hip sp/GHA/2008/FJ710043

HCoV/229E/Hom sap/GBR/1963/NC 002645

MinkCoV/WD1133/Mus vis/USA/1998/HM245926FCoV/FIPV79-1146/Fel sil/USA/1980/DQ010921

BtCoV/HKU8/Min pus/CHN/2008/GU190248

BtCoV/512/Sco kuh/CHN/2005/DQ648858PEDV/CV777/Sus scr/BEL/1977/AF353511

BtCoV/1A/Min mag/CHN/2005/EU420138

BtCoV/HKU2/GD430/Rhi sin/CHN/2006/EF203064

HCoV/229E/HK4-27/Hom sap/CHN/2006

HCoV/229E/VFC408/Hom sap/BRA/2007

HCoV/229E/BN53/Hom sap/GER/2008

BtCoV/BUO2E-F101/Hip aba/GHA/2011BtCoV/BUO1E-F102/Hip cf. rub/GHA/2011BtCoV/BUO1E-F82/Hip cf. rub/GHA/2011BtCoV/BUO2E-F39/Hip cf. rub/GHA/2011BtCoV/BUO2B-F210/Hip cf. rub/GHA/2011BtCoV/KW2E-F151/Hip cf. rub/GHA/2011

BtCoV/FO-A-58/Hip aba/GHA/2010BtCoV/FO-A-42/Hip aba/GHA/2010BtCoV/FO-A-73/Hip cf. rub/GHA/2010BtCoV/GB59-67/Hip aba/GHA/2009BtCoV/BUO2A-75/Hip aba/GHA/2010

HCoV/229E/Hom sap/NED/2010/JX503060

HCoV/229E/Hom sap/ITA/2009/JX503061HCoV/229E/21050349/Hom sap/NED/2010

HCoV/229E/GH860/Hom sap/GHA/2008

HCoV/229E/KI975/Hom sap/ATA/1975

BtCoV/KWTELE-13/Hip cf. rub/GHA/2011

0.75

1

0.98

0.94

0.89

0.93

0.99

BtCoV/FO1A-F2/Hip aba/GHA/2010

BtCoV/AT1A-F63/Hip aba/GHA/2010BtCoV/AT1A-F45/Hip aba/GHA/2010BtCoV/AT1A-F41/Hip aba/GHA/2010BtCoV/AT1A-F1/Hip aba/GHA/2010

BtCoV/KW2E-F128/Hip cf. rub/GHA/2011BtCoV/AT1A-F26/Hip cf. rub/GHA/2010BtCoV/GB0933/Hip cf. rub/GAB/2009BtCoV/GB0931/Hip cf. rub/GAB/2009

BtCoV/BUO2E-F160/Hip cf. rub/GHA/2011BtCoV/BUO2E-F124/Hip cf. rub/GHA/2011BtCoV/KW1E-F77/Hip aba/GHA/2011BtCoV/BUO2E-F58/Hip cf. rub/GHA/2011BtCoV/FO1E-F65/Hip cf. rub/GHA/2011

BtCoV/FO1E-F66/Hip cf. rub/GHA/2011BtCoV/BUO2E-F157/Hip cf. rub/GHA/2011BtCoV/BUO2E-F155/Hip cf. rub/GHA/2011BtCoV/FO1E-F11/Hip cf. rub/GHA/2011BtCoV/BUO2E-F156/Hip cf. rub/GHA/2011

BtCoV/BUO2E-F27/Hip cf. rub/GHA/2011BtCoV/KW1E-F12/Hip cf. rub/GHA/2011

BtCoV/BUO2E-F159/Hip cf. rub/GHA/2011BtCoV/FO1E-F67/Hip cf. rub/GHA/2011BtCoV/GB59-22/Hip cf. rub/GHA/2009BtCoV/BUO2A-F15/Hip cf. rub/GHA/2010BtCoV/BUO2A-F23/Hip cf. rub/GHA/2010BtCoV/GB59-69/Hip cf. rub/GHA/2009

BtCoV/KWTELE-9/Hip cf. rub/GHA/2011BtCoV/KW2E-F73/Hip cf. rub/GHA/2011BtCoV/KW2E-F56/Hip cf. rub/GHA/2011BtCoV/KW1C-F161/Hip cf. rub/GHA/2010

1

1

1

1

1

1

11

11

1

A

Germany

NetherlandsGhana

ChinaBrazil

Historical isolatesUSA

0.02

partialRdRp

B

997/2008860/2008939/2008938/200883/20081551/20101337/2010499/2010101/2008100/2008

53/2008HK 4/27/2006HK2/26/200221050349 2010

629/2010

645/20101334/2010

1812/2010HK 7/6/2004

VFC408/2007373/2011

HK2/26B/2002HK 5/10/2002

KI/1975

LP/1965HCoV-229E/1962

HCoV-NL63

0.84

0.81

933-50/1993932-72/1993933-40/1993892-11/1989953-49/1995

clade 1

clade 3

clade 2

BU2A-F75_2010

FO1A-F42_2011

F01A-F2_2010

KF514430_1993

KW1E-F12_2011

KW2E-F73_2011

KF514429_1989

DQ243969_1982

AlpacaCoV_2008

DQ243971_1984

AT1A-F41_2010

AT1A-F1_2010

DQ243985_2003

DQ243983_2003

KW2E-F56_2011

DQ243979_2002

DQ243977_2001

DQ243966_1982

DQ243975_1992

KW2E-F151_2011

HC

oV

-NL63

DQ243978_2001

DQ243974_1992

DQ243972_1984

DQ243965_1982

FO1E-F66_2011

DQ243980_2003

AB691764_1979

JX503060_2010

DQ243964_1979

AB691767_2008

DQ243967_1982

NC_002645_1962

DQ243986_2003

AB691765_1982

BUO2E-F58_2011

DQ243984_2003

JX503061_2009

KF514433_1993

FO1E-F67_2011

KF514432_1993

DQ243976_1992

DQ243970_1982

KW1C-F161_2010

AT1A-F45_2010

DQ243973_1990

DQ243968_1982

KF514431_1995

1

1

1

1

0.98

1

0.99

1

1

0.79

1

1

1

1

1

1

1

0.77

1

1

1

0.04

completeSpike

C

Page 35: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

2,000 6,000 10,000 14,000 18,000 22,000 26,000

HCoV-229E/inf-1 SpikeE

M N 27,278

ORF4a/bORF 1ab

B

A

HCoV-NL63/Amsterdam-1 ORF1ab SpikeORF3 E

M N 27,554

BtCoV-229E/KW2E-F151 ORF1ab SpikeE

M N 28,020

ORF4 ORF8

BtCoV-229E/FO1A-F2 ORF1ab SpikeE

M N 28,014

ORF4 ORF8

BtCoV-229E/AT1A-F1 ORF1ab SpikeE

M N 28,682

ORF4 ORF8

BtCoV-229E/KW2E-F56 ORF1ab SpikeE

M N 28,748

ORF4 ORF8

Alpaca respiratory CoV SpikeE

M N 27,363

ORF4 ORF8ORF1ab

Pe

rce

nta

ge

nu

cle

oti

de

ide

nti

ty

3000 6000 9000 12000 15000 18000 21000 24000 2700050

60

70

80

90

100

HCoV-229E vs. Alpaca 229E

HCoV-229E vs. KW2E-F151

HCoV-229E vs. FO1A-F2

HCoV-229E vs. AT1A-F1

HCoV-229E vs. KW2E-F56

Percentage nucleotide identity:

Spike M NORF1ab

Genome position (nucleotide)

Page 36: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

229E 1992

229E 1979

KW2E-F56

AT1A-F1

229E/inf-1 1962

KW2E-F151

F01A-F2

229E 2003

Alpaca

229E 1982

HC

oV

-NL63

0.05

S1

KW2E-F151

AT1A-F1

229E 1979

HC

oV

-NL63

229E 1982

Alpaca

229E 1992

229E 2003

229E/inf-1 1962

F01A-F2

KW2E-F56

0.05

S2

A

B

Perc

enta

ge p

erm

uta

ted tre

es

Genome position (nucleotide)270002400021000180001500012000900060003000

100

80

60

40

20

0

Alpaca-229E

KW2E-F151

F01A-F2

AT1A-F1

KW2E-F56

Clustering ofHCoV-229E with:

Spike M NORF1abS1 S2

C

0.2

Spike

KW2E-F56

21050349

JX503060

VFC408

AT1A-F1

inf-1

KW2E-F151

F01A-F2

Alpaca CoV

HC

oV

-NL

63

0.2

ORF1ab

Alpaca CoV

AT1A-F1

KW2E-F56

inf-1

21050349

KW2E-F151

VFC408

JX503060

F01A-F2H

CoV

-NL

63

0.2

ORF4

VFC408

AT1A-F1

F01A-F2

21050349

KW2E-F56

inf-1

JX503060

Alpaca CoV

KW2E-F151

HC

oV

-NL63

VFC408

21050349

AT1A-F1

JX503060

F01A-F2

inf-1

Alpaca CoV

KW2E-F151

KW2E-F56

0.2

Membrane

HC

oV

-NL63

Nucleocapsid

0.2

JX503060

AT1A-F1

KW2E-F56

inf-1

Alpaca CoV

21050349

KW2E-F151

VFC408

F01A-F2

HC

oV

-NL63

0.2

Envelope

AT1A-F1

inf-1

KW2E-F151

Alpaca CoV

21050349

VFC408

JX503060

KW2E-F56

F01A-F2

HC

oV

-NL63

Page 37: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

10 20 30 40 50 60 70 80 90 100

....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|

HCoV-229E MFVLL------VAYA-------------------------------------------------------------------------------------

Alpaca CoV MFV-L------VAYA-------------------------------------------------------------------------------------

F01A-F2 MLVLL-----------------------------------------------------------------------------------------------

KW2E-F151 MFILI-----------------------------------------------------------------------------------------------

AT1A-F1 MKYVFAFAFICFGNCTQICNITTNGGLFLDYSSLRMGVAANTNVVLAGKYPQFWTCRLATTSQQAVYYPATGFFTADVRYRRGFGFSMFKNSTLGGYSLY

KW2E-F56 MKFIIGFVIPVVAYATTFCNI--NGG--LEYNTLKLGLPPSTSVVVSGNYPQVWRCD-STGPWGPTYHEARGFFTDDVGWGRGFGFSMYANNSFGGYSLY

110 120 130 140 150 160 170 180 190 200

....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|

HCoV-229E ----------------------------------------------------------------------------------------------------

Alpaca CoV ----------------------------------------------------------------------------------------------------

F01A-F2 ----------------------------------------------------------------------------------------------------

KW2E-F151 ----------------------------------------------------------------------------------------------------

AT1A-F1 YGEGCQRKQRACELGERVVLKICRFNRSSPLDFAQVGFN-DTDCLLNFVGSDKWGSIVGVTTYGDVVHIHWTDGVRYIRVPGASEWNTVGVKCYYSASCV

KW2E-F56 YGEGKQGE----ALAKVVVLKICRFNTSTNVGTAIPGANAGADCLIEFKGADKYNQIVGVATYGDIVHIHWSDGVRYLRVPGARDWNMVGVQCYFTYSCI

210 220 230 240 250 260 270 280 290 300

....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|

HCoV-229E -------------------------LLHIAGCQ-------------------------------------------------------------------

Alpaca CoV -------------------------LLHIAGCQ-------------------------------------------------------------------

F01A-F2 --------------------------LLIPLVTATSNCINPTHGNIPIDTLQLGLPDNSTAIVTGYLPDKWHCTSRVNHDNV--SFTGKGFFVRDSADGE

KW2E-F151 ----------------------VACVLHIAAGQINNNCLDGNR--VGIQTMQLGLPPNVTALVTGYLPDKWHCTSNHNHNNV--SYSGKGFFVLDTTDGE

AT1A-F1 HSLVSSVLRVNVTTDSVANTTSYIPLSNVTHPTVGRYCVDGWYNATSISSLQLGLPLNSTALVTGLLPQFWHCASRQHSASIIRSYNGRGVFILDTKPGA

KW2E-F56 YSLVRSTVSVNVTT-SGGNLINYQPISNVTYATVGMYCVDGWYNLTSISSLQLGLPANATALVTGLLPTVWHCSS-MHNDATVRSYPGRGVFVLDSAYGA

310 320 330 340 350 360 370 380 390 400

....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|

HCoV-229E ----------------------------------------------------------------------------------------------------

Alpaca CoV ----------------------------------------------------------------------------------------------------

F01A-F2 ASFALVDSGAYSPDKYYLLYLNHLWWGGGHVKLKICKWSSPVDFNAPSAISKVSSCIVDFRTTDKINRILGFTVSGETVRIHWDNGVYTVYVPGAYAWDK

KW2E-F151 ASFALVDTGAFSTDKYYLLYLNHILYGGGYVKLKICKWSQYVDFNAPSAISFSSSCIIDFKTADKKGRVLGFSASGDTVRIHWSDAVHTVYVPGASAWSR

AT1A-F1 HAVSIMSNTSDCVSKYCLYYGNAA--SGSFTLVKICKWPARPALNAPS-VTTGEQCLLKRAGVDQFNQVIGFTTSGNTVRIHWQDGVHSLYVPGAEEWNL

KW2E-F56 SAFSLTPNGSDCVNTYCLYYGNFD--KKGITVFKICKWPSTPPLNTPS-FTSGETCLLKTSGSDNLNQVVGVTASGDTVRIHWSKGVHSLYVPGASLWNV

410 420 430 440 450 460 470 480 490 500

....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|

HCoV-229E -------------------------TTNGLNTSYSVCNGCVGYSENVFAVESGGYIPSDFAFNNWFLLTNTSSVVDGVVRSFQPLLLNCLWSVSGLRFTT

Alpaca CoV -------------------------TINVLNTSHSVCNGCVGYSENVFAVESGGYIPSDFAFNNWFLLTNTSSLVDGVVRSFQPLLLNCLWSVSGSRFTT

F01A-F2 VNVRCVSIAACSFSIVTEPIAVNVTTVNGVITSYQVCNSCVGYSDNIFAVESGGYIPSDFAFNNWFLLTNTSSVVDGVVRGVQPLLLNCLWPVPGLQSTT

KW2E-F151 VNVRCPKIAACYFSITEDPIAVNVTTTNGLITSYSVCNGCVGYAENIFAVESGGYIPSDFAFNNWFLLTNTSSVVDGVVRSVQPLLLNCLWPVPGLQSTT

AT1A-F1 VNVKCADIASCYFSIVDQPITVNVTTVQGRIVEYDVCPTCTGFADNIFSVDDGGYIPSGFSFNNWFLLTNSSSIVEGVVRTTQPLLLNCLWPVPGLQSTT

KW2E-F56 VNVKCANIQSCYFSIVDNATTVNVTTLNGRIVQYDVCPDCVGFSDNIFSVEEGGYIPSGFSFNNWFLLTNSSSVVDGVVRTKQPLLLNCLWPVPGLQSTT

Page 38: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

10 20 30 40 50 60 70 80 90 100....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|

HCoV-229E/inf-1 TGATGAAGTAAACTAAACATG-------------------------------------------------------------------------------HCoV-229E/JX503061 .........C...........-------------------------------------------------------------------------------

HCoV-229E/21050349 .........C...........-------------------------------------------------------------------------------Alpaca CoV .........C...........---------------------------------------------CATCCAGAAG------------------------KW2E-F151 .........C...........---------------------------------------------CATCCCGAAGATGAACAGGCCCCCCTTTGGCAAGF01A-F2 .........C...........---------------------------------------------CATCCCGAAGATGAACAGGCCCCCCTTTGGCAAGAT1A-F1 .........C...........---------------------------------------------CATCCTGAAGATCAAGCTGCCCCCCTCTGGAGTGKW2E-F56 .AT.TC.A.C...........AAGGTGGTGGTGATTCTCTGCTTTCTTGTGGTTGGGAGTTTTTGCTTACCATTGAAAGAGGTAAACACCCATCGCGTTA

Nucleocapsid putative ORF8

110 120 130 140 150 160 170 180 190 200....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|

HCoV-229E/inf-1 ------------------------------------------------CCACTGTGTTGTTTGAAAT---------------------------------HCoV-229E/JX503061 ------------------------------------------------...........----....---------------------------------HCoV-229E/21050349 ------------------------------------------------.T.........----....---------------------------------Alpaca CoV ----ATTTGGAATGGTTTTAGTTGAGGGTTCTATGATTTGGCTAATGG...................CATTCAATTGTTAGTTGTTGCTAATTCAGATGG

KW2E-F151 TTTTGTTTGGAATGTTGTTAGTTGAGGGTTCTATGACTTGGCTAATGG...................CATTCAATTGTTAGTTGCTGCTAGTTCAGATGGF01A-F2 TTTTGTTTGGAATGTTGTTAGTTGAGGGTTCTATGACTTGGCTAATGG...................CATTCAATTGTTAGTTGCTGCTAGTTCAGATGGAT1A-F1 TTCTTTTTGGATTTGTGCTTGTTGAAGGCGGTATAATTTGGATCACCG..GT...A..A........TTTTCAGTTGTTGGTAGCTGCTTTTACAGATGGKW2E-F56 CTGCTTATAAGCAGATATTATGCGACTCA---TATACTAGTTTAGGAC.TTG...CACTAGCCC...AATGCTTACCAACACCATGATACTGTTCAATGG

putative ORF8

210 220 230 240 250 260 270 280 290 300....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|

HCoV-229E/inf-1 -------TCAGGCTTTAGTTGGAATTTTGCTTTTGTTCTTTCTTTTATTATCTTTCTTTTGCCTGTTTTTAGAGAGATTTGGCGCCTTGGTGCCGTAGATHCoV-229E/JX503061 -------............................C....GT........C.........--------------------------------------..HCoV-229E/21050349 -------............................C....GT..................--------------------------------------..Alpaca CoV TTTGTATGTG.TT..A........................G................GC.........C.................C.............KW2E-F151 TTTGTATGTG.TT..A........................G.............................................CC............F01A-F2 TTTGTATGTG.TT..A........................G.............................................CC............AT1A-F1KW2E-F56

TGCTTATGTT.TG..A........CA.GTG.........A........GC.....T.A......TG..........C.........C.............TGATTGGGTT.CAACAC..ATT..G..GCAAAGAA..GG.GA.A..G.GCCTAAGT..AGTATG.AAGACCACTT.CAGG.TAATAC..CGTATAGCA.C

putative ORF8

310 320 330 340 350 360 370 380 390 400....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|

HCoV-229E/inf-1 GAATACATTGCTTTT------------------------------------CTCTGATCTATGTATGATGGTACGATCAGAGCTGCTTTTAATTAACATGHCoV-229E/JX503061 .........T....C------------------------------------..........C...................A..................HCoV-229E/21050349 .........T....C------------------------------------..........C...A--................................Alpaca CoV ..G..T.........------------------------------------...C........C.......C.T..........................KW2E-F151 .....T.........------------------------------------...C........C.......C.T...............C..........F01A-F2 .....T.........------------------------------------...C........C.......C.T...............C..........AT1A-F1 ..........A....------------------------------------...C........C.......C.T..........................KW2E-F56 CCT..TGG..GA...CATTACTTGCTGTACATGTGTCGCAGTGGTGAACGC...A..GAA..TCTA.....A.TT.GAGCTT..AAA.A.---.......

putative ORF8

410 420 430 440 450 460 470 480 490 500....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|

HCoV-229E/inf-1 ATCCCTTGCTTTGGCTTGACAAGGATCTAGTCTTATACACAATGGTAAGCCAGTGGTAGTAAAGGTATAAGAAATTTGCTACTATGTTACTGAACCTAGGHCoV-229E/JX503061 ...................................................T................................................

HCoV-229E/21050349 ...................................................T................................................Alpaca CoV ...........C..........................................A......G......................................

KW2E-F151 ...........C.................................................G...............A......................F01A-F2 ...........C.................................................G...............A......................AT1A-F1 ...........C.................................................G......................................KW2E-F56 G..........C.............C..........................................................................

stem 1 stem 1stem 2 stem 2PK

610 620 630 640....|....|....|....|....|....|....|....|....|.

HCoV-229E/inf-1 AGTTAGTAACTGCTAATGGAACGGTTTCGATATGGATACACAAAAA

HCoV-229E/JX503061 ..............................................HCoV-229E/21050349 ..............................................

Alpaca CoV ..............................................KW2E-F151 ..............................................

F01A-F2 ...............C..............................AT1A-F1 ..............................................

KW2E-F56 ...............C..............................

poly-A tail

510 520 530 540 550 560 570 580 590 600....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|

HCoV-229E/inf-1 TGAACGCTAGTATAACTCATTACAAATGTGCTGGAGTAATCAAAGATCGCATTGACGAGCCAACAATGGAAGAGCCAGTCATTTGTCTTGAGACCTATCTHCoV-229E/JX503061 ....................................................................................................HCoV-229E/21050349 ....................................................................................................

Alpaca CoV .....A..................................T.........T.................................................

KW2E-F151 .....A............................................T.C...............................................F01A-F2 .....A............................................T.C.......................................G.......AT1A-F1 .....A.....G......................................T.................................................KW2E-F56 ..................................................T.C...............................................

conservedoctanucleotide

Page 39: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

10 20 30 40 50 60 70 80

....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|

αCoV KW2E-F56 MKV----------------------------------------------------------------------VVILCFL

αCoV HKU10 MKL----------------------------------------------------------------------LLLLSIF

βCoV Zaria MPVMGGRSGRGPSILLLLILASLTYLSNGASHHNRDNAEHMDPVDQKTFETIVVIILVTVLIFSTIVNALLCRAVLLMVL

90 100 110 120 130 140 150 160

....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|

αCoV KW2E-F56 VVGSFCLPLKEVNTHRVTAYK-QIL-------------CDSYTSLGPCVTSPIMLTNTMILFNGDWVATRIKLQRIGDIV

αCoV HKU10 SFSYSAPT-----TYRASQAA-KVLIYTTEKVTLNNQRTHSYTKWGVCSTGWNTYTNTMVVVNGRWVETAKPPEPTAIAI

βCoV Zaria VLNADAAP-----TYRVTRLPTEVL--NTDCVSNTIPCLDAFTVIGFCITGPTPLTNTMRYDVDHWVVVKQLHFPEWQVP

170 180 190 200 210 220 230

....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|.

αCoV KW2E-F56 PKFSMEDHLQGNTAYSN-PYGGFHYLLYMCRSGERLREILDGIRAS----------------KY------------

αCoV HKU10 PTVPYELRSEKPGFGSHFDYGRIEYESVLCAVFEHVSNDIAKIAAQL-------AQTQRRHHTFAVTTFKWSSPSN

βCoV Zaria PSIVKTPQMTSGAESSTNFWDAFGNEALLCSSGAALEEIVKELKTNLNELYPQDGEPTRQSHTL------------

Hipposideros

Page 40: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

Table 1. Overview of bats tested for 229E-related coronaviruses in Ghana Species n Positives (%) Coleura afra 68 0 Hipposideros abae 242 19 (7.8) H. cf. gigas 12 0 H. jonesi 5 0 H. cf. ruber 1611 62 (3.8) Nycteris cf. gambiensis 91 0 Rhinolophus alcyone 4 0 R. landeri 9 0 Taphozous perforatus 21 0 Lissonycteris angolensis 20 0 Rousettus aegyptiacus 4 0 Total 2,087 81 (3.9)

Page 41: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

Table 2. Coding capacity for the putative non-structural proteins of the novel bat 229E-related

coronaviruses KW2E-F151 F01A-F2 AT1A-F1 KW2E-F56

1st to last amino acid Protein size 1st to last amino acid Protein size 1st to last amino acid Protein size 1st to last amino acid Protein size

NSP1 Met1-Gly111 111 Met1-Gly111 111 Met1-Gly111 111 Met1-Gly109 109

NSP2 Asn112-Gly897 786 Asn112-Gly897 786 Asn112-Gly897 786 Asn110-Gly895 786

NSP3 Gly898-Ala2494 1597 Gly898-Ala2494 1597 Gly898-Ala2492 1595 Gly896-Ala2489 1594

NSP4 Gly2495-Gln2975 481 Gly2495-Gln2975 481 Gly2493-Gln2973 481 Gly2490-Gln2970 481

NSP5 Ala2976-Gln3277 302 Ala2976-Gln3277 302 Ala2974-Gln3275 302 Ala2971-Gln3272 302

NSP6 Ser3278-Gln3556 279 Ser3278-Gln3556 279 Ser3276-Gln3553 278 Ser3273-Gln3551 279

NSP7 Ser3557-Gln3639 83 Ser3557-Gln3639 83 Ser3554-Gln3636 83 Ser3552-Gln3634 83

NSP8 Ser3640-Gln3834 195 Ser3640-Gln3834 195 Ser3637-Gln3831 195 Ser3635-Gln3829 195

NSP9 Asn3835-Gln3943 109 Asn3835-Gln3943 109 Asn3832-Gln3940 109 Asn3830-Gln3938 109

NSP10 Ala3944-Gln4078 135 Ala3944-Gln4078 135 Ala3941-Gln4075 135 Ala3939-Gln4073 135

NSP11 Ser4079-Glu4097 19 Ser4079-Glu4097 19 Ser4076-Glu4094 19 Ser4074-Glu4092 19

NSP12 Ser4079-Gln5005 927 Ser4079-Gln5005 927 Ser4076-Gln5002 927 Ser4074-Gln5000 927

NSP13 Ala5006-Gln5602 597 Ala5006-Gln5602 597 Ala5003-Gln5599 597 Ala5001-Gln5597 597

NSP14 Ser5603-Gln6120 518 Ser5603-Gln6120 518 Ser5600-Gln6117 518 Ser5598-Gln6115 518

NSP15 Gly6121-Gln6468 348 Gly6121-Gln6468 348 Gly6118-Gln6465 348 Gly6116-Gln6463 348

NSP16 Ser6469-Lys6768 300 Ser6469-Lys6768 300 Ser6466-Lys6766 301 Ser6464-Lys6763 300

Page 42: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

Table 3. Comparison of amino acid identities of seven conserved replicase domains of the bat 229E-related

coronaviruses, HCoV-229E and the alpaca 229E-related coronavirus for species delineation

Percentage amino acid sequence identity

Human Coronavirus 229Ea vs.

Domains within Bat

229Eb KW2E-F56 AT1A-F1 KW2-F151 F01A-F2 ACoVc vs

Bat 229Eb

ADRP 75.6-100 75-75.6 91.1-92.9 84.5-85.1 84.5-85.1 76.8-90.5

NSP5(3CLpro) 90.7-100 90.4-90.7 97.4-97.7 96.4-96.7 97.4-97.7 90.4-97.4

NSP12 (RdRp) 97.5-100 95.7-96 97.3-97.6 96.9-97.3 97.2-97.7 97.3-98.9

NSP13 (NTPase/Hel) 97.2-100 96.5-97.2 97.2-97.8 97.3-98 98-98.7 97.8-99.3

NSP14 (ExoN/N7-MTase) 96.1-100 95-95.6 97.5-98.1 97.3-97.9 96.9-97.5 96.3-99.2

NSP15 (NendoU) 92.8-100 92.2 96.3-96.6 96.6-96.8 96.8-97.1 91.4-96.8

NSP16 (O-MT) 91.7-100 90.7-91 91.7-92 97.3-97 97.3-97.7 90.7 – 98.0

Concatenated domains 94.5-100 93.3-93.6 96.4-96.8 96.4-96.7 96.7-97.1 94.2-97.8 aincluding - HCoV 229E - Inf-1, HCoV 229E - 0349, HCoV 229E - J0304;

bincluding - Bat CoV KW2E-F56, AT1A-F1, KW2E-F151 and F01A-F2;

cACoV - Alpaca Coronavirus

GenBank accession numbers of reference sequences: HCoV-229E - Inf-1: NC_002645.1; HCoV-229E - 0349: JX503060; HCoV-299E - J0304: JX503061; Alpaca CoV (ACoV): JQ410000

Page 43: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

Table 4. Amino acid identity between open reading frames of human, bat and camelid

229-related coronaviruses

Percentage Amino Acid Sequence Identity

Human Coronavirus 229Ea vs.

KW2E-F151 F01A-F2 AT1A-F1 KW2E-F56 AcoV within Bat CoVb

ACoVc vs Bat CoVb

ORF1a 89.5 - 89.9 89.5 - 89.8 92.6 - 93.1 84.1 - 84.6 92.9 - 93.3 83.8 - 97.9 85.1 - 93.5

ORF1ab 92.5 - 92.9 92.6 - 93 94.2 - 94.6 88.3 - 88.8 94.6 - 95 88.7 - 98.3 89.3 - 95.2

Spike 87.5 - 91.6 87.4 - 91.4 67.2 - 68.9 67.2 - 69.1 92.8 - 94.4 66.8 - 92.4 69.7 - 90.8

ORF4 92.4 - 93.1 92.6 - 93.2 77.3 - 78.8 71.2 - 73.6 79.7 - 78.1 75.7 - 96.4 67.2 - 82.8

Envelope 89.6 - 90.9 89.6 - 90.9 77.6 - 78.9 78.7 - 80 89.6 - 90.9 77.3 - 98.7 77.3 - 100

Membrane 90.2 - 90.7 89.3 - 89.9 86.2 - 86.7 87.1 - 87.6 89.8 - 90.2 86.7 - 98.7 86.3 - 99.1

Nucleocapsid 90.7 - 92 90.2 - 91.5 88.6 - 90.4 75.8 - 76.6 88.4 - 89.7 78.7 - 99.5 78.2 - 94

ORFX/8 - - - - - 12.5 - 100 15.2 - 83.9 aincluding - HCoV 229E - Inf-1, HCoV 229E - 0349, HCoV 229E - J0304

bincluding - Bat CoV KW2E-F56, AT1A-F1, KW2E-F151, F01A-F2;

cACoV - Alpaca Coronavirus

Page 44: 2015 Evidence for an ancestral association of human coronavirus 229E with bats

Table 5. Putative transcription regulatory sequences of the novel bat 229E-related coronaviruses and HCoV-229E HCoV-229E/ inf-1 KW2E-F151 F01A-F2 AT1A-F1 KW2E-F56

Leader (62) UCUCAACUAAACN220 (293) AUG

(62) UCUCAACUAAACN220 (293) AUG

(62) UCUCAACUAAACN220 (293) AUG

(62) UCUCAACUAAACN220 (293) AUG

(62) UCUCAACUAAACN220 (293) AUG

Spike (20571) UCUCAACUAAAUAA A (20586) AUG

(20585) UCUCAACUAAAUAA A (20600) AUG

(20585) UCUCAACUAAAUAA A (20600) AUG

(20576) UCUCAACUAAAAA (20589) AUG

(20570) UCUCAACUAAGUA (20583) AUG

ORF4 (24054) UCAACUAAAN38 (24101) AUG

(24644) UCAACUAAACN38 (24691) AUG

(24638) UCAACUAAACN38 (24685) AUG

(25290) UCAACUAAACN38 (25337) AUG

(25258) UCAACUAAACN38 (25304) AUG

Envelope (24599) UCUCAACUAAN152 (24762) AUG

(25190) UCUCAACUAACN149 (25349) AUG

(25184) UCUCAACUAACN149 (25343) AUG

(25836) UCUCAACUAACN149 (25992) AUG

(25805) UCAACUAACN131 (25962) AUG

Membrane (24991) UCUAAACUAAACG ACA (25007) AUG

(25578) UCUAAACUAAACGA CA (25594) AUG

(25572) UCUAAACUAAACGA CA (25588) AUG

(26224) UCUAAACUAAACG (26237) AUG

(26185) UCUAAACUAAACG (26198) AUG

Nucleocapsid (25680) UCUAAACUGAACGA AAAG (25698) AUG

(26270) UCUAAACUGAACGA AAAG (26288) AUG

(26264) UCUAAACUGAACGA AAAG (26282) AUG

(26934) UCUAAACUGAACGA AAACC (26953) AUG

(26874) UCUAAACUGAACGA AAACC (26893) AUG

ORF8 (27468) UCAACUAAAC (27478) AUG

(27462) UCAACUAAAC (27472) AUG

(28130) UCAACUAAAC (28141) AUG

(28124) UCAACUAAAC (28134) AUG

First bracket: Genome position of the first residue of the putative TRS sequence, second bracket: genome position of the first

base of the start codon; Nlower case: number of base residues between end of the putative TRS sequence and start codon (where

applicable)