Identification and characterization of Coronaviridae genomes from Vietnamese bats and rats based on conserved protein domains.

My V T Phan1,2, Tue Ngo Tri3, Pham Hong Anh3, Stephen Baker3, Paul Kellam4,5, Matthew Cotten1,2.   

Abstract

The Coronaviridae family of viruses encompasses a group of pathogens with zoonotic potential, as observed in previous outbreaks of the severe acute respiratory syndrome coronavirus and Middle East respiratory syndrome coronavirus. Accordingly, it seems important to identify and document the coronaviruses in animal reservoirs, many of which are uncharacterized and potentially missed by more standard diagnostic assays. A combination of sensitive deep sequencing technology and computational algorithms is essential for virus surveillance, especially for characterizing novel or distantly related virus strains. Here, we explore the use of profile Hidden Markov Model-defined Pfam protein domains (Pfam domains) encoded by new sequences as a Coronaviridae sequence classification tool. The encoded domains are used first in a triage to identify potential Coronaviridae sequences and then processed using a Random Forest method to classify the sequences to the Coronaviridae genus level. The application of this algorithm to Coronaviridae genomes assembled from agnostic deep sequencing data from surveillance of bats and rats in Dong Thap province (Vietnam) identified thirty-four Alphacoronavirus and eleven Betacoronavirus genomes. This collection of bat and rat coronavirus genomes provided essential information on the local diversity of coronaviruses, substantially expanded the number of full coronavirus genomes available from bats and rats, and may facilitate further molecular studies on this group of viruses.

Keywords:  Pfam; machine learning; profile Hidden Markov model; protein domains; random forest; virus classification

Year:  2018        PMID: 30568804      PMCID: PMC6295324          DOI: 10.1093/ve/vey035

Source DB:  PubMed          Journal:  Virus Evol        ISSN: 2057-1577


1. Introduction

The Coronaviridae family comprises enveloped positive-sense single-stranded RNA viruses of the order Nidovirales with genomes of up to 32 kb in length. The family is divided into the Coronavirinae and Torovirinae sub-families, which are further divided into six genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, Deltacoronavirus, Torovirus, and Bafinivirus. While viruses in the genera Alphacoronavirus and Betacoronavirus infect mostly mammals, members of the genus Gammacoronavirus infect avian species and members of the genus Deltacoronavirus have been found in both mammalian and avian hosts (de Groot 2012; Drexler, Corman, and Drosten 2014). Coronaviruses (CoVs) cause a range of respiratory, enteric, and neurological diseases in humans and animals. Among human CoV infections, the severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV) cause severe respiratory tract disease with high mortality rates, and there is strong evidence of zoonosis for both viruses (Hu et al. 2015; Dudas et al. 2018; Leopardi et al. 2018). Given such zoonotic movement, detailed descriptions of the Coronaviridae in the broad animal reservoirs from which they may cross host barriers to cause disease in humans are important; many of the Coronaviridae strains in animal reservoirs could be uncharacterized and could be missed by conventional diagnostic assays. Advances in nucleic acid sequencing technology (commonly termed Next-Generation Sequencing, NGS) are providing large sets of sequence data obtained from a variety of biological samples and allowing the characterization of both known and novel virus strains. Algorithms that can accurately and rapidly detect and classify low-frequency virus sequences against a high sequence background are useful. The desired features of these classification algorithms are the ability to rapidly process large numbers of sequences and to accurately identify more distantly related sequences.
Use of such tools in the field during outbreak sequencing is common; thus, stand-alone methods that require no internet connection are desirable. All viruses encode a collection of proteins required to ensure self-replication and persistence of the encoding virus. Enzymes for genome mRNA production and genome replication, proteases for protein maturation, proteins for genome encapsidation, and proteins for undermining the host antiviral responses can all be identified by conserved protein motifs or domains. Likely because of selective pressures, viral genomes are streamlined and the functional protein content encoded by viruses is much higher than for cellular organisms. Thus, describing a viral genome by the collection of encoded protein domains is a potentially useful classification method that we would like to explore in more detail.

Profile Hidden Markov Models (HMMs) provide a probabilistic framework for describing multiple sequence alignments that can reveal position-specific patterns (Krogh, Mian, and Haussler 1994; Eddy 1996, 1998; Durbin et al. 1998; Sonnhammer et al. 1998). The Pfam protein families database (Finn et al. 2016) of >16,000 protein domains is available (Pfam 31.0 at http://Pfam.xfam.org/). Within the Pfam collection, each domain family is defined by a manually selected and aligned set of protein sequences, which is used to construct a profile HMM of the domain. The HMM domain concept and the search algorithms for generating and detecting profile HMMs have gone through a number of refinements, and a current rapid implementation for finding profile HMMs in novel sequences is HMMER3 (http://hmmer.org/, Eddy 2011).

A number of strategies have been developed to use protein domains for virus sequence classification. The Virus Pathogen Resource (ViPR) site has a useful compilation of Pfam domains found in specific virus families (https://www.viprbrc.org/brc/home.spg?decorator=vipr); however, the catalog is currently limited to fourteen virus families, while the International Committee on Taxonomy of Viruses (ICTV) currently recognizes ninety-six virus families (Lefkowitz et al. 2018). The use of profile HMMs for virus classification and discovery was recently reviewed (Reyes et al. 2017). The use of an HMM structure as the basis for sequence classification has the potential to identify more distant members of a protein domain family. Both Metavir (Roux et al. 2011) and VirSorter (Roux et al. 2015) make extensive use of protein domains as part of effective virus classification algorithms. The VirSorter implementation is primarily focused on identifying novel bacteriophage sequences. The MetLab (Norling et al. 2016) and vFAM (Skewes-Cox et al. 2014) methods have demonstrated the utility of such a protein domain classification approach. ClassyFlu (Van der Auwera et al. 2014) builds influenza subtype-specific profile HMM-defined protein domains for the HA coding region and then uses this database of HMMs to classify test influenza HA segments.

We describe here a strategy using Pfam protein domains as the basis for identifying and classifying Coronaviridae genomic sequences. We show that the method can be used for rapid identification of Coronaviridae sequences in de novo assembled contigs, although high sensitivity requires longer, ideally genome-length, contigs. If sufficient sequence across the virus genome is available, the method can provide virus classification to the genus level. We then employ this method to identify forty-five novel Coronaviridae genome sequences from random-primed deep sequencing data from bats and rats sampled in Dong Thap province of Vietnam.

2. Materials and methods

2.1 Study setting and design

Fecal pellets from Scotophilus kuhlii bats were collected from roosts on bat guano farms in the Dong Thap province in southern Vietnam, ∼150 km southwest of Ho Chi Minh City, as shown in the map (Fig. 1). Rat fecal pellets from Rattus argentiventer were collected from trapped rice-field rats or from rats purchased in wet markets in Dong Thap (locations are indicated in the map, Fig. 1). Samples were stored at −80°C until processed for NGS. Approvals for the study were obtained from the Oxford Tropical Research Ethics Committee (Approval No. 15–12) (Oxford, UK), the institutional ethical review board of Dong Thap Provincial Hospital and the Sub-Department of Animal Health, Dong Thap province (Dong Thap, Vietnam).
Figure 1.

Location of the sampling sites. The right panel shows the map of Vietnam; the left inset shows the Dong Thap province (marked in blue and separated by dotted lines from neighboring provinces within the Mekong Delta region of southern Vietnam). The Mekong Delta river branches and flooding areas are marked in green. Names of communal regions within Dong Thap province are indicated. Locations of the guano farms where bats were sampled are marked with red diamonds, and the locations of rat sampling sites are marked with orange triangles.

Figure 4.

Sensitivity and specificity plot of various triage conditions. The HMM domain content of the forty-one virus mock contig set (111,577 viral genome fragments including 3,316 Coronaviridae fragments) was determined for each fragment. The CTD or CATD domain content plus the contig length (≥500 nt, ≥3,000 nt, ≥10,000 nt, ≥20,000 nt) were used as a triage to classify fragments as ‘Coronaviridae’ or ‘not Coronaviridae’. The contigs classified as Coronaviridae for each triage condition were then identified to the genus level using RF classification. The sensitivity (true positive/(true positive + false negative)) and specificity (true negative/(true negative + false positive)) for each combined triage and classification method were determined based on the original identity of the input genomes. Panel A. RF classification after triage by 500 nt or larger and CTD or CATD content. Panel B. As in A but with 3,000 nt or larger contigs. Panel C. As in A but with 10,000 nt or larger contigs. Panel D. As in A but with 20,000 nt or larger contigs. Each colored node represents the outcome of one complete triage/classification cycle; each combined method was repeated five times.

2.2 Sample processing, library preparation, and NGS

Total nucleic acid was extracted as previously described (de Vries et al. 2012; Cotten et al. 2014). In brief, a volume of 110 µl of each sample was centrifuged for 10 min at 10,000 × g. Unprotected (non-encapsidated) DNA in the samples was degraded by addition of 20 U TURBO DNase (Ambion). Remaining nucleic acid was subsequently extracted using the Boom method (Boom et al. 1990). Reverse transcription was performed using non-ribosomal random hexamers (Endoh et al. 2005), and second strand DNA synthesis was performed using 5 U of Klenow fragment (New England Biolabs) followed by phenol/chloroform extraction and ethanol precipitation. Illumina libraries were prepared for each sample, the material was sheared to 400–500 bp in length, separately indexed, and multiplexed at ninety-six samples per HiSeq 2500 run, generating two to three million 250-nt paired-end reads per sample.

2.3 De novo assembly and identification of Coronaviridae contigs

Raw sequencing reads were trimmed to remove residual sequencing adapters and trimmed from the 3′ end to a median Phred score > 35 using QUASR (Watson et al. 2013). The quality-controlled reads were assembled into contigs by de novo assembly with SPAdes 3.10 (Bankevich et al. 2012). Coverage was estimated for the contigs, followed by additional filtering with a minimum contig size cutoff (300 nt). Final details on the genomes including GenBank accession numbers, sample locations, and collection dates can be found in Table 1. For most of the samples, complete or nearly complete genomes were obtained from the original SPAdes assembly. However, in a subset of samples (usually those with mixed infections or with very high sequence coverage), SPAdes yielded two or more subgenomic contigs that were manually checked by consulting the short reads and re-assembled.
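The size filter applied after assembly can be illustrated with a minimal sketch. It is not the authors' pipeline code: the function names are hypothetical, and treating the 300-nt cutoff as inclusive (length ≥ 300) is an assumption, since the text only gives the cutoff value.

```python
from typing import Dict, Iterable, Iterator, Tuple

MIN_CONTIG_NT = 300  # minimum contig size cutoff stated in the text


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(lines: Iterable[str],
                   min_len: int = MIN_CONTIG_NT) -> Dict[str, str]:
    """Keep contigs whose length is at least min_len (inclusive: an assumption)."""
    return {h: s for h, s in read_fasta(lines) if len(s) >= min_len}
```

In practice the input would be the lines of the assembler's contig FASTA file, for example `filter_contigs(open(path))` with a hypothetical `path`.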
Table 1.

Genome metrics. Compilation of metrics for the new Coronaviridae sequences reported here. The table includes for each genome the genome ID and GenBank accession number, and the Coronaviridae genus as identified by the tool described in this article. Also included are the host species, the sample collection date, and location data. Finally, the number of short reads in the sample mapping to the complete genome (using bowtie2-2.2.3 (Langmead et al. 2009) with the ‘--very-sensitive-local’ setting) and the final genome length are reported.

| Genome_ID | GenBank [a] | Corona_genus | Host | Host_species | Farm_ID | Collection_date | District | Total_qc_reads (for) [b] | Mapped_reads [c] | Genome_length (nt) |
|---|---|---|---|---|---|---|---|---|---|---|
| 16715_23 | MH687934 | Alpha | Bat | Scotophilus kuhlii | 55 | 11-Jun-2014 | Chau Thanh | 2,336,433 | 57,097 | 28,864 |
| 16715_24 | MH687935 | Alpha | Bat | Scotophilus kuhlii | 98 | 17-Jun-2014 | Cao Lanh | 2,957,491 | 55,684 | 29,152 |
| 16715_31 | MH687936 | Alpha | Bat | Scotophilus kuhlii | 98 | 17-Jun-2014 | Cao Lanh | 2,419,725 | 18,696 | 28,828 |
| 16715_32 | MH687937 | Alpha | Bat | Scotophilus kuhlii | 98 | 17-Jun-2014 | Cao Lanh | 2,830,496 | 60,643 | 28,297 |
| 16715_39_c1 | MH687938 | Alpha | Bat | Scotophilus kuhlii | 98 | 17-Jun-2014 | Cao Lanh | 2,235,269 | 51,709 | 28,238 |
| 16715_39_c2 | MH687939 | Alpha | Bat | Scotophilus kuhlii | 98 | 17-Jun-2014 | Cao Lanh | 2,235,269 | 11,042 | 28,307 |
| 16715_45 | MH687940 | Alpha | Bat | Scotophilus kuhlii | 99 | 10-Jun-2014 | Cao Lanh | 2,544,491 | 12,463 | 28,170 |
| 16715_47_c1 | MH687941 | Alpha | Bat | Scotophilus kuhlii | 98 | 17-Jun-2014 | Cao Lanh | 2,723,720 | 17,632 | 28,257 |
| 16715_47_c2 | MH687942 | Alpha | Bat | Scotophilus kuhlii | 98 | 17-Jun-2014 | Cao Lanh | 2,723,720 | 16,651 | 28,321 |
| 16715_5 | MH687943 | Alpha | Bat | Scotophilus kuhlii | 99 | 10-Jun-2014 | Cao Lanh | 2,556,708 | 32,056 | 28,272 |
| 16715_53 | MH687944 | Alpha | Bat | Scotophilus kuhlii | 99 | 10-Jun-2014 | Cao Lanh | 3,260,590 | 6,251 | 28,400 |
| 16715_56 | MH687945 | Alpha | Bat | Scotophilus kuhlii | 99 | 16-Sep-2014 | Cao Lanh | 2,596,084 | 32,895 | 28,481 |
| 16715_61 | MH687946 | Alpha | Bat | Scotophilus kuhlii | 99 | 10-Jun-2014 | Cao Lanh | 2,910,089 | 3,464 | 28,173 |
| 16715_63 | MH687947 | Alpha | Bat | Scotophilus kuhlii | 98 | 17-Jun-2014 | Cao Lanh | 2,619,814 | 124,453 | 29,462 |
| 16715_7 | MH687948 | Alpha | Bat | Scotophilus kuhlii | 55 | 11-Jun-2014 | Chau Thanh | 2,304,297 | 59,108 | 28,340 |
| 16715_76 | MH687949 | Alpha | Bat | Scotophilus kuhlii | 99 | 10-Jun-2014 | Cao Lanh | 2,618,662 | 36,601 | 28,232 |
| 16715_77 | MH687950 | Alpha | Bat | Scotophilus kuhlii | 99 | 10-Jun-2014 | Cao Lanh | 2,703,803 | 7,848 | 28,303 |
| 16715_78 | MH687951 | Alpha | Bat | Scotophilus kuhlii | 55 | 11-Jun-2014 | Chau Thanh | 2,540,982 | 81,202 | 29,118 |
| 16715_84 | MH687952 | Alpha | Bat | Scotophilus kuhlii | 99 | 10-Jun-2014 | Cao Lanh | 2,334,464 | 5,629 | 27,225 |
| 16715_86 | MH687953 | Alpha | Bat | Scotophilus kuhlii | 55 | 11-Jun-2014 | Chau Thanh | 2,643,553 | 79,660 | 28,747 |
| 16845_24 | MH687954 | Alpha | Bat | Scotophilus kuhlii | 55 | 18-Sep-2014 | Chau Thanh | 2,794,660 | 4,334 | 28,333 |
| 16845_47 | MH687955 | Alpha | Bat | Scotophilus kuhlii | 98 | 17-Sep-2014 | Cao Lanh | 2,250,435 | 5,380 | 28,437 |
| 16845_53 | MH687956 | Alpha | Bat | Scotophilus kuhlii | 99 | 16-Sep-2014 | Cao Lanh | 2,589,540 | 183,763 | 28,562 |
| 16845_64 | MH687957 | Alpha | Bat | Scotophilus kuhlii | 55 | 18-Sep-2014 | Chau Thanh | 3,114,858 | 4,264 | 28,173 |
| 16845_87 | MH687958 | Alpha | Bat | Scotophilus kuhlii | 55 | 18-Sep-2014 | Chau Thanh | 2,453,660 | 6,514 | 28,054 |
| 17819_17 | MH687959 | Alpha | Bat | Scotophilus kuhlii | 98 | 12-Nov-2014 | Cao Lanh | 2,555,417 | 27,177 | 28,706 |
| 17819_22 | MH687960 | Alpha | Bat | Scotophilus kuhlii | 55 | 13-Nov-2014 | Chau Thanh | 2,546,117 | 2,306 | 27,491 |
| 17819_4 | MH687961 | Alpha | Bat | Scotophilus kuhlii | 55 | 13-Nov-2014 | Chau Thanh | 2,740,662 | 5,386 | 28,380 |
| 17819_50 | MH687962 | Alpha | Bat | Scotophilus kuhlii | 55 | 13-Nov-2014 | Chau Thanh | 2,392,592 | 3,800 | 28,053 |
| 20724_95 | MH687963 | Alpha | Bat | Scotophilus kuhlii | 99 | 6-Feb-2015 | Cao Lanh | 7,568,967 | 11,331 | 28,169 |
| 20745_10 | MH687964 | Alpha | Bat | Scotophilus kuhlii | 99 | 6-Feb-2015 | Cao Lanh | 1,043,879 | 6,913 | 28,210 |
| 20745_17 | MH687965 | Alpha | Bat | Scotophilus kuhlii | 55 | 13-Feb-2015 | Chau Thanh | 1,862,525 | 36,558 | 28,199 |
| 20745_6 | MH687966 | Alpha | Bat | Scotophilus kuhlii | 99 | 6-Feb-2015 | Cao Lanh | 9,273,648 | 36,639 | 28,628 |
| 20745_8 | MH687967 | Alpha | Bat | Scotophilus kuhlii | 99 | 6-Feb-2015 | Cao Lanh | 12,697,144 | 4,021 | 28,038 |
| 16715_52 | MH687968 | Beta | Rat | Rattus argentiventer | 63 | 14-Nov-2014 | Cao Lanh | 2,519,218 | 30,551 | 31,047 |
| 20724_33 | MH687969 | Beta | Rat | Rattus argentiventer | 65 | 12-Nov-2014 | Tam Nong | 536,662 | 9,918 | 31,976 |
| 20724_34_c12 | MH687970 | Beta | Rat | Rattus argentiventer | 65 | 12-Nov-2014 | Tam Nong | 740,717 | 8,579 | 31,038 |
| 20724_34_c13 | MH687971 | Beta | Rat | Rattus argentiventer | 65 | 12-Nov-2014 | Tam Nong | 740,717 | 18,463 | 31,171 |
| 20724_38 | MH687972 | Beta | Rat | Rattus argentiventer | 65 | 12-Nov-2014 | Tam Nong | 8,568,196 | 14,281 | 31,334 |
| 20724_39 | MH687973 | Beta | Rat | Rattus argentiventer | 65 | 12-Nov-2014 | Tam Nong | 6,703,538 | 68,652 | 31,389 |
| 20724_43 | MH687974 | Beta | Rat | Rattus argentiventer | 65 | 12-Nov-2014 | Tam Nong | 255,794 | 27,831 | 31,224 |
| 22054_56 | MH687975 | Beta | Rat | not available | 62 | 9-Dec-2013 | Cao Lanh | 442,567 | 2,982 | 29,727 |
| 22084_1 | MH687976 | Beta | Rat | Rattus argentiventer | 65 | 4-Feb-2015 | Tam Nong | 29,862,809 | 100,936 | 31,068 |
| 22084_10 | MH687977 | Beta | Rat | Rattus argentiventer | 65 | 4-Feb-2015 | Tam Nong | 343,514 | 21,210 | 31,355 |
| 22084_6 | MH687978 | Beta | Rat | Rattus argentiventer | 65 | 4-Feb-2015 | Tam Nong | 28,864,231 | 121,999 | 31,289 |

[a] GenBank accession number.

[b] Total paired reads (after quality control).

[c] Total reads mapped to the final genome.

2.4 Phylogenetic analyses

Global coronavirus reference sequences sharing ≥80 per cent nt similarity to the CoVs reported in this study were retrieved from GenBank, in addition to selected reference CoV sequences for comparison. The spike protein coding regions from all reference and assembled Coronaviridae sequences were extracted and aligned in MUSCLE (Edgar 2004), followed by a manual check in AliView (Larsson 2014). The best-fit nucleotide substitution models were determined in IQ-TREE v1.5.2 using the Akaike Information Criterion (Nguyen et al. 2015). Maximum likelihood (ML) phylogenetic trees were inferred in IQ-TREE employing the GTR + I + Γ4 model of substitution, with bootstrapping for 1,000 pseudoreplicates. Bootstrap values of ≥70 per cent were considered statistically significant, and the resulting trees were visualized and edited in FigTree v1.4.3 (Rambaut 2016).

2.5 Protein domain database

Using the HMMER3 hmmsearch function (Eddy 2011) and the Pfam collection of profile HMM protein domains, we examined all available Coronaviridae full genome sequences from GenBank (as of January 2018; N = 2,255). A set of seventy-nine Pfam domains was found at least once in a Coronaviridae genome, and these domains formed the basis for the classification methods explored here. The seventy-nine Pfam domains used for Coronaviridae classification and their frequencies in the set of 2,255 Coronaviridae genomes are listed in Table 2.
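As an illustration of how such domain frequencies could be tallied, the following is a minimal sketch that counts hits per domain from HMMER3 tabular (`--tblout`) output. It assumes `hmmsearch` output, where the third whitespace-separated column is the query (profile HMM) name and the fifth is the full-sequence E-value; for `hmmscan` output the domain name is in the first column, so `domain_col=0` would be used. The function name and the choice of the full-sequence E-value are assumptions; the text only states that hits with E-value < 0.01 were counted.

```python
from collections import Counter
from typing import Iterable

E_VALUE_CUTOFF = 0.01  # threshold stated in the text


def count_domain_hits(tblout_lines: Iterable[str],
                      cutoff: float = E_VALUE_CUTOFF,
                      domain_col: int = 2) -> Counter:
    """Count hits per Pfam domain in HMMER3 --tblout lines.

    domain_col=2 assumes hmmsearch output (query = profile HMM);
    use domain_col=0 for hmmscan output (target = profile HMM).
    Column 4 is taken as the full-sequence E-value (an assumption about
    which E-value was used in the study).
    """
    counts: Counter = Counter()
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HMMER comment/header lines
        fields = line.split()
        if float(fields[4]) < cutoff:
            counts[fields[domain_col]] += 1
    return counts
```

Applied to the hits from all genomes, the resulting counter corresponds to a per-domain frequency column of the kind reported in Table 2.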
Table 2.

Pfam domains used for Coronaviridae classification. A compilation of the Pfam domains used for classification of the Coronaviridae sequences. All available full Coronaviridae genome sequences in GenBank were retrieved using the query ‘txid11118[Organism] AND 25600[SLEN]:48000[SLEN] NOT patent’ to yield a set of 2,255 sequences (3 January 2018). All open reading frames encoding peptides > 100 amino acids in length were analyzed with the hmmer-3.1b2-hmmscan program (Eddy 2011) and the complete Pfam A database (http://Pfam.xfam.org/, Finn et al. 2016). Domains with E-value < 0.01 were counted and the domain frequencies in the set of 2,255 Coronaviridae genomes were reported.

| Pfam_id [a] | Name | Frequency [b] | Category [c] |
|---|---|---|---|
| pfam13086 | AAA_11 | 3,712 | CATD |
| pfam06460 | NSP13 | 2,492 | CATD |
| pfam13604 | AAA_30 | 2,438 | CTD |
| pfam01443 | Viral_helicase1 | 2,427 | CTD |
| pfam01661 | Macro | 2,390 | CTD |
| pfam01600 | Corona_S1 | 6,838 | abundant |
| pfam01601 | Corona_S2 | 5,446 | abundant |
| pfam00937 | Corona_nucleoca | 4,993 | abundant |
| pfam01635 | Corona_M | 3,794 | abundant |
| pfam08715 | Viral_protease | 3,290 | abundant |
| pfam06478 | Corona_RPol_N | 2,712 | abundant |
| pfam09408 | Spike_rec_bind | 2,599 | abundant |
| pfam06471 | NSP11 | 2,466 | abundant |
| pfam13087 | AAA_12 | 2,434 | abundant |
| pfam13538 | UvrD_C_2 | 2,432 | abundant |
| pfam05409 | Peptidase_C30 | 2,406 | abundant |
| pfam08717 | nsp8 | 2,398 | abundant |
| pfam08716 | nsp7 | 2,396 | abundant |
| pfam16348 | Corona_NSP4_C | 2,386 | abundant |
| pfam09401 | NSP10 | 2,384 | abundant |
| pfam08710 | nsp9 | 2,382 | abundant |
| pfam13245 | AAA_19 | 2,067 | abundant |
| pfam00680 | RdRP_1 | 1,868 | abundant |
| pfam03053 | Corona_NS3b | 1,773 | abundant |
| pfam16451 | Spike_NTD | 1,461 | abundant |
| pfam16251 | NAR | 1,142 | abundant |
| pfam11633 | SUD-M | 855 | moderate |
| pfam03187 | Corona_I | 586 | moderate |
| pfam03996 | Hema_esterase | 563 | moderate |
| pfam02710 | Hema_HEFG | 554 | moderate |
| pfam03262 | Corona_6B_7B | 534 | moderate |
| pfam02723 | NS3_envE | 495 | moderate |
| pfam03620 | IBV_3C | 474 | moderate |
| pfam08779 | SARS_X4 | 414 | moderate |
| pfam11289 | APA3_viroporin | 390 | moderate |
| pfam09399 | SARS_lipid_bind | 387 | moderate |
| pfam11501 | Nsp1 | 371 | moderate |
| pfam12124 | Nsp3_PL2pro | 371 | moderate |
| pfam12379 | DUF3655 | 370 | moderate |
| pfam12383 | SARS_3b | 363 | moderate |
| pfam11963 | DUF3477 | 340 | moderate |
| pfam04753 | Corona_NS2 | 322 | moderate |
| pfam01831 | Peptidase_C16 | 315 | moderate |
| pfam05213 | Corona_NS2A | 276 | moderate |
| pfam13563 | 2_5_RNA_ligase2 | 245 | moderate |
| pfam10469 | AKAP7_NLS | 238 | moderate |
| pfam16688 | CNV-Replicase_N | 224 | moderate |
| pfam02398 | Corona_7 | 176 | moderate |
| pfam10943 | DUF2632 | 92 | moderate |
| pfam12093 | Corona_NS8 | 78 | moderate |
| pfam17072 | Spike_torovirin | 70 | moderate |
| pfam03905 | Corona_NS4 | 51 | moderate |
| pfam05528 | Coronavirus_5 | 48 | moderate |
| pfam07204 | Orthoreo_P10 | 27 | rare |
| pfam00035 | dsrm | 16 | rare |
| pfam04694 | Corona_3 | 14 | rare |
| pfam11030 | Nucleocapsid-N | 12 | rare |
| pfam00943 | Alpha_E2_glycop | 7 | rare |
| pfam00270 | DEAD | 7 | rare |
| pfam03622 | IBV_3B | 7 | rare |
| pfam13238 | AAA_18 | 5 | rare |
| pfam12226 | Astro_capsid_p | 5 | rare |
| pfam11395 | DUF2873 | 5 | rare |
| pfam00523 | Fusion_gly | 5 | rare |
| pfam00485 | PRK | 5 | rare |
| pfam07690 | MFS_1 | 4 | rare |
| pfam04582 | Reo_sigmaC | 4 | rare |
| pfam01481 | Arteri_nucleo | 2 | rare |
| pfam06336 | Corona_5a | 2 | rare |
| pfam00517 | GP41 | 2 | rare |
| pfam08291 | Peptidase_M15_3 | 2 | rare |
| pfam00069 | Pkinase | 2 | rare |
| pfam07714 | Pkinase_Tyr | 2 | rare |
| pfam01815 | Rop | 2 | rare |
| pfam00083 | Sugar_tr | 2 | rare |
| pfam00704 | Glyco_hydro_18 | 1 | rare |
| pfam01358 | PARP_regulatory | 1 | rare |
| pfam02123 | RdRP_4 | 1 | rare |
| pfam00429 | TLV_coat | 1 | rare |

[a] Pfam domains from Pfam 31.0 at http://Pfam.xfam.org/

[b] The frequency of the domain occurrence in a set of all Coronaviridae genome sequences (2,255 entries) retrieved from GenBank on 3 January 2018.

[c] CATD, Coronaviridae Absolute Triage Domain; CTD, Coronaviridae Triage Domain. See text for details. Abundant, moderate, and rare indicate the frequency of the domain in all Coronaviridae genome sequences.
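The peptide extraction step mentioned in the Table 2 legend (translating a nucleotide sequence and keeping peptides of about 100 amino acids or longer before the HMM search) can be sketched as follows. This is an illustrative reading, not the authors' code: it uses the standard genetic code, treats every stop-to-stop segment in each of the six reading frames as a candidate peptide (no start codon is required), and the function names and the inclusive `min_len` default are assumptions.

```python
from itertools import product
from typing import List

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; unknown codons (e.g. containing N) give 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_peptides(nt: str, min_len: int = 100) -> List[str]:
    """Collect stop-to-stop peptides of at least min_len residues from all six frames."""
    nt = nt.upper().replace("U", "T")
    peptides = []
    for strand in (nt, reverse_complement(nt)):
        for offset in range(3):
            for pep in translate(strand[offset:]).split("*"):
                if len(pep) >= min_len:
                    peptides.append(pep)
    return peptides
```

The collected peptides would then be written to a FASTA file and passed to the HMMER3 search against the Pfam profile HMMs.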

A Random Forest (RF) classification using the Scikit-learn (Pedregosa et al. 2011) RandomForestClassifier module was performed on the initial triage contigs using a full genome Coronaviridae genera training set as follows. For all full genomes in each of the six Coronaviridae genera, all six reading frames were translated and peptides ≥ 100 amino acids in length were collected. HMMER3 hmmsearch (Eddy 2011) was applied to the set of peptides, screening against the Pfam domains found in the Coronaviridae (see Table 2). For domain hits with E-values < 0.01, the Pfam domain scores were collected into an array organized by genera. An RF model using these six genera sets was then trained and built from 1,000 random trees.

Classification of novel query sequences was performed as follows. For each query nucleotide sequence, all six reading frames were translated and peptides ≥ 100 amino acids in length were collected. HMMER3 hmmsearch (Eddy 2011) was applied to the set of peptides, screening against the database of Coronaviridae Pfam domains, and for domain hits with E-values < 0.01, the domain scores were collected into an array. An initial triage was performed to retain only contigs with at least one of the five Coronaviridae Triage Domains (CTDs, see below) to yield the initial triage contigs. The RF model was then applied to each query contig and the genus-level probability was calculated as the mean predicted probability of the 1,000 random trees in the forest. A CSV table with the probabilities and a heatmap of the same data are the output from the process.

To facilitate the use of this tool for virus discovery and classification, the process of HMMER3 Pfam domain identification, the encoding of the domain content into a matrix, and the RF classification against all available Coronaviridae genomes were incorporated into a single, platform-independent Docker image (available for download at https://hub.docker.com/r/matthewcotten/cotten_myphan_coronavirus_classification_tool/). The tool can be downloaded and installed on any computing platform (Unix, Mac, Windows) and includes all required dependencies. Further details on installation and running the tool can be found at the Docker hub link.
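The genus-level classification step can be illustrated with a small sketch using the Scikit-learn RandomForestClassifier named above. Everything except the classifier and the choice of 1,000 trees is an assumption made for illustration: the feature names are placeholders rather than real Pfam domains, the bit scores are invented toy values, only three genera are shown, and `random_state` is set purely for reproducibility.

```python
from sklearn.ensemble import RandomForestClassifier

# Placeholder feature order: one column per domain score (hypothetical names).
FEATURES = ["domain_A", "domain_B", "domain_C", "shared_domain"]

# Invented toy training rows: domain bit scores per full genome (0 = no hit).
X_train = [
    [290.0, 0.0, 0.0, 120.0], [300.0, 0.0, 0.0, 120.0],
    [310.0, 0.0, 0.0, 120.0], [305.0, 0.0, 0.0, 120.0],
    [0.0, 410.0, 0.0, 120.0], [0.0, 400.0, 0.0, 120.0],
    [0.0, 395.0, 0.0, 120.0], [0.0, 420.0, 0.0, 120.0],
    [0.0, 0.0, 250.0, 120.0], [0.0, 0.0, 260.0, 120.0],
    [0.0, 0.0, 245.0, 120.0], [0.0, 0.0, 255.0, 120.0],
]
y_train = (["Alphacoronavirus"] * 4 + ["Betacoronavirus"] * 4
           + ["Gammacoronavirus"] * 4)

# 1,000 trees as in the text; random_state is an added assumption.
model = RandomForestClassifier(n_estimators=1000, random_state=0)
model.fit(X_train, y_train)


def genus_probabilities(rf, domain_scores):
    """Return {genus: probability}; predict_proba averages over the trees."""
    proba = rf.predict_proba([domain_scores])[0]
    return {str(genus): float(p) for genus, p in zip(rf.classes_, proba)}


query = [300.0, 0.0, 0.0, 120.0]  # invented score vector for one query contig
probs = genus_probabilities(model, query)
best_genus = max(probs, key=probs.get)
```

In the workflow described above, the score vector for each query contig would come from the HMMER3 hits rather than being written by hand, and the per-genus probabilities would be written to the CSV table.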

2.6 Forty-one virus mock contig set

To test the specificity and sensitivity of the classification methods, a test set of random genome fragments from forty-one virus families was prepared. The set was derived from 492 full genomes: 12 genomes from each of 40 virus families (480 genomes) plus 2 genomes from each of the 6 Coronaviridae genera (12 genomes). For each genome, 300 random fragments in the size range from 500 nt to full genome size were prepared and combined, resulting in a total of 111,577 fragments including 3,316 Coronaviridae fragments.
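One way to generate such a mock contig set is sketched below. The sampling scheme (fragment length drawn uniformly between 500 nt and the genome length, start position drawn uniformly) is an assumption, as the text does not specify the distribution used, and the function name is hypothetical.

```python
import random
from typing import List, Optional

# Genome count stated in the text: 12 x 40 families + 2 x 6 Coronaviridae genera.
N_GENOMES = 12 * 40 + 2 * 6  # = 492


def random_fragments(genome: str, n: int = 300, min_len: int = 500,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Draw n random sub-sequences with lengths from min_len to the full genome.

    Uniform sampling of length and start position is an assumption.
    """
    rng = rng or random.Random()
    if len(genome) < min_len:
        return []  # genome too short to yield any fragment
    fragments = []
    for _ in range(n):
        length = rng.randint(min_len, len(genome))       # inclusive bounds
        start = rng.randint(0, len(genome) - length)     # fragment stays in range
        fragments.append(genome[start:start + length])
    return fragments
```

Passing a seeded `random.Random` instance makes the fragment set reproducible between runs.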

3. Results

The Pfam domain content of all 2,255 available Coronaviridae genomes was determined and a variety of distribution patterns were observed, from domains whose frequency across all Coronaviridae was >95 per cent, to rare domains present in only a few known genomes. To illustrate this variety, two example genomes from each of Alphacoronavirus, Betacoronavirus, Gammacoronavirus, Deltacoronavirus, Torovirus, and Bafinivirus were selected; all domains encoded by the full genomes were identified and their positions in each virus genome are marked by colored rectangles indicating frequent, moderate, or rare occurrence (Fig. 2). The domain content is both extensive and varied by genus, and this information might be used to identify and classify Coronaviridae sequences.
Figure 2.

The distribution of Pfam domains across Coronaviridae genera. Panel A: Two examples of Alpha-, Beta-, Gamma-, Delta-, Toro- and Bafinivirus were selected and all protein domains encoded by the full genomes, detected by profile HMMs, were identified and their positions in each virus genome are marked by colored rectangles. Panel B: The Coronaviridae Absolute Triage Domains (CATDs) are marked in red, the Coronaviridae Triage Domains (CTDs) are marked in orange, the frequent Pfam domains are marked in shades of blue, the moderately frequent Pfam domains are marked in shades of green and the rare Pfam domains are marked in gray.

A hierarchical clustering of the domain content of all genomes in each genus showed three distribution patterns (Fig. 3): domains present at high frequency in a single genus (upper third of the cluster map), domains present at high frequency in most or all genera (bottom third of the cluster map), and domains present at low frequency in some genomes or genera (middle of the cluster map). In particular, five domains (AAA_30, Macro, Viral_helicase1, AAA_11, NSP13) were found to be encoded by genomes from all six Coronaviridae genera and in >95 per cent of the 2,255 Coronaviridae genomes examined. We define these domains as Coronaviridae Triage Domains (CTDs). Of the five CTDs, three were promiscuous (encoded in all Coronaviridae genomes as well as in other virus families), while two domains (AAA_11 and NSP13) appeared specific for Coronaviridae: they were encoded in all Coronaviridae genomes but were not found in genomes from forty other common virus families infecting animals (results not shown). We term these two domains Coronaviridae Absolute Triage Domains (CATDs).
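The selection criterion for triage domains described above (encoded in every genus and in more than 95 per cent of genomes) can be expressed as a short sketch. The input layout, a list of (genus, set of domain names) pairs with one entry per genome, and the function name are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Iterable, List, Set, Tuple


def triage_domains(genomes: Iterable[Tuple[str, Set[str]]],
                   threshold: float = 0.95) -> List[str]:
    """Return domains seen in every genus and in > threshold of all genomes.

    genomes: (genus, set_of_domain_names) pairs, one per genome (assumed layout).
    """
    genomes = list(genomes)
    n_genomes = len(genomes)
    all_genera = {genus for genus, _ in genomes}
    genomes_with = Counter()          # number of genomes encoding each domain
    genera_with = defaultdict(set)    # genera in which each domain was seen
    for genus, domains in genomes:
        for domain in set(domains):
            genomes_with[domain] += 1
            genera_with[domain].add(genus)
    return sorted(
        d for d in genomes_with
        if genomes_with[d] / n_genomes > threshold
        and genera_with[d] == all_genera
    )
```

Separating the absolute triage domains from the promiscuous ones would additionally require the domain content of genomes from other virus families, which this sketch does not cover.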
Figure 3.

Cluster map of Pfam protein domains encoded by Coronaviridae genomes. The protein domain repertoire, as detected by profile HMMs, is plotted as the frequency of each domain in all available full genomes from all Coronaviridae genera. Each row represents a protein domain, each column represents a Coronaviridae genus. Colors indicate domain frequency within that genus (darkest blue = 1 = all genomes in this genus encode this domain; white = 0 = no genomes in this genus encode this domain, see color bar at upper left).

Cluster map of Pfam protein domains encoded by Coronaviridae genomes. The protein domain repertoire, as detected by profile HMMs, is plotted as the frequency of each domain in all available full genomes from all Coronaviridae genera. Each row represents a protein domain, each column represents a Coronaviridae genus. Colors indicate domain frequency within that genus (darkest blue = 1 = all genomes in this genus encode this domain; white = 0 = no genomes in this genus encode this domain, see color bar at upper left). Sensitivity and specificity plot of various triage conditions. The HMM domain content of the forty-one virus mock contig set (111,577 viral genome fragments including 3,316 Coronaviridae fragments) was determined for each fragment. The CTD or CATD domain content plus the contig length (≥500 nt, ≥3,000 nt, ≥10,000 nt, ≥20,000 nt) were used as a triage to classify fragments as ‘Coronaviridae’ or ‘not Coronaviridae’. The contigs classified as Coronaviridae for each triage condition were then identified to the genus level using RF classification. The sensitivity (true positive/true positive + false negative) and specificity (true negative/true negative + false positive) for each combined triage and classification method were determined based on the original identity of the input genomes. Panel A. RF classification after triage by 500 nt or larger and CTD or CATD content. Panel B. As in A but with 3,000 nt or larger contigs. Panel C. As in A but with 10,000 nt or larger contigs. Panel D. As in A but with 20,000 nt or larger contigs. Each colored node represents the outcome of one complete triage/classification cycle, each combined method was repeated five times. The utility of domain content for Coronaviridae classification was first tested by developing a simple triage method to identify potential Coronaviridae sequence contigs. Preliminary work identified four triage conditions as useful for this purpose. 
These triage conditions were CTD_any (the contig encodes at least one of the five CTDs), CTD_all (the contig encodes all five CTDs), CATD_any (the contig encodes at least one of the two CATDs), and CATD_all (the contig encodes both CATDs). The presence of these domains was combined with contig length cutoffs of 500 nt, 3,000 nt, 10,000 nt, and 20,000 nt, and the performance of each triage condition for identifying coronavirus contigs was examined. Using a mock contig set derived from forty-one virus families (see Section 2), the Pfam domain content of each fragment was determined using HMMER3 and used to sort the fragments into Coronaviridae groups based on fragment length plus CTD and CATD domain content. The accuracy of the classification was assessed against the classification in the original GenBank genome annotation (Fig. 4). We ran each classification process five times to control for the random selection of features. The analyses were run with four size cutoff classes (≥500 nt, panel A; ≥3,000 nt, panel B; ≥10,000 nt, panel C; and ≥20,000 nt, panel D; Fig. 4), and the sensitivity/specificity values for correct classification were color and shape coded by triage method. The highest sensitivity/specificity values were obtained with the CTD_all triage for all four size classes (dark blue triangles, Fig. 4). Overall performance increased in the order CTD_any < CATD_any < CATD_all < CTD_all. Combining either the CTD_all or CATD_all triage with a 20,000-nt size cutoff and RF classification using all Coronaviridae domains classified Coronaviridae sequence fragments with both sensitivity > 0.9 and specificity > 0.975.

We next applied this protein domain-based method to classify Coronaviridae genomic sequences generated from next-generation sequencing (NGS) surveillance data. The NGS data were derived from bat and rat fecal samples collected in the Dong Thap province (Fig. 1) and processed for agnostic, random-primed NGS as described in Section 2. All de novo assembled Coronaviridae contigs that passed the quality control and minimum length cutoff were subjected to Pfam domain content identification, triage by CATD content and length, and RF classification. This process is summarized in Fig. 5. The process identified thirty-four potential Coronaviridae genomes from 177 bat fecal samples and eleven Coronaviridae genomes from 391 rat fecal samples. These forty-five genomes were classified to the Coronaviridae genus level using the Coronaviridae classification tool (Fig. 6).
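The triage conditions and their scoring against known labels can be sketched as follows. This is a minimal illustration under stated assumptions, not the published tool: the domain identifiers in `CTD` and `CATD` are placeholders (the actual five CTDs and two CATDs are Pfam domains defined in Section 2), the helper names `triage` and `sensitivity_specificity` are hypothetical, and the toy fragments are invented.

```python
from typing import Iterable, Set, Tuple

# Placeholder identifiers only; the real CTDs and CATDs are Pfam domains
# defined in the Methods, not these names.
CTD: Set[str] = {"ctd_1", "ctd_2", "ctd_3", "ctd_4", "ctd_5"}
CATD: Set[str] = {"catd_1", "catd_2"}


def triage(domains: Set[str], length: int, condition: str, min_length: int) -> bool:
    """Return True if a contig meets the length cutoff and the domain condition."""
    if length < min_length:
        return False
    if condition == "CTD_any":
        return bool(domains & CTD)      # at least one of the five CTDs
    if condition == "CTD_all":
        return CTD <= domains           # all five CTDs
    if condition == "CATD_any":
        return bool(domains & CATD)     # at least one of the two CATDs
    if condition == "CATD_all":
        return CATD <= domains          # both CATDs
    raise ValueError(f"unknown triage condition: {condition}")


def sensitivity_specificity(calls: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) from (predicted, truth) pairs.

    sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    tp = fn = tn = fp = 0
    for predicted, truth in calls:
        if truth and predicted:
            tp += 1
        elif truth:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Invented mock fragments: (encoded domains, length in nt, truly Coronaviridae?)
fragments = [
    (CTD | CATD, 25_000, True),
    ({"ctd_1"}, 12_000, True),
    ({"ctd_2", "other"}, 4_000, False),
    ({"other"}, 30_000, False),
]

for condition in ("CTD_any", "CTD_all"):
    calls = [(triage(d, n, condition, 10_000), truth) for d, n, truth in fragments]
    print(condition, sensitivity_specificity(calls))
```

On this toy set the stricter `CTD_all` condition misses the fragment that encodes a single CTD, which shows how a triage condition trades sensitivity against specificity on partial genomes.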
Figure 5.

Workflow of the Coronaviridae classification tool to identify Coronaviridae genomes in NGS data. First, short read NGS data from surveillance samples were de novo assembled into larger contigs using SPAdes. Subsequently, putative Coronaviridae genome sequences were identified by their encoded triage domains (contig length > 10,000 nt and the presence of at least one CTD) followed by machine learning classification (using RF) to the Coronaviridae genus level.

Figure 6.

Identification of Coronaviridae genomes. De novo assembled contigs from rat and bat sample data sets were processed using a triage (contig length > 10,000 nt and the presence of at least one CTD) followed by RF classification to the Coronaviridae genus level. About forty-five samples contained Alphacoronavirus and Betacoronavirus sequences with probabilities > 0.5 (darker blue in the heatmap). These sequences were included in the complete set of samples processed for full genome coronavirus handling. Panel A. Heatmap of predicted Coronaviridae genus probabilities. Panel B. Table of predicted probabilities.
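The 0.5 probability threshold used in the heatmap can be illustrated with a short sketch. The function name `call_genus`, the dictionary layout, the subset of genera shown, and the probabilities are all assumptions made for illustration; they are not output of the published classifier.

```python
from typing import Dict, Optional


def call_genus(probabilities: Dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Return the most probable genus if its probability exceeds the threshold."""
    genus, prob = max(probabilities.items(), key=lambda item: item[1])
    return genus if prob > threshold else None


# Invented per-contig genus probabilities (illustrative subset of genera).
predictions = {
    "bat_contig": {"Alphacoronavirus": 0.92, "Betacoronavirus": 0.05,
                   "Gammacoronavirus": 0.02, "Deltacoronavirus": 0.01},
    "rat_contig": {"Alphacoronavirus": 0.10, "Betacoronavirus": 0.85,
                   "Gammacoronavirus": 0.03, "Deltacoronavirus": 0.02},
    "ambiguous_contig": {"Alphacoronavirus": 0.40, "Betacoronavirus": 0.35,
                         "Gammacoronavirus": 0.15, "Deltacoronavirus": 0.10},
}

for name, probs in predictions.items():
    print(name, call_genus(probs))
```

A contig whose highest genus probability does not exceed the threshold is left unassigned (`None`) rather than forced into a genus.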

Note that screening using specific PCR targeting the conserved RNA-dependent RNA polymerase (RdRp) gene had previously identified CoV sequences in some of the same samples (29). The RdRp sequences generated in that study were closest to the bat Alphacoronavirus NC_009657 (Scotophilus bat coronavirus 512) and the rat Betacoronaviruses NC_026011 (Betacoronavirus HKU24) or KF294372 (Longquan Rl rat coronavirus). These are likely to be the same viruses described here at the full genome level. Two lineages of Alphacoronavirus were identified in the surveyed bat samples. One group was related to the Scotophilus bat coronavirus 512 (GenBank NC_009657, Tang et al. 2006), with 96 per cent shared identity across the genome. The other group of Alphacoronavirus was distant from any known Alphacoronavirus strains and may represent new species.
Two groups of Betacoronavirus were identified in rat samples. The closest available virus genomes in GenBank were the Betacoronavirus_HKU24 (KM349743) and the Longquan mouse coronavirus (KF294357), with 95 and 94 per cent shared identity across the entire genome, respectively. The next closest coronaviruses were the human coronavirus OC43 and the porcine hemagglutinating encephalomyelitis virus (PHEV), with ∼70 per cent shared amino acid identity across the entire genome. The genome organization of the new coronaviruses was similar to that of the closest reference genomes, with similar open reading frame organization and similar Pfam domains (Fig. 7A). Furthermore, the expected ribosome slippage sequences between ORF 1A and 1AB, as well as the repeat sequences and protease cleavage sites, were all present in the new CoV genomes (results not shown).
Figure 7.

Analyses of identified coronavirus genomes. Panel A. Open reading frames and domain content of the three classes of coronavirus identified in this study. All open reading frames > 130 amino acids in length and the Pfam domains are displayed for an example reported genome from each of lineages 1 and 2 of Alphacoronavirus and from Betacoronavirus, plus the closest known genomes (Alphacoronavirus Scotophilus bat CoV 512, NC_009657 and Betacoronavirus strain HKU24, NC_026011). Panel B. Maximum-likelihood phylogenetic tree of the spike protein coding sequences from Alphacoronaviruses from this study (highlighted in red) plus selected reference sequences. The tree is mid-point rooted for clarity and only bootstraps ≥70 per cent are shown. Horizontal branch lengths are drawn to the scale of nucleotide substitutions per site. Panel C. Maximum-likelihood phylogenetic tree of the spike protein coding sequences from Betacoronaviruses plus a collection of spike coding regions from relevant Betacoronaviruses. The tree is mid-point rooted for clarity and only bootstraps ≥70 per cent are shown. Horizontal branch lengths are drawn to the scale of nucleotide substitutions per site.

To examine the relationship between the reported viruses and known Alpha- and Betacoronaviruses, the spike protein encoding regions of these genomes were compared with the spike coding regions from the most closely related coronavirus genomes from GenBank. Consistent with observations at the full genome scale, phylogenetic analyses suggested that the Vietnamese bat Alphacoronaviruses belonged to two lineages; viruses in one lineage were closely related (sharing 94–96 per cent nt identity) to Scotophilus bat CoV strains A515 (DQ648719), A527 (DQ648791), CYCU-S1/TW/2013 (KT346372), and 512 (NC_009657), while viruses in the second lineage were more distantly related (sharing 75–76 per cent nt identity) to these four bat CoV strains (Fig. 7B).
The CoVs identified from Vietnamese rats were classified as Betacoronavirus and belonged to two distinct lineages, as shown in the phylogenetic tree (Fig. 7C); viruses from one lineage were closely related to the CoV strain HKU24 from Hong Kong (KM349743, Lau et al. 2015), while viruses in the second lineage were more closely related to the rat CoV strains Longquan-189 (KF294370) and -370 (KF294371) (Wang et al. 2015) and rat CoVs from China (KY370051, KY370049, KY37048, and KY370043).
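For readers unfamiliar with how a shared-identity percentage is obtained, the sketch below computes per cent identity between two pre-aligned sequences. The gap convention (columns with a gap in either sequence are skipped) is an assumption, since the text does not state which convention was used; the function name `percent_identity` is hypothetical and the toy sequences are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Per cent of ungapped alignment columns where both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented toy alignment: 10 ungapped columns, 9 of which match.
print(percent_identity("ATGGCTAA-GT", "ATGACTAACGT"))
```

Other conventions (for example, counting gap columns as mismatches) give lower values for the same alignment, so identities are only comparable when computed the same way.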

4. Discussion

Members of the Coronaviridae family of viruses cause health problems in a variety of animal hosts. MERS-CoV often moves from camels to humans (Memish et al. 2014; Dudas et al. 2018) and can spread in health care systems with serious consequences (Assiri et al. 2013). SARS-CoV moved from civet cats to humans and caused substantial morbidity and mortality before it was brought under control (Poon et al. 2004). Several porcine coronaviruses cause frequent problems, including porcine epidemic diarrhea virus (PEDV) (Kocherhans et al. 2001) and swine acute diarrhea syndrome coronavirus (SADS-CoV) (Zhou et al. 2018). Given the frequent association of Coronaviridae members with severe diseases, a more comprehensive description of Coronaviridae diversity, especially in animals with frequent human contact, is an important objective.

We describe a Coronaviridae sequence classification strategy based on the set of protein domains encoded by the genome sequence. The classification does not depend on a single domain, but rather on the composite score of all domains present in the query sequence. This is a strength of the method that can limit false positive identifications, which might be due, for example, to shorter regions of homology to a bacterial, host, or repetitive sequence. The requirement for longer sequence contigs is also a weakness of the method, as sufficient query sequence must be available to encode multiple protein domains. This also limits the tool to assembled contigs rather than short read data. In other words, the sensitivity of the classification depends directly on the length of the genomic sequence: genus assignment is more sensitive with longer or complete genome sequences. The classification tool provides a robust, rapid, and alignment-free method to classify large sets of more distantly related sequences.
Once a database is generated, the algorithm can be used in the field or in resource-limited settings, and the classification can be performed with typical contig sets within minutes on a standard laptop. With the availability of the platform-independent Docker version of the algorithm (see Section 2), scientists can easily run the analyses on any computing platform. Given the large number of genomes available for most of the Coronaviridae genera, this domain-based classification method can provide a sensitive measure of genome and annotation quality. One consideration is that the genus classification may be broad and the diversity within that genus includes genomes with more distant variations in the protein domains. An additional consideration is that the genus classification of individual Coronaviridae genomes in GenBank may not be correct (mis-annotation), and that the genome sequences may include errors (machine errors, PCR errors, chimeric sequences) or may have been assembled incorrectly or with sequence duplications or deletions (mis-assemblies). The domain method described here can help identify these patterns.

Bats have been suggested to harbor a great diversity of CoVs and to play a key role in the emergence and transmission of pathogenic CoVs causing severe diseases in humans (Menachery, Graham, and Baric 2017). Rats, on the other hand, belong to the rodents, the largest order of mammalian species, and are potentially a major zoonotic source of human infectious diseases (Meerburg, Singleton, and Kijlstra 2009; Luis et al. 2013). As part of a large-scale zoonotic surveillance in Vietnam (Rabaa et al. 2015), we applied agnostic deep sequencing to 177 bat and 391 rat samples from a single location. The sample collection was from the Dong Thap province in southern Vietnam, where humans and domestic and farm animals live in close proximity. The site is 154 km from Ho Chi Minh City, the largest city in the south of Vietnam.
From this modest sample size surveillance, forty-five complete or nearly complete genomes belonging to the Coronaviridae family were identified: thirty-four from bat samples, belonging to the Alphacoronavirus genus, and eleven from ten rat samples, belonging to the Betacoronavirus genus. The bat fecal samples were pooled material from five to ten individuals; thus the total number of individual bats screened ranged from 885 to 1,770 and the frequency of full genome identification was 1.9–3.8 per cent (34/1,770 to 34/885). In comparison, the 391 screened rat samples were derived from individual fecal pellets and the frequency of coronavirus genome identification was 2.8 per cent (11/391). Given this small sample size, the frequency of CoVs identified was not strikingly different between bats and rats.

While it is possible to use Blast methods to identify/classify the putative viral genomes, one advantage of the tool described here is that it rapidly provides a genus-level classification, which is not directly obtained from a Blast search. Certainly, one could use a Blast search to find the closest sequence in a database and then use this as a classification if (1) the closest homologous sequence covers 100 per cent of the length of the query sequence, and (2) the closest entry contains sufficient classification annotation. Finally, we are not presenting this new tool as a replacement for Blast (which is a reliable and highly trusted tool). Instead, the domain-based classification method described here provides a rapid, alternative, and complementary classification method that can help organize complex data sets.

Of note, this is the first whole genome characterization of Alpha- and Betacoronaviruses from rats and bats from southern Vietnam. Although there are reports in the literature of enhanced CoV prevalence in various bat species (Anthony et al. 2017), at least in the current survey, when the same agnostic sequencing approach was applied to both bat and rat samples, similar virus frequencies were observed, suggesting that rats, and rodents more generally, may be as important a host for CoVs as bat species. Indeed, surveys using similar agnostic NGS methods have identified a wide range of virus diversity in rats and other rodents (Li et al. 2015, 2016; Wang et al. 2015, 2017). The zoonotic potential of these reported viruses is yet to be defined. Prior to the current study, only single examples of these viruses were available, and the availability of a larger number of full genomes for these viruses will facilitate more functional studies.

Sensitive surveillance that is not dependent on known viral sequences for primer design is important for documenting circulating viruses and for identifying those that might cause problems in the future. This is of particular importance for virus discovery and for characterizing potentially zoonotic viruses in exotic hosts that have not been described. The bat Alphacoronaviruses described here had only a single genome available in GenBank and we have enriched the public database with thirty-four additional genomes. The rat Betacoronaviruses also provide a description of novel, more distant members of a recognized but poorly characterized group of CoVs. The disease potential of these viruses in their normal host and their zoonotic potential are yet to be determined. Certainly, the reported sequences add important information to the growing collection of coronavirus diversity, with particular relevance in the context of south and east Asia, where SARS-CoV first emerged. We hope they are useful for future studies, for monitoring disease of unknown origin, and for increasing our understanding of the molecular diversity of these viruses.
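The detection frequencies discussed above follow from simple division, and the arithmetic can be checked directly. The sketch uses only counts stated in the text (177 bat pools of five to ten individuals, 34 bat-derived genomes, 391 individual rat samples, 11 rat-derived genomes); the variable names are illustrative.

```python
bat_pools = 177
bat_genomes = 34
rat_samples = 391
rat_genomes = 11

min_bats = bat_pools * 5    # 885 individuals if every pool held five bats
max_bats = bat_pools * 10   # 1,770 individuals if every pool held ten bats

bat_low = 100 * bat_genomes / max_bats    # lower bound on bat frequency
bat_high = 100 * bat_genomes / min_bats   # upper bound on bat frequency
rat_freq = 100 * rat_genomes / rat_samples

print(f"bats: {bat_low:.1f}-{bat_high:.1f} per cent")  # bats: 1.9-3.8 per cent
print(f"rats: {rat_freq:.1f} per cent")                # rats: 2.8 per cent
```

With 885 as the smallest possible number of individual bats, the upper bound for bats comes out at about 3.8 per cent, so the rat frequency of 2.8 per cent falls inside the bat range.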

Accession numbers

The coronavirus sequences described in this study have been deposited in GenBank under the accession numbers MH687934–MH687978.
References

1. Bastiaan G Meerburg, Grant R Singleton, Aize Kijlstra (2009). Rodent-borne diseases and their risks for public health. Crit Rev Microbiol.

2. Abdullah Assiri, Allison McGeer, Trish M Perl, Connie S Price, Abdullah A Al Rabeeah, Derek A T Cummings, Zaki N Alabdullatif, Maher Assad, Abdulmohsen Almulhim, Hatem Makhdoom, Hossam Madani, Rafat Alhakeem, Jaffar A Al-Tawfiq, Matthew Cotten, Simon J Watson, Paul Kellam, Alimuddin I Zumla, Ziad A Memish (2013). Hospital outbreak of Middle East respiratory syndrome coronavirus. N Engl J Med.

3. Sean R Eddy (2011). Accelerated Profile HMM Searches. PLoS Comput Biol.

4. Daiji Endoh, Tetsuya Mizutani, Rikio Kirisawa, Yoshiyuki Maki, Hidetoshi Saito, Yasuhiro Kon, Shigeru Morikawa, Masanobu Hayashi (2005). Species-independent detection of RNA virus by representational difference analysis using non-ribosomal hexanucleotides for reverse transcription. Nucleic Acids Res.

5. Vineet D Menachery, Rachel L Graham, Ralph S Baric (2017). Jumping species-a mechanism for coronavirus persistence and survival. Curr Opin Virol.

6. Simon J Anthony, Christine K Johnson, Denise J Greig, Sarah Kramer, Xiaoyu Che, Heather Wells, Allison L Hicks, Damien O Joly, Nathan D Wolfe, Peter Daszak, William Karesh, W I Lipkin, Stephen S Morse, Jonna A K Mazet, Tracey Goldstein (2017). Global patterns in coronavirus diversity. Virus Evol.

7. Wen Wang, Xian-Dan Lin, Wen-Ping Guo, Run-Hong Zhou, Miao-Ruo Wang, Cai-Qiao Wang, Shuang Ge, Sheng-Hua Mei, Ming-Hui Li, Mang Shi, Edward C Holmes, Yong-Zhen Zhang (2014). Discovery, diversity and evolution of novel coronaviruses sampled from rodents in China. Virology.

8. Simon J Watson, Matthijs R A Welkers, Daniel P Depledge, Eve Coulter, Judith M Breuer, Menno D de Jong, Paul Kellam (2013). Viral population analysis and minority-variant detection using short read next-generation sequencing. Philos Trans R Soc Lond B Biol Sci.

9. Matthew Cotten, Bas Oude Munnink, Marta Canuti, Martin Deijs, Simon J Watson, Paul Kellam, Lia van der Hoek (2014). Full genome virus detection in fecal samples using sensitive nucleic acid preparation, deep sequencing, and a novel iterative sequence classification algorithm. PLoS One.

10. Jan Felix Drexler, Victor Max Corman, Christian Drosten (2013). Ecology, evolution and classification of bat coronaviruses in the aftermath of SARS. Antiviral Res.