Literature DB >> 28948041

Characterization of Posa and Posa-like virus genomes in fecal samples from humans, pigs, rats, and bats collected from a single location in Vietnam.

Bas B Oude Munnink1,2, My V T Phan1,2, Peter Simmonds3, Marion P G Koopmans2, Paul Kellam1,4, Lia van der Hoek5, Matthew Cotten1,2.   

Abstract

Porcine stool-associated RNA virus (posavirus), and Human stool-associated RNA virus (husavirus) are viruses in the order Picornavirales recently described in porcine and human fecal samples. The tentative group (Posa and Posa-like viruses: PPLVs) also includes fish stool-associated RNA virus (fisavirus) as well as members detected in insects (Drosophila subobscura and Anopheles sinensis) and parasites (Ascaris suum). As part of an agnostic deep sequencing survey of animal and human viruses in Vietnam, we detected three husaviruses in human fecal samples, two of which share 97-98% amino acid identity to Dutch husavirus strains and one highly divergent husavirus with only 25% amino acid identity to known husaviruses. In addition, the current study found forty-seven complete posavirus genomes from pigs, ten novel rat stool-associated RNA virus genomes (tentatively named rasavirus), and sixteen novel bat stool-associated RNA virus genomes (tentatively named basavirus). The five expected Picornavirales protein domains (helicase, 3C-protease, RNA-dependent RNA polymerase, and two Picornavirus capsid domain) were found to be encoded by all PPLV genomes. In addition, a nucleotide composition analysis revealed that the PPLVs shared compositional properties with arthropod viruses and predicted non-mammalian hosts for all PPLV lineages. The study adds seventy-six genomes to the twenty-nine PPLV genomes currently available and greatly extends our sequence knowledge of this group of viruses within the Picornavirales order.

Entities:  

Keywords:  Picornavirales; basavirus; husavirus; next generation sequencing; posavirus; rasavirus; virus discovery

Year:  2017        PMID: 28948041      PMCID: PMC5597861          DOI: 10.1093/ve/vex022

Source DB:  PubMed          Journal:  Virus Evol        ISSN: 2057-1577


1. Introduction

The order Picornavirales includes a wide range of viruses that infect a variety of hosts. According to the latest International Committee on Taxonomy of Viruses (ICTV) classification (ICTV 2017), the order comprises five families: the families Dicistroviridae and Iflaviridae contain members which infect insects (e.g. cripavirus and deformed wing virus), the family Secoviridae members which infect plants (e.g. turnip ringspot virus), the family Picornaviridae members infecting vertebrates (e.g. enteroviruses) and the family Marnaviridae. The latter contains only Heterosigma akashiwo virus for which algae is the natural host (Le Gall et al. 2008). Although members of the Picornavirales are highly diverse, they share a number of common features, including a single stranded positive-sense RNA genome and co-linear genes encoding a helicase, protease, and RNA-dependent RNA polymerase (RdRP) replication block (Le Gall et al. 2008). The genome lengths for Picornavirales range from 7.2 to 9.8 kb. Typically, the encoded polyprotein is cleaved by virus-encoded proteases (Blom et al. 1996). Generally, members of the Picornavirales are monopartite, although some members of the Secoviridae have genomes with two segments (Le Gall et al. 2008). Increasing improvements in next-generation sequencing (NGS) has identified a number of divergent members of the order Picornavirales. Porcine stool-associated RNA viruses (posaviruses) were found in the feces of healthy pigs and water collected from swine farms (Shan et al. 2011, Hause et al. 2015, 2016), fish stool-associated RNA virus (fisavirus) was identified in the intestinal content of a healthy carp (Reuter et al. 2015), and human stool-associated RNA virus (husavirus) was identified in the feces of predominantly healthy humans (Oude Munnink et al. 2015). Although structually closely related (based on the genome organisation), these viruses display broad genetic diversity with often less than 40% amino acid identity in specific coding regions thereby suggesting a deep evolutionary history of the virus family. Although posaviruses can be detected at high frequency in pig fecal samples (21%), a recent study using immunoprecipitation coupled with PCR detection assay showed that posavirus antibodies were infrequently detected (Hause et al. 2016). The possibility that posaviruses may not infect pigs but rather infect gut commensal organisms or have a dietary or environmental origin is supported by blast analysis of posavirus sequences that showed that some posavirus strains have greatest sequence similarity to an RNA sequence from the parasite Ascaris suum (Shan et al. 2011; Wang et al. 2011). Furthermore, a mRNA sequence from the mosquito Anopheles sinensis and a virus recently identified in the fruit fly Drosophila subobscura have been described showing some sequence identity to posaviruses (Webster et al. 2016). Although the viruses have been identifed in samples from different hosts, the true infection hosts for fisavirus, posavirus, and husavirus remain to be determined. As part of a study to define patterns of viral zoonosis in Vietnam (Rabaa et al. 2015), we performed detailed agnostic (random-primed) whole-genome deep sequencing (Cotten et al. 2014) on fecal samples from bats, humans, pigs, and rats and rectal swabs from humans and pigs. We have analyzed these sequence data for the presence of PPLVs and we describe here a large set of novel virus genomes from human, rat, pig, and bat samples that share homology and protein domain architecture with the previous described posaviruses.

2. Results

For simplicity, we will use the term Posa and Posa-like viruses (PPLVs) throughout the manuscript. The PPLV category comprises virus and virus sequences that show >30% amino acid homology to the existing posavirus and husavirus genome sequences, do not cluster within the five established Picornavirales families and show a Picornavirales genome organization with the expected five Picornavirales protein domains (see below for further details). A search for PPLV genomes in sequences was performed as follows: short read data (3–4 million 250 nt paired end reads per sample) were de novo assembled into longer sequence contigs and a protein sequence based USEARCH analysis (Edgar 2010) was performed against a database containing all Picornavirales protein entries in GenBank, including all known posaviruses sequences. This search identified three husaviruses, forty-seven posaviruses, ten novel rasaviruses, and sixteen novel basaviruses genomes. The genome lengths of the newly identified PPLV genome sequences varied from 8,262 to 11,318 nucleotides and for all viruses the read coverage across the genome and G + C content was determined. The results of these analyses and the available demographical data for these samples are summarized in Table 1.
Table 1.

Overview of the PPLVs identified in this study.

Illumina IDAccession numberENA number lane 1ENA number lane 2LineageSample sourceSample typeSource ageDate of sampleGenome lengthG + C content (%)Median depth of coveragea
Basavirus_16715_47KX673228ERR1301485ERR1301574Bv_7Scotophilus kuhliiFecalUnknown10 Jun 20149,59136.7116
Basavirus_16715_47_2KX673229ERR1301485ERR1301574Bv_3Scotophilus kuhliiFecalUnknown17 Jun 20149,58532.142
Basavirus_16715_5KX673230ERR1301446ERR1301535Bv_6Scotophilus kuhliiFecalUnknown12 Nov 20148,77640.313
Basavirus_16715_61KX673234ERR1301499ERR1301588Bv_7Scotophilus kuhliiFecalUnknown11 Jun 20149,56636.6129
Basavirus_16715_61_2KX673235ERR1301499ERR1301588Bv_6Scotophilus kuhliiFecalUnknown11 Jun 20149,06540.3208
Basavirus_16715_69KX673237ERR1301507ERR1301596Bv_3Scotophilus kuhliiFecalUnknown11 Jun 20149,52832.221
Basavirus_16715_71KX673238ERR1301509ERR1301598Bv_7Scotophilus kuhliiFecalUnknown11 Jun 20149,58536.9132
Basavirus_16715_71_2KX673239ERR1301509ERR1301598Bv_3Scotophilus kuhliiFecalUnknown11 Jun 20149,48332.129
Basavirus_16715_77KX673240ERR1301514ERR1301603Bv_1Scotophilus kuhliiFecalUnknown10 Jun 20148,82938.628
Basavirus_16715_84KX673241ERR1301519ERR1301608Bv_7Scotophilus kuhliiFecalUnknown17 Jun 20149,39136.948
Basavirus_16715_86KX673242ERR1301521ERR1301610Bv_3Scotophilus kuhliiFecalUnknown17 Jun 20149,53032.294
Basavirus_16845_64KX673243ERR1301829ERR1301914Bv_3Scotophilus kuhliiFecalUnknown18 Sep 20149,53732.2160
Basavirus_16845_79KX673244ERR1301842ERR1301927Bv_2Scotophilus kuhliiFecalUnknown18 Sep 20149,14735.955
Basavirus_16845_94KX673245ERR1301855ERR1301940Bv_4Scotophilus kuhliiFecalUnknown17 Sep 20148,26248.227
Basavirus_17819_2KX673289ERR1302928ERR1303004Bv_5Scotophilus kuhliiFecalUnknown12 Nov 20149,27146.9171
Basavirus_17819_8KX673290ERR1302934ERR1303010Bv_5Scotophilus kuhliiFecalUnknown12 Nov 20149,24047.1274
Husavirus_16370_59KX673221ERR1301365ERR1301415Hv_1Homo sapiensFecal7 years2 Apr 20138,85652.810
Husavirus_16915_89KX673248ERR1302009ERR1302077Hv_2Homo sapiensRectal swab51 years24 Mar 20138,57650.817
Husavirus_19344_29KX673274ERS1725523ERS1725523Hv_1Homo sapiensFecal59 years2 Oct 20149,0275397
Posavirus_12087_40KX673215ERR473400NAPv_6Sus domesticusFecal1 month12 Mar 201211,03643.832
Posavirus_12087_42KX673216ERR473402NAPv_6Sus domesticusFecal1.5 months14 Mar 201211,31844.124
Posavirus_12144_61KX673217ERR477293NAPv_1Sus domesticusFecal20 months10 Apr 20129,72935.8135
Posavirus_14226_38KX673218ERR775480NAPv_3Sus domesticusFecal15 months6 Mar 20129,19046.534
Posavirus_14226_39KX673219ERR775481NAPv_3Sus domesticusFecal15 months6 Mar 20129,07146.425
Posavirus_14250_11KX673220ERR779984NAPv_5Sus domesticusFecal2 months21 Mar 20129,68033.255
Posavirus_16915_3KX673246ERR1301944ERR1302014Pv_3Sus domesticusRectal swab5 months19 Mar 20139,29646.845
Posavirus_16915_5KX673247ERR1301946ERR1302016Pv_3Sus domesticusRectal swab5 months19 Mar 20139,20446.751
Posavirus_17489_2KX673249ERR1302404ERR1302484Pv_1Sus domesticusRectal swab5 months27 Mar 20139,83536.5127
Posavirus_17489_26KX673250ERR1302422ERR1302502Pv_1Sus domesticusRectal swab5 months3 Apr 20139,82736.4111
Posavirus_17489_27KX673251ERR1302423ERR1302503Pv_1Sus domesticusRectal swab5 months3 Apr 20139,77336.516
Posavirus_17489_28KX673252ERR1302424ERR1302504Pv_1Sus domesticusRectal swab5 months3 Apr 20139,82436.465
Posavirus_17489_30KX673253ERR1302426ERR1302506Pv_1Sus domesticusRectal swab5 months3 Apr 20139,67236.4193
Posavirus_17489_34KX673254ERR1302428ERR1302508Pv_5Sus domesticusRectal swab5 months3 Apr 20138,97031.52184
Posavirus_17489_35KX673255ERR1302429ERR1302509Pv_5Sus domesticusRectal swab5 months3 Apr 20139,73333.5146
Posavirus_17489_36KX673256ERR1302430ERR1302510Pv_5Sus domesticusRectal swab5 months3 Apr 20139,72933.5842
Posavirus_17489_39KX673257ERS1725815ERS1725815Pv_5Sus domesticusRectal swab5 months3 Apr 20139,64033.520
Posavirus_17489_4KX673258ERR1302405ERR1302485Pv_4Sus domesticusRectal swab5 months27 Mar 20138,97031.529
Posavirus_17489_4_2KX673259ERR1302405ERR1302485Pv_1Sus domesticusRectal swab5 months27 Mar 20139,64736.45
Posavirus_17489_40KX673260ERR1302433ERR1302513Pv_5Sus domesticusRectal swab5 months3 Apr 20139,65033.570
Posavirus_17489_45KX673261ERR1302438ERR1302518Pv_5Sus domesticusRectal swab5 months4 Apr 20139,71033.218
Posavirus_17489_47KX673262ERR1301975ERR1302045Pv_5Sus domesticusRectal swab5 months4 Apr 20139,78333.224
Posavirus_17489_5KX673263ERR1302406ERR1302486Pv_4Sus domesticusRectal swab5 months27 Mar 20139,22231.2101
Posavirus_17489_50KX673264ERR1302440ERR1302520Pv_6Sus domesticusRectal swab5 months4 Apr 201311,09444.645
Posavirus_17489_51KX673265ERR1302441ERR1302521Pv_4Sus domesticusRectal swab2 years4 Apr 20139,28631.4309
Posavirus_17489_60KX673266ERR1302450ERR1302530Pv_5Sus domesticusRectal swab5 months4 Apr 20139,85933.439
Posavirus_17489_7KX673267ERR1302407ERR1302487Pv_1Sus domesticusRectal swab5 months27 Mar 20139,43436.312
Posavirus_17489_86KX673268ERR1302473ERR1302553Pv_3Sus domesticusRectal swab5 months18 Apr 20139,28346.7237
Posavirus_17489_87KX673269ERR1302474ERR1302554Pv_3Sus domesticusRectal swab5 months18 Apr 20139,24246.714
Posavirus_17489_90KX673270ERR1302477ERR1302557Pv_3Sus domesticusRectal swab5 months18 Apr 20138,46946.614
Posavirus_17489_91KX673271ERR1302478ERR1302558Pv_3Sus domesticusRectal swab5 months18 Apr 20139,27246.966
Posavirus_17489_95KX673272ERR1302481ERR1302561Pv_4Sus domesticusRectal swab5 months18 Apr 20139,21630153
Posavirus_17489_95_2KX673273ERR1302481ERR1302561Pv_5Sus domesticusRectal swab5 months18 Apr 20139,57032.847
Posavirus_17668_10KX673276ERR1302758ERR1302847Pv_3Sus domesticusRectal swab5 months4 May 20139,30946.6154
Posavirus_17668_11KX673277ERR1302759ERR1302848Pv_1Sus domesticusRectal swab5 months4 May 20139,85536.5200
Posavirus_17668_11_2KX673278ERR1302759ERR1302848Pv_4Sus domesticusRectal swab5 months4 May 20139,06631.218
Posavirus_17668_12KX673279ERR1302760ERR1302849Pv_3Sus domesticusRectal swab5 months4 May 20139,1964638
Posavirus_17668_13KX673280ERR1302761ERR1302850Pv_5Sus domesticusRectal swab5 months4 May 20139,66633.431
Posavirus_17668_13_2KX673281ERR1302761ERR1302850Pv_3Sus domesticusRectal swab5 months4 May 20139,11146.152
Posavirus_17668_33KX673282ERR1302780ERR1302869Pv_6Sus domesticusRectal swab6 months16 Oct 201311,17143.578
Posavirus_17668_33_2KX673283ERR1302780ERR1302869Pv_3Sus domesticusRectal swab6 months16 Oct 20139,28946.326
Posavirus_17668_33_3KX673284ERR1302780ERR1302869Pv_9Sus domesticusRectal swab6 months16 Oct 20139,49847.177
Posavirus_17668_4KX673275ERR1302752ERR1302841Pv_5Sus domesticusRectal swab5 months4 May 20139,65133.530
Posavirus_17668_47KX673285ERR1302794ERR1302883Pv_6Sus domesticusRectal swab5 months16 Oct 201311,29545.199
Posavirus_17668_71KX673286ERR1302816ERR1302905Pv_7Sus domesticusRectal swab6 months17 Oct 20139,40446.347
Posavirus_17668_83KX673287ERR1302827ERR1302916Pv_7Sus domesticusRectal swab5 months17 Oct 20139,36646.946
Posavirus_17668_86KX673288ERR1302830ERR1302919Pv_7Sus domesticusRectal swab5 months17 Oct 20139,37746.842
Rasavirus_16715_12KX673222ERR1301453ERR1301542Rv_2Rattus argentiventerFecalUnknown12 Nov 20149,60140.2251
Rasavirus_16715_28KX673223ERR1301467ERR1301556Rv_2Rattus argentiventerFecalUnknown14 Nov 20149,60640.4823
Rasavirus_16715_3KX673224ERR1301444ERR1301533Rv_2Rattus argentiventerFecalUnknown12 Nov 20149,58040.517
Rasavirus_16715_36KX673225ERR1301474ERR1301563Rv_1Rattus argentiventerFecalUnknown14 Nov 20149,91144.316058
Rasavirus_16715_4KX673226ERR1301445ERR1301534Rv_2Rattus argentiventerFecalUnknown12 Nov 20149,61940.314
Rasavirus_16715_43KX673227ERR1301481ERR1301570Rv_2Rattus argentiventerFecalUnknown14 Nov 20149,56240.424
Rasavirus_16715_52KX673231ERR1301490ERR1301579Rv_2Rattus argentiventerFecalUnknown10 Jun 20148,67840.827
Rasavirus_16715_52_2KX673232ERR1301490ERR1301579Rv_1Rattus argentiventerFecalUnknown10 Jun 20148,49744.750
Rasavirus_16715_57KX673233ERR1301495ERR1301584Rv_2Rattus argentiventerFecalUnknown10 Jun 20149,59940.39097
Rasavirus_16715_67KX673236ERR1301505ERR1301594Rv_2Rattus argentiventerFecalUnknown11 Jun 20149,58440.3966

Median depth of coverage was determined by mapping all quality controlled reads to the final genome using Bowtie2 with –very-sensitive-local settings. Coverage was calculated as the number of reads mapped per genome/length of genome/divided by 129.

Overview of the PPLVs identified in this study. Median depth of coverage was determined by mapping all quality controlled reads to the final genome using Bowtie2 with –very-sensitive-local settings. Coverage was calculated as the number of reads mapped per genome/length of genome/divided by 129. In two human fecal samples, husaviruses (KX673274 and KX673221) showed high level of amino acid sequence identity to the previously described husaviruses KT215901, KT215902, and KT215903 (97–98% amino acid identity). In contrast, an additional husavirus detected in a human rectal swab (KX673248) showed only 25% amino acid identity over the entire polyprotein with other husaviruses. Posavirus sequences could be detected in thirty-three (of 189) pig rectal swabs (17% frequency) and in eight (of 146) pig fecal samples (5% frequency). In each of four pig rectal swabs (sample IDs 17189_4, 17819_95, 17668_11_2, and 17668_13), two distinct strains of posaviruses were identified, while in one sample (17668_33) three distinct posaviruses were identified. The posavirus sequences identified in this study have the closest sequence identity to variants detected in farmed pigs in the USA (Shan et al. 2011, Hause et al. 2015, 2016). Moreover, novel posa-like genome sequences were found in nine (of 45) rat fecal samples [provisionally named rat stool-associated RNA viruses (rasaviruses)], and in thirteen (of 135) bat fecal samples [provisionally named bat stool-associated RNA viruses (basaviruses)]. In one rat (16715_47) and in three bat fecal samples (16715_52, 16715_61, and 16715_71) two distinct rasa/basaviruses were identified. The low level of shared nucleotide identity between these novel viruses made it difficult to perform phylogenetic analyses at the whole genome level. Therefore, the protein sequence encoding the most conserved region, a provisional RdRP protein, was identified and used for phylogenetic analysis. This analysis supported a conclusion that two husaviruses (KX673221 and KX673274) belonged to a lineage that includes the previously described husaviruses (KT215901, KT215902, and KT215903), while husavirus KX673248 was distant (Fig. 1). Based on this phylogenetic analysis and using a pairwise amino acid identity cutoff of 40%, twenty-two lineages could be identified. While most lineages were found in only a single source type of sample (e.g. all porcine), the Bv_7 lineage comprised basaviruses and a virus isolated from a fruit fly Drosophila subobscura and the two lineages Pv_8 and Pv_9 comprised posaviruses and viral sequences derived from a parasite (Ascaris suum ; Shan et al. 2011) (Fig. 1). For each lineage, a representative virus genome (based on the most complete sequence of the twenty-nine sequences present in the GenBank database and the newly identified sequences) was selected and characterized in more detail (Supplementary Table S1). The length of these genomic sequences ranged from 8,576 nt to 11,318 nt with a G + C content of 31.0–53.0%. The twenty-two lineages share, on average, only 9–38% amino acid identity across the entire polyprotein (Fig. 2).
Figure 1.

Maximum-likelihood phylogenetic tree of the predicted RdRP protein domain. The maximum-likelihood phylogenetic tree was constructed in IQtree under the LG + G+I amino acid substitution model as the best-fitted model with 500 pseudo-replicates. The tree was visualized in FigTree1.4.2. Branches were colored according to the enteric samples from the hosts in which viruses were identified (blue: human, red: pig feces, dark green: rat feces, light green: fish intestinal content, brown: bat feces, and purple: insects and parasites). Significant bootstrap values (>80) are indicated with an *.

Figure 2.

Average percentage of amino acid identity across PPLVs lineages. The amino acid sequences of the complete polyprotein of different lineages were individually aligned against each other using the ClustalW in Geneious. The numbers on the x- and y-axes represent the different lineages (for more detail see Supplementary Table S1).

Maximum-likelihood phylogenetic tree of the predicted RdRP protein domain. The maximum-likelihood phylogenetic tree was constructed in IQtree under the LG + G+I amino acid substitution model as the best-fitted model with 500 pseudo-replicates. The tree was visualized in FigTree1.4.2. Branches were colored according to the enteric samples from the hosts in which viruses were identified (blue: human, red: pig feces, dark green: rat feces, light green: fish intestinal content, brown: bat feces, and purple: insects and parasites). Significant bootstrap values (>80) are indicated with an *. Average percentage of amino acid identity across PPLVs lineages. The amino acid sequences of the complete polyprotein of different lineages were individually aligned against each other using the ClustalW in Geneious. The numbers on the x- and y-axes represent the different lineages (for more detail see Supplementary Table S1). Phylogenetic analysis was also performed on amino acid sequences encoding the conserved helicase, 3C protease, capsid I and capsid II domains. Due to the high sequence diversity, each set of sequences was trimmed to the most conserved region of each identified conserved domain. Neighbor-joining (NJ) and maximum-likelihood (ML) trees were constructed individually for each of the conserved domain set of sequences. The NJ tree topology of the RdRp (left panel Supplementary Fig. S1A) was relatively consistent with the ML tree topology of RdRP (Fig. 1). However, this relatively consistency was not observed in the NJ versus ML trees in other domains (left panel compared right panel, Supplementary Fig. 1), probably due to the great sequencing divergence hence challenging proper ML tree inference. The prevalence of husaviruses among stools samples from Vietnamese individuals was 1.4% (1/71) in healthy human rectal swabs and 0.3% (2/573) in human diarrheal feces. Rasaviruses and basaviruses were detected in 22 and 9% of the rat and bat fecal samples, respectively. Pigs also commonly carry these viruses, with posavirus being found in 17% of the rectal swabs and in 5% of the fecal samples examined in this study. The frequency of virus detection was significantly higher in rectal swabs compared to fecal samples for posaviruses (P value = 0.002; Chi-squared test). The frequency of husavirus positive samples was too low to draw conclusions about the prevalence in rectal swabs compared to fecal samples (P value = 0.59; Fishers’ Exact test). While members of the Picornavirales typically contain a Hel-Pro-Pol replication block (Le Gall et al. 2008), some of the recently identified posaviruses initially appeared to not encode a recognizable conserved protease domain (Shan et al. 2011; Hause et al. 2015, 2016). A local HMMER search (Eddy 2011) using the complete PFAM library (Marchler-Bauer et al. 2015) failed to identify a recognizable picornavirus 3C protease domain in the majority of genomes. It was unlikely that these viruses completely lacked the protease and we suspected that the failure to detect the protease domain could be due to sequence diversity in the protease domain of RNA viruses (Koonin and Dolja 1993). Accordingly, a refined 3C protease HMM profile was constructed including all newly identified protease domains in posaviruses. A search using this refined protease domain profile identifed a putative protease domain in all of the posaviruses (Supplementary Table S1). In addition, all genomes were found to encode an RNA helicase domain, an RdRP domain and two picornavirus capsid domains [with the exception of Asv_1 since this GenBank entry is only partial and posavirus_3 where no conserved RNA helicase domain could be identified (Fig. 3)].
Figure 3.

Genomic organization of different identified PPLVs lineages. The putative conserved protein domains, as determined by a conserved domain search (see ‘Methods’), and their relative position in the viral genome are illustrated. The number next to the lineage name indicates the length of each genome, the Asv1 genome (marked with an *) was only partially (<5,000 nt) sequenced. Peach blocks indicate the presence and position of an RNA helicase domain (superfamily cl21455), red blocks indicate the 3C cysteine protease domain (superfamily cl02893), light blue blocks indicate an RNA-dependent RNA polymerase (RdRP) domain (superfamily cl02808) and dark blue blocks indicate picornavirus capsid domain (superfamily cl13999). All identified conserved domains are drawn in scale related to their genome size.

Genomic organization of different identified PPLVs lineages. The putative conserved protein domains, as determined by a conserved domain search (see ‘Methods’), and their relative position in the viral genome are illustrated. The number next to the lineage name indicates the length of each genome, the Asv1 genome (marked with an *) was only partially (<5,000 nt) sequenced. Peach blocks indicate the presence and position of an RNA helicase domain (superfamily cl21455), red blocks indicate the 3C cysteine protease domain (superfamily cl02893), light blue blocks indicate an RNA-dependent RNA polymerase (RdRP) domain (superfamily cl02808) and dark blue blocks indicate picornavirus capsid domain (superfamily cl13999). All identified conserved domains are drawn in scale related to their genome size. The G + C contents for all PPLV sequences were determined but no specific G + C content pattern was observed in virus sequences from different hosts. The husaviruses showed the highest G + C content (50.5–53.0%), followed by posaviruses (30.9–51.2%), rasaviruses (40.5–44.0%), and basaviruses (32.2–48.2%) (Table 1 and Supplementary Table S1). As previously described (Kapoor et al. 2010), nucleotide composition analysis (NCA) can be used to predict the host range of members of Supergroup 1 RNA virus, that includes the Picornavirales. Sequences from 105 PPLV genomes obtained in the current study and from published sources were analyzed using a pre-trained dataset of reference genomes from three categories of hosts (arthropod, plant, and vertebrate; Fig. 4). The analysis revealed that almost all posaviruses as well as all husa-, basa-, rasa-, fisa-, insect-, and nematoda viruses clustered within the arthropod group (Fig. 4). Two basaviruses (lineage Bv_2) cluster within the vertebrate group of the Picornaviridae and one husavirus (lineage Hv_2) clustered within the plant virus group of the Picornaviridae. These observations fell within the 5% error range of the analysis (95% prediction accuracy of the controls, Supplementary Table S2).
Figure 4.

Discriminant analysis of the dinucleotide bias in PPLVs compared to members of the Picornaviridae for which the infectious host is known. Viruses infecting arthropods are indicated with blue circles, viruses infecting plants in green circles and viruses infecting vertebrates in red circles. The lines indicate the 95% interval. Posaviruses are plotted in white diagonals, husavirus in light yellow, basavirus in yellow, rasavirus in dark yellow, fisavirus in orange and insect/nematode infecting viruses in dark orange.

Discriminant analysis of the dinucleotide bias in PPLVs compared to members of the Picornaviridae for which the infectious host is known. Viruses infecting arthropods are indicated with blue circles, viruses infecting plants in green circles and viruses infecting vertebrates in red circles. The lines indicate the 95% interval. Posaviruses are plotted in white diagonals, husavirus in light yellow, basavirus in yellow, rasavirus in dark yellow, fisavirus in orange and insect/nematode infecting viruses in dark orange.

3. Discussion

Here we report the identification of new Picornavirales members related to sequences previously identified in pig stool (posavirus) and human stools and/or rectal swabs (husavirus). In addition, we describe newly identified bat stool-associated viruses (basaviruses) and rat stool-associated viruses (rasaviruses) which have a similar genomic organization compared to posaviruses. Posaviruses are known to be widely distributed geographically with examples found in the USA (Shan et al. 2011) and in China (Zhang et al. 2014), however, this is the first detection of husavirus in human stools outside the Netherlands. These Posa and Posa-like virus genomes are collectively referred as PPLVs. The PPLV genomes were identified based on identity to previously identified posaviruses and their lack of close protein homology to any of the Picornavirales families and the presence of a set of five protein functional domains. Using standard phylogenetic analyses, the PPLVs formed lineages which are distinct from the five established Picornavirales families (Dicistroviridae, Iflaviridae, Marnaviridae, Picornaviridae and Secoviridae) and the unassigned Picornavirales genome sequences. However, there is as much diversity between the PPLVs as there is between the PPLVs and the established Picornavirales families. We used the USEARCH clustering algorithm in an attempt to determine how close the PPLV genomes are to existing Picornavirales genomes. All 108 PPLV genome sequences were combined with all available Picornavirales full genome sequences from GenBank (5,766 genomes, excluding those with stretches if Ns greater than 20). At various levels of homology (ranging from 60% to 90% nucleotide identities), the PPLV sequence clusters were distinct from the clusters formed from the Picornavirales genomes, i.e. there were no clusters containing both PPLV sequences and genomes classified in one of the Picornavirales families (results not shown). Thus we think it is valid to conclude that none of the PPLVs belong to established Picornavirales families. The PPLV group is however too diverse to be classified as a single virus family. Given the pace at which new virus sequences are becoming available, we believe the best approach is to deposit these sequences with a tentative identification as PPLV and as more detailed sequence data become available a better organization of these virus sequences into well-supported family or families can be made. Members of PPLV group have now been identified in pigs, humans, fish, rats, bats, insects (Anopheles sinensis and Drosophila subobscura), and parasites (Ascaris suum). Based on phylogenetic analysis of the RdRP domain and pairwise comparisons of the entire polyprotein, we propose that the PPLVs comprise twenty-two phylogenetic lineages. These PPLVs could also be grouped in twenty-two lineages based on the NJ trees constructed from amino acid sequences encoding other conserved domains (putative helicase, protease, capsid I, and II, Supplementary Figs S1A–D). Consistent with other members of the Picornavirales, most of the newly described PPLV sequences encoded a Hel-Pro-Pol replication block. However, in some of the genome sequences, no recognizable protease domain could be identified using conventional methods with an existing pre-made PFAM domain based on a limited number of picornavirus protease domains. However, a more detailed protease domain database based on a broader set of Picornavirales proteases, including the novel putative posavirus protease domains, revealed the presence of a protease domain across the entire range of PPLV genomes (Fig. 3). In an attempt to infer a putative cellular host for the PPLVs, a nucleotide composition analysis (NCA) was performed. NCA incorporates composition measures of dinucleotide frequencies and has been used to predict the infectious hosts of members of RNA virus supergroup I (Kapoor et al. 2010). In a set of sequences for which the infectious host was known, the analysis was able to accurately classify viruses as either being of vertebrate, plant, or arthropod origin in around 95% of the cases (Koonin et al. 2008). Using this analysis method, the PPLV genome sequences were found to cluster with viruses from the arthropod group (Fig. 4). The two outliers of the discriminant analysis (lineage HV_2 and Bv_2) fall within the 95% confidence interval, but given their substantial sequence divergence from other PPLVs it is possible that these viruses infect another hosts. The prevalence of husaviruses in fecal samples (0.3%) and rectal swabs (1.4%) in Vietnam was lower than the 3.5% prevalence observed in a cohort of predominantly healthy HIV-1 positive and negative individuals (Oude Munnink et al. 2015). Prevalence differences may be due to differences in RT-PCR detection versus genome assembly from next generation sequencing, the small sample numbers and/or true differences between the cohorts. Of interest, posavirus could be detected significantly more often in pig rectal swabs compared to pig fecal samples (P = 0.002), suggesting that the viruses are enriched on the rectal epidermis. This enrichment and the clustering of posaviruses with the arthropod viruses may be consistent. It is known that intestinal parasites can be found perirectally and can be detected using the scotch tape test (Enterobius_Vermicularis_Diagnostic_Test). An interesting follow-up analysis would be to determine the scotch tape virome and our prediction would be that members of the PPLV group can be found in these samples. In summary, this study provides a large set of seventy-six new PPLV genomes, quadrupling the available genomic data for this broad group viruses. A novel Vietnamese husavirus genetically distant from the previously described husaviruses was identified and PPLV members were also detected in rat and bat feces. In addition, we were able to clarify two additional features of posavirus virology: a putative protease domain was detected in all PPLV genomes and NCA revealed that members of the PPLV group share a conserved nucleotide composition with viruses infecting members of the arthropod phylum.

4. Methods

4.1 Samples

Fecal material was collected from 135 bats (Scotophilus kuhlii), 573 humans (Homo sapiens), 146 pigs (Sus domesticus), and 45 rats (Rattus argentiventer). In addition, rectal swabs were collected from seventy-one humans and 189 pigs. These samples were collected from a 150 square kilometer area of Dong Thap province, a southern region within the Mekong Delta River in Vietnam. All fecal samples from human enrollees were diarrheal patients admitted to Dong Thap Provincial hospital, while human rectal swabs were taken from healthy farmers and family members. Pig fecal samples and rectal swabs were collected from individual pigs from breeding farms. Rat fecal samples were collected from rats, which were purchased on the market or collected from rice-field traps. The disease state of these animals is unknown. Bat fecal samples were collected from beneath roosting sites. Ethical approval for the study was obtained from the Oxford Tropical Research Ethics Committee (OxTREC Approval No. 15-12) (Oxford, United Kingdom), the institutional ethical review board of Dong Thap Provincial Hospital (DTPH) and the Sub-Department of Animal Health Dong Thap province (Dong Thap, Vietnam).

4.2 Illumina sequencing

Fecal samples (n = 899) or rectal swabs (n = 260) were centrifuged for 10 min at 10,000 × g after which the samples were DNase treated at 37 °C for 30 min (20 U of TURBO DNase, Thermo Fisher per 100 µl of sample). Nucleic acids were extracted, transcribed into cDNA and subjected to second strand synthesis (de Vries et al. 2011, 2012). The resulting dsDNA from each sample was sheared and fractionated to 400–500 bp in length after which Illumina adapters with a unique barcode were ligated to the fragments. Resulting libraries were sequenced with the Illumina MiSeq or HiSeq platforms to generate 1–2 million 150 nt (MiSeq) or 3–4 million 250 nt (HiSeq) paired-end reads per sample.

4.3 De novo assembly and complete genome characterization

Adaptor sequences were removed and sequence reads that passed quality control were de novo assembled using SPAdes version 3.5.0 (Bankevich et al. 2012) followed by improve_assembly (Page 2012). The resulting contigs were subjected to a modified protein blast search using USEARCH (Edgar 2010) to identify novel members of the Picornavirales. To minimize the effects of Illumina cross-talk, all preliminary contigs were examined and contigs within a sample with low median coverage (greater than 10-fold lower than the major contig in the sample) were excluded from the analysis. For all PPLVs reported here, the complete or nearly complete (>8,000 nt) genome was obtained and for all viruses the genome coverage was determined by mapping all quality controlled sequence reads to the final genome. The G + C content was determined using Geneious (Kearse et al. 2012). To determine the average percentage amino acid identity across the PLLV lineages, amino acid sequences were aligned using the ClustalW in Geneious (Kearse et al. 2012). To identify conserved protein domains encoded by the new genomes an RPS-BLAST search (Marchler-Bauer et al. 2015) against the Conserved Domain Database (CDD) was performed. The initial screen identified the helicase, RdRP, and picornavirus capsid (I and II) domains across almost all genomes. However, the 3C protease domain was identifed in only a subset of genomes, suggesting either a true absence or a mis-identification due to great sequence divergence. A modified 3C protease domain profile was generated from a protein sequence alignment of the conserved domain (pfam00548) from the CDD and used to identify the 3C protease-like regions in the new PPLV genomes. An updated alignment containing all the putative protease domains used to create a new HMM index file. A local hmmsearch analysis with this updated 3C protease profile was then performed to identify divergent putative protease domains in the PPLV genome sequences.

4.4 Discriminant analysis of the dinucleotide bias

Nucleotide composition analysis (NCA) was performed as previously described (Kapoor et al. 2010) using sequences of members of RNA virus supergroup 1 (Koonin et al. 2008) infecting vertebrates (n = 113), arthropods (n = 66), and plants (n = 172) for classification. The frequencies of each mononucleotide and dinucleotide were used for discriminant analysis to maximize discrimination between control sequences; these canonical factors were then used to infer the host origin of the RNA virus sequences obtained in the current study.

4.5 Phylogenetic analysis

All PPLV sequences identified in this study combined with all complete PPLV genomes present in the GenBank database (retrieved on 16 July 2016) were aligned using muscle (Edgar 2004). Amino acids sequences were trimmed to the region encoding for the conserved domains and alignments were manually inspected and trimmed to the most conserved part. Phylogenetic analyses were performed on the conserved putative conserved domains using IQtree (Nguyen et al. 2015), under the best-fitted amino acid model with 500 pseudo-replicates. The resulting trees were visualized using FigTree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/).

4.6 Statistical analysis

Statistical analysis was performed using the two by two table from Open Epi (Sullivan et al. 2009). As a measure of association, the Chi-squared test or the Fishers’s exact test was used.

4.7 GenBank accession numbers

All PPLV genome sequences generated in this study were deposited into the GenBank database under the accession numbers KX673215–KX673290. Click here for additional data file. Click here for additional data file. Click here for additional data file.
  25 in total

1.  A novel genus in the order Picornavirales detected in human stool.

Authors:  Bas B Oude Munnink; Matthew Cotten; Martin Deijs; Maarten F Jebbink; Margreet Bakker; Seyed Mohammad Jazaeri Farsani; Marta Canuti; Paul Kellam; Lia van der Hoek
Journal:  J Gen Virol       Date:  2015-11       Impact factor: 3.891

2.  OpenEpi: a web-based epidemiologic and statistical calculator for public health.

Authors:  Kevin M Sullivan; Andrew Dean; Minn Minn Soe
Journal:  Public Health Rep       Date:  2009 May-Jun       Impact factor: 2.792

Review 3.  Evolution and taxonomy of positive-strand RNA viruses: implications of comparative analysis of amino acid sequences.

Authors:  E V Koonin; V V Dolja
Journal:  Crit Rev Biochem Mol Biol       Date:  1993       Impact factor: 8.250

4.  Viral metagenomics analysis demonstrates the diversity of viral flora in piglet diarrhoeic faeces in China.

Authors:  Bin Zhang; Cheng Tang; Hua Yue; Yupeng Ren; Zhigang Song
Journal:  J Gen Virol       Date:  2014-04-09       Impact factor: 3.891

5.  CDD: NCBI's conserved domain database.

Authors:  Aron Marchler-Bauer; Myra K Derbyshire; Noreen R Gonzales; Shennan Lu; Farideh Chitsaz; Lewis Y Geer; Renata C Geer; Jane He; Marc Gwadz; David I Hurwitz; Christopher J Lanczycki; Fu Lu; Gabriele H Marchler; James S Song; Narmada Thanki; Zhouxi Wang; Roxanne A Yamashita; Dachuan Zhang; Chanjuan Zheng; Stephen H Bryant
Journal:  Nucleic Acids Res       Date:  2014-11-20       Impact factor: 16.971

6.  A novel posavirus-related single-stranded RNA virus from fish (Cyprinus carpio).

Authors:  Gábor Reuter; Péter Pankovics; Eric Delwart; Ákos Boros
Journal:  Arch Virol       Date:  2014-12-07       Impact factor: 2.574

7.  Highly diverse posaviruses in swine faeces are aquatic in origin.

Authors:  Ben M Hause; Rachel Palinski; Richard Hesse; Gary Anderson
Journal:  J Gen Virol       Date:  2016-03-22       Impact factor: 3.891

8.  Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data.

Authors:  Matthew Kearse; Richard Moir; Amy Wilson; Steven Stones-Havas; Matthew Cheung; Shane Sturrock; Simon Buxton; Alex Cooper; Sidney Markowitz; Chris Duran; Tobias Thierer; Bruce Ashton; Peter Meintjes; Alexei Drummond
Journal:  Bioinformatics       Date:  2012-04-27       Impact factor: 6.937

Review 9.  The Vietnam Initiative on Zoonotic Infections (VIZIONS): A Strategic Approach to Studying Emerging Zoonotic Infectious Diseases.

Authors:  Maia A Rabaa; Ngo Tri Tue; Tran My Phuc; Juan Carrique-Mas; Karen Saylors; Matthew Cotten; Juliet E Bryant; Ho Dang Trung Nghia; Nguyen Van Cuong; Hong Anh Pham; Alessandra Berto; Voong Vinh Phat; Tran Thi Ngoc Dung; Long Hoang Bao; Ngo Thi Hoa; Heiman Wertheim; Behzad Nadjm; Corina Monagin; H Rogier van Doorn; Motiur Rahman; My Phan Vu Tra; James I Campbell; Maciej F Boni; Pham Thi Thanh Tam; Lia van der Hoek; Peter Simmonds; Andrew Rambaut; Tran Khanh Toan; Nguyen Van Vinh Chau; Tran Tinh Hien; Nathan Wolfe; Jeremy J Farrar; Guy Thwaites; Paul Kellam; Mark E J Woolhouse; Stephen Baker
Journal:  Ecohealth       Date:  2015-09-24       Impact factor: 3.184

10.  Full genome virus detection in fecal samples using sensitive nucleic acid preparation, deep sequencing, and a novel iterative sequence classification algorithm.

Authors:  Matthew Cotten; Bas Oude Munnink; Marta Canuti; Martin Deijs; Simon J Watson; Paul Kellam; Lia van der Hoek
Journal:  PLoS One       Date:  2014-04-02       Impact factor: 3.240

View more
  12 in total

1.  Viruses associated with ill health in wild chimpanzees.

Authors:  Jacob D Negrey; John C Mitani; Richard W Wrangham; Emily Otali; Rachna B Reddy; Tressa E Pappas; Kristine A Grindle; James E Gern; Zarin P Machanda; Martin N Muller; Kevin E Langergraber; Melissa Emery Thompson; Tony L Goldberg
Journal:  Am J Primatol       Date:  2022-01-11       Impact factor: 3.014

2.  Genetic diversity and cross-species transmission of kobuviruses in Vietnam.

Authors:  Lu Lu; Nguyen Van Dung; Alasdair Ivens; Carlijn Bogaardt; Aine O'Toole; Juliet E Bryant; Juan Carrique-Mas; Nguyen Van Cuong; Pham Hong Anh; Maia A Rabaa; Ngo Tri Tue; Guy E Thwaites; Stephen Baker; Peter Simmonds; Mark Ej Woolhouse
Journal:  Virus Evol       Date:  2018-02-13

3.  Complex virome in feces from Amerindian children in isolated Amazonian villages.

Authors:  Juliana D Siqueira; Maria Gloria Dominguez-Bello; Monica Contreras; Orlana Lander; Hortensia Caballero-Arias; Deng Xutao; Oscar Noya-Alarcon; Eric Delwart
Journal:  Nat Commun       Date:  2018-10-15       Impact factor: 14.919

4.  Enteric virome of Ethiopian children participating in a clean water intervention trial.

Authors:  Eda Altan; Kristen Aiemjoy; Tung G Phan; Xutao Deng; Solomon Aragie; Zerihun Tadesse; Kelly E Callahan; Jeremy Keenan; Eric Delwart
Journal:  PLoS One       Date:  2018-08-16       Impact factor: 3.240

5.  Optimization and evaluation of a non-invasive tool for peste des petits ruminants surveillance and control.

Authors:  Arnaud Bataille; Olivier Kwiatek; Salima Belfkhi; Lucile Mounier; Satya Parida; Mana Mahapatra; Alexandre Caron; Chobi Clement Chubwa; Julius Keyyu; Richard Kock; Bryony A Jones; Geneviève Libeau
Journal:  Sci Rep       Date:  2019-03-18       Impact factor: 4.379

6.  Sample descriptors linked to metagenomic sequencing data from human and animal enteric samples from Vietnam.

Authors:  Mark Woolhouse; Jordan Ashworth; Carlijn Bogaardt; Ngo Tri Tue; Steve Baker; Guy Thwaites; Tran My Phuc
Journal:  Sci Data       Date:  2019-10-15       Impact factor: 6.444

7.  The Husavirus Posa-Like Viruses in China, and a New Group of Picornavirales.

Authors:  Zhenzhi Han; Jinbo Xiao; Yang Song; Mei Hong; Guolong Dai; Huanhuan Lu; Man Zhang; Yueling Liang; Dongmei Yan; Shuangli Zhu; Wenbo Xu; Yong Zhang
Journal:  Viruses       Date:  2020-09-07       Impact factor: 5.048

8.  Multiple clades of Husavirus in South America revealed by next generation sequencing.

Authors:  Endrya do Socorro Fôro Ramos; Ulisses Alves Rosa; Geovani de Oliveira Ribeiro; Fabiola Villanova; Flávio Augusto de Pádua Milagres; Rafael Brustulin; Vanessa Dos Santos Morais; Emerson Luiz Lima Araújo; Ramendra Pati Pandey; V Samuel Raj; Ester Cerdeira Sabino; Xutao Deng; Eric Delwart; Adriana Luchs; Élcio Leal; Antonio Charlys da Costa
Journal:  PLoS One       Date:  2021-03-23       Impact factor: 3.240

9.  Full-Genome Sequences of Alphacoronaviruses and Astroviruses from Myotis and Pipistrelle Bats in Denmark.

Authors:  Christina M Lazov; Graham J Belsham; Anette Bøtner; Thomas Bruun Rasmussen
Journal:  Viruses       Date:  2021-06-04       Impact factor: 5.048

10.  Detection of novel RNA viruses from free-living gorillas, Republic of the Congo: genetic diversity of picobirnaviruses.

Authors:  Raja Duraisamy; Jean Akiana; Bernard Davoust; Oleg Mediannikov; Caroline Michelle; Catherine Robert; Henri-Joseph Parra; Didier Raoult; Philippe Biagini; Christelle Desnues
Journal:  Virus Genes       Date:  2018-02-23       Impact factor: 2.332

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.