
Viral metagenomics reveals diverse viruses in the fecal samples of children with diarrhea.

Shixing Yang¹, Yumin He², Ju Zhang², Dianqi Zhang², Yan Wang³, Xiang Lu², Xiaochun Wang², Quan Shen², Likai Ji³, Hongyan Lu⁴, Wen Zhang⁵.

Abstract

Diarrhea is the third leading cause of death among children under five years of age in developing countries. About half a million children die of diarrhea every year, most of them in developing countries. Viruses are the main pathogens causing diarrhea. In China, the fecal virome of children with diarrhea has rarely been studied. Using an unbiased viral metagenomics approach, we analyzed the fecal virome of children with diarrhea. The many DNA and RNA viruses associated with diarrhea that were identified in these fecal samples came mainly from six families: Adenoviridae, Astroviridae, Caliciviridae, Parvoviridae, Picornaviridae, and Reoviridae. Among them, Caliciviridae accounted for the largest proportion (78.42%), followed by Adenoviridae (8.94%) and Picornaviridae (8.36%). In addition to viruses already confirmed to cause human diarrhea, viruses not associated with diarrhea were also identified, including anellovirus and picobirnavirus. This study increases our understanding of the fecal virome of diarrheic children and provides valuable information for the prevention and treatment of viral diarrhea in this area.


Keywords: Children diarrhea; Fecal samples; Viral metagenomics; Virus evolution


Year: 2022; PMID: 35234620; PMCID: PMC8922427; DOI: 10.1016/j.virs.2022.01.012

Source DB: PubMed; Journal: Virol Sin; ISSN: 1995-820X; Impact factor: 4.327

Introduction

Diarrheal diseases are the third leading cause of mortality among children aged five years and below in non-industrialized regions such as Southeast Asia and sub-Saharan Africa (GBD Diseases and Injuries Collaborators, 2020). Although mortality and morbidity have been significantly reduced in recent years thanks to improved sanitary conditions and living standards, pediatric diarrhea caused by viral infections is still a major public health problem in developing countries (Troeger et al., 2018). Viruses are the main pathogens of diarrhea, accounting for 75% of cases. Viruses confirmed to cause diarrhea include human adenovirus (HAdV) (Qiu et al., 2018), norovirus (NoV) (Wang et al., 2019), sapovirus (SaV) (Becker-Dreps et al., 2020), human astrovirus (HAstV) (Olortegui et al., 2018), rotavirus (Yu et al., 2019a), parvovirus (Rikhotso et al., 2020), and picornavirus (Shen et al., 2019). Rotavirus is the most common cause of severe and fatal diarrhea. Group A rotavirus causes 25%–65% of severe infantile gastroenteritis worldwide (Tate et al., 2016). The mortality and morbidity of severe childhood diarrhea due to rotavirus infection have decreased substantially following the introduction of rotavirus vaccines (do Carmo et al., 2011). HAdV and NoV are the second and third leading etiological agents of diarrhea mortality in children, respectively. HAstV, SaV and several viruses belonging to the family Picornaviridae can cause sporadic diarrhea in children. In addition, some unexplained cases may be caused by unknown viruses. Traditional methods for virus discovery, such as tissue culture, electron microscopy, serology, and polymerase chain reaction, are poorly suited to detecting novel or highly mutated viruses. The viral metagenomics approach has proved to be a powerful tool for exploring new and existing viruses. 
Since the first application of metagenomics in virus discovery from soil samples collected in San Diego, the viral metagenomics approach has been widely used in different research areas, including marine ecology, plant science and agriculture, human genetics, and human disease diagnosis (Breitbart et al., 2002; Neelakanta et al., 2013). In the last decade, viral metagenomics has unraveled a previously underestimated virome in the human gut, but to date little is known about the virome of diarrheic children. In the current study, we investigated the viral composition of fecal samples from diarrheic children in Jiangsu Province of China using a viral metagenomics approach. The objectives of this study were to assess (1) whether any novel viruses could be identified and (2) which known diarrhea-related viruses are prevalent in this area. We identified many DNA and RNA viruses associated with diarrhea in 100 fecal samples from children. The results of this study provide valuable information for the prevention and treatment of viral diarrhea in this area.

Materials and methods

Sample collection and preparation

One hundred infants and children with signs and symptoms of diarrhea were enrolled in this study from January to December 2016. The patients were under five years of age and attended the Affiliated Hospital of Jiangsu University. One hundred fresh stool samples were collected using disposable materials and shipped on dry ice. About one gram of each fecal sample was resuspended in 2 mL of phosphate-buffered saline (PBS), vigorously vortexed for 10 min, and then centrifuged at 14,000 ×g for 10 min. Finally, each fecal supernatant was transferred to a new 1.5 mL centrifuge tube and stored at −80 °C until further use.

Viral nucleic acid extraction

Ten pools were randomly generated, each containing ten fecal supernatants. For each pool, a total of 500 μL of fecal supernatant, drawn in equal volumes from the ten fecal samples, was mixed and filtered through a 0.45-μm filter (Merck Millipore, MA, USA) to remove bacterial and eukaryotic cell-sized particles. Then 200 μL of the filtered supernatant from each pool was treated with a mixture of nuclease enzymes at 37 °C for 60 min to digest unprotected nucleic acid (Zhang et al., 2017). Viral RNA and DNA were extracted using the QIAamp MinElute Virus Spin Kit (Qiagen, HQ, Germany).

Library construction and bioinformatics analysis

cDNA was synthesized from viral RNA by reverse transcription with random hexamer primers, and Klenow Fragment DNA polymerase (M0210L, New England Biolabs, MA, USA) was then used to generate the complementary strand of the cDNA. Ten libraries were constructed using the Nextera XT DNA Sample Preparation Kit (Illumina, CA, USA) and sequenced on the Illumina MiSeq platform with 250-base paired-end reads and dual barcoding for each pool. For bioinformatics analysis, the 250-bp paired-end reads generated by MiSeq were debarcoded using vendor software from Illumina. An in-house analysis pipeline running on a 32-node Linux cluster was used to process the data. Low-quality tails were trimmed using a Phred quality score of 10 as the threshold. Adaptors were trimmed using the default parameters of VecScreen, which is NCBI BLASTn with specialized parameters designed for adaptor removal. Bacterial reads were subtracted by mapping to the bacterial nucleotide sequences from the BLAST NT database using Bowtie2 v2.2.4. The cleaned reads were de novo assembled with SOAPdenovo2 version r240 using a k-mer size of 63 and default settings (Deng et al., 2015). The assembled contigs, along with singlets, were aligned to an in-house viral proteome database using BLASTx (v.2.2.7) with an E-value cutoff of <10⁻⁵. Candidate viral hits were then compared to an in-house non-virus non-redundant (NVNR) protein database to remove false-positive viral hits; the NVNR database was compiled from non-viral protein sequences extracted from the NCBI nr FASTA file (based on taxonomy annotation, excluding the Virus Kingdom). To obtain complete genomes or longer contigs, each viral contig was used as a reference for mapping the raw data with the Low Sensitivity/Fastest parameter in Geneious v11.1.2.
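The quality-trimming step can be illustrated with a minimal sketch that removes the low-quality 3′ tail of a read at a Phred threshold of 10. This is not the in-house pipeline used in this study: the function name `trim_low_quality_tail`, the Phred+33 quality encoding, and the rule of trimming back to the last base that meets the threshold are all assumptions made for the example.

```python
def trim_low_quality_tail(seq, qual, threshold=10, offset=33):
    """Trim the 3' tail of a read while base qualities stay below `threshold`.

    seq  : nucleotide string
    qual : FASTQ quality string of the same length (assumed Phred+33 encoding)
    Returns the trimmed (seq, qual) pair.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    end = len(seq)
    # Walk back from the 3' end until a base meets the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]
```

Only the tail is removed in this sketch; low-quality bases inside the read are left in place, which may differ from what a production trimmer does.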

Phylogenetic analysis

Phylogenetic analysis was performed on the complete genome or partial gene sequences obtained in the present study, together with their closest viral relatives, selected from the best BLASTn hits, and representative members of related viral species or genera. Sequences were aligned with MUSCLE as implemented in MEGA-X (Kumar et al., 2018). Some phylogenetic trees were generated in MEGA-X using the neighbor-joining method based on the p-distance model with 1000 bootstrap resamples of the alignment data set; bootstrap values are given for each node. Other phylogenetic trees were constructed using MrBayes v3.2.7 (Ronquist et al., 2012). The Markov chain was run for a maximum of one million generations, sampling every 50 generations, and the first 25% of the Markov chain Monte Carlo samples were discarded as burn-in.
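For readers unfamiliar with the p-distance model, the following minimal sketch computes it for one pair of aligned sequences: the proportion of compared sites at which the two sequences differ. It does not reproduce MEGA-X itself; the function name `p_distance` and the choice to skip sites where either sequence has a gap (a pairwise-deletion style treatment) are assumptions of the example.

```python
def p_distance(seq_a, seq_b, gap_chars="-?"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence carries a gap or missing-data character
    are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared
```

A neighbor-joining tree would then be built from the matrix of such pairwise distances, and the percent nt identity between two aligned sequences corresponds to 100 × (1 − p) under the same gap treatment.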

Nucleotide sequence accession number

The genomes and genome fragments of the viruses obtained in this study were deposited in GenBank under accession numbers MZ546174–MZ546192 and MZ576278–MZ576282. The raw sequence reads from the metagenomic libraries were deposited in the Short Read Archive of the GenBank database under accession number SRX4189498.

Results

Viral metagenomic overview

The ten libraries generated a total of 3,612,710 raw sequence reads on the Illumina MiSeq platform (Supplementary Table S1). After bioinformatics analysis, 1,184,063 sequence reads had their best matches to viral proteins (Supplementary Table S1), accounting for 32.77% of the raw reads. The majority of the eukaryotic viruses identified in this study are shown in Supplementary Table S2. The percentage of viral reads belonging to each viral family in the ten libraries was analyzed. The most abundant eukaryotic virus family was Caliciviridae (78.42% of the total analyzed virus reads), followed by Adenoviridae (8.94%), Picornaviridae (8.36%), Parvoviridae (2.81%), Astroviridae (1.18%), Reoviridae (0.20%), Anelloviridae (0.04%), and Picobirnaviridae (0.03%) (Fig. 1A, Supplementary Table S2).
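The read statistics above can be rechecked with a few lines of arithmetic. The sketch below uses only the figures reported in this section; the variable and function names are illustrative.

```python
# Figures reported in the text.
RAW_READS = 3_612_710
VIRAL_READS = 1_184_063
N_LIBRARIES = 10

# Percentage of analyzed viral reads per family, as reported.
FAMILY_PERCENT = {
    "Caliciviridae": 78.42,
    "Adenoviridae": 8.94,
    "Picornaviridae": 8.36,
    "Parvoviridae": 2.81,
    "Astroviridae": 1.18,
    "Reoviridae": 0.20,
    "Anelloviridae": 0.04,
    "Picobirnaviridae": 0.03,
}


def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


viral_percent = percent(VIRAL_READS, RAW_READS)       # share of raw reads matching viral proteins
mean_reads_per_library = RAW_READS / N_LIBRARIES      # average raw reads per library
family_total = sum(FAMILY_PERCENT.values())           # share covered by the eight listed families
```

The eight listed families sum to slightly under 100%, which is consistent with rounding of the individual percentages.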

Fig. 1

Taxonomic analysis of the fecal virome detected in diarrheal children at the family level. A The composition of the fecal virome detected in diarrheal children. The percentage of virus sequences in each virus family is shown. B Heatmap representing the read number of each viral family in exponential form.

The eukaryotic viral families present in the different libraries are shown as a heat map (Fig. 1B). Five viral families were detected in library 6; four in each of libraries 8, 9, and 10; three in library 2; two in each of libraries 4, 5, and 7; and only one in each of libraries 1 and 3. The eukaryotic viruses associated with diarrhea were further analyzed by sequence comparison and phylogenetic analysis.

Nine viruses belonging to two viral genera of the family Caliciviridae

In total, Caliciviridae reads, belonging to either the Norovirus or the Sapovirus genus, were found in eight libraries.

NoV

In this study, NoVs were identified in eight of the ten libraries (library 2 and libraries 4–10). Eight nearly complete NoV genomes were obtained using the sequence assembly program in Geneious 11.1.2 and were named Norozj-2 and Norozj-4 to Norozj-10, respectively. The genomes of these NoVs are 7,379 nt to 7,738 nt long and contain three open reading frames (ORF1, ORF2, and ORF3). ORF1 is 5,094 nt long, except in Norozj-2 and Norozj-8, which have only partial ORF1 sequences. ORF2 is 1,623 nt long in all the NoVs in this study. ORF3 is 807 nt long, except in Norozj-9, in which 33 nt are missing. ORF1 encodes a viral nonstructural protein of 1,697 amino acids (aa), while ORF2 and ORF3 encode VP1 (540 aa) and VP2 (268 aa), respectively (the VP2 of Norozj-9 is 257 aa long). A phylogenetic tree was constructed from the nearly complete genomes together with reference strains of five different genotypes. The eight NoVs belonged to two genotypes (Fig. 2A): seven were GII.4 (Norozj-2, Norozj-4 to Norozj-8, and Norozj-10), and only Norozj-9 clustered with a GII.17 strain (KJ196295), forming an independent clade. Three strains (Norozj-2, 4 and 5) shared more than 99.18% nt identity with strain NORO_156_02_03_2015 (MH218627), isolated in the United Kingdom in 2015. Norozj-7 and Norozj-8 shared more than 99.24% nt identity with strain Hu/NOV/GII/CHN/ZJ01 (KX586330), isolated in Jiangsu Province of China in 2015. Norozj-6 shared 99.35% nt identity with strain Hu/Guangzhou/GZ2014-L132/CHN/2014 (KT202796), isolated in Guangdong Province of China in 2014. Norozj-10 shared 98.87% nt identity with strain Hu/GII.4/GII4-NG1242/2011/JP (AB972502), isolated in Japan in 2014. Norozj-9 shared 97.47% nt identity with strain Musashimurayama/TAKAsanKimchi (KJ196295), isolated in Japan in 2014.
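The ORF and protein lengths reported above are related by simple codon arithmetic. The sketch below assumes that each reported ORF length includes one terminal stop codon; this assumption is not stated in the text but is consistent with every ORF/protein pair given here. The function name `orf_to_protein_length` is illustrative.

```python
def orf_to_protein_length(orf_nt):
    """Amino-acid length encoded by an ORF of `orf_nt` nucleotides.

    Assumes the ORF length is a multiple of three and includes one
    terminal stop codon, which does not encode an amino acid.
    """
    if orf_nt < 6 or orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3 and at least 6 nt")
    return orf_nt // 3 - 1


# NoV values reported in the text.
nov_orf1_aa = orf_to_protein_length(5094)           # nonstructural polyprotein
nov_vp1_aa = orf_to_protein_length(1623)            # ORF2, VP1
nov_vp2_aa = orf_to_protein_length(807)             # ORF3, VP2
norozj9_vp2_aa = orf_to_protein_length(807 - 33)    # ORF3 of Norozj-9, 33 nt shorter
```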

Fig. 2

Phylogenetic analysis based on the complete genome sequences of NoVs and SaVs. A Phylogenetic tree constructed from the nearly complete genomes of NoVs together with reference strains of five different genotypes. B Phylogenetic tree constructed from the nearly complete genome of SaV together with reference strains of four different genogroups. Viruses identified in this study are marked with solid red circles. NoV, Norovirus; SaV, Sapovirus.

SaV

One complete SaV genome was present in library 9 and was named Sapozj-9. The genome of Sapozj-9 is 7,542 nt long and contains two ORFs (ORF1 and ORF2). ORF1 is 6,843 nt long and encodes a polyprotein of 2,280 aa comprising seven nonstructural proteins (NS1–7) followed by the major capsid protein, VP1. The conserved calicivirus NTPase motif “480GAPGIGKT487” and the VPg motifs “941KGKTK945” and “962DDEYDE967” were found in NS3 and NS5, respectively. VPg is linked to the 5′ terminus of the viral RNA and plays an important role in SaV genome replication, transcription, and translation (Leen et al., 2013). ORF2 is 498 nt long and encodes the minor structural protein VP2 (165 aa). A phylogenetic tree was constructed from the nearly complete genome together with reference strains of four different genogroups. Phylogenetic analysis showed that Sapozj-9 clustered with strains Hu/GI/Zhejiang1/China/2014 (KT327081) and Hu/GI/Hunan/China/2016 (KX980412), forming an independent clade (Fig. 2B). Sapozj-9 shared the highest nt identity (99.53%) with strain Hu/GI/Zhejiang1/China/2014 (KT327081), isolated in Zhejiang Province of China in 2014.
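Motif labels such as “480GAPGIGKT487” give the 1-based positions of the first and last residues of the motif in the protein. A minimal sketch of how such labels can be produced by exact string matching is shown below; the function name `find_motif` is hypothetical, and the toy sequences in the tests are invented for illustration rather than taken from Sapozj-9.

```python
def find_motif(protein, motif):
    """Return position-annotated labels for every exact occurrence of `motif`.

    Each label has the form '<first><motif><last>', with 1-based positions
    of the first and last residues (e.g. '480GAPGIGKT487').
    """
    if not motif:
        raise ValueError("motif must not be empty")
    labels = []
    start = protein.find(motif)
    while start != -1:
        first = start + 1           # convert 0-based index to 1-based position
        last = start + len(motif)   # 1-based position of the last motif residue
        labels.append(f"{first}{motif}{last}")
        start = protein.find(motif, start + 1)
    return labels
```

Exact matching only finds fully conserved motifs; degenerate motifs would need a pattern-based search instead.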

Seven viruses belonging to two viral genera of the family Picornaviridae

In this study, seven of the ten libraries contained Picornaviridae reads, which belonged to the genera Enterovirus and Parechovirus.

Enterovirus (EV)

In this study, three nearly complete EV genomes were obtained and were named CxA6zj-6, Polozj-3, and Rhozj-5. The genomes of CxA6zj-6, Polozj-3, and Rhozj-5 are 7,547 nt, 7,459 nt, and 7,178 nt long, respectively, and each contains one ORF, encoding polyproteins of 2,201 aa, 2,207 aa, and 2,207 aa, respectively. The conserved NTPase motif “GSPGTGKS” and the RNA-dependent RNA polymerase motif “YGDD” were found in their polyproteins. CxA6zj-6 shares the highest nt identity (98.84%) with strain CVA6/S1998/BJ/CHN/2014 (MF385636), isolated in China in 2014. Phylogenetic analysis based on VP1 nucleotide sequences showed that CxA6zj-6 belongs to sublineage D3 of Coxsackievirus A6 in the species Enterovirus A (Fig. 3A). Polozj-3 shares the highest nt identity (99.81%) with strain MG212484, isolated in Russia in 2000, which caused a case of vaccine-associated paralytic polio. Phylogenetic analysis based on the complete genome showed that Polozj-3 clustered with other strains to form a clade and belongs to poliovirus serotype 2 in the species Enterovirus C (Fig. 3C). Rhozj-5 shares the highest nt identity (98.86%) with strain SC275 (KY369900), isolated in the USA in 2016. Phylogenetic analysis based on the complete genome showed that Rhozj-5 clustered with other strains to form a clade and belongs to rhinovirus genotype B27 in the species Rhinovirus B (Fig. 3D).

Fig. 3

Phylogenetic analysis of the picornaviruses identified in this study. A Phylogenetic tree constructed from the VP1 nucleotide sequences of Coxsackievirus A6 together with reference strains of four different sublineages. B Phylogenetic tree constructed from the complete genome sequences of parechoviruses together with reference strains of four different genotypes. C Phylogenetic tree constructed from complete genome sequences together with reference strains of the three human poliovirus serotypes. D Phylogenetic tree constructed from complete genome sequences together with reference strains of six different genotypes in the species Rhinovirus B. Viruses identified in this study are marked with solid red circles. HPeV, Human Parechovirus; Polio, Poliovirus; HRV, Human Rhinovirus.

Parechovirus

Four nearly complete genomes of human parechoviruses (HPeVs) were obtained from libraries 2, 4, 5, and 6 and were named Parchzj-2, Parchzj-4, Parchzj-5 and Parchzj-6, respectively. The genomes of Parchzj-2, Parchzj-4, Parchzj-5 and Parchzj-6 are 7,207 nt, 6,798 nt, 7,313 nt, and 7,265 nt long, respectively, and each has a single ORF encoding a 2,179 aa polyprotein, except for Parchzj-5, whose polyprotein is 2,178 aa long (one aa missing at the 3′ end). The aa identity among the polyproteins of the four strains was more than 95.1%. The conserved NTPase motif “GEPGQGKS” and the RNA-dependent RNA polymerase motif “YGDD” were found. A BLASTn search in NCBI indicated that Parchzj-2 shared the highest nt identity (92.01%) with strain newborn_zjlh (MT157225), isolated in China in 2018; Parchzj-4 shared the highest nt identity (96.51%) with strain TW-50192-2012 (KT626012), isolated in Taiwan in 2012; Parchzj-5 shared the highest nt identity (99.93%) with strain 164Chzj909 (KT879930), isolated in China in 2014; and Parchzj-6 shared the highest nt identity (96.48%) with strain newborn_zjlh (MT157225), isolated in China in 2018. A phylogenetic tree was constructed from the nearly complete genomes together with reference strains of four different HPeV genotypes. Parchzj-2 and Parchzj-6 clustered with strain newborn_zjlh (MT157225) to form a separate clade; Parchzj-4 clustered with strains TW-50192-2012 (KT626012), SH1-China-2008 (FJ840477), and BJ-37359 (KJ659491) to form a separate clade; and Parchzj-5 clustered with strain 164Chzj909 (KT879930) to form a separate clade (Fig. 3B). Phylogenetic analysis showed that all four HPeVs belong to HPeV genotype 1.

One virus belonging to the genus Mamastrovirus of the family Astroviridae

In this study, one nearly complete HAstV genome was obtained from library 8 and was named Astzj-8. The genome of Astzj-8 is 6,924 nt long, with untranslated regions (UTRs) at positions 1–198 and 6,845–6,924. The base composition is 30.0% A, 21.9% C, 22.7% G, and 25.4% T. The genome has two ORFs (ORF1 and ORF2), which overlap by 8 nt. ORF1 is 4,290 nt long and encodes a 1,429 aa NS1, while ORF2 is 2,364 nt long and encodes a 787 aa structural protein (VP1). Conserved Domain analysis (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) indicated that the Astzj-8 NS1 has a trypsin-like peptidase domain at aa positions 447–565 and a reverse transcriptase-like family domain at aa positions 991–1,395, including the conserved RdRp amino acid motif “1284YGDD1287”. A phylogenetic tree was constructed from the nucleotide sequences of the ORF2 coding region together with reference strains of eight different HAstV genotypes. Astzj-8 clustered with other HAstV-1 strains to form a clade (Fig. 4). A BLASTn search in NCBI showed that Astzj-8 shared the highest nt identity (98.94%) with strain Hu/HY/2019/CHN (MW863310), isolated in China in 2019.
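The genome coordinates reported for Astzj-8 can be checked for internal consistency. The sketch below assumes that ORF1 begins immediately after the 5′ UTR and that the 3′ UTR begins immediately after ORF2; neither is stated explicitly in the text, and the function name `genome_layout` is illustrative.

```python
def genome_layout(genome_len, utr5_end, orf1_len, orf2_len, overlap):
    """Derive 1-based ORF and 3' UTR coordinates from the reported lengths.

    Assumes ORF1 starts right after the 5' UTR, ORF2 starts `overlap`
    nucleotides before the end of ORF1, and the 3' UTR follows ORF2.
    """
    orf1_start = utr5_end + 1
    orf1_end = orf1_start + orf1_len - 1
    orf2_start = orf1_end - overlap + 1
    orf2_end = orf2_start + orf2_len - 1
    return {
        "ORF1": (orf1_start, orf1_end),
        "ORF2": (orf2_start, orf2_end),
        "UTR3": (orf2_end + 1, genome_len),
    }


# Values reported for Astzj-8.
astzj8 = genome_layout(genome_len=6924, utr5_end=198,
                       orf1_len=4290, orf2_len=2364, overlap=8)
```

Under these assumptions the derived 3′ UTR coincides with the reported positions 6,845–6,924.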

Fig. 4

Phylogenetic analysis of the human astrovirus (HAstV) identified in this study. The phylogenetic tree was constructed from the nucleotide sequences of the ORF2 coding region together with reference strains of eight different genotypes. Viruses identified in this study are marked with solid red circles.

Two viruses belonging to the genus Bocavirus of the family Parvoviridae

Two nearly complete genomes of human bocavirus (HBoV) were obtained from libraries 6 and 8 and were named Parvozj-6 and Parvozj-8, respectively. The genomes of Parvozj-6 and Parvozj-8 are 5,400 nt and 5,377 nt long, respectively, and both have three ORFs. ORF1 is 1,920 nt and 1,923 nt long, encoding NS1 proteins of 639 aa and 640 aa, which contain the conserved motifs associated with rolling circle replication and helicase activity and the ATP-binding Walker loop motif “GPASTGKT”. ORF2 is 660 nt and 648 nt long, encoding NP1 proteins of 219 aa and 215 aa, while ORF3 is 2,016 nt and 2,004 nt long, encoding VP1 proteins of 671 aa and 667 aa, respectively. Two phylogenetic trees were constructed from the nt sequences of NS1 and VP1 together with reference strains of the four HBoV genotypes. The four HBoV genotypes were clearly delineated in both the NS1 and VP1 phylogenetic trees. In the NS1 tree, Parvozj-6 clustered with nine HBoV1 reference strains to form a clade, while Parvozj-8 clustered with 11 HBoV2 reference strains to form a clade (Fig. 5). A BLASTn search in NCBI indicated that Parvozj-6 shared the highest nt identity (99.96%) with strain CQ201102 (JN387082), isolated in China in 2011, while Parvozj-8 shared 99.22% nt identity with strain BJQ435 (JX257046), isolated in China in 2011.

Fig. 5

Phylogenetic analysis of the human bocaviruses (HBoVs) identified in this study based on NS1 and VP1. A Phylogenetic tree constructed from the nucleotide sequences of NS1 together with reference strains of the four HBoV genotypes. B Phylogenetic tree constructed from the nucleotide sequences of VP1 together with reference strains of the four HBoV genotypes. Viruses identified in this study are marked with solid red circles.

One virus belonging to the genus Mastadenovirus of the family Adenoviridae

In this study, one nearly complete genome of HAdV was obtained in library 7 and named Adenzj-7. The genome of Adenzj-7 is 34,095 nt in length, which encodes 30 proteins. The base composition of A, C, G, and T is 24.8%, 25.9%, 25.1%, and 24.2%, respectively. BLASTn search in NCBI indicated that Adenzj-7 shared the highest nt identity of 99.95% to the strain Human/China/Shanghai/FX1-152772/2015/41 (MK883610) isolated from China in 2015. The phylogenetic tree was constructed based on the complete genome that included the reference strains of seven different HAdV species. The result showed that Adenzj-7 clustered with other HAdV species F strains forming a clade (Fig. 6).
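Base composition figures like those above are simple per-base percentages of the genome sequence. A minimal sketch is given below; the function name `base_composition` and the decision to ignore ambiguous bases such as N are assumptions of the example, and the test sequence is a toy string rather than the Adenzj-7 genome.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T in a nucleotide sequence.

    U is treated as T; characters other than A, C, G, T (for example N)
    are ignored, which is an assumption of this sketch.
    """
    seq = seq.upper().replace("U", "T")
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: round(100.0 * counts[base] / total, 1) for base in "ACGT"}
```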

Fig. 6

Phylogenetic analysis of the human adenovirus (HAdV) identified in this study. The phylogenetic tree was constructed from the nearly complete genome together with reference strains of seven different HAdV species. Viruses identified in this study are marked with solid red circles.

Four partial segment 1 sequences belonging to the genus Rotavirus of the family Reoviridae

In total, reads matching Reoviridae were found in six libraries. Only the four longer contigs (Rotazj-1, 2, 6, and 10), from libraries 1, 2, 6, and 10, were used for sequence comparison and phylogenetic analysis. Sequence analysis using BLASTn in NCBI indicated that these four contigs shared 99.72%–100% nt identity with their closest related strains and belonged to segment 1 of the species Rotavirus A (RVA). Because the four contigs matched two different parts of rotavirus segment 1, two phylogenetic trees were constructed from the partial segment 1 sequences together with reference strains of two RVA species. Rotazj-1 and Rotazj-6 clustered with strain E5365/2017/G1P[8] (MN106132) to form a clade (Fig. 7A), while Rotazj-2 and Rotazj-10 clustered with strains SP010/2013/G9P[8] (LC172960), WZ810/2013 (KU243537) and E5365/2017/G1P[8] (MN106132) to form a clade (Fig. 7B).

Fig. 7

Phylogenetic analysis of the human rotaviruses identified in this study. A Rotazj-1 and Rotazj-6. B Rotazj-2 and Rotazj-10. The phylogenetic trees were constructed from partial segment 1 sequences of rotaviruses together with reference strains of two RV species. Viruses identified in this study are marked with solid red circles. RV, Rotavirus.

Discussion

Viral metagenomics has the potential to uncover new viral pathogens and recombinant or reassortant types in various animal and human samples. At present, viral metagenomics is also used in clinical diagnosis to explore the correlation between viruses and diseases (Liang et al., 2020). Here we describe the eukaryotic viral communities in the fecal samples of diarrheic children and report many diarrhea-related and unrelated viruses found through unbiased high-throughput sequencing. The cDNA libraries generated an average of 361,271 250-bp paired-end reads each. Of the total reads, 32.77% had their best matches to viruses. The proportion of viral reads in this study was clearly higher than the 5% of viral reads identified in stool samples in one previous study (Mohammad et al., 2020), while it was similar to the 35.6% of viral reads obtained from stool samples in another (Victoria et al., 2009). Many factors can affect the proportion of viral reads, such as the collection and preservation of samples, the pretreatment steps before library construction, and the proficiency of the operator. During pretreatment, a 0.45-μm filter was used to remove bacterial and eukaryotic cell-sized particles, and the supernatant from each pool was then treated with a mixture of nuclease enzymes to digest unprotected nucleic acid. These processing steps effectively enriched virus particles and increased the proportion of viral reads.

Reoviridae is a large family of segmented dsRNA viruses with a wide host range. The family has two subfamilies, Spinareovirinae and Sedoreovirinae, which together comprise 15 genera. The genomes of viruses in the Reoviridae contain 9 to 12 segments (Steyer et al., 2013). The genus Rotavirus consists of seven species (RVA to RVJ). 
Rotavirus is the leading cause of viral acute gastroenteritis in young children worldwide, accounting for nearly 200,000 deaths annually (GBD Diseases and Injuries Collaborators, 2020). In recent years, nearly one hundred countries have introduced at least one RVA vaccine into their national immunization programs, and a substantial decline in child hospitalizations and deaths associated with severe diarrhea has been widely reported (Paternina-Caicedo et al., 2015; Carvalho-Costa et al., 2019). In China, rotavirus vaccination has not yet been included in the national immunization program. A previous survey of RVA prevalence in China during 2009–2015 showed that 30% of children with diarrhea were positive for RVA (Yu et al., 2019a). In this study, RVA accounted for only 0.20% of the viral reads detected (Supplementary Fig. S1). This low proportion may be due to low viral titers in the samples, and as a result no complete segment could be assembled from any of the libraries. Since no epidemiological investigation was conducted in this study, the positive rate of rotavirus infection is unknown, but it must be ≥6%: six of the ten libraries contained rotavirus sequence reads, so at least six of the 100 samples were positive, and some libraries may have contained more than one positive sample. At present, thirty-six G and fifty-one P types of rotavirus have been identified. Among them, G1P[8], G2P[4], G3P[8], G4P[8] and G9P[8] are the common G-P combinations in human rotaviruses and account for over 80% of the circulating genotypes globally (Bányai et al., 2012; Zhou et al., 2020). G9P[8] was reported to have become the predominant strain in China during 2015, with a proportion of 60.9% (Yu et al., 2019b). In this study, based on the BLASTp results, the four RVAs belonged to G9P[8], which suggests that G9P[8] may be a common RVA type prevalent in Jiangsu Province of China. 
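The bound on the rotavirus positive rate follows from the pooling design: a pool that yields reads contains at least one and at most all of its samples as positives. A minimal sketch of this reasoning is given below; the function name `pooled_positive_rate_bounds` is illustrative, and the calculation assumes that every positive sample yields detectable reads and that pools without reads contain no positive samples.

```python
def pooled_positive_rate_bounds(positive_pools, pool_size, n_pools):
    """Lower and upper bounds (in percent) on the per-sample positive rate.

    Each positive pool contains between 1 and `pool_size` positive samples;
    negative pools are assumed to contain none.
    """
    if not 0 <= positive_pools <= n_pools:
        raise ValueError("positive_pools must lie between 0 and n_pools")
    total_samples = pool_size * n_pools
    min_positive = positive_pools               # one positive sample per positive pool
    max_positive = positive_pools * pool_size   # every sample in each positive pool
    return (100.0 * min_positive / total_samples,
            100.0 * max_positive / total_samples)


# Rotavirus: reads found in six of ten pools of ten samples each.
rotavirus_bounds = pooled_positive_rate_bounds(6, 10, 10)
```

The lower bound reproduces the ≥6% stated in the text; the same function gives an upper bound for any virus detected in only some of the pools.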
The family Caliciviridae comprises non-enveloped viruses with a linear positive-sense RNA genome of 6.4–8.5 kb containing two or three ORFs. At present, the family consists of 11 genera. NoVs are single-stranded RNA viruses that belong to the genus Norovirus. At least ten NoV genogroups (GI to GX) are currently recognized, of which GI, GII, GVII and GIX infect humans (Li et al., 2020a, Li et al., 2020b). Based on the divergence of VP1, GI viruses are subdivided into nine genotypes, whilst GII viruses contain 27 genotypes (Kroneman et al., 2013). NoVs are the leading cause of outbreaks of acute gastroenteritis worldwide (Ramani et al., 2014). In recent years, NoVs have become the predominant viral cause of acute gastroenteritis in countries with effective rotavirus vaccination programs (McAtee et al., 2016; Lopman et al., 2016). In China, NoVs play an increasingly important role in the etiology of diarrhea and account for about 22% of diarrhea cases in children aged <5 years (Zhou et al., 2017). In this study, the most abundant sequences from vertebrate viral families were from members of the Caliciviridae. Among them, NoVs were identified in eight of the ten libraries and accounted for 78.42% of the viral reads (Supplementary Fig. S1; Supplementary Table S3). Eight nearly complete NoV genomes were obtained, and sequence comparison and phylogenetic analysis showed that seven of the eight belonged to the GII.4 genotype. These seven NoVs are closely related genetically to other GII.4 strains previously detected in Guangdong, Jiangsu, and Hubei Provinces of China during 2012–2014. That most of the NoVs identified here were GII.4 supports a previous survey showing that GII.4 was the predominant NoV genotype in China before 2017. In addition, one strain (Norozj-9) identified in this study belonged to GII.17. 
The GII.17 genotype was rarely detected in human populations globally in the past, but a new GII.17 variant associated with sporadic acute gastroenteritis cases was frequently detected in China and some south-eastern Asian countries in late 2014 (Pan et al., 2016; Zhang et al., 2016; Kim et al., 2019; Motoya et al., 2019; Chen et al., 2020). Here, the few GII.17 sequence reads in library 9 indicate that this was not the predominant genotype causing acute gastroenteritis outbreaks but accounted for sporadic diarrhea cases in Jiangsu Province of China.

Sapoviruses belong to the genus Sapovirus within the family Caliciviridae. They can cause acute gastroenteritis in humans and animals (Oka et al., 2016; Yan et al., 2020). At least 19 SaV genogroups (GI to GXIX) are currently described based on VP1 (Becker-Dreps et al., 2020). SaV infections are responsible for both sporadic cases and occasional outbreaks of acute gastroenteritis. The prevalence of SaV is 6.5% in low- and middle-income countries (Kim et al., 2019). SaV GI is the dominant genogroup globally. In this study, 4,888 sequence reads belonging to SaV were found, all in library 9, and were assembled into one nearly complete genome of 7,542 nt. Phylogenetic analysis showed that the SaV identified here belongs to genogroup GI, which suggests that acute gastroenteritis cases caused by SaV GI infection occur in this area.

Members of the family Adenoviridae are large, non-enveloped viruses with a dsDNA genome of approximately 27–48 kb, which encodes about 40 different proteins. The family Adenoviridae consists of five genera: Mastadenovirus, Aviadenovirus, Atadenovirus, Siadenovirus, and Ichtadenovirus. HAdVs, which belong to the genus Mastadenovirus, are important viral pathogens associated with hemorrhagic cystitis, pneumonia, acute gastroenteritis, and other diseases (Hierholzer et al., 1992). At present, 103 different HAdV types are grouped into seven HAdV species (A to G). 
Members of the family Adenoviridae had the second most abundant sequence reads in this study. HAdV is considered to be an important intestinal pathogen associated with sporadic diarrhea in children (Chhabra et al., 2013). We detected human mastadenovirus F in just one library, with 26,410 sequence reads. This large number of HAdV reads may reflect a high HAdV titer in this library or the larger genome of HAdV relative to the other viruses. At present, only a few studies of HAdV-associated diarrhea in children have been reported in China. Li et al. found HAdV infection in 3.1% of children with acute diarrhea in Hangzhou, China (Li et al., 2020b). The positive rate of HAdV infection here (≤10%) was clearly lower than in some other studies; for example, one study reported a positive rate of 44.1% (Mohammad et al., 2020). The difference in positive rates may be due to regional differences or differences in health status. The HAdV (Adenzj-7) detected in this study belonged to serotype 41 and shared >99% nt identity with other HAdV serotype 41 strains detected in countries and regions including America, Japan, South Africa, Brazil, and Sweden. This indicates that the same HAdV serotype 41 strain is prevalent all over the world. The Picornaviridae is a large family of small, icosahedral viruses with +ssRNA genomes of 7.1–8.9 kb in size. The family currently consists of 158 species grouped into 68 genera (https://www.picornaviridae.com/), most of which have a similar genome structure encoding a single polyprotein translated from a single ORF flanked by 5′ and 3′ UTRs. Members of the family Picornaviridae may cause subclinical infections of humans and animals or conditions ranging from mild febrile illness to severe diseases of the heart, liver, and central nervous system (Tracy et al., 2006; Wang et al., 2015; Jubelt et al., 2014).
Here, multiple picornaviruses were detected, including one Coxsackievirus A6, one human poliovirus 2, one human rhinovirus B27, and four human parechoviruses (HPeVs). Coxsackievirus A6 is one of the major agents causing hand, foot and mouth disease (HFMD) outbreaks globally (Yoshitomi et al., 2018; Kanbayashi et al., 2019; Saxena et al., 2020). The main clinical symptoms of HFMD are fever, sore throat, oral ulcers, and a maculopapular or vesicular rash distributed over the hands, feet, and buttocks. Diarrhea is not a main symptom in patients with HFMD, so the Coxsackievirus A6 detected here may not be associated with diarrhea in children. The detection of human poliovirus in this study was not unexpected, because children can sometimes experience diarrhea as a result of poliovirus vaccination (Sugawara et al., 2009). Human rhinovirus is recognized as a cause of severe acute respiratory infections. All three species of rhinovirus were detected in fecal samples of children with diarrhea in a previous study, so its detection here was not surprising. HPeV infections are mainly asymptomatic or associated with mild respiratory and gastrointestinal diseases. HPeV infection in children with acute gastroenteritis has a prevalence rate of 1.8% to 29.4% worldwide (Baumgarte et al., 2008; Chuchaona et al., 2015). HPeV1 is the predominant genotype, followed by HPeV3 and other genotypes. Here, all four HPeVs belonged to HPeV1, suggesting that HPeV1 was prevalent in Jiangsu, China. Astroviruses belong to the family Astroviridae. The virions are small (about 28–30 nm in diameter), non-enveloped, and contain a +ssRNA genome about 6.4–7.7 kb in length (Rivera et al., 2010). The family Astroviridae is divided into two genera, Mamastrovirus (33 species) and Avastrovirus (7 species) (Donato et al., 2017). HAstVs belong to the genus Mamastrovirus. At present, HAstVs are divided into eight genotypes (HAstV-1 to HAstV-8) based on the nucleotide sequence of a 348-bp partial ORF2 coding region (Zhou et al., 2019).
Other diarrhea-related viruses, such as astrovirus and human bocavirus (HBoV), were also detected in this study. HAstVs are among the most important viruses causing acute gastroenteritis worldwide (Bányai et al., 2018). HAstV-1 was the predominant circulating genotype, followed by HAstV-5, HAstV-4, HAstV-2, HAstV-8 and HAstV-3, in Shanghai, China during 2015–2016 (Wu et al., 2020). Outbreaks of acute gastroenteritis caused by HAstV-1 have occurred in different provinces of China (Hou et al., 2016; Tan et al., 2019). Here, HAstV-1 was detected in two libraries and shared the highest nt identity, 98.94%, with the strain Hu/HY/2019/CHN (MW863310) isolated from Sichuan, China in 2019. No acute gastroenteritis outbreaks caused by HAstV-1 occurred during our sampling, so the HAstV-1 detected in this study may have caused sporadic cases. Members of the family Parvoviridae are small, non-enveloped –ssDNA viruses with a genome of 4–6 kb in length. The family Parvoviridae has two subfamilies, Densovirinae and Parvovirinae, comprising 12 genera (Cotmore et al., 2019). HBoV belongs to the genus Bocavirus of the subfamily Parvovirinae. HBoVs are small, non-enveloped –ssDNA viruses of about 5,300 nt, with three ORFs encoding two nonstructural proteins (NS1 and NP1) and two viral capsid proteins (VP1 and VP2). At present, four HBoVs (HBoV1 to HBoV4) have been identified in respiratory and stool samples (Kapoor et al., 2009, 2010; Arthur et al., 2009). HBoV is an emerging virus associated with diarrhea in young children. In China, diarrhea in children caused by HBoVs has already been reported (Jin et al., 2011). Here, two genotypes of HBoV (HBoV1 and HBoV2) were detected in two different libraries. The number of sequence reads belonging to HBoV2 was more than ten times that of HBoV1, suggesting that HBoV2 was the predominant type of bocavirus in this area. HBoV1, a respiratory pathogen, is usually detected in respiratory samples.
The HBoV1 detected here may have come from saliva that the patients swallowed into the intestine. In addition, viruses not associated with diarrhea, such as picobirnavirus and anellovirus, were detected in this study. These viruses were also detected in previous studies of the fecal virome of diarrheic patients (Sun et al., 2016; Yinda et al., 2019). Further research is needed to verify whether these viruses have any relevance to diarrhea in children.
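The nucleotide identity values quoted in this discussion (for example, >99% for HAdV serotype 41 and 98.94% for HAstV-1) are pairwise comparisons between aligned sequences. The following is a minimal sketch of such a calculation, not the method used in this study: the function name `percent_identity`, the gap-handling rule, and the toy sequences are all assumptions made for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent nucleotide identity between two pre-aligned sequences.

    Assumption: both strings come from the same alignment, so they have
    equal length and use '-' for gaps. Columns with a gap in either
    sequence are skipped; other tools may treat gaps differently.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments, invented for illustration (not data from the study).
query = "ATGGC-TACGTTAGC"
reference = "ATGGCATACGTCAGC"
print(round(percent_identity(query, reference), 2))
```

Because the denominator depends on how gaps and ambiguous bases are handled, values from a sketch like this need not match those reported by a particular alignment program.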

Conclusions

In conclusion, this study provides an overview of the fecal virome of diarrheic children and significantly increases our understanding of the prevalence of diarrhea-related viruses. It also provides valuable information for the prevention and treatment of viral diarrhea in this area.

Data availability

The viral genomes and genome fragments obtained in this study were deposited in GenBank under accession numbers MZ546174–MZ546192 and MZ576278–MZ576282. The raw sequence reads from the metagenomic libraries were deposited in the Sequence Read Archive of the GenBank database under accession number SRX4189498.
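For readers who want to retrieve the deposited sequences, the accession ranges above can be expanded into individual accession numbers. The sketch below assumes that each range is contiguous and inclusive, which the statement suggests but does not say explicitly; the helper name `expand_accession_range` is hypothetical.

```python
import re


def expand_accession_range(first: str, last: str) -> list[str]:
    """Expand an inclusive accession range such as MZ546174 to MZ546192.

    Assumption: both endpoints share the same letter prefix and the same
    number of digits, and every number in between is part of the deposit.
    """
    pattern = r"([A-Z]+)(\d+)"
    m_first = re.fullmatch(pattern, first)
    m_last = re.fullmatch(pattern, last)
    if not m_first or not m_last:
        raise ValueError("accessions must be letters followed by digits")
    if m_first.group(1) != m_last.group(1):
        raise ValueError("accession prefixes differ")
    if len(m_first.group(2)) != len(m_last.group(2)):
        raise ValueError("accession digit widths differ")
    prefix = m_first.group(1)
    width = len(m_first.group(2))
    start, end = int(m_first.group(2)), int(m_last.group(2))
    if end < start:
        raise ValueError("range end precedes range start")
    # Zero-pad so the digit width of the original accessions is preserved.
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


accessions = (
    expand_accession_range("MZ546174", "MZ546192")
    + expand_accession_range("MZ576278", "MZ576282")
)
print(len(accessions))  # 24 accessions under the contiguity assumption
```

The resulting list can be passed to any sequence retrieval tool that accepts GenBank accession numbers.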

Ethics statement

The patients' parents signed the informed consent form, which was approved by the medical ethics committee of the Affiliated Hospital of Jiangsu University (reference code: ujsah2016219).

Author contributions

Shixing Yang: investigation, data curation, writing-original draft. Yumin He: investigation, data curation. Ju Zhang: data curation. Dianqi Zhang: data curation. Yan Wang: data curation. Xiang Lu: data curation. Xiaochun Wang: data curation. Quan Shen: data curation. Likai Ji: data curation. Hongyan Lu: conceptualization, supervision. Wen Zhang: conceptualization, supervision, writing-review and editing.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

65 in total

Review 1. Evolution of virulence in picornaviruses.

Authors: S Tracy; N M Chapman; K M Drescher; K Kono; W Tapprich
Journal: Curr Top Microbiol Immunol Date: 2006 Impact factor: 4.291

2. Epidemiology and Genetic Characterization of Classical Human Astrovirus Infection in Shanghai, 2015-2016.

Authors: Limeng Wu; Zheng Teng; Qingneng Lin; Jing Liu; Huanyu Wu; Xiaozhou Kuang; Xiaoqing Cui; Wei Wang; Xiaoxian Cui; Zheng-An Yuan; Xi Zhang; Youhua Xie
Journal: Front Microbiol Date: 2020-09-25 Impact factor: 5.640

3. Prevalence of rotavirus and rapid changes in circulating rotavirus strains among children with acute diarrhea in China, 2009-2015.

Authors: Jianxing Yu; Shengjie Lai; Qibin Geng; Chuchu Ye; Zike Zhang; Yaming Zheng; Liping Wang; Zhaojun Duan; Jing Zhang; Shuyu Wu; Umesh Parashar; Weizhong Yang; Qiaohong Liao; Zhongjie Li
Journal: J Infect Date: 2018-07-12 Impact factor: 6.072

4. Clinical and molecular analyses of norovirus-associated sporadic acute gastroenteritis: the emergence of GII.17 over GII.4, Huzhou, China, 2015.

Authors: Peng Zhang; Liping Chen; Yun Fu; Lei Ji; Xiaofang Wu; Deshun Xu; Jiankang Han
Journal: BMC Infect Dis Date: 2016-11-29 Impact factor: 3.090