Yajing Wang, Hui Wang, Kunhan Xu, Peixiang Ni, Huan Zhang, Jinmin Ma, Huanming Yang, Feng Xu.
Abstract
It is commonly accepted that there are many unknown viruses on the planet. For the known viruses, do we know their prevalence, even in our experimental systems? Here we report a virus survey using recently published small RNA (sRNA) sequencing datasets. The sRNA reads were assembled, and the contigs were screened for virus homologues against the NCBI nucleotide (nt) database using the BLASTn program. To our surprise, approximately 30% (28 out of 94) of publications had high-scoring viral sequences in their datasets. Among them, only two publications reported virus infections. Though viral vectors were used in some of the publications, virus sequences without any identifiable source appeared in more than 20 publications. By determining the distributions of viral reads and the antiviral RNA interference (RNAi) pathways using the sRNA profiles, we showed evidence that many of the viruses identified were indeed infecting their hosts and had triggered host RNAi responses. As virus infections affect many aspects of host molecular biology and metabolism, the presence and impact of viruses need to be actively investigated in experimental systems.
Year: 2014 PMID: 25144530 PMCID: PMC4140767 DOI: 10.1371/journal.pone.0105348
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. GEO libraries containing viral sequences.
Nested sets represent numbers of articles (Table S1) with Virus Reported, Virus Detected and No Virus Detected for each host species examined in this study.
An overview of overlooked viruses in published sRNA libraries used in this study.
| Classification | Model Organism | Number of GEO Libraries (Cell lines) | Libraries with Overlooked Viruses (Cell lines) | Overlooked Viruses | Total Contigs | Viral Contigs |
| Plant | A. thaliana | 182 | 32 | 5 | 1,167,645 | 145 |
| Plant | G. max | 13 | 4 | 3 | 16,335 | 88 |
| Plant | O. sativa | 63 | 17 | 4 | 300,900 | 59 |
| Plant | T. aestivum | 14 | 5 | 2 | 52,576 | 57 |
| Plant | Z. mays | 14 | 2 | 2 | 257,357 | 9 |
| Invertebrate | C. elegans | 87 | 0 | 0 | 416,472 | 0 |
| Invertebrate | D. melanogaster | 24 (15) | 15 (13) | 5 | 141,598 | 57 |
| Vertebrate | D. rerio | 36 | 0 | 0 | 641,055 | 0 |
| Vertebrate | M. musculus | 47 (15) | 11 (7) | 4 | 315,420 | 25 |
| Vertebrate | H. sapiens | 37 (20) | 10 (10) | 2 | 33,122 | 21 |
| Total | | 517 (50) | 96 (30) | 23 | 3,342,480 | 461 |
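As a reading aid, the per-host share of libraries containing overlooked viruses can be derived from the table's library counts. The following is a minimal sketch, not part of the original study: the dictionary simply copies the two library columns (cell-line sub-counts omitted), and the names `LIBRARIES` and `overlooked_fraction` are illustrative.

```python
# Values copied from the table: host -> (GEO libraries, libraries with overlooked viruses).
# Cell-line sub-counts in parentheses are left out of this sketch.
LIBRARIES = {
    "A. thaliana": (182, 32),
    "G. max": (13, 4),
    "O. sativa": (63, 17),
    "T. aestivum": (14, 5),
    "Z. mays": (14, 2),
    "C. elegans": (87, 0),
    "D. melanogaster": (24, 15),
    "D. rerio": (36, 0),
    "M. musculus": (47, 11),
    "H. sapiens": (37, 10),
}


def overlooked_fraction(total, positive):
    """Share of libraries in which an overlooked virus was found."""
    return positive / total if total else 0.0


fractions = {
    host: overlooked_fraction(total, positive)
    for host, (total, positive) in LIBRARIES.items()
}

# Column sums, which should agree with the table's Total row.
total_libraries = sum(total for total, _ in LIBRARIES.values())
total_positive = sum(positive for _, positive in LIBRARIES.values())

for host, fraction in sorted(fractions.items(), key=lambda item: -item[1]):
    print(f"{host:16s} {fraction:.1%}")
```

The column sums give a quick consistency check against the Total row of the table.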
Figure 2. Heat map of viruses detected in each organism.
The virus detection rate (DR) was calculated for each virus in each host species as the number of positive samples divided by the total number of samples. An asterisk marks the only animal sample (M. musculus, GSM947964) that was positive for a plant virus (Cotton leafroll dwarf virus).
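The detection rate described in this caption is a simple ratio per (host, virus) pair. A minimal sketch of that calculation follows, assuming each sample is recorded as a host, a sample identifier, and the set of viruses detected in it. The record layout, the function name `detection_rates`, and the example data are all hypothetical, not taken from the study.

```python
from collections import defaultdict


def detection_rates(samples):
    """Compute DR = positive samples / total samples per (host, virus).

    `samples` is an iterable of (host, sample_id, viruses) tuples, where
    `viruses` is the collection of viruses detected in that sample.
    """
    totals = defaultdict(int)     # host -> number of samples examined
    positives = defaultdict(int)  # (host, virus) -> number of positive samples
    for host, _sample_id, viruses in samples:
        totals[host] += 1
        for virus in set(viruses):  # count each virus once per sample
            positives[(host, virus)] += 1
    return {
        (host, virus): count / totals[host]
        for (host, virus), count in positives.items()
    }


# Hypothetical placeholder records, for illustration only.
example_samples = [
    ("host 1", "sample 1", {"virus A"}),
    ("host 1", "sample 2", set()),
    ("host 1", "sample 3", {"virus A", "virus B"}),
    ("host 1", "sample 4", set()),
    ("host 2", "sample 5", {"virus B"}),
]
rates = detection_rates(example_samples)
print(rates)
```

Pairs with no positive sample are absent from the result; a heat map would show them as a rate of zero.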
Figure 3. Length distributions of sRNAs matched to the virus contig sequences.
Heat maps show the proportion of vsRNAs of each length. X-axis: read length (17–36 nt). Y-axis: virus name_read count_abundance in CPM_dataset_host abbreviation (AT: A. thaliana, DM: D. melanogaster, GM: G. max, HS: H. sapiens, MM: M. musculus, OS: O. sativa, TA: T. aestivum and ZM: Z. mays). Panel A: Monocot host species (TA, ZM, OS); Panel B: Dicot host species (AT, GM; an asterisk marks the only animal sample, M. musculus GSM947964, that was positive for a plant virus, Cotton leafroll dwarf virus); Panel C: Invertebrate host species (DM); Panel D: Vertebrate host species (MM, HS); Panel E: Phages in plant species (AT); Panel F: Phages in animal species (DM, MM, HS).
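The two quantities named in this caption, the per-length proportion of reads and the abundance in counts per million (CPM), can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the 17–36 nt window comes from the caption, while the function names and the choice of total library reads as the CPM denominator are assumptions.

```python
from collections import Counter


def length_proportions(reads, min_len=17, max_len=36):
    """Proportion of reads at each length within [min_len, max_len]."""
    counts = Counter(
        len(read) for read in reads if min_len <= len(read) <= max_len
    )
    total = sum(counts.values())
    return {
        length: (counts[length] / total if total else 0.0)
        for length in range(min_len, max_len + 1)
    }


def counts_per_million(viral_reads, library_reads):
    """Abundance in CPM, assuming total library reads as the denominator."""
    return viral_reads * 1_000_000 / library_reads


# Hypothetical reads: three 21-nt and one 22-nt sequence.
example_reads = ["A" * 21] * 3 + ["A" * 22]
proportions = length_proportions(example_reads)
print(proportions[21], proportions[22])
print(counts_per_million(50, 2_000_000))
```

Each row of such a heat map would correspond to one call to `length_proportions` for the reads matching one virus in one dataset.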