Syun-Ichi Urayama, Yoshihiro Takaki, Takuro Nunoura.
Abstract
Knowledge of the distribution and diversity of RNA viruses is still limited in spite of their possible environmental and epidemiological impacts because RNA virus-specific metagenomic methods have not yet been developed. We herein constructed an effective metagenomic method for RNA viruses by targeting long double-stranded (ds)RNA in cellular organisms, which is a hallmark of infection, or the replication of dsRNA and single-stranded (ss)RNA viruses, except for retroviruses. This novel dsRNA targeting metagenomic method is characterized by an extremely high recovery rate of viral RNA sequences, the retrieval of terminal sequences, and uniform read coverage, which have not previously been reported for other metagenomic methods targeting RNA viruses. This method revealed a previously unidentified viral RNA diversity of more than 20 complete RNA viral genomes, including dsRNA and ssRNA viruses, associated with an environmental diatom colony. Our approach will be a powerful tool for cataloging RNA viruses associated with organisms of interest.
Year: 2016 PMID: 26877136 PMCID: PMC4791113 DOI: 10.1264/jsme2.ME15171
Source DB: PubMed Journal: Microbes Environ ISSN: 1342-6311 Impact factor: 2.912
Fig. 1. Schematic workflow of FLDS. 1. Fragmentation of dsRNA by ultrasound. 2. Ligation of a loop primer on 3′-terminal ends and reverse transcription. 3. Selective duplex formation of cDNA from dsRNA, and PCR amplification. Details of the FLDS method are described in the Materials and Methods section.
Fig. 2. Agarose gel electrophoresis of purified nucleic acids from a diatom colony. Nucleic acids were stained with ethidium bromide. Lane M, 300 ng of HindIII-digested λ DNA; lane 1, total nucleic acids extracted from 5 mg (wet weight) of the diatom colony; lane 2, purified dsRNA extracted from 1 g (wet weight) of the diatom colony.
List of complete composite genomes of RNA viruses and full-length virus-like RNAs obtained from a diatom colony using FLDS.
| RNA virus species | Accession | Description | Size (nt) | Num. of mapped reads | Average coverage | BlastX: top hit for each CDS, virus family | BlastX: E-value | BlastX: protein |
|---|---|---|---|---|---|---|---|---|
| DCADSRV-1 | AP014890 | segment 1 | 1,734 | 1,301,278 | 191,942 | - | - | - |
| DCADSRV-1 | AP014891 | segment 2 | 1,562 | 1,717,396 | 279,580 | Fox | 1 × 10^-33 | RdRp |
| DCADSRV-2 | AP014892 | | 4,026 | 1,337,570 | 83,876 | | 5 × 10^-15 | RdRp |
| DCADSRV-3 | AP014893 | | 4,911 | 14,544 | 703 | | 2 × 10^-63 | RdRp |
| DCADSRV-4 | AP014894 | Genome type A | 4,982 | 12,325 | 591 | | 4 × 10^-69 | RdRp |
| DCADSRV-4 | AP014895 | Genome type B | 4,979 | 1,074 | 52 | | 5 × 10^-69 | RdRp |
| DCADSRV-5 | AP014896 | | 5,252 | 7,863 | 359 | | 3 × 10^-74 | RdRp |
| DCADSRV-6 | AP014897 | | 4,939 | 2,720 | 131 | | 2 × 10^-66 | RdRp |
| DCADSRV-7 | AP014898 | | 5,327 | 1,957 | 87 | | 3 × 10^-123 | RdRp |
| | | | | | | | 2 × 10^-56 | CP |
| DCADSRV-8 | AP014899 | | 4,660 | 1,163 | 60 | | 8 × 10^-57 | RdRp |
| DCADSRV-9 | AP014900 | Genome type A | 4,844 | 1,198 | 60 | | 1 × 10^-65 | RdRp |
| DCADSRV-9 | AP014901 | Genome type B | 4,845 | 364 | 18 | | 2 × 10^-66 | RdRp |
| DCADSRV-10 | AP014902 | | 5,082 | 1,244 | 59 | | 2 × 10^-108 | RdRp |
| | | | | | | | 6 × 10^-50 | CP |
| DCADSRV-11 | AP014903 | | 5,160 | 1,173 | 55 | | 4 × 10^-128 | RdRp |
| | | | | | | | 8 × 10^-64 | CP |
| DCADSRV-12 | AP014904 | | 5,941 | 1,219 | 49 | | 1 × 10^-40 | RdRp |
| DCADSRV-13 | AP014905 | | 4,671 | 820 | 42 | | 4 × 10^-58 | RdRp |
| DCADSRV-14 | AP014906 | segment 1 | 1,576 | 438 | 67 | Persimmon cryptic virus | 3 × 10^-97 | RdRp |
| DCADSRV-14 | AP014907 | segment 2 | 1,490 | 274 | 43 | - | - | - |
| DCADSRV-15 | AP014908 | | 12,172 | 1,482 | 29 | | 1 × 10^-115 | Polyprotein |
| DCASSRV-1 | AP014912 | | 11,413 | 1,011 | 21 | Border disease virus - BD31 | 4 × 10^-15 | Polyprotein |
| DCASSRV-2 | AP014913 | | 4,586 | 4,153 | 224 | | 5 × 10^-20 | RdRp |
| DCADSRV-16 | AP014909 | | 6,635 | 8,735 | 310 | | 4 × 10^-10 | RdRp |
| DCADSRV-17 | AP014910 | Genome type A | 5,907 | 5,325 | 218 | dsRNA virus environmental sample | 7 × 10^-14 | RdRp |
| DCADSRV-17 | AP014911 | Genome type B | 5,909 | 1,564 | 63 | | 1 × 10^-13 | RdRp |
| DCAVLRS-1 | AP014914 | Interrupted RdRp | 4,567 | 57,802 | 3,039 | | 3 × 10^-11 | RdRp |
| DCAVLRS-2 | AP014915 | Interrupted RdRp | 4,786 | 41,181 | 2,100 | | 2 × 10^-11 | RdRp |
| DCAVLRS-3 | AP014916 | CP only | 3,458 | 13,140 | 876 | | 2 × 10^-41 | CP |
| DCAVLRS-4 | AP014917 | RdRp only | 3,190 | 3,995 | 294 | | 2 × 10^-123 | RdRp |
| DCAVLRS-5 | AP014918 | CP only | 3,262 | 1,331 | 96 | | 5 × 10^-47 | CP |
| DCAVLRS-6 | AP014919 | RdRp only | 3,325 | 891 | 65 | | 6 × 10^-102 | RdRp |
| DCAVLRS-7 | AP014920 | Interrupted RdRp | 1,986 | 164 | 20 | | 4 × 10^-63 | RdRp |
The classification was based on the shared 5′-terminal sequences in paired segments, whereas CDSs in the segments did not show significant similarities with genes in databases.
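As a rough consistency check on the numeric columns of the table above, average coverage can be read as total mapped bases divided by segment length, which lets the mean mapped read length be back-calculated. The Python sketch below does this for three rows. The formula and the function name `mean_mapped_read_length` are assumptions made for illustration; the source does not state how average coverage was computed.

```python
def mean_mapped_read_length(size_nt: int, mapped_reads: int, avg_coverage: float) -> float:
    """Back-calculate the mean mapped read length.

    Assumes (not stated in the source) that
    avg_coverage = mapped_reads * mean_read_length / size_nt.
    """
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return avg_coverage * size_nt / mapped_reads


# Values copied from the table: (size in nt, mapped reads, average coverage).
rows = {
    "DCADSRV-1 segment 1": (1734, 1_301_278, 191_942),
    "DCADSRV-3": (4911, 14_544, 703),
    "DCAVLRS-7": (1986, 164, 20),
}

for name, (size, reads, cov) in rows.items():
    length = mean_mapped_read_length(size, reads, cov)
    print(f"{name}: ~{length:.0f} nt per mapped read")
```

Under this assumption the three rows give values in the range of roughly 235 to 260 nt, which suggests that the columns are mutually consistent.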
Classification of next-generation sequencing reads obtained by FLDS and total RNA-seq.
| Read category | FLDS: num. of reads | FLDS: rate (%) | Total RNA-seq: num. of reads | Total RNA-seq: rate (%) |
|---|---|---|---|---|
| Trimmed | 4,631,738 | 100.0 | 6,979,561 | 100.0 |
| Major viral reads | 4,549,629 | 98.2 | 24,036 | 0.3 |
| Unmapped reads (include minor viral reads) | 82,109 | 1.7 | 6,955,525 | 99.6 |
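The rates in the table above follow directly from the read counts. The Python sketch below recomputes them; the helper name `truncated_percent` is hypothetical, and the choice to truncate rather than round to one decimal place is an assumption inferred from the fact that truncation reproduces the reported 1.7% and 99.6%.

```python
def truncated_percent(count: int, total: int) -> float:
    """Percentage of total, truncated (not rounded) to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Integer arithmetic avoids floating-point surprises before the final division.
    return (count * 1000 // total) / 10


# Read counts copied from the table.
libraries = {
    "FLDS": {"trimmed": 4_631_738, "major_viral": 4_549_629},
    "total RNA-seq": {"trimmed": 6_979_561, "major_viral": 24_036},
}

for name, lib in libraries.items():
    # Unmapped reads are whatever remains after removing major viral reads.
    unmapped = lib["trimmed"] - lib["major_viral"]
    viral_pct = truncated_percent(lib["major_viral"], lib["trimmed"])
    unmapped_pct = truncated_percent(unmapped, lib["trimmed"])
    print(f"{name}: viral {viral_pct}%, unmapped {unmapped_pct}% ({unmapped:,} reads)")
```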
Fig. 3. Comparison of mapped read frequencies for each viral contig between FLDS and total RNA-seq. Each plot represents one viral contig. Rhombus and triangle plots show dsRNA and ssRNA viral contigs, respectively. Axis values from 10^0 to 10^-7 represent the frequencies of reads in each library. Dotted lines labeled 1×, 10×, 100×, and 1000× indicate how many times higher the viral read frequency was in FLDS than in the RNA-seq analysis. Reads mapping to nine of the contigs found by FLDS were not detected in total RNA-seq.
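The quantities compared in Fig. 3 can be sketched as a per-contig read frequency (mapped reads divided by library size) and the ratio of the two frequencies. The Python example below is a minimal illustration: the function names are hypothetical, the library sizes are the trimmed read totals reported for the two libraries, and the per-contig RNA-seq counts are made-up placeholders because the source does not list them.

```python
from typing import Optional

FLDS_TOTAL = 4_631_738    # trimmed FLDS reads
RNASEQ_TOTAL = 6_979_561  # trimmed total RNA-seq reads


def read_frequency(mapped: int, total: int) -> float:
    """Fraction of a library's reads that map to one contig."""
    return mapped / total


def fold_enrichment(flds_mapped: int, flds_total: int,
                    rnaseq_mapped: int, rnaseq_total: int) -> Optional[float]:
    """FLDS frequency divided by RNA-seq frequency.

    Returns None when no RNA-seq reads map to the contig, since the
    ratio is undefined in that case.
    """
    if rnaseq_mapped == 0:
        return None
    return (read_frequency(flds_mapped, flds_total)
            / read_frequency(rnaseq_mapped, rnaseq_total))


# FLDS count for DCADSRV-3 is from the genome table; the RNA-seq counts
# (20 and 0) are hypothetical placeholders, not values from the source.
print(fold_enrichment(14_544, FLDS_TOTAL, 20, RNASEQ_TOTAL))
print(fold_enrichment(1_163, FLDS_TOTAL, 0, RNASEQ_TOTAL))
```

Returning `None` for contigs absent from RNA-seq keeps them distinguishable from contigs with a finite ratio, which matters here because nine contigs fall into that case.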
Fig. 4. Comparison of coverage uniformity between FLDS and RNA-seq. dsRNA segments with an average depth of > 200 in RNA-seq were used for the analysis. (A) Coefficient of variation (the ratio of the standard deviation to the mean coverage). Values for the viral dsRNA segments DCADSRV-1 segment 1 (square), DCADSRV-1 segment 2 (triangle), and DCADSRV-2 (circle) are plotted on the Y axis. (B–D) Genomic coverage of each viral segment from the FLDS (upper graph) and RNA-seq (lower graph) analyses.
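The coefficient of variation used in Fig. 4A can be sketched in a few lines of Python. This is an illustration only: the depth values are invented, the function name is hypothetical, and the use of the population standard deviation is an assumption, since the source does not say which estimator was used.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(depths: Sequence[float]) -> float:
    """Standard deviation of per-base depth divided by mean depth.

    Uses the population standard deviation (an assumption).
    """
    m = mean(depths)
    if m == 0:
        raise ValueError("mean coverage is zero; CV is undefined")
    return pstdev(depths) / m


# Hypothetical per-base depths for one segment in each library.
uniform_like = [980, 1010, 1000, 995, 1015]
uneven_like = [50, 400, 1200, 90, 10]

print(f"uniform-like profile: CV = {coefficient_of_variation(uniform_like):.3f}")
print(f"uneven profile:       CV = {coefficient_of_variation(uneven_like):.3f}")
```

A lower value indicates more even coverage along the segment, which is the sense in which the figure compares the two methods.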