FLDS: A Comprehensive dsRNA Sequencing Method for Intracellular RNA Virus Surveillance.

Syun-Ichi Urayama¹, Yoshihiro Takaki, Takuro Nunoura.

Abstract

Knowledge of the distribution and diversity of RNA viruses is still limited in spite of their possible environmental and epidemiological impacts because RNA virus-specific metagenomic methods have not yet been developed. We herein constructed an effective metagenomic method for RNA viruses by targeting long double-stranded (ds)RNA in cellular organisms, which is a hallmark of infection, or the replication of dsRNA and single-stranded (ss)RNA viruses, except for retroviruses. This novel dsRNA targeting metagenomic method is characterized by an extremely high recovery rate of viral RNA sequences, the retrieval of terminal sequences, and uniform read coverage, which has not previously been reported in other metagenomic methods targeting RNA viruses. This method revealed a previously unidentified viral RNA diversity of more than 20 complete RNA viral genomes including dsRNA and ssRNA viruses associated with an environmental diatom colony. Our approach will be a powerful tool for cataloging RNA viruses associated with organisms of interest.

Substances:
RNA, Double-Stranded

Year: 2016 PMID: 26877136 PMCID: PMC4791113 DOI: 10.1264/jsme2.ME15171

Source DB: PubMed Journal: Microbes Environ ISSN: 1342-6311 Impact factor: 2.912

Viruses are the universal genetic elements associated with all three domains of life (22), and virus-host interactions impact on the status of life and surrounding ecosystems (41). Historically, viruses are most often recognized as pathogens (38), and, thus, have been studied in the field of medical and crop science. Recent advances in high-throughput sequencing technologies have enabled us to identify not only viruses associated with diseases, but also those present in natural environments including oceans (41) and soil (12). Although these sequencing technologies have opened a new era in virus identification (24), a limited number of methods have been established for virus enrichment and library construction. The diversity and distribution of viruses in non-viral nucleic acid-dominant environments, such as the intracellular environments in which viruses actually replicate, still remain unclear due to technical difficulties (16). The development of a new procedure for effective virus enrichment and library construction is required in order to understand the full spectrum of diverse viruses. RNA sequencing (RNA-seq) is a popular method in RNA virus metagenomics and is widely used for RNA virus identification (35). Purification and library construction methods have been established for RNA viruses at the extracellular stage (7, 10, 40). However, the viral read ratio of intracellular RNA viruses (RNA viruses at the intracellular stage) in the RNA-seq library is typically < 1% because mRNA and rRNA are dominant in the total RNA fraction extracted from biological samples (25). Therefore, the enrichment of viral RNA is essential for maximizing sensitivity in the identification of novel viruses. The physical enrichment of viral particles and nuclease digestion of non-viral nucleotides has been employed to increase the viral read ratio; however, a relatively low abundance of viral reads is still observed in most studies (39). 
These techniques are only applicable to specific RNA viruses because not all RNA viruses form viral particles (21). In addition, difficulties are associated with capturing terminal RNA sequences in an efficient and effective manner (32) and obtaining uniform coverage using the RNA-seq method. Sample preparation methods for effective viral RNA-seq are still inadequate and the sequence information generated is biased and incomplete. In an attempt to resolve these issues, an environmental viral metagenomic approach targeting intracellular long double-stranded RNA (dsRNA) has recently been examined (2, 6, 9, 37). Intracellular dsRNA consists of the genomes of dsRNA viruses and replicative intermediates of single-stranded RNA (ssRNA) viruses, and, thus, long dsRNA is known as an RNA virus-specific molecule and molecular marker for RNA virus infection and replication (28). Therefore, a metagenomic analysis targeting intracellular long dsRNA theoretically retrieves dsRNA and ssRNA viruses, except for ssRNA retroviruses, which do not form dsRNA in the replicative stage. In addition, it is possible to eliminate non-viral nucleic acids such as mRNA and rRNA, which dominate RNA-seq reads, by DNase I, S1 nuclease, RNase, or column chromatography (44). However, previous studies have reported technical issues with the purification of dsRNA and library construction. Random priming for the reverse transcription of dsRNA does not enable the terminal sequences of the dsRNA molecule to be determined or eliminate significant contamination by non-viral sequences. The heterogeneous sequencing depth in certain viral genome segments is also an issue associated with this method (2, 6, 9, 37). Although the full-length cDNAs of dsRNA viruses may be obtained using loop primers that are ligated to the dsRNA terminal ends for reverse transcription (8), this method is only useful for short dsRNA viruses. 
Therefore, to the best of our knowledge, this method has not yet been applied to a viral metagenomic analysis. We herein established a novel strategy to obtain full-length RNA virus sequences with extremely high efficiency by applying a short dsRNA full-length cloning method (8) for physically fragmented dsRNAs. The improved method, named FLDS (fragmented and loop primer ligated dsRNA sequencing), was applied to a diatom colony in a tide pool and revealed previously unidentified RNA viruses. Our results indicate that the diversity of environmental RNA viruses has been underestimated due to the technical limitations in identifying entire RNA viromes in cellular organisms, and this technique will be a powerful tool for cataloging RNA viruses associated with organisms of interest.

Materials and Methods

Model and environmental samples

Mycelial plugs of Magnaporthe oryzae strain S-0412-II 1a, naturally infected with Magnaporthe oryzae chrysovirus 1 strain A (MoCV1-A) (45) were incubated in 0.5% yeast extract and 2% glucose liquid broth (YG broth) with reciprocal shaking (60 rpm) at 25°C for 2 weeks in the laboratory of Prof. Teraoka (Tokyo University of Agriculture and Technology). Colonies of a diatom on tidal rocks in Tokyo Bay (35.3405° N, 139.6396° E) were sampled in April 2014. After washing with distilled water, the colonies were stored at −80°C.

Purification and fragmentation of dsRNA

DsRNA was purified as described by Okada et al. with a few modifications (31, 46). Briefly, the microbial sample was disrupted in liquid nitrogen in a mortar and total nucleic acids were manually extracted. DsRNA was purified twice through a micro-spin column (empty Bio-spin column; Bio-Rad Laboratories, Inc., Hercules, CA, USA) containing cellulose powder (Cellulose D; ADVANTEC, Tokyo, Japan) to obtain pure dsRNA. The dsRNA eluted from cellulose powder in MQ water was treated with DNase I (amplification grade, Invitrogen, Carlsbad, CA, USA) and S1 nuclease (Invitrogen) in nuclease buffer (57 mM CH3COONa, 9.5 mM MgCl2, 1.9 mM ZnSO4, and 189 mM NaCl) and was then incubated at 37°C for 2 h. The final concentrations of CH3COONa, MgCl2, ZnSO4, and NaCl were adjusted to 90 mM, 15 mM, 3 mM, and 300 mM, respectively. DsRNA was purified using an RNeasy Mini Kit (Qiagen, Valencia, CA). A one-tenth volume of 10 × ShortCut buffer and 10 × MnCl2 provided with ShortCut RNase III (NEB Japan, Tokyo, Japan) was added to the dsRNA solution, which was then fragmented by ultrasound at 4°C in Snap-Cap microTUBEs using a Covaris S220 (Woburn, MA, USA). The fragmentation conditions were as follows: run time 35 s, peak power 140.0 W, duty factor 2.0%, and 200 cycles/burst. Fragmented dsRNA was divided into two equal volumes, and maintained at 37°C with or without ShortCut RNase III (NEB). DsRNAs were then purified using a ZymoClean Gel RNA Recovery Kit (ZymoResearch, Orange, CA). Note that dsRNA purification from M. oryzae was carried out in the laboratory of Prof. Teraoka.

cDNA synthesis and amplification for dsRNA

The PC3-T7 loop primer (5′-p-GGA TCC CGG GAA TTC GGT AAT ACG ACT CAC TAT ATT TTT ATA GTG AGT CGT ATT A-OH-3′) was ligated to fragmented dsRNA as described by Potgieter et al. (34), and dsRNA was then purified using the MinElute Gel Extraction Kit (Qiagen). After the addition of DMSO at a final concentration of 15% (v/v), dsRNA was denatured at 95°C for 3 min and snap-cooled in an ice-water slurry. RNA was reverse transcribed into cDNA from the ligated loop primer region using the Superscript III First-Strand Synthesis System (Invitrogen). After excess and hybrid RNAs were removed (34), cDNA was desalted and concentrated using the MinElute PCR cleanup kit (Qiagen). Primary cDNA strands were re-annealed by lowering the temperature from 95 to 50°C, as described previously (30). Second strand DNA polymerization was performed using KOD-plus Neo (Toyobo, Osaka, Japan) with a primer complementary to the partial sequence of the PC3-T7 loop primer, PC2 (5′-CCG AAT TCC CGG GAT CC-3′) (34). After heat activation of KOD-plus Neo in the provided reaction mixture at 96°C for 2 min, template cDNA was added and incubated at 68°C for 5 min. After the reaction, cDNA was amplified under the following conditions: 96°C for 2 min, followed by 25 (for MoCV1-A) or 18 (for diatom colony) cycles of 98°C for 10 s and 68°C for 2 min. Small cDNA and primer dimers were removed using the 1.25 × SPRIselect reagent kit (Beckman Coulter, Brea, CA, USA) according to the Left Side Size Selection procedure in the manufacturer’s protocol.
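The relationship between the two primers quoted above can be checked directly from their sequences. The following is a minimal sketch, not part of the published protocol: it confirms that the reverse complement of PC2 lies at the 5′ end of the PC3-T7 loop primer, and that the 3′-terminal 17 nt of PC3-T7 can fold back onto an upstream 17-nt stretch across a short T loop. The function names and the 17/17/4/17 split of the oligo are illustrative assumptions derived only from the sequences given in the text.

```python
# Primer sequences copied from the text; spaces are removed below.
PC3_T7 = ("GGA TCC CGG GAA TTC GGT AAT ACG ACT CAC TAT ATT TTT "
          "ATA GTG AGT CGT ATT A").replace(" ", "")
PC2 = "CCG AAT TCC CGG GAT CC".replace(" ", "")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def binding_site(template: str, primer: str) -> int:
    """Index in `template` where the reverse complement of `primer`
    occurs, or -1 if the primer has no complementary site."""
    return template.find(reverse_complement(primer))


# PC2 is complementary to the first 17 nt of the loop primer.
pc2_site = binding_site(PC3_T7, PC2)

# Assumed layout of the 55-nt oligo (an observation from the sequence,
# not a statement from the paper): 17-nt PC2 site, 17-nt 5' stem arm,
# 4-nt T loop, 17-nt 3' stem arm.
stem_5p = PC3_T7[17:34]
loop = PC3_T7[34:38]
stem_3p = PC3_T7[-17:]
forms_hairpin = reverse_complement(stem_5p) == stem_3p

print(len(PC3_T7), pc2_site, loop, forms_hairpin)
```

The self-complementary 3′ end is what lets the ligated oligo prime reverse transcription from its own 3′-OH, while the PC2 site allows single-primer PCR of the resulting cDNA.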

Total RNA extraction, cDNA synthesis, and library construction from an environmental sample

Total RNA was isolated from a diatom colony using the TRIzol Plus RNA Purification Kit (Invitrogen) according to the manufacturer’s protocol. The RNA fraction was treated with DNase I (Takara, Otsu, Japan). Double-stranded cDNA was synthesized from 2 μg of total RNA with random primers (9-mers) using a PrimeScript Double Strand cDNA Synthesis Kit (Takara). The resultant cDNA was quantified using a Qubit dsDNA HS Kit.

Illumina sequencing

Ultrasound was used to fragment cDNA in Snap-Cap microTUBEs at 4°C using a Covaris S220 (Woburn, MA, USA). The fragmentation conditions were as follows: run time 55 s, peak power 175.0 W, duty factor 5.0%, and 200 cycles/burst. The Illumina library was constructed with the KAPA Hyper Prep Kit for Illumina platforms (Kapa Biosystems, Woburn, MA, USA). The quantity of the library was evaluated using the KAPA library quantification kit (Kapa Biosystems). Paired-end sequences (300 bp each) of each fragment were determined with the Illumina MiSeq platform (San Diego, CA, USA).

Data assembly and processing

Raw sequence reads were processed with the CLC Genomics Workbench (CLC Bio, Aarhus, Denmark). Adaptor and primer sequences were trimmed, and low quality sequence regions were removed with default parameters. PhiX sequences derived from control libraries and experimentally contaminated sequences (< 0.05% of total reads) were also removed using a mapping tool. The consensus sequences of viral contigs were obtained de novo exclusively with the CLC Genomics Workbench (CLC Bio), and assemblies were manually examined and extended using the Tablet viewer (27). Using the mapping tool, each contig was confirmed to be constructed with at least 3 × sequence coverage, 10 × average coverage, and 1,000 bp in length. In cases of dominant reads (more than 10 reads) that stopped in the same position around the ends of contigs, the position was recognized as a terminal end. The predicted terminal ends of the viral genome segments were also confirmed by the presence of adjacent PCR primer sequences next to the predicted terminal sequence, except for cases of contigs with a poly(A) tail. Contigs with 70–90% nucleotide identity with other contigs were classified as different genome types of the same species. Contigs with > 90% nucleotide identity were assigned to the same genome type and only major contigs were used in further analyses. Sequences were compared against the NCBI non-redundant nucleotide and amino acid (aa) databases using BLASTN-plus and BLASTX-plus, respectively (5), and then classified by MEGAN 5.7.1 (18). A sequence analysis was performed using Genetyx-MAC software version 17.0.0 (Genetyx Corp., Tokyo, Japan) and Genetyx software version 9.1.0 (Genetyx). Most full-length small subunit rRNA sequences in the diatom colony were reconstructed from RNA-seq reads with EMIRGE (26).
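The contig acceptance, terminal-end, and genome-type rules stated above can be written down as a few small predicates. This is only an illustrative sketch of the stated thresholds, not the CLC Genomics Workbench workflow the authors used; the `Contig` class, the function names, and the "different species" label for identities below 70% are assumptions introduced here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Contig:
    """Hypothetical per-contig summary taken from a read-mapping report."""
    name: str
    length: int            # contig length in bp
    min_coverage: int      # lowest per-base read depth
    mean_coverage: float   # average per-base read depth


def passes_qc(contig: Contig) -> bool:
    """Thresholds from the text: >= 3x coverage everywhere,
    >= 10x average coverage, and >= 1,000 bp in length."""
    return (contig.min_coverage >= 3
            and contig.mean_coverage >= 10
            and contig.length >= 1000)


def call_terminal_end(read_end_positions: Sequence[int]) -> Optional[int]:
    """Return the position at which more than 10 reads stop, or None.
    `read_end_positions` holds the stop coordinate of each mapped read
    near one contig end."""
    if not read_end_positions:
        return None
    position, count = Counter(read_end_positions).most_common(1)[0]
    return position if count > 10 else None


def relation_by_identity(identity_pct: float) -> str:
    """Classify a contig pair by nucleotide identity (percent)."""
    if identity_pct > 90:
        return "same genome type"
    if identity_pct >= 70:
        return "same species, different genome type"
    # Assumption: the text does not name the category below 70%.
    return "different species"


example = Contig("contig_1", length=4911, min_coverage=12, mean_coverage=703.0)
print(passes_qc(example), relation_by_identity(85.0))
```

In practice the terminal-end call was additionally confirmed by the presence of the adjacent PCR primer sequence, which this sketch does not model.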

Phylogenetic analysis

Multiple alignments based on the deduced aa sequences of putative RNA-dependent RNA polymerase (RdRp) genes in dsRNA contigs were obtained using ClustalX 2.0 (23) and MEGA5 software (42). Phylogenetic analyses were conducted using MrBayes 3.2.3 (36) with the model of aa substitution, RtREV+I+G+F, selected by ProtTest2.4 (1), as judged by the Akaike information criterion (33). Bayesian analyses with the covarion parameter were run with one run and four chains for 1,000,000 generations.

Data accession

The data sets supporting the results of this study are available in the GenBank database repository (accession nos. DDBJ: AP014890–AP014920) and Short Read Archive database (accession no. DDBJ: DRA003723 and DRA003724).

Results

Application of FLDS to a segmented dsRNA virus

The novel dsRNA purification and library construction method, named FLDS, consists of cellulose column chromatography, the physical fragmentation of dsRNA, cDNA synthesis using a loop primer, and the PCR amplification of cDNA (Fig. 1). The purification of dsRNA was achieved by the repeated affinity purification of dsRNA using cellulose powder and the enzymatic removal of ssRNA and DNA. Purified dsRNAs were fragmented using ultrasound to retrieve all types of dsRNA viruses in order to apply the previously reported full-length dsRNA cloning method using a loop primer (8). The full-length dsRNA cloning method requires overlapped cDNAs synthesized from both terminal ends for further cDNA amplification, and was only applicable to short dsRNA molecules. Reverse transcription was initiated from the ligated loop primer on both ends of the dsRNA fragment. cDNA was then thermally denatured to allow annealing of single-stranded cDNA with the complementary sequence in the 3′ terminal region. The single-stranded regions of annealed cDNA were filled in with DNA polymerase. The double-stranded cDNA derived from dsRNA was amplified by PCR with a single primer (PC2) in order to obtain sufficient cDNA to construct a sequencing library.

Fig. 1

Schematic work flow of FLDS. 1. Fragmentation of dsRNA by ultrasound. 2. Ligation of a loop primer on 3′-terminal ends and reverse transcription. 3. Selective duplex formation of cDNA from dsRNA, and PCR amplification. Details of the FLDS method are described in the Materials and Methods section.

Mycelial MoCV1-A was used to test the feasibility of this method. Since PCR amplicons were not observed in the dsRNA-specific RNaseIII-treated sample prior to reverse transcription, most of the amplicons (cDNA) were likely to have been derived from dsRNA (Fig. S1). The results of the sequencing analysis indicated that 99.1% of total reads were derived from the MoCV1-A genome (Table S1). Five contigs obtained by de novo assembly were identical to the entire region of the MoCV1-A genome segments attained using a conventional cloning and sequencing method (44, 45) with > 99.9% identity (Table S2). Read mapping on MoCV1-A genomes (Fig. S2) showed that the sequence coverage of terminal regions was generally higher than that of the central regions of each segment with few exceptions. No obvious relationship was observed between read coverage and GC content (Fig. S2). These results indicated that FLDS effectively enriched dsRNA reads, thereby allowing the retrieval of complete genome sequences including terminal regions without the requirement for the additional rapid amplification of cDNA ends (RACE).

FLDS analysis in an environmental diatom colony

Gel electrophoresis showed that the total long dsRNA fraction from the diatom colony contained at least ten dsRNA segments, whereas genomic DNA and rRNA were predominant in total nucleic acids (Fig. 2). Total dsRNA extracted from 1 g of the diatom colony was analyzed using the FLDS method. PCR amplicons were not observed in the dsRNA-specific RNaseIII-treated sample prior to reverse transcription (Fig. S3). As a result of de novo assembly and manual extension, we obtained 42 composite viral contigs (Table 1 and Table S3). More than 98.2% of reads were mapped to these 42 contigs (Table 2) as in the case of the model experiment described above. Both terminal ends of 31 of the viral contigs were identified and recognized as full-length viral genome segments. The terminal sequences of the full-length segments were used to identify segment compositions for some of the viral species because terminal sequences are highly conserved between segments in some dsRNA viral genomes for viral RNA replication and/or encapsidation (19).

Fig. 2

Agarose gel electrophoresis of purified nucleic acids from a diatom colony. Nucleic acids were stained with ethidium bromide. Lane M, 300 ng of HindIII-digested λ DNA; lane 1, total nucleic acids extracted from 5 mg (wet weight) of the diatom colony; lane 2, purified dsRNA extracted from 1 g (wet weight) of the diatom colony.

Table 1

List of complete composite genomes of RNA viruses and full-length virus-like RNAs obtained from a diatom colony using FLDS.

RNA virus species	Accession	Description	Size (nt)	Num. of mapped reads	Average coverage	BlastX top hit for each CDS	Virus family	E-value	Protein
DCADSRV-1 a)	AP014890	segment 1	1,734	1,301,278	191,942	—	—	—	—
DCADSRV-1 a)	AP014891	segment 2	1,562	1,717,396	279,580	Fox Picobirnavirus	Picobirnaviridae	1 × 10⁻³³	RdRp
DCADSRV-2	AP014892		4,026	1,337,570	83,876	Ustilaginoidea virens nonsegmented virus 1	Not assigned	5 × 10⁻¹⁵	RdRp
DCADSRV-3	AP014893		4,911	14,544	703	Ustilaginoidea virens RNA virus 1	Totiviridae	2 × 10⁻⁶³	RdRp
DCADSRV-4	AP014894	Genome type A	4,982	12,325	591	Aspergillus mycovirus 178	Totiviridae	4 × 10⁻⁶⁹	RdRp
DCADSRV-4	AP014895	Genome type B	4,979	1,074	52	Ustilaginoidea virens RNA virus 1	Totiviridae	5 × 10⁻⁶⁹	RdRp
DCADSRV-5	AP014896		5,252	7,863	359	Aspergillus foetidus slow virus 1	Totiviridae	3 × 10⁻⁷⁴	RdRp
DCADSRV-6	AP014897		4,939	2,720	131	Aspergillus mycovirus 178	Totiviridae	2 × 10⁻⁶⁶	RdRp
DCADSRV-7	AP014898		5,327	1,957	87	Gremmeniella abietina RNA virus L1	Totiviridae	3 × 10⁻¹²³	RdRp
DCADSRV-7	AP014898		5,327	1,957	87	Ustilaginoidea virens RNA virus 3	Totiviridae	2 × 10⁻⁵⁶	CP
DCADSRV-8	AP014899		4,660	1,163	60	Aspergillus foetidus slow virus 1	Totiviridae	8 × 10⁻⁵⁷	RdRp
DCADSRV-9	AP014900	Genome type A	4,844	1,198	60	Magnaporthe oryzae virus 2	Totiviridae	1 × 10⁻⁶⁵	RdRp
DCADSRV-9	AP014901	Genome type B	4,845	364	18	Aspergillus foetidus slow virus 1	Totiviridae	2 × 10⁻⁶⁶	RdRp
DCADSRV-10	AP014902		5,082	1,244	59	Rosellinia necatrix victorivirus 1	Totiviridae	2 × 10⁻¹⁰⁸	RdRp
DCADSRV-10	AP014902		5,082	1,244	59	Ustilaginoidea virens RNA virus 1	Totiviridae	6 × 10⁻⁵⁰	CP
DCADSRV-11	AP014903		5,160	1,173	55	Ustilaginoidea virens RNA virus 1	Totiviridae	4 × 10⁻¹²⁸	RdRp
DCADSRV-11	AP014903		5,160	1,173	55	Ustilaginoidea virens RNA virus 1	Totiviridae	8 × 10⁻⁶⁴	CP
DCADSRV-12	AP014904		5,941	1,219	49	Beauveria bassiana RNA virus 1	Totiviridae	1 × 10⁻⁴⁰	RdRp
DCADSRV-13	AP014905		4,671	820	42	Aspergillus foetidus slow virus 1	Totiviridae	4 × 10⁻⁵⁸	RdRp
DCADSRV-14 a)	AP014906	segment 1	1,576	438	67	Persimmon cryptic virus	Partitiviridae	3 × 10⁻⁹⁷	RdRp
DCADSRV-14 a)	AP014907	segment 2	1,490	274	43	—	—	—	—
DCADSRV-15	AP014908		12,172	1,482	29	Chalara endornavirus CeEV1	Endornaviridae	1 × 10⁻¹¹⁵	Polyprotein
DCASSRV-1	AP014912		11,413	1,011	21	Border disease virus—BD31	Flaviviridae	4 × 10⁻¹⁵	Polyprotein
DCASSRV-2	AP014913		4,586	4,153	224	Tuber excavatum mitovirus	Narnaviridae	5 × 10⁻²⁰	RdRp
DCADSRV-16	AP014909		6,635	8,735	310	Rhizoctonia fumigata mycovirus	Not assigned	4 × 10⁻¹⁰	RdRp
DCADSRV-17	AP014910	Genome type A	5,907	5,325	218	dsRNA virus environmental sample	Not assigned	7 × 10⁻¹⁴	RdRp
DCADSRV-17	AP014911	Genome type B	5,909	1,564	63	Botrytis porri RNA virus 1	Not assigned	1 × 10⁻¹³	RdRp
DCAVLRS-1	AP014914	Interrupted RdRp	4,567	57,802	3,039	Ustilaginoidea virens nonsegmented virus 1	Not assigned	3 × 10⁻¹¹	RdRp
DCAVLRS-2	AP014915	Interrupted RdRp	4,786	41,181	2,100	Ustilaginoidea virens nonsegmented virus 1	Not assigned	2 × 10⁻¹¹	RdRp
DCAVLRS-3	AP014916	CP only	3,458	13,140	876	Ustilaginoidea virens RNA virus 1	Totiviridae	2 × 10⁻⁴¹	CP
DCAVLRS-4	AP014917	RdRp only	3,190	3,995	294	Magnaporthe oryzae virus 2	Totiviridae	2 × 10⁻¹²³	RdRp
DCAVLRS-5	AP014918	CP only	3,262	1,331	96	Phomopsis vexans RNA virus	Totiviridae	5 × 10⁻⁴⁷	CP
DCAVLRS-6	AP014919	RdRp only	3,325	891	65	Ustilaginoidea virens RNA virus 3	Totiviridae	6 × 10⁻¹⁰²	RdRp
DCAVLRS-7	AP014920	Interrupted RdRp	1,986	164	20	Flammulina velutipes browning virus	Partitiviridae	4 × 10⁻⁶³	RdRp

a) The classification was based on the shared 5′-terminal sequences of the paired segments, because the CDSs in one segment of each pair did not show significant similarities with genes in databases.

Table 2

Classification of next-generation sequencing reads obtained by FLDS and total RNA-seq.

Read category	FLDS: Num. of reads	FLDS: rate (%)	Total RNA-seq: Num. of reads	Total RNA-seq: rate (%)
Trimmed	4,631,738	100.0	6,979,561	100.0
Major viral reads	4,549,629	98.2	24,036	0.3
Unmapped reads (include minor viral reads)	82,109	1.7	6,955,525	99.6

Based on aa sequence similarities (E-value ≤ 1 × 10⁻⁵) in the predicted protein-encoding sequences (CDSs), the number of genome segments in related viruses, and terminal conserved sequences in each segment of a single virus, we identified 22 putative composite viral genomes out of 31 full-length viral segments. Sequence similarities between the 22 putative composite viral genomes were used to classify them into 19 putative viral species, and two genome types each were identified in three of these species (Table 1). Seventeen dsRNA and two ssRNA viral species were identified and named Diatom Colony-Associated dsRNA virus 1–17 (DCADSRV-1–17) and Diatom Colony-Associated ssRNA virus 1–2 (DCASSRV-1–2) (Table 1). Since ssRNA viruses form an RNA duplex as an intermediate in genome replication, these contigs were most likely derived from replicating ssRNA viruses (11) and not from contaminant ssRNA. An additional seven full-length viral segments with predicted CDSs were also identified; however, we were unable to determine the combination of their segments or reconstruct viral genomes based on information from previously reported viruses. Thus, these viral segments were assigned as Diatom Colony-Associated Virus-Like RNA Segments (DCAVLRS-1–7).

Comparison between FLDS and total RNA-seq

Total RNA from the diatom colony was also investigated using shotgun RNA-seq in order to determine the active organisms of the colony and the abundance of viral RNA genomes in total RNA. Sequence reads derived from rRNA were identified using EMIRGE (26). The results of the analysis revealed that 56% of all trimmed reads were rRNA sequences, while 37.2% of all reads showed more than 99% identity to 18S and 23S rRNA from the diatom Achnanthes brevipes. In addition, 4.1 and 6.2% of reads belonged to the other diatom genus Cylindrotheca and chlorophyte genus Cladophora, respectively. The relative abundance of the rRNA reads is shown in Table S4. Only 0.3% of reads from total RNA-seq were mapped to the major viral contigs obtained using FLDS with a read mapping algorithm in the CLC workbench (Table 2). Comparisons of the relative read frequencies of each major viral contig between total RNA-seq and FLDS revealed that FLDS achieved 0.8–4372.3-fold enrichment for each viral contig (mean, 653.2-fold) (Fig. 3). FLDS also had apparent advantages in uniform read coverage and efficiency for retrieving terminal sequences (Fig. 4). For four out of five ssRNA contigs, sequence reads were also more abundant in FLDS than in RNA-seq. In addition, by de novo assembly, only six partial viral contigs were obtained using RNA-seq, and no viral contigs specific for total RNA-seq were found. Accordingly, we concluded that FLDS is more efficient than total RNA-seq for the detection and identification of RNA viruses, with the exception of retroviruses, which theoretically cannot be identified using FLDS.
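The library-wide enrichment can be recomputed from the read counts in Table 2. The sketch below is illustrative only; the function name is an assumption, and per-contig enrichment (Fig. 3) cannot be reproduced here because per-contig RNA-seq counts are not listed in the text. With the raw counts the ratio comes to roughly 285-fold, whereas dividing the rounded percentages (98.2 / 0.3) gives about 327-fold.

```python
def viral_fraction(viral_reads: int, total_reads: int) -> float:
    """Fraction of trimmed reads that map to the major viral contigs."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return viral_reads / total_reads


# Read counts from Table 2 (major viral reads, trimmed reads).
flds_fraction = viral_fraction(4_549_629, 4_631_738)
rnaseq_fraction = viral_fraction(24_036, 6_979_561)

# Library-wide fold enrichment of viral reads in FLDS over total RNA-seq.
fold_enrichment = flds_fraction / rnaseq_fraction

print(f"FLDS viral fraction:    {flds_fraction:.1%}")
print(f"RNA-seq viral fraction: {rnaseq_fraction:.2%}")
print(f"Fold enrichment:        {fold_enrichment:.0f}")
```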

Fig. 3

Comparison of mapped read frequencies for each viral contig between FLDS and total RNA-seq. Each plotted point represents one viral contig. Rhombi and triangles show dsRNA and ssRNA viral contigs, respectively. Axis values of 10⁰–10⁻⁷ represent the frequencies of reads in each library. Dotted lines labeled 1×, 10×, 100×, and 1000× mark the corresponding fold increase in viral read frequency in FLDS over RNA-seq. Reads mapping to nine contigs found in FLDS were not found in total RNA-seq.

Fig. 4

Comparison of coverage uniformity between FLDS and RNA-seq. DsRNA segments with an average depth of > 200 in RNA-seq were used for the analysis. (A) Coefficient of variation (the ratio of the standard deviation to the mean coverage), plotted on the Y axis for the viral dsRNA segments DCADSRV-1 segment 1 (square), DCADSRV-1 segment 2 (triangle), and DCADSRV-2 (circle). (B–D) Genomic coverage of each viral segment from the FLDS (upper graph) and RNA-seq (lower graph) analyses.
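The coefficient of variation used in panel A can be computed from per-base read depths as follows. This is a minimal sketch: the depth vectors are made-up illustrative numbers, not data from the study, and the use of the population standard deviation is an assumption because the text does not say which estimator was applied.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(depths: Sequence[float]) -> float:
    """Standard deviation of per-base coverage divided by mean coverage.
    Lower values indicate more uniform coverage along a segment."""
    average = mean(depths)
    if average == 0:
        raise ValueError("mean coverage is zero; CV is undefined")
    # Assumption: population standard deviation (pstdev).
    return pstdev(depths) / average


# Hypothetical per-base depths for one segment in two libraries.
even_depths = [100, 110, 90, 105, 95]
skewed_depths = [10, 400, 20, 15, 55]

print(coefficient_of_variation(even_depths))
print(coefficient_of_variation(skewed_depths))
```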

Phylogenetic analysis and characterization of viral RNA genomes

A phylogenetic analysis of viral RNA replicases (RNA-dependent RNA polymerase; RdRp) presented the phylogenetic relationship between viral genomes from a diatom colony and known RNA viruses (Fig. S4). Viruses belonging to the family Totiviridae harbor non-segmented dsRNA genomes and form isometric virions that infect either fungi or protozoa (21). Thirteen composite genomes of Totiviridae-related viruses were identified and classified into four clades distinct from the five characterized genera of Totiviridae (clades a–d in Fig. S4A). Clade c was the sister clade of the proposed genus “Trichomonasvirus” and clade d included Ustilaginoidea virens RNA virus 1 (UvRV1). In general, −1 ribosomal frameshift signals [the XXXYYYZ motif (4), in which XXX may be any three identical nucleotides, YYY may be either AAA or UUU, and Z may be A, U, or C] or +1 ribosomal frameshift signals [CCCUUUU (14) or UCCUUUCGU (47)] were located in the upstream region of the 2nd CDS, and were used in the expression of overlapping viral genes such as the pol (RdRp) of Totivirus and Leishmaniavirus. These regions were examined in an attempt to better classify the identified viruses. However, as in the case of UvRV1, −1 or +1 ribosomal frameshift signals were not found in any of the Totiviridae genomes obtained in this study. CDSs in the predicted Totiviridae virus-like segments DCAVLRS-3 and DCAVLRS-4 showed significant similarities with the gag (coat protein; CP) and pol (RdRp) of known totiviruses, respectively. Totiviridae genomes consist of a single genome segment that encodes the two essential CDSs, whereas DCAVLRS-3 and -4 lacked pol and gag, respectively. These two segments harbored nine identical 5′-terminal nucleotide sequences, which were distinguishable from the other identified terminal viral sequences. Genomic features implied that DCAVLRS-3 and -4 may be parts of a bisegmented viral genome. 
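The frameshift signals described above are simple sequence patterns, so a scan for them can be sketched with regular expressions. This is an illustrative search, not the procedure the authors used; the function name and the tuple layout of the result are assumptions. Note that CCCUUUU, one of the +1 signals, also satisfies the XXXYYYZ pattern (X = C, YYY = UUU, Z = U), so it is reported under both labels.

```python
import re
from typing import List, Tuple

# -1 signal: XXXYYYZ, where XXX is three identical nucleotides,
# YYY is AAA or UUU, and Z is A, U, or C. A lookahead is used so that
# overlapping occurrences are all reported.
MINUS1_SIGNAL = re.compile(r"(?=(([ACGU])\2\2(?:AAA|UUU)[AUC]))")

# +1 signals quoted in the text.
PLUS1_SIGNAL = re.compile(r"(?=(UCCUUUCGU|CCCUUUU))")


def find_frameshift_signals(sequence: str) -> List[Tuple[int, str, str]]:
    """Return (0-based position, motif, kind) for every candidate signal.
    DNA input is accepted and converted to RNA (T -> U)."""
    rna = sequence.upper().replace("T", "U")
    hits = [(m.start(), m.group(1), "-1") for m in MINUS1_SIGNAL.finditer(rna)]
    hits += [(m.start(), m.group(1), "+1") for m in PLUS1_SIGNAL.finditer(rna)]
    return sorted(hits)


print(find_frameshift_signals("ACGGGAAACGU"))
print(find_frameshift_signals("UCCUUUCGU"))
```

A scan like this only flags candidate motifs in the region upstream of the second CDS; whether a motif actually directs frameshifting depends on context that this sketch does not evaluate.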
RdRp in DCADSRV-1 segment 2 showed significant homology with that in fox Picobirnavirus, a member of the Picobirnaviridae, although DCADSRV-1 was phylogenetically distinct from the known Picobirnaviridae viruses (Fig. S4B). Picobirnaviruses are small, non-enveloped, bisegmented dsRNA viruses that infect animals and humans (21). The genome structure of DCADSRV-1 was similar to that of the known Picobirnaviridae (21). DCADSRV-14 was classified into the genus Deltapartitivirus of the family Partitiviridae based on the predicted RdRp sequence (Fig. S4C) (29). To date, all of the deltapartitiviruses have been identified from plants including angiosperms, gymnosperms, and chlorophytes (29). rRNA sequences belonging to the Streptophyta, including land plants, were not detected by the RNA-seq analysis, whereas Cladophora sp. of the Chlorophyta, a sister division of Streptophyta, was detected (Table S4). The CDSs of DCADSRV-15 and a few viral contigs showed significant homology with viruses belonging to the Endornaviridae (dsRNA), Narnaviridae (ssRNA), or Hypoviridae (ssRNA), whose virion formation has not yet been observed. In the ssRNA viral population, RdRp in DCASSRV-1 presented a close relationship with Border disease virus—BD31 (E-value = 4 × 10⁻¹⁵), a member of the genus Pestivirus of the family Flaviviridae, which includes arthropod-borne pathogens of humans and other animals. The genome size and CDS structure of DCASSRV-1 (11.4 kb) were similar to those of Flaviviridae (9.6–12.3 kb) (21), and the phylogenetic tree of RdRp indicated that DCASSRV-1 was not classified into the three known Flaviviridae genera (Fig. S4D). A phylogenetic analysis of RdRp in DCASSRV-2 suggested that the RNA virus was classified into the genus Mitovirus, which has a non-segmented ssRNA genome, infects the mitochondria of fungi, and lacks viral particles (Fig. S4E). 
The presence of multiple UGA codons suggested that the putative coding strand of DCASSRV-2 was likely to be translated in mitochondria. The genome size of DCASSRV-2 (4.6 kb) was larger than those of the known mitoviruses (2.3–3.6 kb) (17).

Discussion

This study revealed the presence of novel RNA viruses associated with a diatom colony and inferred the unexpected evolutionary relationship between environmental viruses and pathogenic animal viruses. Among the RNA viral genomes obtained in this study, some dominant populations showed greater similarities to fungal viruses than to known diatom viruses; however, several ssDNA and ssRNA viruses have already been identified from marine diatoms (20, 43). We cannot exclude the possibility that these viral genomes were derived from fungi associated with a diatom colony, but it is more likely that they came from the major components of a diatom colony because of their high abundance in the RNA viral metagenomic library. Since extracellular viral particles have been a major target of virus surveillance and isolation, information on intracellular viruses in microorganisms is very limited (13, 37). Therefore, the accumulation of knowledge on intracellular RNA viruses infecting diverse host organisms is essential for understanding the evolution and distribution of RNA viruses. FLDS revealed 22 full-length and some partial composite viral RNA genomes associated with a diatom colony by de novo assembly. These were classified into five dsRNA (Totiviridae, Endornaviridae, Picobirnaviridae, Cystoviridae, and Partitiviridae) and four ssRNA (Flaviviridae, Narnaviridae, Virgaviridae, and Hypoviridae) virus families. To the best of our knowledge, this is the largest number of full-length genome sequences of novel RNA viruses identified in one metagenomic library. The viral RNA community successfully detected in this study consisted of dsRNA viruses with or without virion formation and ssRNA viruses detected as replicative intermediates. Our results suggest that FLDS has the potential to detect a wide range of RNA viruses, excluding retroviruses. Several studies have been performed using metagenomic analyses targeting dsRNA with Next-Generation Sequencing technology. 
In these studies, viral read abundance reached a maximum of 52.7% (2). In contrast, FLDS provided extremely high viral read abundance. The improvement in viral read rates with FLDS was likely derived from [1] a combination of repeated cellulose powder column chromatography and subsequent enzymatic treatment, [2] the fragmentation and efficient thermal denaturation of dsRNA prior to cDNA synthesis, and [3] the selective duplex formation of dsRNA-derived cDNA prior to PCR amplification. Furthermore, FLDS also presented advantages in reconstructing complete genome sequences including terminal regions, which are difficult to obtain using RNA-seq and random priming methods (3). The complete sequences of viral RNA segments are beneficial for the identification of RNA virus segments, particularly in cases in which CDSs did not show significant similarities with viral CDSs in databases. The application of a full-length dsRNA cloning method using a PC3-T7 loop primer (8) to fragmented dsRNA enabled us to determine the terminal regions of long dsRNA genomes. Since T4 RNA ligase requires a 5′ phosphoryl-terminated nucleic acid donor (PC3-T7 loop primer) and 3′ hydroxyl-terminated nucleic acid acceptor for ligation activity, dsRNA fragments with 3′ terminal phosphate were not used as substrates. The terminal structures of dsRNA fragmented by ultrasound have not been reported. However, in the case of dsDNA fragmented by ultrasound, double-strand breaks occur preferentially in 5′-CpG-3′ dinucleotides, and the phosphate group is at the 5′ side of G in the products (15). In this study, fragmented dsRNAs were successfully converted into cDNA and amplified. Taking this into consideration, dsRNA fragmentation using ultrasound with the Covaris S220 appears to have also produced 3′ hydroxyl-terminated fragments. Furthermore, the lack of any modifications to the 3′ hydroxyl-terminal of viral RNA genomes (21) also allowed us to retrieve the terminal regions of the RNA viral genome. 
Total RNA-seq is generally considered to be a less-biased method for identifying RNA viruses, despite its very low abundance of viral reads. In the present study, FLDS enriched viral RNA reads more than 300-fold relative to total RNA-seq (Table 2). Notably, FLDS produced significantly more ssRNA viral reads than total RNA-seq; however, FLDS can only detect ssRNA viruses at the replicative stage. Moreover, FLDS showed more uniform read coverage than RNA-seq. These results indicate that FLDS is more effective than total RNA-seq for revealing all RNA viruses in cellular organisms.
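The two comparisons made above, fold enrichment of viral reads and uniformity of read coverage, can be expressed as short calculations. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the read counts and depths are made-up placeholders rather than values from Table 2, and the coefficient of variation is used here as one common measure of coverage uniformity, not necessarily the measure used in this study.

```python
from statistics import mean, pstdev


def viral_read_fraction(viral_reads: int, total_reads: int) -> float:
    """Fraction of reads in a library that were assigned to viruses."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return viral_reads / total_reads


def fold_enrichment(enriched_fraction: float, baseline_fraction: float) -> float:
    """Ratio of the viral read fraction in an enriched library to that in a baseline library."""
    if baseline_fraction <= 0:
        raise ValueError("baseline_fraction must be positive")
    return enriched_fraction / baseline_fraction


def coverage_cv(depths) -> float:
    """Coefficient of variation of per-base read depth; lower means more uniform coverage."""
    avg = mean(depths)
    if avg == 0:
        raise ValueError("mean depth is zero")
    return pstdev(depths) / avg


# Placeholder numbers for illustration only (NOT the values reported in the paper).
enriched = viral_read_fraction(viral_reads=900_000, total_reads=1_000_000)
baseline = viral_read_fraction(viral_reads=2_000, total_reads=1_000_000)
print(f"fold enrichment: {fold_enrichment(enriched, baseline):.1f}")

# Two made-up depth profiles along the same contig.
even_profile = [100, 105, 95, 100, 102, 98]
skewed_profile = [5, 10, 300, 250, 20, 15]
print(f"CV (even):   {coverage_cv(even_profile):.3f}")
print(f"CV (skewed): {coverage_cv(skewed_profile):.3f}")
```

Comparing fractions rather than raw counts keeps the enrichment estimate independent of sequencing depth, so libraries of different sizes can be compared directly.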

41 in total

Review 1. Genome packaging in influenza A virus.

Authors: Edward C Hutchinson; Johann C von Kirchbach; Julia R Gog; Paul Digard
Journal: J Gen Virol Date: 2009-12-02 Impact factor: 3.891

Review 2. 50-plus years of fungal viruses.

Authors: Said A Ghabrial; José R Castón; Daohong Jiang; Max L Nibert; Nobuhiro Suzuki
Journal: Virology Date: 2015-03-13 Impact factor: 3.616

Review 3. Taxonomic reorganization of family Partitiviridae and other recent progress in partitivirus research.

Authors: Max L Nibert; Said A Ghabrial; Edgar Maiss; Till Lesker; Eeva J Vainio; Daohong Jiang; Nobuhiro Suzuki
Journal: Virus Res Date: 2014-04-21 Impact factor: 3.303

4. Discovery of two novel viruses expands the diversity of single-stranded DNA and single-stranded RNA viruses infecting a cosmopolitan marine diatom.

Authors: Kei Kimura; Yuji Tomaru
Journal: Appl Environ Microbiol Date: 2014-12-01 Impact factor: 4.792

5. BLAST+: architecture and applications.

Authors: Christiam Camacho; George Coulouris; Vahram Avagyan; Ning Ma; Jason Papadopoulos; Kevin Bealer; Thomas L Madden
Journal: BMC Bioinformatics Date: 2009-12-15 Impact factor: 3.169

6. Enhanced methods for unbiased deep sequencing of Lassa and Ebola RNA viruses from clinical and biological samples.

Authors: Christian B Matranga; Kristian G Andersen; Sarah Winnicki; Michele Busby; Adrianne D Gladden; Ryan Tewhey; Matthew Stremlau; Aaron Berlin; Stephen K Gire; Eleina England; Lina M Moses; Tarjei S Mikkelsen; Ikponmwonsa Odia; Philomena E Ehiane; Onikepe Folarin; Augustine Goba; S Humarr Kahn; Donald S Grant; Anna Honko; Lisa Hensley; Christian Happi; Robert F Garry; Christine M Malboeuf; Bruce W Birren; Andreas Gnirke; Joshua Z Levin; Pardis C Sabeti
Journal: Genome Biol Date: 2014 Impact factor: 13.583

7. Comparison of tissue sample processing methods for harvesting the viral metagenome and a snapshot of the RNA viral community in a turkey gut.

Authors: Jigna D Shah; Joshua Baller; Ying Zhang; Kevin Silverstein; Zheng Xing; Carol J Cardona
Journal: J Virol Methods Date: 2014-09-01 Impact factor: 2.014

8. Evolution and phylogenetic analysis of full-length VP3 genes of Eastern Mediterranean bluetongue virus isolates.

Authors: Kyriaki Nomikou; Chrysostomos I Dovas; Sushila Maan; Simon J Anthony; Alan R Samuel; Maria Papanastassopoulou; Narender S Maan; Olga Mangana; Peter P C Mertens
Journal: PLoS One Date: 2009-07-30 Impact factor: 3.240

9. Metagenomic analysis of RNA viruses in a fresh water lake.

Authors: Appolinaire Djikeng; Ryan Kuzmickas; Norman G Anderson; David J Spiro
Journal: PLoS One Date: 2009-09-29 Impact factor: 3.240

10. Mutational analysis of the "slippery-sequence" component of a coronavirus ribosomal frameshifting signal.

Authors: I Brierley; A J Jenner; S C Inglis
Journal: J Mol Biol Date: 1992-09-20 Impact factor: 5.469
