Eric Roberto Guimarães Rocha Aguiar, Roenick Proveti Olmo, Simona Paro, Flavia Viana Ferreira, Isaque João da Silva de Faria, Yaovi Mathias Honore Todjro, Francisco Pereira Lobo, Erna Geessien Kroon, Carine Meignin, Derek Gatherer, Jean-Luc Imler, João Trindade Marques.
Abstract
Virus surveillance in vector insects is potentially of great benefit to public health. Large-scale sequencing of small and long RNAs has previously been used to detect viruses, but without any formal comparison of different strategies. Furthermore, the identification of viral sequences largely depends on similarity searches against reference databases. Here, we developed a sequence-independent strategy based on virus-derived small RNAs produced by the host response, such as the RNA interference pathway. In insects, we compared sequences of small and long RNAs, demonstrating that viral sequences are enriched in the small RNA fraction. We also noted that the small RNA size profile is a unique signature for each virus and can be used to identify novel viral sequences without known relatives in reference databases. Using this strategy, we characterized six novel viruses in the viromes of laboratory fruit flies and wild populations of two insect vectors: mosquitoes and sandflies. We also show that the small RNA profile could be used to infer viral tropism for ovaries among other aspects of virus biology. Additionally, our results suggest that virus detection utilizing small RNAs can also be applied to vertebrates, although not as efficiently as to plants and insects.
Year: 2015; PMID: 26040701; PMCID: PMC4513865; DOI: 10.1093/nar/gkv587
Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971
Figure 1. Overview of the pipeline for virus detection based on long and small RNAs. Different RNA fractions were utilized for the construction of small and long RNA libraries. Sequenced reads were processed to enrich for potential virus sequences. Processed reads were then utilized for contig assembly and extension. Contigs were characterized using both sequence-based and pattern-based strategies. Viral contigs were further validated by RT-PCR and Sanger sequencing. See text for details.
Figure 2. Small RNA sequencing identifies viral sequences more efficiently than long RNAs. (A) Comparison of the number of contigs and the size of the largest contig in each small RNA library using different size ranges of small RNAs in the assembly step. (B) Proportion of contigs assembled in each library with significant similarity to reference sequences. The origin of contigs is classified by taxon and includes unknown sequences. (C) Size distribution of viral (red), non-viral (blue) and unknown contigs (grey) for each library. P-values for the difference between viral and non-viral contig sizes are indicated (Student t-test). (D) Viral RNA sequences were detected by RT-PCR from total RNA extracted from three separate pools of Drosophila, Aedes and Lutzomyia populations. Sanger sequencing of PCR products showed high identity to the sequence determined by our metagenomics approach, as shown in the right column (nd, not done). (E) Comparison of processing time, number of contigs and frequency distribution of contig sizes for small and long RNA libraries, shown in grey and black, respectively. (F) Coverage of PCLV and HTV genome segments by contigs assembled in each of the small and long RNA libraries from mosquitoes. Biological replicate samples are shown in blue, green and red.
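The genome coverage shown in panel (F) can be illustrated with a minimal Python sketch. It assumes that contig alignments are already available as 1-based inclusive (start, end) coordinates on a reference segment. The function names and the example coordinates are hypothetical and are not taken from the study.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_fraction(contig_alignments, reference_length):
    """Fraction of reference positions covered by at least one contig."""
    covered = sum(end - start + 1 for start, end in merge_intervals(contig_alignments))
    return covered / reference_length


# Hypothetical alignments of three contigs to a 1000-nt reference segment.
example = [(1, 400), (350, 600), (800, 1000)]
print(merge_intervals(example))          # [(1, 600), (800, 1000)]
print(coverage_fraction(example, 1000))  # 0.801
```

Merging intervals before summing avoids counting positions twice when contigs from the same library overlap.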
Summary of viruses identified in Drosophila melanogaster, Aedes aegypti and Lutzomyia longipalpis
| Host | Virus family | Virus | Largest contig (nt) | Segment status¹ | # contigs (sum of libraries) | ID strategy | Best hit | E-value | Accession number (size of reference in nt) |
|---|---|---|---|---|---|---|---|---|---|
| | | PCLV | 3936 | CC | 4 | blastx | glycoprotein precursor [Phasi Charoen-like virus] | 0E+00 | AIF71031.1 (3852) |
| | | PCLV | 6807 | CC | 23 | blastx | RdRP [Phasi Charoen-like virus] | 0E+00 | AIF71030.1 (6783) |
| | | PCLV | 1332 | CC | 3 | blastx | nucleocapsid [Phasi Charoen-like virus] | 2E-72 | AIF71032.1 (1398) |
| | | HTV | 1609 | CC | 8 | blastx | structural protein precursor [Drosophila A virus] | 2E-65 | YP_003038596.1 (1326) |
| | | HTV | 2793 | CC | 13 | blastx | putative RdRP [Laem Singh virus] | 8E-34 | AAZ95951.1 (507) |
| | | LPRV1 | 3762 | CC | 11 | blastx | RdRP [Choristoneura occidentalis cypovirus 16] | 3E-173 | ACA53380.1 (3675) |
| | | LPRV1 | 3687 | CC | 5 | blastx | VP3 [Inachis io cypovirus 2] | 1E-81 | YP_009002593.1 (3450) |
| | | LPRV1 | 3200 | CC | 2 | blastx | VP4 [Inachis io cypovirus 2] | 4E-63 | YP_009002588.1 (3201) |
| | | LPRV1 | 1842 | CC | 2 | blastx | VP5 [Inachis io cypovirus 2] | 2E-16 | YP_009002589.1 (1899) |
| | | LPRV1 | 841 | CC | 1 | blastx | polyhedrin [Simulium ubiquitum cypovirus] | 6E-69 | ABH85367.1 (836) |
| | | LPRV1 | 3685 | CC | 1 | blastx | VP2 [Inachis io cypovirus 2] | 5E-24 | YP_009002587.1 (3649) |
| | | LPRV1 | 1547 | HQ | 2 | phmmer | unknown [Choristoneura occidentalis cypovirus 16] | 900E-03 | ABW87641.1 (1946) |
| | | LPRV1 | 2237 | CC | 3 | blastx | unknown [Choristoneura occidentalis cypovirus 16] | 200E-01 | ABW87640.1 (2214) |
| | | LPRV1 | 2231 | CC | 1 | pattern-based | | | |
| | | LPRV1 | 1345 | CC | 1 | pattern-based | | | |
| | | LPRV1 | 688 | HQ | 1 | pattern-based | | | |
| | | LPRV1 | 680 | HQ | 1 | pattern-based | | | |
| | | LPRV2 | 3680 | CC | 1 | blastx | RdRP [Bombyx mori cypovirus 1] | 0E+00 | AAK20302.1 (3854) |
| | | LPRV2 | 1116 | CC | 1 | blastx | polyhedrin [Heliothis armigera cypovirus 14] | 4E-11 | AAY34355.1 (956) |
| | | LPRV2 | 2043 + 779 + 1392 | SD | 3 | blastx | VP1 protein [Dendrolimus punctatus cypovirus 1] | 4E-70 | AAN84544.1 (4164) |
| | | LPRV2 | 964 | HQ | 1 | blastx | hypothetical protein LdcV14s9gp1 [Cypovirus 14] | 2E-09 | NP_149143.1 (1141) |
| | | LPRV2 | 678 + 1035 + 1617 | SD | 3 | blastx | VP3 [Bombyx mori cypovirus 1] | 5E-14 | ADB95943.1 (3262) |
| | | LPRV2 | 443 + 579 + 769 | SD | 3 | blastx | viral structural protein 4 [Bombyx mori cypovirus 1] | 2E-10 | ACT78457.1 (1796) |
| | | LPRV2 | 1516 | HQ | 1 | blastx | VP2 protein [Dendrolimus punctatus cypovirus 1] | 8E-53 | AAN86620.1 (3846) |
| | | LPRV2 | 599 | HQ | 4 | blastx | unknown [Operophtera brumata cypovirus 18] | 4E-10 | ABB17215.1 (2883) |
| | | LPRV2 | 286 | HQ | 2 | blastx | putative VP5 [Dendrolimus punctatus cypovirus 1] | 3E-02 | AAO61786.1 (1501) |
| | | LPRV2 | 641 | SD | 1 | pattern-based | | | |
| | | LPRV2 | 1212 | SD | 1 | pattern-based | | | |
| | | LPRV2 | 1174 | CC | 1 | pattern-based | | | |
| | | LPRV2 | 976 | SD | 1 | pattern-based | | | |
| | | LPRV2 | 535 | SD | 1 | pattern-based | | | |
| | | LPNV | 2054 | CC | 5 | blastx | capsid protein [Nudaurelia capensis beta virus] | 1E-42 | NP_048060.1 (1836) |
| | | LPNV | 3189 | CC | 23 | blastx | RdRP [Nodamura virus] | 9E-82 | NP_077730.1 (3129) |
| | | DUV | 1905 + 452 | SD | 2 | blastx | protein P1 (RdRP) [Acyrthosiphon pisum virus] | 2E-63 | NP_620557.1 (10035) |
| | | DRV | 635 + 175 | SD | 2 | blastx | RdRP [Fiji disease virus] | 8E-05 | YP_249762.1 (4532) |
¹ Segment status defined as described by Ladner et al. (39): SD, Standard Draft; HQ, High Quality; CC, Coding Complete; C, Complete; F, Finished.
Figure 3. Small RNA size profile can classify uncharacterized viral contigs. (A) Small RNA size profile of previously characterized virus segments identified by sequence similarity searches. Blue and red represent small RNAs in the positive and negative strands, respectively. (B) Hierarchical clustering of viral contig sequences assembled in fruit fly, mosquito and sandfly libraries. Clustering was based on Pearson correlation of small RNA size profile shown as a heatmap. Clusters with more than one contig are indicated on the left vertical bar and numbered according to the order in which they appear from top to bottom. Clusters were defined by Pearson correlation above 0.8. (C) Contig Aae.92 and the segment corresponding to the HTV RdRP that grouped together by similarity of the small RNA size profile in panel (B) show perfect correlation of expression in individual mosquitoes as determined by RT-PCR. Results are representative of 46 individual mosquitoes that were analysed. The endogenous gene Rpl32 was used as control for the RT-PCR.
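As a rough illustration of the profile comparison behind panel (B), the sketch below builds a strand-aware size profile for a contig and computes the Pearson correlation between two profiles. The 18–30 nt window, the normalization to fractions, the (length, strand) input format and all names are assumptions made for the example, not details from the study.

```python
from collections import Counter
from math import sqrt

SIZES = range(18, 31)  # assumed size window in nt; not given in the caption


def size_profile(reads):
    """Build a strand-aware size profile from (length, strand) pairs.

    Returns the fraction of reads in each size bin: positive-strand bins
    first, then negative-strand bins.
    """
    counts = Counter((strand, length) for length, strand in reads if length in SIZES)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * (2 * len(SIZES))
    return [counts[(strand, size)] / total for strand in "+-" for size in SIZES]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a flat profile; treated here as no similarity
    return cov / sqrt(var_x * var_y)


# Hypothetical contigs: A and B share a 21-nt peak on both strands at
# different depths; C has longer small RNAs with a different strand pattern.
contig_a = [(21, "+")] * 50 + [(21, "-")] * 45 + [(22, "+")] * 5
contig_b = [(21, "+")] * 500 + [(21, "-")] * 450 + [(22, "+")] * 50
contig_c = [(27, "+")] * 60 + [(28, "-")] * 40

print(pearson(size_profile(contig_a), size_profile(contig_b)))  # close to 1
print(pearson(size_profile(contig_a), size_profile(contig_c)))  # below 0
```

Normalizing to fractions makes the comparison independent of sequencing depth, so two segments of the same virus can correlate even when one is covered by far more reads.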
Figure 4. Small RNA pattern-based analysis identifies viral contigs without known relatives in reference databases. (A) Hierarchical clustering of viral and unknown contig sequences assembled in fruit fly, mosquito and sandfly libraries. Clustering was based on Pearson correlation of the small RNA size profile shown as a heatmap. Clusters with more than one contig are indicated on the left vertical bar and numbered according to the order in which they appear from top to bottom. Clusters were defined by Pearson correlation above 0.8. (B) Detection by RT-PCR in two separate pools of sandflies shows that contig sequences in Clusters 2 and 17 mimic the expression of RdRP segments of LPRV1 or LPRV2, respectively. The same pools of Lutzomyia longipalpis (pool1 and pool3) analysed in Figure 2D were used.
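One way to picture the 0.8 correlation cut-off is to group contigs whose size profiles correlate above that value, so that unknown contigs end up alongside a contig of known viral origin. The sketch below does this with single linkage, which is an assumption (the caption does not state the linkage method). The contig names and five-bin profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt

THRESHOLD = 0.8  # correlation cut-off stated in the caption


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def cluster_profiles(profiles, threshold=THRESHOLD):
    """Group contig names whose profiles correlate above the threshold.

    Single linkage (an assumption): two contigs share a cluster if a chain
    of pairwise correlations above the threshold connects them.
    """
    parent = {name: name for name in profiles}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) > threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical profiles: one contig with a viral blastx hit, two without.
profiles = {
    "viral_rdrp": [0, 10, 80, 10, 0],
    "unknown_1": [1, 12, 75, 11, 1],
    "unknown_2": [30, 5, 5, 5, 55],
}
print(cluster_profiles(profiles))
# [['unknown_1', 'viral_rdrp'], ['unknown_2']]
```

In this toy case `unknown_1` would be a candidate segment of the same virus as `viral_rdrp`, to be checked experimentally, as the RT-PCR in panel (B) does for the real contigs.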
Figure 5. The presence of virus-derived piRNAs with a ping-pong signature is indicative of ovary infection. (A) Small RNAs of about 24–29 nt derived from PCLV show a 10 nt overlap between sense and antisense strands, U enrichment at position 1 and A enrichment at position 10, consistent with piRNAs generated by the ping-pong amplification mechanism found in the insect germline. (B) Both PCLV and HTV are detected in individual mosquitoes but only PCLV is present in ovaries as determined by RT-PCR. Results are representative of eight ovaries of individual mosquitoes that were analysed. The endogenous gene Rpl32 was used as control for the RT-PCR.
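The ping-pong signature in panel (A) combines two measurements: the overlap between the 5' ends of opposite-strand reads, and the base composition at positions 1 and 10. The sketch below shows one simple way to compute both, assuming read 5' positions are given in contig coordinates and read sequences are written 5' to 3' with U reported as T, as is usual in sequencing output. All names and the toy data are hypothetical.

```python
from collections import Counter


def five_prime_overlaps(sense_starts, antisense_ends, max_overlap=20):
    """Histogram of overlaps between 5' ends of opposite-strand reads.

    sense_starts: 5' positions of sense reads (contig coordinates).
    antisense_ends: 5' positions of antisense reads, i.e. their rightmost
    coordinate on the contig.
    A sense read starting at s and an antisense read whose 5' end is at e
    overlap by e - s + 1 nucleotides. max_overlap should not exceed the
    shortest read length considered.
    """
    antisense_counts = Counter(antisense_ends)
    histogram = Counter()
    for start, n_sense in Counter(sense_starts).items():
        for overlap in range(1, max_overlap + 1):
            partner_end = start + overlap - 1
            histogram[overlap] += n_sense * antisense_counts[partner_end]
    return histogram


def base_frequency(sequences, position):
    """Fraction of each base at a 1-based position in the reads."""
    bases = Counter(seq[position - 1] for seq in sequences if len(seq) >= position)
    total = sum(bases.values())
    return {base: count / total for base, count in bases.items()} if total else {}


# Toy data: every overlapping sense/antisense pair is offset by 10 nt.
overlaps = five_prime_overlaps([100, 100, 250], [109, 109, 259, 300])
print(overlaps.most_common(1))  # [(10, 5)]

# Short toy reads (real piRNAs would be about 24-29 nt).
reads = ["TACGTACGTAGG", "TCCGTACGAAGG", "GACGTACGTCGG"]
print(base_frequency(reads, 1))   # T at position 1 in 2 of 3 reads
print(base_frequency(reads, 10))  # A at position 10 in 2 of 3 reads
```

A peak at an overlap of 10 in the histogram, together with a T (U) bias at position 1 and an A bias at position 10, is the pattern the caption describes.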
Figure 6. Virus detection based on large-scale sequencing of small RNAs is applicable to animals and plants. (A) Percentage of contigs assembled from published small RNA libraries from insects, plants and vertebrate animals with significant similarity against reference sequences. The origin of contigs is classified by taxon and includes unknown sequences. (B) Size distribution of contigs corresponding to viral (red), non-viral (blue) or unknown sequences (grey) in each library. P-values for the difference between contig sizes are indicated (Student t-test). (C) Hypothetical genome organization of MNV based on ORF and small RNA analysis of contigs AaeS.81, AaeS.82 and AaeS.83 identified in this study. (D) Hierarchical clustering of viral and unknown contig sequences assembled in published libraries. Clustering was based on Pearson correlation of the small RNA size profile shown as a heatmap. A single cluster with more than one contig is indicated on the left vertical bar as defined by correlation above 0.8. A sub-cluster highlighted in red contains small RNA profiles of three contigs that show Pearson correlation above 0.998. (E) Coverage of SARS-CoV, EMCV, TuMV and SGIV genomes by contigs assembled in RNA libraries from mouse lungs, ES cells, Arabidopsis and fish GP cells, respectively. (F) Size distribution of contigs and raw sequenced reads derived from SARS-CoV in long (black) or small (grey) RNA libraries from infected mouse lungs. (G) Number of raw reads and contig sequences derived from viruses in long and small RNA libraries prepared from SARS-CoV infected mouse lungs. The number above bars indicates the percentage of viral reads and contig sequences relative to the total. Fold enrichment or depletion of virus sequences comparing contigs to raw reads is shown.
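The percentages and fold changes described for panel (G) reduce to simple ratios. The sketch below shows the arithmetic under the assumption that fold change is the viral fraction among contigs divided by the viral fraction among raw reads. The function names and all counts are hypothetical and are not values from the study.

```python
def percent_viral(viral, total):
    """Percentage of sequences (reads or contigs) that are viral."""
    return 100 * viral / total


def fold_change(viral_reads, total_reads, viral_contigs, total_contigs):
    """Viral fraction among contigs divided by viral fraction among raw reads.

    Values above 1 indicate enrichment of viral sequences after assembly;
    values below 1 indicate depletion. Cross-multiplying the integer counts
    avoids dividing two small floating-point fractions.
    """
    return (viral_contigs * total_reads) / (viral_reads * total_contigs)


# Hypothetical library: 2,000 viral reads out of 1,000,000,
# and 30 viral contigs out of 300 assembled contigs.
print(percent_viral(2_000, 1_000_000))         # 0.2
print(percent_viral(30, 300))                  # 10.0
print(fold_change(2_000, 1_000_000, 30, 300))  # 50.0
```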