Yeonhwa Jo1, Hoseong Choi1, Sang-Min Kim2, Sun-Lim Kim2, Bong Choon Lee2, Won Kyong Cho3,4.
Abstract
BACKGROUND: Next-generation sequencing (NGS) provides many possibilities for plant virology research. In this study, we performed integrated analyses using plant transcriptome data for plant virus identification using Apple stem grooving virus (ASGV) as an exemplar virus. We used 15 publicly available transcriptome libraries from three different studies: two mRNA-Seq studies and one small RNA-Seq study.
Keywords: Apple stem grooving virus; De novo genome assembly; RNA-Seq; Recombination; Single nucleotide variation; Transcriptome
Year: 2016 PMID: 27507588 PMCID: PMC4977635 DOI: 10.1186/s12864-016-2994-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Blast results to identify ASGV-associated contigs from ASGV-infected apple mRNA-Seq data
| Query id | Subject ids | Identity (%) | Alignment length | Mismatches | Gap opens | Query start | Query end | Subject start | Subject end | Evalue | Bit score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TR9237 | NC_001749.2 | 83.29 | 1532 | 253 | 3 | 3 | 1533 | 12 | 1541 | 0 | 1408 |
| TR10325 | NC_001749.2 | 97.15 | 492 | 14 | 0 | 1 | 492 | 515 | 24 | 0 | 832 |
| TR12482 | NC_001749.2 | 98.1 | 473 | 9 | 0 | 1 | 473 | 945 | 473 | 0 | 824 |
| TR1643 | NC_001749.2 | 97.71 | 1356 | 31 | 0 | 1 | 1356 | 1111 | 2466 | 0 | 2333 |
| TR3149 | NC_001749.2 | 98.28 | 232 | 4 | 0 | 1 | 232 | 1157 | 926 | 9.00E-113 | 407 |
| TR1643 | NC_001749.2 | 98.3 | 766 | 13 | 0 | 1 | 766 | 1701 | 2466 | 0 | 1343 |
| TR9237 | NC_001749.2 | 83.74 | 4705 | 741 | 23 | 1743 | 6435 | 1752 | 6444 | 0 | 4429 |
| TR408 | NC_001749.2 | 97.26 | 2225 | 61 | 0 | 1 | 2225 | 2440 | 4664 | 0 | 3771 |
| TR8087 | NC_001749.2 | 78.45 | 2237 | 452 | 30 | 1 | 2222 | 4660 | 2439 | 0 | 1434 |
| TR2 | NC_001749.2 | 83.26 | 723 | 119 | 2 | 1 | 722 | 5349 | 4628 | 0 | 664 |
| TR9341 | NC_001749.2 | 85.78 | 232 | 33 | 0 | 1 | 232 | 5389 | 5620 | 2.00E-64 | 246 |
| TR5938 | NC_001749.2 | 98.29 | 234 | 4 | 0 | 1 | 234 | 5647 | 5880 | 7.00E-114 | 411 |
| TR12218 | NC_001749.2 | 97.46 | 1065 | 27 | 0 | 1 | 1065 | 5683 | 4619 | 0 | 1818 |
| TR9237 | NC_001749.2 | 95.6 | 432 | 19 | 0 | 1 | 432 | 6013 | 6444 | 0 | 693 |
Only the best MEGABLAST hits are listed. Query IDs indicate the identity number of an individual assembled contig. Subject IDs indicate the best-matched viral genome
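The hit coordinates in the table above can be read programmatically. The following is a minimal sketch, not the authors' pipeline: it assumes the usual BLAST tabular convention that a subject start larger than the subject end marks a hit on the reverse strand, and it merges subject intervals to estimate how much of the reference is covered. The helper names `strand` and `covered_bases` are hypothetical, and the genome length is the size listed for NC_001749.2 elsewhere in this record.

```python
from typing import List, Tuple


def strand(s_start: int, s_end: int) -> str:
    """Return '+' for a forward hit on the subject, '-' for a reversed one."""
    return "+" if s_start <= s_end else "-"


def covered_bases(hits: List[Tuple[int, int]]) -> int:
    """Count subject positions covered by at least one hit (1-based, inclusive)."""
    if not hits:
        return 0
    # Normalise each hit so that start <= end, then sort by start.
    intervals = sorted((min(a, b), max(a, b)) for a, b in hits)
    total = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end + 1:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
        else:
            total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    return total + (cur_end - cur_start + 1)


# (subject start, subject end) for four rows copied from the table.
hits = [(12, 1541), (515, 24), (1111, 2466), (1752, 6444)]
GENOME_LENGTH = 6495  # size listed for NC_001749.2 in this record

print([strand(a, b) for a, b in hits])
print(covered_bases(hits), round(covered_bases(hits) / GENOME_LENGTH, 3))
```

With these four rows the merged hits span positions 12 to 6444 of the reference, that is 6433 of 6495 bases.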
Fig. 1 Identification of de novo viral genome assembly and SNVs for ASGV from RNA-Seq data. a Genome structure of ASGV isolate Fuji. The conserved domains were identified by the SMART program (http://smart.embl-heidelberg.de/). Abbreviations: MT (Methyltransferase), Hel (Helicase), RNA-dependent RNA polymerase (RdRP), Movement protein (MP), and Coat protein (CP). b Alignment of raw data against genome of ASGV isolate Fuji by BWA was visualized by Tablet program. c Positions of identified SNVs in ASGV-infected apple transcriptome were visualized by Tablet program. d Identified sequence reads from ASGV-free apple sample, which were associated with ASGV by BWA alignment. e BLAST results showing sequence reads from ASGV-free apple sample matched to ASGV genome. f Alignment of raw data using ASGV-infected sRNA data from cultivar GD against reference ASGV genome by BWA was visualized by Tablet program. g Positions of identified SNVs in ASGV-infected apple sRNA transcriptome were visualized by Tablet program. h Alignment of raw data from pear sample against genome ASGV isolate Cuiguan by BWA was visualized by Tablet program. i Positions of identified SNVs of ASGV in pear mRNA transcriptome were visualized by Tablet program
Identification of viruses from raw mRNA-Seq data of pear transcriptome by BLAST search
| Name of virus | Accession no. | Size of genome | Read count |
|---|---|---|---|
| | NC_001749.2 | 6,495 bp | 4274 |
| | NC_001747.1 | 5,987 bp | 66 |
| | NC_024686.1 | 6,835 bp | 52 |
| | NC_018714.1 | 9,266 bp | 39 |
| | NC_003462.2 | 9,332 bp | 17 |
| | NC_014821.1 | 9,311 bp | 4 |
| | NC_003347.1 | 7,564 bp | 4 |
| | NC_001948.1 | 8,744 bp | 2 |
| | NC_001409.1 | 7,555 bp | 1 |
| | NC_015782.1 | 7,275 bp | 1 |
| | NC_003224.1 | 9,591 bp | 1 |
Comparison of de novo transcriptome assemblers for assembly of viral contigs
| | Trinity (SRR1089477) | Velvet (SRR1089477) | Trinity (SRR1269627) | Velvet (SRR1269627) |
|---|---|---|---|---|
| No. of total contigs | 33858 | 195149 | 15592 | 371745 |
| No. of viral contigs | 14 | 57 | 9 | 37 |
| % of viral contigs | 0.04 | 0.03 | 0.06 | 0.01 |
SRR1089477 and SRR1269627 were data derived from pear and apple samples, respectively
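The percentage row of the assembler comparison follows directly from the two count rows. As a small sketch (the function name `percent_viral` is hypothetical, and rounding to two decimals is assumed to match the table), the values can be recomputed like this:

```python
# (assembler, library) -> (total contigs, viral contigs), copied from the table.
counts = {
    ("Trinity", "SRR1089477"): (33858, 14),
    ("Velvet", "SRR1089477"): (195149, 57),
    ("Trinity", "SRR1269627"): (15592, 9),
    ("Velvet", "SRR1269627"): (371745, 37),
}


def percent_viral(total: int, viral: int) -> float:
    """Share of viral contigs among all assembled contigs, in percent."""
    return round(100.0 * viral / total, 2)


for (assembler, library), (total, viral) in counts.items():
    print(assembler, library, percent_viral(total, viral))
```

In these counts Velvet yields more viral contigs in absolute terms for both libraries, while Trinity yields the higher proportion of viral contigs.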
Fig. 2 Comparison of two de novo assemblers based on number and sizes of assembled contigs and phylogenetic tree of 21 ASGV isolates. Size distribution of identified viral contigs from ASGV-infected apple sample assembled by Trinity (a) and Velvet (b). Size distribution of identified viral contigs from pear sample assembled by Trinity (c) and Velvet (d). The green-, blue-, and red-colored bars indicate ASPV, AGCAV, and ASGV, respectively. The sizes of only the longest and the shortest contigs in each transcriptome are indicated. (e) The phylogenetic tree was constructed based on the genome sequences of 13 ASGV isolates and 8 CTLV isolates. We followed the original annotations for CTLV and PBNLSV, which were highly homologous to ASGV. The accession number and the name of each isolate are indicated. Detailed information for each isolate can be found in Additional file 10
Fig. 3 Identification of recombination events by RDP4 program. a The positions of identified recombinants were depicted with the respective names of parental sequences. The individual genome of the ASGV isolate was indicated by a different colored bar. The identified recombination events, including number 1 (b) and number 2 (c), were rechecked by the RDP4 program and visualized by plot data with pairwise identity information. Detailed information on the identified recombination events is provided in Table 4
Recombination analysis of 21 ASGV genomes using RDP4 program
Breakpoint positions are given in the alignment, in the recombinant sequence, and relative to CTLV_Lily; the final nine columns give the result for each detection method.
| Event | Begin (in alignment) | End (in alignment) | Begin (in recombinant sequence) | End (in recombinant sequence) | Begin (relative to CTLV_Lily) | End (relative to CTLV_Lily) | Recombinant sequence(s) | Minor parental sequence(s) | Major parental sequence(s) | RDP | GENECONV | Bootscan | Maxchi | Chimaera | SiSscan | PhylPro | LARD | 3Seq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 6492 | 1926 | 6487 | 1924 | 6488 | 1925 | ASGV_YTG | Unknown (ASGV_HH), Unknown (ASGV_CHN) | ASGV_Fuji | 1.30E-31 | 3.39E-09 | 8.78E-30 | 1.16E-10 | 9.12E-16 | 9.16E-27 | NS | NS | 3.17E-17 |
| 2 | 6492 | 2488 | 6488 | 2487 | 6488 | 2487 | ASGV_HH, ASGV_CHN, CTLV_MTH[P] | ASGV_Fuji, CTLV_MTH | ASGV_Li-23, CTLV_Lily | 3.40E-19 | 3.89E-04 | 1.47E-16 | 1.29E-16 | 1.83E-10 | 1.06E-27 | NS | NS | 1.48E-13 |
| 3 | 4112 | 4552 | 4110 | 4550 | 4110 | 4550 | PBNLSV | ASGV_Li-23, CTLV_Lily | ASGV_KFP | 1.20E-12 | 2.22E-08 | 8.19E-11 | 2.20E-05 | 2.75E-05 | 1.41E-10 | NS | NS | NS |
| 4 | 5844 | 6224 | 5842 | 6221 | 5842 | 6221 | PBNLSV | ASGV_Li-23, CTLV_Lily | Unknown (CTLV_Pk), Unknown (CTLV_Kumquat1), Unknown (CTLV_LCd-NA-1), Unknown (CTLV_SO), Unknown (CTLV_XHC), Unknown (ASGV_Matsuco) | 6.90E-07 | 1.12E-08 | 1.61E-07 | 6.45E-04 | 6.44E-04 | 1.30E-06 | NS | NS | NS |
| 5 | 5819 | 6412a | 5816 | 6408a | 5817 | 6409a | ASGV_YTG | ASGV_HH, ASGV_CHN | ASGV_P-209, ASGV_T47 | 2.42E-08 | 2.14E-03 | 1.58E-09 | 2.89E-03 | 1.65E-02 | 1.97E-07 | NS | NS | NS |
| 6 | 4412 | 4946 | 4409 | 4943 | 4410 | 4944 | CTLV_LCd-NA-1 | CTLV_SO, CTLV_Kumquat1, CTLV_XHC | Unknown (ASGV_Matsuco) | 1.20E-02 | 2.91E-04 | 1.01E-02 | 4.02E-04 | 5.76E-03 | 4.04E-03 | NS | NS | NS |
Minor parent: parent contributing the smaller fraction of sequence. Major parent: parent contributing the larger fraction of sequence. Unknown: only one parent and a recombinant need be in the alignment for a recombination event to be detectable; the sequence listed as unknown was used to infer the existence of a missing parental sequence. NS: no significant P-value was recorded for this recombination event using this method
a The actual breakpoint position is undetermined (it was most likely overprinted by a subsequent recombination event)
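The detection-method columns of the recombination table can be summarised per event. The sketch below is illustrative only: it treats every entry other than NS as a method that supported the event and reports the weakest (largest) P-value among them. The helper names `supporting_methods` and `weakest_p_value` are hypothetical, and no significance threshold beyond the table's own NS marking is assumed.

```python
from typing import List

# Method order as in the recombination table.
METHODS = ["RDP", "GENECONV", "Bootscan", "Maxchi", "Chimaera",
           "SiSscan", "PhylPro", "LARD", "3Seq"]


def supporting_methods(p_values: List[str]) -> List[str]:
    """Names of the methods whose entry is a P-value rather than NS."""
    return [m for m, p in zip(METHODS, p_values) if p != "NS"]


def weakest_p_value(p_values: List[str]) -> float:
    """Largest reported P-value among the supporting methods."""
    return max(float(p) for p in p_values if p != "NS")


# Entries for events 1 and 6, copied from the table.
event1 = ["1.30E-31", "3.39E-09", "8.78E-30", "1.16E-10", "9.12E-16",
          "9.16E-27", "NS", "NS", "3.17E-17"]
event6 = ["1.20E-02", "2.91E-04", "1.01E-02", "4.02E-04", "5.76E-03",
          "4.04E-03", "NS", "NS", "NS"]

for name, row in (("event 1", event1), ("event 6", event6)):
    print(name, len(supporting_methods(row)), weakest_p_value(row))
```

Read this way, event 1 is supported by seven methods and event 6 by six, with far larger P-values for event 6.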