
De novo Genome Assembly and Single Nucleotide Variations for Soybean Mosaic Virus Using Soybean Seed Transcriptome Data.

Yeonhwa Jo1, Hoseong Choi1, Miah Bae1, Sang-Min Kim2, Sun-Lim Kim2, Bong Choon Lee2, Won Kyong Cho1, Kook-Hyung Kim1.   

Abstract

Soybean is the most important legume crop in the world. Several diseases in soybean lead to serious yield losses in major soybean-producing countries. Moreover, soybean can be infected by diverse viruses. Recently, we carried out a large-scale screening to identify viruses infecting soybean using available soybean transcriptome data. Of the screened transcriptomes, a soybean transcriptome for soybean seed development analysis contains several virus-associated sequences. In this study, we identified five viruses, including soybean mosaic virus (SMV), infecting soybean by de novo transcriptome assembly followed by blast search. We assembled a nearly complete consensus genome sequence of SMV China using transcriptome data. Based on phylogenetic analysis, the consensus genome sequence of SMV China was closely related to SMV isolates from South Korea. We examined single nucleotide variations (SNVs) for SMVs in the soybean seed transcriptome revealing 780 SNVs, which were evenly distributed on the SMV genome. Four SNVs, C-U, U-C, A-G, and G-A, were frequently identified. This result demonstrated the quasispecies variation of the SMV genome. Taken together, this study carried out bioinformatics analyses to identify viruses using soybean transcriptome data. In addition, we demonstrated the application of soybean transcriptome data for virus genome assembly and SNV analysis.

Keywords:  de novo genome assembly; single nucleotide variation; soybean mosaic virus

Year:  2017        PMID: 29018311      PMCID: PMC5624490          DOI: 10.5423/PPJ.OA.03.2017.0060

Source DB:  PubMed          Journal:  Plant Pathol J        ISSN: 1598-2254            Impact factor:   1.795


Soybean (Glycine max (L.) Merr.) is the most important legume crop, representing 50% of the global legume crop area and 68% of global legume production (Herridge et al., 2008). Soybean is consumed as a health food, providing a rich source of protein, and is also used for vegetable oil production (Messina, 1999; Pimentel and Patzek, 2005). Moreover, soybean plays an important role in dinitrogen (N2) fixation, which is an important natural process (Herridge et al., 2008). Several diseases in soybean, such as soybean cyst nematode, brown spot, charcoal rot, and sclerotinia stem rot, lead to yield losses in major soybean-producing countries (Wrather et al., 2001). In addition, soybean can be infected by diverse viruses. Although only a small number of the viruses infecting soybean cause serious economic problems in soybean production, it is always important to control and manage viral diseases in soybeans (Hill and Whitham, 2014). The best known soybean virus is soybean mosaic virus (SMV), a member of the family Potyviridae, which causes soybean mosaic disease. In addition, bean pod mottle virus (BPMV), soybean vein necrosis virus, tobacco ringspot virus, soybean dwarf virus, peanut mottle virus, peanut stunt virus, and alfalfa mosaic virus are important viruses infecting soybeans (Hill and Whitham, 2014). Many plant viruses have been identified based on viral disease symptoms and several detection methods. However, virus infection in plants does not always cause disease symptoms, and plants showing viral disease symptoms are very often co-infected by different viruses. Recent advances in next generation sequencing (NGS) technology have led to the identification of numerous known as well as novel viruses by means of metagenomics (Barba et al., 2014; Massart et al., 2014). Virus sequences are found not only in NGS data generated for virus detection but also in many plant transcriptome data, where they might be amplified along with infected host transcripts (Burger and Maree, 2015; Jo et al., 2016). 
The identification of virus sequences in a plant transcriptome is no longer surprising, because most plant viruses are RNA viruses and many of them carry a poly(A) tail, which is easily amplified by oligo d(T) primers during cDNA synthesis. Recently, we carried out a large-scale screening to identify viruses infecting soybean worldwide using available soybean transcriptome data. Among the screened data, we found that a soybean transcriptome generated for soybean seed development analysis contains many virus sequences. In this study, we conducted bioinformatics analyses for virus identification, virus genome assembly, phylogenetic analysis, and single nucleotide variations of SMV.

Materials and Methods

Plant materials, library preparation, and next generation sequencing

The plant material used for RNA-Seq was the soybean cultivar Heinong44. Plants were grown at the experimental station in Beijing from May to August as described in the previous study (Song et al., 2013). Total RNAs were extracted from seeds at six different developmental stages, which were classified according to seed weight. The cDNA was synthesized from poly(A)-containing RNAs. A single RNA-Seq library was constructed and sequenced by single-end sequencing using the Illumina HiSeq 2000 system. The raw data are available in the SRA database (http://www.ncbi.nlm.nih.gov/sra/SRR1777405).

Raw data processing and de novo transcriptome assembly

All bioinformatics analyses were performed on a workstation running Linux Mint version 17 (four 16-core CPUs and 256 GB RAM). We downloaded the raw data from the SRA database using the SRA toolkit (Leinonen et al., 2011). The raw SRA data were converted to FASTQ files using the SRA toolkit. For the de novo assembly of transcriptomes, we used Trinity version 2.0.6 (Haas et al., 2013). De novo transcriptome assembly was performed with default parameters according to the manual provided by the developers.

Identification of viruses and sequence alignment

To identify virus-associated contigs, we conducted a BLAST search using standalone BLAST version 2.1.19 installed on the Linux system (Madden, 2013). All assembled contigs were subjected to a MEGABLAST search, which is optimized for highly similar sequences, against complete reference sequences for viruses and viroids (http://www.ncbi.nlm.nih.gov/genome/viruses/) with an E value of 1e-5 as the cutoff. In addition, all raw data were converted to FASTA files using the SRA toolkit and subjected to a MEGABLAST search against the viral reference database with the same E value cutoff of 1e-5. We used the Burrows–Wheeler Aligner (BWA) software with default parameters to align sequences to the reference virus genome (Li and Durbin, 2009).
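The E value cutoff described above can be illustrated with a minimal Python sketch. It assumes the BLAST hits were written in the standard 12-column tabular layout (query id, subject id, identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E value, bit score); the source does not state which output format was used. The names `Hit` and `filter_hits` are hypothetical, the first two example rows reuse values from Table 2, and the third row is made up to show a rejected hit.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query_id: str
    subject_id: str
    identity: float
    alignment_length: int
    evalue: float
    bit_score: float


def filter_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep tabular BLAST hits whose E value is at or below the cutoff."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hit = Hit(fields[0], fields[1], float(fields[2]), int(fields[3]),
                  float(fields[10]), float(fields[11]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return hits


example = [
    "TR2274|c0_g1_i1\tNC_002634.1\t93.13\t233\t16\t0\t2\t234\t8571\t8803\t3.00E-93\t342",
    "TR19277|c0_g2_i1\tNC_003617.1\t75.34\t146\t29\t6\t467\t607\t6837\t6694\t1.00E-08\t63.9",
    # Made-up weak hit, included only to show the cutoff rejecting a row.
    "hypothetical_contig\tNC_000000.0\t80.00\t30\t6\t0\t1\t30\t100\t129\t0.5\t20.1",
]
kept = filter_hits(example)
print([h.query_id for h in kept])
```

Whether a hit exactly at the cutoff is kept (`<=`) or dropped (`<`) is a choice made here for illustration only.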

De novo assembly of SMV genomes

The 79 SMV-associated contigs identified by BLAST were retrieved by the BLASTCMD program in the standalone BLAST system. To assemble the SMV genome, the identified viral contigs were aligned against the SMV reference genome (NC_002634.1) using ClustalW implemented in the MEGA6 program (Tamura et al., 2013). The nearly complete consensus genome of SMV was then obtained manually. Raw data were again aligned to the assembled consensus SMV genome by BWA to confirm the sequence. The poly(A) tail at the 3′ end of the assembled SMV genome was removed. We obtained a nearly complete consensus genome for SMV China (accession number NC_002634.1) from the soybean transcriptome.
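The consensus step was performed manually in this study. As a rough illustration of the idea only, the following Python sketch takes already aligned, gap-padded sequences and picks the most frequent base in each column. The function name `majority_consensus`, the use of `N` for uncovered columns, and the toy sequences are assumptions made for this example and are not taken from the study.

```python
from collections import Counter
from typing import Sequence


def majority_consensus(aligned: Sequence[str], gap: str = "-") -> str:
    """Return the per-column majority base of equal-length aligned sequences."""
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    consensus = []
    for column in zip(*aligned):
        counts = Counter(base for base in column if base != gap)
        if counts:
            # Ties are resolved by first occurrence in the column.
            consensus.append(counts.most_common(1)[0][0])
        else:
            consensus.append("N")  # no contig covers this column
    return "".join(consensus)


toy_alignment = ["ACG-T", "ACGAT", "-CTAT"]
print(majority_consensus(toy_alignment))
```

A manual curation step, as used in the study, can resolve ties and low-coverage regions that a simple majority vote cannot.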

Identification of SNVs in soybean transcriptome

To analyze SNVs of SMV China in the soybean transcriptome, the raw data were aligned to the consensus genome of SMV China using the BWA program with default parameters. The SAM files produced by BWA were converted into BAM files by SAMtools (Li et al., 2009). For SNV calling, we sorted the BAM files and then generated a VCF file using mpileup (Danecek et al., 2011). BCFtools, implemented in SAMtools, was finally used to call SNVs. The positions of the identified SNVs on the SMV genome were visualized by the Tablet program (Milne et al., 2010).
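To make the pileup idea behind SNV calling concrete, here is a toy Python sketch. It is not the statistical model used by SAMtools or BCFtools; it simply counts the bases that ungapped reads place on each reference position and reports non-reference bases above a frequency threshold. The function name `call_snvs`, the thresholds `min_depth` and `min_alt_fraction`, and the toy reads are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple


def call_snvs(reference: str,
              reads: Sequence[Tuple[int, str]],
              min_depth: int = 4,
              min_alt_fraction: float = 0.2) -> List[Tuple[int, str, str, int, int]]:
    """Report (1-based position, ref base, alt base, alt count, depth).

    Each read is (0-based start, sequence) and is assumed to align without gaps.
    """
    pileup = defaultdict(Counter)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1
    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        for base, n in counts.items():
            if base != reference[pos] and n / depth >= min_alt_fraction:
                calls.append((pos + 1, reference[pos], base, n, depth))
    return calls


toy_reference = "ACGTACGT"
toy_reads = [(0, "ACGTACGT")] * 3 + [(0, "ACGCACGT")] * 2
print(call_snvs(toy_reference, toy_reads))
```

In a mixed viral population, as in this transcriptome, minority alleles like the one in the toy reads are expected, which is why a frequency threshold rather than a diploid genotype model is used in this sketch.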

Construction of phylogenetic trees

To reveal the phylogenetic relationships of the obtained consensus genome for SMV China with known SMV isolates, we generated three phylogenetic trees. The complete genome sequence of SMV isolate China as well as its two protein sequences (polyprotein and PIPO) were searched by BLAST against the NCBI nucleotide and non-redundant protein databases. The best-matched sequences were retrieved for the construction of the phylogenetic trees. The obtained sequences were aligned by the ClustalW program with default parameters. After alignment, we deleted unnecessary sequences. The manually edited alignments were used to construct phylogenetic trees with the MEGA6 program. The phylogenetic trees were constructed by the neighbor-joining method with 1,000 bootstrap replicates.
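The legend of Fig. 2 names the Kimura 2-parameter model for nucleotide sequences. As a hedged illustration of what that distance measures, the sketch below computes it for one pair of aligned sequences using the standard formula d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. The function name `k2p_distance` and the toy sequences are assumptions; MEGA6 performed the actual calculation in the study, and its handling of gaps and ambiguous bases may differ from this simplification.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Treat RNA and DNA alphabets alike by mapping U to T.
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    transitions = transversions = compared = 0
    for a, b in zip(s1, s2):
        if a not in BASES or b not in BASES:
            continue  # skip gaps and ambiguous characters
        compared += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition (A to G) in ten compared sites.
print(k2p_distance("ACGUACGUAC", "GCGUACGUAC"))
```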

Results

De novo soybean transcriptome assembly and identification of viruses in the soybean seeds

We screened available soybean transcriptome data deposited in NCBI’s Sequence Read Archive (SRA) database in order to identify viruses infecting soybean. Of the screened soybean transcriptomes, a transcriptome generated for gene expression profiling during soybean seed development contained several virus-associated sequences (accession number SRR1777405) (Song et al., 2013). To identify virus-associated contigs, we de novo assembled the soybean transcriptome using the Trinity program, resulting in 116,108 transcripts (contigs) with a contig N50 of 710 bp (Table 1). Next, we searched the 116,108 transcripts against the viral reference database by BLAST. After removing redundant sequences and endogenous viral sequences, we identified 83 virus-associated contigs (Table 2). Most contigs (79 contigs) were associated with SMV. The lengths of SMV-associated contigs ranged from 224 to 3,636 nt (Fig. 1A). Four contigs were associated with bean common mosaic virus (BCMV), lettuce infectious yellows virus (LIYV), lettuce chlorosis virus (LCV), and cucumber mosaic virus (CMV), respectively. The lengths of the contigs associated with these four viruses ranged from 232 nt (LCV RNA2) to 1,015 nt (BCMV) (Fig. 1A). Other than the contig associated with LIYV (1E-08), the virus-associated contigs display reliable E values, indicating the significance of the BLAST results (Table 2).
Table 1

Summary of de novo soybean transcriptome assembly using Trinity

Accession number	SRR1777405^a
Total Trinity transcripts	116108
Percent GC	43.97
Contig N50	710 bp
Median contig length	428 bp
Average contig length	580.18 bp
Total assembled bases	67363642 bp

We assembled raw data from two different libraries using Trinity program.

The statistics of assembled contigs were calculated by TrinityStats.pl in the Trinity program.
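For readers unfamiliar with the contig N50 reported in Table 1, the following Python sketch shows one common definition: the length of the contig at which the running sum of lengths, taken from longest to shortest, first reaches half of the total assembled bases. The function name `n50` and the toy lengths are assumptions for illustration; TrinityStats.pl is what produced the values in the table. The last lines check the average contig length against the totals given in Table 1.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length of the contig at which half of the total bases is reached."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


toy_lengths = [2, 3, 4, 5, 6]  # total 20, so half is reached at 6 + 5
print(n50(toy_lengths))

# Consistency check using the totals reported in Table 1.
total_bases = 67_363_642
total_transcripts = 116_108
print(round(total_bases / total_transcripts, 2))
```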

Table 2

Summary of blast results to identify virus-associated contigs

Query id	Subject id	Name of virus	Identity (%)	Alignment length	Mismatches	Gap opens	Query start	Query end	Subject start	Subject end	E value	Bit score
TR2274|c0_g1_i1	NC_002634.1	Soybean mosaic virus	93.13	233	16	0	2	234	8571	8803	3.00E-93	342
TR3618|c0_g1_i1	NC_002634.1	Soybean mosaic virus	91.02	256	23	0	1	256	1342	1597	2.00E-94	346
TR3618|c0_g1_i2	NC_002634.1	Soybean mosaic virus	90.58	276	26	0	1	276	1342	1617	2.00E-100	366
TR3858|c0_g1_i1	NC_002634.1	Soybean mosaic virus	97.35	264	7	0	1	264	910	1173	2.00E-125	449
TR3858|c0_g1_i2	NC_002634.1	Soybean mosaic virus	96.6	235	8	0	1	235	939	1173	1.00E-107	390
TR4672|c0_g1_i1	NC_002634.1	Soybean mosaic virus	96.55	261	9	0	1	261	9036	9296	2.00E-120	433
TR4672|c0_g1_i2	NC_002634.1	Soybean mosaic virus	97.7	261	6	0	1	261	9036	9296	2.00E-125	449
TR5077|c1_g1_i1	NC_002634.1	Soybean mosaic virus	94.19	258	15	0	3	260	4680	4937	9.00E-109	394
TR5077|c1_g1_i2	NC_002634.1	Soybean mosaic virus	91.47	258	22	0	3	260	4680	4937	4.00E-97	355
TR5102|c0_g1_i1	NC_002634.1	Soybean mosaic virus	91.96	224	18	0	1	224	7552	7329	6.00E-85	315
TR5869|c0_g1_i1	NC_002634.1	Soybean mosaic virus	91.98	212	17	0	5	216	7243	7032	6.00E-80	298
TR5869|c0_g2_i1	NC_002634.1	Soybean mosaic virus	92.45	212	16	0	5	216	7243	7032	1.00E-81	303
TR5869|c0_g3_i1	NC_002634.1	Soybean mosaic virus	92.92	212	15	0	5	216	7243	7032	3.00E-83	309
TR5869|c0_g4_i1	NC_002634.1	Soybean mosaic virus	92.92	212	15	0	5	216	7243	7032	3.00E-83	309
TR7406|c0_g1_i1	NC_002634.1	Soybean mosaic virus	94.64	280	15	0	1	280	2677	2956	6.00E-121	435
TR7406|c0_g1_i2	NC_002634.1	Soybean mosaic virus	92.12	241	19	0	1	241	2677	2917	1.00E-92	340
TR7406|c0_g1_i3	NC_002634.1	Soybean mosaic virus	94.16	274	16	0	1	274	2677	2950	5.00E-116	418
TR7406|c0_g1_i4	NC_002634.1	Soybean mosaic virus	93.36	241	16	0	1	241	2677	2917	1.00E-97	357
TR8100|c0_g1_i1	NC_002634.1	Soybean mosaic virus	97.86	234	5	0	12	245	6060	6293	4.00E-112	405
TR9520|c0_g1_i1	NC_002634.1	Soybean mosaic virus	95.06	385	19	0	1	385	8268	7884	2.00E-172	606
TR9520|c0_g1_i2	NC_002634.1	Soybean mosaic virus	96.65	239	8	0	4	242	8122	7884	6.00E-110	398
TR9520|c0_g1_i3	NC_002634.1	Soybean mosaic virus	94.66	356	19	0	1	356	8268	7913	2.00E-156	553
TR9520|c0_g1_i4	NC_002634.1	Soybean mosaic virus	94.38	356	20	0	1	356	8268	7913	9.00E-155	547
TR9520|c0_g1_i5	NC_002634.1	Soybean mosaic virus	95.06	385	19	0	1	385	8268	7884	2.00E-172	606
TR9520|c0_g1_i6	NC_002634.1	Soybean mosaic virus	96.19	210	8	0	4	213	8122	7913	8.00E-94	344
TR9520|c0_g1_i7	NC_002634.1	Soybean mosaic virus	96.88	385	12	0	1	385	8268	7884	0	645
TR13605|c0_g1_i1	NC_002634.1	Soybean mosaic virus	92.25	400	31	0	10	409	8665	9064	8.00E-161	568
TR13605|c0_g1_i2	NC_002634.1	Soybean mosaic virus	94.75	400	21	0	10	409	8665	9064	2.00E-177	623
TR15892|c0_g1_i1	NC_002634.1	Soybean mosaic virus	92.64	231	17	0	2	232	5845	5615	2.00E-90	333
TR20496|c0_g1_i1	NC_002634.1	Soybean mosaic virus	96.88	224	7	0	1	224	2087	1864	3.00E-103	375
TR22770|c0_g1_i1	NC_002634.1	Soybean mosaic virus	91.67	240	20	0	1	240	6413	6652	2.00E-90	333
TR22770|c0_g1_i2	NC_002634.1	Soybean mosaic virus	92.53	281	21	0	2	282	6372	6652	2.00E-111	403
TR25078|c0_g1_i1	NC_002634.1	Soybean mosaic virus	88.54	253	29	0	1	253	8730	8478	1.00E-82	307
TR25078|c0_g2_i1	NC_002634.1	Soybean mosaic virus	94.72	246	13	0	16	261	8627	8382	2.00E-105	383
TR25078|c0_g2_i2	NC_002634.1	Soybean mosaic virus	93.7	349	22	0	1	349	8730	8382	2.00E-147	523
TR25078|c0_g2_i3	NC_002634.1	Soybean mosaic virus	95.72	187	8	0	43	229	8568	8382	5.00E-81	302
TR25078|c0_g2_i4	NC_002634.1	Soybean mosaic virus	90.91	253	23	0	1	253	8730	8478	1.00E-92	340
TR32819|c0_g1_i1	NC_002634.1	Soybean mosaic virus	91.7	265	22	0	2	266	2515	2251	6.00E-101	368
TR32819|c0_g2_i1	NC_002634.1	Soybean mosaic virus	92.08	265	21	0	2	266	2515	2251	1.00E-102	374
TR34507|c0_g1_i1	NC_002634.1	Soybean mosaic virus	87.27	377	44	4	4	378	3523	3149	1.00E-118	427
TR37651|c0_g1_i1	NC_002634.1	Soybean mosaic virus	87.61	218	24	3	2	218	410	625	2.00E-65	250
TR37651|c0_g3_i1	NC_002634.1	Soybean mosaic virus	87.27	487	57	4	2	487	410	892	1.00E-155	551
TR37706|c0_g2_i1	NC_002634.1	Soybean mosaic virus	90.51	274	24	2	1	273	1128	1400	9.00E-99	361
TR41793|c1_g1_i1	NC_002634.1	Soybean mosaic virus	92.89	394	28	0	1	394	7483	7876	2.00E-162	573
TR41793|c1_g1_i2	NC_002634.1	Soybean mosaic virus	93.15	438	29	1	1	437	7483	7920	0	641
TR41793|c1_g1_i3	NC_002634.1	Soybean mosaic virus	91.55	213	18	0	23	235	7486	7698	8.00E-79	294
TR41793|c1_g1_i4	NC_002634.1	Soybean mosaic virus	93.93	445	27	0	1	445	7483	7927	0	673
TR41793|c1_g1_i5	NC_002634.1	Soybean mosaic virus	91.59	226	19	0	1	226	7473	7698	2.00E-84	313
TR41793|c1_g1_i6	NC_002634.1	Soybean mosaic virus	93.03	445	31	0	1	445	7483	7927	0	651
TR41793|c1_g1_i7	NC_002634.1	Soybean mosaic virus	91.17	419	37	0	1	419	7473	7891	2.00E-161	569
TR44246|c0_g1_i2	NC_002634.1	Soybean mosaic virus	87.9	157	18	1	87	242	477	633	2.00E-45	183
TR44822|c4_g1_i1	NC_002634.1	Soybean mosaic virus	97.83	460	10	0	2	461	843	384	0	795
TR44822|c4_g1_i2	NC_002634.1	Soybean mosaic virus	97.65	765	18	0	2	766	843	79	0	1314
TR44822|c4_g1_i3	NC_002634.1	Soybean mosaic virus	97.27	622	14	1	2	623	843	225	0	1051
TR44822|c4_g2_i1	NC_002634.1	Soybean mosaic virus	90.13	1256	122	2	1	1255	1991	737	0	1631
TR44822|c4_g2_i2	NC_002634.1	Soybean mosaic virus	91.46	820	70	0	1	820	1918	1099	0	1127
TR44822|c4_g2_i3	NC_002634.1	Soybean mosaic virus	92.75	483	35	0	1	483	1991	1509	0	699
TR44822|c4_g2_i4	NC_002634.1	Soybean mosaic virus	88.67	256	29	0	1	256	1617	1362	2.00E-84	313
TR44822|c4_g2_i5	NC_002634.1	Soybean mosaic virus	93.9	246	15	0	19	264	1853	1608	4.00E-102	372
TR44822|c4_g2_i6	NC_002634.1	Soybean mosaic virus	94.81	231	12	0	19	249	1853	1623	9.00E-99	361
TR44822|c4_g2_i7	NC_002634.1	Soybean mosaic virus	94.15	410	24	0	1	410	1918	1509	5.00E-178	625
TR44822|c5_g1_i1	NC_002634.1	Soybean mosaic virus	95.98	994	40	0	2	995	5991	6984	0	1615
TR44822|c5_g1_i2	NC_002634.1	Soybean mosaic virus	94.11	3599	207	4	2	3596	5991	9588	0	5467
TR44822|c5_g2_i1	NC_002634.1	Soybean mosaic virus	93.3	224	15	0	4	227	8124	8347	6.00E-90	331
TR44822|c5_g1_i3	NC_002634.1	Soybean mosaic virus	96.21	501	19	0	2	502	5991	6491	0	821
TR44822|c5_g1_i4	NC_002634.1	Soybean mosaic virus	92.81	292	21	0	2	293	5991	6282	1.00E-117	424
TR44822|c6_g1_i1	NC_002634.1	Soybean mosaic virus	95.07	1015	50	0	1	1015	6049	5035	0	1598
TR44822|c6_g2_i1	NC_002634.1	Soybean mosaic virus	97.64	212	5	0	10	221	4930	4719	6.00E-100	364
TR44822|c6_g2_i2	NC_002634.1	Soybean mosaic virus	95.83	240	10	0	1	240	5051	4812	4.00E-107	388
TR44822|c6_g2_i3	NC_002634.1	Soybean mosaic virus	97.52	1372	34	0	1	1372	5146	3775	0	2346
TR44822|c6_g2_i4	NC_002634.1	Soybean mosaic virus	96.59	293	10	0	1	293	5146	4854	2.00E-136	486
TR44822|c6_g3_i1	NC_002634.1	Soybean mosaic virus	95.8	691	27	2	5	694	2746	2057	0	1114
TR44822|c6_g3_i2	NC_002634.1	Soybean mosaic virus	96.69	877	29	0	2	878	2822	1946	0	1459
TR44822|c6_g4_i1	NC_002634.1	Soybean mosaic virus	95.39	1149	53	0	1	1149	3889	2741	0	1829
TR44822|c6_g4_i2	NC_002634.1	Soybean mosaic virus	94.89	333	17	0	1	333	3598	3266	5.00E-147	521
TR45256|c0_g1_i1	NC_002634.1	Soybean mosaic virus	93.49	261	17	0	4	264	6897	6637	4.00E-107	388
TR45256|c0_g1_i2	NC_002634.1	Soybean mosaic virus	94.32	229	13	0	2	230	6865	6637	5.00E-96	351
TR47685|c0_g1_i1	NC_002634.1	Soybean mosaic virus	92.53	281	21	0	1	281	5082	5362	2.00E-111	403
TR47685|c0_g2_i1	NC_002634.1	Soybean mosaic virus	92.53	281	21	0	1	281	5082	5362	2.00E-111	403
TR44246|c0_g1_i1	NC_003397.1	Bean common mosaic virus	81.86	408	68	6	490	894	458	862	2.00E-91	339
TR19277|c0_g2_i1	NC_003617.1	Lettuce infectious yellows virus RNA1	75.34	146	29	6	467	607	6837	6694	1.00E-08	63.9
TR45572|c0_g2_i1	NC_012910.1	Lettuce chlorosis virus RNA2	87.96	191	22	1	15	205	8555	8366	1.00E-57	224
TR29303|c0_g1_i1	NC_002034.1	Cucumber mosaic virus RNA1	91.28	298	26	0	4	301	1334	1631	1.00E-112	407
Fig. 1

De novo assembly of SMV isolate in China using transcriptome data. (A) Size distribution of virus-associated contigs. Red-colored bar indicates SMV-associated contigs. Four viruses with respective contig length were indicated. (B) Alignment of 79 SMV-associated contigs on the assembled genome of SMV isolate in China using BWA program. Black bar indicates the reference SMV genome. Sequence alignment was visualized by Tablet program. (C) Genome organization of SMV isolate in China. The nucleotide positions of two proteins, GP1 and GP2, were indicated.

De novo genome assembly of SMV from a soybean transcriptome

Of the identified viruses, SMV was the most abundant in the soybean seeds. The 79 contigs associated with SMV covered most of the SMV reference genome (Table 2). All 79 contigs were mapped to the SMV reference genome (accession number NC_002634.1) (Eggenberger et al., 1989) (Fig. 1B). After sequence alignment followed by manual modification, we assembled a nearly complete consensus genome of SMV, referred to as SMV China (Fig. 1C). SMV China is composed of 9,507 nucleotides (nt) encoding two proteins, GP1 and GP2. GP1 encodes a polyprotein (nt 54 to 9,254), which is further cleaved into ten mature proteins: P1 (P1 proteinase), HC-Pro (helper component proteinase), P3 (P3 protein), 6K1 (6K1 protein), CI (cylindrical inclusion), 6K2, NIa-VPg (nuclear inclusion protein a-genome linked viral protein), NIa-Pro, NIb (nuclear inclusion protein b), and coat protein (CP). GP2 encodes the PIPO (pretty interesting potyviridae ORF) protein (nt 2,804 to 3,031) (Fig. 1C).

Phylogenetic relationships of the SMV isolate China

To examine the genetic relationships of the assembled SMV China with known SMV isolates, we constructed phylogenetic trees. The phylogenetic tree using SMV complete genome sequences showed two groups of SMV isolates (Fig. 2A). SMV China belongs to group B along with two SMV isolates from South Korea. In the tree using polyprotein sequences, SMV China in group C was distantly related to other SMV isolates (Fig. 2B). The phylogenetic tree using PIPO protein sequences confirmed that SMV China is a member of SMV belonging to group A, which contains seven viruses including BPMV (Fig. 2C). Based on the phylogenetic analyses, the consensus genome of SMV China appears to be genetically close to the SMV isolates from South Korea.
Fig. 2

Phylogenetic relationship of the assembled SMV isolate China with known SMV isolates. Phylogenetic trees of SMV isolates using complete genomes (A), polyproteins (B), and PIPO sequences (C). The respective genome and protein sequences were blasted against NCBI database and highly matched sequences were used for construction of phylogenetic trees using MEGA6 program using neighbor-joining method with 1000 bootstrap replications. Kimura 2-parameter and Poisson substitution model were used for nucleotide and protein sequences, respectively.

Single nucleotide variations of SMV in the soybean seeds

It is well known that RNA viruses have a quasispecies nature, with several variants present in the infected host. Therefore, we examined single nucleotide variations (SNVs) for SMV in the soybean seeds. The identified SMV China was used as a reference. After BWA alignment of raw data against SMV China, SNVs were identified using SAMtools (Fig. 3A). The SNVs in this study were derived from a population of different isolates. As a result, we identified 780 SNVs (Supplementary Table 1). SNVs were evenly distributed along the SMV genome (Fig. 3B). Most SNVs were single nucleotide polymorphisms (SNPs), except for one InDel (CAGG to CAGGAGG) at nt 640 of SMV China (Table S1). Four SNVs, C-U (190 SNVs), U-C (180 SNVs), A-G (168 SNVs), and G-A (155 SNVs), were frequently identified (Fig. 3C). Based on the SNV results, the mutation rate for SMV in the soybean seeds was 8.2045% (780 SNVs over 9,507 nt), indicating a high level of mutations for the SMV RNA genome. In addition, we calculated the ratio of Ts/Tv (transitions versus transversions). The Ts/Tv ratio for SMV China was 8.06 (693/86).
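The arithmetic behind these figures can be retraced with a short Python sketch. It uses only numbers stated in the text: the four transition counts, the total of 780 SNVs, the single InDel, and the 9,507 nt genome length. The variable names are hypothetical, and the percentage is computed here as variable sites per genome position, which is an interpretation of how the reported mutation rate was derived.

```python
# Transition counts reported in the text (Fig. 3C).
transition_counts = {"C-U": 190, "U-C": 180, "A-G": 168, "G-A": 155}

total_snvs = 780        # all identified SNVs
indels = 1              # the single InDel at nt 640
genome_length = 9_507   # length of SMV China in nt

transitions = sum(transition_counts.values())
# Everything that is neither a transition nor the InDel is a transversion.
transversions = total_snvs - indels - transitions

ts_tv_ratio = transitions / transversions
snv_percent = total_snvs / genome_length * 100

print(transitions, transversions)
print(round(ts_tv_ratio, 2))
print(round(snv_percent, 4))
```

The counts reproduce the 693 transitions and 86 transversions given for the Ts/Tv ratio.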
Fig. 3

SNVs of SMV in the soybean seed transcriptome. (A) Raw data were mapped on the genome sequence of SMV isolate China using BWA and visualized by Tablet program. (B) The positions of identified single nucleotide variations on the SMV were visualized by Tablet program. Detailed information for SNVs can be found in Supplementary Table 1. (C) The numbers of identified SNVs of SMV in the soybean seed transcriptome.

The amount of viral RNA in the soybean transcriptome

It is of interest to quantify viral RNAs in the analyzed soybean transcriptome. Of the 116,108 contigs, SMV-associated contigs account for 0.068% (79 contigs). The total length of the assembled contigs was 67,363,642 bp and the total length of virus-associated contigs was 36,022 bp, accounting for 0.0535%. Virus-associated reads account for 0.0529% (39,403/74,431,152) of all reads. Moreover, we calculated the SMV copy number within the soybean transcriptome, resulting in 414 SMV copies, which is highly correlated with the sequence coverage of the SMV genome. This result indicates high variability of the SMV genome.
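The percentages in this paragraph follow directly from the reported counts, as the following Python sketch shows. The helper name `percent` is hypothetical; all numbers are taken from the text.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return part / whole * 100


contig_fraction = percent(79, 116_108)            # contigs
base_fraction = percent(36_022, 67_363_642)       # assembled bases
read_fraction = percent(39_403, 74_431_152)       # raw reads

print(round(contig_fraction, 3))
print(round(base_fraction, 4))
print(round(read_fraction, 4))
```

All three measures agree that viral sequences make up well under 0.1% of the transcriptome.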

Discussion

The development of NGS has provided a wide variety of DNA and RNA sequencing data (Metzker, 2010). The main purpose of DNA and RNA sequencing is the elucidation of the genome and transcriptome of target eukaryotic and prokaryotic organisms (Morozova and Marra, 2008). In the case of bacteria, metagenomics using 16S rRNA sequences, which are highly conserved among bacterial species, is intensively performed to study bacterial communities under specific conditions (Wang and Qian, 2009). However, unlike bacteria, viruses do not share any conserved sequences, and the genomes of viruses are mostly very small (Edwards and Rohwer, 2005). Therefore, virus-specific sequencing usually requires a purification step before NGS. For example, extraction of double-stranded RNAs from virus-infected organisms followed by NGS is one of the efficient approaches to identify viruses (Yanagisawa et al., 2016). Moreover, sequencing of small RNAs is an alternative technique for virus identification and genome assembly (Vodovar et al., 2011). In addition, RNA-Seq is a good technique to identify viruses that have a poly(A) tail. However, several recent studies demonstrated that viruses and viroids without a poly(A) tail can also be detected by RNA-Seq (Burger and Maree, 2015; Jo et al., 2016). In this study, we identified several viruses infecting soybean. The transcriptome was originally generated for expression profiling of soybean seed development. Thus, it is derived not from a single condition but from six developmental seed stages, for each of which several seeds might have been pooled for total RNA extraction. Although we identified five viruses that might infect soybean, the four viruses other than SMV were each identified based on only a single contig, and their presence should be validated by other methods. In many cases, a partial viral sequence or contig is homologous to a closely related virus rather than to the target virus. 
Thus, it is possible that the identified virus-associated contigs derive not from the viruses named by the BLAST hits but from other viruses that share similar sequences. SMV is seed-borne and transmitted by aphids (Domier et al., 2011). Soybean seeds infected by SMV are often discolored and mottled. In addition, BCMV is known as a seed-borne virus (Refugee et al., 1987). Seed-borne viruses can either infect the embryo, as BCMV does, or be carried on the seed coat (Jafarpour et al., 1979). In addition, seed transmission of CMV has been identified in several plants such as pepper, spinach, and lupin (Ali and Kobayashi, 2010; Wylie et al., 1993; Yang et al., 1997). Based on previous knowledge of seed-borne viruses, the identification of SMV, BCMV, and CMV in the soybean seed is not surprising. In addition, the infection of green bean (Phaseolus vulgaris L.) by LCV has been recently reported (Ruiz et al., 2014). However, the infection of the soybean seed by LIYV and LCV, which are members of the genus Crinivirus, should be validated. The soybean transcriptome was derived not from a single soybean seed but from a mixture of seeds that were divided into six developmental stages. The SMV-associated contigs assembled in this study might be shorter than virus-associated contigs from a single plant because the transcriptome contains several variants of SMV. Therefore, the assembled genome of SMV China is a consensus sequence of several SMV variants. Although SMV-associated sequences accounted for only about 0.05% of the total transcriptome, the coverage of the SMV genome in this study was about 414, and this coverage was also visualized by the alignment of raw data to the genome of SMV China. As a result, there was enough SMV-associated sequence data to de novo assemble the SMV genome. Based on the assembled SMV genome, we could also identify SNVs for SMV. 
As we expected, we found many SNVs, which resulted from a mixture of SMV variants infecting diverse seed samples. However, we could not determine the exact number of variants. Furthermore, the identification of SNVs in SMV demonstrated that mutations were not confined to a specific region but occurred in several regions of the SMV genome. The presence of several SMV variants in the soybean seeds is a very interesting finding, indicating that SMV is highly replicated in the developing seeds; this might be correlated with some disease symptoms in the soybean seeds caused by SMV. It might be of interest to examine replication rates of SMV in different developmental stages and tissues; this could provide evidence of the quasispecies nature of SMV in the near future. Phylogenetic analyses suggested that the identified SMV isolate China was very different from other known SMV isolates based on polyprotein sequences. However, SMV isolate China seems to be closely related to two SMV isolates from South Korea, suggesting a phylogenetic correlation between geographical regions and SMV isolates. Our SNV analysis in the soybean seeds indicates a high level of quasispecies nature for SMV. Mutations were not in a specific region but in most regions of the SMV genome. Furthermore, we found that A-G and C-U conversions, and vice versa, were frequent. Taken together, our bioinformatics analyses using a soybean seed transcriptome identified five viruses infecting the soybean seeds. Of these five viruses, we de novo assembled the genome of SMV isolate China and analyzed SNVs, revealing the quasispecies nature of SMV in the soybean seeds for the first time. Our approaches and analyses in this study are valuable for virus-associated studies using NGS-based transcriptome data.
  24 in total

Review 1.  Applications of next-generation sequencing technologies in functional genomics.

Authors:  Olena Morozova; Marco A Marra
Journal:  Genomics       Date:  2008-08-24       Impact factor: 5.736

2.  MEGA6: Molecular Evolutionary Genetics Analysis version 6.0.

Authors:  Koichiro Tamura; Glen Stecher; Daniel Peterson; Alan Filipski; Sudhir Kumar
Journal:  Mol Biol Evol       Date:  2013-10-16       Impact factor: 16.240

3.  In silico reconstruction of viral genomes from small RNAs improves virus-derived small interfering RNA profiling.

Authors:  Nicolas Vodovar; Bertsy Goic; Hervé Blanc; Maria-Carla Saleh
Journal:  J Virol       Date:  2011-08-31       Impact factor: 5.103

Review 4.  Sequencing technologies - the next generation.

Authors:  Michael L Metzker
Journal:  Nat Rev Genet       Date:  2009-12-08       Impact factor: 53.242

5.  Multiple loci condition seed transmission of soybean mosaic virus (SMV) and SMV-induced seed coat mottling in soybean.

Authors:  Leslie L Domier; Houston A Hobbs; Nancy K McCoppin; Charles R Bowen; Todd A Steinlage; Sungyul Chang; Yi Wang; Glen L Hartman
Journal:  Phytopathology       Date:  2011-06       Impact factor: 4.025

Review 6.  Legumes and soybeans: overview of their nutritional profiles and health effects.

Authors:  M J Messina
Journal:  Am J Clin Nutr       Date:  1999-09       Impact factor: 7.045

7.  The Sequence Alignment/Map format and SAMtools.

Authors:  Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

8.  Tablet--next generation sequence assembly visualization.

Authors:  Iain Milne; Micha Bayer; Linda Cardle; Paul Shaw; Gordon Stephen; Frank Wright; David Marshall
Journal:  Bioinformatics       Date:  2009-12-04       Impact factor: 6.937

9.  De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis.

Authors:  Brian J Haas; Alexie Papanicolaou; Moran Yassour; Manfred Grabherr; Philip D Blood; Joshua Bowden; Matthew Brian Couger; David Eccles; Bo Li; Matthias Lieber; Matthew D MacManes; Michael Ott; Joshua Orvis; Nathalie Pochet; Francesco Strozzi; Nathan Weeks; Rick Westerman; Thomas William; Colin N Dewey; Robert Henschel; Richard D LeDuc; Nir Friedman; Aviv Regev
Journal:  Nat Protoc       Date:  2013-07-11       Impact factor: 13.491

10.  Soybean GmbZIP123 gene enhances lipid content in the seeds of transgenic Arabidopsis plants.

Authors:  Qing-Xin Song; Qing-Tian Li; Yun-Feng Liu; Feng-Xia Zhang; Biao Ma; Wan-Ke Zhang; Wei-Qun Man; Wei-Guang Du; Guo-Dong Wang; Shou-Yi Chen; Jin-Song Zhang
Journal:  J Exp Bot       Date:  2013-08-20       Impact factor: 6.992
