Mohammad Sadat-Hosseini, Mohammad Reza Bakhtiarizadeh, Naser Boroomand, Masoud Tohidfar, Kourosh Vahdati.
Abstract
Transcriptome resources can help to increase the yield and quality of walnuts. Identifying the best transcriptome assembly has not yet been a subject of walnut research. This study generated 240,179,782 reads from cDNA libraries of 11 walnut leaf samples and used them to build a complete de novo transcriptome assembly. Fifteen transcriptome assemblies were constructed using five well-known assemblers (Bridger, BinPacker, SOAPdenovo-Trans, Trinity and SPAdes) with different k-mer lengths, together with two merging approaches (EvidentialGene and Transfuse). Based on four assembly quality metrics, merging the assemblies produced by the de novo assemblers proved effective. EvidentialGene was identified as the best approach for de novo assembly of the walnut leaf transcriptome. Of the 183,191 transcripts generated by EvidentialGene, 109,413 (59.72%) were predicted to encode proteins and 104,926 (57.27%) were recognized as ORFs. In addition, 79,185 transcripts had at least one hit in the Pfam database. A total of 3,931 transcription factors were identified by BLAST searches against PlnTFDB. Furthermore, 6,591 of the predicted peptide sequences contained signal peptides, while 92,704 contained transmembrane domains. BLAST comparison of the assembled transcripts with available walnut transcripts and with the published genome assembly of the 'Chandler' cultivar identified 27,304 and 19,178 homologous transcripts, respectively. The de novo leaf transcriptome can support future functional genomics and genetic studies of walnut.
Year: 2020 PMID: 32343733 PMCID: PMC7188282 DOI: 10.1371/journal.pone.0232005
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of transcriptome sequencing of J. regia obtained from the Illumina HiSeq 2000 platform.
QC (Quality control).
| Sample | Raw reads (total sequences) | Trimmed reads (total sequences) | %GC | Deletion (%) |
|---|---|---|---|---|
| 1 | 22326005 | 21434277 | 45 | 3.99 |
| 2 | 22371442 | 21554690 | 46 | 3.65 |
| 3 | 22020370 | 21032721 | 46 | 4.48 |
| 4 | 19106437 | 18278891 | 46 | 4.33 |
| 5 | 22344235 | 21550592 | 46 | 3.55 |
| 6 | 21758291 | 20881689 | 46 | 4.02 |
| 7 | 21637929 | 20755888 | 46 | 4.07 |
| 8 | 22179079 | 21452220 | 45 | 3.27 |
| 9 | 21804304 | 20912738 | 46 | 4.08 |
| 10 | 22187321 | 21514931 | 46 | 3.03 |
| 11 | 22444369 | 21749718 | 44 | 3.09 |
| Total / mean | 240179782 | 231118355 | 45.63 | 3.77 |
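
The read counts in this table can be cross-checked with a few lines of arithmetic. The sketch below assumes that "Deletion (%)" means the share of raw reads removed by trimming; that reading is consistent with the tabulated values but is not stated explicitly in the source. The helper name `deletion_percent` is illustrative only.

```python
# Read counts for samples 1-11, copied from the table above.
raw_reads = [22326005, 22371442, 22020370, 19106437, 22344235, 21758291,
             21637929, 22179079, 21804304, 22187321, 22444369]
trimmed_reads = [21434277, 21554690, 21032721, 18278891, 21550592, 20881689,
                 20755888, 21452220, 20912738, 21514931, 21749718]


def deletion_percent(raw: int, trimmed: int) -> float:
    """Percentage of raw reads discarded by trimming (assumed definition)."""
    return 100.0 * (raw - trimmed) / raw


total_raw = sum(raw_reads)
total_trimmed = sum(trimmed_reads)
overall_deletion = deletion_percent(total_raw, total_trimmed)

print(f"raw: {total_raw}, trimmed: {total_trimmed}, "
      f"deleted: {overall_deletion:.2f}%")
for sample, (raw, trimmed) in enumerate(zip(raw_reads, trimmed_reads), start=1):
    print(f"sample {sample}: {deletion_percent(raw, trimmed):.2f}%")
```

The two totals reproduce the summary row, which supports reading that row as column totals for the read counts.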
Fig 1. Diagram of the workflow for the walnut leaf transcriptome sequencing, de novo assembly and functional annotation.
First, mRNA was extracted from leaves of J. regia, followed by cDNA preparation and library construction. Sequencing used a paired-end strategy (read length: 150 bp) on an Illumina HiSeq 2000 platform. After quality control and trimming, de novo assemblies were constructed with BinPacker, Bridger, SOAPdenovo-Trans, Trinity, SPAdes, EvidentialGene and Transfuse. CAP3 was then used to produce longer consensus transcripts and to reduce the redundancy of the contigs obtained from all assemblers. The quality of each de novo assembled leaf transcriptome was evaluated by N50 length, the total number of contigs, the number of reads mapping back to the transcriptome (RMBT) and BUSCOs. Finally, the best-performing assembly was annotated using the UniProtKB database, the Pfam database, the NCBI non-redundant (nr) protein database, signal peptide prediction, ORF detection and transmembrane domain prediction.
Statistics of the leaf transcriptome pre-assemblies and the final de novo assembly in self-rooted Persian walnut ‘Chandler’.
| Assembler/k-mer | BinPacker/25 | BinPacker/32 | Bridger/25 | Bridger/32 | SOAPdenovo-Trans/25 | SOAPdenovo-Trans/31 | SOAPdenovo-Trans/41 | SOAPdenovo-Trans/51 | SOAPdenovo-Trans/61 | SOAPdenovo-Trans/71 | SPAdes | Trinity/25 | Trinity/32 | Transfuse | EvidentialGene |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NS | 185380854 | 182751763 | 157007643 | 160526566 | 57999046 | 58620636 | 58610033 | 55903363 | 50016501 | 43360309 | 141061136 | 289843659 | 295334827 | 536477927 | 193422472 |
| MCL (bp) | 584 | 581 | 447 | 441 | 160 | 177 | 191 | 206 | 237 | 294 | 428 | 673 | 723 | 854 | 602 |
| ALC (bp) | 1073.7 | 1072.7 | 925.7 | 927.7 | 247.4 | 284.5 | 326.8 | 353.5 | 405.6 | 488.4 | 911.7 | 1130.4 | 1194.6 | 1282.1 | 1055.9 |
| N50 (bp) | 1966 | 1967 | 1838 | 1854 | 343 | 394 | 436 | 510 | 600 | 736 | 1751 | 1981 | 2104 | 2151 | 1831 |
| TG | 132929 | 135394 | 142382 | 142382 | 234442 | 206075 | 179386 | 158125 | 123329 | 88787 | 133757 | 129613 | 129020 | 379406 | 183191 |
| TT | 172656 | 170356 | 169608 | 173037 | 234442 | 206075 | 179386 | 158125 | 123329 | 88787 | 154730 | 256396 | 247229 | 418444 | 183191 |
| BUSCOs (%) | 91.3 | 92.3 | 90.9 | 91.9 | 58.2 | 58.9 | 59.9 | 60.0 | 60.0 | 59.8 | 85.9 | 88.2 | 90.0 | 94.8 | 94.3 |
NS (Number of sequences), MCL (Median contig length), ALC (Average contig length), TG (Total genes), TT (Total transcripts).
Fig 2. The distribution of contig size and number for all assemblies.
A) BinPacker (k-mer: 25 and 32), B) Bridger, C) SPAdes, D) EvidentialGene, E) Transfuse, F) Trinity (k-mer: 25 and 32), G) SOAPdenovo-Trans (k-mer: 25, 31, 41, 51, 61 and 71).
Fig 3. N50 for each assembler and k-mer size in the walnut leaf transcriptome.
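
For readers unfamiliar with the metric, N50 is conventionally the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. The following is a minimal sketch of that standard definition, not the study's own script; the function name `n50` and the toy contig lengths are illustrative assumptions.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy example (hypothetical lengths, not data from the study).
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, it is usually read alongside mapping-based and completeness metrics such as RMBT and BUSCOs rather than on its own.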
Percentage of reads mapped back to the walnut leaf transcriptome (RMBT).
| Sample | BinPacker/25 | BinPacker/32 | Bridger/25 | Bridger/32 | Evidentialgene | SOAPdenovo-Trans/25 | SOAPdenovo-Trans/31 | SOAPdenovo-Trans/41 | SOAPdenovo-Trans/51 | SOAPdenovo-Trans/61 | SOAPdenovo-Trans/71 | SPAdes | Transfuse | Trinity/25 | Trinity/32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 94.04% | 96.72% | 94.05% | 96.74% | 95.57% | 75.78% | 83.74% | 90.60% | 92.80% | 93.06% | 93.62% | 97.88% | 99.64% | 98.36% | 98.91% |
| 2 | 94.27% | 96.80% | 94.29% | 96.84% | 95.81% | 74.58% | 82.48% | 89.10% | 91.39% | 92.13% | 93.04% | 98.02% | 99.50% | 98.30% | 98.92% |
| 3 | 94.26% | 96.65% | 94.27% | 96.68% | 96.19% | 75.24% | 82.43% | 88.53% | 90.62% | 91.25% | 91.90% | 97.92% | 99.39% | 98.19% | 98.75% |
| 4 | 94.30% | 96.72% | 94.31% | 96.74% | 96.36% | 75.33% | 82.50% | 88.55% | 90.67% | 91.33% | 92.02% | 97.97% | 99.39% | 98.16% | 98.77% |
| 5 | 94.33% | 96.91% | 94.35% | 96.93% | 95.88% | 73.02% | 81.34% | 88.40% | 90.76% | 91.51% | 92.35% | 97.98% | 99.51% | 98.34% | 98.92% |
| 6 | 94.32% | 96.77% | 94.33% | 96.79% | 96.45% | 75.15% | 82.49% | 88.69% | 90.91% | 91.56% | 92.25% | 98.04% | 99.47% | 98.22% | 98.85% |
| 7 | 94.14% | 96.71% | 94.16% | 96.72% | 96.34% | 74.54% | 82.04% | 88.49% | 90.72% | 91.36% | 92.05% | 97.98% | 99.48% | 98.35% | 98.87% |
| 8 | 93.76% | 96.43% | 93.80% | 96.46% | 95.75% | 72.20% | 80.02% | 86.80% | 89.09% | 89.71% | 90.37% | 97.98% | 99.50% | 98.09% | 98.84% |
| 9 | 93.98% | 96.66% | 94.01% | 96.67% | 96.50% | 73.08% | 80.15% | 86.41% | 88.40% | 88.68% | 89.24% | 97.87% | 99.41% | 98.30% | 98.68% |
| 10 | 94.14% | 96.70% | 94.16% | 96.71% | 95.67% | 74.17% | 81.70% | 88.41% | 90.56% | 91.21% | 91.92% | 98.01% | 99.52% | 98.12% | 98.88% |
| 11 | 93.63% | 96.80% | 93.67% | 96.79% | 96.12% | 69.18% | 76.23% | 82.31% | 84.25% | 83.99% | 84.78% | 97.72% | 99.50% | 98.45% | 98.73% |
| Mean | 94.11% | 96.72% | 94.13% | 96.73% | 96.06% | 73.84% | 81.37% | 87.84% | 90.02% | 90.53% | 91.23% | 97.94% | 99.48% | 98.26% | 98.83% |
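
The final row of the RMBT table is consistent with the arithmetic mean of the eleven per-sample rows above it. The sketch below checks this for three of the fifteen columns, with the values copied from the table; the dictionary name `rmbt` and the choice of columns are illustrative assumptions.

```python
# Per-sample RMBT percentages for three assemblies, copied from the table.
rmbt = {
    "BinPacker/25": [94.04, 94.27, 94.26, 94.30, 94.33, 94.32,
                     94.14, 93.76, 93.98, 94.14, 93.63],
    "SOAPdenovo-Trans/25": [75.78, 74.58, 75.24, 75.33, 73.02, 75.15,
                            74.54, 72.20, 73.08, 74.17, 69.18],
    "Transfuse": [99.64, 99.50, 99.39, 99.39, 99.51, 99.47,
                  99.48, 99.50, 99.41, 99.52, 99.50],
}

# Arithmetic mean per assembly, rounded to two decimals as in the table.
means = {name: round(sum(values) / len(values), 2)
         for name, values in rmbt.items()}

for name, mean in means.items():
    print(f"{name}: mean RMBT = {mean:.2f}%")
```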
Fig 4. BUSCOs identified for each assembler and k-mer size in the walnut leaf transcriptome.
Functional annotation summary of walnut leaf transcriptome.
| Category | No. of transcripts |
|---|---|
| Total transcripts | 183,191 |
| Predicted proteins | 109,413 |
| Predicted ORFs | 104,926 |
| Pfam | 79,185 |
| SignalP | 6,591 |
| Rfam | 882 |
| Transcription factor | 3,931 |
| ATNDW* | 27,304 |
| ATGAPCh** | 19,178 |
* Assembled transcripts searched against all nucleotides of walnut
** Assembled transcripts searched against the published ‘Chandler’ genome assembly