Fang-Dong Li, Wei Tong, En-Hua Xia, Chao-Ling Wei.
Abstract
BACKGROUND: Tea is the oldest and one of the world's most popular non-alcoholic beverages, with important economic, health and cultural value. Tea is commonly produced from the leaves of tea plants (Camellia sinensis), which belong to the genus Camellia of the family Theaceae. In the last decade, many studies have generated transcriptomes of tea plants at different developmental stages or under abiotic and/or biotic stresses to investigate the genetic basis of the secondary metabolites that determine tea quality. However, the results differ widely, particularly in the total number of reconstructed transcripts and the quality of the assembled transcriptomes. These differences largely result from limited knowledge of the optimal sequencing depth and assembler for transcriptome assembly in plant species with structurally complex genomes.
Keywords: Camellia sinensis; Sequencing depth; Tea plant; Transcriptome; de novo assembly
Year: 2019 PMID: 31694521 PMCID: PMC6836513 DOI: 10.1186/s12859-019-3166-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Statistics of the transcriptome assemblies using five representative assemblers with 32 Gb sequencing data
| Assembler | No. transcripts | Total length (bp) | Maximum transcript length (bp) | Average length (bp) | N50 (bp) |
|---|---|---|---|---|---|
| BinPacker | 423,768 | 368,682,397 | 21,031 | 870.01 | 1348 |
| Bridger | 380,605 | 364,039,338 | 22,041 | 956.48 | 1539 |
| SOAP | 238,364 | 99,912,984 | 14,281 | 419.16 | 443 |
| Trans-ABySS | 404,455 | 191,731,319 | 17,014 | 474.05 | 527 |
| Trinity | 404,125 | 225,905,337 | 14,565 | 559 | 701 |
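The table reports N50 and average length without defining them. As a minimal sketch, assuming the conventional definition of N50 (the length of the shortest transcript in the smallest set of longest transcripts that together cover at least half of the total assembled bases), both statistics can be computed as below. The function names `n50` and `average_length` are illustrative and are not taken from the paper or from any assembler's tooling.

```python
def n50(lengths):
    """Return the N50 of a collection of transcript lengths.

    Assumes the conventional definition: sort lengths from longest to
    shortest and return the length at which the running total first
    reaches at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def average_length(total_bp, n_transcripts):
    """Average transcript length: total assembled bases / transcript count."""
    return total_bp / n_transcripts


# Toy example (hypothetical lengths, not data from the study).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8

# Consistency check against the BinPacker row of the table above.
print(round(average_length(368_682_397, 423_768), 2))  # 870.01
```

The second call simply divides the reported total length by the reported number of transcripts, which reproduces the average length listed for BinPacker.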
Fig. 1 Overview of the transcriptome assemblies using five state-of-the-art assemblers. a Length distributions. b BUSCO completeness assessment. M: Missing BUSCOs; F: Fragmented BUSCOs; C: Complete BUSCOs
Fig. 2 Quality evaluation of transcriptome assemblies using genome alignment. a The total number of constructed full-length transcripts. b Percentage of the transcripts with sequence identity ≥50%
Fig. 3 Full-length genes reconstructed by each assembler at different expression quintiles
Summary of the Bridger assemblies using different k-mer values
| k-mer | No. transcripts | Maximum length (bp) | Average length (bp) | N50 (bp) | BUSCO C | BUSCO F | BUSCO M |
|---|---|---|---|---|---|---|---|
| 19 | 273,893 | 27,833 | 1003.51 | 1531 | 56.10% | 22.30% | 21.60% |
| 21 | 350,330 | 28,766 | 987.98 | 1554 | 86.70% | 6.70% | 6.60% |
| 23 | 370,677 | 18,579 | 948.81 | 1524 | 91.20% | 4.40% | 4.40% |
| 25 | 380,605 | 22,041 | 956.48 | 1539 | 92.50% | 3.80% | 3.70% |
| 27 | 383,138 | 20,166 | 951.48 | 1524 | 93.00% | 3.30% | 3.70% |
| 29 | 383,689 | 19,347 | 950.33 | 1513 | 86.70% | 6.70% | 6.60% |
| 32 | 390,761 | 19,996 | 896.6 | 1430 | 92.70% | 3.30% | 4.00% |
BUSCO categories: C, Complete BUSCOs; F, Fragmented BUSCOs; M, Missing BUSCOs
Fig. 4 Statistics of the transcriptome assemblies using Bridger with different amounts of sequencing data from replicate 1. a Total number and sequence length of the transcripts. The x-axis represents the total amount of sequencing data, ranging from 4 to 84 Gb. b BUSCO evaluation of the completeness of the transcriptome assemblies. The x-axis indicates the percentage of each type of BUSCO, while the y-axis shows the transcriptomes assembled using different amounts of sequencing data. M: Missing BUSCOs; F: Fragmented BUSCOs; C: Complete BUSCOs