Yanqin Xu1, Shuyun Tian1, Renqing Li1, Xiaofang Huang1, Fengqin Li1, Fei Ge1, Wenzhen Huang1, Yin Zhou2,3.
Abstract
Sarcandra glabra contains metabolically active constituents of pharmaceutical importance. The lack of molecular markers for S. glabra hinders molecular breeding for genetic improvement. In this study, 57.756 million paired-end reads were generated by transcriptome sequencing of S. glabra (Thunb.) Nakai and its subspecies S. glabra ssp. brachystachys. A total of 141,954 unigenes with an average length of 646.63 bp were assembled. A total of 25,620 simple sequence repeats, 726,476 single nucleotide polymorphisms, and 42,939 insertions and deletions were identified, and the associated unigenes and differentially expressed genes were characterized. This work expands the molecular marker resources and will facilitate molecular breeding and gene mining in S. glabra spp.
Year: 2021 PMID: 34307686 PMCID: PMC8282378 DOI: 10.1155/2021/9990910
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Comparison of key characteristics of Sarcandra glabra (Thunb.) Nakai and S. glabra subsp. brachystachys (Blume) Verdcourt.
| Species | Sarcandra glabra (Thunb.) Nakai | S. glabra subsp. brachystachys (Blume) Verdcourt |
|---|---|---|
| Leaf | Leathery, margin sharply coarsely serrate except basally | Papery, margin dully serrate except basally |
| Stamen | Baculate to terete, thecae shorter than connective | Ovoid, thecae almost as long as the connective |
| Stigma | Subcapitate or minutely spotted | Minutely spotted |
| Fruit | Globose, shiny red or yellowish red at maturity | Ovoid, orange-red at maturity |
The summary for Sarcandra glabra de novo transcriptome assembly.
| Sample name | ∗CSH 1 | ∗CSH 2 | ∗CSH 3 | ∗SG 1 | ∗SG 2 | ∗SG 3 |
|---|---|---|---|---|---|---|
| Raw reads | ||||||
| Total raw reads | 52,327,874 | 50,982,388 | 49,476,364 | 48,520,562 | 57,756,352 | 55,998,238 |
| Total bases | 7,849,181,100 | 7,647,358,200 | 7,421,454,600 | 7,278,084,300 | 8,663,452,800 | 8,399,735,700 |
| GC content | 46.03% | 46.41% | 45.76% | 45.38% | 45.76% | 46.15% |
| Q20 | 97.62% | 97.63% | 97.60% | 97.63% | 97.61% | 97.56% |
| Q30 | 92.72% | 92.74% | 92.66% | 92.69% | 92.64% | 92.54% |
| Clean reads | ||||||
| Total reads | 52,317,712 | 50,972,536 | 49,465,478 | 48,512,702 | 57,745,284 | 55,984,978 |
| Total bases | 7,847,656,800 | 7,645,880,400 | 7,419,821,700 | 7,276,905,300 | 8,661,792,600 | 8,397,746,700 |
| GC content | 46.03% | 46.41% | 45.76% | 45.38% | 45.77% | 46.15% |
| Q20 | 97.63% | 97.63% | 97.60% | 97.64% | 97.61% | 97.56% |
| Q30 | 92.72% | 92.74% | 92.67% | 92.70% | 92.64% | 92.54% |
∗CSH stands for Sarcandra glabra ssp. brachystachys, while SG represents Sarcandra glabra (Thunb.) Nakai.
Descriptive statistics and functional annotation of the de novo transcriptome assembly of Sarcandra glabra.
| Descriptive | Value |
|---|---|
| Total length (bp) | 91,791,960 |
| Total unigene | 141,954 |
| GC contents (%) | 41.91 |
| N50 (bp) | 989 |
| N90 (bp) | 264 |
| Average (bp) | 646.63 |
| Median (bp) | 363 |
| Minimum (bp) | 201 |
| Maximum (bp) | 17,087 |
| Contigs of size < 600 bp | 77,907 |
| Contigs of size ≥ 600 bp | 22,669 |
| Contigs of size ≥ 1,000 bp | 15,536 |
| Contigs of size ≥ 2,000 bp | 362 |
| ∗Complete BUSCOs | 291 (67.8%) |
| Complete and single-copy BUSCOs | 243 (56.6%) |
| Complete and duplicated BUSCOs | 48 (11.2%) |
| Fragmented BUSCOs | 66 (15.4%) |
| Missing BUSCOs | 72 (16.8%) |
| Total BUSCO groups searched | 429 (100%) |
| Total annotations | 58,436 (41.17%) |
| 1: UniProt | 35,606 (25.08%) |
| 2: Pfam | 28,948 (20.39%) |
| 3: GO | 34,857 (24.56%) |
| 4: KEGG | 21,192 (14.93%) |
| 5: COG pathway | 14,575 (10.27%) |
| 6: EggNOG | 23,086 (16.26%) |
| 7: NR | 56,297 (39.66%) |
∗Complete BUSCOs: the detected gene length falls within the 95% confidence interval of the average length for the BUSCO homologous group; a complete BUSCO may be present as a single copy or as multiple copies. Incompletely recovered BUSCOs are denoted as fragmented, and undetected BUSCO homologous groups are denoted as missing.
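The N50 and N90 values reported in the table follow the usual definition: contigs are sorted from longest to shortest, and the statistic is the contig length at which the running total first reaches 50% (or 90%) of all assembled bases. The following is a minimal Python sketch of that definition. The function name `nx` and the toy lengths are illustrative assumptions, not the authors' code or data.

```python
def nx(lengths, fraction=0.5):
    """Return the contig length at which the cumulative sum of lengths,
    taken from longest to shortest, first reaches `fraction` of the total.

    fraction=0.5 gives N50, fraction=0.9 gives N90.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total * fraction:
            return length


# Toy example (made-up lengths, not the study's contigs).
toy_lengths = [100, 200, 300, 400, 500]
print(nx(toy_lengths))        # N50 of the toy set: 400
print(nx(toy_lengths, 0.9))   # N90 of the toy set: 200
```

Because short contigs contribute few bases, N50 is typically larger than the median length, which is consistent with the table (N50 of 989 bp against a median of 363 bp).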
Figure 1. Clean reads after sequence assembly: (a) length distribution of the assembled reads and (b) distribution of N50 on the basis of expression clustering from 0 to 100.
Figure 2. Classification of differentially expressed genes by functional annotation: (a) based on the Gene Ontology (GO) database and (b) based on significant pathway enrichment in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database.
SSR motif repeat distribution in transcriptome data of Sarcandra glabra.
| Number of repeats | Mono | Di | Tri | Tetra | Penta | Hexa | Compound (c) | Compound (c∗) | Total |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 0 | 0 | 2,113 | 224 | 48 | 65 | | | 2,450 |
| 6 | 0 | 1,721 | 1,004 | 62 | 8 | 41 | | | 2,836 |
| 7 | 0 | 1,085 | 559 | 6 | 7 | 15 | | | 1,672 |
| 8 | 0 | 897 | 363 | 7 | 5 | 6 | | | 1,278 |
| 9 | 0 | 665 | 99 | 1 | 0 | 1 | | | 766 |
| 10 | 5,133 | 505 | 104 | 6 | 0 | 1 | | | 5,749 |
| 11 | 2,717 | 1,043 | 93 | 0 | 0 | 1 | | | 3,854 |
| 12 | 1,808 | 267 | 67 | 0 | 0 | 0 | | | 2,142 |
| 13 | 1,096 | 116 | 63 | 1 | 0 | 0 | | | 1,276 |
| 14 | 880 | 141 | 45 | 0 | 0 | 0 | | | 1,066 |
| 15 | 687 | 138 | 31 | 0 | 0 | 0 | | | 856 |
| 16 | 541 | 142 | 35 | 1 | 0 | 0 | | | 719 |
| 17 | 390 | 138 | 11 | 0 | 0 | 0 | | | 539 |
| 18 | 266 | 170 | 4 | 0 | 0 | 0 | | | 440 |
| 19 | 217 | 157 | 6 | 0 | 0 | 0 | | | 380 |
| 20 | 173 | 170 | 2 | 0 | 0 | 0 | | | 345 |
| >20 | 486 | 1,312 | 9 | 0 | 0 | 0 | | | 1,807 |
| Total | 14,394 | 8,667 | 4,608 | 308 | 68 | 130 | 3,218 | 90 | 31,483 |
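In the SSR table, mononucleotide motifs first appear at 10 repeats, dinucleotide motifs at 6, and tri- to hexanucleotide motifs at 5, which suggests those values were the minimum repeat thresholds. The sketch below shows how perfect SSRs could be detected under that assumption with a regular expression. The thresholds are inferred from the table rather than stated by the authors, the function names are hypothetical, and compound SSRs (the c and c∗ columns) are not handled.

```python
import re

# Minimum repeat count per motif length; ASSUMED from where each column
# of the SSR table first becomes nonzero, not taken from the authors.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit):
    """True if `unit` is not itself a repetition of a shorter motif
    (so 'AA' or 'ATAT' are rejected, 'A' and 'AT' are accepted)."""
    n = len(unit)
    return not any(
        n % k == 0 and unit[:k] * (n // k) == unit for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, unit, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, m in min_repeats.items():
        # One k-base unit followed by at least m-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, m - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if is_primitive(unit):
                hits.append((match.start(), unit, len(match.group(0)) // k))
    return sorted(hits)


# Toy sequence (made up): A x12, AG x7, CTT x5 separated by short spacers.
example = "GG" + "A" * 12 + "CC" + "AG" * 7 + "CA" + "CTT" * 5 + "G"
print(find_ssrs(example))
# [(2, 'A', 12), (16, 'AG', 7), (32, 'CTT', 5)]
```

Rejecting non-primitive units keeps a run such as A×12 from being counted a second time as AA×6. A dedicated SSR tool would also merge neighbouring repeats into compound SSRs, which this sketch leaves out.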
Figure 3. Morphological appearance of (a, c) inflorescence and (b, d) leaves of (a, b) Sarcandra glabra (Thunb.) Nakai and (c, d) subspecies S. glabra ssp. brachystachys.
| Type of SNP variant | Count | Ratio | Type of InDel variant | Count | Ratio |
|---|---|---|---|---|---|
| 3′UTR variant | 172,232 | 23.05% | 3′UTR variant | 14,938 | 32.21% |
| 5′UTR premature start codon gain variant | 20,696 | 2.77% | 5′UTR variant | 12,609 | 27.19% |
| 5′UTR variant | 123,713 | 16.56% | Conservative in-frame deletion | 1,195 | 2.58% |
| Initiator codon variant | 94 | 0.01% | Conservative in-frame insertion | 789 | 1.70% |
| Intergenic region | 154,861 | 20.73% | Disruptive in-frame deletion | 1,312 | 2.83% |
| Missense variant | 142,027 | 19.01% | Disruptive in-frame insertion | 871 | 1.88% |
| Splice region variant | 25 | 0.00% | Frame-shift variant | 6,919 | 14.92% |
| Start lost | 520 | 0.07% | Intergenic region | 5,782 | 12.47% |
| Stop gained | 3,865 | 0.52% | Splice region variant | 2 | 0.00% |
| Stop lost | 1,027 | 0.14% | Start lost | 471 | 1.02% |
| Stop retained variant | 318 | 0.04% | Stop gained | 237 | 0.51% |
| Synonymous variant | 127,819 | 17.11% | Stop lost | 1,257 | 2.71% |
| Region wise | | | Region wise | | |
| Exon | 275,669 | 36.89% | Exon | 11,086 | 24.96% |
| Intergenic | 154,861 | 20.73% | Intergenic | 5,782 | 13.02% |
| Splice sites | 22 | 0.00% | Splice sites | 1 | 0.00% |
| 3′UTR | 172,232 | 23.05% | 3′UTR | 14,938 | 33.63% |
| 5′UTR | 144,409 | 19.33% | 5′UTR | 12,609 | 28.39% |