Yuanyuan Fu1,2,3, Liangwei Li2,3, Shijie Hao2, Rui Guan2,3, Guangyi Fan2,3,4, Chengcheng Shi2, Haibo Wan2,3, Wenbin Chen2, He Zhang2,3, Guocheng Liu2, Jihua Wang5, Lulin Ma5, Jianling You6, Xuemei Ni2, Zhen Yue2, Xun Xu2, Xiao Sun1, Xin Liu2, Simon Ming-Yuen Lee4.
Abstract
Rhodiola crenulata, a well-known Tibetan medicinal herb, is mainly grown in high-altitude regions of Tibet, Yunnan, and Sichuan in China. In the past few years, a growing number of studies have been published on the potential pharmacological activities of R. crenulata, improving our understanding of its putative active ingredients, pharmacological activity, and mechanism of action. These findings also provide strong evidence of the medicinal and economic value of R. crenulata. Consequently, some Rhodiola species are becoming endangered because of overexploitation and environmental destruction. However, little is known about the genetics and genomics of any Rhodiola species. Here we report the first draft assembly of the R. crenulata genome. The assembly is 344.5 Mb long (including 25.7 Mb of Ns), accounting for 82% of the estimated genome size, with a scaffold N50 of 144.7 kb and a contig N50 of 25.4 kb. Based on k-mer analysis, the R. crenulata genome is both highly heterozygous and highly repetitive, with an estimated heterozygosity of 1.12% and a repeat content of 66.15%. Furthermore, 226.6 Mb of transposable elements were detected, of which 77.03% were long terminal repeat (LTR) elements. In total, 31 517 protein-coding genes were identified, and 86.72% of the 956 conserved plant genes searched by BUSCO were recovered as complete single-copy genes. Additionally, 79.73% of the protein-coding genes were functionally annotated. R. crenulata is an important medicinal plant and also a potentially interesting model species for studying how Rhodiola species adapt to extreme environments. The R. crenulata genome sequence will be useful for understanding the evolution of stress resistance genes and the biosynthesis pathways of its medicinal ingredients, such as salidroside.
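The genome-size, heterozygosity, and repeat estimates above rest on k-mer analysis. As a rough illustration of the underlying idea (a minimal sketch, not the authors' pipeline; `kmer_counts`, `depth_histogram`, and `estimate_genome_size` are hypothetical helper names), genome size is commonly approximated as the total number of k-mers divided by the sequencing depth at the main peak of the k-mer depth histogram:

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (forward strand only, for simplicity) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def depth_histogram(counts):
    """Map k-mer depth -> number of distinct k-mers seen at that depth."""
    return Counter(counts.values())


def estimate_genome_size(hist, min_depth=2):
    """Estimate genome size as total k-mers / depth of the main peak.

    Depths below min_depth are treated as sequencing errors and dropped;
    the tallest remaining bin is assumed to be the homozygous peak.
    """
    kept = {depth: n for depth, n in hist.items() if depth >= min_depth}
    total_kmers = sum(depth * n for depth, n in kept.items())
    peak_depth = max(kept, key=kept.get)
    return total_kmers / peak_depth


# Toy example: five error-free copies of a 30 bp "genome", k = 5.
genome = "ATGCGTACGTTAGCCGATCGGATCCTAGGC"
hist = depth_histogram(kmer_counts([genome] * 5, k=5))
print(estimate_genome_size(hist))  # 26.0, i.e. the 30 - 5 + 1 k-mer positions
```

This sketch counts k-mers on the forward strand only and assumes the tallest bin is the homozygous peak. Dedicated tools typically count canonical k-mers (merging each k-mer with its reverse complement) and fit a model to the whole histogram, which is the kind of modeling needed to derive heterozygosity and repeat ratios like those reported here.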
Keywords: Rhodiola crenulata; annotation; genome assembly; genomics
Year: 2017 PMID: 28475810 PMCID: PMC5530320 DOI: 10.1093/gigascience/gix033
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Example of R. crenulata (image from Shifeng Li).
Statistics of the final assembly produced with Platanus and GapCloser.
| Type | Scaffold | Contig |
|---|---|---|
| Total number | 150 003 | 161 878 |
| Total length (bp) | 344 513 827 | 318 807 120 |
| N50 length (bp) | 144 749 | 25 360 |
| N90 length (bp) | 1003 | 877 |
| Max length (bp) | 1 309 315 | 300 573 |
| GC content (%) | 39.68 | 39.68 |
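The N50 and N90 values in the table follow the usual definition: sort sequences from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of the total assembly length. A minimal sketch of that calculation follows; `nx` and `split_contigs` are hypothetical helpers, and splitting scaffolds at every run of Ns is a simplifying assumption (assemblers may apply their own gap rules when defining contigs).

```python
import re


def nx(lengths, fraction):
    """Return the Nx length (e.g. fraction=0.5 for N50).

    Sequences are sorted longest first; the Nx is the length of the sequence
    at which the running total first reaches `fraction` of the total length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("no sequence lengths given")


def split_contigs(scaffold):
    """Split a scaffold into contigs at runs of N (simplified gap rule)."""
    return [piece for piece in re.split(r"[Nn]+", scaffold) if piece]


lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(nx(lengths, 0.5), nx(lengths, 0.9))  # 8 4
print(split_contigs("ACGTNNNNGGCCNAT"))  # ['ACGT', 'GGCC', 'AT']
```

The same bookkeeping links the table to the abstract: the scaffold total minus the contig total is 344 513 827 - 318 807 120 = 25 706 707 bp, consistent with the roughly 25.7 Mb of Ns reported there.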
Figure 2: An overview of the annotation workflow. The workflow starts from the assembled genomic sequences and produces the repeat annotation, protein-coding gene predictions, and functional annotation. (a) Repeat annotation: repeats in the genome are detected by two different methods, de novo and homology based. In the de novo method, RepeatScout, LTR-FINDER, and RepeatModeler are used to build de novo repeat libraries, which are further classified by RepeatMasker. In the homology-based method, RepeatMasker and RepeatProteinMask are run to search for TEs by aligning sequences against existing libraries. (b) Gene prediction: TEs are fully masked before gene prediction. Augustus and GlimmerHMM are used for de novo prediction, and BLAT and GeneWise are used to predict gene models based on homologous protein sequences. (c) GLEAN is used to obtain a consensus gene set. (d) Finally, the clean RNA sequencing reads are incorporated to produce a more comprehensive integrated gene set. (e) The completeness of the gene set is estimated using BUSCO. (f) Functional annotation.
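Step (b) masks TEs before gene prediction so that repeat-derived open reading frames are not mistaken for genes. The sketch below shows hard masking (replacing repeat bases with N) from a list of repeat coordinates. It assumes 0-based, half-open intervals, and `merge_intervals`, `hard_mask`, and `masked_fraction` are hypothetical names, not functions of RepeatMasker or any other tool in the workflow.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching 0-based, half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def hard_mask(seq, intervals, mask_char="N"):
    """Replace every base inside a repeat interval with mask_char."""
    chars = list(seq)
    for start, end in merge_intervals(intervals):
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


def masked_fraction(seq, intervals):
    """Fraction of seq covered by repeats, counting overlapping hits once."""
    covered = 0
    for start, end in merge_intervals(intervals):
        start, end = max(start, 0), min(end, len(seq))
        if end > start:
            covered += end - start
    return covered / len(seq)


# Two overlapping hits (e.g. one de novo, one homology based) and one that
# runs past the end of the sequence.
hits = [(1, 3), (2, 5), (8, 12)]
print(hard_mask("ACGTACGTAC", hits))        # ANNNNCGTNN
print(masked_fraction("ACGTACGTAC", hits))  # 0.6
```

Merging hits first matters because the de novo and homology-based searches can report the same repeat more than once; summing unmerged interval lengths would overstate the repeat content.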
Figure 3: Summary statistics of the repeats and gene models. (a) Lengths of the different types of TEs and their proportions of the genome; LTRs are the predominant type. (b) Numbers of predicted genes and average lengths of the CDSs, exons, and introns predicted by the different methods. Green, blue, and purple bars represent CDSs, exons, and introns, respectively. The number of genes predicted by each de novo or homology-based method is given in parentheses.
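Panel (b) summarizes each gene set by its gene count and average CDS, exon, and intron lengths. A minimal sketch of how such averages can be computed from gene models follows. It assumes each model is a list of exons given as 1-based, inclusive (start, end) pairs, as in GFF, and treats every exon as coding, so CDS and total exon length coincide per gene here; real annotations with UTRs would keep CDS and exon features separate. The function names are hypothetical.

```python
from statistics import mean


def exon_lengths(exons):
    """Lengths of exons given as 1-based, inclusive (start, end) pairs."""
    return [end - start + 1 for start, end in exons]


def intron_lengths(exons):
    """Introns are the gaps between consecutive exons, ordered by position."""
    ordered = sorted(exons)
    return [next_start - prev_end - 1
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


def summarize(gene_models):
    """Gene count and mean CDS, exon and intron lengths for one gene set.

    Every exon is treated as coding, so a gene's CDS length is the sum of
    its exon lengths (UTRs are ignored in this simplified sketch).
    """
    cds = [sum(exon_lengths(model)) for model in gene_models]
    exons = [n for model in gene_models for n in exon_lengths(model)]
    introns = [n for model in gene_models for n in intron_lengths(model)]
    return {
        "genes": len(gene_models),
        "mean_cds": mean(cds),
        "mean_exon": mean(exons),
        "mean_intron": mean(introns) if introns else 0.0,
    }


# Two toy gene models: a two-exon gene and a single-exon gene.
models = [[(1, 100), (201, 300)], [(1, 150)]]
print(summarize(models))
```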
Statistics of the BUSCO assessment.
| Types of BUSCOs | Gene set: number | Gene set: percentage (%) | Assembly: number | Assembly: percentage (%) |
|---|---|---|---|---|
| Complete single-copy BUSCOs | 829 | 86.72 | 876 | 91.63 |
| Fragmented BUSCOs | 37 | 3.87 | 35 | 3.66 |
| Missing BUSCOs | 90 | 9.41 | 45 | 4.71 |
| Total BUSCO groups searched | 956 | 100 | 956 | 100 |
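Each percentage in the table is a category count divided by the 956 BUSCO groups searched. The short sketch below reproduces the reported values from the counts; `busco_percentages` is a hypothetical helper, and rounding to two decimals is assumed to match the table.

```python
def busco_percentages(counts, total):
    """Express BUSCO category counts as percentages of the groups searched."""
    if sum(counts.values()) != total:
        raise ValueError("categories should partition the groups searched")
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Counts copied from the table above (956 BUSCO groups searched).
gene_set = {"complete_single_copy": 829, "fragmented": 37, "missing": 90}
assembly = {"complete_single_copy": 876, "fragmented": 35, "missing": 45}

print(busco_percentages(gene_set, 956))
# {'complete_single_copy': 86.72, 'fragmented': 3.87, 'missing': 9.41}
print(busco_percentages(assembly, 956))
# {'complete_single_copy': 91.63, 'fragmented': 3.66, 'missing': 4.71}
```

The assembly column recovers 47 more complete single-copy BUSCOs than the gene set (876 vs 829), which suggests that some conserved genes present in the assembly were not captured by the final annotation.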