Chao Hu1,2,3, Hongxing Yang1,2, Kai Jiang1,2, Ling Wang1,2, Boyun Yang4, Tungyu Hsieh1,2,5, Siren Lan6, Weichang Huang7,8,9.
Abstract
BACKGROUND: Calanthe masuca and C. sinica are two genetically closely related species in Orchidaceae. C. masuca is widely distributed in Asia, whereas C. sinica is restricted to Yunnan and Guangxi Provinces in southwest China. Both are important in horticulture, and both face population decline. Understanding their genetic background can help in developing effective conservation strategies for these species. Simple sequence repeats (SSRs) are useful for genetic diversity analysis and are expected to provide key information for the study and preservation of wild populations of the two species.
Keywords: Divergence time; Next-generation sequencing; Polymorphic microsatellite; Population genetics
Year: 2018 PMID: 30400862 PMCID: PMC6219035 DOI: 10.1186/s12864-018-5161-4
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Flowers of C. masuca and C. sinica. a The flowers of C. sinica. b The flowers of C. masuca
Summary of the assembled transcripts of C. sinica and C. masuca
| | Unigenes | Transcripts | Unigenes | Transcripts |
|---|---|---|---|---|
| Number of sequences | 40,916 | 50,112 | 71,618 | 90,173 |
| Total nucleotide bases | 28,791,330 | 38,475,376 | 44,788,570 | 64,336,080 |
| GC content (%) | 44.44 | 44.27 | 43.06 | 43.00 |
| Maximum length (bp) | 12,973 | 12,973 | 16,684 | 16,684 |
| Average length (bp) | 703.67 | 767.97 | 625.3 | 713.47 |
| N50 | 1,196 | 1,296 | 1,086 | 1,285 |
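The average length and N50 rows in the table above follow standard assembly definitions. The following is a minimal sketch of how such statistics are typically computed from a list of sequence lengths; the function names and the toy lengths are illustrative assumptions, not the authors' pipeline.

```python
def mean_length(lengths):
    """Average sequence length: total bases divided by number of sequences."""
    return sum(lengths) / len(lengths)


def n50(lengths):
    """N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy lengths (hypothetical, for illustration only).
toy_lengths = [2000, 1500, 1000, 500, 300, 200]
toy_n50 = n50(toy_lengths)
toy_mean = mean_length(toy_lengths)
```

As a consistency check on the table itself, dividing the total nucleotide bases by the number of sequences in the first column (28,791,330 / 40,916) gives approximately 703.67, which matches the reported average length.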
Fig. 2 Sequence length distribution of the assembled unigenes of C. masuca and C. sinica. The x-axis indicates sequence length, and the y-axis indicates the number of unigenes
The annotation list of C. masuca and C. sinica
| | Pfam | KEGG | String | Swiss-Prot | NR |
|---|---|---|---|---|---|
| | 13,336 | 9,851 | 8,971 | 15,328 | 23,932 |
| (%) | 32.6 | 16.6 | 24.1 | 37.5 | 58.5 |
| | 14,932 | 9,717 | 11,088 | 16,767 | 28,037 |
| (%) | 20.8 | 13.6 | 33.5 | 15.5 | 39.1 |
E-value and similarity distribution of unigenes annotated in the NR database
| E-value | (n) | (n) |
|---|---|---|
| 0 | 12,736 | 13,980 |
| 0 to 1E-30 | 3,315 | 3,542 |
| 1E-30 to 1E-20 | 2,996 | 3,422 |
| 1E-20 to 1E-10 | 3,281 | 4,499 |
| 1E-10 to 1E-5 | 1,604 | 2,594 |
| Similarity | (n) | (n) |
| 20% to 40% | 4 | 17 |
| 40% to 60% | 1,492 | 2,152 |
| 60% to 80% | 10,309 | 13,113 |
| 80% to 100% | 12,127 | 12,755 |
Fig. 3 Clusters of orthologous group classification of C. masuca and C. sinica
Fig. 4 Gene ontology (GO) annotations of unigenes. The results are summarized into three main categories: biological process, cellular component and molecular function
Comparison of enriched GO terms among genes of different classes in C. masuca and C. sinica
| GO category | Gene group | shared | | |
|---|---|---|---|---|
| BP | SP | 20 | 20 | 30 |
| BP | IP | 8 | 13 | 7 |
| BP | R | 12 | 11 | 20 |
| BP | ID | 9 | 7 | 9 |
| BP | SD | 7 | 9 | 11 |
| MF | SP | 12 | 18 | 32 |
| MF | IP | 3 | 6 | 13 |
| MF | R | 1 | 4 | 5 |
| MF | ID | 5 | 4 | 3 |
| MF | SD | 3 | 4 | 5 |
| CC | SP | 9 | 10 | 13 |
| CC | IP | 1 | 7 | 3 |
| CC | R | 1 | 2 | 2 |
| CC | ID | 3 | 1 | 2 |
| CC | SD | 3 | 0 | 2 |
Abbreviations: BP, biological process; MF, molecular function; CC, cellular component. Gene classes: SP, strong purifying selection (Ks ≤ 0.5); IP, intermediate purifying selection (0.5 < Ks ≤ 0.9); R, relaxed or nearly relaxed selection (0.9 < Ks ≤ 1.1); ID, intermediate Darwinian selection (1.1 < Ks ≤ 1.5); SD, strong Darwinian selection (Ks > 1.5). GO enrichment results were obtained using the R package topGO
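The five gene classes are defined purely by threshold intervals, so the binning can be sketched in a few lines. This is an illustrative sketch only: the function name `selection_class` is hypothetical, and the thresholds are copied from the abbreviation note above, applied to whatever per-gene statistic that note refers to.

```python
def selection_class(value):
    """Map a per-gene statistic to the gene class used in the table.

    Thresholds follow the abbreviation note: the upper bound of each
    interval is inclusive, and anything above 1.5 is SD.
    """
    if value < 0:
        raise ValueError("statistic must be non-negative")
    if value <= 0.5:
        return "SP"  # strong purifying selection
    if value <= 0.9:
        return "IP"  # intermediate purifying selection
    if value <= 1.1:
        return "R"   # relaxed or nearly relaxed selection
    if value <= 1.5:
        return "ID"  # intermediate Darwinian selection
    return "SD"      # strong Darwinian selection


# Hypothetical values, chosen to hit each interval and its boundaries.
example_classes = [selection_class(v) for v in (0.2, 0.5, 0.7, 1.0, 1.3, 2.0)]
```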
Fig. 5 Determination of divergence time between C. masuca and C. sinica. Distribution of the minimal number of synonymous substitutions per synonymous site (Ks) for the orthologous gene groups between C. masuca and C. sinica (a) and between Phalaenopsis equestris and Dendrobium catenatum (b). The divergence time between P. equestris and D. catenatum reported in [37] was used as a real-time scale proportional to Ks
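The caption describes using a known divergence time as a scale proportional to Ks. A minimal sketch of that proportional scaling is shown here, assuming both species pairs accumulate synonymous substitutions at the same rate; the function name and all numeric inputs are hypothetical placeholders, not values from the study.

```python
def scaled_divergence_time(ks_target, ks_reference, time_reference):
    """Estimate divergence time by linear scaling of Ks.

    Assumes time is proportional to Ks with the same rate in both pairs:
        T_target = T_reference * (Ks_target / Ks_reference)
    """
    if ks_reference <= 0:
        raise ValueError("reference Ks must be positive")
    return time_reference * (ks_target / ks_reference)


# Hypothetical inputs: a reference pair with Ks = 0.5 that diverged
# 40 time units ago, and a target pair with Ks = 0.1.
estimate = scaled_divergence_time(ks_target=0.1, ks_reference=0.5, time_reference=40.0)
```

The result is in the same time unit as the reference divergence time, and it is only as reliable as the equal-rate assumption.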
Fig. 6 The distribution of SSR types in the C. masuca and C. sinica transcriptomes. Mononucleotide repeats were excluded
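To illustrate what counting SSR types with mononucleotide repeats excluded involves, here is a minimal regex-based sketch. It is not the tool used in the study; the function name `find_ssrs` and the minimum repeat counts are hypothetical assumptions chosen only for the example.

```python
import re

# Hypothetical minimum repeat counts per motif length (di- to hexanucleotide).
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_periodic(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    return motif in (motif + motif)[1:-1]


def find_ssrs(sequence, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Mononucleotide runs are excluded because motifs such as 'AA' or 'AAA'
    are periodic and therefore skipped.
    """
    min_repeats = min_repeats or DEFAULT_MIN_REPEATS
    seq = sequence.upper()
    hits = []
    for motif_len, min_count in min_repeats.items():
        # Group 2 is the motif; it must be followed by at least min_count - 1 copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_count - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            if is_periodic(motif):
                continue
            hits.append((match.start(), motif, len(match.group(1)) // motif_len))
    return sorted(hits)


# Toy sequence (hypothetical): six AT repeats flanked by other bases.
toy_hits = find_ssrs("GG" + "AT" * 6 + "CC")
```

Grouping the returned motifs by length would give a per-type tally of the kind plotted in the figure.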
Fig. 7 The distribution of polymorphism information content (PIC)
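The source does not state which PIC formula was used. Assuming the commonly used definition of Botstein et al. for a locus with allele frequencies p_i, PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², a minimal sketch looks like this; the function name and the example frequencies are hypothetical.

```python
def polymorphism_information_content(allele_counts):
    """Compute PIC for one locus from allele counts or frequencies.

    Assumed formula: PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("allele counts must sum to a positive number")
    freqs = [count / total for count in allele_counts]
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1 - homozygosity - cross_term


# Hypothetical locus with two equally frequent alleles.
two_allele_pic = polymorphism_information_content([0.5, 0.5])
```

For two equally frequent alleles this gives 1 − 0.5 − 0.125 = 0.375, and a monomorphic locus gives 0.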