Rui Zhang, Fa-Guo Wang, Jiao Zhang, Hui Shang, Li Liu, Hao Wang, Guo-Hua Zhao, Hui Shen, Yue-Hong Yan.
Abstract
Whole-genome duplications (WGDs) are widespread in plants and frequently coincide with global climatic change events, such as the Cretaceous-Tertiary (KT) extinction event approximately 65 million years ago (mya). Ferns have larger genomes and higher chromosome numbers than seed plants, which likely resulted from multiple rounds of polyploidy. Here, we use diploid and triploid material from a model fern species, Ceratopteris thalictroides, to detect WGDs. High-quality RNA-seq data were used to infer the number of synonymous substitutions per synonymous site (Ks) between paralogs; the Ks age distribution and an absolute dating approach were used to determine the age of WGD events. Evidence of an ancient WGD event with a Ks peak value of approximately 1.2 was obtained for both samples; however, the Ks frequency distributions varied significantly between them. Importantly, we dated the WGD event at 51-53 mya, which coincides with the Paleocene-Eocene Thermal Maximum (PETM), when the Earth became warmer and wetter than in any other period during the Cenozoic. Duplicate genes were preferentially retained for specific functions, such as environmental response, further supporting the view that the duplicates may have promoted rapid adaptation to environmental changes and potentially resulted in evolutionary success, especially for pantropical species such as C. thalictroides, which exhibits higher temperature tolerance.
Keywords: Ceratopteris thalictroides; evolution; synonymous substitutions; transcriptome; whole genome duplication
Year: 2019 PMID: 31010109 PMCID: PMC6515051 DOI: 10.3390/ijms20081926
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Chromosomes of C. thalictroides in mitotic root-tip cells (scale bars = 20 μm). (A) Metaphase chromosomes of the diploid, 2n = 78; (B) line drawing of panel A; (C) metaphase chromosomes of the triploid, 2n = 117; (D) line drawing of panel C.
A summary of the sequencing and assembly for diploid and triploid samples of C. thalictroides and seven other fern species.
| Species | Total Reads (Clean) | Number of Contigs | Total Number of Unigenes | N50 (bp) | Mean Length (bp) |
|---|---|---|---|---|---|
| Triploid a | 35,528,634 | 69,929 | 60,823 | 787 | 576.00 |
| Diploid b | 31,741,082 | 74,728 | 83,202 | 1610 | 912.26 |
| | 38,786,214 | 54,152 | 58,494 | 1663 | 951.92 |
| | 40,967,322 | 69,931 | 74,564 | 1557 | 859.72 |
| | 45,618,446 | 84,813 | 89,185 | 1582 | 831.56 |
| | 51,851,066 | 49,449 | 52,782 | 1727 | 1012.63 |
| | 43,422,574 | 46,189 | 50,594 | 1729 | 1043.2 |
| | 46,808,646 | 113,778 | 130,549 | 1521 | 845.96 |
| | 48,768,608 | 66,254 | 72,404 | 1580 | 904.62 |
Note: a refers to the triploid C. thalictroides [28]; b refers to the diploid C. thalictroides and the other seven fern species [29].
BUSCO results (transcriptome completeness) for diploid and triploid samples of C. thalictroides.
| Species | BUSCO Notation Assessment Results |
|---|---|
| Diploid | C: 63.7% [S:39.7%, D:24%], F:5.9%, M: 30.4%, n: 1440 |
| Triploid | C: 34.3% [S:30.5%, D:3.8%], F:12.2%, M: 53.5%, n: 1440 |
BUSCO was used to assess the quality of the transcriptome data, with 1440 conserved plant orthologs as the reference. Abbreviations: C, complete BUSCOs; S, complete and single-copy BUSCOs; D, complete and duplicated BUSCOs; F, fragmented BUSCOs; M, missing BUSCOs; n, total BUSCO groups searched.
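The BUSCO notation in the table above follows two simple bookkeeping rules: the complete fraction is the sum of the single-copy and duplicated fractions (C = S + D), and C, F, and M together account for all groups searched. The following is a minimal sketch that parses the two summary strings from the table and checks those rules; the function names `parse_busco` and `is_consistent` and the rounding tolerance are illustrative assumptions, not part of BUSCO itself.

```python
import re

# Summary strings copied from the table above.
DIPLOID = "C: 63.7% [S:39.7%, D:24%], F:5.9%, M: 30.4%, n: 1440"
TRIPLOID = "C: 34.3% [S:30.5%, D:3.8%], F:12.2%, M: 53.5%, n: 1440"


def parse_busco(summary: str) -> dict:
    """Parse a BUSCO summary string into {key: value} (hypothetical helper)."""
    fields = {}
    for key, value in re.findall(r"([CSDFMn])\s*:\s*([\d.]+)", summary):
        # n is a count of BUSCO groups; everything else is a percentage.
        fields[key] = int(value) if key == "n" else float(value)
    return fields


def is_consistent(fields: dict, tol: float = 0.11) -> bool:
    """Check C = S + D and C + F + M = 100, allowing for rounding to 0.1%."""
    complete_ok = abs(fields["C"] - (fields["S"] + fields["D"])) <= tol
    total_ok = abs(fields["C"] + fields["F"] + fields["M"] - 100.0) <= tol
    return complete_ok and total_ok


if __name__ == "__main__":
    for label, summary in (("Diploid", DIPLOID), ("Triploid", TRIPLOID)):
        fields = parse_busco(summary)
        print(label, fields, "consistent:", is_consistent(fields))
```

Both rows pass these checks, and the parsed values make the contrast explicit: the diploid sample has a much larger duplicated fraction (D) and a much smaller missing fraction (M) than the triploid sample.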
Figure 2. Frequency distributions of Ks values based on paralogous pairs of C. thalictroides. (A,C) Distributions of Ks values below 5 for paralogous pairs of the diploid and triploid, respectively. The x-axis represents the synonymous substitutions per synonymous site (Ks), with a cutoff of 5 and bins of 0.1, and the y-axis shows the number of retained duplicated paralogous gene pairs. (B,D) Mclust Gaussian mixture model analysis of (A,C), respectively, with the optimal number of log-normal components overlaid on the Ks distributions. The red line shows the sum of the components.
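The binning described in the Figure 2 caption (Ks cutoff of 5, bins of 0.1) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `ks_histogram` is hypothetical, the example Ks values are made up for demonstration and are not data from the study, and pairs with Ks at or above the cutoff are simply discarded.

```python
def ks_histogram(ks_values, cutoff=5.0, width=0.1):
    """Count paralogous pairs per Ks bin (hypothetical helper).

    Bin i covers the half-open interval [i * width, (i + 1) * width).
    Values that are negative or >= cutoff are dropped.
    """
    n_bins = round(cutoff / width)
    counts = [0] * n_bins
    for ks in ks_values:
        if 0.0 <= ks < cutoff:
            # Clamp the index to guard against floating-point edge cases.
            index = min(int(ks / width), n_bins - 1)
            counts[index] += 1
    return counts


if __name__ == "__main__":
    # Made-up Ks values, for illustration only.
    example = [0.05, 0.12, 0.15, 1.23, 4.99, 5.5]
    counts = ks_histogram(example)
    for i, count in enumerate(counts):
        if count:
            print(f"{i * 0.1:.1f}-{(i + 1) * 0.1:.1f}: {count}")
```

A WGD event shows up in such a histogram as a local peak in the counts, which is what the mixture model in panels (B,D) is then fitted to.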
Mixture modeling of the age distribution of C. thalictroides presented in Figure 2.
| No. of Duplicates | No. of Components | Bayesian Information Criterion | Mixture Mean | Variance | Proportion |
|---|---|---|---|---|---|
| Diploid | | | | | |
| 8364 | 7 | 5998.011 | 0.128 | 0.0004 | 0.091 |
| 8364 | 7 | 5998.011 | 0.238 | 0.0048 | 0.074 |
| 8364 | 7 | 5998.011 | 0.499 | 0.0248 | 0.079 |
| 8364 | 7 | 5998.011 | 1.148 | 0.0461 | 0.278 |
| 8364 | 7 | 5998.011 | 1.526 | 0.1619 | 0.265 |
| 8364 | 7 | 5998.011 | 2.989 | 0.6015 | 0.176 |
| 8364 | 7 | 5998.011 | 4.561 | 0.0790 | 0.036 |
| Triploid | | | | | |
| 3088 | 5 | 3380.834 | 0.154 | 0.0016 | 0.075 |
| 3088 | 5 | 3380.834 | 0.338 | 0.0135 | 0.066 |
| 3088 | 5 | 3380.834 | 1.199 | 0.1591 | 0.464 |
| 3088 | 5 | 3380.834 | 2.268 | 0.5133 | 0.282 |
| 3088 | 5 | 3380.834 | 4.019 | 0.2962 | 0.112 |
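To make the table above concrete, the fitted curve drawn as the sum of components in Figure 2 can be approximated by a weighted sum of component densities. The sketch below uses the means, variances, and proportions from the table and assumes, for illustration only, that each component is a plain Gaussian on the Ks scale; the original analysis used mclust, and the helper names here are hypothetical.

```python
import math

# (mean, variance, proportion) for each component, copied from the table above.
DIPLOID = [
    (0.128, 0.0004, 0.091), (0.238, 0.0048, 0.074), (0.499, 0.0248, 0.079),
    (1.148, 0.0461, 0.278), (1.526, 0.1619, 0.265), (2.989, 0.6015, 0.176),
    (4.561, 0.0790, 0.036),
]
TRIPLOID = [
    (0.154, 0.0016, 0.075), (0.338, 0.0135, 0.066), (1.199, 0.1591, 0.464),
    (2.268, 0.5133, 0.282), (4.019, 0.2962, 0.112),
]


def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def mixture_density(x, components):
    """Weighted sum of component densities (assumed Gaussian on the Ks scale)."""
    return sum(p * normal_pdf(x, m, v) for m, v, p in components)


def dominant_component(components):
    """Return the (mean, variance, proportion) tuple with the largest proportion."""
    return max(components, key=lambda component: component[2])


if __name__ == "__main__":
    for label, components in (("Diploid", DIPLOID), ("Triploid", TRIPLOID)):
        mean, _, proportion = dominant_component(components)
        print(f"{label}: dominant component at Ks = {mean} (proportion {proportion})")
```

Under this reading, the component with the largest proportion sits at Ks = 1.148 in the diploid and Ks = 1.199 in the triploid, consistent with the Ks peak of approximately 1.2 reported in the abstract.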
Figure 3. Absolute age distribution for the peak-based duplicates of C. thalictroides compared with global climate changes during the Cenozoic. The vertical gray solid line marks the peak, taken as the WGD age estimate, and the vertical gray dashed lines correspond to the 95% confidence interval on that estimate. The parameters of the statistically significant component identified using mclust were 52 mya and 0.76, which represent the inferred date and proportion, respectively. The global climate curve at the top comes from Zachos et al. (2001), with permission from AAAS [30].
Figure 4. Phylogenetic tree of type II MADS-box proteins inferred using MrBayes 3. The different clades are indicated by different colors. C. thalictroides gene names are given in red, except for the four duplicates retained following WGD, which are shown in blue. Posterior probabilities are indicated on the branches. The plant species included are as follows: Arabidopsis, Oryza sativa (Os), Selaginella moellendorffii (Sm), Physcomitrella patens (Pp), Chara globularis (Cg), and C. thalictroides (Ct).