Bo Song1,2,3, David Morse4, Yue Song3,5, Yuan Fu3,5, Xin Lin6, Wenliang Wang3,5, Shifeng Cheng3,5, Wenbin Chen3,5, Xin Liu3,5, Senjie Lin6,7.
Abstract
Gene retroposition is an important mechanism of genome evolution, but the role it plays in dinoflagellates, critical players in marine ecosystems, is not known. Until the recent release of the genomes of two coral-symbiotic dinoflagellates, Symbiodinium kawagutii and S. minutum, it had not been possible to study these retrogenes systematically. Here we examine the abundant retrogenes (∼23% of the total genes) in these species. The hallmark of retrogenes in the genome is the presence of DCCGTAGCCATTTTGGCTCAAG, a spliced leader (DinoSL) constitutively trans-spliced to the 5'-end of all nucleus-encoded mRNAs. Although the retrogenes have often lost part of the 22-nt DinoSL, the putative promoter motif from the DinoSL, TTT(T/G), is consistently retained in the upstream region of these genes, providing an explanation for the high survival rate of retrogenes in dinoflagellates. Our analysis of DinoSL sequence divergence revealed two major bursts of retroposition in the evolutionary history of Symbiodinium, occurring at ∼60 and ∼6 Ma. Reconstruction of the evolutionary trajectory of the Symbiodinium genomes mapped these two times to the origin and rapid radiation of this dinoflagellate lineage, respectively. GO analysis revealed differential functional enrichment of the retrogenes between the two episodes, with a broad impact on transport in the first bout and a more localized influence on symbiosis-related processes such as cell adhesion in the second bout. This study provides the first evidence of large-scale retroposition as a major mechanism of genome evolution for any organism and sheds light on the evolution of coral symbiosis.
Keywords: Symbiodinium; dinoflagellate; genome evolution; retrogene; spliced leader
Year: 2017 PMID: 28903461 PMCID: PMC5585692 DOI: 10.1093/gbe/evx144
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Number of Symbiodinium Genes Harboring Different Numbers of DinoSL Relicts
| Number of DinoSL Relicts | S. kawagutii | S. minutum |
|---|---|---|
| 1 | 7,452 | 8,327 |
| 2 | 1,009 | 921 |
| 3 | 91 | 82 |
| 4 | 7 | 6 |
| 5 | 1 | 2 |
| 6 | 3 | 0 |
| 7 | 1 | 0 |
| 8 | 0 | 1 |
| Total | 8,564 | 9,339 |
Fig. 1.—Retroposition in Symbiodinium genome sequences retaining DinoSL (CCGTAGCCATTTTGGCTCAAG) motifs. (A) Alignment of a retrogene (Skav200717) and its parent (Skav216614), both of which have undergone repeated retroposition events. Red-colored sequences show DinoSL relicts. (B) Distance between retrogenes and their parents in S. kawagutii (purple line) and S. minutum (cyan line). (C) The DinoSL motifs “TTTT,” “TTTG,” and “TTGG” are preferentially retained in all the detected DinoSL relicts in both S. kawagutii (purple bars) and S. minutum (cyan bars). (D) Identities of DinoSL relicts of S. kawagutii (purple line) and S. minutum (cyan line) as a function of upstream distance from the start codon. Higher identities of DinoSLs are seen between −50 and −100 bp, where the promoter is expected to be located.
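The motif search described in the abstract and in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline: it only looks for an exact full-length DinoSL (expanding the IUPAC code D to A, G, or T) and for the putative core promoter motif TTT(T/G), and it reports positions relative to the start codon. The function name `scan_upstream` and the example sequence are made up for illustration.

```python
import re

# 22-nt DinoSL as given in the abstract; "D" is the IUPAC code for A, G, or T.
DINOSL = "DCCGTAGCCATTTTGGCTCAAG"
DINOSL_RE = re.compile(DINOSL.replace("D", "[AGT]"))

# Putative core promoter motif TTT(T/G); the lookahead reports overlapping hits.
CORE_MOTIF_RE = re.compile(r"(?=(TTT[TG]))")


def scan_upstream(upstream):
    """Locate DinoSL hits in the sequence immediately 5' of a start codon.

    Offsets are negative distances from the start codon, so the last base
    of `upstream` sits at position -1. (Hypothetical helper, exact matches only.)
    """
    seq = upstream.upper()
    n = len(seq)
    full = [m.start() - n for m in DINOSL_RE.finditer(seq)]
    core = [(m.start() - n, m.group(1)) for m in CORE_MOTIF_RE.finditer(seq)]
    return {"full_dinosl": full, "core_motifs": core}


# Made-up upstream sequence containing one intact DinoSL copy.
example = "ACGT" + "TCCGTAGCCATTTTGGCTCAAG" + "GACCGACC"
hits = scan_upstream(example)
print(hits["full_dinosl"])   # start offsets of intact DinoSL copies
print(hits["core_motifs"])   # (offset, motif) pairs for TTT(T/G)
```

Real relicts are often truncated or mutated, so an actual analysis would need an alignment-based or mismatch-tolerant search rather than the exact regular expressions used here.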
Feature Comparison Between Retroposed and Normal Genes in Symbiodinium Genomes
| Feature | S. kawagutii: Retrogene (Mean) | S. kawagutii: Normal Gene (Mean) | S. kawagutii: P Value | S. kawagutii: Cohen’s d | S. minutum: Retrogene (Mean) | S. minutum: Normal Gene (Mean) | S. minutum: P Value | S. minutum: Cohen’s d |
|---|---|---|---|---|---|---|---|---|
| No. of Introns | 3.74 | 4.17 | 2.20E-16 | 0.11 | 16.74 | 17.1 | 0.06 | 0.02 |
| Length of CDS | 1108.56 | 1021.17 | 4.68E-10 | 0.08 | 1705.54 | 1642.87 | 3.33E-03 | 0.03 |
| GC of CDS | 49.67 | 48.92 | 2.20E-16 | 0.18 | 45.2 | 44.83 | 2.20E-16 | 0.11 |
| Complexity of CDS | 1.45 | 1.44 | 2.20E-16 | 0.18 | 1.45 | 1.44 | 6.91E-15 | 0.11 |
| RPKM | 0.51 | 0.36 | 2.20E-16 | 0.2 | 0.03 | 0.03 | 3.08E-10 | 0.08 |
| Solubility | 0.51 | 0.51 | 0.97 | 0 | 0.51 | 0.51 | 0.02 | 0.03 |
P values become misleadingly low when the sample size is sufficiently large, giving a false indication of significance. Therefore, we also calculated the effect size measure Cohen’s d to determine the difference between groups. A Cohen’s d of 0.2–0.5 indicates a small difference, 0.5–0.8 a medium difference, and >0.8 a large difference (Sullivan and Feinn 2012).
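As a rough illustration of the effect-size measure in this footnote, the sketch below computes Cohen's d as the difference in group means divided by the pooled standard deviation. The pooled-SD form is an assumption, since the source does not say which variant was used. The label for values below 0.2 is also an addition, because the footnote defines no category for them, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled standard deviation (assumed variant)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def effect_label(d):
    """Map |d| onto the footnote's bands: 0.2-0.5 small, 0.5-0.8 medium, >0.8 large."""
    size = abs(d)
    if size > 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"  # label added here; the footnote gives no name for d < 0.2


# Made-up intron counts for two small groups of genes.
retro = [1, 2, 3, 4, 5]
normal = [2, 3, 4, 5, 6]
d = cohens_d(retro, normal)
print(round(d, 3), effect_label(d))
```

By these bands, every Cohen's d in the table is at or below the lower edge of "small" (the largest is 0.2, for RPKM in the first group), even where the P value is 2.20E-16.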
Fig. 2.—Two major retroposition events revealed by Ks frequencies of retrogenes in (A) Symbiodinium kawagutii (Ks = 3.02 and 0.47) and (B) S. minutum (Ks = 3.17 and 0.18). (C) Ks frequencies in a comparison of S. kawagutii versus S. minutum (Ks = 1.65). (D) Phylogenetic tree and age of different Symbiodinium strains (adapted from Pochon et al. 2006) named by their coral hosts. The empty box at each node represents ± one standard deviation around the divergence age.
Number of Retrogenes in the Two Major Retroposition Bouts and the Correlation Between the Number of Family Members and Their Current Expression Levels
| Bout | S. kawagutii: No. of Genes (Family) | S. kawagutii: Rho (P Value) | S. minutum: No. of Genes (Family) | S. minutum: Rho (P Value) |
|---|---|---|---|---|
| 1 | 195 (88) | −0.06 (0.63) | 214 (145) | 0.11 (0.14) |
| 2 | 648 (361) | 0.24 (0.0001) | 385 (202) | 0.30 (0.0005) |
Number of genes that can be accurately assigned into these two events based on Ks values; numbers in parentheses depict the number of families these genes were clustered into.
Rho, the Spearman’s rank correlation coefficient between the copy number of retrogenes and their expression levels (RPKM) as recently measured for genes in each family (Lin et al. 2015).
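For readers unfamiliar with the statistic, Spearman's rho is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below shows that calculation using only the standard library. The function names and the copy-number and RPKM values are invented for illustration, and the source does not state which software the authors used.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-family values: retrogene copy number and expression (RPKM).
copy_number = [2, 3, 3, 5, 8]
rpkm = [0.1, 0.4, 0.2, 0.9, 0.7]
print(round(spearman_rho(copy_number, rpkm), 3))
```

The P values shown in parentheses in the table would need a separate significance test, which this sketch does not attempt.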
Fig. 3.—Differential enrichment of sequences in biological processes during the first and second bout of retroposition. The enrichment P value is shown for GO processes at (A) the first or (B) the second episode only, or (C) for GO processes shared by both. Color scale represents the −log10 value of P values, with purple cells indicating higher enrichment. More details are shown in supplementary table S4, Supplementary Material online.
Fig. 4.—The retroposition process in dinoflagellates. (A) Gene retroposition process inferred from the characteristics of retrogenes. A gene transcript (step 1) is trans-spliced, which adds the DinoSL sequence to the 5′ terminus of the nascent pre-mRNA (step 2), followed by cis-splicing (step 3) before the transcript is exported to the cytoplasm for translation (step 4). Reverse transcription and genome integration (step 5) can occur at any of these steps inside the nucleus (steps 1, 2, 3, and 4). In dinoflagellates, the DinoSL sequence harboring the core promoter motifs “TTTT” and “TTTG” potentially serves as a promoter enabling the survival of retrogenes; therefore, most retrocopies stemming from nascent transcripts (step 1, before the DinoSL is added) would be dead upon birth for lack of a promoter. The whole procedure is confined to the nucleus. (B) A scheme showing the self-enforcing model enabling the fixation of the increased gene activities.