Meng-Yun Chen, Dan Liang, Peng Zhang.
Abstract
The interordinal relationships of Laurasiatherian mammals are currently one of the most controversial questions in mammalian phylogenetics. Previous studies mainly relied on coding sequences (CDS) and seldom used noncoding sequences. Here, by data mining public genome data, we compiled an intron data set of 3,638 genes (all introns from a protein-coding gene are considered as a gene) (19,055,073 bp) and a CDS data set of 10,259 genes (20,994,285 bp), covering all major lineages of Laurasiatheria (except Pholidota). We found that the intron data contained stronger and more congruent phylogenetic signals than the CDS data. In agreement with this observation, concatenation and species-tree analyses of the intron data set yielded well-resolved and identical phylogenies, whereas the CDS data set produced weakly supported and incongruent results. Further analyses showed that the phylogeny inferred from the intron data is highly robust to data subsampling and change in outgroup, but the CDS data produced unstable results under the same conditions. Interestingly, gene tree statistical results showed that the most frequently observed gene tree topologies for the CDS and intron data are identical, suggesting that the major phylogenetic signal within the CDS data is actually congruent with that within the intron data. Our final result of Laurasiatheria phylogeny is (Eulipotyphla,((Chiroptera, Perissodactyla),(Carnivora, Cetartiodactyla))), favoring a close relationship between Chiroptera and Perissodactyla. Our study 1) provides a well-supported phylogenetic framework for Laurasiatheria, representing a step towards ending the long-standing "hard" polytomy and 2) argues that intron within genome data is a promising data resource for resolving rapid radiation events across the tree of life.
Keywords: Laurasiatheria; data subsampling; intron; noncoding; phylogenomics; phylogeny
Year: 2017 PMID: 28830116 PMCID: PMC5737624 DOI: 10.1093/gbe/evx147
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Brief Information of All Data Sets Used for Phylogenetic Inference in This Study
| Data Set Names | No. of Species | No. of Genes | Alignment Length (bp) | Parsimony Informative Site | Missing Data (%) | Criteria of Gene Selection | Inferred Topologies |
|---|---|---|---|---|---|---|---|
| Total10259CDS | 22 | 10,259 | 20,994,285 | 5,379,346 | 10.6 | All genes of the CDS data set | |
| GC_CDS1 | 22 | 7,697 | 16,239,495 | 4,192,858 | 8.9 | Average GC% of the third codon position <73.1% | |
| GC_CDS2 | 22 | 5,130 | 11,216,196 | 2,850,021 | 7.8 | Average GC% of the third codon position <60.4% | |
| GC_CDS3 | 22 | 2,850 | 6,460,929 | 1,583,067 | 7.1 | Average GC% of the third codon position <48% | |
| Rate_CDS1 | 22 | 5,127 | 10,101,696 | 2,192,851 | 10.9 | Evolutionary rate smaller than the median rate of all genes | |
| Rate_CDS2 | 22 | 5,132 | 10,892,589 | 3,186,495 | 10.3 | Evolutionary rate larger than the median rate of all genes | |
| Resolution_CDS1 | 22 | 5,698 | 15,118,602 | 4,084,517 | 10.5 | Average gene tree bootstrap 70 or more | |
| Resolution_CDS2 | 22 | 2,702 | 9,258,162 | 2,593,379 | 10.5 | Average gene tree bootstrap 80 or more | |
| Resolution_CDS3 | 22 | 450 | 2,524,608 | 745,109 | 10.8 | Average gene tree bootstrap 90 or more | |
| Completeness_CDS1 | 22 | 9,870 | 20,209,332 | 5,177,162 | 9.6 | Amount of missing data <30% | |
| Completeness_CDS2 | 22 | 8,808 | 18,088,938 | 4,609,905 | 7.9 | Amount of missing data <20% | |
| Completeness_CDS3 | 22 | 6,029 | 12,686,955 | 3,196,588 | 4.7 | Amount of missing data <10% | |
| X+E_out_CDS | 19 | 10,259 | 20,994,285 | 4,653,797 | 10.7 | All genes of the CDS data set but remove all Afrotheria sequences | |
| A+E_out_CDS | 20 | 10,259 | 20,994,285 | 5,061,736 | 9.6 | All genes of the CDS data set but remove all Xenarthra sequences | |
| A+X_out_CDS | 20 | 10,259 | 20,994,285 | 4,761,727 | 11.6 | All genes of the CDS data set but remove all Euarchontoglires sequences | |
| Total3638Intron | 22 | 3,638 | 19,055,073 | 6,628,387 | 43.2 | All genes of the Intron data set | |
| GC_Intron1 | 22 | 3,445 | 18,630,433 | 6,469,836 | 43.1 | Average GC content <56% | |
| GC_Intron2 | 22 | 3,125 | 17,631,768 | 6,121,790 | 42.7 | Average GC content <50% | |
| GC_Intron3 | 22 | 2,728 | 16,117,169 | 5,583,015 | 42.3 | Average GC content <44.6% | |
| Rate_Intron1 | 22 | 1,817 | 7,915,005 | 2,474,048 | 43.6 | Evolutionary rate smaller than the median rate of all genes | |
| Rate_Intron2 | 22 | 1,821 | 11,140,068 | 4,154,339 | 42.9 | Evolutionary rate larger than the median rate of all genes | |
| Resolution_Intron1 | 22 | 3,279 | 18,080,775 | 6,339,204 | 42.7 | Average gene tree bootstrap 70 or more | |
| Resolution_Intron2 | 22 | 2,537 | 15,031,270 | 5,320,830 | 42.0 | Average gene tree bootstrap 80 or more | |
| Resolution_Intron3 | 22 | 1,068 | 7,225,682 | 2,579,450 | 41.1 | Average gene tree bootstrap 90 or more | |
| Completeness_Intron1 | 22 | 2,521 | 14,575,834 | 5,456,797 | 38.8 | Amount of missing data <50% | |
| Completeness_Intron2 | 22 | 1,313 | 7,976,863 | 3,209,477 | 33.4 | Amount of missing data <40% | |
| Completeness_Intron3 | 22 | 404 | 2,076,439 | 914,069 | 25.9 | Amount of missing data <30% | |
| X+E_out_Intron | 19 | 3,638 | 19,055,073 | 5,811,657 | 39.5 | All genes of the Intron data set but remove all Afrotheria sequences | |
| A+E_out_Intron | 20 | 3,638 | 19,055,073 | 6,037,174 | 41.7 | All genes of the Intron data set but remove all Xenarthra sequences | |
| A+X_out_Intron | 20 | 3,638 | 19,055,073 | 5,841,403 | 43.3 | All genes of the Intron data set but remove all Euarchontoglires sequences | |
Inferred topologies corresponding to those reported in figure 4.
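The gene-selection criteria in the table can be read as simple per-gene filters. The following is a minimal sketch of the 11 CDS subsampling rules, assuming each gene has already been summarized as a record of statistics. The thresholds are copied from the table; the function name `subsample_cds` and the record fields (`id`, `gc3`, `rate`, `bootstrap`, `missing`) are hypothetical and are not taken from the study's own pipeline.

```python
from statistics import median

def subsample_cds(genes):
    """Return {subset name: [gene ids]} for the 11 CDS subsampling rules.

    genes: list of dicts with hypothetical keys
      id        gene identifier
      gc3       average GC% of the third codon position
      rate      relative evolutionary rate of the gene
      bootstrap average gene tree bootstrap support
      missing   amount of missing data in percent
    """
    med = median(g["rate"] for g in genes)
    rules = {
        "GC_CDS1": lambda g: g["gc3"] < 73.1,
        "GC_CDS2": lambda g: g["gc3"] < 60.4,
        "GC_CDS3": lambda g: g["gc3"] < 48.0,
        # Strict reading of the table: a gene exactly at the median
        # falls into neither rate subset.
        "Rate_CDS1": lambda g: g["rate"] < med,
        "Rate_CDS2": lambda g: g["rate"] > med,
        "Resolution_CDS1": lambda g: g["bootstrap"] >= 70,
        "Resolution_CDS2": lambda g: g["bootstrap"] >= 80,
        "Resolution_CDS3": lambda g: g["bootstrap"] >= 90,
        "Completeness_CDS1": lambda g: g["missing"] < 30,
        "Completeness_CDS2": lambda g: g["missing"] < 20,
        "Completeness_CDS3": lambda g: g["missing"] < 10,
    }
    return {name: [g["id"] for g in genes if keep(g)]
            for name, keep in rules.items()}
```

The intron subsets follow the same pattern with the intron thresholds from the table (for example, average GC content <56%, <50%, <44.6% and missing data <50%, <40%, <30%).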
Fig. 1.—Characteristics of CDS and intron data sets (blue = CDS, red = intron). Boxplots show (A) variation in gene length, (B) GC content of each gene (among genes) and each species (among species), (C) relative evolutionary rates of loci (measured by the average pairwise distance for each gene), and (D) average bootstrap support values across all estimated gene trees. (E) Visualization of ML tree space using multidimensional scaling plot of 10,259 ML gene-trees from the CDS data set; each dot represents a tree inferred from one gene. Distances between dots represent Robinson–Foulds distances between gene trees. (F) Multidimensional scaling plot of 3,638 ML gene-trees from the intron data set. (G) Histogram of the average RF distance for a gene relative to all other genes, summarized from the CDS data set and the intron data set.
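Panels (E) through (G) rest on Robinson–Foulds (RF) distances between gene trees. As an illustration only, the sketch below computes the RF distance as the number of nontrivial bipartitions found in one tree but not the other, and then the average RF distance of each gene tree to all others, as summarized in panel (G). It assumes trees are given as nested tuples of taxon names over the same taxon set; that representation and the function names are hypothetical, and the multidimensional scaling step is not shown.

```python
def leaves(tree):
    """Set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def bipartitions(tree):
    """Nontrivial splits of the tree, treated as unrooted.

    Each split is stored as the side that does NOT contain a fixed
    reference taxon, so the same split always has the same key.
    """
    all_taxa = leaves(tree)
    ref = min(all_taxa)
    splits = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = clade if ref not in clade else all_taxa - clade
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(all_taxa) - 1:
            splits.add(side)
        return clade

    walk(tree)
    return splits

def rf_distance(tree_a, tree_b):
    """Unnormalized RF distance: size of the symmetric difference of splits."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))

def mean_rf(trees):
    """Average RF distance of each tree to all other trees in the list."""
    n = len(trees)
    return [
        sum(rf_distance(trees[i], trees[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
```

A pairwise matrix of `rf_distance` values is the kind of input a multidimensional scaling routine would take to produce plots like (E) and (F).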
Fig. 4.—Phylogenetic inference robustness for the CDS and Intron data sets, which were resampled into 11 data subsets under different data subsampling criteria (see Materials and Methods for details). These data subsets were analyzed with both concatenated and species-tree inferences. (A) There are in total seven topologies found in these phylogenetic analyses (each color represents a specific tree topology). Bootstrap support for certain topologies from different data subsets are shown in charts (B) through (E).
Fig. 2.—Phylogenetic relationships of Laurasiatheria inferred from the CDS data set (10,259 genes; 20,994,285 sites). Phylogeny was inferred by concatenation ML and species tree analysis using the ASTRAL program. The ML phylogeny is shown on the left, and the ASTRAL species tree is shown on the right (outgroup not shown). Values next to branches are bootstrap values. Branches without support values all received a bootstrap value of 100%.
Fig. 3.—Phylogenetic relationships of Laurasiatheria inferred from the Intron data set (3,638 genes; 19,055,073 sites). Phylogeny was inferred by concatenation ML and species tree analysis using the ASTRAL program. Both analyses produced identical phylogenies for the interordinal relationships of Laurasiatherian mammals. All branches have a bootstrap value of 100% in both analyses. Branch lengths are from the ML analysis.
Fig. 5.—Effect of outgroup choices on phylogenetic inferences of the CDS data set (A) and the Intron data set (B). There are three outgroup combination schemes: “X + E” refers to the use of Xenarthra and Euarchontoglires as outgroup; “A + E” refers to the use of Afrotheria and Euarchontoglires as outgroup; “A + X” refers to the use of Afrotheria and Xenarthra as outgroup. Note that the change in outgroup has no effect on the phylogenetic inference of the Intron data set but can influence the phylogenetic inference of the CDS data set. Chiro: Chiroptera, Cetartio: Cetartiodactyla, Perisso: Perissodactyla, Carni: Carnivora.
Fig. 6.—Gene-support frequency statistics for the 15 alternative hypotheses (H1–H15) regarding the interrelationships of Chiroptera, Perissodactyla, Carnivora, and Cetartiodactyla. Intron data are displayed in blue, and CDS data are displayed in green. Genes whose gene trees do not support any of the 15 alternative hypotheses are considered “nonmatching.” Gene-tree statistics are based on “matching” genes only. The histograms on the left show the proportion of gene trees that support a given hypothesis.
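The count of 15 hypotheses matches the number of rooted binary topologies on four taxa, (2·4 − 3)!! = 5 · 3 · 1 = 15. The sketch below enumerates them for the four orders, assuming that H1–H15 correspond to exactly these rooted topologies; the caption does not state this, and the function names and the ordering of the output are hypothetical and do not reproduce the H1–H15 numbering.

```python
from itertools import combinations

def rooted_topologies(taxa):
    """All rooted binary topologies on the given taxa, as nested tuples."""
    taxa = sorted(taxa)
    if len(taxa) == 1:
        return [taxa[0]]
    first, rest = taxa[0], taxa[1:]
    trees = []
    # The root splits the taxa in two; fix `first` on the left side so
    # that each unordered split is generated exactly once.
    for k in range(len(rest)):
        for companions in combinations(rest, k):
            left = [first, *companions]
            right = [t for t in rest if t not in companions]
            for left_tree in rooted_topologies(left):
                for right_tree in rooted_topologies(right):
                    trees.append((left_tree, right_tree))
    return trees

def to_newick(tree):
    """Parenthesized string for a nested-tuple tree (no branch lengths)."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"

ORDERS = ["Chiroptera", "Perissodactyla", "Carnivora", "Cetartiodactyla"]
HYPOTHESES = [to_newick(t) for t in rooted_topologies(ORDERS)]
```

Under this reading, the topology favored in the abstract, ((Chiroptera, Perissodactyla),(Carnivora, Cetartiodactyla)), is one of the three balanced trees in the list; the other twelve are ladder-shaped.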