Yuttapong Thawornwattana1,2, Daniel Dalquen1, Ziheng Yang1,3.
Abstract
Deep coalescence and introgression make it challenging to infer phylogenetic relationships among closely related species that arose through radiative speciation events. Despite numerous phylogenetic analyses and the availability of whole genomes, the phylogeny of the Anopheles gambiae species complex has not been confidently resolved. Here we extract over 80,000 coding and noncoding short segments (called loci) from the genomes of six members of the species complex and infer the species tree using a Bayesian method under the multispecies coalescent model, which accounts for genealogical heterogeneity across the genome and uncertainty in the gene trees. We obtained a robust estimate of the species tree from the distal region of the X chromosome: (A. merus, ((A. melas, (A. arabiensis, A. quadriannulatus)), (A. gambiae, A. coluzzii))), with A. merus as the earliest-branching species. This species tree agrees with the chromosome inversion phylogeny and provides a parsimonious interpretation of inversion and introgression events. Simulations informed by the real data suggest that the coalescent approach is reliable, while the sliding-window analysis used in a previous phylogenomic study generates artifactual species trees. A likelihood ratio test of gene flow revealed strong evidence of autosomal introgression from A. arabiensis into A. gambiae (at an average rate of ∼0.2 migrants per generation), but not in the opposite direction, and of introgression of the 3L chromosomal region from A. merus into A. quadriannulatus. Our results highlight the importance of accommodating incomplete lineage sorting and introgression in phylogenomic analyses of species that arose through recent radiative speciation events.
Year: 2018 PMID: 30102363 PMCID: PMC6188554 DOI: 10.1093/molbev/msy158
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Posterior probabilities of species trees inferred using bpp for 100-locus blocks of (A) noncoding and (B) coding loci. The y-axis scales from 0 to 1. The x-axis provides approximate chromosomal coordinates of blocks, where the position for each block was taken to be the average of the starting positions in AgamP3 coordinates over all loci within the block.
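The block coordinate described in the caption above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `block_positions` and its input (a list of locus start positions already ordered along the chromosome) are assumptions, and only the block size of 100 loci and the averaging rule come from the caption.

```python
from statistics import mean


def block_positions(locus_starts, block_size=100):
    """Return one approximate chromosomal coordinate per block of loci.

    locus_starts: start positions of the loci, assumed to be sorted
        along the chromosome (hypothetical input format).
    block_size: number of loci per block (100 in the caption).

    Each block's position is the mean of the start positions of its loci.
    A trailing partial block is kept; the caption does not say how the
    original analysis handled it, so this is an assumption.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        mean(locus_starts[i:i + block_size])
        for i in range(0, len(locus_starts), block_size)
    ]
```

Averaging start positions gives a single x-coordinate per block, which is why the plotted coordinates are only approximate: loci within a block can span a wide physical interval.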
Fig. 2. Trees ii and xi with posterior estimates of population sizes (θs, numbers on the branches) and species divergence times (τs, the bottom horizontal axis; bars represent 95% HPD intervals) from bpp. Parameters for tree ii were estimated from all loci in chromosome 2L excluding the 2La region, while those for tree xi were estimated from all loci in the Xag region of chromosome X. Divergence times were calculated assuming the mutation rate per site per generation for autosomal noncoding loci (A), with 11 generations per year, and 0.524 and 0.323 times (supplementary fig. S4, Supplementary Material online) as large for the coding autosomes (C) and coding Xag loci (D), respectively. Ma, million years ago.
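The conversion from τ to absolute time implied by the caption above can be sketched as below. Under the multispecies coalescent, τ is measured in expected mutations per site, so dividing by the per-generation mutation rate gives generations, and dividing by the number of generations per year gives years. The function name `tau_to_years` and its parameter names are hypothetical. The mutation rate is a required argument because its value does not appear in the text here; only the 11 generations per year and the relative-rate multipliers (0.524 and 0.323) come from the caption.

```python
def tau_to_years(tau, mu, generations_per_year=11, rate_multiplier=1.0):
    """Convert a divergence time tau into years.

    tau: divergence time in expected mutations per site.
    mu: mutation rate per site per generation for the reference class
        of loci (must be supplied by the caller; not given in the text).
    generations_per_year: 11 in the caption.
    rate_multiplier: rate of the analysed loci relative to the reference
        class, e.g. 0.524 or 0.323 for the coding classes in the caption,
        1.0 for the reference class itself.
    """
    if mu <= 0 or generations_per_year <= 0 or rate_multiplier <= 0:
        raise ValueError("mu, generations_per_year and rate_multiplier must be positive")
    generations = tau / (mu * rate_multiplier)
    return generations / generations_per_year
```

A slower-evolving class of loci (multiplier below 1) yields a smaller τ for the same absolute age, so the multiplier rescales the estimates onto a common time axis.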
Proportions of Inferred Trees From Data Sets of 100 Loci Simulated Using Trees ii and xi (With the Minimum, Median and Maximum Support Values for the Inferred Tree in Parentheses).
| Tree | bpp | RAxML (Subset 1) | RAxML (Subset 2) |
|---|---|---|---|
| 2L data (6464 loci, 10 replicates) | | | |
| i | 0.0062 (0.43, 0.65, 1.00) | 0.4308 (0.34, 0.76, 1.00) | 0.4492 (0.33, 0.76, 1.00) |
| ii* | 0.9877 (0.47, 0.99, 1.00) | 0.5139 (0.29, 0.77, 1.00) | 0.5062 (0.31, 0.78, 1.00) |
| iii | 0.0062 (0.48, 0.53, 0.81) | 0.0385 (0.51, 0.56, 0.59) | 0.0354 (0.36, 0.59, 0.94) |
| Xag data (1825 loci, 10 replicates) | | | |
| ix | 0.1000 (0.42, 0.52, 0.78) | 0.1105 (0.36, 0.61, 0.99) | 0.1316 (0.38, 0.57, 0.96) |
| x | 0.0474 (0.41, 0.67, 1.00) | 0.4790 (0.46, 0.83, 1.00) | 0.4632 (0.36, 0.83, 1.00) |
| xi* | 0.8526 (0.38, 0.84, 1.00) | 0.4105 (0.33, 0.75, 1.00) | 0.4053 (0.38, 0.74, 1.00) |
Note.—Each data set is a block of 100 loci. For bpp the inferred tree is the MAP tree and the support value is the posterior probability, while for RAxML the inferred tree is the ML tree from the concatenated alignments and the support value is the minimum bootstrap support value for clades. RAxML also inferred other trees in a small fraction (about 1%) of 2L data sets. Trees are given in figure 1. The correct tree (indicated by *) is tree ii for 2L data and tree xi for Xag data (fig. 1).
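The entries in the table have the form "proportion (minimum, median, maximum support)". A minimal sketch of how such a summary could be tabulated from per-data-set results is given below. It is not the authors' pipeline: the function name `summarize_inferred_trees` and the input format (one `(tree_label, support)` pair per 100-locus data set) are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def summarize_inferred_trees(results):
    """Summarize inferred trees across simulated data sets.

    results: list of (tree_label, support) pairs, one per data set
        (hypothetical format). For a Bayesian analysis the support would
        be the posterior probability of the MAP tree; for an ML analysis
        it would be the minimum bootstrap support over clades, rescaled
        to the interval [0, 1].

    Returns {tree_label: (proportion, (min, median, max) of support)}.
    """
    support_by_tree = defaultdict(list)
    for tree, support in results:
        support_by_tree[tree].append(support)

    total = len(results)
    return {
        tree: (len(values) / total, (min(values), median(values), max(values)))
        for tree, values in support_by_tree.items()
    }
```

Reporting the support range alongside the proportion matters here: a method can recover an incorrect tree in many data sets and still assign it high support, which is the pattern the RAxML columns show for trees i and x.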
Fig. 3. Analysis of the GAL and RQL triplets using bpp. Left panel: posterior probabilities of species trees. Middle and right panels: posterior means of the two divergence times in the MAP species tree across different regions of the genome.
Fig. 4. Species trees A (top) and B (bottom) for the 2La region (fig. S27A and B in Fontaine et al. 2015), based on the assumed species trees xi and ix, respectively. The inversion orientations in the extant and ancestral species are given as “a”: fixed for the 2La orientation, “+”: fixed for the 2L+a orientation, and “a/+”: polymorphic for both orientations.
Fig. 5. (A and B) Nucleotide diversity and (C and D) pairwise FST statistic between A. arabiensis (A) and different 2La karyotypes of A. gambiae (G) and A. coluzzii (C), calculated from genome-wide SNP data of natural populations from Fontaine et al. (2015). The 2La region is shaded. Sample sizes are n = 23 for A. gambiae (35% 2L+a/2L+a, 22% 2L+a/2La, 43% 2La/2La), n = 11 for A. coluzzii (73% 2L+a/2L+a, 27% 2La/2La, no 2L+a/2La) and n = 12 for A. arabiensis.
Fig. 6. Estimated species phylogeny with introgression for the A. gambiae species complex. Branch lengths are based on the divergence time estimates (τs) from the Xag data (fig. 2).