Yuyu Wang, Xiaofan Zhou, Ding Yang, Antonis Rokas.
Abstract
Comparison of individual gene trees in several recent phylogenomic studies from diverse lineages has revealed a surprising amount of topological conflict or incongruence, but we still know relatively little about its distribution across the tree of life. To further our understanding of incongruence, the factors that contribute to it and how it can be ameliorated, we examined its distribution in a clade of 20 Culicidae mosquito species through the reconstruction and analysis of the phylogenetic histories of 2,007 groups of orthologous genes. Levels of incongruence were generally low, the three exceptions being the internodes concerned with the branching of Anopheles christyi, with the branching of the subgenus Anopheles as well as the already reported incongruence within the Anopheles gambiae species complex. Two of these incongruence events (A. gambiae species complex and A. christyi) are likely due to biological factors, whereas the third (subgenus Anopheles) is likely due to analytical factors. Similar to previous studies, the use of genes or internodes with high bootstrap support or internode certainty values, both of which were positively correlated with gene alignment length, substantially reduced the observed incongruence. However, the clade support values of the internodes concerned with the branching of the subgenus Anopheles as well as within the A. gambiae species complex remained very low. Based on these results, we infer that the prevalence of incongruence in Culicidae mosquitoes is generally low, that it likely stems from both analytical and biological factors, and that it can be ameliorated through the selection of genes with strong phylogenetic signal. More generally, selection of genes with strong phylogenetic signal may be a general empirical solution for reducing incongruence and increasing the robustness of inference in phylogenomic studies.
Keywords: bipartition; bootstrap support (BS); gene tree; internode certainty (IC); maximum likelihood
Year: 2015 PMID: 26608059 PMCID: PMC4700963 DOI: 10.1093/gbe/evv235
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
The Effect of Using Genes and Bipartitions with Strong Phylogenetic Signal on the Culicidae Phylogeny
| Treatment | Treatment Details | Average GSF | TC | RTC | Number of Internodes with Increased GSF | Number of Internodes with Decreased GSF | Number of Internodes with Increased IC | Number of Internodes with Decreased IC |
|---|---|---|---|---|---|---|---|---|
| Default analysis | Default analysis | 87.86 | 10.80 | 0.64 | NA | NA | NA | NA |
| Selection of genes whose ML trees have high average BS | Genes with average BS ≥ 70% (1,818 genes) | 90.07 | 11.11 | 0.65 | 5 | 0 | 5 | 0 |
| | Genes with average BS ≥ 80% (1,379 genes) | 92.14 | 11.70 | 0.69 | 11 | 0 | 10 | 0 |
| | Genes with average BS ≥ 90% (378 genes) | 95.29 | 12.77 | 0.75 | 13 | 0 | 14 | 0 |
| | Genes with average BS ≥ 95% (66 genes) | 96.43 | 13.02 | 0.77 | 15 | 0 | 14 | 1 |
| Selection of genes whose ML trees have high TC | Using only the 1,818 genes with the highest TC | 89.93 | 11.14 | 0.66 | 4 | 0 | 5 | 0 |
| | Using only the 1,379 genes with the highest TC | 92.07 | 11.68 | 0.69 | 11 | 0 | 9 | 0 |
| | Using only the 378 genes with the highest TC | 95.64 | 12.81 | 0.75 | 14 | 0 | 13 | 1 |
| | Using only the 66 genes with the highest TC | 96.21 | 12.91 | 0.76 | 15 | 0 | 14 | 1 |
| Selection of bipartitions with high BS in the ML trees of genes | Using only bipartitions that have ≥ 70% BS | NA | 12.40 | 0.73 | NA | NA | 14 | 0 |
| | Using only bipartitions that have ≥ 80% BS | NA | 12.88 | 0.76 | NA | NA | 14 | 0 |
| | Using only bipartitions that have ≥ 90% BS | NA | 13.24 | 0.78 | NA | NA | 13 | 0 |
| | Using only bipartitions that have ≥ 95% BS | NA | 13.34 | 0.78 | NA | NA | 13 | 0 |
Note.—The columns correspond to: the specific filtering of genes or bipartitions with strong phylogenetic signal tested (treatment and treatment details), the average GSF of the internodes of the Culicidae eMRC phylogeny (average GSF), the TC of the Culicidae eMRC phylogeny, the RTC of the Culicidae eMRC phylogeny, the numbers of internodes of the Culicidae eMRC phylogeny in which GSF increases or decreases by more than 3%, and the numbers of internodes of the Culicidae eMRC phylogeny in which IC increases or decreases by more than 0.03. As the maximum value of IC for a given internode is 1, the maximum value of TC for a given phylogeny is the number of internodes, which in this case is 17. In the analyses concerned with the use of bipartitions, only those bipartitions that displayed BS greater or equal to 70%, 80%, 90%, or 95% in the ML trees of the 2,007 genes were used to construct eMRC phylogenies, which were then compared with the default analysis. NA, not applicable.
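The relationship between IC, TC, and RTC described in the note can be checked with a short calculation. The sketch below is illustrative only. It assumes that TC is the sum of the per-internode IC values and that RTC is TC divided by the number of internodes (17 here), which is consistent with the stated maxima. The IC formula used, based on the two most prevalent conflicting bipartitions, is the commonly cited definition and is an assumption here, since the note does not spell it out. All function names are hypothetical.

```python
import math

def internode_certainty(f1: float, f2: float) -> float:
    """IC from the support of the two most prevalent conflicting bipartitions.

    Assumed definition: IC = log2(2) + p1*log2(p1) + p2*log2(p2),
    where p1 and p2 are the supports rescaled to sum to 1.
    """
    total = f1 + f2
    ic = 1.0  # log2(2), the maximum value for a single internode
    for f in (f1, f2):
        p = f / total
        if p > 0:  # treat 0 * log2(0) as 0
            ic += p * math.log2(p)
    return ic

def tree_certainty(ic_values):
    """TC as the sum of IC over all internodes (assumption stated above)."""
    return sum(ic_values)

def relative_tree_certainty(tc: float, n_internodes: int = 17) -> float:
    """RTC as TC normalized by the number of internodes."""
    return tc / n_internodes

# Default analysis row of the table: TC = 10.80 with 17 internodes.
print(round(relative_tree_certainty(10.80), 2))  # 0.64, matching the reported RTC
```

Under these assumptions, an internode with no conflict has IC = 1 and an internode with two equally supported alternatives has IC = 0, so a fully unconflicted 17-internode phylogeny would reach TC = 17 and RTC = 1.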
Figure: The Culicidae species phylogeny recovered from the concatenation analysis of 2,007 genes using ML. Asterisks denote internodes that received 100% BS by the concatenation analysis. The same topology is also recovered by the eMRC phylogeny as well as by the coalescent phylogeny (ASTRAL and STAR) of the 2,007 individual gene trees. Black values near internodes correspond to GSF and IC, respectively. Bold red values correspond to support values of the coalescent phylogeny based on ASTRAL and STAR, respectively. The scale bar is in units of nucleotide substitutions per site.
Figure: The distribution of the agreement between the bipartitions present in the 2,007 individual gene trees and the concatenation phylogeny, measured using the normalized Robinson–Foulds tree distance. The phylogeny of the 20 Culicidae species analyzed in this study is unrooted and contains 17 nontrivial bipartitions.
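As an illustration of the distance named in this caption, the sketch below computes a normalized Robinson–Foulds distance from two sets of nontrivial bipartitions. It assumes the usual normalization for fully resolved unrooted trees: the number of bipartitions found in only one of the two trees, divided by twice the number of nontrivial bipartitions per tree (2 × 17 = 34 for the 20-species phylogeny). The five-taxon trees and all names are hypothetical toy inputs, not data from the study.

```python
def canonical(side, taxa):
    """Represent a bipartition by the side that excludes a fixed reference taxon."""
    reference = min(taxa)
    side = frozenset(side)
    return side if reference not in side else frozenset(taxa) - side

def normalized_rf(bips_a, bips_b, n_nontrivial):
    """Symmetric difference of the bipartition sets, scaled to the range 0..1."""
    return len(bips_a ^ bips_b) / (2 * n_nontrivial)

# Toy example: an unrooted, fully resolved 5-taxon tree has 2 nontrivial bipartitions.
taxa = {"A", "B", "C", "D", "E"}
tree_1 = {canonical({"A", "B"}, taxa), canonical({"D", "E"}, taxa)}
tree_2 = {canonical({"A", "B"}, taxa), canonical({"C", "E"}, taxa)}

print(normalized_rf(tree_1, tree_2, n_nontrivial=2))  # 0.5: one of two bipartitions is shared
```

A value of 0 means the gene tree and the reference tree contain the same bipartitions, and a value of 1 means they share none.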
Figure: The Culicidae species phylogeny recovered from the concatenation analysis of the first 999 bp of sequence from each of 1,340 genes using ML. Asterisks denote internodes that received 100% BS by the concatenation analysis. The same topology is also recovered by the eMRC phylogeny as well as by the coalescent phylogeny (ASTRAL and STAR) of the 1,340 individual gene trees. Black values near internodes correspond to GSF and IC, respectively. Bold red values correspond to support values of the coalescent phylogeny based on ASTRAL and STAR, respectively. The scale bar is in units of nucleotide substitutions per site.
The Effect of Using Genes and Bipartitions with Strong Phylogenetic Signal on the Culicidae Phylogeny Based on the First 999 bp of Every Gene’s Alignment
| Treatment | Treatment Details | Average GSF | TC | RTC | Number of Internodes with Increased GSF | Number of Internodes with Decreased GSF | Number of Internodes with Increased IC | Number of Internodes with Decreased IC |
|---|---|---|---|---|---|---|---|---|
| 1,340 genes | First 999 bp sequence of every gene (1,340 genes) | 84.43 | 9.98 | 0.59 | NA | NA | NA | NA |
| Selection of genes whose ML trees have high average BS | Genes with average BS ≥ 70% (1,138 genes) | 87.36 | 10.34 | 0.61 | 8 | 0 | 5 | 0 |
| | Genes with average BS ≥ 80% (603 genes) | 91.00 | 11.17 | 0.66 | 14 | 0 | 10 | 1 |
| | Genes with average BS ≥ 90% (45 genes) | 95.43 | 12.25 | 0.72 | 14 | 0 | 13 | 2 |
| Selection of genes whose ML trees have high TC | Using only the 1,138 genes with the highest TC | 87.29 | 10.34 | 0.61 | 10 | 0 | 7 | 0 |
| | Using only the 603 genes with the highest TC | 90.71 | 11.16 | 0.66 | 13 | 0 | 10 | 0 |
| | Using only the 45 genes with the highest TC | 94.86 | 12.23 | 0.72 | 14 | 0 | 13 | 1 |
| Selection of bipartitions with high BS in the ML trees of genes | Using only bipartitions that have ≥ 70% BS | NA | 11.82 | 0.70 | NA | NA | 14 | 0 |
| | Using only bipartitions that have ≥ 80% BS | NA | 12.53 | 0.74 | NA | NA | 15 | 0 |
| | Using only bipartitions that have ≥ 90% BS | NA | 13.10 | 0.77 | NA | NA | 15 | 0 |
| | Using only bipartitions that have ≥ 95% BS | NA | 13.33 | 0.78 | NA | NA | 15 | 0 |
Note.—The columns correspond to: the specific filtering of genes or bipartitions with strong phylogenetic signal tested (treatment and treatment details), the average GSF of the internodes of the Culicidae eMRC phylogeny (average GSF), the TC of the Culicidae eMRC phylogeny, the RTC of the Culicidae eMRC phylogeny, the numbers of internodes of the Culicidae eMRC phylogeny in which GSF increases or decreases by more than 3%, and the numbers of internodes of the Culicidae eMRC phylogeny in which IC increases or decreases by more than 0.03. As the maximum value of IC for a given internode is 1, the maximum value of TC for a given phylogeny is the number of internodes, which in this case is 17. In the analyses concerned with the use of bipartitions, only those bipartitions that displayed BS greater or equal to 70%, 80%, 90%, or 95% in the ML trees of the 1,340 genes were used to construct eMRC phylogenies, which were then compared with the default analysis. NA, not applicable.
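The two kinds of filtering compared in these tables, whole genes selected by average BS and individual bipartitions selected by their own BS, can be sketched as follows. This is a minimal illustration on hypothetical toy data, not the authors' pipeline. The gene names, bipartition labels, support values, and function names are all made up, and the step that builds an eMRC phylogeny from the retained bipartitions is not shown.

```python
from statistics import mean

# Hypothetical toy input: gene name -> {bipartition label: bootstrap support in %}
gene_trees = {
    "geneA": {"b1": 100, "b2": 95, "b3": 90},
    "geneB": {"b1": 80, "b2": 60, "b3": 40},
    "geneC": {"b1": 99, "b2": 72, "b4": 55},
}

def select_genes(trees, min_avg_bs):
    """Keep whole gene trees whose average BS meets the threshold."""
    return {gene: bips for gene, bips in trees.items()
            if mean(bips.values()) >= min_avg_bs}

def select_bipartitions(trees, min_bs):
    """Keep every gene, but only its bipartitions whose BS meets the threshold."""
    return {gene: {label: bs for label, bs in bips.items() if bs >= min_bs}
            for gene, bips in trees.items()}

print(sorted(select_genes(gene_trees, 70)))          # genes with average BS >= 70%
print(select_bipartitions(gene_trees, 90)["geneC"])  # geneC bipartitions with BS >= 90%
```

The difference in design matters for interpreting the tables: gene-level selection discards a weakly supported gene entirely, whereas bipartition-level selection can retain the well-supported parts of an otherwise weak gene tree.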