
An empirical test of the midpoint rooting method.

Pablo N Hess¹, Claudia A DE Moraes Russo¹.

Abstract

The outgroup method is widely used to root phylogenetic trees. An accurate root indication, however, strongly depends on the availability of a proper outgroup. An alternate rooting method is the midpoint rooting (MPR). In this case, the root is set at the midpoint between the two most divergent operational taxonomic units. Although the midpoint rooting algorithm has been extensively used, the efficiency of this method in retrieving the correct root remains untested. In the present study, we empirically tested the success rate of the MPR in obtaining the outgroup root for a given phylogenetic tree. This was carried out by eliminating outgroups in 50 selected data sets from 33 papers and rooting the trees with the midpoint method. We were thus able to compare the root position retrieved by each method. Data sets were separated into three categories with different root consistencies: data sets with a single outgroup taxon (54% success rate for MPR), data sets with multiple outgroup taxa that showed inconsistency in root position (82% success rate), and data sets with multiple outgroup taxa in which root position was consistent (94% success rate). Interestingly, the more consistent the outgroup root is, the more successful MPR appears to be. This is a strong indication that the MPR method is valuable, particularly for cases where a proper outgroup is unavailable.

Keywords: molecular clock; outgroup rooting; outgroups; phylogenetic trees; systematics; unrooted trees

Year: 2007 PMID: 32287391 PMCID: PMC7110036 DOI: 10.1111/j.1095-8312.2007.00864.x

Source DB: PubMed Journal: Biol J Linn Soc Lond ISSN: 0024-4066 Impact factor: 2.138

INTRODUCTION

Rooting evolutionary trees is usually considered a simple step in phylogenetic construction. Nonetheless, tree building algorithms produce unrooted phylogenetic trees because all the processes leading to the final tree are computed as reversible (Swofford ; Nei & Kumar, 2000; Sanderson & Shaffer, 2002). Despite its importance, however, rooting is often overlooked in phylogenetic constructions (Swofford ). The outgroup method is the most widely used rooting method in phylogenetic studies, but the correct indication of the root position strongly depends on the availability of a proper outgroup (Hendy & Penny, 1989; Wheeler, 1990; Tarrío, Rodríguez-Trelles & Ayala, 2000). This apparently simple requisite may prove rather limiting when studying viruses (Stavrinides & Guttman, 2004), mostly because of extremely high and diverse evolutionary rates in these organisms. Higher taxonomic groups such as angiosperms (Qiu ), birds, and mammals (Holland, Penny & Hendy, 2003) may also be subject to the lack of appropriate extant outgroups. Additionally, issues such as long-branch attraction (Felsenstein, 1978; Qiu ; Sanderson & Shaffer, 2002), differences in nucleotide composition between taxa (Tarrío et al., 2000), and long-edge attraction (Hendy & Penny, 1989) represent major misleading factors for outgroup rooting. As previously suggested (Tarrío et al., 2000; Sanderson & Shaffer, 2002), the midpoint rooting method (also known as MPR; Farris, 1972) might be useful in these situations because it does not depend on the existence of an outgroup. The MPR method places the root of the tree at the midpoint between the two most divergent operational taxonomic units (OTUs) (Swofford ; Nei & Kumar, 2000), as measured by the sum of branch lengths between these OTUs. The theoretical basis of MPR relies on the assumption that all OTUs in a given tree should display the same average evolutionary rate (Tarrío et al., 2000; Huelsenbeck, Bollback & Levine, 2002).
Although the midpoint rooting algorithm has been extensively used, the efficiency of this method in retrieving the correct root remains untested. By eliminating outgroups in data sets that are not problematic with regard to outgroup selection, and rooting the tree with the midpoint method, we were able to compare the root position retrieved by each method. Therefore, in the present study, we empirically tested the success rate of the MPR in obtaining the same root as the outgroup method for a given tree, and found that it performs surprisingly well.
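The midpoint rule described above (root placed halfway along the path between the two most divergent OTUs, measured as the sum of branch lengths) can be illustrated with a minimal Python sketch. This is not the implementation used in the study: the tree representation (a dictionary of neighbours with branch lengths), the function names, and the toy branch lengths are all assumptions made for illustration.

```python
from itertools import combinations

def path_between(tree, start, goal):
    """Return the unique path from start to goal in an unrooted tree."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nxt in tree[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    raise ValueError("nodes are not connected")

def path_length(tree, path):
    """Sum of branch lengths along a path of adjacent nodes."""
    return sum(tree[a][b] for a, b in zip(path, path[1:]))

def midpoint_root(tree):
    """Return (node_a, node_b, offset): the root lies on branch a-b,
    `offset` away from node_a, halfway between the two most distant OTUs."""
    leaves = [n for n, nbrs in tree.items() if len(nbrs) == 1]
    longest = max((path_between(tree, a, b) for a, b in combinations(leaves, 2)),
                  key=lambda p: path_length(tree, p))
    half = path_length(tree, longest) / 2.0
    walked = 0.0
    for a, b in zip(longest, longest[1:]):
        step = tree[a][b]
        if walked + step >= half:
            return a, b, half - walked
        walked += step

# Hypothetical unrooted tree: OTUs A-D, internal nodes X and Y.
toy_tree = {
    "A": {"X": 1.0}, "B": {"X": 2.0},
    "C": {"Y": 2.0}, "D": {"Y": 5.0},
    "X": {"A": 1.0, "B": 2.0, "Y": 1.0},
    "Y": {"C": 2.0, "D": 5.0, "X": 1.0},
}
print(midpoint_root(toy_tree))
```

In this toy tree the most divergent pair is B and D (path length 8.0), so the root falls on the branch between Y and D, 1.0 away from Y, which leaves both B and D at a distance of 4.0 from the root.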

MATERIAL AND METHODS

To evaluate the success rate of MPR in rooting trees, we selected data sets from the literature. As increases in taxonomic level usually reflect on evolutionary distances between sequences, which in turn lead to progressive violations of the molecular clock assumptions, we restricted our choice to papers focusing on low taxonomic levels of tetrapods (i.e. congeneric species and members of a single species). Furthermore, among those, we also selected papers that analysed mitochondrial genes, to minimize issues caused by paralogy and recombination (Avise ; Overton & Rhoads, 2004). tRNA-coding segments in the sequences were not used because their sequences were usually incomplete. Of the 33 papers selected, 13 utilized more than one gene, and thus the total number of individual-gene data sets amounted to 50. The number of OTUs in each data set varied from six to 169 (mean = 25.2, SD = 25.3). As previously explained, all data sets included at least one outgroup, and 28 of them provided more than one (for details, see Supplementary Material). DNA sequences were retrieved from the NCBI molecular database as indicated by the authors. Protein-coding sequence alignments were performed with ClustalW (Higgins, Thompson & Gibbs, 1994) implementation in DAMBE, version 4.2.13 (Xia & Xie, 2001), based on their respective amino acid products. Noncoding sequences, such as rRNA genes and the mitochondrial D-loop region, were also aligned with the ClustalW implementation present in the DAMBE software. All alignments were performed using default parameters, and they were visually inspected and corrected whenever appropriate. Phylogenetic and molecular evolutionary analyses were conducted using MEGA, version 2.1 (Kumar ). The Neighbour-joining method (Saitou & Nei, 1987) was used to reconstruct all phylogenetic trees because of its reliability and computer time limitations for other methods (Kuhner & Felsenstein, 1994; Russo, Takezaki & Nei, 1996; Rosenberg & Kumar, 2001). 
As expected, intra- and interspecies p-distance measures were small (mean = 0.104, SD = 0.053), a condition that favours the use of the Jukes–Cantor correction (Jukes & Cantor, 1969) due to its smaller variance when compared to more complex evolutionary models (Nei, 1991; Russo, 1997). A bootstrap test (Felsenstein, 1985) with 2000 replicates (Hedges, 1992) was performed on all phylogenetic trees to evaluate statistical branch support (Hillis & Bull, 1993; Sitnikova, Rzhetsky & Nei, 1995). Thus, we proceeded to the empirical test of the MPR, which required the assignment of an outgroup root. All data sets with a single outgroup (herein termed ‘single outgroup data sets’; SO) had their outgroup roots straightforwardly assigned. The other data sets (named ‘multiple outgroup data sets’), however, were subject to outgroup root consistency checks (Maddison, Donoghue & Maddison, 1984). Such checks were performed by comparing the root yielded by each of the available outgroups individually. In addition, we also compared these root positions with the one obtained through the simultaneous use of all outgroups. When individual outgroups were inconsistent, but the combination of all outgroups produced a tree in which they were all joined at the same root position, we assigned that position as the outgroup root for MPR comparison purposes. These data sets were named ‘multiple outgroup, inconsistently rooted data sets’ (MOI). In the two MOI data sets in which the combination of multiple outgroups did not produce a single root position, the final root was based on a majority-rule consensus of individual outgroups. Finally, the last category of data sets was the ‘multiple outgroup, consistently rooted data sets’ (MOC), in which all outgroups, either individually or combined, yielded the exact same root position. To test the performance of the MPR based on the outgroup method, one midpoint-rooted tree was constructed for each data set. 
Naturally, the outgroup was excluded from this analysis. The SYSTAT program, version 11 (available at http://www.systat.com) was used to perform a nonparametric Kruskal–Wallis test to check the homogeneity concerning the numbers of ingroups and outgroups, among and within the three different categories (SO, MOI, and MOC). Additionally, we evaluated the significance of differences in MPR success rates among categories by a chi-square test in SYSTAT, version 11.
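For readers unfamiliar with the distance measures mentioned in this section, the following Python sketch shows how a p-distance and its Jukes–Cantor correction, d = -(3/4) ln(1 - 4p/3), are computed. It is an illustration only; the study used MEGA, and the function names and the two short sequences below are hypothetical.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences,
    ignoring sites where either sequence has a gap ('-')."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs) / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed p-distance."""
    if p >= 0.75:
        raise ValueError("correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical aligned fragments, for illustration only.
p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(p, jukes_cantor(p))

# Applying the correction to the mean p-distance reported in the text.
print(round(jukes_cantor(0.104), 3))
```

At a p-distance of 0.104 the corrected value is only about 0.112, which illustrates why a simple model is adequate when sequences are this similar.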

RESULTS

In the present study, we assumed that the outgroup method yields the correct root position in every tree. Unfortunately, this assumption may be doubtful in some cases (Holland et al., 2003), yet testing the choice of outgroup by systematists is clearly beyond the scope of our study. Nevertheless, 28 out of 50 analysed data sets (i.e. the MOI and MOC categories) provided multiple outgroups. This allowed us to reduce the potential issue of outgroup root misplacement through ingroup monophyly checks (Maddison et al., 1984). Another issue is how to deal with topological differences caused by the exclusion of the outgroups. This is bound to happen, particularly at this taxonomic level, because the closeness between species produces some short branches with typically low support. We attempted to minimize such problems by analysing root differences in condensed trees. For this, we used 33% and 50% cut-off values for condensing the trees. Cut-off values indicate the minimum support required for a branch to remain uncollapsed. Therefore, the application of a 33% cut-off value to a tree causes every branch with a bootstrap value lower than 33% to be collapsed and become part of a polytomy. Cut-off values higher than those have consistently produced complete polytomies (data not shown).
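The condensing step described above can be sketched in Python as follows. This is an assumed, simplified representation (nested dictionaries with a `support` value on each internal branch), not the procedure implemented in MEGA; the tree and its bootstrap values are invented for illustration.

```python
def condense(node, cutoff):
    """Return a copy of the tree in which every internal branch with
    bootstrap support below `cutoff` is collapsed into a polytomy."""
    if "children" not in node:                      # leaf (OTU)
        return {"name": node["name"]}
    new_children = []
    for child in node["children"]:
        kept = condense(child, cutoff)
        if "children" in kept and kept["support"] < cutoff:
            new_children.extend(kept["children"])   # collapse this branch
        else:
            new_children.append(kept)
    return {"support": node.get("support", 100), "children": new_children}

def shape(node):
    """Nested tuples of OTU names, convenient for inspecting a topology."""
    if "children" not in node:
        return node["name"]
    return tuple(shape(c) for c in node["children"])

def leaf(name):
    return {"name": name}

# Hypothetical tree with bootstrap values (in %) on internal branches.
example = {"children": [
    {"support": 90, "children": [leaf("A"), leaf("B")]},
    {"support": 40, "children": [
        leaf("C"),
        {"support": 20, "children": [leaf("D"), leaf("E")]},
    ]},
]}

print(shape(condense(example, 33)))
print(shape(condense(example, 50)))
```

With the 33% cut-off only the branch supported at 20% collapses, whereas the 50% cut-off also collapses the branch supported at 40%, leaving C, D, and E attached directly to the base of the tree.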

OUTGROUP NUMBER AND ROOT CONFIDENCE

As previously mentioned, the SO data sets were unsuitable for ingroup monophyly checks, lending this category an uncertain degree of confidence in root placements. Even though the data sets in the MOI category allowed us to test every tree for ingroup monophyly, their inconsistent results also portrayed a doubtful root position. Therefore, we placed an intermediate confidence on the root positions derived by the outgroup method for the data sets in this category. Unambiguous confirmation of ingroup monophyly was only possible in the MOC category, which also showed the highest success rate of the MPR amongst all categories. Generally, data set features such as number of ingroups, number of OTUs, and mean distance between ingroups and outgroups were not significantly distinct among the three categories (SO, MOI, MOC). This result eliminates some potential sources of bias in our analyses. For the 33% cut-off trees, we found significant differences in MPR success rates among categories. However, we were unable to establish such a difference between the MOI and MOC categories in the trees condensed at the 50% cut-off limit. Nevertheless, when considered as one category (MOI + MOC), there was a significant difference from the SO category (χ²: P = 0.035 and 0.030, at 33% and 50% cut-off values, respectively).
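The comparison between the SO and the combined MOI + MOC categories can be illustrated with a short Python sketch of a 2 × 2 Pearson chi-square test. Two assumptions are made here and should be read as such: the success and failure counts are reconstructed from the reported percentages and category sizes (22 SO, 12 MOI, and 16 MOC data sets), and the test is computed without a continuity correction, which may differ from the exact SYSTAT settings used in the study.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].
    Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2.0))   # chi-square tail, df = 1
    return stat, p_value

# ASSUMED counts (successes, failures), reconstructed from the percentages.
counts = {
    "33%": {"SO": (12, 10), "MOI+MOC": (23, 5)},
    "50%": {"SO": (14, 8), "MOI+MOC": (25, 3)},
}

for cutoff, table in counts.items():
    (a, b), (c, d) = table["SO"], table["MOI+MOC"]
    stat, p = chi_square_2x2(a, b, c, d)
    print(f"{cutoff} cut-off: chi2 = {stat:.3f}, P = {p:.3f}")
```

Under these assumptions the P values come out close to the 0.035 and 0.030 reported in the text, which suggests that the reconstructed counts are at least compatible with the published analysis.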

CONDENSED TREES AND ROOT CONFIDENCE

When trees were condensed at the 33% bootstrap cut-off limit, MPR correctly placed the root in 35 of 50 (70%) data sets. In the SO category, the MPR method retrieved the correct root location in only 54% of such data sets (Table 1) whereas, in the multiple-outgroup data sets (MOI and MOC categories), MPR achieved a much higher (82%) success rate. More specifically, in the 12 MOI data sets, MPR achieved a 67% success rate, whereas the 16 MOC data sets yielded an impressive 94% success rate.

Table 1.

Midpoint rooting success rates for the three data set categories

Category	33% cut-off	50% cut-off
SO	54%	64%
MOI	67%	83%*
MOC	94%	94%*

*We were unable to detect a statistically significant difference between these values. Only when used in combination (MOI + MOC) did these values show a statistically significant difference from the SO category at the 50% cut-off value.

Percentages indicate cut-off values used for condensing the trees.

MOC, data sets with multiple outgroups available, which showed no consistency issues; MOI, data sets with multiple outgroups available, which showed inconsistencies in rooting the trees; SO, data sets with only one outgroup available.

In the 50% cut-off trees analysis, the overall (SO + MOI + MOC) number of MPR successes was slightly larger than in the 33% analysis, increasing from 35 (70%) to 39 (78%) out of 50 data sets. Data sets in the SO category were correctly rooted by MPR on 64% of the trees (Table 1). On the other hand, the multiple-outgroup data sets (MOI + MOC) yielded an 89% success rate for MPR. The MOI category alone yielded an 83% success rate, whereas the MOC data sets maintained the 94% rate already achieved through the 33% cut-off condensation.
To ascertain that the collapsing of branches had not artificially increased the success rate of the midpoint method using condensed trees, we also analysed the success rate in noncondensed trees. In this case, the midpoint method successfully retrieved the correct root for 33 out of 50 (66%) data sets. It is interesting to note that, in ten (65%) of the remaining 17 data sets, the midpoint root position was a single node away from that derived by the outgroup method. To interpret the low MPR success rate (66%) on noncondensed trees, it should not be overlooked that these trees often had their roots placed, by both methods, on branches with very low support values. Consequently, such root positions are highly uncertain themselves. Therefore, the noncondensed trees ought to remain inconclusive, even though such success rates are higher than expected by chance (Huelsenbeck, Bollback & Levine, 2002).
Condensed trees, on the other hand, allow us to place greater confidence on every branch because poorly supported branches are collapsed. Higher cut-off values are capable of reducing even further the effects of poorly supported branches. By utilizing this approach, we could investigate in more detail whether failures in the MPR method were due to phylogenetic reconstruction problems in general.

DISCUSSION

The midpoint method displayed an impressively high success rate, which is especially remarkable in the MOC data sets because these are the situations in which we know the root position with the greatest degree of confidence. Furthermore, in every data set category, MPR offered better results with trees condensed at greater cut-off values. Conversely, for all trees condensed at the same values, the midpoint method achieved greater success in the data sets with higher branch (and thus root) confidence. Again, this is a clear indication that a consistent outgroup root placement also corresponds to an increase in the MPR success rate with the same data. For example, in the SO category, it is possible that MPR retrieved the correct root whereas the single outgroup did not. In the MOC category, the trees were already quite trustworthy at the 33% cut-off threshold, which is demonstrated by the 94% MPR success rate. Results with the higher 50% condensation value corroborate this trend, as the success rate of the midpoint method remained the same. The MPR success rates in the MOC category are surprisingly high for a rooting method based solely on branch lengths and, hence, highly dependent on the assumption of an untested molecular clock (Holland et al., 2003). Nevertheless, in spite of such a high efficiency in placing the root, we would expect that the performance of the midpoint method would be reduced in higher taxa. One would expect this to happen because the main assumption of this method (i.e. homogeneity of substitution rates along the tree, or a clock-like behaviour for the sequences) tends to be progressively violated as biological processes become more distinct between historically distant lineages (Li, 1997).
Interestingly, Huelsenbeck et al. (2002) showed that even severe violations of the molecular clock assumptions still allow for a moderate, yet significant, success rate at rooting trees with the direct molecular clock rooting method, which, by definition, strongly depends on clock-like evolution. Therefore, we suggest that the MPR method, which is slightly less dependent on the assumptions of a molecular clock, might also be successfully applied to higher level phylogenetic reconstructions. At this point, it is important to mention some issues that may affect outgroup rooting. Long-branch attraction (Felsenstein, 1978; Qiu ; Sanderson & Shaffer, 2002) is probably the most important and debated source of failure for the outgroup method. In this case, the often long outgroup branch may be attached to other long branches in the tree, thus yielding a wrong root position. Another source of error for outgroup rooting may be differences in nucleotide composition between outgroups and ingroups (Tarrío et al., 2000). The difference might confound character polarity, and thus also contribute to outgroups being clustered with OTUs based on sequence composition rather than on their evolutionary relationships. Finally, long-edge attraction (Hendy & Penny, 1989) may also cause the outgroup to cluster with any external long branch with higher probability than to correctly place the root on one of the short internal branches. Most major animal groups have their internal phylogenetic relationships already stable and reliably established. Hence, the aforementioned circularity in the outgroup method usually poses no problem for such groups, but it is often a restricting factor for viruses (Stavrinides & Guttman, 2004) because of their high, usually heterogeneous evolutionary rates, and on account of the lack of a priori phylogenetic information on them.
Also for some major groups, such as angiosperms (Qiu ), whose adequate sister-groups are extinct, and, in some situations, even birds and mammals (Holland et al., 2003), phylogenies are affected by problems in the application of the outgroup method. In such cases, MPR might become a more valuable method than the outgroup method for retrieving the correct root position in the tree. When any of these issues are in effect, outgroup rooting usually becomes a less convenient option for rooting (Tarrío et al., 2000; Holland et al., 2003), and midpoint rooting may be preferred. Considering the surprisingly high success rates for the midpoint method, we suggest that it should be used as an alternative rooting method, and it could be adopted by default when outgroup rooting is not straightforward and the constructed phylogeny is stable enough.

SUPPLEMENTARY MATERIAL

The following material is available for this article online:
Table S1. General information (number of ingroups, outgroups, and OTUs; mean p-distance between ingroups and between ingroups and outgroups; and the category, see text) and the performance of the MPR method for each data set.
Figure S1. Trees are shown as rooted by the midpoint method, with trees collapsed at a 33% bootstrap cut-off. Bootstrap support values (only those greater than 50%) are shown above each branch. Grey circles indicate outgroup positions. A circle with a U indicates the single outgroup location, and thus is only present in single-outgroup data sets. In multiple-outgroup data sets, circles with numbers represent the insertion points of individual outgroups, whereas circles with a T indicate the root position as inferred by all combined outgroups when they agree. In cases where multiple combined outgroups disagree on root placement, the circled T is invalidated (as indicated with an X on the circle), and the arrows show where individual outgroups point when combined.
Appendix S1. References used in Supplementary Material.
This material is available as part of the online article from: http://www.blackwell-synergy.com/doi/abs/10.1111/j.1095-8312.2007.00864.x

REFERENCES (17 in total)

1. Inferring the root of a phylogenetic tree.

Authors: John P Huelsenbeck; Jonathan P Bollback; Amy M Levine
Journal: Syst Biol Date: 2002-02 Impact factor: 15.683

2. Outgroup misplacement and phylogenetic inaccuracy under a molecular clock--a simulation study.

Authors: B R Holland; D Penny; M D Hendy
Journal: Syst Biol Date: 2003-04 Impact factor: 15.683

3. Molecular phylogenetic relationships based on mitochondrial and nuclear gene sequences for the Todies (Todus, Todidae) of the Caribbean.

Authors: Lowell C Overton; Douglas D Rhoads
Journal: Mol Phylogenet Evol Date: 2004-08 Impact factor: 4.286

4. The number of replications needed for accurate estimation of the bootstrap P value in phylogenetic studies.

Authors: S B Hedges
Journal: Mol Biol Evol Date: 1992-03 Impact factor: 16.240

5. Tree rooting with outgroups when they differ in their nucleotide composition from the ingroup: the Drosophila saltans and willistoni groups, a case study.

Authors: R Tarrío; F Rodríguez-Trelles; F J Ayala
Journal: Mol Phylogenet Evol Date: 2000-09 Impact factor: 4.286

6. Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phylogeny.

Authors: C A Russo; N Takezaki; M Nei
Journal: Mol Biol Evol Date: 1996-03 Impact factor: 16.240

7. Efficiencies of different statistical tests in supporting a known vertebrate phylogeny.

Authors: C A Russo
Journal: Mol Biol Evol Date: 1997-10 Impact factor: 16.240

8. Confidence limits on phylogenies: an approach using the bootstrap.

Authors: Joseph Felsenstein
Journal: Evolution Date: 1985-07 Impact factor: 3.694

9. The neighbor-joining method: a new method for reconstructing phylogenetic trees.

Authors: N Saitou; M Nei
Journal: Mol Biol Evol Date: 1987-07 Impact factor: 16.240

10. Mosaic evolution of the severe acute respiratory syndrome coronavirus.

Authors: John Stavrinides; David S Guttman
Journal: J Virol Date: 2004-01 Impact factor: 5.103
