Ralf Aurahs, Markus Göker, Guido W Grimm, Vera Hemleben, Christoph Hemleben, Ralf Schiebel, Michal Kucera.
Abstract
The high sequence divergence within the small subunit ribosomal RNA gene (SSU rDNA) of foraminifera makes it difficult to establish the homology of individual nucleotides across taxa. Alignment-based approaches have so far relied on time-consuming manual alignments and discarded up to 50% of the sequenced nucleotides prior to phylogenetic inference. Here, we investigate the potential of the multiple analysis approach to infer a molecular phylogeny of all modern planktonic foraminiferal taxa by using a matrix of 146 new and 153 previously published SSU rDNA sequences. Our multiple analysis approach is based on eleven different automated alignments, analysed separately under the maximum likelihood criterion. The high degree of congruence between the phylogenies derived from our novel approach, traditional manually homologized culled alignments, and the fossil record indicates that poorly resolved nucleotide homology does not represent the most significant obstacle when exploring the phylogenetic structure of the SSU rDNA in planktonic foraminifera. We show that approaches designed to extract phylogenetically valuable signals from complete sequences show more promise for resolving the backbone of the planktonic foraminifer tree than attempts to establish strictly homologous base calls in a manual alignment.
Keywords: automated alignment; fossil record; phylogeny; planktonic foraminifera
Year: 2009 PMID: 20140067 PMCID: PMC2808177 DOI: 10.4137/bbi.s3334
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Figure 1. Lengths of manual alignments used to infer the phylogeny of planktonic foraminifera. Summary of planktonic foraminifera molecular phylogenies based on the 3’ fragment of the SSU rDNA gene. Almost one half of the ~1000 bp in the analysed fragment are lost when attempting to align “unambiguously” across the entire clade. The remaining variable regions clearly contain phylogenetically useful information, as can be seen by the longer alignments produced for subclades including only selected species. This phylogenetic information is lost when aligning across the three major clades of planktonic foraminifera, or when the alignment includes benthic outgroups. Data sources (in chronological order): 1997, Darling et al2 [7], Huber et al4 [8], de Vargas et al3 [3]; 1999, Darling et al7 [5]; 2000, Darling et al9 [4]; 2001, Stewart et al11 [3], de Vargas et al10 [16,17]; 2002, de Vargas et al69 [9]; 2003, Darling et al70 [10,11,18]; 2004, Darling et al51 [19,20]; 2006, Darling et al54 [2,21]; 2007, Darling et al71 [22]; 2008, Kuroyanagi et al72 [12], Ujiié et al73 [1]; 2009, Aurahs et al74 [13,14,15].
Species of planktonic foraminifers. A list of all planktonic foraminifera species included in this study and their representation by SSU rDNA data in public databases and in the newly assembled data.
| Species | Public databases | New data |
| | No | No |
| | Yes | Yes |
| | No | No |
| | Singleton | Yes |
| | No | No |
| | No | No |
| | No | No |
| | No | No |
| | No | No |
| | No | No |
| | No | No |
| | Singleton | No |
| | Singleton | Yes |
| | Singleton | Yes |
| | Yes | No |
| | No | No |
| | Yes | Yes |
| | No | No |
| | No | No |
| | No | No |
| | No | No |
| | Yes | No |
| | Yes | Yes |
| | Yes | No |
| | Yes | No |
| | No | No |
| | Yes | No |
| | Yes | No |
| | No | No |
| | Singleton | No |
| | Yes | Yes |
| | Yes | No |
| | Yes, biphyletic | No |
| | Yes | No |
| | No | No |
| | No | No |
| | Yes | No |
| | No | No |
| | No | No |
| | No | No |
| | Yes | No |
| | Singleton | Yes |
| | No | No |
| | No | No |
These singletons are possibly not representative of the assigned species.
The new data revealed new sequence (sub)types.
The new data include sequences from a globorotaliid specimen, which may or may not be G. scitula.
Available in public databases at the time of data mining (October 2008). An SSU rDNA sequence of C. nitida has been available since the end of 2008.69
Features of the alignments and phylogenetic trees. This table lists features of the eleven sequence alignments constructed and of the resulting phylogenetic trees. The entire alignment length is shown. For the resulting best ML trees, the final estimate of the alpha value of the gamma distribution and the log likelihood of the best tree are shown, as well as the sum of the Robinson-Foulds (RF) distances of each tree to the other nine trees and the agreement with the affiliation of sequences to morphospecies (T-score; lower scores indicate better agreement). Note that the likelihood of the best tree cannot be used directly to select the best alignment, because common ML implementations, such as the one in RAxML, do not consider gaps.
| Alignment | Length | Alpha | Log likelihood | Sum of RF distances | T-score |
| CLWOPT | 1557 | 0.97349 | −3,598,746,746 | 3416 | 25 |
| EINSI | 1786 | 0.48367 | −3,012,840,593 | 3194 | 23 |
| GINSI | 1837 | 0.48314 | −2,849,473,664 | 3206 | 23 |
| − | |||||
| LINSI | 1751 | 0.53379 | −3,069,451,219 | 3226 | 23 |
| − | |||||
| MUSCLE | 2192 | 0.82643 | −5,422,632,153 | 4126 | 25 |
| − | |||||
| 1856 | 0.60630 | −3,203,410,297 | 3356 | 23 | |
| − |
Alignments considered for Results and Discussion are shown in bold font.
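The "sum of RF distances" column can be illustrated with a small, self-contained sketch. It is not the authors' pipeline: the nested-tuple tree encoding, the function names (`splits`, `rf_distance`, `rf_sums`) and the three five-taxon toy trees are all assumptions made for illustration. The sketch counts the non-trivial bipartitions found in one unrooted tree but not the other, then sums these distances for each tree against all others.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions of a tree, treated as unrooted."""
    all_taxa = leaves(tree)
    anchor = min(all_taxa)  # fixed taxon used to normalise each split
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        other = all_taxa - clade
        # Keep only splits with at least two taxa on each side.
        if len(clade) > 1 and len(other) > 1:
            # Store the side that does not contain the anchor taxon.
            found.add(other if anchor in clade else clade)
        return clade

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: number of splits present in only one tree."""
    return len(splits(tree_a) ^ splits(tree_b))


def rf_sums(trees):
    """Sum of RF distances from each named tree to all other trees."""
    totals = dict.fromkeys(trees, 0)
    for a, b in combinations(trees, 2):
        d = rf_distance(trees[a], trees[b])
        totals[a] += d
        totals[b] += d
    return totals


# Hypothetical toy trees on five taxa (not data from the study).
toy_trees = {
    "t1": ((("A", "B"), "C"), ("D", "E")),
    "t2": ((("A", "C"), "B"), ("D", "E")),
    "t3": ((("A", "B"), "D"), ("C", "E")),
}
print(rf_sums(toy_trees))
```

A tree with a low sum is topologically close to the other trees in the set; as in the table, this says nothing by itself about which alignment is correct.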
Figure 2. Comparison of alignments and trees. UPGMA dendrograms inferred from overlap scores between sequence alignments (right) and from Robinson-Foulds distances between the corresponding trees (left) are shown. Based on this comparison, einsi, ginsi and linsi were not considered further because they are too close to the mafft approach; muscle and clwopt were omitted because they resulted in some sequences being severely misplaced (see text). Tree topology can apparently be predicted in part from a comparison of the underlying sequence alignments, most clearly in the close relationship of einsi, ginsi, linsi and mafft.
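As a rough illustration of how a UPGMA dendrogram is built from a matrix of pairwise distances (such as RF distances between trees), here is a minimal sketch. The function name `upgma`, the nested-tuple output format and the four-item toy matrix are assumptions for illustration; the values are invented and are not the distances underlying Figure 2.

```python
def upgma(names, dist):
    """Cluster items by UPGMA.

    names: list of labels; dist: symmetric matrix (list of lists).
    Returns nested tuples (left, right, height), where height is half
    the average distance at which the two clusters were joined.
    """
    # Active clusters: id -> (subtree, number of original items).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)

    while len(clusters) > 1:
        # Join the closest pair of clusters.
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)

        # Distance to the merged cluster is the size-weighted average.
        new_d = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            new_d[(k, next_id)] = (n_i * dik + n_j * djk) / (n_i + n_j)

        # Drop distances involving the two clusters that were merged.
        d = {pair: v for pair, v in d.items()
             if i not in pair and j not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_i, tree_j, dij / 2), n_i + n_j)
        next_id += 1

    return next(iter(clusters.values()))[0]


# Invented toy distances between four items A-D.
toy_names = ["A", "B", "C", "D"]
toy_dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(toy_names, toy_dist))
```

In this toy matrix A and B are joined first because they are closest, which mirrors how alignments that yield near-identical trees end up as neighbours in the dendrogram.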
Figure 3. Partly collapsed ML tree inferred from the MAFFT alignment. The best ML tree inferred from the mafft alignment is shown. Branches are scaled in terms of the expected number of substitutions per site. Subtrees that include only sequences from the same morphospecies are collapsed at their root node and represented by black rectangles. Support for the collapsed subtrees and their relationships, i.e. bootstrap percentages from the clustalw/kalign/mafft/nralign/poa/poaglo-based analyses, is indicated on the terminal nodes and on the branches. Uncollapsed, correspondingly annotated versions of all best-known trees are provided in Additional file 2.
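The collapsing step described in this caption can be sketched as follows. This is only an illustration under stated assumptions: trees are encoded as nested tuples, leaf labels take a hypothetical `species_accession` form, and the helper names (`collapse`, `species_of`) are invented here rather than taken from the study. Each maximal subtree whose sequences all belong to one morphospecies is replaced by a single label carrying the number of collapsed sequences.

```python
def collapse(tree, species_of):
    """Replace each maximal single-species subtree by a 'species [n]' label.

    tree: nested tuples with string leaves; species_of: function mapping a
    leaf label to its morphospecies name (hypothetical labelling scheme).
    """
    def visit(node):
        # Returns (species, or None if mixed; leaf count; rewritten subtree).
        if isinstance(node, str):
            return species_of(node), 1, node
        parts = [visit(child) for child in node]
        kinds = {species for species, _, _ in parts}
        total = sum(count for _, count, _ in parts)
        if len(kinds) == 1 and None not in kinds:
            # Still a single species: let the parent decide where to collapse.
            return kinds.pop(), total, node
        # Mixed subtree: collapse every pure child, keep mixed children.
        return None, total, tuple(finish(part) for part in parts)

    def finish(part):
        species, count, subtree = part
        return subtree if species is None else f"{species} [{count}]"

    return finish(visit(tree))


# Hypothetical toy tree; labels are "species_accession".
toy_tree = ((("spA_1", "spA_2"), "spA_3"), (("spB_1", "spB_2"), "spA_4"))
by_prefix = lambda label: label.rsplit("_", 1)[0]
print(collapse(toy_tree, by_prefix))
```

In the toy tree, one `spA` sequence falls outside the main `spA` subtree, so the species appears twice after collapsing; this is the kind of pattern that marks a morphospecies as not recovered as a single clade.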
Support of morphotaxa under parsimony. ML bootstrap support (see also Fig. 3) is included for comparison. In addition to the known problematic case of Globigerinoides ruber (see text), Hastigerina pelagica is the only morphotaxon that does not receive sufficient support.
| 100 | 19 | 91 | 98 | 95 | 97 | 0 | 100 | 100 | 100 | 100 | 100 | 100 | 1 | 100 | 100 | 100 | |
| 100 | 100 | 100 | 96 | 60 | 89 | 100 | 100 | 77 | 80 | 100 | 95 | 96 | 99 | 94 | 26 | 59 | |
| 98 | 99 | 99 | 97 | 99 | 94 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | |
| 99 | 98 | 100 | 100 | 100 | 94 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | |
| 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | |
| 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99 | 100 | 100 | 100 | 100 | 100 | 100 | |
| 100 | 100 | 94 | 100 | 100 | 100 | 54 | 100 | 100 | 100 | 100 | 72 | 100 | 0 | 100 | 100 | 100 | |
| 100 | 100 | 99 | 98 | 86 | 99 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 64 | 100 | 100 | 100 | |
| 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | |
| 99 | 98 | 99 | 100 | 98 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | |
| 93 | 99 | 100 | 98 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | |
| 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | |
| 100 | 89 | 100 | 99 | 100 | 100 | 100 | 100 | 100 | 100 | 12 | 100 | 100 | 98 | 100 | 100 | 100 | |
| 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | |
| 99 | 97 | 95 | 94 | 90 | 82 | 100 | 100 | 95 | 99 | 100 | 100 | 100 | 81 | 90 | 99 | 99 | |
| 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | |
| 4 | 0 | 38 | 9 | 88 | 68 | 1 | 4 | 5 | 12 | 2 | 8 | 13 | 0 | 6 | 24 | 31 | |
| 100 | 100 | 100 | 94 | 86 | 85 | 100 | 99 | 96 | 87 | 100 | 93 | 100 | 61 | 86 | 91 | 93 | |
| 100 | 100 | 100 | 93 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 90 | 99 | 100 | 100 | |
Moderate and low support values are highlighted.
Support for selected phylogenetic scenarios. Comparison of our multiple analysis results (Fig. 3; Additional files 2, 3; BS under ML and MP) with eight previous manual-alignment-based phylogenetic reconstructions in terms of the statistical support for relationships that appear to be consistently resolved in the fossil record of planktonic foraminifera. Support values for each node are given where the respective study has identified the node as the dominant signal; “no” indicates analyses where an alternative topology has been preferred and “N/A” indicates analyses where some of the constituent species of the clade above the node have not been included.
| Darling et al | N/A | No | N/A | N/A | N/A | N/A | (No) | N/A | 99 | 82 | 87 |
| De Vargas et al | N/A | 46/41/73 | N/A | N/A | N/A | N/A | No/58/51 | N/A | 91/100/100 | No | No |
| De Vargas and Pawlowski | N/A | N/A | 47 | N/A | N/A | N/A | (81) | N/A | 100 | <50 | No |
| Darling et al | N/A | No | N/A | N/A | N/A | N/A | (57) | N/A | 100 | 47 | No |
| Darling et al | N/A | (76) | N/A | N/A | N/A | Unresolved | (86) | N/A | 99 | <50 | Unresolved |
| Stewart et al | Unresolved | (69) | N/A | N/A | N/A | N/A | (88) | No | 98 | <50 | No |
| Darling et al | Unresolved | <70 | N/A | N/A | 78 (?) | Unresolved | <70 | N/A | 100 | <70 | No |
| Ujiié et al | 1.00/100 | 0.88/80 | No | No | Unresolved | N/A | 0.87/52 | N/A | 1.0/100 | 0.83/80 | Unresolved |
| 100–59 | 82–30 (10 | 78–2 | 39–5 | 91–0 | 30–5 | 100–37 | 94–56 | 100–99 | 100–83 (0 | 100–32 | |
| 100–52 | 20–0 | 34–0 | 7–0 | 99–0 | 14–0 | 61–22 (0 | 100–56 | 100 | 99–64 (0 | 66–12 | |
These studies did not include the phylogenetically challenging taxon Hastigerina pelagica.
Based on the KALIGN-generated alignment (see text).
No Globorotalia species included.
Only two close relatives included.
Figure 4. Alternative phylogenetic relationships within the nonspinose macroperforate clade as inferred from the six alignments. Shown are reduced ML phylograms based on the six selected alignments, with bootstrap support under ML annotated on the corresponding branches (MP bootstrap support can be found in Additional files 2, 3). Scale bars are adjusted to 0.1 expected substitutions per site. Where indicated, branches have been shortened to one half of their original length. Subtrees comprising the same morphospecies were collapsed, as in Figure 3, as were the microperforate (blue triangle) and spinose (red) clades. Uncollapsed full ML trees can be found in Additional file 2.
Figure 5. Alternative phylogenetic relationships within the spinose clade inferred from the six alignments. Shown are reduced ML phylograms based on the six selected alignments, with BSML annotated on the corresponding branches. Scale bars are adjusted to 0.2 expected substitutions per site. Where indicated, branches have been shortened to one half of their original length. Subtrees comprising the same morphospecies, as well as the microperforate and nonspinose macroperforate clades, were collapsed, analogous to Figures 3 and 4. Uncollapsed full ML trees can be found in Additional file 2.
Figure 6. Comparison to the fossil record. A compilation of the fossil record of modern lineages.59,60,64 Solid lines represent known fossil ranges of species or lineages leading to these species. Incongruence between the molecular-based hypothesis and the fossil record is highlighted; fossil evidence that is contradictory to molecular phylogenies but poorly resolved is also indicated.