Sasa Stefanović, Danny W Rice, Jeffrey D Palmer.
Abstract
BACKGROUND: Numerous studies, using in aggregate some 28 genes, have achieved a consensus in recognizing three groups of plants, including Amborella, as comprising the basal-most grade of all other angiosperms. A major exception is the recent study by Goremykin et al. (2003; Mol. Biol. Evol. 20:1499-1505), whose analyses of 61 genes from 13 sequenced chloroplast genomes of land plants nearly always found 100% support for monocots as the deepest angiosperms relative to Amborella, Calycanthus, and eudicots. We hypothesized that this conflict reflects a misrooting of angiosperms resulting from inadequate taxon sampling, inappropriate phylogenetic methodology, and rapid evolution in the grass lineage used to represent monocots.
Year: 2004 PMID: 15453916 PMCID: PMC543456 DOI: 10.1186/1471-2148-4-35
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. Current consensus hypothesis of angiosperm relationships. Tree topology is based on [42, 91] and references in Table 1. Small asterisks indicate the general phylogenetic position of the ten angiosperms (generic names shown for all but the three grasses) examined by Goremykin et al. [19]. The large asterisk indicates the addition in this study of the early-arising monocot Acorus to the Goremykin et al. [19] dataset. The height of the triangles reflects the relative number of species in eudicots (~175,000 species), monocots (~70,000), and magnoliids (~9,000) as estimated by Judd et al. [18] and Walter Judd (personal communication). The other five angiosperm groups shown each contain between 1 and ~100 species.
Table 1. Comparison of recent studies^a that identify the sister lineages of angiosperms.
| Study reference | No. of genes (genomes^b) | No. of angiosperms | No. of nucleotides | Amborella-sister^c | Support | Basal vs. core angiosperms^c | Support | Monophyly of monocots^c | Support |
| [4] | 5 (c, m, n) | 97 | 8,733 | + | 90 | + | 97 | + | 99/98 |
| [3] | 5 (c, m, n) | 45 | 6,564 | + | 94^d | + | 99^d | + | 98^d |
| [6] | 3 (c, n) | 553 | 4,733 | + | 65^e | + | 71^e | + | 95^e |
| [1] | 2 (n) | 26 | 2,208 | + | 92/83^f | + | 86 | + | 100 |
| [2] | 2 (n) | 52 | 2,606 | + | 88/57^f | + | 68 | + | 87 |
| [8] | 6 (c, m, n) | 33 | 8,911 | - | n/a^g | + | 99 | + | 100 |
| [9] | 17 (c) | 18 | 14,244 | + | 69 | + | 94 | + | 53 |
| [11] | 1 (c) | 38 | 4,707 | + | 99 | + | 100 | + | 100 |
| [14] | 1 (c) | 361 | 1,749 | + | 86 | + | 89 | + | 99 |
^a Not included are several other studies that also support Amborella-sister but are largely duplicative of the above [5, 7, 31], studies whose design does not fit the structure of this table [10, 12, 13], and one with extremely limited sampling (6 taxa) within angiosperms [15].
^b c = chloroplast; m = mitochondrial; n = nuclear.
^c Indicated relationship recovered (+) or not recovered (-); parsimony bootstrap support (BS) values shown unless otherwise specified. See Fig. 1 for definition of indicated relationships.
^d Only BS values derived from ML analysis are shown.
^e Jackknife support values.
^f Bootstrap values were inferred from separate phyA and phyC treatments; other BS values in this study were derived from concatenated phyA and phyC sequences.
^g n/a = not applicable. This study found Amborella+Nymphaea as sister to all other angiosperms (see Discussion).
Figure 3. Neighbor-joining analyses using different evolutionary models and/or taxon sampling. Distance matrices were calculated from the first- and second-position matrix of Goremykin et al. [19] using (A) the K2P model, (B) the ML HKY85 model with four gamma-distributed rate categories and parameters estimated from the corresponding ML analysis, and (C) the K2P model with Acorus added to the first- and second-position matrix as described in Methods.
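The K2P (Kimura two-parameter) distance named in panels A and C can be illustrated with a minimal sketch. It assumes two pre-aligned sequences and skips any site carrying a gap or ambiguity code; the function name `k2p_distance` is hypothetical, and this is not the software used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Illustrative helper (hypothetical name). Sites where either sequence
    has a gap or an ambiguity code are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    # K2P: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

For strongly diverged (saturated) pairs the logarithm arguments can become non-positive, in which case `math.log` raises `ValueError`; the sketch does not handle that case.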
Figure 8. Competing hypotheses for the rooting of angiosperms showing the same underlying angiosperm topology when outgroups are excluded. A. Rooting within monocots (Mono), on the branch between grasses and all other angiosperms (see Fig. 2C, whose BS values are shown here, and also Fig. 2F; also see Goremykin et al. [19]). B. Unrooted network, with arrow showing alternative rootings as in A and C. C. Canonical rooting on the branch between Amborella and the rest of angiosperms (see Fig. 2I, whose BS values are shown here, and also Fig. 2L). We emphasize that 100% BS was obtained for Amborella-sister and for monocot monophyly (compared to 79% and 78% in C) using ML methods that allow for site-to-site rate heterogeneity (e.g., Additional files 1–3).
Figure 2. The effect of changing sampling of monocots as a function of phylogenetic method. Analysis of the 61-gene data matrix using: Rows A-C, DNA parsimony; D-F, protein parsimony; G-I, DNA ML HKY85 with no rate categories; J-L, RY-coded DNA parsimony. The first column of trees uses the Goremykin et al. [19] taxon sampling (grasses, but not Acorus), the second uses Acorus but not grasses, and the third uses both grasses and Acorus. All analyses used the first- and second-position matrix, either with or without the addition of Acorus as explained in Methods. Trees J-L use the same matrices, but with the nucleotides RY-coded.
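Two data treatments mentioned in this caption, keeping only first and second codon positions and RY-coding the nucleotides, can be sketched as follows. The sketch assumes an in-frame coding sequence that starts at codon position 1; the function names are hypothetical, and the study's own matrix construction is described in its Methods, not reproduced here.

```python
def first_second_positions(cds):
    """Drop every third nucleotide from an in-frame coding sequence.

    Assumes the sequence starts at codon position 1 (an assumption of
    this sketch, not something stated in the caption).
    """
    return "".join(base for i, base in enumerate(cds) if i % 3 != 2)


# RY-coding collapses purines (A, G) to R and pyrimidines (C, T) to Y,
# so that only transversions remain visible as character changes.
RY_MAP = str.maketrans("AGCTagct", "RRYYRRYY")


def ry_code(seq):
    """Recode a nucleotide sequence into purine/pyrimidine (R/Y) states.

    Gaps and other symbols are left unchanged.
    """
    return seq.translate(RY_MAP)
```

For example, `ry_code(first_second_positions("ATGGCCTAA"))` first reduces the sequence to `"ATGCTA"` and then recodes it to `"RYRYYR"`.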
The 56 MODELTEST models and the grasses- or Amborella-sister topology that received the higher likelihood.
| grasses | grasses | grasses | grasses |
| grasses | Amborella | Amborella | grasses |
| grasses | Amborella | Amborella | Amborella |
| grasses | Amborella | Amborella | Amborella |
| grasses | Amborella | Amborella | Amborella |
| grasses | Amborella | Amborella | Amborella |
| grasses | Amborella | Amborella | Amborella |
| grasses | Amborella | Amborella | Amborella |
| grasses | Amborella | Amborella | Amborella |
| grasses | Amborella | Amborella | Amborella |
| grasses | Amborella | Amborella | Amborella |
| grasses | Amborella | Amborella | Amborella |
| grasses | Amborella | Amborella | Amborella |
| grasses | Amborella | Amborella | Amborella |
The four rate-heterogeneity conditions used in these MODELTEST analyses are: 1) "equal" = equal rates across sites; 2) "+I" = estimated percentage of invariant sites; 3) "+G" = four gamma-distributed rate categories; and 4) "+I+G" = combination of invariant sites and 4 gamma-rate categories.
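As a quick check on the table above, the 56 cells (14 rows by 4 columns) can be tallied. The sketch below simply transcribes the cells as they appear; because the row and column labels did not survive in this record, columns are identified by position only, and no mapping to specific models or rate-heterogeneity conditions is assumed.

```python
from collections import Counter

# Cells transcribed from the table as shown: one row of four "grasses",
# one mixed row, and twelve identical rows. Labels are not available.
TABLE = (
    [["grasses", "grasses", "grasses", "grasses"]]
    + [["grasses", "Amborella", "Amborella", "grasses"]]
    + [["grasses", "Amborella", "Amborella", "Amborella"]] * 12
)

# Overall count of which topology received the higher likelihood.
overall = Counter(cell for row in TABLE for cell in row)

# Per-column counts, indexed by column position (0-3).
per_column = [Counter(column) for column in zip(*TABLE)]

print(sum(overall.values()))  # 56 model/condition combinations
print(overall["Amborella"], overall["grasses"])  # 38 18
for index, counts in enumerate(per_column):
    print(index, counts["grasses"], counts["Amborella"])
```

Read this way, the first column favors grasses-sister in all 14 rows, while the remaining three columns favor Amborella-sister in 13, 13, and 12 rows respectively.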
Figure 4. Maximum likelihood analyses using different evolutionary models. Trees A-C were calculated using the first- and second-position Goremykin et al. [19] matrix. Tree D was calculated using all three codon positions. All trees were built using ML with the HKY85 model and the following treatments of rate heterogeneity: A. No rate categories. B. Four gamma-distributed rate categories. C. Estimated proportion of invariant sites (no gamma rate categories). D. No rate categories (all three positions). Parameters were estimated separately for each analysis as described in Methods.
Figure 5. Bootstrap support and the SH-test p-value for the [...]. The left vertical line in A and the right line in B indicate the rate-heterogeneity parameter estimated from the data. The right vertical line in A and the left line in B indicate the boundary where the topology of the best tree transitions between Amborella-sister and grasses-sister. All analyses were performed using the 61-gene first- and second-position matrix of Goremykin et al. [19] and the ML HKY85 model with the α parameter or proportion of invariant sites indicated on the X-axis. The transition-transversion parameter was estimated for each specified rate-heterogeneity parameter. p(Δ|L_Amb - L_grasses|) signifies the SH-test p-value for the difference between the likelihood scores of the two topologies. Bootstrap searches and SH-tests were performed as described in Methods.
Figure 6. Support for [...]. A. ML HKY85 analyses with four gamma-distributed rate categories. Parameter estimates were calculated individually for each gene in a manner analogous to that performed on the concatenated dataset. B. MP analyses. All three codon positions are included in all analyses shown in both panels. Solid red lines correspond to Amborella-sister and dashed blue lines to grasses-sister topologies.
Figure 7. Inclusion of [...]. A. ML HKY85 with no rate categories (cf. Fig. 4A). B. ML HKY85 with four gamma-distributed rate categories (cf. Fig. 4B). C. ML with estimated proportion of invariant sites (no gamma rate categories; cf. Fig. 4C). D. NJ using an ML HKY85 model with four gamma-distributed rate categories to calculate distances (cf. Fig. 3B). All analyses used first and second positions only.