Akito Y Kawahara, Issei Ohshima, Atsushi Kawakita, Jerome C Regier, Charles Mitter, Michael P Cummings, Donald R Davis, David L Wagner, Jurate De Prins, Carlos Lopez-Vaamonde.
Abstract
BACKGROUND: Researchers conducting molecular phylogenetic studies must frequently decide what to do when key nodes receive weak branch support. One solution is to sequence additional orthologous genes of appropriate evolutionary rate for the taxa in the study. However, generating large, complete data matrices becomes increasingly difficult as the number of characters increases. A few empirical studies have shown that adding genes for even a subset of taxa can improve branch support. Because each study differs in the number of characters and taxa, additional studies are still needed to examine whether incomplete sampling designs are likely to help increase deep node resolution. We target Gracillariidae, a Cretaceous-age (~100 Ma) group of leaf-mining moths, to test whether adding genes for a subset of taxa can improve branch support for deep nodes. We initially sequenced ten genes (8,418 bp) for 57 taxa that represent the major lineages of Gracillariidae plus outgroups. After finding that many deep divergences remained weakly supported, we sequenced eleven additional genes (6,375 bp) for a 27-taxon subset. We then compared results from different data sets to assess whether one sampling design can be favored over another. The concatenated data set comprising all genes and all taxa, and three other data sets with different taxon and gene sub-sampling designs, were analyzed with maximum likelihood. Each data set was analyzed under five different models and partitioning schemes of non-synonymous and synonymous changes. Statistical significance of non-monophyly was examined with the Approximately Unbiased (AU) test.
Year: 2011 PMID: 21702958 PMCID: PMC3145599 DOI: 10.1186/1471-2148-11-182
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. Four data sets representing different taxon/gene sampling strategies. A) 27 taxa × 10 genes, B) 27 taxa × 21 genes, C) 57 taxa × 10 genes, D) combination of B and C into a single data set with a block of missing data accounting for approximately a quarter of total data.
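The "approximately a quarter" figure for data set D can be checked with simple arithmetic. The sketch below is illustrative only: it uses the taxon counts and sequence lengths given in the abstract and assumes that the only missing cells are the eleven additional genes for the taxa outside the 27-taxon subset (any other scattered missing data is ignored). The function and variable names are hypothetical.

```python
# Dimensions taken from the abstract and the Figure 1 caption.
TAXA_ALL = 57        # taxa sequenced for the initial ten genes
TAXA_SUBSET = 27     # taxa sequenced for all 21 genes
BP_INITIAL = 8418    # total length of the ten initial genes (bp)
BP_ADDED = 6375      # total length of the eleven additional genes (bp)


def missing_fraction(taxa_all, taxa_subset, bp_initial, bp_added):
    """Fraction of matrix cells that fall in the unsequenced block.

    Assumption: the only missing cells are the added genes for the
    taxa that were not part of the subset.
    """
    total_cells = taxa_all * (bp_initial + bp_added)
    missing_cells = (taxa_all - taxa_subset) * bp_added
    return missing_cells / total_cells


fraction_d = missing_fraction(TAXA_ALL, TAXA_SUBSET, BP_INITIAL, BP_ADDED)
print(f"Missing block in data set D: {fraction_d:.1%}")
```

Under these assumptions the block comes to a little under 23% of the matrix, which is consistent with the caption's "approximately a quarter".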
Representation of genes and their amplicon names in each of the four data sets.
| Amplicon | Gene | Length (bp) | Data set A | Data set B | Data set C | Data set D |
|---|---|---|---|---|---|---|
| 40fin2_3 | Phosphogluconate dehydrogenase | 750 | | X | | X |
| 42fin1_2 | Putative GTP-binding protein | 840 | | X | | X |
| 109fin1_2 | Gelsolin | 552 | X | X | X | X |
| 192fin1_2 | Glutamyl- & prolyl-tRNA synthetase | 402 | | X | | X |
| 197fin1_2 | Triosephosphate isomerase | 444 | | X | | X |
| 262fin1_2 | Proteasome subunit | 501 | | X | | X |
| 265fin2_3 | Histidyl-tRNA synthetase | 447 | X | X | X | X |
| 268fin1_2 | AMP deaminase | 768 | X | X | X | X |
| 3007fin1_2 | Glucose phosphate dehydrogenase | 621 | X | X | X | X |
| 3017fin1_2 | Tetrahydrofolate synthase | 594 | | X | | X |
| 3070fin4_5 | Alanyl-tRNA synthetase | 705 | | X | | X |
| 8028fin1_2 | Nucleolar cysteine-rich protein | 324 | | X | | X |
| 8091fin1_2 | Glucose phosphate isomerase | 666 | | X | | X |
| acc2_4 | Acetyl-coA carboxylase | 501 | X | X | X | X |
| CAD | Pyrimidine biosynthesis | 2913 | X | X | X | X |
| DDC | Dopa-decarboxylase | 708 | X | X | X | X |
| EF-1alpha | Elongation factor-1 alpha | 519 | X | X | X | X |
| enolase | Enolase | 1134 | X | X | X | X |
| histone 3 | Histone 3 | 273 | X | X | X | X |
| period | Period | 747 | | X | | X |
| wingless | Wingless | 402 | | X | | X |
A box with an "X" indicates a gene that was included in that particular data set.
Bootstrap support values across data sets for selected clades.
| Data set | Analysis | 'G.B.R.Y.' clade | Gracillariidae + Bucculatricidae + Yponomeutidae | Gracillariidae + Roeslerstammiidae + Yponomeutidae | Gracillariidae + Yponomeutidae ('G.Y.' clade) | Gracillariidae | Lithocolletinae + | Phyllocnistinae + Oecophyllembiinae + | |||
|---|---|---|---|---|---|---|---|---|---|---|---|
| A | nt123 | [< 50] | [< 50] | < 50 | < 50 | 99 | N/A | 87 | N/A | 100 | N/A |
| A | codon | [< 50] | [< 50] | [< 50] | [< 50] | 99 | N/A | 86 | N/A | 100 | N/A |
| A | degen | 74 | 55 | [< 50] | [< 50] | 100 | N/A | 99 | N/A | 100 | N/A |
| A | partitioned | [< 50] | [< 50] | [< 50] | [< 50] | 99 | N/A | 89 | N/A | 100 | N/A |
| A | aa | [< 50] | [< 50] | [< 50] | [< 50] | 97 | N/A | 90 | N/A | 100 | N/A |
| B | nt123 | [ | [< 50] | < 50 | [< 50] | 100 | N/A | 98 | N/A | 100 | N/A |
| B | codon | [ | [< 50] | < 50 | [< 50] | 100 | N/A | 93 | N/A | 100 | N/A |
| B | degen | 90 | [< 50] | < 50 | [< 50] | 100 | N/A | 92 | N/A | 100 | N/A |
| B | partitioned | [ | [< 50] | [< 50] | [< 50] | 100 | N/A | 94 | N/A | 100 | N/A |
| B | aa | 66 | [< 50] | [< 50] | < 50 | 98 | N/A | 97 | N/A | 100 | N/A |
| C | nt123 | [< 50] | [< 50] | [< 50] | [< 50] | 99 | 100 | 98 | 71 | 100 | [< 50] |
| C | codon | [< 50] | [< 50] | [< 50] | [< 50] | 99 | 100 | 98 | 58 | 100 | [< 50] |
| C | degen | [< 50] | [< 50] | [< 50] | [< 50] | 100 | 100 | 100 | 89 | 100 | [< 50] |
| C | partitioned | [< 50] | [< 50] | [< 50] | [< 50] | 99 | 100 | 97 | 77 | 100 | [< 50] |
| C | aa | [< 50] | < 50 | [< 50] | [< 50] | 90 | 100 | 93 | < 50 | 100 | [< 50] |
| D | nt123 | [ | [< 50] | < 50 | [< 50] | 99 | 100 | 100 | 67 | 100 | 51 |
| D | codon | [ | [< 50] | < 50 | < 50 | 97 | 100 | 100 | 100 | 100 | < 50 |
| D | degen | 83 | [< 50] | < 50 | < 50 | 100 | 100 | 100 | 93 | 100 | [< 50] |
| D | partitioned | [ | [< 50] | < 50 | [< 50] | 99 | 100 | 97 | 67 | 100 | 51 |
| D | aa | 75 | [< 50] | [< 50] | < 50 | 89 | 100 | 94 | < 50 | 100 | < 50 |
Square brackets indicate support values for clades that were not present in the ML tree.
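For readers unfamiliar with how such values arise, bootstrap support for a clade is conventionally the percentage of bootstrap replicate trees in which that clade appears. The following is a minimal sketch of that calculation, not the authors' pipeline: the record does not name the software used, the tree representation (a set of clades, each a frozenset of taxon names) is a simplification, and the taxon labels `t1` to `t4` and both function names are hypothetical.

```python
def clade_support(clade, replicate_trees):
    """Percentage of bootstrap replicate trees that contain `clade`.

    Each replicate tree is represented as a set of clades, and each
    clade as a frozenset of taxon names (a simplifying assumption).
    """
    clade = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if clade in tree)
    return 100.0 * hits / len(replicate_trees)


def format_support(value):
    """Report support the way the table does: values under 50 become '< 50'."""
    return "< 50" if value < 50 else str(round(value))


# Hypothetical toy replicates over four placeholder taxa (not real data).
replicates = [
    {frozenset({"t1", "t2"}), frozenset({"t1", "t2", "t3"})},
    {frozenset({"t1", "t2"}), frozenset({"t3", "t4"})},
    {frozenset({"t1", "t3"}), frozenset({"t1", "t2", "t3"})},
    {frozenset({"t1", "t2"}), frozenset({"t1", "t2", "t3"})},
]

support_12 = clade_support({"t1", "t2"}, replicates)
support_34 = clade_support({"t3", "t4"}, replicates)
print(format_support(support_12), "|", format_support(support_34))
```

Real analyses use hundreds or thousands of replicates; four are used here only to keep the example readable.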
Results of Chi-square tests on nucleotide compositional homogeneity.
| Taxon (number of species) | noLRall1 + nt2 | nt3 |
|---|---|---|
| All (27) | > 0.999 | < 0.001 |
| Gracillariidae (11) | > 0.999 | < 0.001 |
| Oecophyllembiinae | 0.969 | < 0.001 |
| Bucculatricidae + Tineidae (3) | 0.953 | < 0.001 |
| Bucculatricidae + Outgroups + | > 0.999 | < 0.001 |
| Outgroups + | > 0.999 | < 0.001 |
| Total number of characters | 8701 | 4937 |
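A chi-square test of compositional homogeneity compares each taxon's observed base counts with the counts expected if all taxa shared the pooled composition. The sketch below illustrates the statistic under that standard formulation; it is not taken from the source, which does not say which software was used. The function name, taxon labels, and counts are hypothetical, and converting the statistic to a P value (as reported in the table) would additionally require a chi-square distribution function, which is omitted here.

```python
def chi_square_homogeneity(counts):
    """Chi-square statistic and degrees of freedom for base composition.

    `counts` maps each taxon name to its (A, C, G, T) counts.
    Expected counts assume every taxon shares the pooled composition.
    """
    rows = list(counts.values())
    n_cols = len(rows[0])
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(row[j] for row in rows) for j in range(n_cols)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for j, observed in enumerate(row):
            expected = row_total * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (n_cols - 1)
    return statistic, dof


# Hypothetical counts: two taxa with identical composition ...
stat_same, dof_same = chi_square_homogeneity(
    {"taxon_a": (30, 20, 20, 30), "taxon_b": (30, 20, 20, 30)}
)
# ... and two taxa with strongly divergent composition.
stat_skew, dof_skew = chi_square_homogeneity(
    {"taxon_a": (40, 10, 10, 40), "taxon_b": (10, 40, 40, 10)}
)
print(stat_same, dof_same, stat_skew, dof_skew)
```

A statistic near zero corresponds to a P value near 1 (homogeneous composition, as in the first data column of the table), while a large statistic corresponds to a small P value (heterogeneous composition, as reported for nt3).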
Results of Approximately Unbiased (AU) significance tests [67] for non-monophyly of predicted clades for data sets C and D.
| Predicted clade | nt123 | codon | degen | AA |
|---|---|---|---|---|
| Gracillarioidea | 0.083/< | 0.081/ | ||
| Gracillariinae + Lithocolletinae | 0.455/0.161 | 0.152/0.119 | ||
| Gracillariinae | ||||
| Gracillariinae minus | 0.107/0.084 | 0.104/0.202 | ||
| Oecophyllembiinae | 0.467/0.165 | 0.385/0.739 | 0.339/0.352 | 0.472/0.073 |
nt123, all nucleotides; codon, codon model; degen, degeneracy1; AA, amino acids. Groups that were significant at alpha = 0.05 are shown in bold.
Figure 2. Maximum likelihood degen1 tree for data set D. Large numbers denote six major clades in Gracillariidae (see Results). Asterisks indicate taxa sequenced for 21 genes. Hyphens denote support values < 50%. Square brackets, shown only for nodes with support > 50% that conflict with the nt123 ML tree, denote groupings not present in the ML tree for that analysis. Green branches lead to taxa placed in Gracillariinae. Morphological and behavioral traits that are characteristic of each group are also noted.
Figure 3. ML trees based on non-synonymous differences only (degen1) of data sets A through C. Bucculatricidae + Gracillariidae + Roeslerstammiidae + Yponomeutidae form a monophyletic group for data sets A and B. Scale bar = 0.02 substitutions/site.