Martijn J T N Timmermans, Christopher Barton, Julien Haran, Dirk Ahrens, C Lorna Culverwell, Alison Ollikainen, Steven Dodsworth, Peter G Foster, Ladislav Bocak, Alfried P Vogler.
Abstract
Mitochondrial genomes are readily sequenced with recent technology and thus evolutionary lineages can be densely sampled. This permits better phylogenetic estimates and assessment of potential biases resulting from heterogeneity in nucleotide composition and rate of change. We gathered 245 mitochondrial sequences for the Coleoptera representing all 4 suborders, 15 superfamilies of Polyphaga, and altogether 97 families, including 159 newly sequenced full or partial mitogenomes. Compositional heterogeneity greatly affected 3rd codon positions, and to a lesser extent the 1st and 2nd positions, even after RY coding. Heterogeneity also affected the encoded protein sequence, in particular in the nad2, nad4, nad5, and nad6 genes. Credible tree topologies were obtained with the nhPhyML ("nonhomogeneous") algorithm implementing a model for branch-specific equilibrium frequencies. Likelihood searches using RAxML were improved by data partitioning by gene and codon position. Finally, the PhyloBayes software, which allows different substitution processes for amino acid replacement at various sites, produced a tree that best matched known higher level taxa and defined basal relationships in Coleoptera. After rooting with Neuropterida outgroups, suborder relationships were resolved as (Polyphaga (Myxophaga (Archostemata + Adephaga))). The infraorder relationships in Polyphaga were (Scirtiformia (Elateriformia ((Staphyliniformia + Scarabaeiformia) (Bostrichiformia (Cucujiformia))))). Polyphagan superfamilies were recovered as monophyla except Staphylinoidea (paraphyletic for Scarabaeiformia) and Cucujoidea, which can no longer be considered a valid taxon. The study shows that, although compositional heterogeneity is not universal, it cannot be eliminated for some mitochondrial genes, but dense taxon sampling and the use of appropriate Bayesian analyses can still produce robust phylogenetic trees.
Keywords: PhyloBayes; RY coding; long-range PCR; mitogenomes; mixture models; rogue taxa
Year: 2015 PMID: 26645679 PMCID: PMC4758238 DOI: 10.1093/gbe/evv241
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Figure. The tree of Coleoptera based on protein-coding genes obtained with PhyloBayes. Major groups at the level of superfamily and above are labeled, and each superfamily is illustrated with a representative line drawing. Numbers on the branches represent posterior probabilities. Changes in anticodons of tRNA-Lys (in Chrysomeloidea and in the taxon labeled with a blue triangle) and tRNA-Ala (in Polyphaga and taxa labeled with orange triangles) and several newly discovered gene order changes are mapped on the tree.
Likelihood and AIC Values under Various Partitioning Schemes
| Partitioning | No. of Partitions | Parameters | ln(L) | AIC | ΔAIC | 2 × ln ΔBF | RBF |
|---|---|---|---|---|---|---|---|
| None | 1 | 9 | −1,279,328.877 | 2,558,675.754 | 105,496.41 | 21.76 | 0.059 |
| Forward/Reverse | 2 | 18 | −1,258,902.112 | 2,517,840.225 | 64,660.88 | 20.79 | 0.058 |
| Homogeneous/Heterogeneous | 2 | 18 | −1,273,139.835 | 2,546,315.669 | 93,136.33 | 21.51 | 0.060 |
| Gene | 14 | 126 | −1,256,482.92 | 2,513,217.84 | 60,038.51 | 20.64 | 0.082 |
| Codon 1 + 2 + 3 | 3 | 27 | −1,251,864.871 | 2,503,783.742 | 50,604.41 | 20.30 | 0.058 |
| Codon 1 + 2 + 3 + Forward/Reverse | 6 | 54 | −1,229,360.303 | 2,458,828.606 | 5,649.26 | 16.11 | 0.050 |
| Gene × codon | 42 | 378 | −1,226,211.669 | 2,453,179.339 | n/a | n/a | n/a |
Note.—The likelihood of the data under each partitioning scheme was assessed on the fixed topology of a randomized parsimony tree under a GTR + G model; the number of partitions, free parameters, and ln(L) scores used in the calculations are given. ΔAIC is the difference in AIC relative to the most complex model (partitioning by gene and codon position). Values of 2 × ln ΔBF10 > 10 are usually considered highly significant. RBF was calculated according to Castoe et al. (2005) as 2 × ln ΔBF10/Δ parameters, to penalize greater model complexity.
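The derived columns of the partitioning table can be recomputed from the parameter counts and ln(L) scores. The sketch below is illustrative only and is not the authors' script. It uses the standard definition AIC = 2k − 2 ln(L). It also assumes that the 2 × ln ΔBF column is twice the natural log of the ln(L) difference to the gene × codon scheme; this is an assumption, although it does reproduce the tabulated values. The function and variable names are hypothetical.

```python
import math

# (scheme, free parameters, ln L) copied from the table above.
# "Gene x codon" is written with an ASCII x for convenience.
SCHEMES = [
    ("None", 9, -1279328.877),
    ("Forward/Reverse", 18, -1258902.112),
    ("Homogeneous/Heterogeneous", 18, -1273139.835),
    ("Gene", 126, -1256482.92),
    ("Codon 1 + 2 + 3", 27, -1251864.871),
    ("Codon 1 + 2 + 3 + Forward/Reverse", 54, -1229360.303),
    ("Gene x codon", 378, -1226211.669),
]


def aic(ln_l, k):
    """Akaike information criterion for k free parameters."""
    return 2 * k - 2 * ln_l


def compare_to_reference(schemes, reference="Gene x codon"):
    """Return {scheme: (delta AIC, 2 ln dBF, RBF)} relative to the reference."""
    ref_k, ref_ln_l = next((k, l) for name, k, l in schemes if name == reference)
    ref_aic = aic(ref_ln_l, ref_k)
    rows = {}
    for name, k, ln_l in schemes:
        if name == reference:
            continue
        delta_aic = aic(ln_l, k) - ref_aic
        # Assumption: 2 x ln dBF is taken as 2 * ln(difference in ln L).
        two_ln_bf = 2 * math.log(ref_ln_l - ln_l)
        # RBF divides by the difference in free parameters (Castoe et al. 2005).
        rbf = two_ln_bf / (ref_k - k)
        rows[name] = (delta_aic, two_ln_bf, rbf)
    return rows


if __name__ == "__main__":
    for name, (d_aic, bf, rbf) in compare_to_reference(SCHEMES).items():
        print(f"{name:36s} dAIC={d_aic:12.2f}  2lnBF={bf:6.2f}  RBF={rbf:.3f}")
```

Under these assumptions the unpartitioned scheme gives ΔAIC ≈ 105,496.4, 2 × ln ΔBF ≈ 21.76, and RBF ≈ 0.059, matching the first row of the table.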
Compositional Heterogeneity in Mitogenomes
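| Gene | n missing | Conventional chi-square, 1st | Conventional chi-square, 2nd | Conventional chi-square, 1st RY | Null distribution, 1st RY | Null distribution, 2nd | Null distribution, 1st two-state | No rogue, 1st RY | No rogue, 2nd | No rogue, 1st two-state | Protein |
|---|---|---|---|---|---|---|---|---|---|---|---|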
| atp6 | 22 | 0.0999 | 1 | 1 | 1 | 1 | 0.2 | 1 | 1 | 0.21 | 1 |
| atp8 | 1 | 1 | 1 | 1 | 1 | 0.36 | 0.53 | 1 | 0.24 | 0.51 | 0.02 |
| cox1–5′ | 142 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.85 |
| cox1–3′ | 43 | 1 | 1 | 1 | 1 | 1 | 0.95 | 1 | 1 | 0.99 | 1 |
| cox2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| cox3 | 2 | 1 | 1 | 1 | 1 | 1 | 0.82 | 1 | 1 | 0.85 | 0.98 |
| Cytb | 1 | 0.006 | 1 | 1 | 1 | 0.99 | 0.94 | 1 | 0.93 | 0.99 | 0.03 |
| nad1 | 8 | 1 | 1 | 1 | 1 | 1 | 0.34 | 1 | 0.98 | 0.42 | 0.79 |
| nad2 | 148 | 0 | 0.981 | 1 | 1 | 0 | 0 | 0.5 | 0 | 0 | 0 |
| nad3 | 2 | 1 | 1 | 1 | 1 | 0.22 | 0.12 | 1 | 0.14 | 0.22 | 0.45 |
| nad4 | 5 | 0 | 1 | 1 | 1 | 0.01 | 0.01 | 1 | 0 | 0 | 0 |
| nad4L | 5 | 1 | 1 | 1 | 1 | 0.96 | 0.95 | 1 | 0.96 | 0.99 | 0.73 |
| nad5 | 5 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| nad6 | 5 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0.01 | 0 | 0 |
Note.—Each gene was tested for the probability that the data are homogeneous and P values are provided in the table, separately for 1st and 2nd codon positions. Significance of the chi-square statistic was assessed either with the chi-square curve (“conventional chi-square”) or using a null distribution as described in Foster (2004). Note that four loci generally have low probability of homogeneity throughout. n missing, mitogenomes in the matrix not sequenced for a locus; no rogue, analysis conducted with rogue taxa omitted; protein, analysis based on amino acid sequence.
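As a rough illustration of the quantities behind this table, the sketch below extracts a codon position, applies RY coding (purines A/G to R, pyrimidines C/T to Y), and computes the chi-square statistic for homogeneity of composition across taxa. It is a minimal sketch with hypothetical function names, not the authors' pipeline. It stops at the statistic and its degrees of freedom. The "conventional" P value would come from the chi-square distribution, whereas the null distribution of Foster (2004) is obtained by simulation and is not reproduced here.

```python
from collections import Counter

# RY coding: purines -> R, pyrimidines -> Y.
RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def codon_position(seq, pos):
    """Return the nucleotides at codon position pos (1, 2 or 3) of an in-frame sequence."""
    return seq[pos - 1::3]


def ry_code(seq):
    """Recode A/G as R and C/T as Y; any other symbol is kept unchanged."""
    return "".join(RY.get(base, base) for base in seq.upper())


def chi_square_homogeneity(seqs, states):
    """Chi-square statistic and degrees of freedom for equal composition across taxa.

    seqs maps taxon name -> sequence; states lists the symbols to count
    (e.g. "ACGT" or "RY"). Symbols outside states are ignored.
    """
    counts = [Counter(s) for s in seqs.values()]
    row_totals = [sum(c[x] for x in states) for c in counts]
    col_totals = [sum(c[x] for c in counts) for x in states]
    grand_total = sum(row_totals)
    stat = 0.0
    for c, row_total in zip(counts, row_totals):
        for x, col_total in zip(states, col_totals):
            expected = row_total * col_total / grand_total
            if expected > 0:
                stat += (c[x] - expected) ** 2 / expected
    df = (len(seqs) - 1) * (len(states) - 1)
    return stat, df


if __name__ == "__main__":
    # Toy, made-up sequences purely for demonstration.
    toy = {"taxon_a": "ATGGCAATT", "taxon_b": "ATGACTGTA"}
    first_ry = {name: ry_code(codon_position(s, 1)) for name, s in toy.items()}
    print(first_ry, chi_square_homogeneity(first_ry, "RY"))
```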
Recovery of Key Nodes and Other Features in Trees Obtained from Different Analyses with PhyML, RAxML, or PhyloBayes (PB), Before and After Removing the Compositionally Heterogeneous Markers nad2, nad4, nad5, and nad6 ("excluding heterogeneous")
| Taxon | PhyML: PhyML | PhyML: nhPhyML | RAxML: Unpartitioned | RAxML: Partitioned | RAxML: Partitioned, no 12S/16S | RAxML: RY code | RAxML: RY code, no 12S/16S | RAxML: Excluding heterogeneous | RAxML: Amino acid | PhyloBayes: Plus outgroups | PhyloBayes: No outgroups | PhyloBayes: Excluding rogue | PhyloBayes: Excluding heterogeneous | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Position in | 1 | 2 | 3 | 4 | 4 | x | X | 5 | 6 | 7 | x | 8 | 9 | |||
| All suborders monophyletic | N | Y | N | Y | Y | Y | Y | N | N | Y | Y | Y | N | |||
| Suborder relationships | n/a | P (Ar (M + Ad)) | (P + Ar) (M + Ad) | P (Ar (M + Ad)) | P (Ar (M + Ad)) | P (Ar (M + Ad)) | P (Ar (M + Ad)) | n/a | (P + Ar) (M + Ad) | P (M (Ar + Ad)) | P (M (Ar + Ad)) | P (M (Ar + Ad)) | P (M + Ar + Ad) | |||
| Geadephaga | M* | M* | M* | M* | M* | M | M | M* | M | M* | M* | M | M* | |||
| Elateriformia | P | M | P | M | M | M | M | M | P | M | M | M | M | |||
| Staphyliniformia + Scarabaeiformia | P | M | M | M | M | M | M | M | M | M | M | M | M | |||
| Scarabaeiformia | P | M | M | M | M | M | M | M | M | M | M | M | M | |||
| Bostrichiformia | P | M | M | M | M | M | M | P | M | M | M | M | M | |||
| Bostrichiformia sister | n/a | Elat | Elat | Elat | Elat | Cuc | Cuc | Cuc | Cuc | Cuc | Cuc | Cuc | Cuc | |||
| Cucujiformia | M | M | M | M | M | M | M | M | M | M | M | M | M | |||
| Cleroidea | M | M | M | M | M | M | M | P | M | M | M | M | M | |||
| Cerylonid Series | M | M | M | M | M | M | M | M | M | M | M | M | M | |||
| Nitidulid Series | M | P | M | M | M | M | M | P | P | U | P | M | P | |||
| Cucujoid Series | M | M | M | M | M | M | M | M | M | M | P | M | P | |||
| Nitidulid + Cucujoid | M | M | M | M | M | M | M | M | M | U | U | M | M | |||
| Tenebrionoidea + Lymexyloidea | M | M | M | M | M | M | M | M | M | M | M | M | M | |||
| Ten. + Lym. recipr.monophyly | N | Y | Y | N | N | N | N | N | Y | Y | Y | Y | Y | |||
| Chrysomeloidea | P | P | P | P | P | P | P | P | P | M | M | M | P | |||
| Curculionoidea | P | P | P | P | P | P | P | P | N | M | M | P | P | |||
| Chrys. + Curc. recipr. monophyly | N | N | N | N | N | N | N | N | N | Y | Y | Y | Y | |||
Note.—RAxML trees were obtained with the RY-coded 1st positions and 3rd positions removed, or on all data (including the rRNA genes). All PhyloBayes analyses were conducted on the amino acid coded matrix. M, monophyletic; P, paraphyletic or polyphyletic; U, unresolved, consistent with monophyly; Y, yes, a feature is present; N, no, a feature is not present. In the suborder relationships row, P, Polyphaga; Ar, Archostemata; M, Myxophaga; Ad, Adephaga. In the Bostrichiformia sister row, Elat, Elateriformia; Cuc, Cucujiformia. In some cases, the groups were recovered but with certain member taxa absent (−) or other taxa included (+), as indicated. Note that Sphindus (Sphindidae) was disregarded when scoring the Nitidulid and Cucujoid series. The asterisks mark the trees that are monophyletic for Geadephaga only if Habrodera (Cicindelidae) is disregarded.
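Scoring a group as monophyletic (M) or not (P) amounts to asking whether some clade of the rooted tree contains exactly the members of that group. The sketch below illustrates the idea on the suborder topology reported in the abstract, written as nested tuples. It is a simplified illustration with hypothetical names, not the procedure used to build the table, and it ignores the special cases (excluded rogue taxa, unresolved nodes) described in the note.

```python
# Rooted suborder topology from the abstract:
# (Polyphaga (Myxophaga (Archostemata + Adephaga)))
SUBORDER_TREE = ("Polyphaga", ("Myxophaga", ("Archostemata", "Adephaga")))


def leaves(tree):
    """Return the set of tip names below a node (a tip is a plain string)."""
    if isinstance(tree, str):
        return {tree}
    tips = set()
    for child in tree:
        tips |= leaves(child)
    return tips


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly the tips in group."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


if __name__ == "__main__":
    print(is_monophyletic(SUBORDER_TREE, {"Archostemata", "Adephaga"}))  # sister suborders
    print(is_monophyletic(SUBORDER_TREE, {"Myxophaga", "Adephaga"}))     # not a clade here
```

With tips at species level rather than suborder level, the same check could be applied to each superfamily or series listed in the table, provided the tree is rooted.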
Figure. Mean branch length for major groups at suborder and superfamily levels. The corresponding numbers for the amino acid tree are provided in supplementary figure 4, Supplementary Material online.
Figure. Schematic representation of the basal relationships from mitogenome sequences. The tree is based on the PhyloBayes analysis of figure 1, with outgroups removed. Key nodes were scored for nine trees obtained in various analyses described in table 3.