Linear Plasmids and the Rate of Sequence Evolution in Plant Mitochondrial Genomes.

Jessica M Warren, Mark P Simmons, Zhiqiang Wu, Daniel B Sloan.

Abstract

The mitochondrial genomes of flowering plants experience frequent insertions of foreign sequences, including linear plasmids that also exist in standalone forms within mitochondria, but the history and phylogenetic distribution of plasmid insertions are not well known. Taking advantage of the increased availability of plant mitochondrial genome sequences, we performed phylogenetic analyses to reconstruct the evolutionary history of these plasmids and plasmid-derived insertions. Mitochondrial genomes from multiple land plant lineages (including liverworts, lycophytes, ferns, and gymnosperms) include fragmented remnants from ancient plasmid insertions. Such insertions are much more recent and widespread in angiosperms, in which approximately 75% of sequenced mitochondrial genomes contain identifiable plasmid insertions. Although conflicts between plasmid and angiosperm phylogenies provide clear evidence of repeated horizontal transfers, we were still able to detect significant phylogenetic concordance, indicating that mitochondrial plasmids have also experienced sustained periods of (effectively) vertical transmission in angiosperms. The observed levels of sequence divergence in plasmid-derived genes suggest that nucleotide substitution rates in these plasmids, which often encode their own viral-like DNA polymerases, are orders of magnitude higher than in mitochondrial chromosomes. Based on these results, we hypothesize that the periodic incorporation of mitochondrial genes into plasmids contributes to the remarkable heterogeneity in substitution rates among genes that has recently been discovered in some angiosperm mitochondrial genomes. In support of this hypothesis, we show that the recently acquired ψtrnP-trnW gene region in a maize linear plasmid is evolving significantly faster than homologous sequences that have been retained in the mitochondrial chromosome in closely related grasses.
© The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  DNA polymerase; angiosperms; mitochondrial plasmids; mtDNA; mutation rate

Year:  2016        PMID: 26759362      PMCID: PMC4779610          DOI: 10.1093/gbe/evw003

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Introduction

Identifying the factors that determine rates of DNA sequence evolution remains a fundamental challenge in the field of molecular evolution. Land plant mitochondrial genomes offer valuable opportunities for pursuing this challenge because they are some of the slowest evolving eukaryotic genomes ever identified (Wolfe et al. 1987; Drouin et al. 2008; Richardson et al. 2013). Rates of nucleotide substitution in plant mitochondrial DNA (mtDNA) are generally slower than in the plastid and nuclear genomes. The relatively low mitochondrial substitution rates are a derived state in land plants (Smith et al. 2014) and contrast with the rapid sequence evolution in mitochondrial genomes in many other eukaryotes, including bilaterian animals and yeast (Brown et al. 1979; Lynch et al. 2008). Plants also exhibit remarkable heterogeneity in mitochondrial substitution rates. In several angiosperm lineages, there have been mysterious genome-wide increases in rates of mtDNA evolution (Cho et al. 2004; Parkinson et al. 2005; Sloan, Alverson, et al. 2012; Skippington et al. 2015). In even more puzzling cases, some species have experienced massive gene-specific accelerations, while the rest of the mitochondrial genes maintain typically slow rates of nucleotide substitution (Mower et al. 2007; Sloan et al. 2009). For example, protein-coding genes within the Ajuga reptans mitochondrial genome differ by 340-fold in synonymous (i.e., “silent”) substitution rates (Zhu et al. 2014). These increases in nucleotide substitution rate have been interpreted as resulting from changes in underlying mutation rates, but the specific mechanisms remain elusive. Another important feature of many plant mitochondrial genomes is their propensity to acquire foreign or “promiscuous” DNA, which comes from diverse sources including the nucleus, plastids, bacteria, viruses, and mitochondria from other plant species (Ellis 1982; Knoop et al. 2011; Mower et al. 2012). 
In addition, plant mitochondria contain linear plasmids that are similar to mitochondrial plasmids found in other eukaryotic lineages and were likely acquired by horizontal transmission from fungi (Handa 2008). The genealogical relationships among angiosperm plasmid and plasmid-derived genes conflict with established angiosperm phylogenetic relationships (Robison and Wolyn 2005; McDermott et al. 2008), indicating a possible history of horizontal transfer among flowering plants, which has also been observed for plant mitochondrial genes, introns, and even entire genomes (Bergthorsson et al. 2003; Sanchez-Puerta et al. 2008; Rice et al. 2013; Park et al. 2015). Linear plasmids can exist as standalone extrachromosomal elements, but their sequences can also be physically integrated into the mitochondrial genome. They are not known to encode an integrase function but instead undergo recombination involving repeated sequences that are shared between plasmids and the mitochondrial chromosome (Brown and Zhang 1995). Plant mitochondrial plasmids often contain DNA polymerase (DPO) and RNA polymerase (RPO) genes, suggesting that they are capable of autonomous replication and transcription (Kuzmin and Levchenko 1987). The plasmid-encoded DPO genes are related to family B DNA polymerases found in some viruses (Knopf 1998; Filee et al. 2002) and are clearly distinct from the nuclear-encoded Pol I-like polymerases that are responsible for replication of plant mitochondrial and plastid genomes (Cupp and Nielsen 2014). Mitochondrial plasmids also exhibit lower guanine-cytosine (GC) content (Handa 2008) than the rest of the mitochondrial genome (Sloan and Taylor 2010), further supporting the interpretation that they are replicated independently.
Analysis of complete mitochondrial genomes in Zea mays has found that integrated copies of linear plasmids have a disproportionately large number of single nucleotide polymorphisms (SNPs; Allen et al. 2007), which raises the possibility that these plasmids may contribute to elevated and variable rates of sequence evolution. Plasmids are capable of taking up genes from the mitochondrial genome (Leon et al. 1989), and plasmid-derived sequences have been transferred to the mitochondrial chromosome in both angiosperms (McDermott et al. 2008) and liverworts (Weber et al. 1995). However, our understanding of the phylogenetic distribution and evolutionary history of plasmid-derived sequences remains limited. Here, we take advantage of the large number of green plant mitochondrial genomes that are now available to address the following questions: 1) How widely distributed are mitochondrial plasmid sequences among the major lineages of green plants? 2) To what extent is the diversity of plasmid-derived sequences consistent with a history of vertical versus horizontal transmission? and 3) Do the rates of sequence evolution in plasmid-derived sequences differ from those of typical plant mitochondrial genes?

The Phylogenetic Distribution and Evolutionary History of Linear Plasmids in Plant Mitochondria

Plasmid-derived sequences are widespread in land plant mitochondrial genomes. We performed a BLAST-based search for DPO and/or RPO genes in all sequenced green plant mitochondrial genomes and found hits below an e-value threshold of 1 × 10⁻⁶ in all major embryophyte lineages except hornworts and mosses (fig. 1 and supplementary table S1, Supplementary Material online). No green algal matches met this significance threshold, but a TBLASTN search returned a weak hit (e-value of 3 × 10⁻⁵) aligning a small portion of an RPO open reading frame (ORF) from the angiosperm Lolium perenne to the mtDNA of the chlorophyte Pseudendoclonium akinetum (56 amino acids with 46% identity and 1 indel). This region (nucleotide position 10,486–10,653) is not found within any ORF or annotated gene within the P. akinetum mitochondrial genome (GenBank accession NC_005926). Therefore, although it is possible that the hit represents a small fragment of an ancient linear plasmid insertion in this chlorophyte lineage, we are not able to draw any definitive conclusions about the origins of this short sequence or about the history, if any, of linear mitochondrial plasmids in P. akinetum.
Fig. 1. Summary of the distribution of plasmid-derived insertions of DPO and RPO genes in mitochondrial genomes across the green plant phylogeny as detected by BLAST searches (supplementary table S1, Supplementary Material online). The phylogenetic relationships follow the reconstruction in figure 2 from Wickett et al. (2014).

Plasmid-derived sequence was most abundant in angiosperm mtDNA (fig. 1 and supplementary table S1, Supplementary Material online), with 74.5% of surveyed angiosperm mitochondrial genomes having significant similarity to DPO and/or RPO sequences. Identified sequences within angiosperms were also much more intact than those in other land plants, with many ORFs >3 kb in length. In contrast, the longest sequence fragments detected outside angiosperms were <600 bp, which suggests that these other land plant lineages may have had ancient associations with mitochondrial plasmids that are no longer active. Free plasmids, however, still exist in the mitochondria of many angiosperms (Handa 2008), so it is not surprising that integrated plasmid gene sequences are much more common and intact in flowering plant mtDNA (fig. 1 and supplementary table S1, Supplementary Material online). For multiple reasons, it is likely that we are underestimating the prevalence of plasmid-derived insertions and the distribution of linear mitochondrial plasmids across the plant phylogeny. First, our search was based on only two genes (DPO and RPO), but plasmids sometimes lack one or both of these polymerase genes and usually contain additional genes (predominantly uncharacterized ORFs, which are difficult to detect or compare across species because of their lack of sequence conservation). Therefore, some insertions would be undetectable based on our methods.
Second, the extreme level of sequence divergence in plasmid genes and the fact that copies inserted into mitochondrial chromosomes generally appear to degenerate as pseudogenes make it difficult to detect significant similarity between plasmid-derived sequences that are truly homologous. Finally, linear plasmids may be present in mitochondria without leaving any inserted fragments in the mitochondrial chromosome. For example, the sequenced mitochondrial genomes of Daucus carota (Iorizzo et al. 2012) and Brassica napus (Handa 2003) lacked any detectable insertions (supplementary table S1, Supplementary Material online) even though these species are known to have free plasmids and integrated plasmid sequences have been documented in other D. carota cytotypes (Robison and Wolyn 2005). We performed more detailed phylogenetic and cophylogenetic analyses to infer the transmission history of linear mitochondrial plasmids. These analyses were restricted to angiosperms because the identified sequences outside of flowering plants were too short and fragmented to provide a robust phylogenetic signal. We identified numerous well-supported conflicts between plasmid(-derived) and angiosperm phylogenies (figs. 2 and 3 and supplementary figs. S1–S6 and file S7, Supplementary Material online), which is consistent with previous findings rejecting a single plasmid origin and strict vertical inheritance (Robison and Wolyn 2005; McDermott et al. 2008). In addition, there were instances where multiple copies of DPO and RPO sequences from the same species failed to form monophyletic clades. For example, two divergent copies of the DPO sequence present in the same Ferrocalamus rimosivaginus mitochondrial genome were clearly resolved into two different clades (supplementary figs. S1 and S2, Supplementary Material online).
Therefore, the history of linear mitochondrial plasmids must involve horizontal transfer among angiosperms and/or multiple independent acquisitions from fungi or other taxa. The latter scenario would mean that the closest extant relatives of angiosperm mitochondrial plasmids have yet to be identified because known plasmids in flowering plants appear to form a monophyletic group to the exclusion of fungal sequences (McDermott et al. 2008).
Fig. 2. Partial concatenation tree with reduced terminal sampling (27 terminals) that is arbitrarily rooted using midpoint rooting for consistency with the cophylogenetic analyses. The most likely topology identified by PhyML is shown with likelihood bootstrap (left) and SH-like aLRT (right) values above branches and parsimony bootstrap values (when applicable) below branches. The three cases in which the DPO and RPO sequences from the same genome assembly were not concatenated because of clear topological conflict between the gene trees are labeled, with the exception that the RPO sequence from Rhazya stricta has been excluded. Branches that are contradicted on the strict consensus of most parsimonious trees are indicated by asterisks flanking the highest contradictory jackknife value. The phylogram of this tree is presented in supplementary fig. S7, Supplementary Material online.

Despite the evidence for horizontal transfer, cophylogenetic analyses identified a nonrandom level of topological similarity between plasmid gene trees and the angiosperm phylogeny (fig. 3 and table 1). For all plasmid data sets (DPO, RPO, and a partial concatenation of both genes at different levels of taxon sampling; see Materials and Methods), we found more topological congruence with the angiosperm phylogeny than expected based on random tip-mapping (table 1). Nonrandom congruence between gene trees is typically taken as evidence of cotransmission or cospeciation (Brooks and McLennan 1993; Moran and Baumann 1994), although other mechanisms can potentially create this pattern (de Vienne et al. 2007; Andam et al. 2010). In this case, regions of similarity between plasmid and angiosperm phylogenies most likely reflect sustained periods of vertical transmission or mechanisms of horizontal transmission that favor transfers among very close relatives.
Fig. 3. "Tanglegram" illustrating the similarities and conflicts between the phylogenies of linear mitochondrial plasmids and angiosperms, based on the inferred maximum-likelihood topology (fig. 2) with branches with <50% bootstrap support collapsed into polytomies. This figure was generated with TreeMap v3 build 1243 (https://sites.google.com/site/cophylogeny/home, last accessed December 21, 2015).

Table 1

Summary of Cophylogenetic Analyses

Gene          Species  Sequences  Observed Cost  Random Cost (mean)  P Value
DPO           20       24         27             31.34               0.017
DPO           23       28         32             37.84               0.002
RPO           19       24         26             31.84               0.008
RPO           23       28         32             38.42               0.004
Concatenated  20       27         27             39.85               <0.001
Concatenated  34       39         48             56.38               <0.001

Note: For each data set, the observed cost is the minimum total event cost needed to reconcile the plasmid gene tree with the angiosperm phylogeny. Lower costs indicate more congruent trees. The random cost is the mean over 1,000 permutations of the data set (random tip mappings), and the P value indicates where the observed cost falls within that random distribution. Two different analyses are reported for each gene/concatenation, corresponding to the full and reduced taxon samplings described in the Materials and Methods.
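The permutation logic described in the table note can be illustrated with a short Python sketch. It assumes the P value is simply the fraction of random tip mappings whose reconciliation cost is at least as low as the observed cost; the exact convention used by the cophylogeny software is not stated in the text. The function name and the cost values are hypothetical and are not the study's data.

```python
from statistics import mean

def permutation_p_value(observed_cost, random_costs):
    """Fraction of permuted costs that are as low as or lower than the observed cost.

    Assumption: a lower cost means a more congruent pair of trees, so the
    test is one-sided toward low costs.
    """
    if not random_costs:
        raise ValueError("at least one permuted cost is required")
    as_congruent = sum(1 for cost in random_costs if cost <= observed_cost)
    return as_congruent / len(random_costs)

# Hypothetical reconciliation costs from 10 random tip mappings
# (the analyses described here used 1,000 permutations).
toy_random_costs = [25, 27, 30, 31, 33, 35, 29, 40, 28, 32]
toy_observed_cost = 27

toy_p = permutation_p_value(toy_observed_cost, toy_random_costs)
toy_mean = mean(toy_random_costs)
print(f"mean random cost = {toy_mean}, P = {toy_p}")
```

With 1,000 permutations, a reported value of P < 0.001 would correspond under this convention to no random mapping reaching the observed cost.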

Rapid Evolution of Plasmid and Plasmid-Derived Sequences in Angiosperms

The levels of sequence divergence between plasmid-derived sequences in angiosperm mitochondrial genomes greatly exceed what is typically observed for mitochondrial genes (supplementary fig. S6 and tables S2 and S3, Supplementary Material online). Even after extensive trimming to remove the most variable positions within the DPO and RPO alignments, plasmid-derived ORFs in angiosperms share as little as 52% amino acid identity (supplementary tables S2 and S3, Supplementary Material online). The divergence is even more striking when considered across the entire length of the untrimmed sequences. For example, two DPO sequences obtained from different populations of Silene vulgaris were identified as each other’s closest relatives in the dataset (fig. 2 and supplementary figs. S1 and S2, Supplementary Material online) and yet shared only 51% amino acid identity across their entire lengths. Similarly high levels of divergence were observed between each of these S. vulgaris DPO sequences and the copy found in the mitochondrial genome of its congener Silene latifolia, which diverged ∼5 Ma (Rautenberg et al. 2012). For comparison, there is only 0.2% amino acid sequence polymorphism between the two S. vulgaris populations and only 0.4% fixed divergence with S. latifolia for the set of eight complex I proteins encoded by the mitochondrial genome (Sloan, Muller, et al. 2012). In cases such as this where the plasmid gene trees reflect expected phylogenetic relationships (i.e., all the Silene samples cluster together), the extreme levels of sequence divergence between plasmid-derived genes are likely a result of high rates of sequence evolution rather than ancient divergence times that greatly exceed the divergence times between their angiosperm host species. There is one known case in which a sequence from a plant mitochondrial genome has been transferred to a linear plasmid, which occurred recently in the Z. mays lineage. 
A 474-bp region containing the functional transfer RNA (tRNA) gene trnW and the pseudogene ψtrnP is found in a 2.3-kb linear mitochondrial plasmid in Z. mays (Leon et al. 1989) but located in the mitochondrial chromosome in other grasses, including other Zea species (supplementary fig. S7, Supplementary Material online). This recent transfer event creates an opportunity to directly compare rates of evolution for sequences located on plasmids versus the mitochondrial chromosome. Maximum likelihood analysis of the ψtrnP-trnW region resulted in a longer branch length for Z. mays than in related grasses (fig. 4), indicating an accelerated rate of sequence evolution for the copy in the Z. mays plasmid. Relative rate tests confirmed that there was a statistically significant difference in nucleotide substitution rates between the plasmid-encoded ψtrnP-trnW region in Z. mays and the homologous region in four close relatives: Zea perennis (P = 0.011), Zea luxurians (P = 0.011), Tripsacum dactyloides (P = 0.005), and Sorghum bicolor (P = 0.035). None of the observed substitutions in Z. mays occurred within the functional trnW gene, which is completely identical to the inferred ancestral sequence for grasses (supplementary fig. S7, Supplementary Material online).
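The text does not say which relative rate test was applied. As an illustration only, the Python sketch below uses Tajima's relative rate test, one common choice, which compares the number of sites unique to each of two ingroup sequences given an outgroup and evaluates the difference with a chi-square statistic (1 degree of freedom). The function name and the toy sequences are hypothetical and do not represent the ψtrnP-trnW alignment.

```python
import math

def tajima_relative_rate(seq_a, seq_b, outgroup):
    """Tajima's relative rate test on three aligned nucleotide sequences.

    Returns (m_a, m_b, chi2, p), where m_a and m_b count sites at which
    only seq_a or only seq_b differs from the other two sequences.
    """
    if not (len(seq_a) == len(seq_b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    m_a = m_b = 0
    for a, b, o in zip(seq_a.upper(), seq_b.upper(), outgroup.upper()):
        if not {a, b, o} <= valid:
            continue  # skip gaps and ambiguous bases
        if a != b and b == o:
            m_a += 1  # change unique to lineage A
        elif a != b and a == o:
            m_b += 1  # change unique to lineage B
    if m_a + m_b == 0:
        return m_a, m_b, 0.0, 1.0
    chi2 = (m_a - m_b) ** 2 / (m_a + m_b)
    # Survival function of a chi-square distribution with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2))
    return m_a, m_b, chi2, p

# Toy alignment (made up): lineage A carries four unique changes.
toy_a = "CCCCAAAAAA"
toy_b = "AAAAAAAAAA"
toy_out = "AAAAAAAAAA"
print(tajima_relative_rate(toy_a, toy_b, toy_out))
```

In this framing, the plasmid-borne copy would play the role of `seq_a`, a chromosomal homolog the role of `seq_b`, and a more distant relative the outgroup.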
Fig. 4. Maximum-likelihood phylogram based on the ψtrnP-trnW sequence region that is present in a small linear plasmid in Zea mays but located in the mitochondrial chromosome in related grasses. The branch lengths are scaled to number of nucleotide substitutions per site, illustrating the accelerated rate of sequence evolution in the plasmid copy. The placement of the root of the tree was determined using the outgroup Phoenix dactylifera.

One potential nonbiological explanation for the higher observed levels of sequence divergence in Z. mays is that there were errors that occurred in the original sequencing of the plasmid (Leon et al. 1989). However, resequencing the ψtrnP-trnW region from 15 accessions of Z. mays (including B37, which was used for the original sequencing study) consistently produced a sequence that was almost identical to the previously published sequence except that it differed by a single SNP (supplementary fig. S7 and table S4, Supplementary Material online; GenBank accession KT444594). Repeating the relative rate tests with this new sequence produced qualitatively similar results (data not shown). Although we did not find SNPs within our chosen set of Z. mays samples, there was evidence of length polymorphism within individuals associated with a homopolymer region (positions 116–123 in GenBank accession KT444594). We consistently observed stuttering in sequencing reads after this region, indicating the presence of multiple competing products with varying homopolymer lengths.

Linear Plasmids as Causes of Heterogeneous Substitution Rates in Plant Mitochondrial Genomes

Based on three key observations, we hypothesize that linear plasmids are partially responsible for variation in rates of molecular evolution among genes in angiosperm mitochondrial genomes. First, there is a history of bidirectional transfer of DNA sequence between mitochondrial chromosomes and plasmids. The presence of the trnW gene in the small linear plasmid of Z. mays demonstrates that functional mitochondrial genes can be moved to plasmids (Leon et al. 1989), and whole-genome sequencing has revealed that mitochondrial chromosomes are littered with plasmid-derived insertions (fig. 1 and supplementary table S1, Supplementary Material online). Second, linear plasmids replicate independently of the mitochondrial chromosome and often encode their own viral-like DNA polymerases. Therefore, plasmids may experience more error-prone replication and/or fail to utilize the recombinational repair machinery that is likely responsible for low rates of nucleotide substitutions in plant mtDNA (Christensen 2013, 2014). Third, rates of sequence evolution for plasmid genes appear to be dramatically higher than for the mitochondrial chromosome. Based on these three observations, we propose a simple model in which mitochondrial genes are occasionally transferred to extrachromosomal plasmids, resulting in episodes of accelerated sequence evolution before being reincorporated into the mitochondrial chromosome. Our hypothesized model is supported by the observation that the chromosomally derived ψtrnP-trnW region in the small maize linear plasmid is evolving significantly faster than homologous sequences that are retained in the mitochondrial genome in closely related species (fig. 4). This model could explain the recent finding that some angiosperm mitochondrial genomes have experienced major gene-specific accelerations in synonymous substitution rates. The clearest examples of this phenomenon have been described in Ajuga (Zhu et al. 2014) and Silene (Sloan et al. 2009). 
Notably, we found relatively full-length insertions of plasmid polymerase genes in the mitochondrial genomes from species in each of these genera (supplementary table S1 and files S1 and S2, Supplementary Material online), suggesting especially recent interactions with linear plasmids. Furthermore, free linear mitochondrial plasmids have been identified (but not yet characterized with respect to sequence content) in some Swedish populations of S. vulgaris (Andersson-Ceplitis and Bengtsson 2002). Under a slight variant of this proposed model, it is also possible that plasmids and mitochondrial chromosomes could have duplicate copies of the same gene and that recombination (gene conversion) between the two copies could periodically introduce plasmid mutations into the mitochondrial genome. This mechanism could explain why the atp9 gene, which is unusually fast evolving throughout the tribe Sileneae, was found to exist in multiple copies in many Sileneae species (Sloan et al. 2009). Based on the hypothesis that mitochondrial linear plasmids are responsible for gene-specific accelerations in some angiosperm mitochondrial genomes, we would predict that further identification and sequencing of free linear plasmids in plant mitochondria will reveal additional examples of mitochondrial genes that have been acquired by plasmids and undergone accelerated rates of sequence evolution. To date, free mitochondrial plasmids have only been sequenced in four angiosperm species (Handa 2008), and the small linear plasmid in Z. mays is the only documented case in plants of a functional mitochondrial gene being transferred to a plasmid (Leon et al. 1989). Examining additional free linear plasmids in angiosperm mitochondria would be particularly valuable because there are some important uncertainties related to the accelerated rate of sequence evolution in the ψtrnP-trnW region in Z. mays.
In particular, unlike many other plasmids, the small linear plasmid in Z. mays does not encode its own DNA polymerase gene, so it is not clear if and how it replicates autonomously. Nevertheless, the plasmid’s low GC content (36.5%) indicates that it is subject to different mutation pressures than the mitochondrial chromosome. Also, although the ψtrnP-trnW region on the Z. mays plasmid was subject to a significant rate acceleration (fig. 4), its overall level of sequence divergence is still low (>97% nucleotide identity with other Zea species), and we did not find evidence of SNPs in the plasmid-borne ψtrnP-trnW region among different Z. mays accessions (supplementary table S4, Supplementary Material online). Therefore, the extent to which the transfer of mitochondrial genes to linear plasmids could be responsible for much larger observed levels of sequence divergence remains unclear. The distribution and evolutionary history of linear mitochondrial plasmids in plants and their potential role in altering rates of sequence evolution have a number of parallels in mitochondrial evolution throughout the eukaryotic phylogeny. For example, the spread of linear plasmids bears many similarities to the distribution of mitoviruses (Bruenn et al. 2015). In addition, a similar hypothesis to what we present here regarding the effect of linear mitochondrial plasmids on rates of mitochondrial genome evolution has been proposed for the ciliate Oxytricha trifallax (Swart et al. 2012). It is also noteworthy that Pol γ, which is encoded in the nucleus but responsible for replication of the rapidly evolving mitochondrial genomes in fungi and metazoans, appears to be phage derived (Shutt and Gray 2006). Therefore, the invasion of selfish genetic elements with error-prone, viral-like replication machinery may be a recurring process that has shaped the dramatic variation in rates of mitochondrial sequence evolution across eukaryotes.

Materials and Methods

Green Plant Mitochondrial Genome Data Set

We obtained the complete nucleotide sequences of all published green plant mitochondrial genomes in the National Center for Biotechnology Information (NCBI) Genome website as of March 10, 2015 (supplementary table S1, Supplementary Material online). In addition, we were provided access to unpublished mitochondrial genome assemblies from the gymnosperms Ginkgo biloba and Welwitschia mirabilis and the ferns Equisetum hyemale and Ophioglossum californicum (Mower JP, personal communication). All genomes were analyzed based on their reported sequence on GenBank. Therefore, we cannot rule out the possibility that misassemblies may have occurred in the original studies if both integrated and free plasmids were present in the same mtDNA samples.

BLAST Searches to Identify Plasmid and Plasmid-Derived Sequences and to Determine Presence/Absence in Green Plant Mitochondrial Genomes

To identify published DNA sequences related to plant linear plasmids, DPO and RPO gene sequences from the B. napus mitochondrial linear plasmid (GenBank accession AB073400) were first searched against the entire NCBI nucleotide collection (nr/nt) database with NCBI-TBLASTN. Predicted DNA and RNA polymerase ORFs were extracted from identified BLAST hits using the program ORF Finder at the NCBI website (http://www.ncbi.nlm.nih.gov/gorf/gorf.html, last accessed December 21, 2015). To perform a more thorough search specifically in our set of green plant mitochondrial genomes, we used all identified plant ORFs longer than 1,500 bp as queries for NCBI-TBLASTN and NCBI-BLASTN version 2.2.30+ searches against the mitochondrial genomes. The TBLASTN searches were run with default parameters, and the BLASTN searches were run with the “-task BLASTN” option. BLAST hits were parsed and filtered based on an e-value threshold of 1 × 10⁻⁶ with a custom Perl script utilizing BioPerl modules (Stajich et al. 2002).
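The filtering step can be sketched in Python as follows. This is not the custom Perl/BioPerl script used in the study; it is a minimal illustration that assumes the BLAST results were saved in the standard 12-column tabular format (e-value in the 11th column) and that hits are retained when their e-value is below the threshold. The function name, query and subject identifiers, and hit values are hypothetical.

```python
def filter_blast_hits(lines, max_evalue=1e-6):
    """Keep tabular BLAST hits whose e-value is below max_evalue.

    Assumes the default 12 tab-separated columns of BLAST tabular output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < max_evalue:
            hits.append({"query": fields[0], "subject": fields[1], "evalue": evalue})
    return hits

# Two made-up hits: one strong, one weaker than the threshold.
example_lines = [
    "query_1\tsubject_A\t45.0\t200\t100\t5\t1\t200\t1000\t1600\t2e-20\t95.0",
    "query_2\tsubject_B\t46.0\t56\t29\t1\t10\t65\t5000\t5167\t3e-05\t40.0",
]

kept_hits = filter_blast_hits(example_lines)
for hit in kept_hits:
    print(hit["query"], hit["subject"], hit["evalue"])
```

A hit with an e-value of 3e-05, like the second toy row, would be reported as a weak match rather than counted as a significant plasmid-derived sequence.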

Alignment of Angiosperm DPO and RPO Sequences

To infer the evolutionary history of plasmid-derived DPO and RPO sequences found in angiosperms, we performed multiple sequence alignments followed by parsimony- and maximum-likelihood-based phylogenetic inference methods. Our BLAST searches against the NCBI nr/nt databases resulted in numerous hits outside of land plants, including fungal mitochondrial plasmids, bacteria, and viruses. However, our exploratory analyses indicated that these hits were highly divergent and could not be reliably aligned along most of their length. With one exception, the only hits to ORFs that had a minimum length of 500 bp and enough sequence similarity to be confidently aligned were found in angiosperms. The exception was a whole-genome assembly for the nematode Brugia timori (GenBank assembly accession GCA_000950975.1), which contained short contigs (<2 kb) that were highly similar to plant mitochondrial plasmid sequences. Given that these hits were only found on short contigs from an unfiltered genome assembly, we considered it likely that they were the result of contamination rather than true nematode sequence, and they were not included in subsequent alignments and phylogenetic analyses. Angiosperm DPO and RPO ORF sequences longer than 500 bp were translated into amino acids using the standard genetic code in MacClade version 4.08 (Maddison DR and Maddison WP 2001). To implement a form of the heads-or-tails alignment check (Landan and Graur 2007), the amino acid sequences were reversed using a custom Perl script. DPO and RPO sequences were aligned independently of each other using MAFFT version 7 online (Katoh and Standley 2013; supplementary files S1 and S2, Supplementary Material online). 
Nondefault options implemented in MAFFT were as follows: iterative refinement method E-INS-i, amino acid scoring matrix BLOSUM45, and “leave gappy regions.” Numerous DPO and RPO sequences were identical or nearly identical to each other and could have biased our alignment trimming step (see below) by inflating estimated sequence similarity, thereby favoring inclusion of regions with several such sequences. The forward alignments were uploaded into MEGA version 6.06 (Tamura et al. 2013), which was used to calculate pairwise p distances between all sequences, with pairwise deletion for nonoverlapping sequences. Sequences with a pairwise distance of ≤ 0.06 were identified and the single longest sequence was retained while the others were deleted. In cases with two or more sequences of identical length, one sequence was selected at random. Six sets of (near) identical DPO sequences were merged (from Beta, Cucumis, Ferrocalamus, Lolium, and Zea [two sets]), with a total of 24 sequences deleted (supplementary table S5, Supplementary Material online). Nine sets of (near) identical RPO sequences were merged (from Beta, Cucumis, Ferrocalamus, Lolium, Silene, Triticum, Vitis, and Zea [two sets]), with a total of 21 sequences deleted (supplementary table S6, Supplementary Material online). Because of the high sequence divergence and the confounding effect caused by numerous indels, many regions appeared arbitrarily aligned in MAFFT’s global alignment. The alignments were therefore trimmed using trimAl version 1.2 (Capella-Gutierrez et al. 2009). The first trimming step was applied to the heads-or-tails alignments of DPO and RPO using a consistency score of 0.5, thereby decreasing the DPO (forward) alignment from 1,378 to 933 positions and the RPO (forward) alignment from 1,606 to 1,194 positions. The second trimming step applied a similarity score of 0.001, thereby decreasing the DPO alignment from 933 to 444 positions and the RPO alignment from 1,194 to 351 positions. 
Taken together, the two trimming steps reduced the average DPO sequence length from 626 to 309 amino acids and the average RPO sequence length from 732 to 252 amino acids. The resulting alignments were manually examined in MEGA and regions of individual sequences that were adjacent to gapped positions and appeared arbitrarily aligned were rescored as missing data (a total of 72 cells from 4 DPO sequences and a total of 55 cells from 4 RPO sequences). The final data matrices are provided as supplementary files S3–S5, Supplementary Material online.
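The removal of (near) identical sequences described above can be sketched as follows. This is an illustration only, not the MEGA-based procedure itself: it assumes that "-" and "?" mark gaps or missing data, uses a greedy pass from longest to shortest sequence, and breaks length ties by input order rather than at random. All function names and the toy sequences are hypothetical.

```python
MISSING = {"-", "?"}  # assumed symbols for gaps and missing data


def p_distance(a, b):
    """Proportion of differing sites, counting only sites present in both
    sequences (pairwise deletion). Returns None if nothing overlaps."""
    compared = differing = 0
    for x, y in zip(a, b):
        if x in MISSING or y in MISSING:
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else None


def ungapped_length(seq):
    """Number of aligned positions that hold an actual residue."""
    return sum(ch not in MISSING for ch in seq)


def drop_near_identical(aligned, threshold=0.06):
    """Return names to keep: longest sequences first, dropping any sequence
    whose p distance to an already kept sequence is <= threshold."""
    order = sorted(aligned, key=lambda n: ungapped_length(aligned[n]), reverse=True)
    kept = []
    for name in order:
        distances = (p_distance(aligned[name], aligned[k]) for k in kept)
        if not any(d is not None and d <= threshold for d in distances):
            kept.append(name)
    return kept


# Toy aligned sequences (invented for illustration).
toy = {"a": "MKTAYIAKQR", "b": "MKTAYIAK--", "c": "MKTWWWWWQR"}
kept_names = drop_near_identical(toy)
```

In this toy case, sequence "b" is identical to "a" over their shared positions and is shorter, so it is dropped, whereas "c" differs from "a" at half of its sites and is kept.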

Phylogenetic Analysis of Angiosperm DPO, RPO, and Partial Concatenation Sequences

Many of the DPO and RPO sequences are fragments rather than the entire gene (missing and inapplicable data represent 31% of the DPO data matrix and 29% of the RPO data matrix), and several sequences have zero sequence overlap. Therefore, it is important that our gene tree analysis methods be robust to cases wherein clades can only be ambiguously supported because of the distribution of missing data. Rigorous parsimony analyses followed by calculating the strict consensus of all most parsimonious trees (for the entire matrix as well as within each resampling pseudoreplicate) are highly robust to these cases (Goloboff and Pol 2005; Simmons and Goloboff 2014). Parsimony-based gene tree analyses were conducted using TNT version 1.1 May 2014 (Goloboff et al. 2008), with branch support calculated using the strict consensus jackknife (Farris et al. 1996; Davis et al. 1998). Ten thousand tree bisection reconnection (TBR) tree searches with up to 1,000 trees held per search were conducted to search for the most parsimonious trees with TBR collapsing implemented (Goloboff and Farris 2001), followed by calculation of the strict consensus (Schuh and Polhemus 1980). Jackknife analyses were conducted using 1,000 pseudoreplicates and a deletion probability of 0.37. Each pseudoreplicate consisted of 100 TBR searches with up to 1,000 trees held per search and TBR collapsing implemented. Jackknife values were then mapped onto the strict consensus of most parsimonious trees using TreeGraph2 version 2.2.0 (Stöver and Müller 2010), following Simmons and Freudenstein (2011). Different implementations of maximum likelihood, including different programs, models, and search settings, can produce divergent topologies and branch support values when applied to data matrices with high amounts of nonrandomly distributed missing data (Simmons and Norton 2013; Simmons and Randle 2014). PhyML (Guindon et al. 
2010) and the Shimodaira-Hasegawa-like approximate likelihood ratio test (SH-like aLRT; Anisimova and Gascuel 2006; Guindon et al. 2010) have been identified as relatively robust to the artifact of providing high support for clades that can only be ambiguously supported because of the distribution of missing data. Likelihood-based gene tree analyses were conducted using PhyML version 20120412, with branch support calculated using the bootstrap (Felsenstein 1985) and SH-like aLRT. The best-fit model for the complete sequence sampling version of each matrix was selected using the Akaike Information Criterion (AIC; Akaike 1974) in ProtTest version 3.2 (Abascal et al. 2005). In all cases the LG model (Le and Gascuel 2008) with the gamma distribution (Yang 1993) and estimated amino acid frequencies was chosen by the AIC and implemented in PhyML. One thousand subtree pruning regrafting (SPR) searches were conducted to search for the most likely tree (PhyML only ever outputs a single fully resolved optimal tree) and 1,000 bootstrap pseudoreplicates, with a single SPR search per pseudoreplicate, were conducted. Bootstrap and SH-like aLRT branch support values were then mapped onto the most likely tree using TreeGraph 2 version 2.2.0. Gene tree analyses that included potential outgroup DPO and RPO sequences from fungi were attempted in exploratory analyses (data not shown), but the likelihood-estimated branch lengths connecting these outgroup(s) to the plant-sourced ingroup sequences were >1 for DPO and >2.4 for RPO. Furthermore, the alignments were dubious in multiple regions and these outgroup(s) connected to the ingroup at very weakly supported internal branches. Therefore, no outgroups were included in our analyses and the trees are considered unrooted. 
Many clades in the DPO and RPO gene trees received very low branch support values (<50% by both the likelihood bootstrap and parsimony jackknife), even after exclusion of three very short and problematic sequences from each of the analyses (supplementary figs. S1–S4, Supplementary Material online). Therefore, in an attempt to increase branch support values and confidence in our trees, we performed a partial concatenation-based analysis (Kluge 1989; Lecointre and Deleporte 2005). DPO and RPO sequences that were obtained from the same plant genome assembly were concatenated, with three exceptions wherein strong topological conflict was identified between the DPO and RPO gene trees. First, the RPO sequence from L. perenne (JX999996) was resolved in a clade with the other two RPO sequences from this taxon in the main grass clade (supplementary fig. S4, Supplementary Material online), whereas the DPO sequence was resolved as sister to that from F. rimosivaginus JQ235168 well outside the main grass clade (supplementary fig. S2, Supplementary Material online). Second, the DPO sequence from Rhazya stricta was resolved as sister to that from Vaccinium macrocarpon (supplementary fig. S2, Supplementary Material online), whereas the RPO sequence was resolved as distantly related to that from Vaccinium (supplementary fig. S4, Supplementary Material online). Exploratory analyses (data not shown) confirmed that the resolution of the DPO and RPO sequences from Vaccinium was largely consistent between the two gene trees, unlike that of the sequences from Rhazya. Third, all DPO sequences from members of Poaceae tribe Andropogoneae (i.e., Tripsacum, Zea; Grass Phylogeny Working Group II 2012) were resolved in a clade (supplementary fig. S2, Supplementary Material online), whereas the RPO sequence from T. dactyloides DQ984517 was resolved in a clade with those from Ferrocalamus and Lolium (supplementary fig. S4, Supplementary Material online). 
For these three cases the DPO sequence was treated as a different terminal from the associated RPO sequence in the partial concatenation analyses. The partial concatenation data matrix is provided as supplementary file S5, Supplementary Material online. In addition to the complete sequence sampling DPO, RPO, and partial concatenation analyses, another set of analyses was conducted using sequence subsampling. These sequence subsampling analyses were performed to help increase branch support and our confidence for resolution of the remaining terminals. In all three cases, a subset of the shortest sequences was excluded because these sequences were not resolved in well-supported clades in exploratory analyses (data not shown). Four sequences were removed for the DPO and RPO gene tree analyses (of 52–135 positions vs. the average of 307 positions for DPO; 81–103 positions vs. the average of 250 positions for RPO), and 12 sequences were removed from the partial concatenation analyses (48–250 positions vs. the average of 400 positions).
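The character resampling that underlies the jackknife pseudoreplicates described above can be illustrated with a short sketch. It is not TNT's implementation and performs no tree search; it only shows how each alignment column is independently deleted with probability 0.37 to build one pseudoreplicate matrix. The function names, the seed, and the toy matrix are hypothetical.

```python
import random


def jackknife_columns(n_columns, deletion_probability=0.37, rng=None):
    """Return the indices of columns retained in one pseudoreplicate.

    Each column is deleted independently with the given probability.
    """
    rng = rng or random.Random()
    return [i for i in range(n_columns) if rng.random() >= deletion_probability]


def resample_matrix(matrix, kept_columns):
    """Build the pseudoreplicate matrix from the retained columns."""
    return {name: "".join(seq[i] for i in kept_columns) for name, seq in matrix.items()}


# Toy data matrix (invented for illustration); "?" marks missing data.
toy_matrix = {"seq1": "MKTAYIAKQR", "seq2": "MKT?YIAKQR", "seq3": "MRTAYLAKQ?"}
rng = random.Random(1)  # fixed seed so the sketch is reproducible
kept = jackknife_columns(10, rng=rng)
pseudoreplicate = resample_matrix(toy_matrix, kept)
```

In an actual analysis, each such pseudoreplicate would be passed to the tree search, and the frequency with which a clade is recovered across pseudoreplicates would give its jackknife support.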

Cophylogenetic Analysis

To make inferences about the mode of linear plasmid transmission in plant mitochondrial genomes, the cophylogeny program Jane version 4 (Conow et al. 2010) was used to compare DPO and RPO gene trees with established species relationships among angiosperms. Jane is an event-cost–based method to quantify cophylogenetic signal. Event-cost methods aim to reconcile pairs of tree topologies by assigning costs to biologically plausible events, and finding the best reconstructions by minimizing global cost (de Vienne et al. 2013; Bellec et al. 2014). The event-cost parameters were set to default values of cospeciation = 0, duplication = 1, host switch = 2, sorting = 1, and failure to diverge = 1. For the genetic algorithm parameters, the population size and number of generations were set to 200 and 100, respectively. Otherwise, default genetic parameters were used, with the “Prevent Mid-Polytomy Events” option. All models were tested against a null distribution generated with 1,000 random tip mappings. The “host” tree was constructed based on the Angiosperm Phylogeny website version 13 (Stevens 2015) and individual phylogenetic studies for finer-scale relationships among grasses (Grass Phylogeny Working Group II 2012) and Zea species (Doebley 1990). Analyses were performed on plasmid gene tree topologies that were identified in the maximum-likelihood searches described above with branches with <50% bootstrap support collapsed into polytomies. Jane requires rooted trees as inputs, so each plasmid tree was midpoint rooted for these analyses. Separate tests were run for gene trees inferred from DPO, RPO, and partial concatenations.
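The logic of the event-cost comparison can be sketched as follows. This does not reproduce Jane's reconciliation algorithm or its exact significance calculation; it only shows how a total cost follows from event counts under the cost scheme listed above, and how an observed cost could be compared with costs from randomized tip mappings. The convention that the P value is the fraction of randomized costs at or below the observed cost is an assumption, and the function names, event counts, and null costs are invented for illustration.

```python
# Event costs as listed in the text.
EVENT_COSTS = {
    "cospeciation": 0,
    "duplication": 1,
    "host_switch": 2,
    "sorting": 1,
    "failure_to_diverge": 1,
}


def total_cost(event_counts, costs=EVENT_COSTS):
    """Sum of (number of events x cost per event) for one reconciliation."""
    return sum(costs[event] * count for event, count in event_counts.items())


def empirical_p_value(observed_cost, null_costs):
    """Fraction of randomized costs that are as low as or lower than the
    observed cost (assumed convention; lower cost means more congruence)."""
    return sum(c <= observed_cost for c in null_costs) / len(null_costs)


# Hypothetical reconciliation and null distribution.
counts = {"cospeciation": 5, "duplication": 2, "host_switch": 3,
          "sorting": 4, "failure_to_diverge": 0}
observed = total_cost(counts)            # 0*5 + 1*2 + 2*3 + 1*4 + 1*0
p_value = empirical_p_value(observed, [10, 12, 15, 20])
```

A low P value under this convention would indicate that the observed plasmid and host trees are more congruent than expected from random associations between tips.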

Maize 2.3-kb Plasmid Comparison, Relative Rates Test, and Resequencing

We analyzed rates of sequence evolution in the region containing the tRNA genes ψtrnP and trnW that are normally found in angiosperm mitochondrial genomes but have recently been transferred to the 2.3-kb linear plasmid in Z. mays (Leon et al. 1989). Branch lengths were estimated by maximum likelihood for this region in Z. mays and in a sample of related monocots in which it is still located in the mitochondrial chromosome. Nucleotide sequences were aligned using MAFFT (supplementary file S6, Supplementary Material online). The scoring matrix was set to 1PAM, the leave gappy regions option was selected, and the remaining settings were left as default. The TVM+G model was selected as the best fitting based on AIC using jModelTest2 version 2.1.7 (Guindon and Gascuel 2003; Darriba et al. 2012). A maximum-likelihood tree search was performed with PhyML as described above. The resulting tree was visualized using MEGA. Relative rate tests (Tajima 1993) were conducted with MEGA to compare rates of nucleotide substitution in the ψtrnP-trnW gene region of the 2.3-kb linear plasmid in Z. mays to the homologous region in four close relatives (Z. perennis, Z. luxurians, T. dactyloides, and S. bicolor), using Bambusa oldhamii as the outgroup. To verify the accuracy of the originally reported Z. mays ψtrnP-trnW sequence (Leon et al. 1989), we performed polymerase chain reactions (PCR) and Sanger sequencing for multiple Z. mays accessions, including a representative of the B37 line that was used in the original study (supplementary table S4, Supplementary Material online). DNA was extracted from approximately 200 mg of leaf tissue collected within 30 days of germination, using a Qiagen DNeasy Plant Mini kit. The ψtrnP-trnW region was amplified using standard PCR protocols with the following primers: 5′-ATTATCCCTGTCCTGGGAAC-3′ and 5′-CCAACCGATACACAATTACGA-3′. 
The resulting PCR products were used as templates for Sanger sequencing with internal primers 5′-GGGAACAGATGGGAGACATA-3′ and 5′-TACGACATTGGGTTTTGGAG-3′ performed at the University of Chicago CCC DNA Sequencing and Genotyping Facility.
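The relative rate test referred to above can be illustrated with a minimal sketch of its basic form, in which sites where only one of the two ingroup sequences differs from the other two sequences are counted and compared with a chi-square statistic (1 degree of freedom). This is not MEGA's implementation; the handling of gaps and ambiguous bases, the function name, and the toy sequences are assumptions made for illustration.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup, skip="-?N"):
    """Count sites unique to seq1 (m1) and to seq2 (m2) relative to the
    outgroup and return (m1, m2, chi2, p). Sites with a gap or ambiguous
    base in any of the three sequences are ignored."""
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if a in skip or b in skip or o in skip:
            continue
        if a != b and b == o:
            m1 += 1  # change specific to the seq1 lineage
        elif a != b and a == o:
            m2 += 1  # change specific to the seq2 lineage
    if m1 + m2 == 0:
        return m1, m2, None, None  # no informative sites
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of the chi-square distribution with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p


# Toy aligned sequences (invented, not the maize data).
m1, m2, chi2, p = tajima_relative_rate("CCCCAAAAAA", "AAAAAAAAAA", "AAAAAAAAAA")
```

Under equal rates in the two ingroup lineages, m1 and m2 are expected to be equal, so a large chi-square value indicates that one lineage has accumulated substitutions faster than the other.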

Supplementary Material

Supplementary figures S1–S7, tables S1–S6, and files S1–S7 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).

1.  Spurious 99% bootstrap and jackknife support for unsupported clades.

Authors:  Mark P Simmons; John V Freudenstein
Journal:  Mol Phylogenet Evol       Date:  2011-06-16       Impact factor: 4.286

2.  Molecular analysis of the linear 2.3 kb plasmid of maize mitochondria: apparent capture of tRNA genes.

Authors:  P Leon; V Walbot; P Bedinger
Journal:  Nucleic Acids Res       Date:  1989-06-12       Impact factor: 16.971

3.  Horizontal transfer of entire genomes via mitochondrial fusion in the angiosperm Amborella.

Authors:  Danny W Rice; Andrew J Alverson; Aaron O Richardson; Gregory J Young; M Virginia Sanchez-Puerta; Jérôme Munzinger; Kerrie Barry; Jeffrey L Boore; Yan Zhang; Claude W dePamphilis; Eric B Knox; Jeffrey D Palmer
Journal:  Science       Date:  2013-12-20       Impact factor: 47.728

4.  Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites.

Authors:  Z Yang
Journal:  Mol Biol Evol       Date:  1993-11       Impact factor: 16.240

5.  MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Authors:  Kazutaka Katoh; Daron M Standley
Journal:  Mol Biol Evol       Date:  2013-01-16       Impact factor: 16.240

6.  Nucleotide substitution analyses of the glaucophyte Cyanophora suggest an ancestrally lower mutation rate in plastid vs mitochondrial DNA for the Archaeplastida.

Authors:  David Roy Smith; Christopher J Jackson; Adrian Reyes-Prieto
Journal:  Mol Phylogenet Evol       Date:  2014-07-11       Impact factor: 4.286

7.  Widespread mitovirus sequences in plant genomes.

Authors:  Jeremy A Bruenn; Benjamin E Warner; Pradeep Yerramsetty
Journal:  PeerJ       Date:  2015-04-09       Impact factor: 2.984

8.  trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses.

Authors:  Salvador Capella-Gutiérrez; José M Silla-Martínez; Toni Gabaldón
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

9.  The "fossilized" mitochondrial genome of Liriodendron tulipifera: ancestral gene content and order, ancestral editing sites, and extraordinarily low mutation rate.

Authors:  Aaron O Richardson; Danny W Rice; Gregory J Young; Andrew J Alverson; Jeffrey D Palmer
Journal:  BMC Biol       Date:  2013-04-15       Impact factor: 7.431

10.  Cophylogenetic interactions between marine viruses and eukaryotic picophytoplankton.

Authors:  Laure Bellec; Camille Clerissi; Roseline Edern; Elodie Foulon; Nathalie Simon; Nigel Grimsley; Yves Desdevises
Journal:  BMC Evol Biol       Date:  2014-03-27       Impact factor: 3.260
