Alexis R Sullivan, Yrin Eldfjell, Bastian Schiffthaler, Nicolas Delhomme, Torben Asp, Kim H Hebelstrup, Olivier Keech, Lisa Öberg, Ian Max Møller, Lars Arvestad, Nathaniel R Street, Xiao-Ru Wang.
Abstract
Plant mitogenomes can be difficult to assemble because they are structurally dynamic and prone to intergenomic DNA transfers, leading to the unusual situation where an organelle genome is far outnumbered by its nuclear counterparts. As a result, comparative mitogenome studies are in their infancy and some key aspects of genome evolution are still known mainly from pregenomic, qualitative methods. To help address these limitations, we combined machine learning and in silico enrichment of mitochondrial-like long reads to assemble the bacterial-sized mitogenome of Norway spruce (Pinaceae: Picea abies). We conducted comparative analyses of repeat abundance, intergenomic transfers, substitution and rearrangement rates, and estimated repeat-by-repeat homologous recombination rates. Prompted by our discovery of highly recombinogenic small repeats in P. abies, we assessed the genomic support for the prevailing hypothesis that intramolecular recombination is predominantly driven by repeat length, with larger repeats facilitating DNA exchange more readily. Overall, we found mixed support for this view: Recombination dynamics were heterogeneous across vascular plants and highly active small repeats (ca. 200 bp) were present in about one-third of studied mitogenomes. As in previous studies, we did not observe any robust relationships among commonly studied genome attributes, but we identify variation in recombination rates as an underinvestigated source of plant mitogenome diversity.
Keywords: mitogenome; rearrangement rates; recombination; repeats; structural variation
Year: 2020 PMID: 31774499 PMCID: PMC6944214 DOI: 10.1093/gbe/evz263
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1.—Strategy used to assemble the mitogenome of Picea abies. First, a support vector machine (SVM) was trained to identify mitochondrial-like scaffolds from the P. abies genome assembly. We used the classified scaffolds to identify PacBio Sequel subreads containing mitogenome-like 27-mers. These enriched reads were then assembled using three different pipelines. Scaffolds from each assembler with at least one mitochondrial protein-coding gene were retained for assembly reconciliation and base-pair correction, thus yielding the final mitogenome draft.
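The read-enrichment step in this strategy can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the `min_hits` threshold, and the decision to index both strands are assumptions made for illustration, and a real run on Sequel subreads would need a memory-efficient k-mer counter rather than a Python set.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (non-ACGT characters are kept)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmer_set(seq, k):
    """Return the set of all k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_kmer_index(scaffolds, k=27):
    """Collect k-mers from both strands of the mitochondrial-like scaffolds."""
    index = set()
    for scaffold in scaffolds:
        index |= kmer_set(scaffold, k)
        index |= kmer_set(reverse_complement(scaffold), k)
    return index


def enrich_reads(reads, index, k=27, min_hits=1):
    """Keep the names of reads sharing at least min_hits k-mers with the index.

    reads is an iterable of (name, sequence) pairs. min_hits is a
    hypothetical threshold; the source does not state the one used.
    """
    kept = []
    for name, seq in reads:
        seq = seq.upper()
        hits = sum(seq[i:i + k] in index for i in range(len(seq) - k + 1))
        if hits >= min_hits:
            kept.append(name)
    return kept
```

Raising `min_hits` trades sensitivity for specificity: a single shared 27-mer can come from a nuclear copy of mitochondrial DNA, so a stricter threshold would discard more such reads at the cost of losing divergent mitochondrial ones.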
Summary Statistics for Assemblies of Pacific Biosciences Sequel Subreads Enriched In silico for Mitogenome-Like k-mers
| Assembler | No. Contigs | N50 (Mb) | L50 | Longest Contig (Mb) | Assembly Size (Mb) |
|---|---|---|---|---|---|
| canu | | | | | |
| Raw | 383 | 0.06 | 18 | 1.10 | 10.25 |
| Upgraded | 276 | 0.32 | 4 | 2.36 | 9.76 |
| High conf. | 4 | 1.28 | 2 | 2.56 | 5.29 |
| MECAT | | | | | |
| Raw | 104 | 0.43 | 5 | 2.29 | 9.24 |
| Upgraded | 95 | 0.43 | 5 | 2.29 | 9.22 |
| High conf. | 6 | 0.70 | 2 | 2.29 | 5.10 |
| SMARTdenovo | | | | | |
| Raw | 59 | 0.76 | 2 | 3.60 | 7.31 |
| Upgraded | 55 | 0.76 | 2 | 3.60 | 7.32 |
| High conf. | 4 | 3.60 | 1 | 3.60 | 5.13 |
| Final draft | 4 | 3.42 | 1 | 3.42 | 4.90 |
Note.—“Raw” refers to the full contig output produced by each assembler. Upgraded assemblies have been processed with FinisherSC. High-confidence assemblies contain only contigs with at least one protein-coding mitochondrial gene. N50 and L50 are calculated from contig lengths.
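For readers unfamiliar with the contiguity statistics in this table, the sketch below computes N50 and L50 from a list of contig lengths using the standard definitions. The function name and the example lengths are hypothetical; only the longest contig and total size of the final draft are given in the table.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the assembly;
    L50 is the number of contigs in that set.
    """
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Hypothetical lengths in Mb: one dominant contig plus three small ones.
example_lengths = [3.42, 0.80, 0.40, 0.28]
n50, l50 = n50_l50(example_lengths)
```

When a single contig holds more than half of the assembly, as in the final draft, N50 equals the longest contig and L50 is 1.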
Characteristics of the Picea abies Mitogenome
| Characteristic | Value |
|---|---|
| Genome | |
| Size (Mb) | ∼4.90 |
| GC content | 44.7% |
| Annotation | |
| Repeat content | 15.15% |
| Direct and inverted | 14.25% |
| Tandem | 1.12% |
| Nuclear-mitochondrial DNA | 28.89% |
| Transposable elements | 7.72% |
| Plastid-derived DNA | 0.34% |
| Genes | 1.00% |
| Protein-coding genes | 41 |
| Hypothetical proteins high/medium/low confidence | 14/2/4 |
| tRNAs | 17 |
| rRNAs | 3 |
Potential Sources of Mitogenome Size Variation among Gymnosperms
| | | | | | | |
|---|---|---|---|---|---|---|
| Genome size (Mb) | 0.41 | 0.35 | 4.90 | 5.20 | 1.19 | 0.98 |
| Plastid-derived DNA (kb) | 18 (4) | 0 (0) | 17 (0) | 18 (0) | 3 (0) | 9 (9) |
| Dispersed repeats (kb) | 109 (26) | 51 (15) | 699 (14) | 885 (17) | 83 (7) | 42 (4) |
| Nuclear-mtDNA (Mb) | – | 0.35 (100) | 1.40 (29) | 2.29 (44) | 0.60 (50) | – |
| Transposable elements (kb) | 6 (1) | 3 (1) | 482 (10) | 386 (7) | 129 (10) | 15 (2) |
Note.—Dispersed repeats include those in the inverted and direct orientation ≥50 bp and with ≥80% identity. Nuclear-mitochondrial DNA refers to shared sequences with no direction of transfer inferred. Transposable elements comprise long-terminal repeat (LTR) and non-LTR retrotransposons. Numbers in parentheses indicate percent coverage of the mitogenome.
Scaffolds containing protein-coding mitochondrial genes extracted from the 5.9 Mb assembly.
Fig. 2.—Recombination frequency at inverted repeats ≥50 bp with ≥80% pairwise identity as inferred from long reads mapping to expected recombination products (alternative genome configurations; AGCs). (A) Most repeat pairs have little or no evidence of recombination, but a minority are highly active. (B) Repeat length explains very little of the variation in recombination frequency (r² = 0.03).
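To make the statistic in panel B concrete, the following sketch computes the coefficient of determination for a simple linear relationship between repeat length and recombination frequency. It assumes the reported value is the squared Pearson correlation, which the caption does not confirm, and the function name is hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences.

    For a simple linear regression of y on x, this equals the fraction
    of variance in y explained by x.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return sxy ** 2 / (sxx * syy)
```

A value near 0.03 would mean that repeat length accounts for only about 3% of the variance in recombination frequency under this linear model.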
Recombination Patterns Summarized from 18 Published Vascular Plant Mitogenomes
| Species | Repeats ≥1,000 bp: Proportion Active | Repeats ≥1,000 bp: Max AGC % | Repeats <1,000 bp: Proportion Active | Repeats <1,000 bp: Max AGC % | Study |
|---|---|---|---|---|---|
| | na | na | 0.17 | 4 | |
| | 1.00 | 50 | 0.08 | 5 | |
| | 0.25 | 50 | 0.00 | 0 | |
| | 0.40 | 31 | 0.04 | 23 | |
| | 1.00 | 50 | 0.00 | 0 | |
| | 1.00 | 50 | 0.38 | 50 | |
| | – | – | 0.00 | 0 | |
| | na | na | 0.00 | 0 | |
| | 0.00 | 8 | 0.00 | 0 | |
| | 0.50 | 25 | 0.00 | 0 | |
| | 0.50 | 6 | 0.13 | 31 | This study |
| | 0.00 | 0 | 0.03 | 2 | |
| | 0.48 | 13 | 0.06 | 15 | |
| | 0.78 | 10 | 0.00 | 0 | |
| | 1.00 | 50 | 0.14 | 10 | |
| | 1.00 | 33 | 0.26 | 24 | |
| | – | – | 0.66 | 50 | |
| | na | na | 0.00 | 0 | |
Note.—“Proportion active” refers to the fraction of repeats producing alternative genome configurations (AGCs) inferred to be the product of recombination in frequencies ≥1.6% of the parent molecule. “Max AGC” denotes the maximum frequency obtained by any AGC in the given repeat size class. Missing data because repeats of a size class do not exist in a given genome are listed as “na”, whereas “–” indicates missing data due to study limitations.
Minimum detection threshold is ∼4%, thus this proportion is underestimated.
Only inverted repeats analyzed.
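The two summary columns defined in the table note can be reproduced from per-repeat AGC frequencies as sketched below. This is an illustrative reading of the note, not the authors' code: the function names and input layout are hypothetical, and the AGC frequencies are assumed to be already expressed as percentages.

```python
def summarize_size_class(repeats, threshold_pct=1.6):
    """Summarize one repeat size class.

    repeats is a list of (length_bp, agc_pct) pairs. Returns None when the
    class is empty, which corresponds to "na" in the table.
    """
    if not repeats:
        return None
    freqs = [agc_pct for _, agc_pct in repeats]
    active = sum(f >= threshold_pct for f in freqs)
    return {
        "proportion_active": active / len(freqs),
        "max_agc_pct": max(freqs),
    }


def summarize_genome(repeats, size_cutoff_bp=1000, threshold_pct=1.6):
    """Split repeats at the 1,000 bp cutoff and summarize each class."""
    large = [r for r in repeats if r[0] >= size_cutoff_bp]
    small = [r for r in repeats if r[0] < size_cutoff_bp]
    return {
        "large": summarize_size_class(large, threshold_pct),
        "small": summarize_size_class(small, threshold_pct),
    }


# Hypothetical input: (repeat length in bp, AGC frequency in %).
example = [(1500, 6.0), (2000, 0.5), (200, 31.0), (300, 0.0), (150, 1.0), (90, 0.0)]
summary = summarize_genome(example)
```

For studies with a higher detection limit, such as the ∼4% threshold mentioned in the footnote, `threshold_pct` would need to be raised, and the resulting proportion would be a lower bound.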
Fig. 3.—The mitogenomes of Picea abies and P. glauca are extensively rearranged and collinear regions are limited to genic regions. An absolute rearrangement rate of 36–65/Myr is needed to explain this level of structural divergence. (A) Simplified diagram of the P. abies–P. glauca mitogenome alignment, where colored blocks represent corresponding homologous regions free of internal rearrangements (locally collinear blocks; LCBs) and heights are proportional to pairwise sequence identity. LCBs below the center line in P. glauca are inverted with respect to P. abies. White space indicates regions with no homology. Only LCBs longer than 2,000 bp are shown. (B) Two gene clusters widely conserved in plant mitogenomes are also found in Picea within a 9-kb block, which also serves to illustrate the typical extent of synteny beyond genic regions. Gene structures are indicated by yellow boxes and introns by black lines.
Fig. 4.—Mean substitution rates at synonymous (dS) and nonsynonymous (dN) sites vary 100-fold among gymnosperms. Rates within Pinaceae, in gray, are more consistent but are about 6-fold higher in Picea than in the 75% smaller Pinus mitogenomes.