Peter D Fields, Gus Waneka, Matthew Naish, Michael C Schatz, Ian R Henderson, Daniel B Sloan.
Abstract
Intracellular transfers of mitochondrial DNA continue to shape nuclear genomes. Chromosome 2 of the model plant Arabidopsis thaliana contains one of the largest known nuclear insertions of mitochondrial DNA (numts). Estimated at over 600 kb in size, this numt is larger than the entire Arabidopsis mitochondrial genome. The primary Arabidopsis nuclear reference genome contains less than half of the numt because of its structural complexity and repetitiveness. Recent data sets generated with improved long-read sequencing technologies (PacBio HiFi) provide an opportunity to finally determine the accurate sequence and structure of this numt. We performed a de novo assembly using sequencing data from recent initiatives to span the Arabidopsis centromeres, producing a gap-free sequence of the Chromosome 2 numt, which is 641 kb in length and has 99.933% nucleotide sequence identity with the actual mitochondrial genome. The numt assembly is consistent with the repetitive structure previously predicted from fiber-based fluorescent in situ hybridization. Nanopore sequencing data indicate that the numt has high levels of cytosine methylation, helping to explain its biased spectrum of nucleotide sequence divergence and supporting previous inferences that it is transcriptionally inactive. The original numt insertion appears to have involved multiple mitochondrial DNA copies with alternative structures that subsequently underwent an additional duplication event within the nuclear genome. This work provides insights into numt evolution, addresses one of the last unresolved regions of the Arabidopsis reference genome, and represents a resource for distinguishing between highly similar numt and mitochondrial sequences in studies of transcription, epigenetic modifications, and de novo mutations.
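The abstract notes that the assembly can help distinguish highly similar numt and mitochondrial sequences, and Fig. 3B describes excluding mitogenome-derived reads based on SNVs. One simple way such a split could work is sketched below, under stated assumptions: each read is assigned by majority vote at diagnostic positions where the numt and mitogenome carry different alleles. The function `classify_read`, its dictionary inputs, and the voting rule are hypothetical illustrations, not the authors' pipeline.

```python
from typing import Dict


def classify_read(
    read_alleles: Dict[int, str],
    numt_alleles: Dict[int, str],
    mito_alleles: Dict[int, str],
    min_informative: int = 1,
) -> str:
    """Hypothetical majority-vote classifier: return 'numt', 'mito' or 'ambiguous'.

    read_alleles maps positions covered by the read to the observed base.
    numt_alleles / mito_alleles map each diagnostic SNV position to the allele
    carried by the numt and by the mitogenome, respectively.
    """
    numt_votes = mito_votes = 0
    for pos, base in read_alleles.items():
        if pos not in numt_alleles or pos not in mito_alleles:
            continue  # not a diagnostic SNV position
        if base == numt_alleles[pos]:
            numt_votes += 1
        elif base == mito_alleles[pos]:
            mito_votes += 1
        # any other base (e.g. a sequencing error) supports neither origin
    if numt_votes + mito_votes < min_informative or numt_votes == mito_votes:
        return "ambiguous"
    return "numt" if numt_votes > mito_votes else "mito"


numt = {100: "T", 250: "G", 400: "A"}
mito = {100: "C", 250: "A", 400: "G"}
print(classify_read({100: "T", 250: "G", 300: "C"}, numt, mito))  # numt
print(classify_read({100: "C", 400: "G"}, numt, mito))            # mito
```

Reads that cover no diagnostic position, or split their votes evenly, are left as ambiguous rather than forced into either class; a real analysis would also need to weigh base quality and the density of diagnostic sites along long reads.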
Keywords: CpG methylation; intracellular gene transfer; numt; nupt; structural variants; tandem duplications
Year: 2022 PMID: 35446419 PMCID: PMC9071559 DOI: 10.1093/gbe/evac059
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 4.065
Fig. 1. Structure of the Arabidopsis chromosome 2 numt. (A) A simplified circular representation of the Arabidopsis mitogenome. The sequence from the C24 ecotype was used for structural comparisons with the numt because the Col-0 mitogenome contains rearrangements associated with recombination at small repeats (see main text). This conformation of the C24 mitogenome corresponds to the previously described D′–A′–C–B structure (Stupar et al. 2001). R1 and R2 indicate the two large pairs of repeats. Intervening single-copy regions are shown in different colors, which are also used on the mitogenome map in panel C. (B) Recombination between a pair of repeats in the mitogenome produces four possible alternative combinations of flanking sequences (as shown for R1), which are thought to be present at near-equal frequencies in tissue samples. The first three of these conformations are all found within the numt. (C) Structural comparison of the numt and mitogenome. The mitogenome sequence (top) is annotated with the large repeat sequences (R1 and R2) and pairs of breakpoints (BP1, BP2, and BP3) associated with chimeric fusions in the numt that are possibly the result of non-homologous end joining. Green shaded regions show blocks of syntenic sequence conserved in the numt (bottom). Tick marks below the numt show SNVs (black) and indel/structural variants (red) relative to the Col-0 mitogenome sequence. Some large sections of the mitogenome appear three times in the numt (indicated in shades of gray to black in the 3-Copy Reps row). The curved gray lines connect pairs of variants where two copies share an allele that differs from the mitogenome and the other repeat copy. The colored blocks show the locations of the four BACs originally used to assemble this genome region. The darker block for each BAC indicates the actual location of that BAC within the numt. The blocks in fainter colors represent repeated sequences similar to that BAC.
The adjacent white boxes (Tandem Dup) represent the two copies resulting from a putative 135-kb tandem duplication that occurred within the nuclear genome after the numt had already begun to diverge in sequence. The repetitive structure of the numt led to the T17H1 BAC being incorrectly overlapped with the T5M2 and T18C6 BACs in the original Arabidopsis genome assembly, resulting in the exclusion of two large regions of intervening sequence (indicated by the black lines in the Omitted Seq row).
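Panel B's statement that recombination between a pair of repeats yields four flank combinations follows from simple counting: with two copies of a repeat, each of the two upstream flanks can end up joined to either of the two downstream flanks. The minimal sketch below enumerates those pairings and labels them as parental or recombinant. The flank labels `a` to `d` and both function names are hypothetical placeholders, and the sketch treats flanks abstractly, ignoring repeat orientation.

```python
from itertools import product
from typing import Dict, List, Tuple

# (upstream flank, downstream flank) around one copy of a repeat
Flanks = Tuple[str, str]


def repeat_conformations(copy1: Flanks, copy2: Flanks) -> List[Flanks]:
    """Enumerate every upstream/downstream flank pairing around a two-copy repeat.

    Homologous recombination across the repeat can swap downstream flanks, so each
    upstream flank may be joined to each downstream flank.
    """
    upstream = [copy1[0], copy2[0]]
    downstream = [copy1[1], copy2[1]]
    return list(product(upstream, downstream))


def label_conformations(copy1: Flanks, copy2: Flanks) -> Dict[Flanks, str]:
    """Mark the two original pairings as parental and the swapped ones as recombinant."""
    return {
        pair: "parental" if pair in (copy1, copy2) else "recombinant"
        for pair in repeat_conformations(copy1, copy2)
    }


print(repeat_conformations(("a", "b"), ("c", "d")))
# [('a', 'b'), ('a', 'd'), ('c', 'b'), ('c', 'd')]
```

With distinct flanks this always gives four pairings, two parental and two recombinant; the caption's observation is that three of the four are represented within the numt.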
Fig. 2. Hypothesized process leading to the origin and evolution of the Arabidopsis chromosome 2 numt. The numt appears to have arisen from multiple fragments of mtDNA, including duplicate copies of some regions. These fragments likely integrated into the nuclear genome through non-homologous end joining, followed by the accumulation of sequence variants (indicated by asterisks). A tandem duplication of a large (∼135 kb) region appears to have occurred after numt sequence divergence began, based on the presence of shared variants (curved gray lines) between the two duplicate copies of this region. Note that multiple caveats and alternative interpretations apply to this model (see main text). For instance, it is possible that a larger multimeric insertion explains some of the repeated sequence content rather than a fusion of multiple mtDNA fragments at the time of insertion. It is also possible that a complex history of gene conversion between repeated sequences within the numt is responsible for the shared sequence variants.
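The timing argument in Fig. 2 rests on a simple pattern: if two numt copies share an allele that the mitogenome and the remaining copy lack, that variant most parsimoniously arose once before the copies separated (or, as the caption notes, was spread by gene conversion). A minimal sketch of scanning pre-aligned sequences for that pattern follows. The function name, the assumption of an equal-length alignment, and the toy sequences are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def shared_derived_variants(
    mito: str, copies: Dict[str, str]
) -> List[Tuple[int, str, str, str]]:
    """Find alignment columns where exactly two copies share a non-mitogenome allele.

    mito is the mitogenome sequence; copies maps a copy name to its sequence.
    All sequences must already be aligned to the same length.
    Returns (column, copy_a, copy_b, shared_allele) tuples.
    """
    length = len(mito)
    if any(len(seq) != length for seq in copies.values()):
        raise ValueError("sequences must be aligned to equal length")
    names = sorted(copies)
    hits = []
    for i in range(length):
        for a, b in combinations(names, 2):
            allele = copies[a][i]
            # the pair must agree with each other and differ from the mitogenome
            if allele != copies[b][i] or allele == mito[i]:
                continue
            # every other copy must lack the shared allele
            others = [copies[n][i] for n in names if n not in (a, b)]
            if all(o != allele for o in others):
                hits.append((i, a, b, allele))
    return hits


mito = "ACGTACGT"
copies = {"c1": "ACTTACGA", "c2": "ACTTACGT", "c3": "ACGTACGA"}
print(shared_derived_variants(mito, copies))
# [(2, 'c1', 'c2', 'T'), (7, 'c1', 'c3', 'A')]
```

A cluster of such shared variants between the two tandem-duplicate copies, rather than isolated hits, is what would support duplication after divergence had begun; the sketch only finds the individual columns.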
Table. Sequence Variants Distinguishing the Arabidopsis Chromosome 2 Numt from the Col-0 Reference Mitogenome Sequence (columns: Variant, Count)
Fig. 3. Nanopore-derived estimates of methylation percentage across chromosome 2 of the Col-CEN assembly (after updating it to include the full numt) in CpG (purple), CHG (teal), and CHH (yellow) contexts. (A) Methylation profile including all reads (>30 kb), averaged over 50-kb windows. The boundaries of the numt region are indicated with asterisks and vertical black lines on the x-axis. (B) The same profile after excluding mitogenome-derived reads based on SNVs that distinguish the numt and mitogenome, which greatly increases the estimated methylation levels in the numt because the actual mitogenome is largely unmethylated. (C) Methylation profile of the 650 kb on the telomere side of the numt (left), across the numt (middle), and of the 650 kb on the centromere side of the numt (right), averaged over 1-kb windows.
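Fig. 3 reports methylation separately for three cytosine contexts (CpG, CHG, CHH, where H is A, C, or T) and averages it over fixed windows (50 kb in panel A, 1 kb in panel C). A minimal sketch of both steps follows, under assumptions that are not from the source: only forward-strand cytosines are classified, each window's value pools read counts (methylated reads divided by total reads) rather than averaging per-site percentages, and the `(position, context, methylated, total)` input format is hypothetical. It does not reproduce any Nanopore methylation caller.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify a forward-strand C at index i as 'CpG', 'CHG' or 'CHH'; else None."""
    seq = seq.upper()
    if i < 0 or i >= len(seq) or seq[i] != "C":
        return None
    nxt = seq[i + 1 : i + 3]  # up to two downstream bases
    if nxt[:1] == "G":
        return "CpG"
    if len(nxt) < 2 or nxt[0] not in "ACT":
        return None  # too close to the end, or an ambiguous base such as N
    if nxt[1] == "G":
        return "CHG"
    if nxt[1] in "ACT":
        return "CHH"
    return None


def windowed_methylation(
    calls: Iterable[Tuple[int, str, int, int]], window: int
) -> Dict[Tuple[str, int], float]:
    """Pool per-site (position, context, methylated, total) calls into fixed windows.

    Returns percent methylation keyed by (context, window_start).
    """
    meth: Dict[Tuple[str, int], int] = defaultdict(int)
    total: Dict[Tuple[str, int], int] = defaultdict(int)
    for pos, ctx, m, t in calls:
        key = (ctx, (pos // window) * window)
        meth[key] += m
        total[key] += t
    return {k: 100.0 * meth[k] / total[k] for k in total if total[k] > 0}


seq = "ACGTCAGTCATTC"
print([cytosine_context(seq, i) for i in (1, 4, 8, 12)])  # ['CpG', 'CHG', 'CHH', None]
calls = [(10, "CpG", 8, 10), (20, "CpG", 2, 10), (60, "CpG", 1, 4), (15, "CHH", 0, 5)]
print(windowed_methylation(calls, window=50))
```

For panel A-style profiles the window would be 50_000 and for panel C 1_000; filtering reads by origin before counting, as in panel B, would change which reads contribute to `methylated` and `total`.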