
Complex analyses of inverted repeats in mitochondrial genomes revealed their importance and variability.

Jana Čechová, Jiří Lýsek, Martin Bartas, Václav Brázda.

Abstract

Motivation: The NCBI database contains mitochondrial DNA (mtDNA) genomes from numerous species. Inverted repeat sequences (IRs) are known to be important for regulating nuclear genomes; we investigated their presence and locations in these mtDNA sequences.
Results: IRs were identified in the mtDNA of all species. IR lengths and frequencies correlate with evolutionary age; the greatest variability was detected in subgroups of plants and fungi and the lowest in mammals. IR presence is non-random and evolutionarily favoured. The frequency of IRs generally decreased with IR length, except for IRs 24 or 30 bp long, which are 1.5 times more abundant than expected from neighbouring lengths. IRs are enriched in sequences from the replication origin, followed by D-loop, stem-loop and miscellaneous sequences, pointing to the importance of IRs in regulatory regions of mitochondrial DNA.
Availability and implementation: Data were produced using Palindrome analyser, freely available on the web at http://bioinformatics.ibp.cz.
Contact: vaclav@ibp.cz.
Supplementary information: Supplementary data are available at Bioinformatics online.

Year: 2018; PMID: 29126205; PMCID: PMC6030915; DOI: 10.1093/bioinformatics/btx729

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937


1 Introduction

Although most DNA of eukaryotic organisms is localized in chromosomes within the nucleus, mitochondrial DNA (mtDNA) is an important component of the vast majority of eukaryotes. Mitochondria are double membrane-bound subcellular organelles that play a central role in metabolism (Brand, 1997), apoptosis (Kroemer ) and ageing (Kauppila ; Wei ). Moreover, defective mitochondrial dynamics play important roles in various human diseases including cancer (Srinivasan ). Cells usually contain hundreds to thousands of mitochondria in the cytoplasm. Mitochondria produce adenosine triphosphate (ATP), the main source of energy in the cell, through oxidative phosphorylation. According to the endosymbiotic theory, mitochondria are derived from bacteria that were engulfed by the ancestors of today's eukaryotic cells (Archibald, 2015; Martin ). In higher eukaryotes, mtDNA codes for a small but crucial part of the oxidative phosphorylation pathway proteins and for the RNAs of an independent translation machinery, which is compatible with bacterial translation and differs from translation of the nuclear genome. These observations suggest that mitochondria evolved from bacteria that were endocytosed before animals and plants separated, when oxygen entered the atmosphere about 1.5 × 10^9 years ago (López-García ). The majority of mitochondrial proteins are currently encoded in the cell nucleus. Even if the present organelle genomes are stable, extensive transfer of genes from organelle to nuclear DNA must have occurred during eukaryote evolution. For example, human mtDNA (like the mtDNA of most animals) encodes 13 proteins and 24 RNAs [transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs)] (Boore, 1999). However, there are many longer mitochondrial genomes that contain additional genes compared to animal and yeast mitochondrial genomes (Barr ; Gualberto ).
Local DNA structures such as cruciforms, left-handed DNA (Z-DNA), triplexes and quadruplexes play critical roles in regulating many fundamental biological functions (Cer ; Chasovskikh ; Paleček, 1991). Cruciform formation requires inverted repeats (IRs) of six or more nucleotides in the nucleic acid sequence (Mikheikin ). IRs are distributed non-randomly in the genomes of all living organisms. Although cruciforms are unstable in linear naked DNA because of branch migration (Shlyakhtenko ), cruciform formation has been identified in vivo in both prokaryotes and eukaryotes (Panayotatos and Fontaine, 1987; Yamaguchi and Yamaguchi, 1984). A number of proteins with preferential affinity for cruciforms have been identified, including 14-3-3 proteins and the tumor suppressor protein p53 (Brazda , 2017; Brazda and Coufal, 2017). In nuclear DNA, cruciforms can regulate DNA replication, gene expression and DNA recombination (Bikard ; Brázda ). The potential role of cruciforms in mtDNA has not been well studied. We analyzed IRs in all sequenced mitochondrial genomes to determine their frequencies, localization and similarities. The data show that IRs in mtDNA have been conserved through evolution, pointing to the importance of IRs in mitochondrial as well as nuclear genomes.

2 Materials and methods

2.1 mtDNA sequences

Complete mtDNA sequences were downloaded from the genome database of the National Center for Biotechnology Information (NCBI).

2.2 Data analysis

We used the computational core of our DNA analyser software, written in Java (Brazda ), rather than the web frontend of the tool. The program was modified to read NCBI sequence identifiers from text files, one file per group of species. After each mtDNA sequence was downloaded from NCBI, an analysis was launched to find IRs using the recommended parameters for Palindrome analyser: IR size 6 to 30 bp, spacer size 0 to 10 bp and at most one mismatch. An example IR identified using these criteria is provided in Supplementary Figure S1. We produced a separate list of IRs found in each of the 7135 mtDNA sequences available in NCBI and an overall report for each of the 18 species groups. Raw results for each sequence contained the IR signature and position, but we did not find these useful for further processing. Results for each species group contained a list of species with the size of each mtDNA sequence and the number of IRs found in it. We also counted IRs grouped by size (each length from 6 to 30 bp individually, and the sums of IRs longer than 8, 10 and 12 bp).
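The search criteria can be illustrated with a minimal Python sketch. It is not the Palindrome analyser implementation (which is written in Java and is not reproduced here); the function names `reverse_complement` and `find_inverted_repeats`, the brute-force enumeration and the reporting of every nested hit are assumptions made purely for illustration. An IR is treated as a left arm, a spacer and a right arm whose reverse complement matches the left arm with at most one mismatch.

```python
# Illustrative brute-force IR search; NOT the Palindrome analyser algorithm.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, min_len=6, max_len=30,
                          max_spacer=10, max_mismatches=1):
    """Return (start, arm_length, spacer_length, mismatches) tuples.

    `start` is the 0-based position of the left arm. Every arm/spacer
    combination that satisfies the criteria is reported, including
    shorter IRs nested inside longer ones (an assumption of this sketch).
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for start in range(n):
        for arm in range(min_len, max_len + 1):
            for spacer in range(max_spacer + 1):
                end = start + 2 * arm + spacer
                if end > n:
                    break  # larger spacers cannot fit either
                left = seq[start:start + arm]
                right = seq[start + arm + spacer:end]
                # The right arm must read as the left arm on the other strand.
                target = reverse_complement(right)
                mismatches = sum(a != b for a, b in zip(left, target))
                if mismatches <= max_mismatches:
                    hits.append((start, arm, spacer, mismatches))
    return hits


# Toy sequence: a 10 bp arm, a 6 bp spacer and the arm's reverse complement.
arm, spacer = "GTCCAAAGAG", "GAACAG"
toy = arm + spacer + reverse_complement(arm)
print((0, 10, 6, 0) in find_inverted_repeats(toy))  # True
```

Because the sketch reports nested and overlapping hits, its counts are not comparable with the IR numbers given in this paper.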

2.3 Analysis of IRs around annotated NCBI features

We downloaded the so-called feature tables containing annotations of known features in mtDNA sequences; see Supplementary Figure S2. We analyzed IR occurrence inside, before and after features grouped by name to obtain, for each group of species, a file with the numbers of IRs inside and around features. IRs were searched for inside feature boundaries and in a predefined feature neighbourhood (we used ±100 bp; this figure is needed to calculate IR frequency in the feature neighbourhood). We calculated the numbers of all IRs and of those longer than 8, 10 and 12 bp in the regions before, inside and after features. Categorization of an IR according to its overlap with a feature or feature neighbourhood is shown in Supplementary Figure S3. Further processing was performed in Microsoft Excel.
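One possible reading of the before/inside/after categorization is sketched below in Python. The actual rule is the one defined in Supplementary Figure S3, which is not reproduced here; the function name `classify_ir`, the half-open coordinate convention and the choice to count any overlap (so that one IR can fall into two categories) are assumptions. Circular genomes and strand orientation are ignored.

```python
def classify_ir(ir_start, ir_end, feat_start, feat_end, flank=100):
    """Label an IR relative to one annotated feature.

    All coordinates are 0-based, half-open intervals [start, end).
    Returns a set containing any of "before", "inside", "after".
    Assumption: any overlap with a region counts for that region.
    """
    def overlaps(a_start, a_end, b_start, b_end):
        return a_start < b_end and b_start < a_end

    labels = set()
    if overlaps(ir_start, ir_end, feat_start, feat_end):
        labels.add("inside")
    # Upstream neighbourhood: the `flank` bp preceding the feature.
    if overlaps(ir_start, ir_end, max(0, feat_start - flank), feat_start):
        labels.add("before")
    # Downstream neighbourhood: the `flank` bp following the feature.
    if overlaps(ir_start, ir_end, feat_end, feat_end + flank):
        labels.add("after")
    return labels


# An IR straddling the feature start is counted in both regions here.
print(sorted(classify_ir(95, 105, 100, 200)))  # ['before', 'inside']
```

With a fixed ±100 bp window, the neighbourhood frequency per 1000 bp would then follow from the number of IRs labelled "before" (or "after") divided by the total flank length summed over all features of that name.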

2.4 Phylogenetic tree construction

Exact taxonomy IDs (taxids) of all analyzed groups [obtained from Taxonomy Browser via NCBI Taxonomy Database (Federhen, 2011)] were submitted to phyloT, a tree generator (http://phylot.biobyte.de), and a phylogenetic tree was constructed using the function ‘Visualize in iTOL’ in the Interactive Tree of Life environment (Letunic and Bork, 2016). The resulting tree is shown in Supplementary Figure S4.

2.5 Statistical analysis

The cluster dendrogram of IR incidence (Supplementary Table S1) was made in R v. 3.4.0 (R Core Team, 2014) using pvclust (Shimodaira, 2006) with the following parameters: cluster method ‘average’, distance ‘uncentered’ and number of bootstrap replications ‘10 000’. The choice of cluster method and distance was validated using the function seplot. The resulting cluster dendrogram is shown in Supplementary Figure S5. Interactive principal component analysis (PCA) plots were made in R with ggplot2 (Wickham, 2016) and plotly (Sievert ). The R code is available in Supplementary Code S1. The incidence of IRs (categorized by length) in individual species groups was used as input data, so one PCA plot was constructed for each species group to display intragroup variability.
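For orientation, the ‘uncentered’ distance is understood here to be one minus the uncentered correlation, i.e. one minus the cosine similarity of two incidence profiles. The Python sketch below illustrates only that distance; it is not the R code of Supplementary Code S1, the function name `uncentered_distance` is hypothetical and the example vectors are toy values, not data from this study.

```python
import math


def uncentered_distance(x, y):
    """1 - uncentered correlation (cosine similarity) of two profiles.

    Assumed definition: 1 - sum(x*y) / sqrt(sum(x^2) * sum(y^2)).
    """
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0:
        raise ValueError("distance is undefined for an all-zero profile")
    return 1.0 - dot / norm


# Toy IR-incidence profiles (hypothetical values).
profile_a = [25.0, 10.0, 4.0, 1.6]
profile_b = [50.0, 20.0, 8.0, 3.2]   # same shape, twice the magnitude
print(abs(uncentered_distance(profile_a, profile_b)) < 1e-9)  # True
```

Under this definition, two groups whose IR length profiles have the same shape are at distance zero even if their absolute IR frequencies differ, which is why the measure compares patterns rather than magnitudes.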

3 Results

MtDNAs in the NCBI database are stored in five groups (Animals, Fungi, Other, Plants and Protists) and 18 taxonomy subgroups. We downloaded all 7135 mtDNA sequences available (listed in the Supplementary list of sequences), which vary in length from 1136 to 1 999 602 bp (Basu ). We first compared mtDNA lengths in the 18 subgroups. Length variability is lower in animals than in fungi, plants and protists (Fig. 1). The contrast between larger groups with low variability (e.g. fishes or insects) and smaller groups with large variability in sequence length is clearly observable. Length variability generally correlated with evolutionary age. The largest variability is observed in the groups Plantae and Fungi, while mtDNA lengths are relatively constant in Animalia. The longest mtDNAs are typical for land plants, the shortest for apicomplexan protists.
Fig. 1.

Variability of length and number of mtDNAs. Box plots show the interquartile ranges of sequence length for different species groups. The whiskers represent the minimum and maximum values. The number of species in each group is shown with bars (scale on the secondary vertical axis)


3.1 Analyses of IRs

The parameters of the analysis by Palindrome analyser were: IR length 6–30 bp, spacer size 0–10 bp and at most one mismatch. In total, we analyzed 179 624 234 bases and found 7 540 694 IRs; the overall IR frequency is therefore 41.9 IR/Kbp. The differences between organisms are significant: 50% of mtDNAs have a frequency of 27–47 IR/Kbp, but IR frequencies range from 9.47 IR/Kbp in Galdieria sulphuraria 074W, a unicellular red alga found in hot sulphur springs, to 248.50 IR/Kbp in Candida castellii CBS 4332 (Ascomycetes fungi, class Saccharomycetes). Values for all groups are shown in Figure 2.
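The IR/Kbp figures used throughout are simply IR counts normalized to 1000 bp of sequence. A short Python sketch of that arithmetic, using the dataset totals quoted above (the function name `ir_per_kbp` is hypothetical):

```python
def ir_per_kbp(ir_count, sequence_length_bp):
    """IR frequency per 1000 bp of sequence."""
    if sequence_length_bp <= 0:
        raise ValueError("sequence length must be positive")
    return 1000.0 * ir_count / sequence_length_bp


# Dataset totals quoted in the text: 7 540 694 IRs in 179 624 234 bases.
overall = ir_per_kbp(7_540_694, 179_624_234)
print(f"{overall:.2f} IR/Kbp")  # 41.98 IR/Kbp
```

The result, about 41.98 IR/Kbp, is close to the 41.9 IR/Kbp quoted for the whole dataset.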
Fig. 2.

Frequency of IRs in mtDNAs for subgroups and numbers of mtDNAs. The box plot shows the interquartile ranges of IR frequencies per 1000 bp in different species groups. Whiskers represent the minimum and maximum values

The highest IR frequencies are in the groups Insects (89.33 IR/Kbp) and Ascomycetes (85.04 IR/Kbp) and the lowest in the group Birds (22.36 IR/Kbp). Statistics for all groups are provided in Supplementary Table S2. Statistical evaluations for each mtDNA are summarized in Supplementary Table S3. Comparing IRs in individual organisms and subgroups shows a general decrease in frequency with increasing IR length, except for IRs 24 and 30 bp long, which are 1.5 times more abundant than expected by approximation from neighbouring values (Table 1). We performed an additional analysis to distinguish IRs of exactly 30 bp from those of 31 bp or longer. IRs longer than 30 bp were found in only 180 of 7135 mtDNA sequences.
Table 1. Numbers and frequencies of IRs according to size

IR size [bp] | Number in dataset | IR/1000 bp
6 | 4 460 126 | 24.8303
7 | 1 841 110 | 10.2498
8 | 717 399 | 3.9939
9 | 289 601 | 1.6123
10 | 117 709 | 0.6553
11 | 52 939 | 0.2947
12 | 26 048 | 0.1450
13 | 14 252 | 0.0793
14 | 7 556 | 0.0421
15 | 4 359 | 0.0243
16 | 2 849 | 0.0159
17 | 1 807 | 0.0101
18 | 1 177 | 0.0066
19 | 889 | 0.0049
20 | 621 | 0.0035
21 | 490 | 0.0027
22 | 297 | 0.0017
23 | 228 | 0.0013
24 | 254 | 0.0014
25 | 113 | 0.0006
26 | 108 | 0.0006
27 | 91 | 0.0005
28 | 80 | 0.0004
29 | 58 | 0.0003
30 | 65 | 0.0004
>30 | 477 | 0.0027
The detailed results for all groups are summarized in Table 2. The most common longest IR varied from 11 bp (in mammals) to 18 bp (in plants). IRs longer than 30 bp are rare, but their presence is interesting and we made additional analyses of these IRs (see Supplementary comments).
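The excess of 24 bp IRs can be checked against the Table 1 counts with a few lines of Python. The text does not specify how the expected value was approximated from neighbouring values; the sketch below assumes the geometric mean of the two adjacent lengths, a natural choice when counts fall off roughly exponentially. The function name `excess_over_neighbours` is hypothetical, and 30 bp IRs are not covered because the count for exactly 31 bp is not tabulated.

```python
import math

# Counts for 23, 24 and 25 bp IRs, taken from Table 1.
counts = {23: 228, 24: 254, 25: 113}


def excess_over_neighbours(counts, size):
    """Observed count divided by the geometric mean of its two neighbours.

    Assumption: the geometric mean stands in for the paper's unspecified
    'approximation from neighbouring values'.
    """
    expected = math.sqrt(counts[size - 1] * counts[size + 1])
    return counts[size] / expected


ratio = excess_over_neighbours(counts, 24)
print(f"{ratio:.2f}")  # 1.58
```

Under this assumption 24 bp IRs are about 1.6 times more frequent than their neighbours would suggest, in line with the roughly 1.5-fold excess reported.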
Table 2. MtDNA sizes and IR frequencies and lengths

Group name | Number of seq. | Median size [bp] | Shortest sequence | Longest sequence | IR/Kbp mean | IR/Kbp range | Longest IR for 50% of seq. [bp]
Protists-apicomplexans | 24 | 5 977 | Plasmodium vivax (5 882 bp) | Babesia microti (11 109 bp) | 47 | 42–56 | 14
Other protists | 76 | 46 840 | Physarum polycephalum (14 503 bp) | Chromera velia (430 597 bp) | 74 | 28–156 | 17
Plants green algae | 48 | 45 175 | Polytomella parva (3 018 bp) | Pseudendoclonium akinetum (95 880 bp) | 47 | 17–81 | 18
Plants land plants | 174 | 151 983 | Vicia faba (1 478 bp) | Corchorus capsularis (1 999 602 bp) | 34 | 27–59 | 18
Other plants | 8 | 69 465 | Mesostigma viride (42 424 bp) | Chlorokybus atmophyticus (201 763 bp) | 46 | 35–76 | 17
Ascomycetes | 183 | 35 655 | Cryphonectria parasitica (1 364 bp) | Sclerotinia borealis (203 051 bp) | 85 | 21–249 | 17
Basidiomycetes | 29 | 69 195 | Moniliophthora roreri (9 745 bp) | Rhizoctonia solani (235 849 bp) | 72 | 37–140 | 15
Other fungi | 23 | 58 788 | Spizellomyces punctatus (1 136 bp) | Gigaspora rosea DAOM 194757 (97 350 bp) | 54 | 28–138 | 15
Flatworms | 96 | 13 968 | Taenia pisiformis (13 383 bp) | Schmidtea mediterranea (27 133 bp) | 35 | 11–98 | 12
Roundworms | 137 | 13 960 | Xiphinema americanum (12 626 bp) | Romanomermis culicivorax (26 194 bp) | 56 | 19–131 | 15
Fishes | 2 294 | 16 595 | Gadus ogac (15 564 bp) | Rhinochimaera pacifica (24 889 bp) | 28 | 22–47 | 12
Insects | 992 | 15 534 | Anaticola crassicornis (8 118 bp) | Hydropsyche pellucidula (25 004 bp) | 89 | 23–195 | 15
Amphibians | 231 | 17 175 | Gegeneophis ramaswamii (15 897 bp) | Breviceps adspersus (28 757 bp) | 36 | 22–56 | 13
Reptiles | 279 | 17 107 | Sphenodon punctatus (15 181 bp) | Heteronotia binoei (25 972 bp) | 30 | 19–48 | 12
Birds | 534 | 16 826 | Malurus melanocephalus (15 568 bp) | Penelopides panini (22 737 bp) | 22 | 18–28 | 12
Mammals | 860 | 16 543 | Macrotis lagotis (15 289 bp) | Lepus timidus (17 755 bp) | 32 | 20–59 | 11
Other animals | 1 074 | 15 754 | Clathrina clathrus (5 596 bp) | Anadara sativa (48 161 bp) | 48 | 12–157 | 12
Other | 73 | 35 594 | Galdieria sulphuraria (21 428 bp) | Phaeodactylum tricornutum (77 356 bp) | 48 | 9–84 | 13
The NCBI genome database contains mtDNA annotations. The best described are ‘gene’ (163 443), ‘tRNA’ (152 631), ‘rRNA’ (14 570) and ‘regulatory regions’ such as D-loops, replication origins and stem loops. Numbers of annotations at the time of analysis are given in Supplementary Table S4. The annotations used are those defined in the sequence metadata and may not be entirely accurate; however, most are validated by several methods and we obtained very similar results with smaller subsets of well-characterized mitochondrial genomes. To compare IR frequencies at different locations, we used the most commonly described location, ‘gene’, as the standard for comparison. There are significant differences in IR frequency between segments of mtDNA. The largest relative increase in IR frequency is for replication origin sequences, followed by D-loop, stem-loop and misc sequences (Fig. 3).
Fig. 3.

Differences in IR frequency by DNA locus. The chart compares IR frequencies per 1000 bp between the ‘gene’ annotation and other annotated locations from the NCBI database. We analyzed frequencies of all IRs (all) and of IRs with lengths of 8 bp and longer (8+), 10 bp and longer (10+) and 12 bp and longer (12+) within annotated locations (inside) and before and after annotated locations (Color version of this figure is available at Bioinformatics online.)

The frequency of IRs located in replication origins is double that of IRs located in genes. The differences are more pronounced for longer IRs: 4-fold higher for IRs 8 bp and longer, 8-fold for IRs 10 bp and longer and 15-fold for IRs 12 bp and longer (Fig. 3, orange). IR frequency also changes in the neighbourhoods of annotated sequences (Fig. 3). The highest enrichment is found not only within replication origins and stem loops, but also within 100 bp before and after these sequences. Overall statistics of IRs in the near neighbourhood of and overlapping with annotations are shown in Supplementary Table S5. The ratios of IR frequencies of different annotation classes to the gene class are given in Supplementary Table S6.

4 Discussion

In this paper, we analyzed all available mitochondrial genomes for the presence and localization of IRs. The typical maximal IR length was 12–14 bp, although many mtDNAs contain longer IRs. For statistical purposes, we compared IRs of 6 to 30 bp, which can be bound by DNA binding proteins and can form cruciform structures (Brázda ). Surprisingly, substantial numbers of longer IRs were detected in some mtDNAs; see Supplementary comments for details of these extended IR sequences. Homo sapiens has one of the lowest mtDNA IR frequencies (21.67 IR/Kbp), with only 359 IRs identified. Furthermore, only 24 of these are perfect (the other 335 IRs have one mismatch). The two longest IRs are 10 bp long: one with the sequence CCCCTTCGAC (one mismatch and CTT spacer), located in the middle of the ND1 gene [NADH dehydrogenase, subunit 1 (complex I)], and the other with the sequence GTCCAAAGAG (no mismatch and GAACAG spacer), located within the RNR2 gene (mitochondrially encoded 16S RNA). Interestingly, Gorilla gorilla mtDNA contains 410 IRs (25.06 IR/Kbp) and Pan troglodytes mtDNA contains 384 IRs (23.20 IR/Kbp). This IR reduction (Gorilla > Pan > Homo) is congruent with phylogenetic relationships in Hominidae (Pozzi ). Among the lower primates, the lemuriform primate Lemur catta has 554 IRs (32.52 IR/Kbp) and the tarsiiform primate Tarsius bancanus has 593 IRs (35.03 IR/Kbp). Interactive PCA plots represent similarities in the pattern of IR length between all subgroups of organisms (Supplementary Plot P1) and between particular organisms within each subgroup (Supplementary plots P2–P19). The most distinct group is the apicomplexan protists, whereas all vertebrate subgroups lie close together. Land plants and green algae are also close to each other in their IR incidence. IRs in mitochondrial genomes therefore follow evolutionary trends and are relatively well conserved between organisms within each phylogenetic clade.
From this point of view, IR pattern/incidence could be used as an additional phylogenetic marker in the future. Our analyses of all accessible mitochondrial genomes show that IR sequences are abundant and non-randomly distributed in the mitochondrial genomes of all living organisms. However, the frequencies of IRs differ between phylogenetic groups. The lowest IR frequency was found in the unicellular polyextremophilic red alga Galdieria sulphuraria strain 074W, an acido-thermophile that can grow both autotrophically and heterotrophically in the dark. Besides living in extreme conditions of temperature and acidity, it also tolerates high metal ion concentrations. Its mt genome of 21 428 bp has only 9.47 IR/Kbp and no IR longer than 9 bp. The plastid and mitochondrial genomes of this organism show many extreme features; for example, the mitochondrial genome is much smaller than those of other algae (Jain ). We have not found any mitochondrial genome without IRs. Most mitochondrial genomes have numerous IRs, especially in regulatory regions such as the replication origin and D-loop region. These results point to the importance of IRs in basic biological processes.
References (10 of 28 listed)

1. L S Shlyakhtenko, P Hsieh, M Grigoriev, V N Potaman, R R Sinden, Y L Lyubchenko. A cruciform structural transition provides a molecular switch for chromosome structure and dynamics. J Mol Biol, 2000-03-10.
2. J L Boore. Animal mitochondrial genomes. Nucleic Acids Res, 1999-04-15.
3. R Z Cer, K H Bruce, D E Donohue, N A Temiz, U S Mudunuri, M Yi, N Volfovsky, A Bacolla, B T Luke, J R Collins, R M Stephens. Searching for non-B DNA-forming motifs using nBMST (non-B DNA motif search tool). Curr Protoc Hum Genet, 2012-04.
4. Andrey L Mikheikin, Alexander Y Lushnikov, Yuri L Lyubchenko. Effect of DNA supercoiling on the geometry of Holliday junctions. Biochemistry, 2006-10-31.
5. G Kroemer, B Dallaporta, M Resche-Rigon. The mitochondrial death/life regulator in apoptosis and necrosis. Annu Rev Physiol, 1998.
6. Václav Brázda, Jana Cechová, Jan Coufal, Sigrun Rumpel, Eva B Jagelská. Superhelical DNA as a preferential binding target of 14-3-3γ protein. J Biomol Struct Dyn, 2012.
7. Václav Brázda, Jan Kolomazník, Jiří Lýsek, Lucia Hároníková, Jan Coufal, Jiří Šťastný. Palindrome analyser - A new web-based server for predicting and evaluating inverted repeats in nucleotide sequences. Biochem Biophys Res Commun, 2016-09-04.
8. Y H Wei, Y S Ma, H C Lee, C F Lee, C Y Lu. Mitochondrial theory of aging matures--roles of mtDNA mutation and oxidative stress in human aging. Zhonghua Yi Xue Za Zhi (Taipei), 2001-05.
9. K Yamaguchi, M Yamaguchi. The replication origin of pSC101: the nucleotide sequence and replication functions of the ori region. Gene, 1984 Jul-Aug.
10. Václav Brázda, Rob C Laister, Eva B Jagelská, Cheryl Arrowsmith. Cruciform structures are a common DNA feature important for regulating biological processes. BMC Mol Biol, 2011-08-05.
