Literature DB >> 30140690

Complex Analyses of Short Inverted Repeats in All Sequenced Chloroplast DNAs.

Václav Brázda1, Jiří Lýsek2, Martin Bartas3, Miroslav Fojta1.   

Abstract

Chloroplasts are key organelles in the management of oxygen in algae and plants and are therefore crucial for all living beings that consume oxygen. Chloroplasts typically contain a circular DNA molecule with nucleus-independent replication and heredity. Using "palindrome analyser" we performed complete analyses of short inverted repeats (S-IRs) in all chloroplast DNAs (cpDNAs) available from the NCBI genome database. Our results provide basic parameters of cpDNAs including comparative information on localization, frequency, and differences in S-IR presence. In a total of 2,565 cpDNA sequences available, the average frequency of S-IRs in cpDNA genomes is 45 S-IRs/per kbp, significantly higher than that found in mitochondrial DNA sequences. The frequency of S-IRs in cpDNAs generally decreased with S-IR length, but not for S-IRs 15, 22, 24, or 27 bp long, which are significantly more abundant than S-IRs with other lengths. These results point to the importance of specific S-IRs in cpDNA genomes. Moreover, comparison by Levenshtein distance of S-IR similarities showed that a limited number of S-IR sequences are shared in the majority of cpDNAs. S-IRs are not located randomly in cpDNAs, but are length-dependently enriched in specific locations, including the repeat region, stem, introns, and tRNA regions. The highest enrichment was found for 12 bp and longer S-IRs in the stem-loop region followed by 12 bp and longer S-IRs located before the repeat region. On the other hand, S-IRs are relatively rare in rRNA sequences and around introns. These data show nonrandom and conserved arrangements of S-IRs in chloroplast genomes.

Entities:  

Mesh:

Substances:

Year:  2018        PMID: 30140690      PMCID: PMC6081594          DOI: 10.1155/2018/1097018

Source DB:  PubMed          Journal:  Biomed Res Int            Impact factor:   3.411


1. Introduction

Inverted repeat sequences (IRs) play an important regulation role in genomic DNA [1]. It has been demonstrated that short IRs (S-IRs) are essential for DNA replication in both prokaryotes and eukaryotes [2]. Depending on their length, sequence, and crowding conditions, S-IRs can form a local DNA structure called cruciform, which is a target of numerous DNA-binding proteins with important regulatory functions [3]. Among these proteins are cruciform resolving proteins like E. coli enzyme RuvA [4, 5] and proteins important for human diseases including the 14-3-3 [6] and p53 protein families [7, 8]. Similarly, high mobility group- (HMG-) like cruciform binding proteins have also been purified from plants [9] and it was demonstrated that IRs in chloroplast DNA (cpDNA) genomes play a role in evolution and various regulatory processes [10, 11]. Both chloroplasts and mitochondria retain their own genomes and transcription machinery. S-IRs in mitochondrial DNA (mtDNA) are located nonrandomly with high abundance in the replication origin, D-loop, and stem-loop sequences [12]. While mitochondria are present in all eukaryotes, chloroplasts are exclusive for algae and plants. Therefore, it is thought that mitochondria were engulfed before the origin of chloroplasts by a common ancestor. This is also supported by the fact that mtDNA is usually shorter than cpDNA. Chloroplasts are semiautonomous organelles; their origin dates back to over 1,000 million years ago, when an ancient cyanobacterium was engulfed by a eukaryotic cell (primary endosymbiotic event), which subsequently gave rise to glaucophytes, red algae, green algae, and plants [13] and probably most recently to the filose amoeba Paulinella chromatophore [14]. Subsequent secondary, serial secondary, and tertiary endosymbiotic processes have been important during the evolution of modern photosynthetic eukaryotes [15]. In land plants, cpDNA sequences are highly conserved and the genome can be divided into four different regions: large single copy (LSC) and small single copy (SSC), which are separated from each other by two large internal repeat regions. A cpDNA genome typically contains between 120 and 130 genes, mainly involved in photosynthesis, transcription, and translation. The cpDNA genome is usually between 107 kbp (Cathaya argyrophylla, Pinaceae family) and 218 kbp (Pelargonium, Geraniaceae family) in size. However, smaller cpDNA genomes are also common; for example, Apicomplexa genomes are around 30 kbp. On the other hand, the chromophore genome of Paulinella chromatophora is larger than some complete bacterial genomes, with 1 022 kbp due to the unique recent engulfing event. Chloroplast genome size is independent of nuclear genome size [16]. The evolution of cpDNA genes is slower than that of nuclear genes [17], but faster than that of mtDNA genes [18]. Many of today's algae and almost all land plants carry two identical copies of a large coding sequence called the large IR A and IR B regions (varying from 20 kbp to 36 kbp), whose sequences may differ remarkably in individual species. In algae, the whole large IR has been lost multiple times during evolution [19]. In the parasitic and often nonphotosynthetic plants of Orobanchaceae family, the independent losses of one large IR region occur; Conopholis americana has the smallest cpDNA genome of land plants (45 kb) [20]. Loss of the large IR was also observed in some legumes (Papilionoideae subfamily, for example) [21]. Recently, it was found that, compared to single copy genes, synonymous substitution rates are on average 3.7-fold slower in chloroplast large IR genes (mainly coding for ribosomal, transfer RNAs, and ribosomal proteins). This may be due to the duplicative nature of the large IR, which reduces the substitution rate within these regions [22]. cpDNA rearrangements are more frequent when a large IR region is lost, suggesting that large IR regions are important for maintaining the conserved arrangement of cpDNA sequences [23]. In our analyses, we focused only on “short” IRs (6-60 bp) which are able to form cruciform structures. We analyzed these S-IRs in all sequenced cpDNA genomes to determine frequencies, localization, and similarities/differences.

2. Materials and Methods

2.1. Analyses of cpDNA Sequences

The set of 2,566 complete plastid cpDNA sequences were downloaded from the genome database of the National Center for Biotechnology Information (NCBI). We used the computational core of our DNA analyser software [24] modified to read NCBI identifiers of sequences from a text file and to download them. The parameters of analyses were size of S-IR 6 to 60 bp and spacer size 0 to 10 bp, and maximally one mismatch was allowed. Analysis produced a separate list of S-IRs found in each of the 2,566 cpDNA sequences and an overall report for each selected group. Overall results for each species group contained a list of species with the size of its cpDNA sequence and the number of S-IRs found in that sequence. Our software also counted S-IRs grouped by their individual size (6–60 bp individually and the sum of S-IRs longer than 8 bp, 10 bp, and 12 bp).

2.2. Analysis of S-IRs around Annotated NCBI Features

We downloaded the genome feature tables from the NCBI database along with the cpDNA sequences. We performed analysis of S-IR occurrence inside and around (before and after) recorded features. Features were grouped by their name stated in the feature table file. From this analysis we obtained a file with feature names and numbers of S-IRs found inside and around features for each group of species analyzed. Search for S-IRs took place in predefined feature neighborhoods (we used ±100 bp; this figure is important for calculation of S-IR frequency in feature neighborhood) and inside feature boundaries. We calculated the amount of all S-IRs and those longer than 8, 10, and 12 bp in regions before, inside, and after features. The categorization of an S-IR according to its overlap with a feature or feature neighborhood is demonstrated by the example shown in Supplementary Figure S1. Further processing was performed in Microsoft Excel.

2.3. Analyses of Similarities

Similarity among S-IRs was performed for those with abundant presence in the cpDNA genomes by Levenshtein algorithm, which counts distance between two strings according to the number of deletions, insertions, or substitutions required to transform source string into target string [25].

2.4. Phylogenetic Tree Construction

Exact taxid IDs of all analyzed groups (obtained from Taxonomy Browser via NCBI Taxonomy Database [26]) were downloaded to phyloT: a tree generator (http://phylot.biobyte.de) and a phylogenetic tree was constructed using function “Visualize in iTOL” in Interactive Tree of Life environment [27]. The resulting tree is shown in Supplementary Figure 2.

2.5. Statistical Analysis

Cluster dendrogram of S-IR frequency data (Supplementary Table S1) was made in R v. 3. 4. 3 (R Core Team, 2016) using the hclust function with the cluster method “ward.D2”. The resulting cluster dendrogram is shown in Supplementary Figure S2. Principal component analysis (PCA) interactive plots were made in R with ggplot2 [28] and plotly [29]. The R code is available in Supplementary Code S1. Frequency of S-IRs (categorized by length) in individual species groups was used as input data, so one PCA plot was constructed for each species group to display intragroup variability.

3. Results and Discussion

3.1. General Statistics for cpDNA

cpDNAs are stored in the genome database in the three taxonomy groups (Protists, Plants, and others) and four subgroups (Apicomplexans, Green Algae, Plants, and others). However, the vast majority of sequences belong to the Plants subgroup (2,278), compared to Apicomplexans (36) and Green Algae (107). Due to discrepancies in the number of sequenced cpDNA genomes in diverse groups (for example, the phylogenetically important group Euglenozoa has only 9 sequenced cpDNAs whereas Rosids and Asterids in the Pentapalae group [30] each have more than 400), we divided downloaded sequences into 21 phylogenetically related groups with a minimum of eight members in each group (see Supplementary Figure 2). This division allows us to observe detailed trends in S-IR frequency evolution across the Plant kingdom (from evolutionary oldest Bryophyta to Polypodiopsida, Acrogymnospermae, Basal Magnoliophyta, Magnoliidae, Alismatales, Dioscoreales, Liliales, Asparagales, Commelinids, Early-Diverging Eudicotyledons, Santalales, Saxifragales, Caryophyllales, to Asterids and Rosids). In total, we have analyzed 2,565 plastid genomes and, in addition, the chromatophore genome of amoeba Paulinella chromatophora. The length of cpDNA sequences (Table 1) varies from 11,348 bp (Pilostyles aethiopica, endoparasitic land plant which preserved only 17 chloroplast genes of the usual number of 116 cpDNA genes [16]) to 610,063 bp in Bulboplastis apyrenoidosa algae from the Rhodophyta division. Recently, a plastid DNA was described for red algae Corynoplastis japonica as the largest and most intron-rich plastid genome (1.13 Mbp) [31]. However, the majority of cpDNA genomes varied in the close interval between 120 and 150 kbp. The cpDNA genome of Paulinella chromatophore, which has a plastid genome from a different engulfment event estimated only 60 million years ago, is 1,021,616 bp [14], longer than the whole genomes of individual prokaryotic organisms. Aside from this unique organism, the longest cpDNAs are typical for Rhodophyta, Chlorophyta, and Zygnemophyceae, and the shortest for unicellular organisms in the phylum Euglenozoa (Figure 1). Length variability is generally correlated with evolutionary age. The largest variability is observed in the group Chlorophyta and Rhodophyta while the sizes of cpDNA in group Liliales and other land plants are relatively constant. The variability of length in higher plants is limited compared to lower plants (bryophyta, algae, etc.) (Figure 1). Contrary to the diverse sizes of cpDNA in ancient phylogenetic groups, the size of the cpDNA in phylogenetically newer groups is more limited with only a few exceptions. For example, 75% of all 522 sequenced cpDNA genomes in clade Rosids are in the size interval of 159,441 to 160,886 bp, differing by less than 1%.
Table 1

Numbers and frequencies of S-IRs according to size.

IR sizeAmount in datasetIR frequency per 1000bpIR sizeAmount in datasetIR frequency per 1000bpIR sizeAmount in datasetIR frequency per 1000bp
610,351,04026.8991513,3700.035241,6190.004

74,157,12710.803166,6410.017251,0050.003

81,656,1014.304175,5050.014267600.002

9637,1841.656183,5950.009271,2310.003

10264,2490.687192,7830.007284500.001

11113,6490.295202,6760.007293500.001

1250,2290.131212,1080.005303020.001

1327,8330.072223,0900.008>301,1050.004

1413,9350.036231,5770.004
Figure 1

Variability of length of cpDNAs. Box plots show sequence length interquartile ranges for different species groups. The whiskers represent the minimum and maximum values.

3.2. Analyses of Short Inverted Repeats

The total number of nucleotides in the 2,566 plastid genomes analyzed is 384,975,139 bp and we found 17,326,953 S-IRs. The average frequency is 47 (17-81) S-IR/kbp for green algae and 34 (29-59) S-IR/kbp for land plants. The differences between organisms are significant; 50% of cpDNAs have a frequency of 40 to 45 S-IR/kbp, but S-IR frequencies range from 26 S-IR/kbp in unicellular green algae of the order Mamiellales Ostreococcus tauri, while other green algae, specifically of the order Volvocaceae Pleodorina starrii have a frequency of 191.98 S-IR/kbp. Ostreococcus tauri is a member of global oceanic picoplankton and is the smallest described free-living eukaryote with very a compact genome. Pleodorina starrii is another alga which is composed of 32 or 64 biflagellate cells. Values of S-IR frequencies for all groups are shown in Figure 2. The highest S-IR frequencies are in the groups Euglenozoa (67.87 S-IR/kbp) and Bryophyta (67.24 S-IR/kbp) followed by Chlorophyta (60.95 S-IR/kbp) and Rhodophyta (59 S-IR/kbp) and the lowest S-IR frequencies are in the groups Polypodiopsida (37.65 S-IR/kbp) followed by Basal Magnoliophyta (40.11 S-IR/kbp), Magnoliidae (40.14 S-IR/kbp), and Commelinids (40.63 S-IR/kbp). Statistics and evaluation for all groups are provided in Supplementary Table S2.
Figure 2

Frequency of S-IRs in mtDNAs for subgroups and numbers of mtDNAs. The box plot shows the interquartile ranges of S-IR frequencies per 1000 bp in different species groups. Whiskers represent the minimum and maximum values.

Comparing S-IRs in individual organisms and subgroups shows a general decrease in frequency with increasing S-IR length, except for S-IRs 15, 22, 24, or 27 bp long, which are present more often than expected by approximation from neighboring values (Table 1). Similar selective abundance according to length has also been observed for mtDNAs, but only for S-IR lengths 24 and 30 bp. Both 24 and 30 bp long S-IRs are also more frequent than expected in cpDNA sequences, but there is very strong relative abundance also for 15, 22, and 27 bp S-IRs. We investigated if the S-IRs abundantly present in cpDNA genomes are similar using analyses of similarity by Levenshtein distance (Supplementary Data 1). We used two different conditions for the comparison, identity and Levenshtein distance 2 or less. These results show that especially longer S-IR sequences are shared only for a few phylogenetic groups; on the other hand shorter S-IR sequences are often shared across several phylogenetic groups (Supplementary Data 1). Our results show that 27 bp S-IRs are present most often as the sequences AAATTCTTTTATTTTAGATAGAAGAAA and ACATTCTTTTATTTTAGATAGAAGAAA (both 320 times); the next most abundant sequence was present in all genomes only 34 times. Both sequences are identical except for one nucleotide; therefore for Levenshtein distance 2 and less we found AAATTCTTTTATTTTAGATAGAAGAAA sequence 655 times, followed by TATAAGTGAACTAGATAAAACGGAATC sequence 49 times. For 22 bp long S-IRs, the sequence AGAGCTCGGATCGAATCGGTAT is present 414 times; the next most abundant sequence was present only 89 times in all genomes. For Levenshtein distance 2 and less we found AGAGCTCGGATCGAATCGGTAT sequence in 590 cases, followed by TAATTGAAGTAAGAAGTCTCCC sequence 240 times. For 15 bp long S-IRs, the most common sequences were ATAAAAGAAAGAAGA and AAAAAAGAAAGAAGA (presented 1,346 and 1,341 times, respectively), and the next most abundant sequence was present only 370 times in all genomes. Both most abundant sequences are identical except for one nucleotide; therefore for Levenshtein distance 2 and less we found ATAAAAGAAAGAAGA 2,702 times, followed by AAAAAAAAAGAAAGA 1,272 times. The detailed results of S-IR frequencies for all groups are summarized in Table 2. The most common longest S-IRs varied from 17 bp (in Commelinids) to 32 bp (in Bryophyta). The most common longest S-IR in vascular plants (Tracheophyta) (24 bp) was found in Asparagales. We also found 1,105 S-IRs longer than 30 bp in cpDNA genomes (range 31-100 bp), but they constituted less than 0.01% of the total number of 17,326,953 S-IRs identified.
Table 2

cpDNA sizes and S-IR frequencies and lengths.

Group nameNumber of seq.Median size [bp]Shortest sequenceLongest sequenceIR/kbpMean rangeLongest S-IR for 50% of seq. [bp]
Euglenozoa 991,616Monomorphina aenigmaticaEuglena gracilis6818
(74,746 bp)(143,171 bp)56 – 79
Stramenopiles 37122,660Aureococcus anophagefferensCylindrotheca closterium5725
(89,599 bp)(165,809 bp)43 – 69
Rhodophyta 60171,284Cyanidioschyzon merolaeBulboplastis apyrenoidosa5919
(149,987 bp)(610,063 bp)34 -83
Chlorophyta 90157,916Ostreococcus tauriFloydiella terrestris6127
(71,666 bp)(521,168 bp)27 – 102
Zygnemophyceae 11142,017Spirogyra maximaCosmarium botrytis5124
(129,954 bp)(207,850 bp)32 – 64
Bryophyta 8123,868Syntrichia ruralisTakakia lepidozioides6732
(122,630 bp)(149,016 bp)44 – 78
Polypodiopsida 49151,126Diplazium unilobumLygodium japonicum3818
(127,840 bp)(157,260 bp)34 – 52
Acrogymnospermae 85127,659Cathaya argyrophyllaMacrozamia mountperriensis4423
(107,122 bp)(166,341 bp)38 – 50
Basal magnoliophyta 13159,881Schisandra chinensisTrithuria inconspicua4018
(146,859 bp)(165,389 bp)38 – 42
Magnoliidae 41159,443Cassytha filiformisPiper kadsura4018
(114,622 bp)(161,486 bp)39 – 43
Alismatales 14163,856Zostera marinaWolffiella ryophyte4622
(143,877 bp)(169,337 bp)40 – 50
Dioscoreales 10154,205Burmannia oblongaTacca leontopetaloides4722
(39,386 bp)(162,477 bp)42 – 63
Liliales 41152,677Amana wanzhensisHeloniopsis tubiflora4518
(150,576 bp)(158,229 bp)42 – 46
Asparagales 125153,953Oberonia japonicaCypripedium formosanum4424
(142,996 bp)(178,131 bp)42 – 64
Commelinids 290139,171Aegilops cylindricaCarex neurocarpa4117
(113,490 bp)(181,397 bp)38 – 52
Early-diverging eudicotyledons 49157,817Kingdonia unifloraBerberis koreana4219
(147,378 bp)(166,758 bp)39 – 45
Santalales 9128,744Schoepfia jasminodoraErythropalum scandens4518
(118,743 bp)(156,154 bp)41 – 48
Saxifragales 10152,692Phedimus takesimensisLiquidambar formosana4120
(147,048 bp)(160,410 bp)40 – 43
Caryophyllales 32151,686Carnegiea giganteaDrosera rotundifolia4519
(113,064 bp)(192,912 bp)40 – 47
Asterids 398153,377Monotropa hypopitysAdenophora divaricata4319
(35,336 bp)(176,331 bp)38 – 61
Rosids 522159,441Cytinus hypocistisPelargonium transvaalense4520
(19,400 bp)(242,575 bp)35 – 75
Rest 662155,196Pilostyles aethiopicaPleodorina starrii4620
(11,348)(269,857)28 – 192

3.3. Comparison of S-IR Frequencies according to Sequence Annotations

The NCBI genome database contains annotations for cpDNA sequences. The best described are gene (343,857), CDS (226,783), tRNA (91,586), exon (36,345), rRNA (18,719), and intron (11,028). Numbers of annotations at the time of analysis are given in Supplementary Table S3. To compare S-IR frequencies at different locations we used the most commonly described location “gene” as a standard for comparison with other locations. There are significant differences in S-IR frequency in diverse segments of cpDNAs. The largest relative increase of S-IR frequency is for stem-loop sequences (12.8-times higher) followed by regulatory (3.2x), intron (2.4x), and misc sequences (2.7x) (Figure 3). The presence of the “large internal repeat region” is typical for higher plants. The difference between S-IR frequency in the large internal repeat regions and gene regions is only 1.7-fold, whereas S-IRs are 4.6-fold higher in the sequences surrounding these features and the frequency of 12 bp S-IRs and longer is 7.6 times higher before the repeat region compared to gene regions. Another interesting finding is the presence of S-IRs in introns; while S-IRs are abundant in introns, S-IR frequencies are significantly reduced in exon sequences that neighbor introns (both 100 bp before and after introns). We observed significantly reduced S-IR frequencies in and around rRNAs compared to gene locations, especially for 12 bp and longer S-IRs. The opposite effect is seen for tRNA sequences. Raw data are in Supplementary Tables S4 and S5.
Figure 3

Differences in S-IR frequency by DNA locus. The chart shows S-IR frequencies per 1000 bp between “gene” annotation and other annotated locations from the NCBI database. We analyzed frequencies of all S-IRs (all) and of S-IRs with lengths 8 bp and longer (8+), 10 bp and longer (10+), and 12 bp and longer (12+) within annotated locations (inside) and before (100 bp) and after (100 bp) annotated locations.

Based on the data from S-IR analyses we produced a cluster dendrogram of individual groups (Supplementary Figure 2). The real phylogenetic tree is fairly consistent with this cluster dendrogram. Very interesting is the proximity of Euglenozoa and Bryophyta clades. However, this may be due the fact that both groups contain only a small number of accessible sequenced genomes (only 9 and 8 cpDNA genomes are available for Euglenozoa and Bryophyta, respectively). Based on PCA analysis (Supplementary Data 2.html) of S-IR frequencies data (Supplementary Table S2), some interesting facts emerged. The most interesting findings in individual groups from our point of view are shortly commented with appropriate references in the following bullet paragraphs; for details please check interactive graphs (Supplementary Data 2.html): In Rosids, the cpDNA of the holoparasitic plant Cytinus hypocistis was sequenced recently [20] and PCA analysis of S-IR frequency revealed a very unusual profile in comparison with the rest of the group (Supplementary Data 2.html). Cytinus hypocistis has an extremely small cpDNA genome (19.4 kb) with only 23 genes and no large IR-A or IR-B regions [20]. On the other hand, Cytinus hypocistis has the highest S-IR/kbp cpDNA value from all 522 analyzed Rosids (78.81 S-IR/kbp). In Asterids, the most divergent S-IR frequencies in cpDNA include two species of the family Orobanchaceae (Phelipanche ramosa and Phelipanche purpurea), which are nonphotosynthetic flowering plants. Their S-IR frequencies are unusually high in comparison to the rest of the Asterids. In Caryophyllales, the flytrap Dionaea muscipula has an unusually low S-IR frequency in cpDNA. In Early-Diverging Eudicotyledons, Gymnospermium microrrhynchum has the highest S-IR frequency, especially for S-IRs longer than 12 bp. In Commelinids, Carex neurocarpa has the highest S-IR frequency in cpDNA. In Asparagales, Cypripedium formosanum (endemic orchid of Taiwan) has a very high S-IR frequency in cpDNA. In Liliales, all three PCA clusters are very well distinguished from each other. In Dioscoreales, Burmannia oblonga has the highest S-IR frequency in cpDNA. In Alismatales, the aquatic plant Najas flexilis has a low S-IR frequency in cpDNA in comparison to the rest of the group. The plastid genome of Najas flexilis presents some anomalous modifications (reduced size of a small single copy region, eleven ndh gene losses) [21]. In Magnoliidae, Cassytha filiformis and two species of pepper vine (Piper kadsura and Piper cenocladum) have the most divergent pattern of S-IR in cpDNA (higher frequencies of S-IRs) compared with the rest of group. In Basal Magnoliophyta, Trithuria inconspicua (endemic aquatic herb of New Zealand) has the lowest S-IR frequency in cpDNA in comparison with the rest of the group (especially for S-IRs longer than 10 bp). In acrogymnospermae, all three PCA clusters are very well distinguished from each other (Supplementary Data 2). In Polypodiopsida, two species of class Equisetopsida (Equisetum hyemale and Equisetum arvense) have a very high frequency of S-IRs compared with other species in this group. In Bryophyta, two species (Sphagnum palustre and Takakia lepidozioides) have very low S-IR frequencies in cpDNA in comparison with the rest of the group. In Zygnemophyceae, Spirogyra maxima has the highest frequency of S-IR in cpDNA (for S-IRs longer than 12 bp there is more than 10-fold enrichment in comparison with the other species in the group). In Chlorophyta, four species of family Ulvaceae (Ulva fasciata, Ulva linza, Ulva prolifera, and Ulva flexuosa) have very high frequencies of S-IRs in comparison with the rest of the group. In Rhodophyta, the strangest pattern of S-IR frequency in cpDNA was found in Cyanidioschyzon merolae, a small unicellular haploid red alga adapted to hot spring environment with high sulfur acidic content. Cyanidioschyzon merolae has only one chloroplast and mitochondria. After detailed analysis we have found that cpDNA of Cyanidioschyzon merolae has an extremely low content of S-IRs in comparison to the rest of the Rhodophyta group. In Stramenopiles, Aureococcus anophagefferens (unicellular alga with only one chloroplast and mitochondrion) and Aureoumbra lagunensis have the lowest S-IR frequency in cpDNA in comparison with the rest of the group. In Aureococcus anophagefferens and Aureoumbra lagunensis the large IR regions were lost [22]. In Euglenozoa, Monomorphina aenigmatica has the highest S-IR frequency and the plastid genome of this species is AT rich (70.6%) and contains 53 intron insertion sites, of which 41 were found to be shared with other euglenids [23]. DNA cruciforms are formed by S-IRs and with their important roles in replication, transcription and DNA stability it is not surprising that S-IRs are also present in cpDNA genomes. Analyses of mtDNA genomes revealed that S-IR sequences are abundant and nonrandomly distributed in the mitochondrial genomes of all living organisms, with particular abundance in regulatory regions such as replication origin and D-loop region [12]. Here, we analyzed all available cpDNA genomes for the presence of S-IRs capable of forming cruciform structures. Our results show that the typical maximal S-IR length in cpDNA varied from 17 bp (Commelinids) to 32 bp (Bryophyta). Interestingly, substantial numbers of longer S-IRs are detected in some cpDNAs. While the mean frequency of S-IR was ~45 IRs/kbp, the frequency was remarkably higher in some plastid genomes. In 13 plastid genomes, the frequency was higher than 100 S-IR/kbp, most of these genomes were from Apicomplexa, but also red algae Choreocolax polysiphoniae, which has reduced gene regions [32]. These organisms typically contain an AT-rich relict plastid which functions in fatty acid biosynthesis, heme biosynthesis, iron-sulfur cluster synthesis, and isoprenoid biosynthesis, but they are deficient in photosynthetic abilities [33]. Long IRs (about 20-36 kbp long) in cpDNA can help stabilize the rest of the chloroplast genome and prevent gene loss-and-gain rearrangements [34]. Interestingly, these longer IRs are surrounded by short S-IRs. While the difference between S-IR frequency in the repeat regions and gene regions is relatively small, the frequency of 12 bp and longer S-IRs in the surrounding sequences is 7.6 times higher before the repeat region. It is likely that S-IRs in cpDNA also form cruciform structures in vivo and these are targets for binding of specific proteins (our hypothesis is based on cruciform functions and preferential protein binding to cruciforms in human and other model organisms). Some information is available to support this concept in plants; ribosomal protein S16 (RPS16) binds to cruciform DNA and is targeted to the chloroplast, indicating the possibility of RPS16-cruciform cpDNA interactions and regulation [35]. Moreover, S-IRs were found to be highly enriched in stem-loop sequences, which have an important regulatory function in cpDNAs.

4. Conclusions

Chloroplasts, as basic organelles for algae and plants, are fundamental for life due to their oxygen management. In this paper, we analyzed all 2,566 sequenced cpDNA by “palindrome analyser”. We described the basic parameters of cpDNA including the frequency and localization of S-IRs able to form cruciform structures. Interestingly, the frequency of S-IR does not decrease for S-IRs 15, 22, 24, or 27 bp long. These results point to the importance of specific S-IRs in cpDNA genomes. Moreover, comparison by Levenshtein distance of S-IR similarities showed that a limited number of sequences are shared in the majority of cpDNA S-IRs. S-IRs are not located randomly, but are length-dependently enriched in specific locations, including the repeat region, stem, introns, and tRNA regions of cpDNA genomes. The highest enrichment was found for 12 bp and longer S-IRs in the stem-loop region, followed by 12 bp and longer S-IRs located before the repeat region. These data showing nonrandom and conserved arrangements of S-IRs in chloroplast genomes indicate the potential importance of S-IRs in basic biological processes within chloroplasts.
  31 in total

Review 1.  The puzzle of plastid evolution.

Authors:  John M Archibald
Journal:  Curr Biol       Date:  2009-01-27       Impact factor: 10.834

2.  EVOLUTIONARY SIGNIFICANCE OF THE LOSS OF THE CHLOROPLAST-DNA INVERTED REPEAT IN THE LEGUMINOSAE SUBFAMILY PAPILIONOIDEAE.

Authors:  Matt Lavin; Jeff J Doyle; Jeffrey D Palmer
Journal:  Evolution       Date:  1990-03       Impact factor: 3.694

3.  Palindrome analyser - A new web-based server for predicting and evaluating inverted repeats in nucleotide sequences.

Authors:  Václav Brázda; Jan Kolomazník; Jiří Lýsek; Lucia Hároníková; Jan Coufal; Jiří Št'astný
Journal:  Biochem Biophys Res Commun       Date:  2016-09-04       Impact factor: 3.575

4.  Reconstruction of the ancestral plastid genome in Geraniaceae reveals a correlation between genome rearrangements, repeats, and nucleotide substitution rates.

Authors:  Mao-Lun Weng; John C Blazier; Madhumita Govindu; Robert K Jansen
Journal:  Mol Biol Evol       Date:  2013-12-12       Impact factor: 16.240

5.  The complete chloroplast genome sequence of Mahonia bealei (Berberidaceae) reveals a significant expansion of the inverted repeat and phylogenetic relationship with other angiosperms.

Authors:  Ji Ma; Bingxian Yang; Wei Zhu; Lianli Sun; Jingkui Tian; Xumin Wang
Journal:  Gene       Date:  2013-07-27       Impact factor: 3.688

6.  Formation of a stable RuvA protein double tetramer is required for efficient branch migration in vitro and for replication fork reversal in vivo.

Authors:  Alison S Bradley; Zeynep Baharoglu; Andrew Niewiarowski; Bénédicte Michel; Irina R Tsaneva
Journal:  J Biol Chem       Date:  2011-04-29       Impact factor: 5.157

7.  Substitution of the gene for chloroplast RPS16 was assisted by generation of a dual targeting signal.

Authors:  Minoru Ueda; Tomotaro Nishikawa; Masaru Fujimoto; Hideki Takanashi; Shin-Ichi Arimura; Nobuhiro Tsutsumi; Koh-Ichi Kadowaki
Journal:  Mol Biol Evol       Date:  2008-05-02       Impact factor: 16.240

8.  Mechanisms of functional and physical genome reduction in photosynthetic and nonphotosynthetic parasitic plants of the broomrape family.

Authors:  Susann Wicke; Kai F Müller; Claude W de Pamphilis; Dietmar Quandt; Norman J Wickett; Yan Zhang; Susanne S Renner; Gerald M Schneeweiss
Journal:  Plant Cell       Date:  2013-10-18       Impact factor: 11.277

Review 9.  Cruciform structures are a common DNA feature important for regulating biological processes.

Authors:  Václav Brázda; Rob C Laister; Eva B Jagelská; Cheryl Arrowsmith
Journal:  BMC Mol Biol       Date:  2011-08-05       Impact factor: 2.946

10.  The complete chloroplast genome sequence of Cephalotaxus oliveri (Cephalotaxaceae): evolutionary comparison of cephalotaxus chloroplast DNAs and insights into the loss of inverted repeat copies in gymnosperms.

Authors:  Xuan Yi; Lei Gao; Bo Wang; Ying-Juan Su; Ting Wang
Journal:  Genome Biol Evol       Date:  2013       Impact factor: 3.416

View more
  7 in total

Review 1.  Interaction of Proteins with Inverted Repeats and Cruciform Structures in Nucleic Acids.

Authors:  Richard P Bowater; Natália Bohálová; Václav Brázda
Journal:  Int J Mol Sci       Date:  2022-05-31       Impact factor: 6.208

2.  Comparative Analyses of 3,654 Plastid Genomes Unravel Insights Into Evolutionary Dynamics and Phylogenetic Discordance of Green Plants.

Authors:  Ting Yang; Sunil Kumar Sahu; Lingxiao Yang; Yang Liu; Weixue Mu; Xin Liu; Mikael Lenz Strube; Huan Liu; Bojian Zhong
Journal:  Front Plant Sci       Date:  2022-04-11       Impact factor: 6.627

3.  Common Repeat Elements in the Mitochondrial and Plastid Genomes of Green Algae.

Authors:  David Roy Smith
Journal:  Front Genet       Date:  2020-05-12       Impact factor: 4.599

4.  The Presence and Localization of G-Quadruplex Forming Sequences in the Domain of Bacteria.

Authors:  Martin Bartas; Michaela Čutová; Václav Brázda; Patrik Kaura; Jiří Šťastný; Jan Kolomazník; Jan Coufal; Pratik Goswami; Jiří Červeň; Petr Pečinka
Journal:  Molecules       Date:  2019-05-02       Impact factor: 4.411

5.  ECuADOR-Easy Curation of Angiosperm Duplicated Organellar Regions, a tool for cleaning and curating plastomes assembled from next generation sequencing pipelines.

Authors:  Damien D Hinsinger; Joeri S Strijk; Angelo D Armijos Carrion
Journal:  PeerJ       Date:  2020-04-07       Impact factor: 2.984

Review 6.  Structures and stability of simple DNA repeats from bacteria.

Authors:  Vaclav Brazda; Miroslav Fojta; Richard P Bowater
Journal:  Biochem J       Date:  2020-01-31       Impact factor: 3.857

7.  G-Quadruplexes in the Archaea Domain.

Authors:  Václav Brázda; Yu Luo; Martin Bartas; Patrik Kaura; Otilia Porubiaková; Jiří Šťastný; Petr Pečinka; Daniela Verga; Violette Da Cunha; Tomio S Takahashi; Patrick Forterre; Hannu Myllykallio; Miroslav Fojta; Jean-Louis Mergny
Journal:  Biomolecules       Date:  2020-09-21
  7 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.