| Literature DB >> 31581555 |
Sofía Solórzano, Delil A Chincoya, Alejandro Sanchez-Flores, Karel Estrada, Clara E Díaz-Velásquez, Antonio González-Rodríguez, Felipe Vaca-Paniagua, Patricia Dávila, Salvador Arias.
Abstract
The complete sequence of the chloroplast genome (cpDNA) has been documented for a single large columnar species of Cactaceae, whose cpDNA lacks inverted repeats (IRs). We sequenced the cpDNA of seven species of the short-globose cacti of Mammillaria, and de novo assembly revealed three structures that are novel among land plants. These structures have a large single copy (LSC) region that is 2.5 to 10 times larger than the small single copy (SSC) region, and two IRs that differ strongly in length and gene composition. Structure 1 is distinguished by short IRs of <1 kb composed of rpl23-trnI-CAU-ycf2, with a total length of 110,189 bp and 113 genes. In structure 2, each IR is approximately 7.2 kb and is composed of one intergenic spacer and 11 genes: (psbK-trnQ spacer)-trnQ-UUG-rps16-trnK-UUU-matK-trnK-UUU-psbA-trnH-GUG-rpl2-rpl23-trnI-CAU-ycf2, with a total size of 116,175 bp and 120 genes. Structure 3 has divergent IRs of approximately 14.1 kb: IRA is composed of 20 genes, psbA-trnH-GUG-rpl23-trnI-CAU-ycf2-ndhB-rps7-rps12-trnV-GAC-rrn16-ycf68-trnI-GAU-trnA-AGC-rrn23-rrn4.5-rrn5-trnR-ACG-trnN-GUU-ndhF-rpl32, and IRB is identical to IRA but lacks rpl23. This structure has 131 genes and, through pseudogenization, has the shortest cpDNA, at just 107,343 bp. Our findings show that Mammillaria harbors unusual structural diversity in its cpDNA, which can help elucidate the evolutionary processes involved in cactus lineages.
Keywords: divergent inverted repeats; novel gene rearrangements; pseudogenization; short-globose cacti
Year: 2019 PMID: 31581555 PMCID: PMC6843559 DOI: 10.3390/plants8100392
Source DB: PubMed Journal: Plants (Basel) ISSN: 2223-7747
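In the abstract, each IR gene order is written as one hyphen-joined string. The hyphen does two jobs there: it separates genes, and it attaches each tRNA gene to its anticodon (trnH-GUG). The Python sketch below is an illustration and not part of the authors' pipeline. It splits such strings back into gene names under one assumption: any three-letter token made only of A, C, G, and U is the anticodon of the preceding trn gene.

Applied to the abstract's strings, it recovers 3, 11, and 20 entries for the IRs of structures 1, 2, and 3 (IRA). Removing rpl23 then leaves 19 entries for IRB of structure 3. The structure 2 count of 11 includes trnK-UUU twice, once on each side of matK, as the abstract writes it. The psbK-trnQ intergenic spacer is left out of that string because it is not a gene.

```python
# IR gene orders transcribed from the abstract. The structure 2 string omits
# the leading "(psbK-trnQ)" intergenic spacer, which is not a gene.
IR_GENES = {
    "structure 1": "rpl23-trnI-CAU-ycf2",
    "structure 2": "trnQ-UUG-rps16-trnK-UUU-matK-trnK-UUU-psbA-trnH-GUG-rpl2-rpl23-trnI-CAU-ycf2",
    "structure 3 (IRA)": (
        "psbA-trnH-GUG-rpl23-trnI-CAU-ycf2-ndhB-rps7-rps12-trnV-GAC-rrn16-ycf68-"
        "trnI-GAU-trnA-AGC-rrn23-rrn4.5-rrn5-trnR-ACG-trnN-GUU-ndhF-rpl32"
    ),
}

ANTICODON_BASES = set("ACGU")


def split_ir_genes(gene_string: str) -> list[str]:
    """Split a hyphen-joined gene order, re-attaching tRNA anticodons.

    Assumption: a token of exactly three RNA bases (e.g. "GUG") is the
    anticodon of the preceding trn gene, not a gene of its own.
    """
    genes: list[str] = []
    for token in gene_string.split("-"):
        is_anticodon = len(token) == 3 and set(token) <= ANTICODON_BASES
        if is_anticodon and genes and genes[-1].startswith("trn"):
            genes[-1] = f"{genes[-1]}-{token}"  # e.g. "trnH" + "GUG"
        else:
            genes.append(token)
    return genes


if __name__ == "__main__":
    for name, order in IR_GENES.items():
        print(f"{name}: {len(split_ir_genes(order))} genes")
    # IRB of structure 3 is described as identical to IRA but lacking rpl23.
    ira = split_ir_genes(IR_GENES["structure 3 (IRA)"])
    irb = [gene for gene in ira if gene != "rpl23"]
    print(f"structure 3 (IRB): {len(irb)} genes")
```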
Figure 1. Three different structures found in the complete chloroplast genome of Mammillaria: (a) structure 1, (b) structure 2, and (c) structure 3. In structure 1, the rpl2 gene flanks IRB in M. albiflora and IRA in M. pectinifera. The gene rpl33 was lost in M. supertexta (structure 2) and in M. zephyranthoides (structure 3). The genomes are displayed as circles, and IRA and IRB correspond to duplicated regions. Starting from the top of the circle and moving clockwise, IRA is the first repeat encountered.
Figure 2. MAUVE graphic of structural alignments of five complete chloroplast genomes. The top graph corresponds to the caryophyllid P. oleracea (Portulacaceae). Below it is the giant columnar cactus C. gigantea, and the last three graphs are the three structures documented in Mammillaria. Relatively inverted DNA sequences are drawn above or below the horizontal line, and identical genes are shown in the same color. P. oleracea has a larger genome than any species of Cactaceae. The IRs are present in Mammillaria and P. oleracea but absent in C. gigantea; if they are disregarded, the cpDNA of P. oleracea is structurally more similar to C. gigantea than to Mammillaria. A single large block of genes (encircled), corresponding to atpB and atpE, is inverted between C. gigantea and P. oleracea. In Mammillaria this block has the same orientation as in P. oleracea. Many other novel gene rearrangements, absent in the other two caryophyllid taxa, were documented in Mammillaria. Additionally, structure 3 has two blocks of genes inverted with respect to structures 1 and 2 (described in detail in Figure 3b). These two blocks are indicated with arrows and have the same orientation in C. gigantea, P. oleracea, and structures 1 and 2 of Mammillaria.
Figure 3. (a) Comparison of the length and gene composition of the IRs in the three structures documented for the complete chloroplast genomes of Mammillaria. The two IRs of structure 3 differ in rpl23, whose location in IRA is marked with an asterisk. (b) Blocks of genes rearranged in the LSC. These genes are inverted and reoriented in structures 1 and 2 with respect to structure 3. The direction of each arrow indicates the orientation of transcription: leftward for clockwise and rightward for counter-clockwise. The large squares indicate the LSC genes that flank these two rearrangements. The asterisk on rpl33 (bottom panel) indicates that this gene was lost in M. supertexta (structure 2) and in the species of structure 3.
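Figure 3b describes LSC gene blocks whose order and direction of transcription are reversed between structure 3 and structures 1 and 2. The Python sketch below illustrates what such an inversion means in terms of gene order. It represents each genome as a list of (gene, strand) pairs and reports maximal runs that, relative to a reference, appear in reverse order with flipped strands.

This is a simple signed-permutation scan, not the procedure used to produce the figures. The gene names g1 to g6 and both orders are hypothetical toy data, not the Mammillaria gene order. The sketch assumes each gene occurs once and does not resolve overlapping or nested rearrangements.

```python
from typing import Optional

# A gene is (name, strand): +1 for one transcription direction, -1 for the other.
Gene = tuple[str, int]


def inverted_blocks(reference: list[Gene], query: list[Gene]) -> list[list[str]]:
    """Return maximal runs of query genes that are inverted relative to reference.

    A run counts as inverted when every gene's strand is flipped and consecutive
    genes occupy consecutive reference positions in decreasing order.
    """
    ref_pos = {name: (i, strand) for i, (name, strand) in enumerate(reference)}
    blocks: list[list[str]] = []
    current: list[str] = []
    prev_index: Optional[int] = None

    for name, strand in query:
        if name not in ref_pos:
            # Gene missing from the reference (e.g. lost there): close any open run.
            if current:
                blocks.append(current)
            current, prev_index = [], None
            continue
        index, ref_strand = ref_pos[name]
        flipped = strand != ref_strand
        if flipped and current and index == prev_index - 1:
            current.append(name)  # extends the current inverted run
        else:
            if current:
                blocks.append(current)
            current = [name] if flipped else []  # start a new run only if flipped
        prev_index = index

    if current:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    # Hypothetical toy orders, not real chloroplast data.
    reference = [("g1", 1), ("g2", 1), ("g3", 1), ("g4", 1), ("g5", 1), ("g6", 1)]
    query = [("g1", 1), ("g4", -1), ("g3", -1), ("g2", -1), ("g5", 1), ("g6", 1)]
    print(inverted_blocks(reference, query))  # [['g4', 'g3', 'g2']]
```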
Species of Mammillaria grouped by the type of structure identified in the complete chloroplast genome (cpDNA). Variation within and among structures was detected in total length and in the lengths of the two inverted repeats (IRs), the large single copy (LSC), and the small single copy (SSC).
| Type of Structure | Total Length (bp) | IRs (bp) | LSC (bp) | SSC (bp) | Total Number of Genes | GenBank Accession Number ¹ |
|---|---|---|---|---|---|---|
| I. Structure 1 | | | | | | |
| 1.1 | 110789 | 1348 | 78380 | 31061 | 113 | MN517610 |
| 1.2 | 108561 | 1544 | 72273 | 29744 | 113 | MN519716 |
| II. Structure 2 | | | | | | |
| 2.1 | 115505 | 14522 | 71565 | 29418 | 120 | MN517613 |
| 2.2 | 115886 | 14488 | 71997 | 29401 | 120 | MN517612 |
| 2.3 | 115356 | 14428 | 71690 | 29238 | 120 | MN518341 |
| 2.4 | 116175 | 14490 | 72240 | 29445 | 119 | MN508963 |
| III. Structure 3 | | | | | | |
| 3.1 | 107343 | 28252 | 71811 | 7281 | 131 | MN517611 |
¹ GenBank accession number of the deposited DNA sequences.
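The IRs column appears to give the combined length of both IR copies. Halving it gives about 0.7 to 0.8 kb per IR for structure 1, about 7.2 kb for structure 2, and about 14.1 kb for structure 3, which matches the abstract. Under that assumption, LSC + SSC + IRs should equal the total length.

The Python sketch below is an illustrative consistency check, not part of the original study. It recomputes that sum and the LSC/SSC ratio for each row, using the values transcribed from the table. Five rows sum exactly. MN519716 differs by 5,000 bp and MN517611 by 1 bp, and the table alone does not show which value in those rows is off. The LSC/SSC ratios run from about 2.4 to 9.9, in line with the abstract's "2.5 to 10 times".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plastome:
    accession: str
    total: int  # total length (bp)
    irs: int    # IRs column (bp); ASSUMED to be both IR copies combined
    lsc: int    # large single copy (bp)
    ssc: int    # small single copy (bp)


# Values transcribed from the table above.
ROWS = [
    Plastome("MN517610", 110789, 1348, 78380, 31061),
    Plastome("MN519716", 108561, 1544, 72273, 29744),
    Plastome("MN517613", 115505, 14522, 71565, 29418),
    Plastome("MN517612", 115886, 14488, 71997, 29401),
    Plastome("MN518341", 115356, 14428, 71690, 29238),
    Plastome("MN508963", 116175, 14490, 72240, 29445),
    Plastome("MN517611", 107343, 28252, 71811, 7281),
]


def residual(p: Plastome) -> int:
    """Total length minus the sum of its parts (0 when the columns agree)."""
    return p.total - (p.lsc + p.ssc + p.irs)


def lsc_ssc_ratio(p: Plastome) -> float:
    """How many times larger the LSC is than the SSC."""
    return p.lsc / p.ssc


if __name__ == "__main__":
    for p in ROWS:
        print(
            f"{p.accession}: residual={residual(p):+d} bp, "
            f"LSC/SSC={lsc_ssc_ratio(p):.2f}, one IR ~{p.irs / 2:.0f} bp"
        )
```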
Variation in the structural and functional gene composition of the three cpDNA structures found in Mammillaria. A total of 18 gene types were documented, organized alphabetically according to their location in the IRs, LSC, and SSC. All genes located in the IRs are duplicated (2X), except rpl23Ψ in structure 3, which is absent from IRB.
| Gene Type/Structure | Region | Structure 1 | Structure 2 | Structure 3 |
|---|---|---|---|---|
| 1. Ribosomal RNA (rrn) | SSC | rrn4.5, 5, 16, 23 | rrn4.5, 5, 16, 23 | |
| | IRs | | | rrn4.5, 5, 16, 23 (2X) |
| 2. Transfer RNA (trn) | LSC | trnCGCA, trnDGUC, trnEUUC, trnFGAA, trnGGCC, trnGUCC, trnHGUG, trnKUUU, trnLUAA, trnMCAU, trnMCAU, trnPUGG, trnQUUG, trnRUCU, trnSGGA, trnSGGU, trnSUGA, trnTGGU, trnTUGU, trnYGUA | trnCGCA, trnDGUC, trnEUUC, trnFGAA, trnGGCC, trnGUCC, trnLUAA, trnMCAU, trnfMCAU, trnPUGG, trnRUCU, trnSGGA, trnSGCU, trnSUGA, trnTGGU, trnTUGU, trnWCCA, trnYGUA | trnCGCA, trnDGUC, trnEUUC, trnFGAA, trnGGCC, trnGUCC, trnKUUU, trnLUAA, trnMCAU, trnfMCAU, trnPUGG, trnQUUG, trnRUCU, trnSGGA, trnSGCU, trnSUGA, trnTGGU, trnTUGU, trnWCCA, trnYGUA |
| | SSC | trnA-f, trnIGAU, trnIGAU, trnLCAA, trnLUAG, trnNGUU, trnRACG, trnVGAG | trnA-f, trnIGAU, trnIUAG, trnNGUU, trnLCAA, trnRACG, trnVGAG | trnLUAG, trnLCAA |
| | IRs | trnICAU (2X) | trnHGUG, trnICAU, trnKUUU, trnQUUG (2X) | trnAUGC, trnHGUG, trnICAU, trnIGAU, trnNGUU, trnRACG, trnVGAC (2X) |
| 3. Proteins of small subunits of the ribosome (rps) | LSC | rps2, 3, 4, 8, 11, 12 (2), 14, 16Ψ, 18Ψ, 19 | rps2, 3, 4, 8, 11, 12 (2), 14, 18Ψ, 19 | rps2, 3, 4, 8, 11, 12, 12Ψ, 14, 16Ψ, 18Ψ, 19 |
| | SSC | rps7, 12, 15 | rps7, 12, 15 | rps15 |
| | IRs | | rps16Ψ (2X) | rps7, 12 (2X) |
| 4. Proteins of large subunits of the ribosome (rpl) | LSC | rpl2, 14, 16, 20, 22, 33Ψ, 36Ψ | rpl14, 16Ψ, 20, 22, 33Ψ*, 36Ψ | rpl2, 14, 16Ψ, 20, 22, 23Ψ, 36 |
| | SSC | rpl32 | rpl32 | |
| | IRs | rpl23Ψ (2X) | rpl2, 23Ψ (2X) | rpl32 (2X), 23Ψ (IRA) |
| 5. DNA-dependent RNA polymerase (rpo) | LSC | rpoA, B, C1, C2 | rpoA, B, C1, C2 | rpoA, B, C1, C2 |
| 6. NADH dehydrogenase (ndh) | SSC | ndhBΨ, DΨ, FΨ, GΨ*** | ndhBΨ, DΨ, FΨ, G*** | |
| | IRs | | | ndhBΨ, DΨ, FΨ, GΨ (2X) |
| 7. Photosystem I (psa) | LSC | psaA, B, I, J | psaA, B, I, J | psaA, B, I, J |
| | SSC | psaC | psaC | psaC |
| 8. Photosystem II (psb) | LSC | psbA, B, C, D, E, F, H, I, J, K, L, M, N, T, Z | psbB, C, D, E, F, H, I, J, K, L, M, N, T, Z | psbB, C, D, E, F, H, I, J, L, K, M, N, T, Z |
| | IRs | | psbA (2X) | psbA (2X) |
| 9. Cytochrome b/f complex (pet) | LSC | petA, B, D, G, L, N | petA, B, D, G, L, N | petA, B, D, G, L, N |
| 10. ATP synthase (atp) | LSC | atpA, B, E, F, H, I | atpA, B, E, F, H, I | atpA, B, E, F, H, I |
| 11. Rubisco (rbc) | LSC | rbcL | rbcL | rbcL |
| 12. Maturase K | LSC | matK | | matK |
| | IRs | | matK (2X) | |
| 13. Protease (clp) | LSC | clpPΨ, clpP | clpPΨ, clpP | clpPΨ, clpP |
| 14. Envelope membrane protein (cem) | LSC | cemA | cemA | cemA |
| 15. Subunit of acetyl-CoA carboxylase (acc) | LSC | accDΨ | accDΨ | accDΨ |
| 16. c-type cytochrome synthesis (ccs) | SSC | ccsA | ||
| 17. Translational initiation factor (inf) | LSC | infA | infA | infA |
| 18. Hypothetical chloroplast reading frames (ycf) | LSC | ycf3, ycf4Ψ | ycf3, ycf4Ψ** | ycf3, ycf4 |
| | SSC | ycf1, ycf2, ycf68Ψ | ycf1, ycf2, ycf68Ψ | ycf1Ψ, ycf2Ψ |
| | IRs | ycf2-p (2X) | ycf2-p (2X) | ycf2Ψ, ycf68Ψ (2X) |
Ψ indicates a pseudogene. The suffix “-p” indicates that a partial sequence of the gene is inserted in both IRs. * rpl33 is absent in M. supertexta but present in the other three species of structure 2; in addition, this gene is a pseudogene in all species except M. albiflora. ** Pseudogene in M. crucigera and M. solisioides of structure 2. *** Pseudogene only in M. solisioides of structure 2 and M. pectinifera of structure 1.
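The gene lists in the table above use a compact notation. After the first full name, later entries give only the suffix, so "rps2, 3, 4" stands for rps2, rps3, rps4. Ψ marks a pseudogene, and parenthetical notes give copy number or location.

The Python sketch below expands one cell into (gene, is_pseudogene) pairs. The shorthand rule is inferred from the table itself. The function also discards parenthetical notes and footnote asterisks, so it tracks neither copy numbers nor the species-specific exceptions given in the footnotes.

```python
import re

PSEUDO = "Ψ"


def parse_cell(cell: str) -> list[tuple[str, bool]]:
    """Expand one gene-list cell into (gene name, is_pseudogene) pairs.

    Shorthand entries such as "3" or "DΨ" inherit the lowercase prefix
    (e.g. "rps", "ndh") of the last fully written gene name in the cell.
    Parenthetical notes like "(2)", "(2X)", "(IRA)" and footnote asterisks
    are dropped, so copy numbers are NOT tracked by this sketch.
    """
    cleaned = re.sub(r"\([^)]*\)", "", cell).replace("*", "")
    genes: list[tuple[str, bool]] = []
    prefix = ""
    for raw in cleaned.split(","):
        token = raw.strip()
        if not token:
            continue  # e.g. left over after removing "(2X)"
        is_pseudo = token.endswith(PSEUDO)
        token = token.rstrip(PSEUDO).strip()
        match = re.match(r"[a-z]+", token)
        if match:
            prefix = match.group(0)  # full name such as "rps2" sets the prefix
            name = token
        else:
            name = prefix + token    # shorthand such as "3" becomes "rps3"
        genes.append((name, is_pseudo))
    return genes


if __name__ == "__main__":
    print(parse_cell("rpl32 (2X), 23Ψ (IRA)"))  # [('rpl32', False), ('rpl23', True)]
```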
Figure 4. Maximum-likelihood (ML) phylogenetic tree obtained for the seven species of Mammillaria. The analysis is based on 42 coding regions shared with the two species used as outgroups (C. gigantea and P. oleracea).