Changwei Bi, Andrew H Paterson, Xuelin Wang, Yiqing Xu, Dongyang Wu, Yanshu Qu, Anna Jiang, Qiaolin Ye, Ning Ye.
Abstract
Cotton is one of the most important economic crops: it is the primary source of natural fiber and an important protein source for animal feed. The complete nuclear and chloroplast (cp) genome sequences of G. raimondii are already available, but the mitochondrial genome sequence is not. Here, we assembled the complete mitochondrial (mt) DNA sequence of G. raimondii into a circular genome of 676,078 bp and performed comparative analyses with other higher plants. The genome contains 39 protein-coding genes, 6 rRNA genes, and 25 tRNA genes. We also identified four large repeats (63.9 kb, 10.6 kb, 9.1 kb, and 2.5 kb) in this mt genome, which may be active in intramolecular recombination in the evolution of cotton. Strikingly, nearly all of the G. raimondii mt genome has been transferred to the nucleus on Chr1, and the transfer event must be very recent. Phylogenetic analysis reveals that G. raimondii, as a member of Malvaceae, is much closer to another cotton (G. barbadense) than to other rosids, and the clade formed by the two Gossypium species is sister to Brassicales. The G. raimondii mt genome may provide a crucial foundation for evolutionary analysis, molecular biology, and cytoplasmic male sterility research in cotton and other higher plants.
Year: 2016 PMID: 27847816 PMCID: PMC5099484 DOI: 10.1155/2016/5040598
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Assembly statistics for the G. raimondii mt genome.
| Statistical list | Number |
|---|---|
| Number of raw reads | 1,649,158 |
| Average raw read length (bp) | 744 |
| Number of all contigs | 140,540 |
| N50 contigs (bp) | 1,073 |
| Total length of all contigs (Mb) | ~85 |
| Number of assembled contigs | 21 |
| Total length of aligned contigs (bp) | 599,903 |
| Number of aligned reads | 35,103 |
| Aligned reads (%) | 2.13 |
| Average coverage of aligned contigs | 27.2 |
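The aligned-read percentage in the table follows directly from the two read counts. A minimal sketch of that arithmetic, using only values copied from the table (the helper name `percent` is illustrative, not from the paper):

```python
# Values copied from the assembly statistics table.
raw_reads = 1_649_158
aligned_reads = 35_103


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


aligned_pct = percent(aligned_reads, raw_reads)
print(f"Aligned reads: {aligned_pct:.2f}%")
```

Only about one read in fifty maps to the mitochondrial contigs, which is consistent with the reads coming from a whole-genome data set rather than from purified mtDNA.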
Figure 1. The circular mitochondrial genome of G. raimondii. Genes shown outside of the circle are transcribed clockwise, whereas genes on the inside are transcribed counterclockwise. Genes belonging to different functional groups are color-coded. GC content is represented on the inner circle by the dark gray plot.
Gene content of G. raimondii mt genome.
| Group of genes | Names of genes |
|---|---|
| Complex I (NADH dehydrogenase) | nad1 |
| Complex II (succinate dehydrogenase) | sdh3, sdh4 |
| Complex III (ubiquinol cytochrome c reductase) | cob |
| Complex IV (cytochrome c oxidase) | cox1, cox2 |
| Complex V (ATP synthase) | atp1, atp4, atp6, atp8, atp9 |
| Cytochrome c biogenesis | ccmB, ccmC, ccmFc |
| Ribosomal proteins (SSU) | rps3 |
| Ribosomal proteins (LSU) | rpl2, rpl5, rpl10, rpl16 |
| Maturases | matR |
| Transport membrane protein | mttB (×2) |
| Ribosomal RNAs | rrn5 (×2), rrn18 (×2), rrn26 (×2) |
| Transfer RNAs | trnC-GCA, trnD-GUC (×3), trnE-UUC, trnF-GAA, trnG-GCC, trnH-GUG, trnI-UAU, trnK-UUU, trnM-CAU-1, trnM-CAU-cp (×2), trnM-CAU-2 (×2), trnN-GUU, trnP-UGG, trnQ-UUG, trnS-UGA, trnS-GCU, trnS-GGA, trnV-GAC, trnW-CCA (×2), trnY-GUA |
Genes containing introns.
Genome features of G. raimondii mt genome.
| Feature | A (%) | C (%) | G (%) | T (%) | Number of features | Nucleotides (bp) | Proportion in genome (%) |
|---|---|---|---|---|---|---|---|
| Genome | 27.52 | 22.58 | 22.37 | 27.53 | — | 676,078 | — |
| Coding sequences^a | 28.5 | 23.01 | 21.85 | 26.64 | 92 | 83,249 | 12.31 |
| Protein-coding genes | 30.38 | 21.58 | 20.81 | 27.24 | 40 | 34,739 | 5.14 |
| Cis-spliced introns | 23.76 | 26.3 | 25.56 | 24.37 | 21 | 35,710 | 5.28 |
| tRNAs | 24.03 | 25.13 | 25.87 | 24.97 | 25 | 1,902 | 0.28 |
| rRNAs | 23.3 | 27.21 | 24.47 | 25.02 | 6 | 10,898 | 1.61 |
^a Coding sequences include protein-coding genes, cis-spliced introns, tRNAs, and rRNAs.
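The "Proportion in genome" column and the overall GC content can be recomputed from the other columns. The sketch below uses only numbers from the table. It assumes that the 35,710 bp row is the cis-spliced introns named in the footnote; the name `OTHER_ROW_BP` is illustrative.

```python
# Values copied from the genome features table.
GENOME_LENGTH = 676_078

features_bp = {
    "Coding sequences": 83_249,
    "Protein-coding genes": 34_739,
    "tRNAs": 1_902,
    "rRNAs": 10_898,
}
# Assumption: the 35,710 bp row holds the cis-spliced introns from the footnote.
OTHER_ROW_BP = 35_710


def proportion(length_bp: int, genome_bp: int = GENOME_LENGTH) -> float:
    """Percentage of the genome covered by a feature class, to 2 decimals."""
    return round(100.0 * length_bp / genome_bp, 2)


proportions = {name: proportion(bp) for name, bp in features_bp.items()}

# The four component classes should add up to the "Coding sequences" total.
component_sum = (
    features_bp["Protein-coding genes"]
    + OTHER_ROW_BP
    + features_bp["tRNAs"]
    + features_bp["rRNAs"]
)

# GC content of the whole genome is the sum of the C and G percentages.
gc_content = round(22.58 + 22.37, 2)

for name, pct in proportions.items():
    print(f"{name}: {pct}%")
print("Component sum (bp):", component_sum)
print("Genome GC content (%):", gc_content)
```

The component sum reproduces the 83,249 bp total, which supports reading the 35,710 bp row as the intron class.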
Distribution and interval of gene clusters in G. raimondii mt genome.
| Gene cluster | Location and interval |
|---|---|
| sdh4-cox3 | 32397..32795-(− |
| cox1-rps10 | 34397..36529-( |
| rrn5-rrn18^f | 53358..53467-( |
| mttB-nad9^f | 82795..83595-( |
| nad3-rps12 | 114810..115166-( |
| nad1d-matR-nad1e | 164104..164162-( |
| rpl2-rpl5-nad5c | 258634..259638-( |
| cob-rps14 | 273866..275044-( |
| rpl16-rps3 | 318985..319419-(− |
| nad5ab-atp9 | 475056..477354-( |
| nad2abc-sdh3 | 643910..645676-( |
Boldface indicates interval length between two cluster genes.
^f Gene clusters contain two copies.
The letters a, b, c, d, and e following nad denote exon 1, exon 2, exon 3, exon 4, and exon 5, respectively.
Distribution of penta- and hexanucleotide simple sequence repeats (SSRs) in G. raimondii mt genome.
| SSR type | SSR sequence | SSR size (bp) | Start | End | Location |
|---|---|---|---|---|---|
| penta | (TATTA) ×3 | 15 | 50529 | 50543 | IGS (rps10-exon1, atp1) |
| penta | (AAAAT) ×3 | 15 | 85123 | 85137 | IGS (nad9, nad4-exon4) |
| penta | (GTCTG) ×3 | 15 | 89378 | 89392 | nad4-intron3 |
| penta | (GTTTT) ×4 | 20 | 159380 | 159399 | IGS (trnS-UGA, nad1-exon4) |
| penta | (ACTAG) ×3 | 15 | 166777 | 166791 | matR |
| penta | (CTTAG) ×3 | 15 | 279862 | 279876 | IGS (rps14, rps4) |
| penta | (ATTAC) ×3 | 15 | 339824 | 339838 | IGS (trnS-GCU, nad4L) |
| penta | (CCTTT) ×3 | 15 | 420008 | 420022 | IGS (atp8, nad1-exon1) |
| penta | (AAAAT) ×3 | 15 | 536356 | 536370 | IGS (nad9, nad4-exon4) |
| penta | (GTCTG) ×3 | 15 | 540603 | 540617 | nad4-intron3 |
| penta | (AATAA) ×3 | 15 | 583684 | 583698 | IGS (trnG-GCC, trnQ-UUG) |
| penta | (TTTTA) ×5 | 25 | 663479 | 663503 | IGS (atp4, ccmFc-exon1) |
| hexa | (ACCAAT) ×3 | 18 | 294266 | 294283 | IGS (rps4, cox2-exon1) |
| hexa | (TTCTCT) ×3 | 18 | 594736 | 594753 | IGS (trnQ-UUG, nad6) |
IGS: intergenic spacers.
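Each row in the SSR table is a perfect tandem array of a 5 or 6 bp unit with at least three copies. This record does not say which tool produced the table, so the following is only a sketch of one way to locate such repeats, using a regular expression as a stand-in. The function name `find_ssrs` and the toy sequence are hypothetical.

```python
import re
from typing import Iterator, Tuple


def find_ssrs(seq: str, unit_len: int, min_copies: int = 3) -> Iterator[Tuple[str, int, int, int]]:
    """Yield (unit, copies, start, end) for perfect tandem repeats.

    Coordinates are 1-based and inclusive, matching the Start/End
    columns of the table. Homopolymer runs are skipped.
    """
    # One captured unit followed by at least (min_copies - 1) exact copies.
    pattern = re.compile(rf"([ACGT]{{{unit_len}}})\1{{{min_copies - 1},}}")
    for match in pattern.finditer(seq):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # e.g. AAAAA repeated is a homopolymer, not a penta SSR
        copies = len(match.group(0)) // unit_len
        yield unit, copies, match.start() + 1, match.end()


# Toy sequence (not from the genome): (TATTA) x3 flanked by two bases each side.
toy = "GG" + "TATTA" * 3 + "CC"
for unit, copies, start, end in find_ssrs(toy, unit_len=5):
    print(f"({unit}) x{copies}  size={end - start + 1}  {start}..{end}")
```

Because the scan runs left to right, a repeat may be reported under a rotated unit (for example ATATT instead of TATTA) when the preceding base happens to extend the array; dedicated SSR tools normalize this.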
Distribution of tandem repeats in G. raimondii mt genome.
| Number | Size (bp) | Start | End | Repeat (bp) × copy number | Location |
|---|---|---|---|---|---|
| 1 | 15 | 97957 | 97986 | TAAGTGAAATAAAAT (×2) | IGS (nad4-exon1, trnD-GUC) |
| 2 | 21 | 147834 | 147875 | TAACAGAAGTTTCAAGAGAAC (×2) | IGS (nad7-exon5, ccmB) |
| 3 | 36 | 235143 | 235214 | TCGGAAAAACAAATGCCATGAAGGACTTAGGAAAGA (×2) | IGS (nad2-exon5, rpl2) |
| 4 | 26 | 280595 | 280646 | GATCGCCGTCAAAGACAGGATTCGAG (×2) | IGS (rps14, rps4) |
| 5 | 15 | 549174 | 549203 | TAAGTGAAATAAAAT (×2) | IGS (nad4-exon1, trnD-GUC) |
| 6 | 42 | 653201 | 653284 | CTTGGCTTTCCTTTTTGTCTTGACTCTATGCCTTCCAGCTGT (×2) | IGS (sdh3, atp4) |
IGS: intergenic spacers.
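The coordinates in the tandem repeat table can be sanity-checked: with 1-based inclusive positions, End − Start + 1 should equal the unit size times the copy number, and each unit sequence should have the stated size. A minimal sketch using the table values as given (the variable names are illustrative):

```python
# (number, size_bp, start, end, unit_sequence, copies), copied from the table.
tandem_repeats = [
    (1, 15, 97957, 97986, "TAAGTGAAATAAAAT", 2),
    (2, 21, 147834, 147875, "TAACAGAAGTTTCAAGAGAAC", 2),
    (3, 36, 235143, 235214, "TCGGAAAAACAAATGCCATGAAGGACTTAGGAAAGA", 2),
    (4, 26, 280595, 280646, "GATCGCCGTCAAAGACAGGATTCGAG", 2),
    (5, 15, 549174, 549203, "TAAGTGAAATAAAAT", 2),
    (6, 42, 653201, 653284, "CTTGGCTTTCCTTTTTGTCTTGACTCTATGCCTTCCAGCTGT", 2),
]

checks = []
for number, size_bp, start, end, unit, copies in tandem_repeats:
    span = end - start + 1  # 1-based inclusive coordinates
    consistent = (len(unit) == size_bp) and (span == size_bp * copies)
    checks.append((number, span, consistent))
    print(f"Repeat {number}: span={span} bp, consistent={consistent}")
```

Repeats 1 and 5 share the same unit and the same flanking genes, which fits their location inside the two copies of the 63.9 kb repeat reported in the abstract.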
Figure 2. Frequency distribution of repeat lengths in the G. raimondii mt genome. Gray boxes show the number of repeats of each length; the accompanying number gives the exact frequency.
Distribution of repeats (>100 bp) in G. raimondii mt genome.
| Number | Size (bp) | Identity (%) | Copy-1 start | Copy-1 end | Copy-2 start^a | Copy-2 end^a | Copy-3 start^a | Copy-3 end^a | Type^b |
|---|---|---|---|---|---|---|---|---|---|
| R1 | 63905 | 99.93 | 50881 | 114784 | 502128 | 565987 | | | DR |
| R2 | 10624 | 99.94 | 248091 | 258714 | 405137 | 415754 | | | DR |
| R3 | 9130 | 99.97 | 22950 | 32078 | | | | | IR |
| R4 | 2532 | 99.64 | 499598 | 502127 | 673553 | 676077 | | | DR |
| R5 | 767 | 98.57 | 141082 | 141848 | 440545 | 441311 | | | DR |
| R6 | 596 | 90.44 | 257539 | 258122 | 414581 | 415163 | 646113 | 646685 | DR |
| R7 | 504 | 98.21 | 33537 | 34040 | 142948 | 143451 | | | DR |
| R8 | 380 | 82.11 | 52465 | 52834 | | | 503712 | 504081 | IR/DR |
| R9 | 349 | 99.71 | 167653 | 168001 | | | | | IR |
| R10 | 314 | 96.18 | 52965 | 53273 | 338227 | 338539 | 504212 | 504520 | DR |
| R11 | 260 | 86.54 | 281040 | 281285 | | | | | IR |
| R12 | 257 | 100 | 47863 | 48119 | 256882 | 257138 | 413924 | 414180 | DR |
| R13 | 229 | 100 | 125112 | 125340 | 224959 | 225187 | | | DR |
| R14 | 211 | 98.58 | 34589 | 34797 | | | | | IR |
| R15 | 209 | 92.34 | 329404 | 329606 | 670726 | 670933 | | | DR |
| R16 | 208 | 83.17 | 100270 | 100476 | | | 551480 | 551686 | IR/DR |
| R17 | 194 | 100 | 28320 | 28513 | 177828 | 178021 | 652575 | 652768 | DR |
| R18 | 194 | 92.78 | 378565 | 378750 | | | | | IR |
| R19 | 175 | 100 | 378665 | 378839 | | | | | IR |
| R20 | 174 | 99.43 | 276747 | 276920 | 338558 | 338731 | | | DR |
| R21 | 172 | 90.7 | 261988 | 262155 | | | | | IR |
| R22 | 165 | 98.18 | 369418 | 369582 | | | | | IR |
| R23 | 165 | 93.94 | 84343 | 84505 | 478354 | 478516 | 535576 | 535738 | DR |
| R24 | 162 | 97.53 | 258147 | 258306 | 322746 | 322905 | 415187 | 415346 | DR |
| R25 | 156 | 97.44 | 34621 | 34774 | 378565 | 378719 | | | DR |
| R26 | 148 | 89.86 | 604752 | 604898 | | | | | IR |
| R27 | 145 | 87.59 | 236506 | 236650 | 634479 | 634615 | | | DR |
| R28 | 142 | 99.3 | 32393 | 32534 | 142478 | 142619 | | | DR |
| R29 | 137 | 95.62 | 59169 | 59304 | | | 510413 | 510548 | IR/DR |
| R30 | 133 | 95.49 | 478656 | 478788 | 576482 | 576614 | | | DR |
| R31 | 126 | 89.68 | 303621 | 303744 | | | | | IR |
| R32 | 120 | 82.5 | 402261 | 402378 | 650464 | 650573 | | | DR |
| R33 | 118 | 99.15 | 83712 | 83829 | 455331 | 455448 | 534945 | 535062 | DR |
| R34 | 118 | 84.75 | 259617 | 259733 | 338550 | 338658 | | | DR |
| R35 | 117 | 94.02 | 281058 | 281174 | | | | | IR |
| R36 | 117 | 87.18 | 354 | 470 | 277212 | 277321 | | | DR |
| R37 | 115 | 87.83 | 337542 | 337645 | | | | | IR |
| R38 | 113 | 100 | 167103 | 167215 | 321673 | 321785 | | | DR |
| R39 | 112 | 97.32 | 81756 | 81866 | 162601 | 162711 | | | DR |
| R40 | 112 | 85.71 | 84872 | 84974 | 279997 | 280107 | 536105 | 536207 | DR |
| R41 | 111 | 98.2 | 162601 | 162711 | 532995 | 533104 | | | DR |
| R42 | 110 | 86.36 | 259625 | 259733 | 276747 | 276847 | | | DR |
| R43 | 107 | 100 | 28258 | 28364 | 177977 | 178083 | | | DR/IR |
| R44 | 106 | 94.34 | 4873 | 4976 | | | | | IR |
| R45 | 104 | 99.04 | 115134 | 115237 | | | | | IR |
| R46 | 102 | 94.12 | 404656 | 404756 | | | | | IR |
| R47 | 100 | 84 | 416288 | 416387 | | | | | DR/IR |
| R48 | 100 | 85 | 76812 | 76904 | 396912 | 397011 | 528052 | 528144 | DR |
| Rc | 85 | 97.65 | 384090 | 384173 | | | | | IR |
^a Boldface indicates IR copy, compared with copy-1 as control.
^b DR and IR: direct and inverted repeats, respectively.
IR/DR or DR/IR: both direct repeat and inverted repeat among multiple copies.
Comparison of tRNA genes in seven higher plant mt genomes.
| tRNA gene | | | | | | | |
|---|---|---|---|---|---|---|---|
| trnA-UGC | − | − | − | − | − | − | + |
| trnC-GCA | + | + | + | cp | cp | + | + |
| trnD-GUC | cp | cp | cp | + | cp | + | + |
| trnE-UUC | + | + | +/cp | + | + | + | + |
| trnF-GAA | + | − | + | cp | cp | + | + |
| trnG-GCC | + | + | + | − | − | + | + |
| trnG-UCC | − | − | − | − | − | − | + |
| trnH-GUG | cp | cp | cp | cp | cp | cp | + |
| trnI-CAU | + | + | +/cp | + | + | + | + |
| trnI-GAU | − | − | − | − | − | − | − |
| trnK-UUU | + | + | + | + | + | + | + |
| trnL-CAA | − | − | − | − | − | + | + |
| trnL-UAA | − | − | − | − | − | − | + |
| trnL-UAG | − | − | − | − | − | + | + |
| trnM-CAU | +/cp | cp | cp | +/cp | cp | +/cp | + |
| trnfM-CAU | − | + | + | + | + | + | + |
| trnN-GUU | cp | cp | cp | cp | cp | + | + |
| trnP-UGG | + | + | +/cp | + | + | + | + |
| trnQ-UUG | + | + | + | + | + | + | + |
| trnR-ACG | − | − | − | − | − | − | + |
| trnR-UCG | − | − | − | − | − | − | + |
| trnR-UCU | − | − | − | − | − | + | + |
| trnS-GCU | + | + | + | + | + | + | + |
| trnS-UGA | + | + | + | + | + | + | + |
| trnS-GGA | cp | cp | cp | cp | cp | cp | − |
| trnT-GGU | − | − | − | − | − | − | + |
| trnT-UGU | − | − | − | − | − | − | − |
| trnV-UAC | − | − | − | − | − | cp | + |
| trnV-GAC | + | − | − | − | − | − | − |
| trnW-CCA | cp | cp | cp | cp | cp | + | + |
| trnY-GUA | + | + | + | + | + | + | + |
^a Data from [54].
^b Data from [53].
^c Data from [52].
^d Data from [51].
Figure 3. Distribution of tRNA genes in higher plant mt genomes. Dark gray and light gray boxes indicate the number of cp-derived tRNAs and mt-native tRNAs, respectively.
Comparison of cis-/trans-spliced introns in four higher plant mt genomes.
| Gene | | | | |
|---|---|---|---|---|
| nad1 | 2/2 | 2/2 | 2/2 | 1/3 |
| nad2 | 3/1 | 2/2 | 3/1 | 3/1 |
| nad4 | 3/— | 3/— | 3 (×2)/ — | 3/— |
| nad5 | 2/2 | 2/2 | 2/2 | 2/2 |
| nad7 | 4/— | 3/— | 4/— | 4/— |
| ccmFc | 1/— | 1/— | 1/— | 1/— |
| cox2 | 1 (×2)/ — | 1/— | 1/— | 1/— |
| rpl2 | 1/— | 1/— | — | — |
| rps3 | 1/— | 1/— | 1/— | 3/— |
| rps10 | — | 1/— | 1/— | — |
| Total | 19/5 | 17/6 | 21/5 | 18/6 |
^a Data from [6].
^b Data from [54].
^c Data from [53].
Figure 4. Ka/Ks values of 36 protein-coding genes of G. raimondii, C. papaya, and P. tremula. Dark gray and light gray boxes indicate the Ka/Ks ratios of C. papaya versus G. raimondii and P. tremula versus G. raimondii, respectively.
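Ka/Ks is the ratio of the nonsynonymous substitution rate (Ka) to the synonymous substitution rate (Ks). By the usual convention, a ratio below 1 suggests purifying selection, a ratio near 1 neutral evolution, and a ratio above 1 positive selection. A small sketch of that reading follows; the function name and the input values are hypothetical and are not taken from the figure.

```python
def selection_regime(ka: float, ks: float) -> str:
    """Classify a gene pair by its Ka/Ks ratio using the usual convention."""
    if ks <= 0:
        # The ratio is undefined when no synonymous substitutions are observed.
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


# Hypothetical values for illustration only.
print(selection_regime(ka=0.01, ks=0.05))
print(selection_regime(ka=0.20, ks=0.10))
```

In practice a ratio is rarely exactly 1, so analyses usually test whether it differs significantly from 1 rather than comparing point values.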
Figure 5. RNA-editing sites in the G. raimondii mt genome. Results are based on PREP predictions with a cut-off value of 0.6. Gray boxes show the number of RNA-editing sites in each gene.
Figure 6. Maximum likelihood tree based on 23 conserved protein-coding genes of 30 representative higher plant mt genomes. Numbers on each node are bootstrap support values. Marchantia polymorpha was used as the outgroup. The yellow, red, blue, green, and baby blue circles represent the asterid, rosid, monocot, Gymnospermae, and Bryophyta classes, respectively. The black circle indicates G. raimondii, which belongs to the rosids.