Shuangshuang Li¹, Zhiwen Chen¹, Nan Zhao¹, Yumei Wang², Hushuai Nie¹, Jinping Hua³.
Abstract
BACKGROUND: The mitochondrial genomes of higher plants vary remarkably in size, structure and sequence content, as demonstrated by the accumulation and activity of repetitive DNA sequences. Incompatibility between the mitochondrial genome and the nuclear genome leads to non-functional male reproductive organs and results in cytoplasmic male sterility (CMS). CMS has been used to produce F1 hybrid seeds in a variety of plant species.
Keywords: Chimeric ORFs; Comparative genomics; Cytoplasmic male sterility; Gossypium; Mitochondrial genomes; Transcriptomes
Year: 2018 PMID: 30367630 PMCID: PMC6204043 DOI: 10.1186/s12864-018-5122-y
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Main features of the assembled Gossypium mitogenomes
| Genome Characteristics | 2074A | 2074S | 2074B | E5903 |
|---|---|---|---|---|
| Genome size (bp) | 668,464 | 668,584 | 621,884 | 666,081 |
| GenBank ID | JX536494.1 | JX944505.1 | JX065074.1 | JX944506.1 |
| Circular chromosomes | 1 | 1 | 1 | 1 |
| Percentage G + C content (%) | 44.97 | 44.98 | 44.98 | 44.95 |
| Protein genes | 37 | 37 | 36 | 37 |
| tRNA genes | 30 | 30 | 29 | 30 |
| Native | 18 | 18 | 17 | 18 |
| Plastid-derived | 12 | 12 | 12 | 12 |
| tRNAs with introns | 3 | 3 | 3 | 3 |
| rRNA genes | 4ᵃ | 4ᵃ | 4ᵃ | 4ᵃ |
| Genic content percent coverage of total genome | ||||
| Exonic | 6.23 | 6.23 | 6.25 | 6.39 |
| Intronic-c | 4.43 | 4.43 | 4.45 | 4.76 |
| Intergenic content percent coverage | ||||
| Chloroplast-derived | 1.43 | 1.44 | 1.35 | 1.37 |
| Nuclear-derived | 8.44 | 8.83 | 7.11 | 8.29 |
| Repeat content percent coverage of total genome | ||||
| Large repeats: > 1 kb (number) | 11.78 (4) | 11.74 (5) | 9.44 (4) | 11.32 (7) |
| Small repeats: < 1 kb (number) | 4.71 (475) | 4.77 (476) | 4.05 (465) | 4.81 (470) |
ᵃ rrn26 is present in two copies
Fig. 1 Linear maps of the four cotton mitogenomes. Known protein-coding genes, tRNA and rRNA genes, and gene fragments are shown on the line. Genes on the right side and left side of the line are transcribed in the direct and inverted orientation, respectively. Colors indicate genes by function: Complex I (nad; yellow), Complex II (sdh; green), Complex III (cob; yellowish green), Complex IV (cox; light pink), Complex V (atp; olive-green), ribosomal proteins (brown), maturase (matR; orange), other genes (ccm and tRNA; purple), intron (white)
Gene contents of Gossypium mitotypes
| Product group | Gene | 2074A | 2074S | 2074B | E5903 | Product group | Gene | 2074A | 2074S | 2074B | E5903 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| complex I | | + | + | + | + | Ribosome | | + 2ᵇ | + 2 | + 2 | +/ψ- |
| | | + | + | + | + | | | + | + | + | + |
| | | + | + | + | + | | | + | + | + | + |
| | | + | + | + | + | | | + | + | + | + |
| | | + | + | + | + | | | + | + | + | + |
| | | + | + | + | + | | | + | + | + | + |
| | | + | + | + | + | | | + | + | + | + |
| | | + | + | + | + | | | + | + | + | + |
| | | + | + | + | + | | | + | + | + | + |
| complex II | | + | + | + | + | | | + | + | + | + |
| | | + | + | + | + | tRNA | | + | + | + | + |
| complex III | | + | + | + | + | | | + 2 | + 2 | + 2 | + 2 |
| complex IV | | + | + | + | + | | | + | + | + | + |
| | | + | + | + | + | | | + 2 | + 2 | + 2 | + 2 |
| | | + | + | + | + | | | + 4 | + 4 | + 4 | + 4 |
| complex V | | + | + | + | + | | | + | + | + | + |
| | | + | + | + | + | | | + | + | + | + |
| | | + | + | + | + | | | + | + | + | + |
| | | + | + | + | + | | | + 2 | + 2 | + 1 | + 2 |
| | | + | + | + | + | | | + | + | + | + |
| Cytochrome C | | + | + | + | + | | | + | + | + | + |
| | | + | + | + | + | | | + 3 | + 3 | + 3 | + 3 |
| | | + | + | + | + | | | + | + | + | + |
| | | + 2ᵃ | + 2 | + | + 2 | | | + 2 | + 2 | + 2 | + 2 |
| Other gene | | + | + | + | + | | | + | + | + | + |
| | | + | + | + | + | | | + | + | + | + |
| rRNA | | + | + | + | + | | | + | + | + | + |
| | | + | + | + | + | | | + | + | + | + |
| | | + 2 | + 2 | + 2 | + 2 | | | + 2 | + 2 | + 2 | + 2 |
| | | + | + | + | + | | | | | | |
Note. +, present; −, absent; ᵃ gene copy number is shown after +; ᵇ rps3-2 is a pseudogene
The protein variation in four Gossypium mitogenomes
| Gene | Len | Var | IDY | Loc | 2074A N-S | 2074A P-S | 2074S N-S | 2074S P-S | E5903 N-S | E5903 P-S | 2074B N-S | 2074B P-S | NSM | SM | aa-Var |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
| 585 | 1 | 99.8 | 222 | tt | Phe | tt | Phe | tt | Phe | tt | Phe | 1 | 0 | |
|
| 465 | 1 | 99.8 | 171 | ag |
| ag |
| ag |
| ag | Ser | 0 | 3 | Ser-Arg |
|
| 225 | 1 | 99.6 | 27 | gg | Gly | gg | Gly | gg | Gly | gg | Gly | 1 | 0 | |
|
| 621 | 1 | 99.8 | 11 | c |
| c | Leu | c | Leu | c | Leu | 0 | 1 | Leu-His |
|
| 1323 | 1 | 99.8 | 585 | gt | Val | gt | Val | gt | Val | gt | Val | |||
|
| 1593 | 3 | 99.8 | 415 | Thr | Thr | Ccc |
| Thr | Thr-Pro | |||||
| 960 | at | Ile | at | Ile | at | Ile | at | Ile | |||||||
| 1428 | at | Ile | at | Ile | at | Ile | at | Ile | 2 | 1 | |||||
|
| 783 | 1 | 99.9 | 481 | Leu | Leu | Leu | Leu | 1 | 0 | |||||
|
| 798 | 4 | 99.6 | 157 | Leu | Atc |
| Atc |
| Leu | Leu-Ile | ||||
| 294 | tt | Pro | tt | Pro | tt | Pro | ttG |
| Pro-Leu | ||||||
| 295 | Ala | Ala | Ala | Tct |
| 0 | 4 | Ala-Ser | |||||||
|
| 1968 | 1 | 99.9 | 1858 |
|
|
| Caa | Gln | 0 | 3 | Gln-Lys | |||
|
| 1467 | 1 | 99.9 | 783 | tc | Ser | tc | Ser | tc | Ser | tc | Ser | 1 | 0 | |
|
| 357 | 1 | 99.7 | 317 | t | Ser | t | Ser | t | Ser | tTt |
| 0 | 1 | Ser-Phe |
|
| 1488 | 3 | 99.8 | 33 | ga | Asp | ga | Asp | ga | Asp | ga | Asp | |||
| 240 | at | Ile | at | Ile | at | Ile | at | Ile | |||||||
| 242 | a | Asn | a | Asn | a | Asn | aTT |
| 2 | 1 | Asn-Ile | ||||
|
| 1185 | 1 | 99.9 | 24 | at | Ile | at | Ile | at | Ile | at | Ile | 1 | 0 | |
|
| 1005 | 2 | 99.8 | 45 | tt |
| tt |
| tt |
| ttT | Phe | Phe-Leu | ||
| 292 |
|
|
| Atc | Ile | 0 | 6 | Ile-Leu | |||||||
|
| 582 | 1 | 99.8 | 139 |
|
|
| Aaa | Lys | 0 | 3 | Lys-Gln | |||
|
| 489 | 1 | 99.8 | 361 |
|
|
| Gaa | Glu | 0 | 3 | Glu-Lys | |||
|
| 435 | 1 | 99.5 | 270 | gt | Val | gt | Val | gt | Val | gt | Val | 1 | 0 | |
|
| 1707 | 3 | 99.8 | 1670 | a | Lys | a | Lys | a | Lys | aGg |
| Lys-Arg | ||
| 1676 | g | Gly | g | Gly | g | Gly | gAC |
| Gly-Asp | ||||||
| 1678 | Arg | Arg | Arg | Ggt |
| 0 | 3 | Arg-Gly | |||||||
|
| 1098 | 1 | 99.9 | 535 |
|
|
| Aaa | Lys | 0 | 3 | ||||
|
| 333 | 1 | 99.4 | 311 | gTC |
| g | Glu | g | Glu | g | Glu | 0 | 1 | Glu-Val |
|
| 435 | 1 | 99.8 | 33 | tt |
| tt |
| tt |
| ttA | Leu | 0 | 3 | |
| nonsynonymous mutation | 31 | 10 | 9 | 10 | 7 | total | 10 | 36 | 17 | ||||||
| Synonymous mutation | 1 | 0 | 0 | 10 | |||||||||||
Note. Len, length of gene CDS sequence; Var, variant sites in the four mitogenomes; IDY, identity of gene CDS sequences; Loc, location of variant sites; N-S, nucleotide sequence; P-S, amino acid sequence; NSM, nonsynonymous mutation; SM, synonymous mutation; aa-Var, amino acid variation. Boldface marks variant nucleotides; bold italic marks variant amino acids
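The NSM/SM distinction in the table above can be illustrated with a minimal sketch. It assumes the standard genetic code and ignores RNA editing, which the source does not discuss here; the function name `classify_substitution` is hypothetical. The example pairs the `Caa` (Gln) codon listed in the table with `aaa`, an assumed Lys counterpart that differs at the first position.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# Assumption: the standard code applies and RNA editing is ignored.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_substitution(ref_codon, alt_codon):
    """Return (kind, ref_aa, alt_aa) for two codons of the same site.

    kind is "synonymous" when both codons encode the same amino acid,
    otherwise "nonsynonymous". Case is ignored, so the table's mixed-case
    notation (e.g. "Caa") can be passed directly.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa


if __name__ == "__main__":
    print(classify_substitution("Caa", "aaa"))  # Gln versus Lys
    print(classify_substitution("tta", "ttg"))  # both Leu
```

One-letter amino acid codes are returned to keep the lookup table compact; mapping them to the three-letter codes used in the table would be a small extension.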
Length and percentage of duplicated fragments (longer than 500 bp)
| Genome | Genome length (bp) | Duplication length (bp)ᵃ | % of genome | Minimal length (bp)ᵇ | Maximal length (bp)ᵇ | Number of fragments | Genome length without duplication (bp) (percentage) |
|---|---|---|---|---|---|---|---|
| 2074A | 668,464 | 80,545 | 12.0 | 504 | 29,277 | 7 | 587,919 (87.95%)ᶜ |
| 2074S | 668,584 | 80,269 | 12.0 | 505 | 27,666 | 8 | 588,315 (87.99%) |
| 2074B | 621,884 | 58,734 | 9.4 | 879 | 27,558 | 5 | 563,150 (90.56%) |
| E5903 | 666,082 | 78,161 | 11.7 | 504 | 21,563 | 11 | 587,921 (88.27%) |
Note. ᵃ All duplicated copies less one; ᵇ length of one copy; ᶜ percentage of backbone fragments in the genome
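The derived columns of this table follow from the genome length and the duplication length by simple arithmetic. The sketch below reproduces them under that reading; the function name `duplication_summary` is hypothetical, and the input values are copied from the table rows.

```python
def duplication_summary(genome_bp, duplicated_bp):
    """Derive the percentage and backbone columns from two lengths in bp."""
    backbone_bp = genome_bp - duplicated_bp
    return {
        "dup_pct": round(100 * duplicated_bp / genome_bp, 1),
        "backbone_bp": backbone_bp,
        "backbone_pct": round(100 * backbone_bp / genome_bp, 2),
    }


# (genome length, duplication length) as listed in the table
ROWS = {
    "2074A": (668_464, 80_545),
    "2074S": (668_584, 80_269),
    "2074B": (621_884, 58_734),
    "E5903": (666_082, 78_161),
}

if __name__ == "__main__":
    for name, (genome_bp, duplicated_bp) in ROWS.items():
        print(name, duplication_summary(genome_bp, duplicated_bp))
```

Recomputing the columns this way is a quick consistency check on transcribed values.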
Fig. 2 Size distribution of repetitive content by number of repeat pairs and total repeat length. The X axis shows the repeat size categories: more than 10 kb, 1–10 kb, 0.5–1 kb, 101–500 bp, 41–100 bp, 31–40 bp and 21–30 bp. The Y axes show the number of repeat pairs (primary axis) and the proportion of total repeat length (secondary axis). Panels (a), (b), (c) and (d) show the 2074A, 2074S, 2074B and E5903 mitogenomes, respectively
The unique sequences in 2074A and 2074S compared with 2074B
| No. | Locationᵃ | Length (bp) | Joint of syntenic regions | Predicted ORFᵇ | Identity sequencesᵈ |
|---|---|---|---|---|---|
| U1 | 1–5156 | 5156 | S1 | Aorf1, Aorf2, Aorf3; Sorf1, Sorf2, Sorf3 | 2316–2885, 3107–3879 |
| U2 | 16,918–17,305 | 388 | S2-S3 | | 236–379 |
| U3 | 143,667–151,556 (143,674–151,564) | 7890 | S6, S5-S7 | Aorf7, Sorf7 | 2888–3734, 7191–7409, 6160–6348 |
| U4 | 237,227–238,728 | 1502 | S10 | | 893–1502, 413–720, |
| U5ᶜ | 438,450–457,430 (443,399–457,334) | 18,981 | S14, S13-S15 | Aorf18, Aorf19, Aorf20, Aorf21, Aorf22; Sorf17, Sorf16, Sorf18, Sorf19, Sorf20, Sorf21 | 6486–7727, |
| U6 | 665,761–668,584 | 2824 | S22 | Aorf29; Sorf30 | 565–857, 1918–2188, 879–1159, |
Note. ᵃ Figures in brackets denote the sites in the 2074A mitogenome; ᵇ there are 5 ORFs predicted in U5; ᶜ U5 is 13,936 bp in 2074A and is 5148 bp longer at the 3′ end in 2074S; ᵈ identity is more than 80%; the figures denote the sites of the aligned fragments
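The Length column is consistent with treating each Location as a 1-based interval that includes both end positions. A minimal sketch of that convention follows; the helper name `region_length` is hypothetical.

```python
def region_length(start, end):
    """Length in bp of a 1-based coordinate interval, both ends inclusive."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


if __name__ == "__main__":
    # Coordinates copied from the table rows U2, U3 and U6
    for name, start, end in [
        ("U2", 16_918, 17_305),
        ("U3", 143_667, 151_556),
        ("U6", 665_761, 668_584),
    ]:
        print(name, region_length(start, end))
```

The same convention gives 13,936 bp for the bracketed 2074A coordinates of U5 (443,399–457,334), matching the length stated in the note.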
Chimeric ORFs (> 300 bp) present in the 2074A mitogenome
| 2074A ORF | Start | End | Strand | Length | 2074B | E5903 | Tra-domᶜ | Uni/R-seqᵈ | Homologous sequenceᵉ | RNA-Seq Log2 (2074B/2074A) | RNA-Seq Log2 (F1-A/2074A) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Aorf1 | 3872 | 4483 | + | 612 | –ᵃ | *ᵇ | 1 | U1 | No homologous sequence | −11.76 | 2.73 |
| | 4461 | 3700 | – | 762 | – | * | 6 | U1 | 583–759, 86%, papaya mtDNA | −12.58 | 2.62 |
| Aorf3 | 5180 | 4878 | – | 303 | – | * | 0 | U1 | 154–301, 96%, papaya mtDNA | −14.54 | 2.52 |
| | 27,402 | 26,845 | – | 558 | Some | some | 1 | AR4(524) | 45 bp, | −0.22 | 2.92 |
| | 28,431 | 27,967 | – | 463 | * | * | | | | −0.35 | 2.56 |
| Aorf5 | 86,768 | 87,067 | + | 300 | – | – | 0 | | 98%, other plant mtDNA | −11.92 | 3.33 |
| Aorf6 | 143,074 | 142,757 | – | 318 | Some | – | 0 | | 1–301, 95%, | 0.00 | 12.63 |
| Aorf7 | 144,192 | 144,596 | + | 405 | – | * | 0 | U3 | No homologous sequence | 0.00 | 9.96 |
| Aorf8 | 149,251 | 148,952 | – | 300 | – | * | 0 | U3 | 4–242, 94%, | −13.47 | 2.25 |
| | 185,940 | 189,313 | + | 3374 | * | * | | | | 1.26 | 3.17 |
| | 189,332 | 189,688 | + | 357 | Some | some | 1 | AR1(7125) | 76 bp, | −0.70 | 2.41 |
| Aorf10 | 258,310 | 257,840 | – | 471 | Some | some | 1 | up AR2 93 bp | 4–424, 92%, | −0.25 | 2.66 |
| Aorf11 | 314,879 | 315,265 | + | 387 | Some | some | 0 | | 43 bp, | −1.00 | 2.63 |
| Aorf12 | 323,440 | 323,838 | + | 399 | Some | some | 0 | | No homologous sequence | −12.06 | 2.01 |
| | 324,431 | 324,751 | + | 321 | * | * | | | | −1.22 | 2.09 |
| | 326,094 | 327,256 | + | 1163 | * | * | | | | −0.91 | 3.37 |
| Aorf13 | 335,286 | 334,948 | – | 339 | Some | some | 0 | | No homologous sequence | 1.26 | 3.17 |
| Aorf14 | 343,549 | 343,223 | – | 327 | – | – | 0 | | 36 bp, | −1.25 | 3.38 |
| | 346,637 | 347,641 | + | 1005 | * | * | | | | −0.27 | 2.05 |
| | 348,085 | 348,642 | + | 558 | Some | some | 1 | AR4(524) | 45 bp, | −0.22 | 2.92 |
| Aorf15ʰ | 388,830 | 388,447 | – | 384 | Some | some | 1 | | No homologous sequence | −1.08 | 2.79 |
| Aorf16 | 397,394 | 397,735 | + | 342 | – | – | 0 | | No homologous sequence | −1.64 | 3.41 |
| Aorf17 | 415,813 | 415,265 | – | 549 | Some | some | 0 | up AR2 23 bp | 39–527, 94%, | −11.92 | 3.33 |
| Aorf18 | 449,399 | 449,001 | – | 399 | – | some | 0 | U5 | 123–393, 82%, tobacco mtDNA | −11.06 | 4.57 |
| Aorf19 | 452,355 | 453,074 | + | 720 | – | * | 2 | U5 | No homologous sequence | 0.00 | 15.49 |
| Aorf20 | 452,473 | 452,781 | + | 309 | – | * | 1 | U5 | No homologous sequence | 0.00 | 13.35 |
| Aorf21 | 454,116 | 453,781 | – | 336 | – | * | 0 | U5 | 93%, | 0.00 | 14.48 |
| Aorf22 | 454,900 | 454,451 | – | 450 | – | * | 0 | U5 | 96%, | −10.88 | 3.10 |
| Aorf23 | 465,398 | 465,751 | + | 354 | Some | some | 0 | | 167–331, other plant mtDNA | 0.09 | 2.15 |
| Aorf24 | 490,816 | 491,292 | + | 477 | – | – | 2 | | 64 bp, | −2.19 | 3.93 |
| Aorf25ᵍ | 491,321 | 491,689 | + | 369 | Some | * | 0 | | 20 bp, | −3.15 | 4.83 |
| Aorf26 | 508,562 | 507,753 | – | 810 | Some | some | 0 | | 306–479, other plant mtDNA | 1.45 | 5.21 |
| | 631,928 | 633,518 | + | 1591 | * | * | | | | −0.32 | 1.86 |
| | 633,740 | 634,606 | + | 867 | Some | – | 0 | up AR1 1760 bp | 56 bp, | −1.47 | 3.02 |
| | 634,937 | 635,734 | + | 798 | * | * | | | | −0.03 | 2.69 |
| Aorf27 | 665,520 | 666,155 | + | 636 | – | * | 0 | AR1, U6 | No homologous sequence | −4.71 | 0.35 |
Note. ᵃ Not detected; ᵇ this ORF is present; ᶜ Tra-dom: transmembrane domain; ᵈ Uni/R-seq: unique sequence or repeat sequence; ᵉ homologous sequence includes sequences of cotton genes and mitochondrial sequences of other plants; ᶠ Aorf4 contains a fragment matching bp 1–45 of rps3 and Aorf28 contains a fragment matching bp 1–56 of atp4, both with 100% identity; ᵍ Aorf25 is located 70 bp upstream of nad5ex4; ʰ the end of Aorf15 is 81 bp longer than that of Sorf14
Chimeric ORFs (> 300 bp) present in the 2074S mitogenome
| 2074S | Length (bp) | Tra-domᶜ | Uni/Rep-seqᵈ | 2074B | E5903 | Location | Homologous sequenceᵉ |
|---|---|---|---|---|---|---|---|
| Sorf25 | 660 | –ᵃ | | – | *ᵇ | down | 19 bp, |
| Sorf16 | 381 | 1 | U5 | – | – | | 157 bp, |
| Sorf1 | 612 | 1 | U1 | – | * | | No homologous sequence |
| Sorf26 | 810 | – | | partial | Partial | | 306–479, other plant mtDNA |
| Sorf21 | 450 | – | U5 | – | * | | 96%, |
| Sorf20 | 336 | – | U5 | – | * | | 93%, |
| Sorf17 | 399 | – | U5 | – | partial e | | 123–393, 82%, tobacco mtDNA |
| Sorf15 | 549 | – | up SR2 | partial | Partial | | 39–527, 94%, |
| Sorf7 | 405 | – | U3 | – | * | | No homologous sequence |
| | 315 | 1 | | partial | Partial | | No homologous sequence |
| Sorf13 | 327 | – | | – | – | | 36 bp, |
| Sorf12 | 339 | – | | partial | Partial | | No homologous sequence |
| Sorf9 | 471 | 1 | up SR2 | partial | Partial | | 4–424, 92%, |
| Sorf6 | 318 | – | | partial | – | | 1–301, 95%, |
| | 357 | 1 | SR1 | partial | Partial | down | 76 bp, |
| | 558 | 1 | SR4 | partial | Partial | down | 45 bp, |
| | 558 | 1 | SR4 | partial | Partial | up | 45 bp, |
| Sorf3 | 303 | – | U1 | – | * | | 154–301, 96%, papaya mtDNA |
| Sorf2 | 762 | 6 | U1 | – | * | | 583–759, 86%, papaya mtDNA |
| Sorf10 | 387 | – | | partial | Partial | | 43 bp, |
| Sorf11 | 399 | – | | partial | Partial | | No homologous sequence |
| Sorf19 | 309 | 1 | U5 | – | * | | No homologous sequence |
| Sorf18 | 720 | 2 | U5 | – | * | | No homologous sequence |
| Sorf5 | 300 | – | | – | – | | 98%, other plant mtDNA |
| Sorf22 | 354 | – | | partial | Partial | | 167–331, other plant mtDNA |
| Sorf23 | 477 | 2 | | – | – | | 64 bp, |
| | 369 | – | | partial | * | up | 20 bp, |
| | 867 | – | up SR1 | partial | – | down | 56 bp, |
| Sorf30 | 636 | – | SR1, U6 | – | * | | No homologous sequence |
| Sorf28 | 414 | – | | partial | Partial | | No homologous sequence |
| | 951 | 3 | | partial | Partial | down | No homologous sequence |
Note. ᵃ Not detected; ᵇ this ORF is present; ᶜ Tra-dom: transmembrane domain; ᵈ Uni/Rep-seq: unique sequence or repeat sequence; ᵉ homologous sequence includes sequences of cotton genes and mitochondrial sequences of other plants; ᶠ the similarity between 1–45 bp in Sorf4 and 1–45 bp in rps3, and between 1–56 bp in Sorf29 and 1–56 bp, is 100%; ᵍ nad5ex4 is located 91 bp upstream of Sorf24
Fig. 3 Differential expression of mt genes in 2074A, 2074B and F1-A. Log2 transformations of the expression fold changes (2074B/2074A and F1-A/2074A) are represented by bars. The Y axis denotes the levels of the transformed expression fold changes
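The log2 fold changes named in the Fig. 3 caption can be sketched as below. This is an illustration only: the source does not state how expression values were normalised or how zero counts were handled, so the `pseudocount` parameter, the function name `log2_fold_change` and the expression values in the example are all assumptions.

```python
import math


def log2_fold_change(numerator, denominator, pseudocount=0.0):
    """log2 of the expression ratio, e.g. 2074B/2074A or F1-A/2074A.

    pseudocount is a hypothetical guard against zero expression values;
    with the default of 0.0 both inputs must be strictly positive.
    """
    return math.log2((numerator + pseudocount) / (denominator + pseudocount))


if __name__ == "__main__":
    # Made-up expression values, not taken from the paper
    print(log2_fold_change(8.0, 1.0))  # positive: higher in the numerator sample
    print(log2_fold_change(1.0, 4.0))  # negative: lower in the numerator sample
```

A positive value means higher expression in the numerator line, a negative value lower expression, and 0 no change.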
Fig. 4 The probability of transmembrane domains of Aorf4, Aorf9, Aorf2 and Aorf28 gene products