Xiaoqin Li, Yunjuan Zuo, Xinxin Zhu, Shuai Liao, Jinshuang Ma.
Abstract
Aristolochiaceae, comprising about 600 species, is a unique plant family containing aristolochic acids (AAs). In this study, we sequenced the chloroplast (cp) genomes of seven Aristolochia species and retrieved eleven published cp genomes for comparative genomic and phylogenetic analyses. The results show that the cp genomes have a typical quadripartite structure with a conserved genome arrangement and moderate divergence. The cp genomes range from 159,308 bp to 160,520 bp in length and have a similar GC content of 38.5% to 38.9%. A total of 113 genes were identified, including 79 protein-coding genes, 30 tRNAs and four rRNAs. Although genomic structure and size were highly conserved, the IR-SC boundary regions varied among the seven cp genomes. The trnH-GUG gene is one of the major differences between the plastomes of the two subgenera, Siphisia and Aristolochia. We analyzed nucleotide substitutions, the distribution of repeat sequences and simple sequence repeats (SSRs), and positive selection in the cp genomes, and identified 16 divergence hotspot regions that could serve as potential markers for phylogenetic reconstruction. Phylogenetic relationships of the family Aristolochiaceae inferred from the 18 cp genome sequences were consistent and robustly supported by maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI) analyses.
Keywords: Aristolochia; chloroplast genome; comparative analysis; molecular evolution; phylogeny
Year: 2019 PMID: 30823362 PMCID: PMC6429227 DOI: 10.3390/ijms20051045
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Gene maps of the complete cp genomes of seven Aristolochia species: (A) Aristolochia manshuriensis; (B) Aristolochia kaempferi, Aristolochia macrophylla, Aristolochia mollissima and Aristolochia kunmingensis; (C) Aristolochia tagala and Aristolochia tubiflora. Genes inside the circle are transcribed clockwise, and those outside are transcribed counterclockwise. In the inner circle, darker gray corresponds to GC content and lighter gray to AT content.
Summary of complete chloroplast (cp) genomes of Aristolochia species.
| Species | Genome Size (bp) | LSC (bp) | IR (bp) | SSC (bp) | CDS (bp) | Total Genes | Protein-Coding Genes | tRNA Genes | rRNA Genes | GC (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| … | 159,612 | 88,890 | 25,681 | 19,360 | 79,191 | 113 | 79 | 30 | 4 | 38.8 |
| … | 160,051 | 89,308 | 25,698 | 19,347 | 79,143 | 113 | 79 | 30 | 4 | 38.7 |
| … | 160,493 | 89,788 | 25,664 | 19,377 | 79,116 | 113 | 79 | 30 | 4 | 38.6 |
| … | 159,653 | 88,948 | 25,681 | 19,338 | 79,194 | 113 | 79 | 30 | 4 | 38.8 |
| … | 159,374 | 88,652 | 25,700 | 19,322 | 78,753 | 113 | 79 | 30 | 4 | 38.7 |
| … | 159,308 | 89,414 | 25,242 | 19,410 | 78,582 | 113 | 79 | 30 | 4 | 38.5 |
| … | 160,520 | 89,859 | 25,431 | 19,799 | 78,624 | 113 | 79 | 30 | 4 | 38.8 |
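Because a quadripartite cp genome consists of LSC + IRa + SSC + IRb, each reported total should equal LSC + 2 × IR + SSC. The short Python check below is a sketch that only reuses the numbers printed in the summary table above; species names are not recoverable from this record, so rows are identified by position.

```python
# (total, LSC, IR, SSC) lengths in bp, copied row by row from the summary table.
ROWS = [
    (159612, 88890, 25681, 19360),
    (160051, 89308, 25698, 19347),
    (160493, 89788, 25664, 19377),
    (159653, 88948, 25681, 19338),
    (159374, 88652, 25700, 19322),
    (159308, 89414, 25242, 19410),
    (160520, 89859, 25431, 19799),
]


def quadripartite_length(lsc, ir, ssc):
    """LSC + IRa + SSC + IRb; the two inverted repeats have the same length."""
    return lsc + 2 * ir + ssc


def check_rows(rows):
    """Return (row number, discrepancy in bp) for rows whose parts do not add up."""
    mismatches = []
    for i, (total, lsc, ir, ssc) in enumerate(rows, start=1):
        diff = total - quadripartite_length(lsc, ir, ssc)
        if diff != 0:
            mismatches.append((i, diff))
    return mismatches


print(check_rows(ROWS))  # [(4, 5)]
```

Six of the seven rows sum exactly; in the fourth row the reported total is 5 bp larger than LSC + 2 × IR + SSC.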
Figure 2. GC content (%) at different positions of the coding sequence (CDS) regions of Aristolochia species.
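Assuming that "different positions" in Figure 2 means the first, second and third codon positions (the usual GC1/GC2/GC3 breakdown), the per-position calculation for one in-frame CDS can be sketched as follows; the function name is illustrative, and pooled values can be obtained by concatenating the CDSs of a genome before calling it.

```python
def gc_by_codon_position(cds):
    """GC% at codon positions 1, 2 and 3, plus overall GC%, for one in-frame CDS."""
    cds = cds.upper()
    n = len(cds) - len(cds) % 3  # drop an incomplete trailing codon
    if n == 0:
        raise ValueError("CDS shorter than one codon")

    def gc_percent(bases):
        return 100.0 * sum(b in "GC" for b in bases) / len(bases)

    # cds[pos:n:3] picks every third base starting at codon position pos + 1
    per_position = {pos + 1: gc_percent(cds[pos:n:3]) for pos in range(3)}
    return per_position, gc_percent(cds[:n])


positions, overall = gc_by_codon_position("ATGGCCTAA")
print(positions, overall)
```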
Gene contents in the cp genomes of Aristolochia species.
| No. | Group of Genes | Gene Names | Amount |
|---|---|---|---|
| 1 | Photosystem I | … | 5 |
| 2 | Photosystem II | … | 15 |
| 3 | Cytochrome b/f complex | … | 6 |
| 4 | ATP synthase | … | 6 |
| 5 | NADH dehydrogenase | … | 12 (1) |
| 6 | Rubisco large subunit | … | 1 |
| 7 | RNA polymerase | … | 4 |
| 8 | Ribosomal proteins (SSU) | … | 14 (2) |
| 9 | Ribosomal proteins (LSU) | … | 11 (2) |
| 10 | Assembly/stability of photosystem I | … | 2 |
| 11 | Transfer RNAs | … | 37 (7)/38 (8) |
| 12 | Ribosomal RNAs | … | 8 (8) |
| 13 | RNA processing | … | 1 |
| 14 | Carbon metabolism | … | 1 |
| 15 | Cytochrome c synthesis | … | 1 |
| 16 | Proteins of unknown function | … | 3 (1) |
| 17 | Other genes | … | 3 |
* Gene contains one intron; ** gene contains two introns; (x2) indicates that the number of repeat units is 2.
Intron-containing genes in the seven Aristolochia cp genomes, with the lengths of their exons and introns.
| Taxon | Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp) |
|---|---|---|---|---|---|---|---|
| … | … | LSC | 144 | 792 | 411 | | |
| | … | LSC | 71 | 912 | 288 | 674 | 250 |
| | … | SSC | 551 | 1095 | 541 | | |
| | … | IR | 777 | 703 | 756 | | |
| | … | LSC | 6 | 215 | 642 | | |
| | … | LSC | 6 | 702 | 477 | | |
| | … | LSC | 9 | 1092 | 402 | | |
| | … | IR | 391 | 700 | 431 | | |
| | … | LSC | 432 | 762 | 1617 | | |
| | … | LSC | 114 | 232 | 536 | 23 | |
| | … | LSC | 46 | 842 | 221 | | |
| | … | IR | 38 | 804 | 35 | | |
| | … | LSC | 24 | 763 | 48 | | |
| | … | IR | 37 | 936 | 35 | | |
| | … | LSC | 37 | 2574 | 35 | | |
| | … | LSC | 35 | 454 | 50 | | |
| | … | LSC | 37 | 587 | 36 | | |
| | … | LSC | 126 | 871 | 226 | 745 | 155 |
| … | … | LSC | 144 | 772 | 411 | | |
| | … | LSC | 71 | 897 | 295 | 672 | 243 |
| | … | SSC | 551 | 1097 | 541 | | |
| | … | IR | 777 | 702 | 756 | | |
| | … | LSC | 6 | 215 | 642 | | |
| | … | LSC | 6 | 702 | 477 | | |
| | … | LSC | 9 | 1098 | 399 | | |
| | … | IR | 391 | 700 | 431 | | |
| | … | LSC | 432 | 762 | 1617 | | |
| | … | LSC | 114 | 232 | 536 | 23 | |
| | … | LSC | 46 | 848 | 221 | | |
| | … | IR | 38 | 804 | 35 | | |
| | … | LSC | 24 | 764 | 47 | | |
| | … | IR | 37 | 936 | 35 | | |
| | … | LSC | 37 | 2562 | 35 | | |
| | … | LSC | 35 | 455 | 50 | | |
| | … | LSC | 37 | 587 | 36 | | |
| | … | LSC | 126 | 820 | 226 | 753 | 155 |
| … | … | LSC | 144 | 780 | 411 | | |
| | … | LSC | 71 | 900 | 295 | 671 | 243 |
| | … | SSC | 551 | 1097 | 541 | | |
| | … | IR | 777 | 702 | 756 | | |
| | … | LSC | 6 | 215 | 642 | | |
| | … | LSC | 6 | 702 | 477 | | |
| | … | LSC | 9 | 1101 | 399 | | |
| | … | IR | 391 | 700 | 431 | | |
| | … | LSC | 432 | 765 | 1617 | | |
| | … | LSC | 114 | 232 | 536 | 23 | |
| | … | LSC | 46 | 842 | 221 | | |
| | … | IR | 38 | 804 | 35 | | |
| | … | LSC | 24 | 764 | 48 | | |
| | … | IR | 37 | 936 | 35 | | |
| | … | LSC | 37 | 2552 | 35 | | |
| | … | LSC | 50 | 455 | 35 | | |
| | … | LSC | 37 | 587 | 36 | | |
| | … | LSC | 126 | 812 | 226 | 752 | 155 |
| … | … | LSC | 144 | 789 | 411 | | |
| | … | LSC | 71 | 909 | 288 | 669 | 250 |
| | … | SSC | 551 | 1101 | 541 | | |
| | … | IR | 777 | 703 | 756 | | |
| | … | LSC | 6 | 211 | 646 | | |
| | … | LSC | 6 | 708 | 477 | | |
| | … | LSC | 399 | 1100 | 9 | | |
| | … | IR | 391 | 700 | 431 | | |
| | … | LSC | 432 | 764 | 1617 | | |
| | … | LSC | 114 | 232 | 536 | 23 | |
| | … | LSC | 46 | 839 | 221 | | |
| | … | IR | 38 | 804 | 35 | | |
| | … | LSC | 24 | 758 | 48 | | |
| | … | IR | 37 | 936 | 35 | | |
| | … | LSC | 37 | 2567 | 35 | | |
| | … | LSC | 35 | 462 | 50 | | |
| | … | LSC | 37 | 587 | 36 | | |
| | … | LSC | 126 | 920 | 226 | 746 | 155 |
| … | … | LSC | 144 | 778 | 411 | | |
| | … | LSC | 71 | 928 | 288 | 664 | 250 |
| | … | SSC | 551 | 1084 | 541 | | |
| | … | IR | 777 | 702 | 756 | | |
| | … | LSC | 6 | 215 | 642 | | |
| | … | LSC | 6 | 706 | 477 | | |
| | … | LSC | 9 | 1095 | 402 | | |
| | … | IR | 391 | 700 | 431 | | |
| | … | LSC | 432 | 788 | 1617 | | |
| | … | LSC | 114 | 232 | 536 | 23 | |
| | … | LSC | 46 | 836 | 221 | | |
| | … | IR | 38 | 804 | 35 | | |
| | … | LSC | 24 | 755 | 47 | | |
| | … | IR | 37 | 936 | 35 | | |
| | … | LSC | 37 | 2558 | 35 | | |
| | … | LSC | 35 | 475 | 50 | | |
| | … | LSC | 37 | 589 | 36 | | |
| | … | LSC | 126 | 892 | 226 | 757 | 155 |
| … | … | LSC | 144 | 751 | 411 | | |
| | … | LSC | 71 | 819 | 288 | 671 | 250 |
| | … | SSC | 551 | 1079 | 541 | | |
| | … | IR | 777 | 705 | 756 | | |
| | … | LSC | 6 | 214 | 642 | | |
| | … | LSC | 6 | 693 | 477 | | |
| | … | LSC | 9 | 1077 | 399 | | |
| | … | IR | 391 | 657 | 431 | | |
| | … | LSC | 432 | 780 | 1617 | | |
| | … | LSC | 114 | 232 | 536 | 23 | |
| | … | LSC | 46 | 889 | 221 | | |
| | … | IR | 38 | 809 | 35 | | |
| | … | LSC | 24 | 768 | 47 | | |
| | … | IR | 37 | 937 | 35 | | |
| | … | LSC | 37 | 2635 | 35 | | |
| | … | LSC | 35 | 514 | 50 | | |
| | … | LSC | 37 | 594 | 36 | | |
| | … | LSC | 126 | 764 | 226 | 752 | 149 |
| … | … | LSC | 144 | 778 | 408 | | |
| | … | LSC | 71 | 802 | 288 | 671 | 250 |
| | … | SSC | 551 | 1101 | 541 | | |
| | … | IR | 777 | 704 | 756 | | |
| | … | LSC | 6 | 219 | 642 | | |
| | … | LSC | 6 | 488 | 477 | | |
| | … | LSC | 9 | 1071 | 399 | | |
| | … | IR | 391 | 657 | 431 | | |
| | … | LSC | 432 | 785 | 1617 | | |
| | … | LSC | 114 | 232 | 536 | 23 | |
| | … | LSC | 46 | 848 | 221 | | |
| | … | IR | 38 | 804 | 35 | | |
| | … | LSC | 24 | 768 | 48 | | |
| | … | IR | 37 | 743 | 35 | | |
| | … | LSC | 37 | 2687 | 35 | | |
| | … | LSC | 35 | 490 | 50 | | |
| | … | LSC | 37 | 595 | 36 | | |
| | … | LSC | 126 | 830 | 226 | 763 | 149 |
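Each row of the intron table lists exon and intron lengths in alternating order. The Python sketch below assumes that alternation holds for a row (the four-value rows are left out because their layout is less clear), totals the exon and intron lengths, and measures how much one intron varies across the seven taxon blocks. `LONG_INTRON` is copied from the longest-intron row (about 2.5 kb) of each block; gene names are not available in this record, and the names used here are illustrative.

```python
def exon_intron_totals(lengths):
    """Alternating exon/intron lengths (exon first) -> (total exon bp, total intron bp)."""
    return sum(lengths[0::2]), sum(lengths[1::2])


def length_range(values):
    """Smallest value, largest value and their difference."""
    lo, hi = min(values), max(values)
    return lo, hi, hi - lo


# Intron I of the longest-intron row, one value per taxon block, in table order.
LONG_INTRON = [2574, 2562, 2552, 2567, 2558, 2635, 2687]

print(exon_intron_totals([144, 792, 411]))  # (555, 792): first row of the first block
print(length_range(LONG_INTRON))            # (2552, 2687, 135)
```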
Figure 3. Comparison of the borders of the large single copy (LSC), small single copy (SSC) and inverted repeat (IR) regions among 13 cp genomes. Numbers above the gene features indicate the distance between the gene ends and the border sites. Features are not drawn to scale.
Figure 4. Codon content of the 20 amino acids and stop codons in all protein-coding genes of the seven cp genomes. The histogram to the left of each amino acid shows codon usage within Aristolochia (from left to right: A. tagala, A. tubiflora, A. moupinensis, A. kunmingensis, A. kaempferi, A. macrophylla and A. mollissima).
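Codon usage in plastid genomes is often summarized as relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons were used equally. The caption does not say whether Figure 4 shows raw counts or RSCU, so the sketch below is an assumption. It builds the standard genetic code, whose amino-acid assignments plastids share (NCBI translation table 11 differs only in its start codons); the function names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order (first base varies slowest, third fastest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def codon_counts(cds_list):
    """Count in-frame codons across CDSs; codons containing non-ACGT bases are skipped."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    return counts


def rscu(counts):
    """RSCU = observed count / mean count over the synonymous codons ('*' = stop)."""
    synonyms = {}
    for codon, aa in CODON_TABLE.items():
        synonyms.setdefault(aa, []).append(codon)
    values = {}
    for codons in synonyms.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


print(rscu(codon_counts(["ATG" + "TTT" + "TTT" + "TTC" + "TAA"])))
```

An RSCU above 1 marks a codon used more often than expected under equal synonymous usage.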
Positively selected sites detected in the cp genomes of Piperales.
| Gene Name | M2a Selected Sites | M2a Pr (ω > 1) | M8 Selected Sites | M8 Pr (ω > 1) |
|---|---|---|---|---|
| … | 71A | 0.918 | 71A | 0.967 * |
| | 72L | 0.999 ** | 72L | 1.000 ** |
| | 105R | 0.963 * | 105R | 0.984 * |
| | 116H | 0.963 * | 116H | 0.988 * |
| … | 79M | 0.966 * | 79M | 0.987 * |
| … | 4S | 0.937 | 4S | 0.975 * |
| | 99T | 0.921 | 99T | 0.967 * |
| … | 206S | 0.914 | 206S | 0.967 * |
| | 211V | 0.975 * | 211V | 0.989 * |
| | 1412N | 0.922 | 1412N | 0.971 * |
| … | 2036W | 0.932 | 2036W | 0.950 * |
* Posterior probability Pr (ω > 1) ≥ 0.95; ** Pr (ω > 1) ≥ 0.99.
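In PAML's codeml, the site models M2a and M8 are conventionally compared with the nested null models M1a and M7 by a likelihood ratio test with 2 degrees of freedom, and the Bayes empirical Bayes step then reports per-site posterior probabilities such as those in the table. The sketch below shows only the arithmetic. The null models are an assumption (the table does not name them), the log-likelihoods in the example are hypothetical, and `beb_label` simply reproduces the asterisk thresholds observed in the table.

```python
import math


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood ratio test with 2 df, e.g. M1a vs M2a or M7 vs M8 (assumed pairs)."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.exp(-stat / 2.0)  # chi-square survival function for df = 2
    return stat, p_value


def beb_label(prob):
    """Asterisks matching the table: '*' for Pr(omega > 1) >= 0.95, '**' for >= 0.99."""
    if prob >= 0.99:
        return "**"
    if prob >= 0.95:
        return "*"
    return ""


# Hypothetical log-likelihoods, for illustration only.
stat, p = lrt_df2(lnl_null=-1000.0, lnl_alt=-995.0)
print(stat, p)
print([beb_label(x) for x in (0.918, 0.967, 0.989, 1.000)])
```

For 2 degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed.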
Figure 5. Repeat sequences in ten cp genomes. REPuter was used to identify repeats at least 30 bp long with at least 90% sequence identity. F, P, R and C denote forward, palindromic, reverse and complement repeats, respectively. Repeats of different lengths are shown in different colors.
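REPuter reports maximal repeats and allows mismatches, which is beyond a short example. As a simplified stand-in (not REPuter's algorithm), the sketch below indexes exact k-mers with k = 30, matching the minimum length in the caption but requiring 100% rather than at least 90% identity, and reports seed pairs for the four repeat types. A repeat longer than k therefore shows up as several overlapping seed pairs; the function names are illustrative.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def repeat_targets(kmer):
    """Sequence the partner copy must have for each repeat type."""
    return {
        "F": kmer,                              # forward: same strand, same order
        "R": kmer[::-1],                        # reverse: reversed, not complemented
        "C": kmer.translate(COMPLEMENT),        # complement: complemented, not reversed
        "P": kmer.translate(COMPLEMENT)[::-1],  # palindromic: reverse complement
    }


def exact_repeat_seeds(seq, k=30):
    """Return {type: set of (i, j)} with i < j for exact k-mer repeat pairs."""
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    seeds = defaultdict(set)
    for i in range(len(seq) - k + 1):
        for rtype, target in repeat_targets(seq[i:i + k]).items():
            for j in index.get(target, ()):
                if j > i:  # skip self-matches and report each pair once
                    seeds[rtype].add((i, j))
    return seeds


unit = "ACGTTGCAAGCTTAGCCGATCGGATCCAGT"  # an arbitrary 30-mer
print(sorted(exact_repeat_seeds("GGGGG" + unit + "TTTTT" + unit)["F"]))
```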
Figure 6. Frequency of simple sequence repeats (SSRs) in the ten cp genomes.
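The minimum number of tandem copies per motif length used for Figure 6 is not stated here, so the thresholds below (10, 5, 4, 3, 3 and 3 copies for mono- to hexanucleotide motifs) are an assumption for illustration. This regex-based sketch reports the start, motif and copy number of each perfect SSR and skips motifs that are themselves repeats of a shorter unit (for example "ATAT"), so each run is counted under its shortest motif.

```python
import re

# ASSUMED minimum copy numbers per motif length (1 = mono ... 6 = hexa).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """False for motifs that are repeats of a shorter unit, e.g. 'ATAT' or 'AAA'."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_copies in min_repeats.items():
        # one motif of length k followed by at least (min_copies - 1) identical copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


print(find_ssrs("GC" + "A" * 12 + "GC" + "AT" * 6 + "GC"))
```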
Figure 7. Sequence identity plot comparing seven cp genomes, with A. moupinensis as the reference, generated with mVISTA. Gray arrows and thick black lines above the alignment indicate genes with their orientation and the positions of the IRs, respectively. A cut-off of 70% identity was used for the plots, and the y-axis shows percent identity from 50% to 100%.
Variable-site analysis of the seven Aristolochia cp genomes.
| Regions | Number of Sites | Variable Sites | Parsimony Informative Sites | Nucleotide Diversity |
|---|---|---|---|---|
| LSC | 94,564 | 4430 | 2315 | 0.02182 |
| SSC | 20,451 | 1433 | 804 | 0.03114 |
| IR | 25,884 | 253 | 154 | 0.00411 |
| Complete | 166,113 | 6422 | 3461 | 0.01717 |
| CDS | 79,365 | 2528 | 1376 | 0.01337 |
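The columns of this table follow standard definitions: a variable site carries at least two different nucleotides, a parsimony-informative site carries at least two nucleotides that each occur in at least two sequences, and nucleotide diversity (Pi) is the mean number of pairwise differences per site. A minimal sketch, assuming an aligned set of equal-length sequences and discarding every column that contains a gap or ambiguous base (complete deletion); how the original analysis treated gaps is not stated here.

```python
from collections import Counter
from itertools import combinations


def clean_columns(alignment):
    """Alignment columns that contain only A, C, G or T (complete deletion)."""
    seqs = [s.upper() for s in alignment]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return [col for col in zip(*seqs) if all(b in "ACGT" for b in col)]


def site_stats(alignment):
    """Number of sites, variable sites, parsimony-informative sites and Pi."""
    columns = clean_columns(alignment)
    variable = informative = 0
    for col in columns:
        counts = Counter(col)
        if len(counts) > 1:
            variable += 1
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative += 1
    n = len(alignment)
    n_pairs = n * (n - 1) // 2
    diffs = sum(col[i] != col[j] for col in columns for i, j in combinations(range(n), 2))
    pi = diffs / (n_pairs * len(columns)) if columns else 0.0
    return {"sites": len(columns), "variable": variable, "informative": informative, "pi": pi}


print(site_stats(["GCGTACGT", "ACGTACGA", "ACCTACGA", "ACCTACGT"]))
```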
Figure 8. Sliding-window analysis of the entire cp genomes of seven Aristolochia species (window length: 600 bp; step size: 200 bp). X-axis: position of the window midpoint; Y-axis: nucleotide diversity (Pi) of each window.
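The sliding-window profile can be sketched by recomputing Pi inside each window. The defaults mirror the caption (600 bp windows moved in 200 bp steps), and windows with Pi above 0.04, the cut-off used for the hotspot table, are flagged. Gap handling (complete deletion within each window) and the 0-based midpoint convention are assumptions, and the function names are illustrative.

```python
from itertools import combinations


def window_pi(columns, n_seq):
    """Pi over a list of alignment columns, using only columns made of A/C/G/T."""
    usable = [c for c in columns if all(b in "ACGT" for b in c)]
    if not usable:
        return None
    n_pairs = n_seq * (n_seq - 1) // 2
    diffs = sum(c[i] != c[j] for c in usable for i, j in combinations(range(n_seq), 2))
    return diffs / (n_pairs * len(usable))


def sliding_pi(alignment, window=600, step=200):
    """List of (0-based window midpoint, Pi) for every full-length window."""
    seqs = [s.upper() for s in alignment]
    columns = list(zip(*seqs))
    profile = []
    for start in range(0, len(columns) - window + 1, step):
        pi = window_pi(columns[start:start + window], len(seqs))
        if pi is not None:
            profile.append((start + window // 2, pi))
    return profile


def hotspots(profile, threshold=0.04):
    """Windows whose nucleotide diversity exceeds the threshold."""
    return [(mid, pi) for mid, pi in profile if pi > threshold]


profile = sliding_pi(["AAAAAAAA", "AAAAAAAT"], window=4, step=2)
print(profile, hotspots(profile))
```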
Sixteen highly variable regions (Pi > 0.04) in Aristolochia.
| Highly Variable Marker | Length (bp) | Variable Sites | Parsimony Informative Sites | Nucleotide Diversity |
|---|---|---|---|---|
| … | 1301 | 104 | 58 | 0.04278 |
| … | 2357 | 257 | 159 | 0.05364 |
| … | 1160 | 104 | 69 | 0.04439 |
| … | 1119 | 152 | 75 | 0.05888 |
| … | 1572 | 105 | 50 | 0.04178 |
| … | 920 | 85 | 51 | 0.04216 |
| … | 1402 | 152 | 97 | 0.06311 |
| … | 637 | 61 | 35 | 0.04220 |
| … | 1130 | 106 | 65 | 0.04444 |
| … | 682 | 58 | 36 | 0.04155 |
| … | 1492 | 161 | 86 | 0.04758 |
| … | 2679 | 202 | 113 | 0.04608 |
| … | 1225 | 126 | 74 | 0.04285 |
| … | 652 | 56 | 32 | 0.04053 |
| … | 1228 | 134 | 84 | 0.04611 |
| … | 740 | 70 | 39 | 0.04278 |
| Combined | 20296 | 2216 | 1349 | 0.05413 |
Figure 9. Phylogenetic relationships of the 18 species inferred from maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI) analyses. (A) Topology constructed from the complete cp genomes, LSC, SSC, CDS and hotspot regions; (B) tree constructed from the IR region. Support values are shown only on branches with a Bayesian posterior probability < 0.95 or a bootstrap value < 90. Support values at node (a): 1/86/93 (LSC region), 0.97/78/84 (SSC), 0.82/-/- (CDS) and 1/81/80 (hotspots); at node (b): 1/90/79 (SSC) and 1/73/71 (hotspots). Numbers above nodes give the Bayesian posterior probability on the left, the ML bootstrap value in the middle and the MP bootstrap value on the right. "-" indicates a value < 70.
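The labelling convention in the Figure 9 caption reduces to two rules: a node is annotated when its posterior probability is below 0.95 or either bootstrap value is below 90, and bootstrap values below 70 are printed as "-". A small sketch of those rules, with illustrative function names, reproduces the labels quoted in the caption.

```python
def format_bootstrap(value):
    """Bootstrap percentages below 70 are printed as '-'."""
    return "-" if value < 70 else str(int(value))


def format_support(pp, ml_bs, mp_bs):
    """Node label in PP/ML/MP order, e.g. '1/86/93'."""
    return f"{pp:g}/{format_bootstrap(ml_bs)}/{format_bootstrap(mp_bs)}"


def is_labelled(pp, ml_bs, mp_bs):
    """Only nodes with PP < 0.95 or a bootstrap value < 90 carry a label."""
    return pp < 0.95 or ml_bs < 90 or mp_bs < 90


print(format_support(1.0, 86, 93))   # 1/86/93
print(format_support(0.97, 78, 84))  # 0.97/78/84
```

The caption reports 0.82/-/- for the CDS analysis, so both bootstrap values at that node fall below 70; the exact values are not given.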
Sampled species and their voucher specimens used in this study.
| Species | Sample | Voucher | Locality |
|---|---|---|---|
| … | E2265 | Yuan Wang | Japan, Tokyo |
| … | E754 | Zhanghua Wang | China, Yunnan |
| … | E2111 | Jinshuang Ma | North America, North Carolina |
| … | E1016 | Xinxin Zhu & Zhixiang Hua | China, Guangdong |
| … | E1086 | Xinxin Zhu & Zhixiang Hua | China, Sichuan |
| … | E1071 | Yuan Wang | China, Hong Kong |
| … | E2239 | Shuwan Li | China, Guangxi |