Sajjad Asaf, Abdul Latif Khan, Muhammad Aaqil Khan, Muhammad Waqas, Sang-Mo Kang, Byung-Wook Yun, In-Jung Lee.
Abstract
We investigated the complete chloroplast (cp) genomes of the non-model taxa Arabidopsis halleri ssp. gemmifera and Arabidopsis lyrata ssp. petraea using Illumina paired-end sequencing to understand their genetic organization and structure. Detailed bioinformatics analysis revealed genome sizes between 154.4 and 154.5 kbp for the two subspecies, each with a large single-copy region (84,197 and 84,158 bp, respectively), a small single-copy region (17,738 and 17,813 bp) and a pair of inverted repeats (IRa/IRb; 26,264 and 26,259 bp). Both cp genomes encode 130 genes, including 85 protein-coding genes, eight ribosomal RNA genes and 37 transfer RNA genes. Whole cp genome comparison of A. halleri ssp. gemmifera and A. lyrata ssp. petraea with ten other Arabidopsis species showed an overall high degree of sequence similarity, with divergence among some intergenic spacers. The location and distribution of repeat sequences were determined, and sequence divergences of shared genes were calculated among related species. Comparative phylogenetic analysis of the entire genomic data set and of 70 genes shared between both cp genomes confirmed the previous phylogeny and generated phylogenetic trees with the same topologies. The sister species of A. halleri ssp. gemmifera is A. umezawana, whereas the closest relative of A. lyrata ssp. petraea is A. arenicola.
Year: 2017 PMID: 28790364 PMCID: PMC5548756 DOI: 10.1038/s41598-017-07891-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Summary of complete chloroplast genomes for twelve Arabidopsis species.
| Region | A. are | A. ceb | A. h. gem | A. l. pet | A. ped | A. tha | A. aren | A. cro | A. neg | A. pet | A. sue | A. ume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LSC | | | | | | | | | | | | |
| Length (bp) | 84464 | 84160 | 84197 | 84158 | 84478 | 84170 | 84234 | 84336 | 84397 | 84478 | 84090 | 84251 |
| GC (%) | 34.2 | 34.1 | 34.1 | 34.1 | 34 | 34 | 34.1 | 34.1 | 34.2 | 34.2 | 34 | 34.1 |
| Length (%) | 54.53 | 54.47 | 54.50 | 54.47 | 54.53 | 54.48 | 54.4 | 54.5 | 54.5 | 54.5 | 54.4 | 54.4 |
| SSC | | | | | | | | | | | | |
| Length (bp) | 17885 | 17830 | 17738 | 17813 | 17873 | 17780 | 17859 | 17862 | 17882 | 18875 | 17755 | 17872 |
| GC (%) | 29.4 | 29.4 | 29.4 | 29.4 | 29.5 | 29.3 | 29.4 | 29.4 | 29.4 | 29.4 | 29.4 | 29.4 |
| Length (%) | 11.54 | 11.54 | 11.48 | 11.53 | 11.53 | 11.50 | 11.5 | 11.54 | 11.55 | 12.1 | 11.5 | 11.55 |
| IR | | | | | | | | | | | | |
| Length (bp) | 26261 | 26257 | 26264 | 26259 | 26272 | 26264 | 26258 | 26265 | 26260 | 26256 | 26259 | 26261 |
| GC (%) | 42.3 | 42.3 | 42.3 | 42.3 | 42.3 | 42.3 | 42.3 | 42.3 | 42.3 | 42.3 | 42.3 | 42.3 |
| Length (%) | 16.95 | 16.99 | 17.00 | 16.99 | 16.96 | 17.00 | 16.9 | 16.9 | 16.9 | 16.9 | 17 | 16.9 |
| Total genome | | | | | | | | | | | | |
| GC (%) | 36.4 | 36.4 | 36.4 | 36.4 | 36.3 | 36.3 | 36.4 | 36.4 | 36.4 | 36.4 | 36.3 | 36.4 |
| Length (bp) | 154871 | 154504 | 154473 | 154489 | 154895 | 154478 | 154610 | 154728 | 154799 | 154865 | 154366 | 154645 |
A. are = A. arenosa; A. ceb = A. cebennensis; A. h. gem = A. halleri ssp. gemmifera; A. l. pet = A. lyrata ssp. petraea; A. ped = A. pedemontana; A. tha = A. thaliana; A. aren = A. arenicola; A. cro = A. croatica; A. neg = A. neglecta; A. pet = A. petrogena; A. sue = A. suecica; A. ume = A. umezawana.
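Because a quadripartite cp genome consists of one LSC, one SSC and two IR copies, the region lengths in the summary table can be cross-checked against the reported totals. The following is a minimal sketch of that check, not part of the original analysis. It assumes the table columns follow the order of the footnote abbreviations; the list names and the function `length_discrepancies` are illustrative.

```python
# Region lengths (bp) copied from the summary table.
# ASSUMPTION: columns follow the order of the footnote abbreviations.
SPECIES = ["A. are", "A. ceb", "A. h. gem", "A. l. pet", "A. ped", "A. tha",
           "A. aren", "A. cro", "A. neg", "A. pet", "A. sue", "A. ume"]
LSC = [84464, 84160, 84197, 84158, 84478, 84170,
       84234, 84336, 84397, 84478, 84090, 84251]
SSC = [17885, 17830, 17738, 17813, 17873, 17780,
       17859, 17862, 17882, 18875, 17755, 17872]
IR = [26261, 26257, 26264, 26259, 26272, 26264,
      26258, 26265, 26260, 26256, 26259, 26261]
TOTAL = [154871, 154504, 154473, 154489, 154895, 154478,
         154610, 154728, 154799, 154865, 154366, 154645]


def length_discrepancies():
    """Return {species: (LSC + SSC + 2 * IR) - reported total} where non-zero."""
    result = {}
    for name, lsc, ssc, ir, total in zip(SPECIES, LSC, SSC, IR, TOTAL):
        diff = lsc + ssc + 2 * ir - total  # the IR is present in two copies
        if diff != 0:
            result[name] = diff
    return result


if __name__ == "__main__":
    for name, diff in length_discrepancies().items():
        print(f"{name}: summed regions differ from reported total by {diff:+d} bp")
```

On the values as listed, eight of the twelve columns add up exactly, while four do not (for example, the A. pet column differs by 1,000 bp). Such gaps may reflect transcription errors in the table rather than biological differences.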
Figure 1. Gene map of the A. halleri ssp. gemmifera and A. lyrata ssp. petraea chloroplast genomes. Genes drawn inside the circle are transcribed clockwise, and those outside the circle are transcribed counterclockwise. Asterisks indicate intron-containing genes. Genes belonging to different functional groups are colour-coded. The darker grey in the inner circle corresponds to GC content, and the lighter grey corresponds to AT content.
Figure 2. Gene contents of the A. halleri ssp. gemmifera and A. lyrata ssp. petraea chloroplast genomes, grouped by gene family. The colour of each gene is unique within its gene family. Along the horizontal axis, the width of each box is proportional to the size of the gene (bp), including introns.
List of genes in the A. halleri ssp. gemmifera and A. lyrata ssp. petraea chloroplast genomes.
| Category | Group of genes | Name of genes |
|---|---|---|
| Self-replication | Large subunit of ribosomal proteins | |
| | Small subunit of ribosomal proteins | |
| | DNA-dependent RNA polymerase | |
| | rRNA genes | |
| | tRNA genes | |
| Photosynthesis | Photosystem I | |
| | Photosystem II | |
| | NADH oxidoreductase | |
| | Cytochrome b6/f complex | |
| | ATP synthase | |
| | Rubisco | |
| Other genes | Maturase | |
| | Protease | |
| | Envelope membrane protein | |
| | Subunit of acetyl-CoA carboxylase | |
| | c-type cytochrome synthesis gene | |
| | Conserved open reading frames | |
*Genes containing introns; a: duplicated genes (genes present in the IR regions).
Comparison of coding and non-coding region size among twelve Arabidopsis species.
| Region | A. are | A. ceb | A. h. gem | A. l. pet | A. ped | A. tha | A. aren | A. cro | A. neg | A. pet | A. sue | A. ume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Protein-coding genes | | | | | | | | | | | | |
| Length (bp) | 78561 | 78564 | 80019 | 80013 | 78540 | 79368 | 78648 | 78672 | 78675 | 78675 | 78666 | 78699 |
| GC (%) | 37.1 | 37.1 | 37 | 37 | 37.1 | 37 | 37.1 | 37 | 37.1 | 37.1 | 37 | 37.1 |
| Length (%) | 50.7 | 50.8 | 51.80 | 51.79 | 50.70 | 51.3 | 50.8 | 50.8 | 50.8 | 50.8 | 50.9 | 50.8 |
| tRNA | | | | | | | | | | | | |
| Length (bp) | 2790 | 2790 | 2775 | 2775 | 2796 | 3325 | 2790 | 2790 | 2790 | 2791 | 2789 | 2791 |
| GC (%) | 52.6 | 52.6 | 52.2 | 52.3 | 52.5 | 49.2 | 52.6 | 52.6 | 52.6 | 52.6 | 52.5 | 52.6 |
| Length (%) | 1.80 | 1.80 | 1.79 | 1.79 | 1.80 | 2.15 | 1.80 | 1.80 | 1.80 | 1.80 | 1.80 | 1.80 |
| rRNA | | | | | | | | | | | | |
| Length (bp) | 9050 | 9050 | 9050 | 9050 | 9050 | 8929 | 9050 | 9050 | 9050 | 9050 | 9050 | 9050 |
| GC (%) | 55.4 | 55.4 | 55.4 | 55.4 | 55.4 | 55.4 | 55.4 | 55.4 | 55.4 | 55.4 | 55.4 | 55.4 |
| Length (%) | 5.84 | 5.85 | 5.85 | 5.85 | 5.84 | 5.78 | 5.8 | 5.84 | 5.84 | 5.84 | 5.84 | 5.84 |
| Intergenic regions | | | | | | | | | | | | |
| Length (bp) | 64470 | 64100 | 62629 | 62629 | 64509 | 62256 | 64302 | 64216 | 64284 | 64347 | 63861 | 64205 |
| GC (%) | 31.6 | 31.6 | 31.8 | 31.7 | 31.7 | 31 | 31.3 | 31.1 | 31.5 | 31.3 | 31.2 | 31.6 |
| Length (%) | 41.6 | 41.48 | 40.56 | 40.53 | 41.64 | 40.3 | 41.5 | 41.5 | 41.52 | 41.55 | 41.36 | 41.51 |
A. are = A. arenosa; A. ceb = A. cebennensis; A. h. gem = A. halleri ssp. gemmifera; A. l. pet = A. lyrata ssp. petraea; A. ped = A. pedemontana; A. tha = A. thaliana; A. aren = A. arenicola; A. cro = A. croatica; A. neg = A. neglecta; A. pet = A. petrogena; A. sue = A. suecica; A. ume = A. umezawana.
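The "Length (%)" rows appear to express each region as a share of the total genome length. Under that assumption, the share can be recomputed from the reported base-pair counts. This is a minimal sketch: the function name `percent_of_genome` is illustrative, and the genome sizes used are the totals reported for A. halleri ssp. gemmifera and A. lyrata ssp. petraea.

```python
def percent_of_genome(region_bp: int, genome_bp: int) -> float:
    """Return the region length as a percentage of genome length (2 decimals)."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    return round(100.0 * region_bp / genome_bp, 2)


if __name__ == "__main__":
    # Protein-coding length and total genome length (bp), taken from the tables.
    ahg = percent_of_genome(80019, 154473)  # A. halleri ssp. gemmifera
    alp = percent_of_genome(80013, 154489)  # A. lyrata ssp. petraea
    print(f"A. h. gem protein-coding share: {ahg:.2f}%")
    print(f"A. l. pet protein-coding share: {alp:.2f}%")
```

For these two inputs the computed shares agree with the 51.80 and 51.79 listed for protein-coding genes.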
Base compositions in the A. halleri ssp. gemmifera (Ahg) and A. lyrata ssp. petraea (Alp) cp genome.
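| Region | T/U (%), A. h. gem | T/U (%), A. l. pet | C (%), A. h. gem | C (%), A. l. pet | A (%), A. h. gem | A (%), A. l. pet | G (%), A. h. gem | G (%), A. l. pet | Length (bp), A. h. gem | Length (bp), A. l. pet |
|---|---|---|---|---|---|---|---|---|---|---|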
| Genome | 32.3 | 32.3 | 18.5 | 18.5 | 31.4 | 31.4 | 17.9 | 17.9 | 154473 | 154489 |
| LSC | 33.8 | 33.8 | 17.5 | 17.5 | 32.1 | 32.1 | 16.6 | 16.6 | 84197 | 84158 |
| SSC | 35.2 | 35.2 | 15.2 | 15.2 | 35.4 | 35.4 | 14.2 | 14.2 | 17739 | 17814 |
| IR | 28.8 | 28.8 | 22 | 22.0 | 28.9 | 29.0 | 20.3 | 20.3 | 26270 | 26259 |
| tRNA | 23.2 | 23.2 | 26.3 | 26.3 | 24.5 | 24.5 | 25.9 | 25.9 | 2775 | 2775 |
| rRNA | 22.3 | 22.3 | 27.7 | 27.7 | 22.3 | 22.3 | 27.7 | 27.7 | 9050 | 9050 |
| Protein Coding genes | 31.5 | 31.9 | 17.3 | 17.3 | 31 | 31.1 | 19.7 | 19.7 | 80019 | 80013 |
| 1st position | 24.24 | 24.26 | 16.8 | 18.6 | 30.34 | 30.3 | 26.75 | 28.4 | 26907 | 26905 |
| 2nd position | 33.05 | 33.03 | 20.2 | 20.2 | 28.83 | 28.8 | 17.84 | 17.8 | 26907 | 26905 |
| 3rd position | 38.6 | 36.6 | 13.3 | 14.03 | 32.13 | 32.12 | 15.54 | 15.5 | 26907 | 26905 |
A. h. gem = A. halleri ssp. gemmifera; A. l. pet = A. lyrata ssp. petraea.
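GC content is the sum of the C and G percentages, so the base-composition values can be tied back to the GC (%) figures reported for each region. The sketch below does this for the A. halleri ssp. gemmifera values; the dictionary and function names are illustrative, and the inputs are the rounded percentages from the table, so results are only accurate to one decimal place.

```python
# (T/U, C, A, G) percentages for A. halleri ssp. gemmifera, from the table.
BASE_COMPOSITION = {
    "Genome": (32.3, 18.5, 31.4, 17.9),
    "LSC": (33.8, 17.5, 32.1, 16.6),
    "SSC": (35.2, 15.2, 35.4, 14.2),
    "IR": (28.8, 22.0, 28.9, 20.3),
}


def gc_percent(composition):
    """Return GC content (%) as C + G, rounded to one decimal place."""
    _t, c, _a, g = composition
    return round(c + g, 1)


if __name__ == "__main__":
    for region, composition in BASE_COMPOSITION.items():
        print(f"{region}: GC = {gc_percent(composition):.1f}%")
```

The results (36.4, 34.1, 29.4 and 42.3 for the genome, LSC, SSC and IR) match the GC (%) values given for A. halleri ssp. gemmifera in the summary of the twelve cp genomes.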
Figure 3. Analysis of repeated sequences in twelve Arabidopsis cp genomes. (A) Totals of three repeat types; (B) Frequency of forward repeats by length; (C) Frequency of palindromic repeats by length; (D) Frequency of tandem repeats by length.
Figure 4. Analysis of simple sequence repeats (SSRs) in the twelve Arabidopsis cp genomes. (A) Number of different SSR types detected in the genomes; (B) Frequency of identified SSR motifs in different repeat class types; (C) Frequency of identified SSRs in coding regions; (D) Frequency of identified SSRs in intergenic regions.
Figure 5. Alignment visualization of the twelve Arabidopsis chloroplast genome sequences. VISTA-based identity plot showing sequence identity among the species, using A. halleri ssp. gemmifera as the reference. The vertical scale indicates the percentage of identity, ranging from 50% to 100%. The horizontal axis indicates the coordinates within the chloroplast genome. Arrows indicate the annotated genes and their transcriptional direction. The thick black lines show the inverted repeats (IRs) in the chloroplast genomes.
Figure 6. Comparison of border distance between adjacent genes and junctions of the LSC, SSC, and two IR regions among the chloroplast genomes of twelve Arabidopsis species. Boxes above or below the main line indicate the adjacent border genes. The figure is not to scale with respect to sequence length and only shows relative changes at or near the IR/SC borders.
Figure 7. Phylogenetic trees constructed for twenty-eight species from the family Brassicaceae using several different methods; the tree shown is for the 70 shared protein-coding genes. Four methods were applied to the 70 shared genes data set: Bayesian inference (BI), maximum parsimony (MP), maximum likelihood (ML) and neighbour-joining (NJ). Numbers above the branches are the posterior probabilities for BI and the bootstrap values for NJ, MP and ML. Stars mark the positions of A. halleri ssp. gemmifera and A. lyrata ssp. petraea.