Hui Cheng1, Jinfeng Li2, Hong Zhang1, Binhua Cai1, Zhihong Gao1, Yushan Qiao1, Lin Mi2.
Abstract
Compared with other members of the family Rosaceae, the chloroplast genomes of Fragaria species exhibit low variation, and this situation has limited phylogenetic analyses; thus, complete chloroplast genome sequencing of Fragaria species is needed. In this study, we sequenced the complete chloroplast genome of F. × ananassa 'Benihoppe' using the Illumina HiSeq 2500-PE150 platform and then performed a combination of de novo assembly and reference-guided mapping of contigs to generate complete chloroplast genome sequences. The chloroplast genome exhibits a typical quadripartite structure with a pair of inverted repeats (IRs, 25,936 bp) separated by large (LSC, 85,531 bp) and small (SSC, 18,146 bp) single-copy (SC) regions. The length of the F. × ananassa 'Benihoppe' chloroplast genome is 155,549 bp, representing the smallest Fragaria chloroplast genome observed to date. The genome encodes 112 unique genes, comprising 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Comparative analysis of the overall nucleotide sequence identity among ten complete chloroplast genomes confirmed that for both coding and non-coding regions in Rosaceae, SC regions exhibit higher sequence variation than IRs. The Ka/Ks ratio of most genes was less than 1, suggesting that most genes are under purifying selection. Moreover, the mVISTA results also showed a high degree of conservation in genome structure, gene order and gene content in Fragaria, particularly among three octoploid strawberries which were F. × ananassa 'Benihoppe', F. chiloensis (GP33) and F. virginiana (O477). However, when the sequences of the coding and non-coding regions of F. × ananassa 'Benihoppe' were compared in detail with those of F. chiloensis (GP33) and F. virginiana (O477), a number of SNPs and InDels were revealed by MEGA 7. 
Six non-coding regions (trnK-matK, trnS-trnG, atpF-atpH, trnC-petN, trnT-psbD and trnP-psaJ) with a percentage of variable sites greater than 1% and no less than five parsimony-informative sites were identified and may be useful for phylogenetic analysis of the genus Fragaria.
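The marker-selection criterion in the abstract (more than 1% variable sites and at least five parsimony-informative sites) can be illustrated with a minimal sketch. This is not the MEGA 7 procedure used in the study. The function names `site_stats` and `is_candidate_marker` are hypothetical, and the handling of gaps (columns with any gap or ambiguous base are skipped) is an assumption.

```python
from collections import Counter


def site_stats(alignment):
    """Return (sites, variable, parsimony_informative) for aligned sequences.

    Assumption: columns containing a gap or any non-ACGT character are
    skipped entirely, similar to a complete-deletion treatment.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = n_variable = n_informative = 0
    for column in zip(*alignment):
        bases = [b.upper() for b in column]
        if any(b not in "ACGT" for b in bases):
            continue
        n_sites += 1
        counts = Counter(bases)
        # Variable site: at least two different nucleotides in the column.
        if len(counts) > 1:
            n_variable += 1
        # Parsimony-informative site: at least two nucleotides, each of
        # which occurs in at least two sequences.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            n_informative += 1
    return n_sites, n_variable, n_informative


def is_candidate_marker(alignment, min_variable_pct=1.0, min_informative=5):
    """Apply the thresholds quoted in the abstract to one aligned region."""
    n_sites, n_variable, n_informative = site_stats(alignment)
    if n_sites == 0:
        return False
    variable_pct = 100.0 * n_variable / n_sites
    return variable_pct > min_variable_pct and n_informative >= min_informative
```

For a toy alignment such as `["ACGTA", "ACGTA", "ATGTC", "ATG-C"]`, the gapped fourth column is skipped, and the second and fifth columns are both variable and parsimony-informative.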
Keywords: Benihoppe; Chloroplast DNA markers; Chloroplast genome; Comparative analysis; Fragaria × ananassa Duch.
Year: 2017 PMID: 29038765 PMCID: PMC5641433 DOI: 10.7717/peerj.3919
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Gene map of the F. × ananassa ‘Benihoppe’ chloroplast genome.
Genes inside the circle are transcribed in the clockwise direction, and those outside are transcribed in the counter-clockwise direction. Color coding indicates genes of different functional groups. The dark-gray inner circle denotes the GC content, and the lighter-gray circle denotes the AT content.
Summary of the complete chloroplast genome characteristics of ten species in Rosaceae.
| Species | Genome size (bp) | LSC size (bp) | SSC size (bp) | IR size (bp) | Number of genes | Protein-coding genes | tRNA genes | rRNA genes | Number of genes duplicated in IR | GC content (%) | GenBank no. | Reference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F. × ananassa ‘Benihoppe’ | 155,549 | 85,531 | 18,146 | 25,936 | 112 | 78 (7) | 30 (7) | 4 (4) | 18 | 37.23 | | This article |
| | 155,603 | 85,566 | 18,147 | 25,945 | 112 | 78 (7) | 30 (7) | 4 (4) | 18 | 37.22 | | |
| | 155,621 | 85,585 | 18,146 | 25,945 | 112 | 78 (7) | 30 (7) | 4 (4) | 18 | 37.23 | | |
| | 155,691 | 85,605 | 18,174 | 25,956 | 112 | 78 (7) | 30 (7) | 4 (4) | 18 | 37.21 | | |
| | 155,619 | 85,566 | 18,151 | 25,951 | 112 | 78 (7) | 30 (7) | 4 (4) | 18 | 37.23 | | Unpublished |
| | 155,640 | 85,571 | 18,145 | 25,962 | 112 | 78 (7) | 30 (7) | 4 (4) | 18 | 37.25 | | |
| | 156,749 | 85,851 | 18,792 | 26,053 | 114 | 79 (7) | 31 (7) | 4 (4) | 18 | 37.23 | | Unpublished |
| | 160,041 | 88,119 | 19,204 | 26,359 | 111 | 77 (7) | 30 (7) | 4 (4) | 18 | 36.56 | | |
| | 159,922 | 87,901 | 19,237 | 26,392 | 111 | 77 (6) | 30 (7) | 4 (4) | 17 | 36.58 | | |
| | 157,790 | 85,968 | 19,060 | 26,381 | 112 | 78 (5) | 30 (7) | 4 (4) | 16 | 36.76 | | |
Notes.
LSC, large single copy
SSC, small single copy
IR, inverted repeat (A/B)
bp, base pairs
Figures in brackets denote the number of genes duplicated in IR.
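Because the genome is quadripartite, the reported genome size should equal LSC + SSC + 2 × IR. The short check below is an illustration rather than part of the original analysis. It applies that identity to the ten rows of the table above, with sizes copied from the table in row order; the helper name `quadripartite_length` is hypothetical.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a quadripartite chloroplast genome in bp.

    The inverted repeat is present in two copies (IRa and IRb).
    """
    return lsc + ssc + 2 * ir


# (LSC, SSC, IR, reported genome size) in bp, in table row order.
ROWS = [
    (85531, 18146, 25936, 155549),
    (85566, 18147, 25945, 155603),
    (85585, 18146, 25945, 155621),
    (85605, 18174, 25956, 155691),
    (85566, 18151, 25951, 155619),
    (85571, 18145, 25962, 155640),
    (85851, 18792, 26053, 156749),
    (88119, 19204, 26359, 160041),
    (87901, 19237, 26392, 159922),
    (85968, 19060, 26381, 157790),
]

# Collect any row whose reported size disagrees with the identity.
mismatches = [
    row for row in ROWS if quadripartite_length(*row[:3]) != row[3]
]
print("inconsistent rows:", mismatches)
```

For the first row, 85,531 + 18,146 + 2 × 25,936 = 155,549 bp, which matches the genome size given in the abstract.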
List of annotated genes in the F. × ananassa ‘Benihoppe’ chloroplast genome.
| Category | Gene group | Gene name |
|---|---|---|
| | Subunits of photosystem I | |
| | Subunits of photosystem II | |
| | Subunits of NADH dehydrogenase | |
| | Subunits of cytochrome b/f complex | |
| | Subunits of ATP synthase | |
| | Large subunit of rubisco | |
| | Proteins of large ribosomal subunit | |
| | Proteins of small ribosomal subunit | |
| | Subunits of RNA polymerase | |
| | Ribosomal RNAs | |
| | Transfer RNAs | |
| | Maturase | |
| | Protease | |
| | Envelope membrane protein | |
| | Acetyl-CoA carboxylase | |
| | c-type cytochrome synthesis gene | |
| | Translation initiation factor | |
| | Conserved hypothetical chloroplast ORF | |
Notes.
Gene with two introns.
Gene with one intron.
Genes located in the inverted repeats.
Pseudogene.
Distribution of simple sequence repeat (SSR) loci in the F. × ananassa ‘Benihoppe’ chloroplast genome.
| Repeat motif | Length (bp) | Number of SSRs | Start position |
|---|---|---|---|
| A | 10 | 11 | 3,744∗; 7,019; 7,609; 8,256; 26,933; 47,455; 60,427; 65,327 ( |
| | 11 | 2 | 15,732; 139,818 |
| | 12 | 2 | 7,853; 136,910∗ |
| | 15 | 1 | 8,608 |
| | 16 | 1 | 36,532 |
| | 17 | 1 | 7,969 |
| T | 10 | 8 | 15,712; 25,631 ( |
| | 11 | 6 | 12,219; 17,914 ( |
| | 12 | 2 | 27,869; 104,161∗ |
| | 14 | 1 | 70,953 |
| | 15 | 1 | 71,654 |
| | 16 | 1 | 64,340 |
| G | 12 | 1 | 64,213 |
| AT | 10 | 5 | 7,065; 29,392; 37,199; 60,337; 120,666 |
| TA | 10 | 5 | 4,891; 6,971; 19,292 ( |
| | 12 | 5 | 1,663; 6,993; 7,053; 36,475; 60,325 |
| TC | 10 | 1 | 62,100 ( |
| AAT | 12 | 1 | 127,596 ( |
| ATA | 12 | 1 | 154,754∗ |
| TAT | 12 | 1 | 86,317∗ |
| AAAT | 12 | 1 | 55,693 |
| AATA | 12 | 1 | 6,423 |
| ATGT | 12 | 1 | 79,222 ( |
| TATT | 12 | 1 | 72,668∗ |
Notes.
The SSR-containing coding regions are indicated in parentheses.
Asterisks denote SSR-containing introns.
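A table of this kind (repeat motif, length in bp, 1-based start position) can be produced by a simple tandem-repeat scan. The sketch below is illustrative only. The record does not state which SSR tool or thresholds were used, so the minimum lengths in `MIN_LENGTH` are assumptions chosen to match the shortest entries in the table (10 bp for mono- and dinucleotide repeats, 12 bp for tri- and tetranucleotide repeats), and `find_ssrs` is a hypothetical helper.

```python
import re

# Assumed minimum SSR length in bp for each motif size (1 to 4 nt).
MIN_LENGTH = {1: 10, 2: 10, 3: 12, 4: 12}


def find_ssrs(sequence):
    """Return (motif, length_bp, start) tuples, start being 1-based."""
    sequence = sequence.upper()
    hits = []
    for unit in (1, 2, 3, 4):
        min_repeats = -(-MIN_LENGTH[unit] // unit)  # ceiling division
        # A motif of `unit` bases followed by enough copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, such as
            # "AA" or "ATAT"; those are reported under the shorter motif.
            if any(
                motif == motif[:k] * (unit // k)
                for k in range(1, unit)
                if unit % k == 0
            ):
                continue
            hits.append((motif, len(match.group(0)), match.start() + 1))
    return sorted(hits, key=lambda hit: hit[2])


demo = "GC" + "A" * 10 + "GCGC" + "AT" * 5 + "C" + "AAT" * 4 + "G"
for motif, length, start in find_ssrs(demo):
    print(motif, length, start)
```

The scan reports repeats only; assigning each SSR to a coding region or an intron, as the parentheses and asterisks in the table do, would additionally require the genome annotation.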
Figure 2. Whole-chloroplast-genome alignments for nine Rosaceae species obtained using the mVISTA program, with the F. × ananassa ‘Benihoppe’ chloroplast genome as the reference.
The Y-scale indicates identity from 50% to 100%. Gray arrows indicate the position and direction of each gene. Red indicates non-coding sequences (CNS); blue indicates the exons of protein-coding genes (exon); and lime green indicates the introns of protein-coding genes (intron).
Figure 3. Comparison of the borders of LSC, SSC and IR regions in ten Rosaceae chloroplast genomes.
Ka/Ks ratios of protein-coding genes from four Rosaceae species for comparison with Fragaria.
| Region | ||||
|---|---|---|---|---|
| LSC | 0.14101 | 0.11975 | 0.11924 | 0.11595 |
| IR | 0.11607 | 0.11744 | 0.12212 | 0.13104 |
| SSC | 0.14942 | 0.15972 | 0.15895 | 0.14729 |
Figure 4. Ka/Ks ratios of 78 protein-coding genes in Fragaria, Rosa, Malus, Pyrus and Prunus.
Blue boxes indicate the Ka/Ks ratio for Fragaria vs. Rosa; red, Fragaria vs. Malus; green, Fragaria vs. Pyrus; and purple, Fragaria vs. Prunus.
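The conventional reading of Ka/Ks is that a ratio below 1 indicates purifying selection, a ratio above 1 indicates positive selection, and a ratio of 1 indicates neutral evolution. As a small illustration, the sketch below applies that reading to the region-level ratios in the table above. The comparison columns are unlabeled in the table, so they are kept as anonymous lists here, and `selection_regime` is a hypothetical helper.

```python
def selection_regime(ka_ks):
    """Conventional interpretation of a single Ka/Ks ratio."""
    if ka_ks < 1:
        return "purifying"
    if ka_ks > 1:
        return "positive"
    return "neutral"


# Region-level Ka/Ks ratios copied from the table; the four values per
# region correspond to the four (unlabeled) comparison columns.
REGION_RATIOS = {
    "LSC": [0.14101, 0.11975, 0.11924, 0.11595],
    "IR": [0.11607, 0.11744, 0.12212, 0.13104],
    "SSC": [0.14942, 0.15972, 0.15895, 0.14729],
}

regimes = {
    region: {selection_regime(ratio) for ratio in ratios}
    for region, ratios in REGION_RATIOS.items()
}
print(regimes)
```

Every region-level value in the table is well below 1, consistent with the statement in the abstract that most genes are under purifying selection.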
SNPs and InDels among the F. × ananassa ‘Benihoppe’, F. chiloensis (GP33) and F. virginiana (O477) chloroplast genomes.
| Number | Type | Position | Location | Nucleotide position | |||
|---|---|---|---|---|---|---|---|
| 1 | SNP | LSC/ | CNS | 4,274 | C | A | C |
| 2 | SNP | LSC/ | CNS | 5,974 | A | C | A |
| 3 | SNP | LSC/ | CNS | 6,982 | T | A | T |
| 4 | SNP | LSC | CNS | 7,609 | A | C | A |
| 5 | InDel | LSC/ | CNS | 8,635–8,636 | – | A | A |
| 6 | SNP | LSC/ | CNS | 9,309 | G | T | G |
| 7 | SNP | LSC/ | CNS | 9,834 | T | T | A |
| 8 | InDel | LSC/ | CNS | 15,742–15,743 | – | A | – |
| 9 | SNP | LSC/ | CNS | 26,855 | C | C | A |
| 10 | InDel | LSC/ | CNS | 26,942 | A | – | – |
| 11 | SNP | LSC/ | CNS | 27,715 | G | T | G |
| 12 | SNP | LSC/ | CNS | 31,611 | A | T | A |
| 13 | SNP | LSC/ | CNS | 32,314 | A | T | A |
| 14 | SNP | LSC/ | CNS | 32,709 | T | A | A |
| 15 | SNP | LSC/ | CNS | 33,453 | C | A | C |
| 16 | InDel | LSC/ | CNS | 37,465–37,466 | – | – | CCCCAAGAAAAAAAGGTAATTAATTATTCTTT |
| 17 | InDel | LSC/ | CNS | 45,458 | T | – | T |
| 18 | InDel | LSC/ | CNS | 46,152–46,153 | – | – | T |
| 19 | SNP | LSC/ | CNS | 47,869 | C | A | C |
| 20 | SNP | LSC/ | CNS | 48,096 | A | C | A |
| 21 | SNP | LSC/ | CNS | 48,097 | G | T | T |
| 22 | SNP | LSC/ | CNS | 49,580 | T | C | T |
| 23 | SNP | LSC/ | CNS | 50,344–50,348 | AAAAG | AAAAG | CTTTT |
| 24 | SNP | LSC/ | CNS | 53,230 | C | A | C |
| 25 | InDel | LSC/ | CNS | 59,997–60,008 | AATTTATTTTTA | – | AATTTATTTTTA |
| 26 | InDel | LSC/ | CNS | 60,623 | T | – | – |
| 27 | InDel | LSC/ | CNS | 64,224 | G | – | G |
| 28 | InDel | LSC/ | CNS | 64,355–64,356 | – | T | – |
| 29 | InDel | LSC/ | CNS | 66,485 | A | – | – |
| 30 | SNP | LSC/ | CNS | 67,723 | C | C | T |
| 31 | InDel | LSC/ | CNS | 67,834–67,835 | – | TAGTAA | – |
| 32 | SNP | LSC/ | CNS | 68,408 | A | A | T |
| 33 | InDel | LSC/ | CNS | 69,490 | A | – | A |
| 34 | InDel | LSC/ | CNS | 69,491 | A | – | – |
| 35 | SNP | LSC/ | CNS | 70,254 | A | G | G |
| 36 | SNP | LSC/ | CNS | 70,519 | A | A | T |
| 37 | InDel | LSC/ | CNS | 70,966–70,967 | – | T | – |
| 38 | SNP | LSC/ | CNS | 70,999 | G | G | T |
| 39 | InDel | LSC/ | CNS | 71,668–71,669 | – | T | – |
| 40 | SNP | LSC/ | CNS | 71,681 | C | A | C |
| 41 | SNP | LSC/ | CNS | 72,808 | T | C | T |
| 42 | InDel | LSC/ | CNS | 75,456–75,457 | – | CATTATCTCAATTGAAAGT | – |
| 43 | SNP | LSC/ | CNS | 78,077 | G | A | G |
| 44 | SNP | LSC/ | CNS | 80,856 | C | G | C |
| 45 | InDel | LSC/ | CNS | 82,300 | T | – | – |
| 46 | SNP | LSC/ | CNS | 82,928 | G | T | T |
| 47 | SNP | LSC/ | CNS | 83,676 | T | T | C |
| 48 | SNP | IR/ | CNS | 100,249 | C | A | C |
| 49 | SNP | IR/ | CNS | 109,248 | G | T | G |
| 50 | SNP | IR/ | CNS | 110,162 | A | G | A |
| 51 | SNP | SSC/ | CNS | 113,838 | T | A | T |
| 52 | SNP | SSC/ | CNS | 114,675 | T | T | A |
| 53 | SNP | SSC/ | CNS | 118,166 | C | A | C |
| 54 | InDel | SSC/ | CNS | 118,599–118,600 | – | A | – |
| 55 | SNP | SSC/ | CNS | 122,406 | C | A | C |
| 56 | SNP | LSC/ | Gene | 58,891 | C | T | C |
| 57 | SNP | SSC/ | Gene | 113,349 | A | G | A |
| 58 | SNP | SSC/ | Gene | 123,504 | T | C | T |
| 59 | SNP | LSC/ | Gene | 77,457 | G | G | T |
| 60 | SNP | LSC/ | Gene | 676 | A | A | G |
| 61 | SNP | LSC/ | Gene | 25,334 | T | G | G |
| 62 | SNP | LSC/ | Gene | 81,522 | T | T | C |
| 63 | SNP | SSC/ | Gene | 125,275 | G | T | G |
| 64 | SNP | SSC/ | Gene | 128,610 | G | C | G |
| 65 | SNP | SSC/ | Gene | 129,102 | G | G | T |
| 66 | SNP | SSC/ | Gene | 129,303 | C | A | C |
| 67 | SNP | LSC/ | Gene | 61,151 | G | A | A |
Notes.
CNS, non-coding sequences, comprising intergenic spacer regions and introns.
Nucleotide position is referenced to the chloroplast genome of F. × ananassa ‘Benihoppe’.
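The distinction between SNPs and InDels in the table, and the use of reference-based coordinates, can be sketched as a column-by-column scan of a three-way alignment. This is an illustration under stated assumptions, not the MEGA 7 workflow used in the study: `call_variants` is a hypothetical helper, the first sequence is treated as the reference, gaps are written as `-`, and adjacent gap columns are reported separately rather than merged into one multi-base InDel as in the table.

```python
def call_variants(ref, *others):
    """List (kind, position, alleles) for every differing alignment column.

    Positions are 1-based on the ungapped reference. When the reference
    itself has a gap, the position is a (left, right) pair of flanking
    reference positions, mirroring entries such as 8,635-8,636.
    """
    seqs = (ref,) + others
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    ref_pos = 0
    variants = []
    for column in zip(*seqs):
        column = tuple(base.upper() for base in column)
        if column[0] != "-":
            ref_pos += 1
        if len(set(column)) == 1:
            continue  # identical in all genomes
        if "-" in column:
            position = ref_pos if column[0] != "-" else (ref_pos, ref_pos + 1)
            variants.append(("InDel", position, column))
        else:
            variants.append(("SNP", ref_pos, column))
    return variants


calls = call_variants("ACG-TA", "ACGATA", "ATG-T-")
for call in calls:
    print(call)
```

In this toy alignment the second column is a SNP at reference position 2, the fourth column is an insertion between reference positions 3 and 4, and the last column is a deletion at reference position 5.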
Figure 5. Percentage of variable sites and number of parsimony-informative sites in non-coding regions across the ten Fragaria chloroplast genomes.