Pasquale L Curci, Domenico De Paola, Donatella Danzi, Giovanni G Vendramin, Gabriella Sonnante.
Abstract
With over 20,000 species, Asteraceae is the second largest plant family. High-throughput sequencing of nuclear and chloroplast genomes has allowed for a better understanding of the evolutionary relationships within large plant families. Here, the globe artichoke chloroplast (cp) genome was obtained by a combination of whole-genome and BAC clone high-throughput sequencing. The artichoke cp genome is 152,529 bp in length, consisting of two single-copy regions separated by a pair of inverted repeats (IRs) of 25,155 bp each, the longest IRs found in the Asteraceae family so far. The large (LSC) and the small (SSC) single-copy regions span 83,578 bp and 18,641 bp, respectively. The artichoke cp sequence was compared with the eight other complete Asteraceae cp genomes available, revealing an IR expansion at the SSC/IR boundary. This expansion consists of 17 bp of the ndhF gene, generating an overlap between the ndhF and ycf1 genes. A total of 127 cp simple sequence repeats (cpSSRs) were identified in the artichoke cp genome, potentially suitable for future population studies in the Cynara genus. Parsimony-informative regions were evaluated and made it possible to place a Cynara species within the Asteraceae family tree. The eight most informative coding regions were also considered and tested as "specific barcodes" in the Asteraceae family. Our results highlight the usefulness of cp genome sequencing in exploring plant genome diversity and retrieving reliable molecular resources for phylogenetic and evolutionary studies, as well as for specific barcodes in plants.
Year: 2015 PMID: 25774672 PMCID: PMC4361619 DOI: 10.1371/journal.pone.0120589
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Artichoke cp genome map.
Genes shown on the outside of the large circle are transcribed clockwise; genes on the inside are transcribed counterclockwise. Thick lines of the small circle indicate IRs. Pseudogenes are marked with '*'.
Genes present in the globe artichoke cp genome.
| Category | Gene name |
|---|---|
| Photosystem I | |
| Photosystem II | |
| Cytochrome b6/f | |
| ATP synthase | |
| Rubisco | |
| NADH oxidoreductase | |
| Large subunit ribosomal proteins | |
| Small subunit ribosomal proteins | |
| RNAP | |
| Other proteins | |
| Proteins of unknown function | |
| Ribosomal RNAs | |
| Transfer RNAs | |
a: Gene containing two introns
b: Gene containing a single intron
c: Two gene copies in the IRs
d: Gene divided into two independent transcription units
e: Pseudogenes
Intron-containing genes in the globe artichoke cp genome, with exon and intron sizes.
| Gene | Region | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp) |
|---|---|---|---|---|---|---|
| | LSC | 9 | 1004 | 399 | - | - |
| | LSC | 40 | 854 | 215 | - | - |
| | LSC | 432 | 732 | 1638 | - | - |
| | LSC | 145 | 707 | 410 | - | - |
| | LSC | 124 | 742 | 230 | 698 | 153 |
| | LSC | 71 | 626 | 291 | 803 | 229 |
| | LSC | 6 | 765 | 642 | - | - |
| | LSC | 243 | - | 114 | - | - |
| | LSC | 8 | 705 | 475 | - | - |
| | IR | 391 | 665 | 434 | - | - |
| | IR | 782 | 670 | 751 | - | - |
| | SSC | 553 | 1060 | 539 | - | - |
| | LSC | 37 | 2530 | 35 | - | - |
| | IR | 38 | 821 | 35 | - | - |
| | LSC | 37 | 440 | 50 | - | - |
| | LSC | 47 | 707 | 23 | - | - |
| | IR | 43 | 510 | 35 | - | - |
| | LSC | 38 | 574 | 38 | - | - |
* The rps12 gene is trans-spliced.
Fig 2. Total repeat and SSR distribution in the C. cardunculus var. scolymus chloroplast genome.
(a) Repeat distribution among four different regions: coding sequence, intronic sequence, intergenic space region and overlapping region. (b) SSR distribution according to type: mononucleotide, dinucleotide, trinucleotide, and tetranucleotide repeats. SSR number and percentages (in brackets) are provided. (c) SSR type distribution between coding and non-coding regions.
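The SSR classes in panel (b) can be illustrated with a small regular-expression scan. This is only a minimal sketch: the minimum repeat counts in `MIN_REPEATS` are hypothetical placeholders (the thresholds and software used in the study are not given in this section), and `find_ssrs` is an illustrative helper, not the authors' pipeline.

```python
import re
from collections import Counter

# Hypothetical minimum number of motif copies per motif length (assumption,
# not taken from the paper): 1 = mono-, 2 = di-, 3 = tri-, 4 = tetranucleotide.
MIN_REPEATS = {1: 8, 2: 4, 3: 4, 4: 3}
SSR_TYPES = {1: "mononucleotide", 2: "dinucleotide",
             3: "trinucleotide", 4: "tetranucleotide"}


def is_compound_motif(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, type) tuples for perfect SSRs in seq.

    Coordinates are 0-based, end-exclusive. The motif is reported in the
    phase in which the scan first meets it.
    """
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_compound_motif(motif):
                continue  # already covered by a shorter motif class
            hits.append((match.start(), match.end(), motif, SSR_TYPES[motif_len]))
    return sorted(hits)


# Toy sequence, not real artichoke data.
toy = "GGC" + "A" * 10 + "CGG" + "AT" * 5 + "GTT" + "AGC" * 4 + "T"
ssrs = find_ssrs(toy)
type_counts = Counter(ssr_type for _, _, _, ssr_type in ssrs)
```

Counting the `type` field of the hits, as `type_counts` does, gives the kind of per-class tally summarized in panel (b); intersecting hit coordinates with gene annotations would give the coding versus non-coding split of panel (c).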
Size comparison among nine completely sequenced cp genomes in the Asteraceae family.
| Species | Accession Number | Genome size (bp) | LSC (bp) | SSC (bp) | IR (bp) |
|---|---|---|---|---|---|
| | NC_013553 | 152803 | 84593 | 18842 | 24684 |
| | DQ383816 | 152772 | 84105 | 18599 | 25034 |
| | KM035764 | 152529 | 83578 | 18641 | 25155 |
| | EU549769 | 151762 | 83535 | 18227 | 24999 |
| | NC007977 | 151104 | 83530 | 18308 | 24633 |
| | NC_020607 | 151076 | 82740 | 18394 | 24971 |
| | JQ362483 | 151033 | 82780 | 18347 | 24953 |
| | NC_015621 | 150698 | 84829 | 18359 | 23755 |
| | HQ234669 | 150689 | 82855 | 18276 | 24779 |
Species are ordered by genome size.
LSC: Large Single-Copy
SSC: Small Single-Copy
IR: Inverted Repeat
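Because a cp genome of this type consists of one LSC, one SSC, and two IR copies, the listed genome size should equal LSC + SSC + 2 × IR. The sketch below checks that relation against the values printed in the table; the names `TABLE`, `quadripartite_size`, and `size_mismatches` are illustrative and not part of the original work.

```python
# Values (bp) copied from the size-comparison table, keyed by accession number
# exactly as printed. Tuple layout: (listed genome size, LSC, SSC, IR).
TABLE = {
    "NC_013553": (152803, 84593, 18842, 24684),
    "DQ383816": (152772, 84105, 18599, 25034),
    "KM035764": (152529, 83578, 18641, 25155),  # globe artichoke (this study)
    "EU549769": (151762, 83535, 18227, 24999),
    "NC007977": (151104, 83530, 18308, 24633),
    "NC_020607": (151076, 82740, 18394, 24971),
    "JQ362483": (151033, 82780, 18347, 24953),
    "NC_015621": (150698, 84829, 18359, 23755),
    "HQ234669": (150689, 82855, 18276, 24779),
}


def quadripartite_size(lsc, ssc, ir):
    """Total length of a cp genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


def size_mismatches(table):
    """Return {accession: listed - computed} for rows where the sizes differ."""
    diffs = {}
    for accession, (listed, lsc, ssc, ir) in table.items():
        computed = quadripartite_size(lsc, ssc, ir)
        if computed != listed:
            diffs[accession] = listed - computed
    return diffs


mismatches = size_mismatches(TABLE)
artichoke_size = quadripartite_size(83578, 18641, 25155)
```

For the artichoke row (KM035764) this gives 83,578 + 18,641 + 2 × 25,155 = 152,529 bp, matching the abstract. With the values as printed here, the only row the check flags is EU549769, where the listed size exceeds the sum by 2 bp.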
Fig 3. Visualization of the alignment of nine Asteraceae cp genome sequences.
VISTA-based identity plot showing sequence identity among the eight previously published cp genomes (see Materials and Methods for accession numbers) and the artichoke cp genome, which is set as the reference. Sequence identity is shown as a percentage from 50% to 100% on the y-axis. On the x-axis, artichoke genes are indicated on the top lines, and arrows represent their orientation. Genome regions are distinguished by colors. CNS: conserved non-coding sequences.
Fig 4. Comparison of the border positions of LSC, SSC, and IR regions among nine sequenced Asteraceae chloroplast genomes.
Genes are indicated in boxes and their extensions in the corresponding regions are displayed above boxes.
Coding regions and their parsimony-informative rate.
| No. | Region | Length (bp) | Aligned length (bp) | Conserved sites | Pars. uninf. sites | Pars. inf. sites | Pars. inf. (%) | C.I. | R.I. |
|---|---|---|---|---|---|---|---|---|---|
| 1 | | 1530 | 1610 | 1414 | 130 | 66 | 4.10 | 0.91 | 0.79 |
| 2 | | 969 | 975 | 859 | 63 | 53 | 5.44 | 0.88 | 0.79 |
| 3 | | 690 | 690 | 632 | 40 | 18 | 2.61 | 0.94 | 0.85 |
| 4 | | 2020 | 2127 | 1894 | 144 | 89 | 4.18 | 0.90 | 0.81 |
| 5 | | 1521 | 1527 | 1342 | 119 | 66 | 4.32 | 0.94 | 0.88 |
| 6 | | 2152 | 2307 | 2015 | 190 | 102 | 4.42 | 0.90 | 0.77 |
| 7 | | 501 | 501 | 470 | 18 | 13 | 2.59 | 0.92 | 0.87 |
| 8 | | 678 | 678 | 592 | 75 | 11 | 1.62 | 0.95 | 0.67 |
| 9 | | 1413 | 1444 | 1285 | 103 | 56 | 3.88 | 0.88 | 0.74 |
| 10 | | 1188 | 1269 | 1160 | 58 | 51 | 4.02 | 0.92 | 0.88 |
| 11 | | 1434 | 1458 | 1340 | 52 | 66 | 4.53 | 0.78 | 0.67 |
| 12 | | 1008 | 1014 | 896 | 93 | 25 | 2.47 | 0.92 | 0.72 |
| 13 | | 2802 | 2918 | 2030 | 778 | 110 | 3.77 | 0.94 | 0.66 |
| 14 | | 4158 | 4176 | 3763 | 288 | 125 | 2.99 | 0.92 | 0.79 |
| 15 | | 1109 | 1238 | 1034 | 129 | 75 | 6.06 | 0.92 | 0.82 |
| 16 | | 5304 | 5568 | 3585 | 1505 | 478 | 8.58 | 0.89 | 0.60 |
| 17 | | 1503 | 1539 | 1420 | 79 | 40 | 2.60 | 0.85 | 0.64 |
| 18 | | 2250 | 2260 | 1983 | 173 | 104 | 4.60 | 0.89 | 0.78 |
| 19 | | 3183 | 3183 | 2965 | 140 | 76 | 2.39 | 0.91 | 0.82 |
Length: refers to sequence length in Cynara cardunculus var. scolymus
Aligned length: refers to the alignment of nine Asteraceae considered in the comparative analysis (see Materials and Methods)
Pars.: parsimony
Uninf.: uninformative
Inf.: informative
C.I.: consistency index
R.I.: retention index
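The percentage column of the table is consistent with the number of parsimony-informative sites divided by the aligned length. The sketch below reproduces that calculation for three rows, identified by row number only; the helper names are illustrative, and the formula is inferred from the printed values rather than quoted from the Methods.

```python
# Values copied from the table, keyed by row number ("No." column).
# Tuple layout: (aligned length, conserved sites, pars. uninf. sites, pars. inf. sites).
ROWS = {
    1: (1610, 1414, 130, 66),
    2: (975, 859, 63, 53),
    16: (5568, 3585, 1505, 478),
}


def pars_inf_percent(informative, aligned_length):
    """Parsimony-informative sites as a percentage of the aligned length."""
    return round(100.0 * informative / aligned_length, 2)


def site_counts_consistent(aligned, conserved, uninformative, informative):
    """Conserved, uninformative and informative sites should add up to the alignment."""
    return conserved + uninformative + informative == aligned


percentages = {
    number: pars_inf_percent(inf, aligned)
    for number, (aligned, _conserved, _uninf, inf) in ROWS.items()
}
# Row numbers ordered from most to least informative.
ranking = sorted(percentages, key=percentages.get, reverse=True)
```

Ranking all 19 rows in this way is one simple route to a shortlist of the most informative coding regions for barcode testing.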
Fig 5. Phylogenetic tree based on maximum parsimony of 69 accessions belonging to the Asteraceae family.
Seven coding regions were used: matK, ndhD, ndhF, ndhI, rbcL, rpoB and the first exon of rpoC1, for a total of 1,811 parsimony-informative characters. Sequences from C. cardunculus were obtained in this work. Bootstrap values greater than 50% are reported at the nodes. Species for which the complete cp genome is available are shaded.