Sajjad Asaf1, Abdul Latif Khan1, Muhammad Aaqil Khan2, Raheem Shahzad2, Sang Mo Kang2, Ahmed Al-Harrasi1, Ahmed Al-Rawahi1, In-Jung Lee2,3.
Abstract
Pinaceae, the largest family of conifers, has a diversified organization of chloroplast (cp) genomes with two typical highly reduced inverted repeats (IRs). In the current study, we determined the complete sequence of the cp genome of an economically and ecologically important conifer tree, the loblolly pine (Pinus taeda L.), using Illumina paired-end sequencing and compared the sequence with those of other pine species. The results revealed a genome size of 121,531 base pairs (bp) containing a pair of 830-bp IR regions, distinguished by a small single copy (42,258 bp) and large single copy (77,614 bp) region. The chloroplast genome of P. taeda encodes 120 genes, comprising 81 protein-coding genes, four ribosomal RNA genes, and 35 tRNA genes, with 151 randomly distributed microsatellites. Approximately 6 palindromic, 34 forward, and 22 tandem repeats were found in the P. taeda cp genome. Whole cp genome comparison with those of other Pinus species exhibited an overall high degree of sequence similarity, with some divergence in intergenic spacers. Higher and lower numbers of indels and single-nucleotide polymorphism substitutions were observed relative to P. contorta and P. monophylla, respectively. Phylogenomic analyses based on the complete genome sequence revealed that 60 shared genes generated trees with the same topologies, and P. taeda was closely related to P. contorta in the subgenus Pinus. Thus, the complete P. taeda genome provided valuable resources for population and evolutionary studies of gymnosperms and can be used to identify related species.
Year: 2018 PMID: 29596414 PMCID: PMC5875761 DOI: 10.1371/journal.pone.0192966
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of complete chloroplast genomes for 15 Pinus species.
| 121,531 | 121,530 | 117,265 | 117,861 | 120,438 | 117,618 | 117,190 | 116,989 | 117,239 | 119,739 | 116,479 | 116,834 | 116,635 | 119,646 | 119,741 | 115,576 | 119,707 | |
| 38.5 | 38.5 | 38.8 | 38.1 | 38.4 | 38.7 | 38.8 | 38.7 | 38.7 | 38.5 | 38.6 | - | 38.7 | 38.5 | 38.5 | 38.8 | 38.5 | |
| 77,614 | 77,615 | 64,548 | 65,373 | 59,591 | - | 64,523 | - | 64,750 | 51,458 | 74,357 | - | 64,080 | 75,628 | 65,670 | 74,634 | 65,696 | |
| 42,258 | 42,532 | 51,767 | 51,538 | 60,131 | - | 51,717 | - | 51,715 | 43,197 | 41,691 | - | 51,782 | 42,329 | 53,080 | 40,310 | 53,020 | |
| 830 | 693 | 475 | 475 | 358 | - | 475 | - | 387 | 378 | 431 | - | 387 | 845 | 409 | 467 | 495 | |
| 61,691 | 60,765 | 61,227 | 60,702 | 58,469 | 60,364 | 60,496 | 59,753 | 60,847 | 60,519 | 60,015 | 69,598 | 62,988 | 60,549 | 65,133 | 53,919 | 70,395 | |
| 2,661 | 2,587 | 2,778 | 2,725 | 2,582 | 2,583 | 2,778 | 2,428 | 2,511 | 2,725 | 2,577 | 2,575 | 2,131 | 2,725 | 2,785 | 2,657 | 2,652 | |
| 4,517 | 4,517 | 4,555 | 4,515 | 4,517 | 4,515 | 4,555 | 4,514 | 4,515 | 4,515 | 4,515 | 4,515 | 4,555 | 4,518 | 4,518 | 4,516 | 4,518 | |
| 122 | 111 | 115 | 113 | 110 | 110 | 110 | 108 | 110 | 109 | 111 | 111 | 113 | 116 | 137 | 111 | 171 | |
| 83 | 71 | 74 | 71 | 70 | 70 | 70 | 69 | 71 | 73 | 70 | 70 | 81 | 74 | 92 | 70 | 123 | |
| 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | |
| 35 | 34 | 36 | 36 | 34 | 34 | 36 | 32 | 33 | 36 | 34 | 34 | 28 | 36 | 36 | 35 | 35 | |
| 3 | 2 | 2 | 2 | 1 | 4 | 1 | 1 | 1 | 1 | 3 | 1 | 1 | 2 | ||||
| 13 | 13 | 13 | 14 | 13 | 13 | 15 | 13 | 13 | 15 | 13 | 13 | 13 | 14 | 13 | 13 | 15 |
P. tae = P. taeda; P. tae* = P. taeda (old); P. arm = P. armandii; P. bung = P. bungeana; P. cont = P. contorta; P. gerar = P. gerardiana; P. kor = P. koraiensis; P. krem = P. krempfii; P. lamb = P. lambertiana; P. mass = P. massoniana; P. mono = P. monophylla; P. nel = P. nelsonii; P. sib = P. sibirica; P. tab = P. tabuliformis; P. taiw = P. taiwanensis; P. stro = P. strobus; P. thu = P. thunbergii
Fig 1. Gene map of the Pinus taeda plastid genome.
Thick lines in the red area indicate the extent of the inverted repeat regions (IRa and IRb; 830 bp), which separate the genome into small (SSC; 42,258 bp) and large (LSC; 77,614 bp) single copy regions. Genes drawn inside the circle are transcribed clockwise, and those outside are transcribed counterclockwise. Genes belonging to different functional groups are color-coded. The dark grey in the inner circle corresponds to the GC content and the light grey corresponds to the AT content.
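As a quick consistency check on the quadripartite layout described in this legend, the genome length should be close to LSC + SSC + 2 × IR. The sketch below does that arithmetic with the region sizes reported in the abstract; the function and variable names are illustrative and do not come from the paper.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Length implied by one LSC, one SSC and two copies of the IR."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) as reported in the abstract.
LSC, SSC, IR = 77_614, 42_258, 830
REPORTED_GENOME = 121_531

implied = quadripartite_length(LSC, SSC, IR)
difference = implied - REPORTED_GENOME
print(f"implied = {implied} bp, reported = {REPORTED_GENOME} bp, "
      f"difference = {difference} bp")
```

The implied total differs from the reported genome size by a single base pair, which may simply reflect how the region boundaries were counted.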
Genes in the sequenced P. taeda chloroplast genome.
| Category | Group of genes | Name of genes |
|---|---|---|
| Large subunit of ribosomal proteins | ||
| Small subunit of ribosomal proteins | ||
| DNA-dependent RNA polymerase | ||
| rRNA genes | ||
| tRNA genes | ||
| Photosystem I | ||
| Photosystem II | ||
| Cytochrome b6/f complex | ||
| ATP synthase | ||
| Rubisco | ||
| Chlorophyll biosynthesis | ||
| Maturase | ||
| Protease | ||
| Envelope membrane protein | ||
| Subunit of acetyl-CoA carboxylase | ||
| c-Type cytochrome synthesis gene | ||
| Conserved open reading frames |
Genes with introns in the Pinus taeda chloroplast genome and length of exons and introns.
| Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp) |
|---|---|---|---|---|---|---|
| LSC | 159 | 740 | 408 | |||
| LSC | 6 | 799 | 648 | |||
| LSC | 8 | 698 | 667 | |||
| IR | 402 | 668 | 429 | |||
| LSC | 9 | 835 | 396 | |||
| LSC | 432 | 674 | 1665 | |||
| 114 | - | 232 | 540 | 26 | ||
| LSC | 124 | 726 | 230 | 709 | 156 | |
| IR | 38 | 770 | 35 | |||
| IR | 42 | 974 | 35 | |||
| LSC | 50 | 488 | 35 | |||
| LSC | 35 | 3307 | 37 | |||
| LSC | 39 | 541 | 37 |
Base compositions in the Pinus taeda chloroplast (cp) genome.
| T/U | C | A | G | Length (bp) | |
|---|---|---|---|---|---|
| 30.8 | 19.3 | 30.7 | 19.3 | 121,531 | |
| 30.7 | 19.0 | 30.3 | 20.0 | 77,614 | |
| 31.3 | 19.5 | 31.0 | 18.3 | 42,258 | |
| 31.1 | 20.2 | 31.1 | 17.6 | 830 | |
| 23.7 | 24.9 | 22.4 | 29.0 | 2661 | |
| 18.8 | 23.6 | 26.4 | 31.1 | 4517 | |
| 30.5 | 18.1 | 30.5 | 20.9 | 61,691 | |
| 20.4 | 16.03 | 30.26 | 28.3 | 20,563 | |
| 31.5 | 20.7 | 28.49 | 18.2 | 20,563 | |
| 38.18 | 13.94 | 31.79 | 16.07 | 20,563 |
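Percentages like those in the table above can be derived directly from a nucleotide sequence. The following is a minimal sketch, assuming the sequence is available as a plain string; the helper names and the toy sequence are hypothetical and are not taken from the paper.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percentage of T/U, C, A and G among the unambiguous bases of seq."""
    counts = Counter(b for b in seq.upper() if b in "ACGTU")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T/U bases")
    return {
        "T/U": 100.0 * (counts["T"] + counts["U"]) / total,
        "C": 100.0 * counts["C"] / total,
        "A": 100.0 * counts["A"] / total,
        "G": 100.0 * counts["G"] / total,
    }


def gc_content(composition: dict) -> float:
    """GC content (%) is the sum of the C and G percentages."""
    return composition["C"] + composition["G"]


toy = "ATGCGCTTAA"  # toy sequence, not real P. taeda data
comp = base_composition(toy)
print(comp, gc_content(comp))
```

Applied to the first row of the table, the same sum gives 19.3 + 19.3 = 38.6% GC.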
Codon–anticodon recognition pattern and codon usage for the Pinus taeda chloroplast genome.
| Amino acid | Codon | No | RSCU | tRNA | Amino acid | Codon | No | RSCU | tRNA |
|---|---|---|---|---|---|---|---|---|---|
| Phe | UUU | 1394 | 1.11 | Tyr | UAC | 562 | 0.66 | ||
| Phe | UUC | 1108 | 0.89 | Tyr | UAU | 1137 | 1.34 | ||
| Leu | UUA | 841 | 1.23 | Stop | UAA | 776 | 1.05 | ||
| Leu | UUG | 815 | 1.19 | Stop | UGA | 781 | 1.06 | ||
| Leu | CUU | 818 | 1.2 | Stop | UAG | 662 | 0.89 | ||
| Leu | CUC | 533 | 0.78 | Cys | UGC | 378 | 0.9 | |
| Leu | CUA | 642 | 0.94 | Trp | UGG | 677 | 1 | ||
| Leu | CUG | 444 | 0.65 | His | CAU | 839 | 1.43 | ||
| Ile | AUU | 1233 | 1.09 | His | CAC | 337 | 0.57 | ||
| Ile | AUC | 963 | 0.85 | Gln | CAA | 842 | 1.27 | ||
| Ile | AUA | 1194 | 1.06 | Gln | CAG | 481 | 0.73 | ||
| Met | AUG | 807 | 1 | Asn | AAU | 1318 | 1.34 | ||
| Val | GUU | 652 | 1.29 | Asn | AAC | 644 | 0.66 | ||
| Val | GUC | 365 | 0.72 | Lys | AAA | 1444 | 1.3 | ||
| Val | GUA | 606 | 1.2 | Lys | AAG | 770 | 0.7 | ||
| Val | GUG | 391 | 0.78 | Asp | GAU | 917 | 1.43 | ||
| Ser | UCC | 752 | 1.22 | Asp | GAC | 368 | 0.57 | ||
| Ser | UCA | 767 | 1.25 | Glu | GAA | 1043 | 1.33 | ||
| Ser | UCG | 431 | 0.7 | Glu | GAG | 529 | 0.67 | ||
| Pro | CCU | 516 | 1.11 | Arg | CGU | 278 | 0.67 | ||
| Pro | CCC | 400 | 0.86 | Arg | CGC | 163 | 0.39 | ||
| Pro | CCA | 624 | 1.35 | Arg | CGA | 439 | 1.06 | ||
| Pro | CCG | 313 | 0.68 | Arg | CGG | 284 | 0.68 | ||
| Thr | ACU | 448 | 1.05 | Ser | AGU | 499 | 0.81 | ||
| Thr | ACC | 497 | 1.17 | Ser | AGC | 387 | 0.63 | ||
| Thr | ACA | 441 | 1.03 | Arg | AGA | 821 | 1.97 | ||
| Thr | ACG | 320 | 0.75 | Arg | AGG | 511 | 1.23 | ||
| Ala | GCU | 397 | 1.38 | Gly | GGU | 456 | 0.99 | ||
| Ala | GCC | 233 | 0.81 | Gly | GGC | 214 | 0.46 | ||
| Ala | GCA | 347 | 1.21 | Gly | GGA | 728 | 1.57 | ||
| Ala | GCG | 172 | 0.6 | Gly | GGG | 451 | 0.98 |
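Relative synonymous codon usage (RSCU) is conventionally the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below applies that standard definition to the two Phe codons from the table; the function name is illustrative, and the paper's own software is not shown here.

```python
def rscu(counts: dict) -> dict:
    """RSCU for one amino acid: count / mean count of its synonymous codons."""
    mean = sum(counts.values()) / len(counts)
    return {codon: n / mean for codon, n in counts.items()}


# Phe codon counts taken from the table above.
phe_counts = {"UUU": 1394, "UUC": 1108}
phe_rscu = rscu(phe_counts)

for codon, value in phe_rscu.items():
    print(codon, round(value, 2))
```

Rounded to two decimals this gives 1.11 for UUU and 0.89 for UUC, in agreement with the tabulated values.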
Fig 2. Amino acid frequencies of the Pinus taeda chloroplast (cp) protein coding sequences.
The frequencies of amino acids were calculated for all 81 protein-coding genes from the start to the stop codon.
Fig 3. Visual alignment of plastid genomes from Pinus taeda and six other Pinus species (five from the subgenus Pinus and one from the subgenus Strobus).
VISTA-based identity plot showing sequence identity among seven species, using P. taeda as a reference.
Fig 4. Distance between adjacent genes and junctions of the small single-copy (SSC), large single-copy (LSC), and two inverted repeat (IR) regions among plastid genomes from six Pinus species.
Boxes above and below the main line indicate the adjacent border genes. The figure is not to scale regarding sequence length, and only shows relative changes at or near the IR/SC borders.
Repeat sequences in the Pinus taeda chloroplast genome.
| Repeat type | Repeat size | Repeat Position 1 | Repeat location 1 | Repeat Position 2 | Repeat location 2 |
|---|---|---|---|---|---|
| P | 830 | 8692 | 51,779 | ||
| P | 399 | 66,445 | 121,132 | IGS | |
| P | 304 | 50,503 | IGS | 120,845 | IGS |
| P | 277 | 50,530 | IGS | 120,845 | IGS |
| P | 86 | 0 | 66,359 | ||
| P | 79 | 9017 | IGS | 52,205 | |
| F | 800 | 175 | 1815 | IGS | |
| F | 376 | 109,649 | 120,134 | ||
| F | 288 | 50,861 | IGS | 84,618 | IGS |
| F | 284 | 50,843 | IGS | 84,600 | IGS |
| F | 275 | 50,825 | IGS | 84,582 | IGS |
| F | 247 | 51,131 | 70,403 | ||
| F | 185 | 50,964 | IGS | 84,721 | IGS |
| F | 171 | 51,207 | 70,479 | ||
| F | 165 | 100,638 | 100,659 | ||
| F | 124 | 101,059 | IGS- | 101,068 | IGS- |
| F | 97 | 9677 | IGS | 30,444 | IGS |
| F | 97 | 101,059 | IGS- | 101,113 | IGS- |
| F | 85 | 9737 | IGS | 30,504 | IGS |
| F | 70 | 100,733 | 100,754 | ||
| F | 79 | 9017 | IGS | 52,205 | psbM |
| F | 73 | 9701 | IGS | 30,468 | IGS |
| F | 71 | 100,638 | 100,701 | ||
| F | 70 | 100,712 | 100,754 | IGS | |
| F | 70 | 101,059 | IGS- | 101,122 | |
| F | 70 | 101,086 | 101,140 | ||
| F | 62 | 93,524 | IGS | 93,579 | IGS |
| F | 69 | 115,329 | 115,395 | ycf2 | |
| F | 71 | 9777 | 30,544 | IGS | |
| F | 71 | 101,086 | 101,149 | ||
| F | 70 | 101,077 | 101,140 | ||
| F | 69 | 9714 | IGS | 30,481 | IGS |
| F | 58 | 71,811 | IGS | 71,831 | IGS |
| F | 67 | 101,149 | 101,167 | ||
| F | 61 | 101,059 | 101,131 | ||
| F | 64 | 101,057 | 101,138 | ||
| F | 63 | 101,057 | 101,147 | ||
| F | 59 | 101,043 | 101,133 | ||
| F | 55 | 100,895 | ycf1 intron | 100,976 | |
| F | 61 | 101,068 | 101,149 |
Tandem repeat sequences in the Pinus taeda chloroplast genome.
| Serial No | Indices | Repeat Length | Size of repeat unit × Copy number | A | C | G | T | Location |
|---|---|---|---|---|---|---|---|---|
| 1 | 9274–9310 | 36 | 2 × 18 | 16 | 16 | 16 | 50 | |
| 2 | 15,199–15,235 | 36 | 2 × 18 | 44 | 8 | 23 | 23 | |
| 3 | 20,648–20,678 | 30 | 2 × 15 | 50 | 10 | 20 | 20 | |
| 4 | 28,466–28,534 | 68 | 2 × 34 | 30 | 24 | 12 | 33 | |
| 5 | 31,275–31,313 | 38 | 2 × 19 | 23 | 13 | 36 | 26 | |
| 6 | 33,103–33,166 | 63 | 3 × 21 | 29 | 16 | 19 | 33 | |
| 7 | 43,597–43,625 | 28 | 2 × 14 | 46 | 0 | 10 | 43 | |
| 8 | 43,615–43,659 | 44 | 2 × 22 | 40 | 12 | 8 | 38 | |
| 9 | 45,578–45,620 | 42 | 2 × 21 | 31 | 2 | 24 | 41 | |
| 10 | 51,993–52,029 | 36 | 2 × 18 | 50 | 16 | 16 | 16 | |
| 11 | 56,031–56,069 | 38 | 2 × 19 | 18 | 12 | 12 | 57 | |
| 12 | 93,544–93,631 | 87 | 3 × 29 | 37 | 16 | 10 | 35 | |
| 13 | 93,525–93,635 | 110 | 2 × 55 | 35 | 15 | 11 | 36 | |
| 14 | 97,002–97,056 | 54 | 2 × 27 | 28 | 20 | 24 | 26 | |
| 15 | 100,583–100,631 | 48 | 2 × 24 | 54 | 9 | 18 | 16 | |
| 16 | 100,639–100,828 | 189 | 9 × 21 | 45 | 9 | 28 | 16 | |
| 17 | 100,827–101,025 | 198 | 6 × 33 | 31 | 1 | 43 | 23 | |
| 18 | 100,866–101,016 | 150 | 10 × 15 | 30 | 1 | 44 | 23 | |
| 19 | 100,827–101,953 | 126 | 2 × 63 | 31 | 1 | 43 | 23 | |
| 20 | 100,823–101,985 | 162 | 2 × 81 | 32 | 2 | 42 | 22 | |
| 21 | 100,939–101,047 | 108 | 2 × 54 | 34 | 4 | 38 | 22 | |
| 22 | 115,330–115,452 | 122 | 2 × 66 | 21 | 22 | 11 | 45 |
Fig 5. Analysis of simple sequence repeat (SSR) in the Pinus taeda plastid genome.
A, Number of SSR types in complete genome, coding, and non-coding regions; B, Frequency of identified SSR motifs in different repeat class types.
Simple sequence repeats (SSRs) in the Pinus taeda chloroplast genome.
| Unit | Length | No | SSR start |
|---|---|---|---|
| 15 | 2 | 1375, 28,440 | |
| 14 | 3 | 68,741, 72,734, 106,240 | |
| 12 | 2 | 10,316, 110,251 | |
| 11 | 4 | 10,755, 26,980, 109,368, 11,873 | |
| 10 | 8 | 16,119, 22,252, 48,967, 83,427, 86,798, 88,062, 102,308, 111,412 | |
| 9 | 15 | 40,699, 41,827, 45,769, 70,952, 80,498, 80,744, 95,259, 102,053, | |
| 8 | 31 | 4819, 10,738, 10,950, 16,110, 17,113, 30,189, 30,427, 30,701, 31,373, 33,345, 38,678, 41,893, 50,753, 51,485, 52,622, 55,355, 56,042, 63,021, 64,394, 64,437, 92,458, 94,554, 95,822, 97,307, 103,868, 108,971, 114,282, 117,065, 118,885, 119,819, 120,893 | |
| 9 | 4 | 16,101, 22,497, 71,353, 105,552 | |
| 8 | 2 | 31,381, 120,721 | |
| 13 | 1 | 41,344 | |
| 10 | 4 | 26,392, 96,162, 104,388, 113,787 | |
| 9 | 6 | 19,814, 24,397, 34,072, 42,422, 48,777, 74,253 | |
| 8 | 7 | 19,352, 19,904, 80,532, 83,639, 99,803, 105,218, 110,933 | |
| 9 | 10 | 8774, 22,311, 26,631, 47,568, 51,573, 52,520, 65,195, 79,220, 80,699, 106,488 | |
| 8 | 10 | 14,675, 22,384, 30,793, 42,926, 51,556, 69,139, 75,721, 83,721, 90,777, 91,093 | |
| 11 | 1 | 78,353 | |
| 10 | 1 | 42,354 | |
| 9 | 8 | 13,934, 49,935, 65,369, 66,308, 71,749, 94,150, 98,727, 109,563 | |
| 10 | 5 | 3167, 22,135, 106,110, 108,709, 120,693 | |
| 9 | 5 | 28,380, 79,051, 79,226, 81,004, 100,527 | |
| 10 | 1 | 77,667 | |
| 9 | 6 | 2957, 16,215, 21,127, 75,445, 77,964, 111,780 | |
| 9 | 1 | 32,982 | |
| 9 | 2 | 43,692, 94,864 | |
| 9 | 2 | 43,798, 89,223 | |
| 9 | 2 | 54,293, 94,538 | |
| 9 | 2 | 60,538, 80,037 | |
| 9 | 1 | ||
| 17 | 1 | 48,863 | |
| 14 | 1 | 90,739 | |
| 13 | 1 | 51,753 | |
| 12 | 1 | 42,147 | |
| 23 | 1 | 117,038 |
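A table of this kind (repeat unit, repeat length, start position) can be produced by scanning the sequence for perfect tandem repeats. The sketch below is a simple regular-expression approach and is not the tool used in the paper; the function name, the minimum-repeat thresholds, and the toy sequence are all assumptions made for illustration.

```python
import re


def find_ssrs(seq: str, unit_len: int = 1, min_repeats: int = 8) -> list:
    """Return (1-based start, unit, copy number) for perfect tandem repeats.

    unit_len and min_repeats are illustrative thresholds, not the paper's.
    """
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        # Skip units that are repeats of a shorter motif (e.g. "TT" is "T" x 2).
        if unit in (unit + unit)[1:-1]:
            continue
        hits.append((match.start() + 1, unit, len(match.group(0)) // unit_len))
    return hits


# Toy sequence, not real P. taeda data: a T x 10 run and an (AT) x 5 run.
toy = "ACG" + "T" * 10 + "GA" + "AT" * 5 + "C"
mono = find_ssrs(toy, unit_len=1, min_repeats=8)
di = find_ssrs(toy, unit_len=2, min_repeats=4)
print(mono, di)
```

Running the function once per unit length (mono-, di-, trinucleotide, and so on) and merging the results gives the per-unit rows of a summary like the one above.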
Fig 6. Phylogenetic trees of 15 Pinus species.
The entire genome dataset was analyzed using four different methods: Bayesian inference (BI), maximum parsimony (MP), maximum likelihood (ML), and neighbor-joining (NJ). Numbers above the branches represent bootstrap values in the MP, ML, and NJ trees and posterior probabilities in the BI trees, whereas the number below the branches represents branch length. The red dot represents the position of P. taeda (KY964286).