Complete chloroplast genome sequence and comparative analysis of loblolly pine (Pinus taeda L.) with related species.

Sajjad Asaf1, Abdul Latif Khan1, Muhammad Aaqil Khan2, Raheem Shahzad2, Sang Mo Kang2, Ahmed Al-Harrasi1, Ahmed Al-Rawahi1, In-Jung Lee2,3.   

Abstract

Pinaceae, the largest family of conifers, has a diversified organization of chloroplast (cp) genomes with two typical highly reduced inverted repeats (IRs). In the current study, we determined the complete sequence of the cp genome of an economically and ecologically important conifer tree, the loblolly pine (Pinus taeda L.), using Illumina paired-end sequencing and compared the sequence with those of other pine species. The results revealed a genome size of 121,531 base pairs (bp) containing a pair of 830-bp IR regions, separated by a small single copy (42,258 bp) and a large single copy (77,614 bp) region. The chloroplast genome of P. taeda encodes 120 genes, comprising 81 protein-coding genes, four ribosomal RNA genes, and 35 tRNA genes, with 151 randomly distributed microsatellites. Approximately 6 palindromic, 34 forward, and 22 tandem repeats were found in the P. taeda cp genome. Whole cp genome comparison with those of other Pinus species exhibited an overall high degree of sequence similarity, with some divergence in intergenic spacers. Higher and lower numbers of indels and single-nucleotide polymorphism substitutions were observed relative to P. contorta and P. monophylla, respectively. Phylogenomic analyses based on the complete genome sequence revealed that 60 shared genes generated trees with the same topologies, and P. taeda was closely related to P. contorta in the subgenus Pinus. Thus, the complete P. taeda genome provides valuable resources for population and evolutionary studies of gymnosperms and can be used to identify related species.

Year:  2018        PMID: 29596414      PMCID: PMC5875761          DOI: 10.1371/journal.pone.0192966

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Gymnosperms are represented by a diverse and magnificent group of coniferous species distributed across eight families, consisting of 70 genera containing more than 630 species [1]. They are thought to have arisen from seed plants approximately 300 million years ago and are one of the ancient main plant clades. Gymnosperms possess larger genomes than flowering plants [2-5]. Recently, rapid progress has been made in angiosperm genome sequencing and analysis, but because of the complexity and order of magnitude increase in genome sizes, similar progress has not been attained for gymnosperms. Furthermore, comparative studies revealed that transposable elements, repetitive sequences, and gene duplication are common in gymnosperm genomes [4, 6–8]. Conifers are the main representatives of the gymnosperms, predominant in various ecosystems and representing 82% of terrestrial biomass [9]. Pinus taeda (loblolly pine) is a model species for the largest genus in the division Coniferae. It is an economically important and relatively fast-growing representative of conifers native to the southeastern United States. Previously, the loblolly pine was famous for providing pulp, lumber, and paper to commercial markets, but recently became a main bioenergy feedstock in lignocellulosic ethanol production [10]. Moreover, loblolly pine is considered an important species for comparative genomic studies between angiosperms and gymnosperms [8]. For example, microsatellites and single-nucleotide polymorphisms (SNPs) have been studied to determine population genetic parameters and the associations of phenotypes [11-13], create genetic maps [14-16], and develop genomic selection prediction models [17]. However, the number of available genetic markers remains small, particularly considering the large size of the pine genome. According to recent evaluations [18], the loblolly pine nuclear genome size is 21–24 Gbp. 
This is approximately four-fold larger than that of Hordeum vulgare (barley), the angiosperm with the largest genome for which a reference genome is available, and approximately 7–8-fold larger than the human genome [19]. Chloroplasts are known to be derived from a cyanobacterium through endosymbiosis and co-evolution over time [20]. The gymnosperm chloroplast (cp) genome, particularly in conifers, has characteristics that distinguish it from those of angiosperms. These features, such as high levels of intraspecific variation [21-24], paternal inheritance [25-28], and a different RNA editing pattern [29], have been observed in several studies. Generally, in angiosperms, cp genomes range from 130,000 to 160,000 base pairs (bp), with two inverted repeats (IRs) separating the large single copy (LSC) and small single copy (SSC) regions. The relative sizes of the IRs, SSC, and LSC are nearly unchanged, and the gene order and content are highly conserved [30]. In contrast, IR sizes in gymnosperms fluctuate widely among taxa [31-33]. For example, previous reports showed that the IR size is 23 kbp for Cycas taitungensis [34] and 17 kbp for Ginkgo biloba [35], whereas P. thunbergii has a very small IR of 495 bp [36, 37]. Furthermore, as in P. thunbergii, various conifer species have been found to lack the comparatively large IRs typically found in gymnosperms [31, 33, 38, 39]. This decrease in IR size is thought to cause extensive rearrangement in conifer cp genomes [33]. Based on the IRs, the cp genomes can be classified into three categories: (i) with two IRs, (ii) with one IR, and (iii) with additional tandem repeats [30]. The cp genomes are essential and extremely valuable for understanding phylogenetic relationships and designing specific molecular markers because of their stable mode of inheritance. 
Using a total evidence approach [40], the cp genomes or various concatenated sequences were studied to elucidate the phylogeny among various species [41-43]. Similarly, Steane [44] showed that the organization of the P. thunbergii cp genome differs from that of angiosperms. The advent of high-throughput next-generation sequencing technologies from Illumina, Pacific Biosciences, Life Technologies, and Roche, among others, has rapidly improved genomic studies [45, 46]. In addition to draft or whole genomes of microbes and animals, genomic studies were performed to determine the chromosomal structures and molecular organization of wheat [47, 48] and maize [49]. In addition, these technologies have been extensively used to evaluate organelles, particularly chloroplasts. Although the first complete cp genome sequence, that of Nicotiana tabacum, was generated by clone sequencing of plasmid and cosmid libraries over a long time [50], more than 800 cp genomes (including 300 from crops and trees) have now been sequenced and deposited in the NCBI Organelle Genome Resources database [51]. The evolution of cp genomes in terrestrial plants can now be studied using these database resources [51]. To date, a total of 16 complete chloroplast genomes in the genus Pinus have been sequenced and submitted to NCBI. In the current study, the complete cp genome of P. taeda (GenBank accession number: KY964286) was sequenced using next-generation sequencing tools. The goal of this study was to determine the cp genome organization of P. taeda and to compare its global pattern of structural variation with the cp genomes of 14 Pinus species (P. koraiensis, P. sibirica, P. armandii, P. lambertiana, P. krempfii, P. bungeana, P. gerardiana, P. monophylla, P. nelsonii, P. contorta, P. massoniana, P. tabuliformis, P. taiwanensis, P. strobus, and P. thunbergii).

Materials and methods

Chloroplast genome sequencing and assembly

Plastid DNA was extracted from the fresh needle leaf parts of P. taeda using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany), and the resulting cpDNA was sequenced using an Illumina HiSeq-2000 platform (San Diego, CA, USA) at Macrogen (Seoul, Korea). The P. taeda cp genome was then assembled de novo using a bioinformatics pipeline (http://www.phyzen.com). Specifically, a 400-bp paired-end library was produced according to the Illumina standard method, which generated 28,110,596 bp of sequence data with a 100-bp average read length. Raw reads with Phred scores of ≤20 were removed from the total PE reads using the CLC-quality trim tool, and de novo assembly of trimmed reads was accomplished using CLC Genomics Workbench v7.0 (CLC Bio, Aarhus, Denmark) with a minimum overlap of 200–600 bp. The resulting contigs were compared against the P. thunbergii and P. contorta plastomes using BLASTN with an E-value cutoff of 1e-5, and five contigs were identified and temporarily arranged based on their mapping positions on the reference genome. After initial assembly, primers were designed (S1 Table) based on the terminal sequences of adjacent contigs, and PCR amplification and subsequent DNA sequencing were conducted to fill in the gaps. PCR amplification was performed in 20-μL reactions containing 1× reaction buffer, 0.4 μL dNTPs (10 mM), 0.1 μL Taq (Solg h-Taq DNA Polymerase), 1 μL (10 pm/μL) primers, and 1 μL (10 ng/μL) DNA, using the following conditions: initial denaturation at 95°C for 5 min; 32 cycles of 95°C for 30 s, 60°C for 20 s, and 72°C for 30 s; and a final extension step of 72°C for 5 min. After incorporating the additional sequencing results, the complete cp genome was used as a reference to map the remaining unmapped short reads to improve the sequence coverage of the assembled genome.
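
The read-quality filter described above can be illustrated with a short sketch. This is not the CLC quality trim tool's algorithm; it assumes Phred+33 encoded FASTQ quality strings and, as a simplification, discards a read when its mean Phred score is 20 or lower. All function names are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def passes_quality(quality_string, threshold=20):
    """Keep a read only if its mean Phred score is above the threshold.

    Assumption: the filter is applied to the mean score of the read;
    the original tool may trim per base instead.
    """
    scores = phred_scores(quality_string)
    return bool(scores) and sum(scores) / len(scores) > threshold


def filter_reads(reads, threshold=20):
    """reads: iterable of (name, sequence, quality_string) tuples."""
    return [read for read in reads if passes_quality(read[2], threshold)]
```

For example, a read whose quality string is "IIII" (Phred 40 at every base) is kept, whereas one with "5555" (Phred 20 at every base) is removed.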

Analysis of gene content and sequence architecture

The P. taeda cp genome was annotated using DOGMA [52], checked manually, and the codon positions were adjusted by comparison with homologs in the cp genome of P. taeda and P. contorta. Transfer RNA sequences of the P. taeda cp genome were verified using tRNAscan-SE version 1.21 [53] with default settings, and the structural features were illustrated using OGDRAW [54]. To examine deviations in synonymous codon usage by avoiding the influence of amino acid composition, the relative synonymous codon usage was determined using MEGA 6 software [55], and finally the divergence of the P. taeda cp genome from six other Pinus species (five from subgenus Pinus and one from subgenus Strobus) cp genomes was assessed using mVISTA [56] in Shuffle-LAGAN mode and using the P. taeda genome as a reference.
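
As an illustration of the relative synonymous codon usage (RSCU) measure mentioned above, the following sketch applies the standard definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It is not the MEGA 6 implementation, and the function name and input layout are assumptions made for the example.

```python
from collections import defaultdict


def rscu(codon_counts, codon_to_aa):
    """Return RSCU per codon.

    codon_counts: dict mapping codon -> observed count
    codon_to_aa:  dict mapping codon -> amino acid (defines synonymous sets)
    """
    totals = defaultdict(int)      # total count per amino acid
    degeneracy = defaultdict(int)  # number of synonymous codons per amino acid
    for codon, aa in codon_to_aa.items():
        totals[aa] += codon_counts.get(codon, 0)
        degeneracy[aa] += 1

    values = {}
    for codon, aa in codon_to_aa.items():
        expected = totals[aa] / degeneracy[aa]
        values[codon] = codon_counts.get(codon, 0) / expected if expected else 0.0
    return values


# Phenylalanine counts as listed in Table 5 (UUU = 1394, UUC = 1108)
phe = rscu({"UUU": 1394, "UUC": 1108}, {"UUU": "Phe", "UUC": "Phe"})
print({codon: round(value, 2) for codon, value in phe.items()})
```

With the phenylalanine counts from Table 5, the sketch yields 1.11 for UUU and 0.89 for UUC, in agreement with the tabulated RSCU values.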

Elucidation of repeat sequences and simple sequence repeat (SSRs)

Repeat sequences, including direct, reverse, and palindromic repeats, were identified within the cp genome using REPuter [57] with the following settings: Hamming distance of 3, ≥90% sequence identity, and minimum repeat size of 30 bp. Furthermore, SSRs were detected using Phobos version 3.3.12 [58] with the search parameters set to ≥10 repeat units for mononucleotide repeats, ≥8 repeat units for dinucleotide repeats, ≥4 repeat units for trinucleotide and tetranucleotide repeats, and ≥3 repeat units for pentanucleotide and hexanucleotide repeats. Tandem repeats were identified using Tandem Repeats Finder version 4.07 b [59] with default settings.
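
The SSR thresholds listed above can be expressed compactly in code. The sketch below is a simple regular-expression search for perfect repeats only; it is not Phobos and does not reproduce its scoring or its handling of imperfect repeats. The names MIN_REPEATS, is_primitive, and find_ssrs are hypothetical.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text
MIN_REPEATS = {1: 10, 2: 8, 3: 4, 4: 4, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):  # e.g. report "A" x 12, not "AAA" x 4
                hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


example = "GC" + "A" * 12 + "CG" + "AT" * 8 + "GGC"
print(find_ssrs(example))
```

The primitive-motif check prevents a mononucleotide run from being reported a second time as a trinucleotide or tetranucleotide repeat.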

Sequence divergence and phylogenetic analyses

The average pairwise sequence divergence of 60 shared genes and complete plastomes of 15 Pinus species was analyzed, using data from P. taeda, P. koraiensis, P. sibirica, P. armandii, P. lambertiana, P. krempfii, P. bungeana, P. gerardiana, P. monophylla, P. nelsonii, P. contorta, P. massoniana, P. tabuliformis, P. taiwanensis, P. strobus, and P. thunbergii. For missing or ambiguous genes, the annotation was confirmed by comparison with the reference sequence in a multiple sequence alignment. The complete genome data set was aligned using MAFFT version 7.222 [60] with default parameters. Pairwise sequence divergence was calculated using Kimura's model [61]. Indel polymorphisms among the complete genomes were identified using DnaSP 5.10.01 [62], and a custom Python script (https://www.biostars.org/p/119214/) was used to identify SNPs. To resolve the phylogenetic position of P. taeda within the genus Pinus, 14 published Pinus species plastomes were downloaded from the NCBI database for phylogenetic analysis. Multiple alignments of the complete plastomes were constructed based on the conserved structure and gene order of the plastid genomes [63], and four methods were employed to construct phylogenetic trees: Bayesian inference (BI), implemented using MrBayes 3.1.2 [64]; maximum parsimony (MP), implemented using PAUP 4.0 [65]; and maximum likelihood (ML) and neighbor-joining (NJ), both implemented using MEGA 6 [55] with previously described settings [66, 67]. In a second phylogenetic analysis, 60 shared cp genes from 15 Pinus species, including P. taeda, and one outgroup species (Juniperus bermudiana) were aligned using ClustalX with default settings, followed by manual adjustment to preserve the reading frames. Finally, the same four phylogenetic inference methods were used to infer trees from the 60 concatenated genes using the same settings [66, 67].
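
To make the pairwise divergence calculation concrete, the sketch below assumes that Kimura's model refers to the two-parameter (K2P) model, in which the distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], with P and Q the proportions of transitions and transversions among compared sites. The function name and the decision to skip gapped or ambiguous columns are assumptions for illustration, not a description of the software used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (assumption)
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # Raises a math domain error if the sequences are too divergent (saturated)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAC"), 4))
```

In this toy alignment a single transition in ten sites gives a distance of about 0.1116, slightly above the uncorrected proportion of 0.1 because the model corrects for multiple substitutions.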

Results and discussion

The P. taeda cp genome was assembled by mapping all Illumina sequence reads to a draft cp genome. Approximately 2,5131,617 reads with 100-bp average lengths were retrieved to obtain 1619.4X coverage of the cp genome. The complete cp genome of P. taeda was 121,531 bp, with 38.5% GC content, differing by only 1 bp from the previously sequenced P. taeda cp genome (Table 1). The cp genome size of P. taeda was within the expected range (116–121 kb) of other sequenced cp genomes of Pinaceae members [41, 68, 69]. The P. taeda cp genome was circular and contained two short inverted repeats (IRa and IRb) of 830 bp each, which separated the SSC (42,258 bp) and LSC (77,614 bp) regions (Fig 1). The P. taeda cp genome encodes 120 genes, including 81 protein-coding genes, four ribosomal RNA (rRNA) genes, and 35 tRNA genes (Table 2). Of these genes, 11 genes (atpF, petB, petD, rpoC1, rpl2, rpl16, trnI-GAU, trnG-UCC, trnA-UGC, trnV-UAC, and trnL-UAA) contained one intron and two genes (rps12 and ycf3) harbored two introns (Table 3). Furthermore, trnK-UUU was identified as the gene containing the longest intron (3,307 bp), which included matK (Table 3). In addition, rps12 was recognized as a trans-spliced gene, with the N-terminal exon I located 92 kb from the C-terminal exons II and III, as reported previously for various gymnosperms [70].
Table 1

Summary of complete chloroplast genomes for 15 Pinus species.

                                P. tae   P. tae*  P. arm   P. bung  P. cont  P. gerar P. kor   P. krem  P. lamb  P. mass  P. mono  P. nel   P. sib   P. tab   P. taiw  P. stro  P. thu
Size (bp)                       121,531  121,530  117,265  117,861  120,438  117,618  117,190  116,989  117,239  119,739  116,479  116,834  116,635  119,646  119,741  115,576  119,707
Overall GC content (%)          38.5     38.5     38.8     38.1     38.4     38.7     38.8     38.7     38.7     38.5     38.6     -        38.7     38.5     38.5     38.8     38.5
LSC size (bp)                   77,614   77,615   64,548   65,373   59,591   -        64,523   -        64,750   51,458   74,357   -        64,080   75,628   65,670   74,634   65,696
SSC size (bp)                   42,258   42,532   51,767   51,538   60,131   -        51,717   -        51,715   43,197   41,691   -        51,782   42,329   53,080   40,310   53,020
IR size (bp)                    830      693      475      475      358      -        475      -        387      378      431      -        387      845      409      467      495
Protein-coding regions (bp)     61,691   60,765   61,227   60,702   58,469   60,364   60,496   59,753   60,847   60,519   60,015   69,598   62,988   60,549   65,133   53,919   70,395
tRNA size (bp)                  2,661    2,587    2,778    2,725    2,582    2,583    2,778    2,428    2,511    2,725    2,577    2,575    2,131    2,725    2,785    2,657    2,652
rRNA size (bp)                  4,517    4,517    4,555    4,515    4,517    4,515    4,555    4,514    4,515    4,515    4,515    4,515    4,555    4,518    4,518    4,516    4,518
Number of genes                 122      111      115      113      110      110      110      108      110      109      111      111      113      116      137      111      171
Number of protein-coding genes  83       71       74       71       70       70       70       69       71       73       70       70       81       74       92       70       123
Number of rRNA genes            4        4        4        4        4        4        4        4        4        4        4        4        4        4        4        4        4
Number of tRNA genes            35       34       36       36       34       34       36       32       33       36       34       34       28       36       36       35       35
Genes duplicated in IR          3        2        2        2        1        -        4        -        1        1        1        -        1        3        1        1        2
Genes with introns              13       13       13       14       13       13       15       13       13       15       13       13       13       14       13       13       15
Genes with introns1313131413131513131513131314131315

P. tae = P. taeda; P. tae* = P. taeda (old); P.arm = P. armandii; P. bung = P. bungeana; P. cont = P. contorta; P. gerar = P. gerardiana; P. kor = P. koraiensis; P. krem = P. krempfii; P. lamb = P. lambertiana; P. mass = P. massoniana; P. mono = P. monophylla; P. nel = P. nelsonii; P. sib = P. sibirica; P. tab = P. tabuliformis; P. taiw = P. taiwanensis; P. stro = P. strobus; P. thu = P. thunbergii

Fig 1

Gene map of the Pinus taeda plastid genome.

Thick lines in the red area indicate the extent of the inverted repeat regions (IRa and IRb; 830 bp), which separate the genome into small (SSC; 42,258 bp) and large (LSC; 77,614 bp) single copy regions. Genes drawn inside the circle are transcribed clockwise, and those outside are transcribed counterclockwise. Genes belonging to different functional groups are color-coded. The dark grey in the inner circle corresponds to the GC content and the light grey corresponds to the AT content.

Table 2

Genes in the sequenced P. taeda chloroplast genome.

Category          Group of genes                       Name of genes
Self-replication  Large subunit of ribosomal proteins  rpl2, 14, 16, 20, 22, 23, 32, 33, 36
                  Small subunit of ribosomal proteins  rps2, 3, 4, 7, 8, 11, 12, 14, 15, 18, 19
                  DNA-dependent RNA polymerase         rpoA, B, C1, C2
                  rRNA genes                           RNA
                  tRNA genes                           trnA-UGC, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnfM-CAU, trnG-UCC, trnH-GUG, trnI-CAU, trnI-GAU, trnK-UUU, trnL-CAA, trnL-UAA, trnL-UAG, trnM-CAU, trnN-GUU, trnP-GGG, trnP-UGG, trnQ-UUG, trnR-ACG, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC, trnV-UAC, trnW-CCA, trnY-GUA
Photosynthesis    Photosystem I                        psaA, B, C, I, J, M
                  Photosystem II                       psbA, B, C, D, E, F, H, I, J, K, L, M, N, T, Z
                  Cytochrome b6/f complex              petA, B, D, G, L, N
                  ATP synthase                         atpA, B, E, F, H, I
                  Rubisco                              rbcL
                  Chlorophyll biosynthesis             chlB, L, N
Other genes       Maturase                             matK
                  Protease                             clpP
                  Envelope membrane protein            cemA
                  Subunit of acetyl-CoA carboxylase    accD
                  c-Type cytochrome synthesis gene     ccsA
Unknown           Conserved open reading frames        ycf1, 2, 3, 4, 12, 68
Table 3

Genes with introns in the Pinus taeda chloroplast genome and length of exons and introns.

Gene      Location  Exon I (bp)  Intron I (bp)  Exon II (bp)  Intron II (bp)  Exon III (bp)
atpF      LSC       159          740            408
petB      LSC       6            799            648
petD      LSC       8            698            667
rpl2      IR        402          668            429
rpl16     LSC       9            835            396
rpoC1     LSC       432          674            1665
rps12               114          -              232           540             26
ycf3      LSC       124          726            230           709             156
trnA-UGC  IR        38           770            35
trnI-GAU  IR        42           974            35
trnL-UAA  LSC       50           488            35
trnK-UUU  LSC       35           3307           37
trnV-UAC  LSC       39           541            37

The protein-coding regions, comprising 81 genes, totaled 61,691 bp and accounted for 50.76% of the P. taeda cp genome. In the P. taeda cp genome, tRNA genes accounted for 2.18% and rRNA genes for 3.71% of the sequence. The remaining 43.35% was non-coding and consisted of introns and intergenic spacers. The total protein-coding sequences encoded 20,563 codons (Table 4). The codon-usage frequency was calculated based on protein-coding and tRNA gene sequences (Table 5). Leucine was the most frequently encoded amino acid (2,067, 10.1%) and cysteine the least frequently encoded (244, 1.2%) (Fig 2). Similar ratios for amino acids were found in previously reported cp genomes [71, 72]. The most frequently used codon was GAA (835; 4.06%), encoding glutamic acid, and the least frequently used was TGC (65; 0.316%), encoding cysteine. The A-T content was 50.6%, 59.99%, and 69.97% at the first, second, and third codon positions, respectively (Table 4). The preference for high A-T content at the third codon position is similar to the A and T concentrations reported in various terrestrial plant cp genomes [72-74].
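
The proportions quoted above follow from simple arithmetic on the values reported in Table 1 and Table 4. The sketch below reproduces two of them: the share of the genome occupied by protein-coding sequence and the A-T content at each codon position. The variable and function names are hypothetical, and the numbers are copied from the tables rather than recomputed from the sequence.

```python
GENOME_BP = 121_531          # complete cp genome size
PROTEIN_CODING_BP = 61_691   # total length of protein-coding regions

# Base composition (%) per codon position, as listed in Table 4
CODON_POSITION_COMPOSITION = {
    "1st": {"T": 20.4, "C": 16.03, "A": 30.26, "G": 28.3},
    "2nd": {"T": 31.5, "C": 20.7, "A": 28.49, "G": 18.2},
    "3rd": {"T": 38.18, "C": 13.94, "A": 31.79, "G": 16.07},
}


def genome_fraction(feature_bp, genome_bp=GENOME_BP):
    """Percentage of the genome covered by a feature class."""
    return 100.0 * feature_bp / genome_bp


def at_content(composition):
    """A-T content (%) from a per-base composition dictionary."""
    return composition["A"] + composition["T"]


print(round(genome_fraction(PROTEIN_CODING_BP), 2))
for position, composition in CODON_POSITION_COMPOSITION.items():
    print(position, round(at_content(composition), 2))
```

The first-position value computed this way is 50.66%, which the text reports as 50.6%; the second- and third-position values match the reported 59.99% and 69.97%.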
Table 4

Base compositions in the Pinus taeda chloroplast (cp) genome.

                      T/U (%)  C (%)  A (%)  G (%)  Length (bp)
Genome                30.8     19.3   30.7   19.3   121,531
LSC                   30.7     19.0   30.3   20.0   77,614
SSC                   31.3     19.5   31.0   18.3   42,258
IR                    31.1     20.2   31.1   17.6   830
tRNA                  23.7     24.9   22.4   29.0   2,661
rRNA                  18.8     23.6   26.4   31.1   4,517
Protein coding genes  30.5     18.1   30.5   20.9   61,691
1st position          20.4     16.03  30.26  28.3   20,563
2nd position          31.5     20.7   28.49  18.2   20,563
3rd position          38.18    13.94  31.79  16.07  20,563
Table 5

Codon–anticodon recognition pattern and codon usage for the Pinus taeda chloroplast genome.

Amino acid  Codon  No.   RSCU  tRNA          Amino acid  Codon  No.   RSCU  tRNA
Phe         UUU    1394  1.11                Tyr         UAC    562   0.66  trnY-GUA
Phe         UUC    1108  0.89  trnF-GAA      Tyr         UAU    1137  1.34
Leu         UUA    841   1.23  trnL-UAA      Stop        UAA    776   1.05
Leu         UUG    815   1.19  trnL-CAA      Stop        UGA    781   1.06
Leu         CUU    818   1.2                 Stop        UAG    662   0.89
Leu         CUC    533   0.78                Cys         UGC    378   0.9   trnC-GCA
Leu         CUA    642   0.94  trnL-UAG      Trp         UGG    677   1     trnW-CCA
Leu         CUG    444   0.65                His         CAU    839   1.43
Ile         AUU    1233  1.09                His         CAC    337   0.57  trnH-GUG
Ile         AUC    963   0.85  trnI-GAU      Gln         CAA    842   1.27  trnQ-UUG
Ile         AUA    1194  1.06  trnI-CAU      Gln         CAG    481   0.73
Met         AUG    807   1     trn(f)M-CAU   Asn         AAU    1318  1.34
Val         GUU    652   1.29                Asn         AAC    644   0.66  trnN-GUU
Val         GUC    365   0.72  trnV-GAC      Lys         AAA    1444  1.3   trnK-UUU
Val         GUA    606   1.2   trnV-UAC      Lys         AAG    770   0.7
Val         GUG    391   0.78                Asp         GAU    917   1.43
Ser         UCC    752   1.22  trnS-GGA      Asp         GAC    368   0.57  trnD-GUC
Ser         UCA    767   1.25  trnS-UGA      Glu         GAA    1043  1.33  trnE-UUC
Ser         UCG    431   0.7                 Glu         GAG    529   0.67
Pro         CCU    516   1.11                Arg         CGU    278   0.67  trnR-ACG
Pro         CCC    400   0.86  trnP-GGG      Arg         CGC    163   0.39
Pro         CCA    624   1.35  trnP-UGG      Arg         CGA    439   1.06
Pro         CCG    313   0.68                Arg         CGG    284   0.68
Thr         ACU    448   1.05                Ser         AGU    499   0.81
Thr         ACC    497   1.17                Ser         AGC    387   0.63  trnS-GCU
Thr         ACA    441   1.03  trnT-UGU      Arg         AGA    821   1.97  trnR-UCU
Thr         ACG    320   0.75                Arg         AGG    511   1.23
Ala         GCU    397   1.38                Gly         GGU    456   0.99
Ala         GCC    233   0.81                Gly         GGC    214   0.46  trnG-GCC
Ala         GCA    347   1.21  trnA-UGC      Gly         GGA    728   1.57  trnG-UCC
Ala         GCG    172   0.6                 Gly         GGG    451   0.98
Fig 2

Amino acid frequencies of the Pinus taeda chloroplast (cp) protein coding sequences.

The frequencies of amino acids were calculated for all 81 protein-coding genes from the start to the stop codon.

Difference in gene contents of P. taeda

We selected 16 cp genomes in the Pinus genus (P. taeda (old), P. koraiensis, P. sibirica, P. armandii, P. lambertiana, P. krempfii, P. bungeana, P. gerardiana, P. monophylla, P. nelsonii, P. contorta, P. massoniana, P. tabuliformis, P. taiwanensis, P. strobus, and P. thunbergii) for comparison with P. taeda (new) (121,531 bp). Pinus taeda had the largest genome. This difference can be ascribed to variation in the size of the LSC region (Table 1). Analysis of genes with known functions revealed that P. taeda shared 60 different protein-coding genes with 15 other Pinus species. Furthermore, pairwise alignment between the cp genome of P. taeda and six related cp genomes showed high synteny. The annotation of the P. taeda cp genome was used for plotting the total sequence identity of the six cp genomes of Pinus species in mVISTA (Fig 3). The results revealed higher sequence identity with five species from the subgenus Pinus (P. contorta, P. massoniana, P. tabuliformis, P. taiwanensis, and P. thunbergii) than with P. armandii from the subgenus Strobus. However, for all species, relatively lower identity was observed in various comparable genomic regions, particularly the trnK-UUU, matK, atpI, rpl16, petB, petD, ycf1, and ycf2 regions (Fig 3). Similarly, non-coding regions exhibited greater divergence than the coding regions. Among the diverging regions, psbA-chlB, psbM-clpP, ycf4-accD, ycf3-psaA, psaC-ccsA, ndhH-psaC, trnG-UUU-chlL, and petL-psbF were significant. The current findings agree with the results previously reported for these genes in angiosperm cp genomes [43, 72]. Our results confirmed similar variations among the coding regions of the investigated species, as also suggested by Kumar et al. [75]. Furthermore, comparison of the P. taeda whole cp genome with those of related species revealed fewer SNPs and indels for the subgenus Pinus cp genomes, which ranged from 809 in P. taeda (old) to 2,636 in P. thunbergii. 
However, the results revealed more SNPs and indels for the subgenus Strobus cp genomes, which ranged from 9,211 in P. gerardiana to 19,196 in P. monophylla (S2 Table). These results indicate the presence of interspecific mutations in the highly conserved cp genome that may be useful for analyzing genetic diversity and evolution. Similarly, we evaluated pairwise sequence divergence among the 16 pine species (S3 Table). The results showed that the P. taeda genome had an average sequence divergence of 0.0274, with high divergence from P. nelsonii (0.0402) and the lowest divergence from P. taeda (old) (0.00321), followed by P. contorta (0.00807).
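
Counts of SNPs and indels such as those reported above are typically obtained by walking through a pairwise alignment column by column. The sketch below is not the custom script or the DnaSP procedure used in the study; it is a minimal illustration that assumes two pre-aligned sequences of equal length, counts each mismatched column as one SNP, and counts each contiguous run of gaps in one sequence as one indel event. The function name is hypothetical.

```python
def count_snps_and_indels(ref_aln, query_aln, gap="-"):
    """Return (snps, indel_events) for two aligned sequences.

    Assumptions: ambiguous bases are compared literally, and columns
    that are gapped in both sequences are ignored.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")

    snps = indels = 0
    open_gap = None  # which sequence currently carries an open gap run
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r == gap and q == gap:
            continue
        if r == gap or q == gap:
            side = "ref" if r == gap else "query"
            if side != open_gap:  # a new gap run starts here
                indels += 1
                open_gap = side
        else:
            open_gap = None
            if r != q:
                snps += 1
    return snps, indels


print(count_snps_and_indels("ACGTACGT--AC", "ACCTA--TGGAC"))
```

In the toy alignment, one substituted column and two separate gap runs give one SNP and two indel events.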
Fig 3

Visual alignment of plastid genomes from Pinus taeda and six other Pinus species (five from the subgenus Pinus and one from the subgenus Strobus).

VISTA-based identity plot showing sequence identity among seven species, using P. taeda as a reference.

The gene organization and gene contents of the cp genomes are generally conserved compared with those in the mitochondrial and nuclear genomes [76]. The cp genome organization and structure are extremely conserved in angiosperms, i.e. there is a distinctive quadripartite structure containing an SSC region and LSC region separated by a pair of inverted repeats [77]. In contrast, various genome rearrangements have been detected in gymnosperm cp genomes [78, 79]. While the P. taeda cp genome shared some characteristics with other plants, we detected noticeable differences in numerous genes among gymnosperms, including significant divergence in gene content between P. taeda and other gymnosperms. For instance, in Cryptomeria japonica, eleven intact NADH dehydrogenase genes were identified, which corresponded to those in five other plant species [37], but these genes were not present in the P. taeda and P. thunbergii cp genomes [37]. Previously, it was reported that the loss of NADH dehydrogenases was caused by specific mutations in the cp genome of Pinus [79]. Moreover, an essential gene, rps16, was completely absent from the P. taeda cp genome. Similar results were reported for the P. thunbergii and Marchantia polymorpha [36, 80] cp genomes, in addition to various terrestrial plant species, including Eucommia, Epifagus, Fagus, Malpighia, Krameria, Passiflora, Connarus, Linum, Turnera, Securidaca, Medicago, Selaginella, Viola, and Adonis [81-86]. In contrast, rps16 is present in the angiosperms Oryza sativa and E. globulus, in the fern Adiantum capillus, and in the gymnosperms C. japonica and C. taitungensis. However, the position of rps16 in gymnosperm cp genomes differs from that in angiosperm cp genomes. 
In gymnosperm cp genomes, rps16 lies between chlB and trnK-UUU, whereas it lies between trnQ-UUG and trnK-UUU in angiosperms and between chlB and matK in ferns. Doyle et al. [83] suggested the functional transfer of rps16 to the nucleus from chloroplasts and the absence of this gene from various terrestrial plants. Furthermore, it was reported that the loss of rps16 and its functional transfer to the nucleus may have occurred autonomously in gymnosperms, particularly in coniferous species. trnR-CCG and trnP-GGG are also found in the P. taeda cp genome. These genes are reported as pseudogenes and are likely relics of cp genome evolution in mosses and gymnosperms [29, 87, 88]. trnP-GGG was previously reported in two gymnosperms, C. taitungensis and P. thunbergii, as well as in C. japonica, in the fern A. capillus, and in the liverwort M. polymorpha, but was absent from the cp genomes of angiosperms. This gene was also identified in Ginkgo and Gnetum [34], revealing that the gene is common in numerous gymnosperm species. Similarly, trnR-CCG in P. taeda was previously reported in C. taitungensis, A. capillus, P. thunbergii, and M. polymorpha. However, the absence of this gene in C. japonica and various cp genomes of angiosperms suggests that trnR-CCG is not well-maintained in the cp genomes of all gymnosperms and may have been lost in various taxa during plant evolution [79]. Furthermore, clpP, which encodes a proteolytic subunit of the ATP-dependent Clp protease, contains no intron in the P. taeda cp genome. Similar results were previously reported for P. thunbergii, P. mugo, P. dabeshanensis, and P. taiwanensis [37, 41, 68, 89]. In contrast, clpP is found with two or three exons in the cp genomes of other land plants, such as A. capillus, E. globulus, M. polymorpha, and C. taitungensis [29]. However, in the P. taeda cp genome, only the second exon of clpP remained, and as such, it occurs as a pseudogene. 
Similarly, the order of rpl20 and clpP is conserved in the P. taeda cp genome, and clpP is co-transcribed with the 5’-end of rps12 and rpl20, as reported previously for the cp genomes of various gymnosperms [90-92]. accD, which encodes acetyl-CoA carboxylase, was found in the P. taeda cp genome. The reading frame length of accD was similar to that in the cp genomes of other Pinaceae members and comprised 321 codons, which is fewer than in C. japonica (700 codons) and more than the 309 codons of A. capillus and 316 codons of M. polymorpha. Furthermore, in angiosperms, particularly monocots, the reading-frame size of accD has been reduced from 106 codons in Oryza sativa to none in Zea mays. This has also been suggested as a reason for the loss of accD in monocot plant species [93]. In contrast, the accD reading frame in gymnosperms, particularly in coniferous species and C. japonica, may have diverged in the opposite direction, toward increased length.

Loss of large IR region within the P. taeda cp genome

The large inverted repeat regions, which have been reported in various land plant cp genomes, were reduced to two very short inverted repeat (IRa and IRb) regions of 830 bp in P. taeda, and were separated by an SSC region of 42,258 bp and an LSC region of 77,614 bp (Fig 1). However, in the previously sequenced P. taeda cp genome submitted to NCBI, the short inverted repeat regions were 693 bp (Table 1). Similar results were observed in other Pinaceae members, such as P. taiwanensis, P. armandii, and P. dabeshanensis, where the inverted repeat sizes were reduced to 513, 475, and 473 bp, respectively [68, 69, 89]. The IR of P. taeda contained duplicated psaM and trnS-GCU and partial ycf12, apparently caused by incomplete loss of the large IR, as reported previously for various gymnosperms [36, 37]. Detailed comparison of four junctions (JLA, JLB, JSA, and JSB) between the two IRs (IRa and IRb) and two single-copy regions (LSC and SSC) was performed between Pinus species (P. contorta, P. tabuliformis, P. massoniana, P. taiwanensis, and P. thunbergii) and P. taeda by carefully analyzing the exact IR border positions and adjacent genes (Fig 4). Some IR expansion and contraction were observed in the P. taeda cp genome compared to the other five Pinus species, whose IRs ranged from 358 bp (P. contorta) to 845 bp (P. tabuliformis) (Fig 4). The genes marking the beginning and end of the IRs were only partially duplicated. psbI in P. taeda was located 9 bp from JLB in the LSC region. In P. contorta, P. tabuliformis, and P. taeda (old), this distance was 6 bp, whereas in P. massoniana and P. taiwanensis the distances were 26 and 338 bp, respectively. However, variation was found in P. thunbergii, where rpl23 was 100 bp away from JLB in the LSC region. Similarly, the hypothetical chloroplast gene ycf12 was partially duplicated by 47 bp in P. taeda and by 35 bp in P. tabuliformis. However, in P. massoniana, ycf12 was located in the SSC region, 385 bp away from JSB. In P. taeda and P. tabuliformis, JLA was located between psaM and psbB and the difference in distance between psaM and JLA was 395 bp. However, in P. contorta and P. taiwanensis, psaM was located in the SSC region, whereas in P. massoniana, it was located at the JSA border (Fig 4). Similarly, in P. taeda, P. contorta, P. tabuliformis, P. massoniana, and P. taiwanensis, psbB was located in the LSC region at 478, 477, 505, 526, and 843 bp away from the JLA border, respectively.
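
The defining property of an IR pair, that one copy is the reverse complement of the other, can be illustrated with a minimal Python sketch. This is not the annotation procedure used in the study; the function names and the toy sequence are hypothetical and serve only to show the check.

```python
# Minimal sketch: test whether two equally long segments form an inverted repeat.
# All names and the toy sequence are illustrative assumptions, not study data.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_inverted_repeat(genome: str, start_a: int, start_b: int, length: int) -> bool:
    """True if genome[start_a:start_a+length] is the reverse complement of
    genome[start_b:start_b+length] (0-based start coordinates)."""
    ir_a = genome[start_a:start_a + length]
    ir_b = genome[start_b:start_b + length]
    if len(ir_a) != length or len(ir_b) != length:
        return False  # a segment runs past the end of the sequence
    return ir_a == reverse_complement(ir_b)


# Toy layout: flank + IRa + single-copy spacer + IRb + flank
ir = "ATGCCGTTA"
spacer = "GGGACCC"
toy_genome = "TTTT" + ir + spacer + reverse_complement(ir) + "AAAA"
irb_start = 4 + len(ir) + len(spacer)

found = is_inverted_repeat(toy_genome, 4, irb_start, len(ir))
shifted = is_inverted_repeat(toy_genome, 0, irb_start, len(ir))
```

On a real assembly the same comparison would be applied at the annotated IRa and IRb coordinates; shifting either boundary by a few bases, as in `shifted`, breaks the match, which is why exact border positions matter in junction comparisons such as Fig 4.
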
Fig 4

Distance between adjacent genes and junctions of the small single-copy (SSC), large single-copy (LSC), and two inverted repeat (IR) regions among plastid genomes from six Pinus species.

Boxes above and below the main line indicate the adjacent border genes. The figure is not to scale regarding sequence length, and only shows relative changes at or near the IR/SC borders.

Large IRs play a significant role in stabilizing and maintaining the conserved structure of the cp genomes [94]. Various studies have reported that during the evolutionary process of angiosperms, a copy of an IR was lost, particularly in the subfamily Papilionoideae [95-97], and rearrangement in the chloroplast genome was observed because of IR loss in these genomes as compared to cp genomes with normal IRs [94]. Similarly, in gymnosperms, complete IRs were lost in conifers, particularly in cupressophyte and Pinaceae cp genomes, and greater rearrangement was observed in these genomes than in higher plants [33]. The remaining IR parts in the cp genomes of various Pinaceae members and cupressophytes were shown to differ, suggesting that these two conifer clades lost their large IRs independently during evolution from a common ancestor [78, 98]. Previously, it was reported that specific repeats in Pinaceae replaced the reduced IRs [99]. Compared to other conifers, a greater number of rearrangements occurred in Pseudotsuga menziesii and P. radiata cp genomes because of the lack of a large IR in these cp genomes [33]. Therefore, variation in the genome structure between P. taeda and related terrestrial plants, such as C. japonica, suggests that an IR is essential for structural stability of the cp genome.

Repeat analysis

Repeat analysis of the P. taeda cp genome revealed six palindromic repeats, 34 forward repeats, and 22 tandem repeats (S1 Fig and Table 6). Among these, three forward repeats were 45–59 bp in length, with 14 tandem repeats of 15–29 bp in length (S1 Fig). Additionally, two palindromic repeats were 75–89 bp and four repeats were >90 bp (S1 Fig). Overall, 62 repeats were found in the P. taeda cp genome. Among tandem repeats, 12 repeats were in coding regions, eight repeats in intergenic regions, one repeat extending from an intergenic region into a coding region, and one repeat in the petB intron region (Table 7). The length of tandem repeats in these regions varied between eight and 14, and up to 10 repeat units were present. Various numbers of repeats have been identified in conifer cp genomes [100, 101], and the mechanisms underlying the origin of these tandem repeats remain unclear. Nevertheless, they are known to be associated with chloroplast DNA rearrangement [102], gene expansion [100, 101], and gene duplication [103]. Previous reports suggested that repeat sequences, which play a role in genome rearrangement, are very helpful in phylogenetic studies [74, 104]. Furthermore, analyses of different cp genomes revealed that repeat sequences are important causes of indels and substitutions [101]. Sequence variation and cp genome rearrangement occur because of slipped-strand mispairing and improper recombination of repeat sequences [104-106]. The presence of such repeats shows that the locus is an important hotspot for cp genome reconfiguration [74, 107]. In addition, such repeats contain crucial information for developing genetic markers for phylogenetic and population studies [74].
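
As an illustration of what a tandem repeat is at the sequence level, the sketch below locates perfect, head-to-tail copies of a fixed-length unit with a regular-expression backreference. It is a simplified stand-in, not the repeat-finding software used in the study; the function name, the unit length, and the toy sequence are assumptions made for the example.

```python
import re


def find_tandem_repeats(seq: str, unit_len: int, min_copies: int = 2):
    """Return (0-based start, unit, copy number) for perfect tandem repeats.

    The regex captures a unit of `unit_len` bases and requires at least
    `min_copies - 1` further identical copies directly after it. Matches are
    leftmost and non-overlapping, so this is a sketch, not an exhaustive scan.
    """
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // unit_len
        hits.append((match.start(), unit, copies))
    return hits


# Toy sequence containing three consecutive copies of "ATC"
toy_seq = "GGATCATCATCGG"
toy_hits = find_tandem_repeats(toy_seq, unit_len=3)
```

Dedicated tools also tolerate mismatches and indels between copies, which this exact-match sketch deliberately ignores.
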
Table 6

Repeat sequences in the Pinus taeda chloroplast genome.

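| Repeat type | Repeat size (bp) | Repeat position 1 | Repeat location 1 | Repeat position 2 | Repeat location 2 |
|---|---|---|---|---|---|
| P | 830 | 8692 | psbl-psbM-ycf12 | 51,779 | ycf12-psbM |
| P | 399 | 66,445 | psbA-atpF | 121,132 | IGS |
| P | 304 | 50,503 | IGS | 120,845 | IGS |
| P | 277 | 50,530 | IGS | 120,845 | IGS |
| P | 86 | 0 | psbA | 66,359 | psbA |
| P | 79 | 9017 | IGS | 52,205 | psbM-IGS |
| F | 800 | 175 | psbA | 1815 | IGS |
| F | 376 | 109,649 | ycf2 | 120,134 | ycf2 |
| F | 288 | 50,861 | IGS | 84,618 | IGS |
| F | 284 | 50,843 | IGS | 84,600 | IGS |
| F | 275 | 50,825 | IGS | 84,582 | IGS |
| F | 247 | 51,131 | rps4 | 70,403 | rps4 |
| F | 185 | 50,964 | IGS | 84,721 | IGS |
| F | 171 | 51,207 | rps4 | 70,479 | rps4 |
| F | 165 | 100,638 | ycf1 | 100,659 | ycf1 |
| F | 124 | 101,059 | IGS-ycf1 | 101,068 | IGS-ycf1 |
| F | 97 | 9677 | IGS | 30,444 | IGS |
| F | 97 | 101,059 | IGS-ycf1 | 101,113 | IGS-ycf1 |
| F | 85 | 9737 | IGS | 30,504 | IGS |
| F | 70 | 100,733 | ycf1 | 100,754 | ycf1 |
| F | 79 | 9017 | IGS | 52,205 | psbM |
| F | 73 | 9701 | IGS | 30,468 | IGS |
| F | 71 | 100,638 | ycf1 | 100,701 | ycf1 |
| F | 70 | 100,712 | ycf1 | 100,754 | IGS |
| F | 70 | 101,059 | IGS-ycf1 | 101,122 | ycf1 |
| F | 70 | 101,086 | ycf1 | 101,140 | ycf1 |
| F | 62 | 93,524 | IGS | 93,579 | IGS |
| F | 69 | 115,329 | ycf2 | 115,395 | ycf2 |
| F | 71 | 9777 | ycf1 | 30,544 | IGS |
| F | 71 | 101,086 | ycf1 | 101,149 | ycf1 |
| F | 70 | 101,077 | ycf1 | 101,140 | ycf1 |
| F | 69 | 9714 | IGS | 30,481 | IGS |
| F | 58 | 71,811 | IGS | 71,831 | IGS |
| F | 67 | 101,149 | ycf1 | 101,167 | ycf1 |
| F | 61 | 101,059 | ycf1 | 101,131 | ycf1 |
| F | 64 | 101,057 | ycf1 | 101,138 | ycf1 |
| F | 63 | 101,057 | ycf1 | 101,147 | ycf1 |
| F | 59 | 101,043 | ycf1 | 101,133 | ycf1 |
| F | 55 | 100,895 | ycf1 intron | 100,976 | ycf1 intron |
| F | 61 | 101,068 | ycf1 | 101,149 | ycf1 |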
Table 7

Tandem repeat sequences in the Pinus taeda chloroplast genome.

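| Serial No | Indices | Repeat length | Size of repeat unit × Copy number | A | C | G | T | Location |
|---|---|---|---|---|---|---|---|---|
| 1 | 9274–9310 | 36 | 2 × 18 | 16 | 16 | 16 | 50 | PsaM/ycf12 (IGS) |
| 2 | 15,199–15,235 | 36 | 2 × 18 | 44 | 8 | 23 | 23 | atpI (CDS) |
| 3 | 20,648–20,678 | 30 | 2 × 15 | 50 | 10 | 20 | 20 | rpoC2 (CDS) |
| 4 | 28,466–28,534 | 68 | 2 × 34 | 30 | 24 | 12 | 33 | petN/psbM (IGS) |
| 5 | 31,275–31,313 | 38 | 2 × 19 | 23 | 13 | 36 | 26 | clpP/IGS |
| 6 | 33,103–33,166 | 63 | 3 × 21 | 29 | 16 | 19 | 33 | rps18 (CDS) |
| 7 | 43,597–43,625 | 28 | 2 × 14 | 46 | 0 | 10 | 43 | accD/rbcL (IGS) |
| 8 | 43,615–43,659 | 44 | 2 × 22 | 40 | 12 | 8 | 38 | accD/rbcL (IGS) |
| 9 | 45,578–45,620 | 42 | 2 × 21 | 31 | 2 | 24 | 41 | rbcL/atpB (IGS) |
| 10 | 51,993–52,029 | 36 | 2 × 18 | 50 | 16 | 16 | 16 | ycf12/psbM (IGS) |
| 11 | 56,031–56,069 | 38 | 2 × 19 | 18 | 12 | 12 | 57 | petB (intron) |
| 12 | 93,544–93,631 | 87 | 3 × 29 | 37 | 16 | 10 | 35 | ycf68/chlL (IGS) |
| 13 | 93,525–93,635 | 110 | 2 × 55 | 35 | 15 | 11 | 36 | ycf68/chlL (IGS) |
| 14 | 97,002–97,056 | 54 | 2 × 27 | 28 | 20 | 24 | 26 | ycf1 (CDS) |
| 15 | 100,583–100,631 | 48 | 2 × 24 | 54 | 9 | 18 | 16 | ycf1 (CDS) |
| 16 | 100,639–100,828 | 189 | 9 × 21 | 45 | 9 | 28 | 16 | ycf1 (CDS) |
| 17 | 100,827–101,025 | 198 | 6 × 33 | 31 | 1 | 43 | 23 | ycf1 (CDS) |
| 18 | 100,866–101,016 | 150 | 10 × 15 | 30 | 1 | 44 | 23 | ycf1 (CDS) |
| 19 | 100,827–101,953 | 126 | 2 × 63 | 31 | 1 | 43 | 23 | ycf1 (CDS) |
| 20 | 100,823–101,985 | 162 | 2 × 81 | 32 | 2 | 42 | 22 | ycf1 (CDS) |
| 21 | 100,939–101,047 | 108 | 2 × 54 | 34 | 4 | 38 | 22 | ycf1 (CDS) |
| 22 | 115,330–115,452 | 122 | 2 × 66 | 21 | 22 | 11 | 45 | ycf2 (CDS) |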
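
The columns of Table 7 are related arithmetically: the repeat length should equal the product of the two factors in the "Size of repeat unit × Copy number" column, and, for most rows, the difference between the end and start indices. The sketch below applies both checks to a handful of rows transcribed from the table. Treating `end - start` as the repeat length is an assumption inferred from the majority of rows, and the row selection is arbitrary.

```python
# (serial, start, end, stated length, factor 1, factor 2), transcribed from Table 7.
TABLE7_ROWS = [
    (1, 9274, 9310, 36, 2, 18),
    (6, 33103, 33166, 63, 3, 21),
    (16, 100639, 100828, 189, 9, 21),
    (18, 100866, 101016, 150, 10, 15),
    (19, 100827, 101953, 126, 2, 63),
    (20, 100823, 101985, 162, 2, 81),
]


def inconsistent_rows(rows):
    """Return serial numbers whose index span or factor product disagrees
    with the stated repeat length."""
    flagged = []
    for serial, start, end, length, factor_1, factor_2 in rows:
        span_matches = (end - start) == length  # assumed convention
        product_matches = factor_1 * factor_2 == length
        if not (span_matches and product_matches):
            flagged.append(serial)
    return flagged


flagged = inconsistent_rows(TABLE7_ROWS)
```

With these inputs the check flags rows 19 and 20, whose index ranges span 1,126 and 1,162 bp against stated lengths of 126 and 162 bp. Whether the end coordinates of those rows are typographical cannot be settled from the table alone, so they are left as printed.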

SSR analysis

SSRs are repeating sequences of typically 1–6 bp that are distributed throughout the genome. SSRs generally have a high mutation rate compared to neutral DNA regions because of slipped-strand mispairing. Because these short repeats are uniparentally inherited and haploid, they can be used as molecular markers in genetic studies analyzing population structures [108, 109]. In this study, we detected perfect SSRs in the P. taeda cp genome (Fig 5). Specific attributes were set for the analysis because SSRs (10 bp or longer) are exposed to slipped-strand mispairing, the main mechanism of SSR polymorphisms [110-112]. A total of 151 perfect microsatellites were found in the P. taeda cp genome (Fig 5). Most (71) SSRs in this cp genome possessed a mononucleotide repeat motif. Dinucleotide SSRs were the second most common repeat motif (Fig 5B). Using our search criterion, four tetranucleotide SSRs and one hexanucleotide SSR were detected in the P. taeda cp genome (Fig 5A). In P. taeda, mononucleotide SSRs were A (91.55%) and C (8.45%) motifs, and dinucleotide SSRs were AT (47.37%) and AG (52.63%) motifs (Fig 5B and Table 8). Approximately 59.60% of SSRs were in non-coding regions, approximately 2.64% were present in rRNA sequences, and 1.98% were in tRNA genes (Fig 5A). These results are similar to those of previous reports showing that SSRs were unevenly distributed in cp genomes, and these findings may provide more information for selecting effective molecular markers for detecting intra- and interspecific polymorphisms [113-116]. Furthermore, analysis of various gymnosperm cp genomes revealed that most mononucleotide and dinucleotide SSRs are composed of A and T, which may contribute to bias in base composition and is consistent with other cp genomes [117-119].
For SSR identification, although different criteria and algorithms were used, the distribution and characteristics of the SSRs were similar to those reported for the cp genomes of conifers [71, 119], 30 asterids [72], and 14 monocots [112]. Our findings were comparable to those of previous reports in which SSRs in cp genomes were found to be largely composed of polythymine (polyT) or polyadenine (polyA) repeats, and infrequently contained tandem cytosine (C) and guanine (G) repeats [118, 120]. Therefore, these SSRs contributed to the A-T richness of the P. taeda cp genome, which was also previously observed in the cp genomes of other plant species [43, 71, 120]. The SSRs identified in the cp genome of P. taeda can be evaluated for polymorphisms at the intra-specific level and used as markers for evaluating the genetic diversity of wild populations of plants from the Pinaceae family.
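
A perfect SSR is an uninterrupted run of one short motif, so a basic search can be sketched with a regular expression per motif length. The sketch below is only illustrative: the minimum repeat counts in `MIN_REPEATS` are hypothetical placeholders, not the thresholds used in this study, and the function name and toy sequence are likewise assumptions.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of consecutive copies.
MIN_REPEATS = {1: 8, 2: 4, 3: 3}


def find_perfect_ssrs(seq: str, min_repeats=MIN_REPEATS):
    """Return sorted (1-based start, motif, copy number) tuples for perfect SSRs."""
    hits = []
    for unit_len, min_n in sorted(min_repeats.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_n - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip motifs that are themselves repeats of a shorter motif
            # (e.g. "AA" or "AAA"), so a poly-A run is reported only once.
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start() + 1, unit, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence: a 10-bp poly-A run and five copies of "AT"
toy = "GC" + "A" * 10 + "CGC" + "AT" * 5 + "GGC"
hits = find_perfect_ssrs(toy)
```

Because the reported SSR count depends directly on the chosen thresholds, counts from different studies are only comparable when the search criteria match.
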
Fig 5

Analysis of simple sequence repeat (SSR) in the Pinus taeda plastid genome.

A, Number of SSR types in complete genome, coding, and non-coding regions; B, Frequency of identified SSR motifs in different repeat class types.

Table 8

Simple sequence repeats (SSRs) in the Pinus taeda chloroplast genome.

| Unit | Length | No | SSR start |
|---|---|---|---|
| A | 15 | 2 | 1375, 28,440 |
| A | 14 | 3 | 68,741, 72,734, 106,240 |
| A | 12 | 2 | 10,316, 110,251 |
| A | 11 | 4 | 10,755, 26,980, 109,368, 11,873 |
| A | 10 | 8 | 16,119, 22,252, 48,967, 83,427, 86,798, 88,062, 102,308, 111,412 |
| A | 9 | 15 | 40,699, 41,827, 45,769, 70,952, 80,498, 80,744, 95,259, 102,053, 108,265, 110,985, 112,374, 113,688, 117,432, 119,716, 120,740 |
| A | 8 | 31 | 4819, 10,738, 10,950, 16,110, 17,113, 30,189, 30,427, 30,701, 31,373, 33,345, 38,678, 41,893, 50,753, 51,485, 52,622, 55,355, 56,042, 63,021, 64,394, 64,437, 92,458, 94,554, 95,822, 97,307, 103,868, 108,971, 114,282, 117,065, 118,885, 119,819, 120,893 |
| C | 9 | 4 | 16,101, 22,497, 71,353, 105,552 |
| C | 8 | 2 | 31,381, 120,721 |
| AT | 13 | 1 | 41,344 |
| AT | 10 | 4 | 26,392, 96,162, 104,388, 113,787 |
| AT | 9 | 6 | 19,814, 24,397, 34,072, 42,422, 48,777, 74,253 |
| AT | 8 | 7 | 19,352, 19,904, 80,532, 83,639, 99,803, 105,218, 110,933 |
| AG | 9 | 10 | 8774, 22,311, 26,631, 47,568, 51,573, 52,520, 65,195, 79,220, 80,699, 106,488 |
| AG | 8 | 10 | 14,675, 22,384, 30,793, 42,926, 51,556, 69,139, 75,721, 83,721, 90,777, 91,093 |
| AAT | 11 | 1 | 78,353 |
| AAT | 10 | 1 | 42,354 |
| AAT | 9 | 8 | 13,934, 49,935, 65,369, 66,308, 71,749, 94,150, 98,727, 109,563 |
| AAG | 10 | 5 | 3167, 22,135, 106,110, 108,709, 120,693 |
| AAG | 9 | 5 | 28,380, 79,051, 79,226, 81,004, 100,527 |
| ATC | 10 | 1 | 77,667 |
| ATC | 9 | 6 | 2957, 16,215, 21,127, 75,445, 77,964, 111,780 |
| AAC | 9 | 1 | 32,982 |
| ACT | 9 | 2 | 43,692, 94,864 |
| AGC | 9 | 2 | 43,798, 89,223 |
| ACC | 9 | 2 | 54,293, 94,538 |
| AGG | 9 | 2 | 60,538, 80,037 |
| CCG | 9 | 1 | |
| ATCC | 17 | 1 | 48,863 |
| ACCT | 14 | 1 | 90,739 |
| AGAT | 13 | 1 | 51,753 |
| AAAT | 12 | 1 | 42,147 |
| AAGAGG | 23 | 1 | 117,038 |
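
The class totals and motif shares discussed in the SSR analysis can be recomputed from the "No" column of Table 8. The sketch below does this with counts transcribed from the table; the variable and function names are illustrative, and shares are computed within each motif-length class.

```python
from collections import defaultdict

# Counts from the "No" column of Table 8, grouped by motif (one entry per table row).
SSR_COUNTS = {
    "A": [2, 3, 2, 4, 8, 15, 31],
    "C": [4, 2],
    "AT": [1, 4, 6, 7],
    "AG": [10, 10],
    "AAT": [1, 1, 8],
    "AAG": [5, 5],
    "ATC": [1, 6],
    "AAC": [1], "ACT": [2], "AGC": [2], "ACC": [2], "AGG": [2], "CCG": [1],
    "ATCC": [1], "ACCT": [1], "AGAT": [1], "AAAT": [1],
    "AAGAGG": [1],
}

# Total number of SSRs per motif length (1 = mono, 2 = di, ...).
class_totals = defaultdict(int)
for motif, counts in SSR_COUNTS.items():
    class_totals[len(motif)] += sum(counts)

total_ssrs = sum(class_totals.values())


def share_within_class(motif: str) -> float:
    """Percentage of SSRs of this motif among all SSRs of the same motif length."""
    return round(100 * sum(SSR_COUNTS[motif]) / class_totals[len(motif)], 2)


mono_shares = {m: share_within_class(m) for m in ("A", "C")}
di_shares = {m: share_within_class(m) for m in ("AT", "AG")}
```

The tallies reproduce the 151 SSRs and 71 mononucleotide SSRs quoted in the text, with 38 dinucleotide, 37 trinucleotide, four tetranucleotide, and one hexanucleotide SSR.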

Phylogenetic analysis

In plants, the cp genome is a valuable resource for exploring intra- and interspecific evolutionary histories [121-127]. Compared to nuclear genomes, the uniparental inheritance of chloroplasts (for exceptions, see [122, 128]) is systematically striking because a single, independent genealogical history can be readily obtained for developing hypotheses [129-131]. Moreover, in some land plants (a few flowering plant lineages and conifers), the chloroplast is paternally inherited and independent of the nuclear and mitochondrial genomes [132]. Recently, cp genomes have shown significant power in phylogenetic, evolutionary, and molecular systematics studies. During the last decade, various analyses have revealed the phylogenetic relationships at deep nodes based on comparisons of multiple protein-coding genes, intergenic spacers [133, 134], and complete chloroplast genome sequences [135], which have enhanced our understanding of the evolutionary relationships among angiosperms and gymnosperms. According to the most recent classification, the genus Pinus comprises approximately 110 species divided between two subgenera, Strobus and Pinus, which are further divided into sections [136]. Furthermore, some evolutionary hypotheses suggest that the subgenera Strobus and Pinus originated in the Eocene [137, 138], whereas others indicate that these subgenera were already present during the Cretaceous [138-140]. The subgenus Pinus has undergone significant distributional as well as environmental changes during its evolution, such as moving multiple times between America and Eurasia [140]. Chloroplast DNA polymorphisms in P. taeda have been used in numerous studies to assess paternal lineages and cytoplasmic diversity [141-146]. Continued efforts have expanded our ability to differentiate and understand the genomic structure and phylogenetic relationships of Pinus species [147].
The phylogeny and taxonomy of Pinus species have largely relied on chloroplast markers [140, 148, 149]. However, compared to nuclear genes, these markers are linked and offer independent information on species phylogeny. Previously, a phylogenetic study of pine based on multiple nuclear genes was reported by Syring et al. [150], in which four low-copy nuclear loci were analyzed in 12 pine species and combined with internal transcribed spacers and chloroplast data. Various studies revealed that the addition of more genes increased the chance of improving the phylogenetic tree [151-153]. However, this does not resolve all phylogenetic problems [154, 155]. Complete genome sequencing provides detailed insight into an organism [43, 66, 156]. In this study, the phylogenetic position of P. taeda within the genus Pinus was established by employing the complete cp genome and 60 shared genes of 16 species. Phylogenetic analyses using Bayesian inference, maximum parsimony, maximum likelihood, and neighbor-joining methods were performed. The phylogenetic analysis revealed that the complete dataset and 60 shared genes of P. taeda contained the same phylogenetic signals. In the datasets for the genome and 60 shared genes, P. taeda formed a single clade with P. contorta with high Bayesian inference and bootstrap support using the four different methods (Fig 6 and S2 Fig). Moreover, tree topology confirmed the relationship inferred from the phylogenetic work previously conducted based on cp genomes [89, 141, 157], in which P. taeda was genetically similar to P. contorta. These results revealed good agreement with classical taxonomy, where similar concordance was observed in the cp genome and mitochondrial genome-based reconstructions of Pinus phylogeny [136, 140]. Furthermore, these results are in broad agreement with previous results reported by Niu et al., where P. taeda formed a single clade with P. contorta based on pairwise non-synonymous substitution rates of orthologous transcripts [158]. Additionally, the results suggest that there is no conflict between the entire genome dataset and 60 shared genes in these cp genomes.
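
The statement that two datasets yield "the same topology" can be made concrete by comparing the sets of clades the two rooted trees contain. The sketch below does this for trees written as nested tuples. Only the sister relationship of P. taeda and P. contorta is taken from the text; the placement of the other two species, the tree shapes, and all names in the code are toy assumptions, not the trees of Fig 6.

```python
def clades(tree):
    """Return the set of clades (frozensets of leaf names) of a rooted tree
    given as nested tuples, with strings as leaves."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


# Toy trees: same groupings, written with children in a different order.
genome_tree = ((("P. taeda", "P. contorta"), "P. thunbergii"), "P. monophylla")
gene_tree = ("P. monophylla", ("P. thunbergii", ("P. contorta", "P. taeda")))
# A toy tree with a different grouping, for contrast.
other_tree = ((("P. taeda", "P. thunbergii"), "P. contorta"), "P. monophylla")

same_topology = clades(genome_tree) == clades(gene_tree)
differs = clades(genome_tree) != clades(other_tree)
```

Comparing clade sets ignores branch lengths and support values, so it answers only whether the groupings agree, which is the sense in which the two datasets are said to agree here.
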
Fig 6

Phylogenetic trees of 15 Pinus species.

The entire genome dataset was analyzed using four different methods: Bayesian inference (BI), maximum parsimony (MP), maximum likelihood (ML), and neighbor-joining (NJ). Numbers above the branches represent bootstrap values in the MP, ML, and NJ trees and posterior probabilities in the BI trees, whereas the number below the branches represents branch length. The red dot represents the position of P. taeda (KY964286).

Conclusion

The current study determined the complete sequence of the P. taeda chloroplast genome (121,531 bp). The gene order and genome structure of P. taeda were similar to those of the cp genomes of other Pinus species. Furthermore, the distribution and location of repeat sequences were determined, and average pairwise sequence divergences among cp genomes of related species were identified. SSR, SNP, and phylogenetic analyses were performed on 16 Pinus species cp genomes. No major structural rearrangement of Pinus species cp genomes was observed. Phylogenetic analyses revealed that the dataset based on 60 shared genes and that of the entire genome generated trees with the same topologies regarding the placement of P. taeda. Such investigations are an essential source of information on the complete cp genome of P. taeda and related species, which can be used to facilitate biological study, identify species, and clarify taxonomic questions.

Primers used for gap closing and sequencing verification in Pinus taeda. (DOCX)

Indel and SNP analysis of plastid genomes from Pinus taeda and 15 other Pinus species. (XLSX)

Average pairwise distance of plastid sequences from Pinus taeda and 15 other Pinus species. (XLS)

Analysis of repeated sequences in Pinus taeda plastid genome.

Total forward, tandem, and palindromic repeat sequences in the genome and their length distributions. (TIF)

Phylogenetic trees were constructed for 15 species in the genus Pinus using different methods and the Bayesian tree is shown for the entire genome sequence.

Data for 60 shared genes were used with four different methods: Bayesian inference (BI), maximum parsimony (MP), maximum likelihood (ML), and neighbor-joining (NJ). Numbers above the branches represent bootstrap values in the MP, ML, and NJ trees and posterior probabilities in the BI trees. The red dot represents the position of P. taeda (KY964286). (TIF)