Literature DB >> 30285615

Transcriptome sequencing of Pinus kesiya var. langbianensis and comparative analysis in the Pinus phylogeny.

You-Jie Zhao1,2, Yong Cao2, Juan Wang3, Zhi Xiong4.   

Abstract

BACKGROUND: Pines are widely distributed in the Northern Hemisphere and have a long evolutionary history. The availability of transcriptome data has facilitated comparative transcriptomics for studying the evolutionary patterns associated with the different geographical distributions of species in the Pinus phylogeny.
RESULTS: The transcriptome of Pinus kesiya var. langbianensis was sequenced using the Illumina HiSeq 2000 platform, and a total of 68,881 unigenes were assembled by Trinity. Transcriptome sequences of another 12 conifer species were downloaded from public databases. All of the pairwise orthologues were identified by comparative transcriptome analysis in 13 conifer species, from which the rate of diversification was calculated and a phylogenetic tree inferred. All of the fast-evolving positive selection sequences were identified, and some salt-, drought-, and abscisic acid-resistance genes were discovered.
CONCLUSIONS: mRNA sequences of P. kesiya var. langbianensis were obtained by transcriptome sequencing, and a large number of simple sequence repeat and short nucleotide polymorphism loci were detected. These data can be used in molecular marker-assisted selected in pine breeding. Divergence times were estimated in the 13 conifer species using comparative transcriptomic analysis. A number of positive selection genes were found to be related to environmental factors. Salt- and abscisic acid-related genes exhibited different selection patterns between coastal and inland Pinus. Our findings help elucidate speciation patterns in the Pinus lineage.

Entities:  

Keywords:  Comparative transcriptomics; Pinus kesiya var. langbianensis; Pinus phylogeny; Transcriptome sequencing

Mesh:

Year:  2018        PMID: 30285615      PMCID: PMC6171231          DOI: 10.1186/s12864-018-5127-6

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

Pines (Pinus) are widely distributed in the Northern Hemisphere and are the largest extant genus of conifers, constituting the most important source of wood in the forestry industry [1, 2]. The 115 species of Pinus are divided into two subgenera [3] and possess a rich fossil record dating back as much as 130–140 million years ago (Mya) [4, 5]. Many studies have focused on this genus, particularly with regards to its phylogenetic relationships [6, 7] and the timing of diversification events [6, 8–10]. However, no study has examined how pines adapted to their varied ecological environments over evolutionary history. Transcriptome sequencing technology can obtain all of the RNA information of an organism at a point in time, thereby providing a large amount of information for molecular biology studies [11-13]. In particular, Illumina SOLEXA sequencing has been widely used for the excavation and discovery of functional genes. Transcriptome sequencing can also obtain a large number of single nucleotide polymorphism (SNP), simple sequence repeat (SSR), and other molecular markers [14, 15]. Molecular markers with good polymorphism can help researchers shorten the genetic breeding cycle [16]. As increasing numbers of species have been included in transcriptome sequencing projects, comparative transcriptomics has received greater attention from researchers [9, 17–19]. Comparative transcriptomics can elucidate the phylogenetic relationships of multiple species and can assess the functional differences between orthologous genes following divergence. The functional differences between orthologous genes constitute important evidence for studying patterns of evolution in different Pinus species from different geographical regions. Pinus species have adapted to different habitats, including low-latitude and high-latitude environments, and coastal and inland areas (Table 3). In Eurasia, P. sylvestris is mainly distributed in inland and high-latitude regions, whereas P. pinea, P. halepensis, and P. pinaster are coastal species distributed in the Mediterranean region and southern Europe. In North America, P. banksiana is distributed in the inland and high-latitude areas of North American and Canada, whereas P. taeda and P. palustris occur in the coastal areas of the southeastern United States. In Asia, P. tabuliformis is an inland and high-latitude Pinus species that occurs mainly in northern China.
Table 3

Transcriptome sequencing in 13 conifer species

Pinus/Picea spp.Main distributionData sourceNumber of UnigenesTotal length(bp)Mean length(bp)
P. kesiya Southwestern China southeastern AsianIllumina Sequencing68,88158,975,225856
P. tabuliformis Northern China northern KoreaNCBI SRA(SRR546476)10,2858,669,639843
P. sylvestris Middle and high latitudes of EurasiaPlantGDB73,60929,552,311401
P. pinea Mediterranean region southern EuropeNCBI SRA(SRR445497, SRR445498)11,40310,122,178888
P. halepensis Mediterranean regionNCBI SRA(SRR942848)72,02847,224,748656
P. pinaster Mediterranean regionPlantGDB15,64811,498,176735
P. contorta Western North AmericaPlantGDB13,57015,334,6001130
P. banksiana North American CanadaPlantGDB13,04014,756,1171132
P. taeda Southeastern United StatesTreegenesdb (v2.01)91,04675,408,866828
P. palustris Southeastern United StatesNCBI SRA(SRR065012)15,01314,111,464940
P. monticola Mountains of western United States and CanadaNCBI SRA(SRR1013836)86,23033,420,814388
P. lambertian Mountains of western North AmericaNCBI SRA(SRR064207)30,98133,041,7891067
Picea glauca Northern United States southern/central CanadaPlantGDB48,61954,962,8811130
Pinus kesiya var. langbianensis (P. kesiya var. langbianensis) is a coastal species in Asia, and it is mainly distributed in southwest China and southeast Asia. In this study, we performed transcriptome sequencing of P. kesiya var. langbianensis using the Illumina HiSeq2000 platform. Following sequence assembling and analysis, unigenes and some molecular markers in P. kesiya var. langbianensis were obtained. Comparative transcriptomics was subsequently used in 13 conifer species. A number of positive selection genes were determined to be related to environmental factors in the Pinus species from different geographical regions.

Results

Transcriptome sequencing and de novo assembly

The cDNA samples obtained from pine needles of P. kesiya var. langbianensis and other individuals were normalized to increase the sequencing efficiency of rare transcripts. A total of about 141 million reads with an average insert size of 200 bp were generated from the Illumina HiSeq 2000 platform. After cleaning and quality control, about 133 million high-quality reads were obtained with a total length of 11.99 Gb (Table 1). Cleaned reads were de novo assembled to 68,881 unigenes with a mean size of 821 bp, representing a total length of about 42.5 Mb and a mean size of 713 bp. More than half of the total assembly length of the unigenes was > 1402 bp (N50 = 1402). The coverage distribution revealed that most unigenes had a read-depth coverage of 20-fold, and 29% unigenes were up to 100-fold (Additional file 1: Figure S1).
Table 1

Transcriptome sequencing and assembly of P. kesiya var. langbianensis transcriptome

Sequencing resultsAssembly results
Total number of raw reads141,234,042Total number of unigenes68,881
Total number of clean reads133,236,316Total unigenes length (bp)56,537,842
Total clean Nucleotides (bp)11,991,268,440Contig N50 (bp)1402
Q20 (%)98.25%Mean length (bp)821
N content(%)0.01%Max length (bp)15,498
GC content (%)45.02%Min length (bp)200
Transcriptome sequencing and assembly of P. kesiya var. langbianensis transcriptome

Functional annotation of the transcriptome

A total of 48,035 (69.74%) P. kesiya var. langbianensis unigenes were significantly matched to known genes in the public databases (Table 2). Approximately 37,757 unigenes were aligned to the NCBI Non-redundant (Nr) protein database. We examined the taxonomic distribution of the BLASTX best hits group by genus. As a result, only 8482 unigenes had specific functional annotations, 28.18% were within Picea, and 7.99% were within Pinus (Fig. 1). About 23,491 unigenes (34.10%) were assigned to 42 functional groups based on Gene Ontology (GO) assignments (Additional file 2: Table S1), including biochemistry, growth, development, metabolism, apoptosis, and immune defense.
Table 2

Unigene annotation of P. kesiya var. langbianensis transcriptome

UnigenesNRNTSwiss-ProtKEGGCOGGOALL
68,88137,75742,86924,09321,43313,52823,49148,035
Fig. 1

Summary and taxonomic source of the BLASTX matches to unigenes. Number of unique best BLASTX matches of unigenes grouped by genus

Unigene annotation of P. kesiya var. langbianensis transcriptome Summary and taxonomic source of the BLASTX matches to unigenes. Number of unique best BLASTX matches of unigenes grouped by genus

Identification of SSRs and SNPs

A total of 2382 distinct loci were identified, and the incidences of different repeat types were determined (Fig. 2). Among the different classes of SSRs, the tri-nucleotide repeats were the most abundant (40.22%), followed by di-nucleotides (28.76%). Similar results were found in other conifer species, with the exception of P. pinaster (Additional file 2: Table S2). More than 94,545 SNPs were identified from the P. kesiya var. langbianensis transcriptome (Fig. 3). Among all of the SNPs, transitions (61.61%) were more frequent than transversions (38.39%). A and G were the most frequent SNPs (31.20%), and A and G were the second-most frequent (30.41%).
Fig. 2

Distribution of SSRs in the P. kesiya var. langbianensis transcriptome

Fig. 3

Distribution of SNPs in the P. kesiya var. langbianensis transcriptome

Distribution of SSRs in the P. kesiya var. langbianensis transcriptome Distribution of SNPs in the P. kesiya var. langbianensis transcriptome

Orthologue identification and functional characterization of 13 conifer species

Public transcriptome sequences were downloaded from the PlantGDB (/) and NCBI databases (Table 3). The average number of unigenes was about 42,335 in the 13 conifer species, and P. taeda had the largest number of unigenes (91,046), while P. tabuliformis had the least (10,285). The average unigene length mostly exceeded 656 bp, except for P. sylvestris and P. monticola. All of the pairwise orthologues were identified by comparative analysis between the 13 conifer species (Table 4). The results showed that P. taeda had the maximum average number (7571) of orthologous genes, whereas P. sylvestris had the minimum average number (1552). The highest number of orthologous genes (16,051) was found between P. kesiya var. langbianensis and P. halepensis, while the lowest number (727) was found between P. sylvestris and P. halepensis. One hundred and thirty shared orthologues were found in all 13 conifer species (Fig. 4). The orthologues were annotated with GO terms (Additional file 2: Table S3), and 119 orthologues were involved in biological processes, cellular components, and molecular functions, and the other 11 orthologues had unknown biological functions.
Table 4

Number and Ks peaks of orthologous genes in 13 conifer species

P. kesiya P. tabuliformis P. sylvestris P. pinea P. halepensis P. pinaster P. contorta P. banksiana P. taeda P. palustris P. monticola P. lambertian Picea glauca
P. kesiya
P. tabuliformis 6351/0.01
P. sylvestris 4331/0.01898/0.01
P. pinea 8112/0.034144/0.031153/0.02
P. halepensis 16,051/0.035498/0.03727/0.025839/0.03
P. pinaster 7561/0.023311/0.03944/0.023844/0.036419/0.03
P. contorta 8922/0.033953/0.031129/0.034392/0.046578/0.044011/0.03
P. banksiana 7784/0.033577/0.031374/0.033815/0.045642/0.043610/0.044996/0.01
P. taeda 13,493/0.044981/0.032624/0.036114/0.0414,088/0.046496/0.046836/0.015744/0.01
P. palustris 9311/0.033514/0.031092/0.034097/0.048425/0.044035/0.044321/0.013994/0.017551/0.01
P. monticola 7472/0.083538/0.081433/0.064481/0.085940/0.083790/0.094773/0.074769/0.064847/0.095087/0.08
P. lambertian 11,185/0.084477/0.081964/0.065233/0.088269/0.094783/0.085308/0.084722/0.088349/0.086428/0.087169/0.01
Picea glauca 13,691/0.144808/0.15959/0.146493/0.1512,041/0.155082/0.157999/0.147092/0.149724/0.156019/0.147710/0.159969/0.15
Fig. 4

Functional annotation and divergence between the orthologues of 12 pine and one spruce species. The heat map is based on the 130 putative orthologous transcripts of the 13 species. The orthologues were annotated to different functions based on GO terms (Additional file 3: File S1). Colors indicate similarity from yellow (highly similar) to red (weakly similar)

Transcriptome sequencing in 13 conifer species Number and Ks peaks of orthologous genes in 13 conifer species Functional annotation and divergence between the orthologues of 12 pine and one spruce species. The heat map is based on the 130 putative orthologous transcripts of the 13 species. The orthologues were annotated to different functions based on GO terms (Additional file 3: File S1). Colors indicate similarity from yellow (highly similar) to red (weakly similar)

Phylogenetic and divergence time analysis

The genetic distance of species is related to the synonymous mutation rate calculated by the orthologous genes, and thus the synonymous mutation rate was estimated for all of the pairs of orthologues in the 13 conifer species (Table 4). Pinus kesiya var. langbianensis and P. taeda yielded 13,493 orthologues with the synonymous substitutions per synonymous site (Ks) values exhibiting a normal distribution with a peak value of 0.04. The minimum Ks peak was detected in P. kesiya var. langbianensis (< 0.01) with P. tabuliformis and P. sylvestris, followed by a 0.03 Ks peak with P. banksiana and P. contorta, and a 0.08 Ks peak with P. monticola and P. lambertiana, and the maximum Ks peak of 0.15 with the outgroup Picea glauca (Fig. 5). The relationship reflected by the Ks distance of P. kesiya var. langbianensis corroborated the Pinus phylogeny [6, 7]. As observed in previous studies of six conifer species [9], P. tabuliformis had the same Ks peak of 0.03, 0.03, and 0.01 with P. taeda, P. contorta, and P. sylvestris, respectively.
Fig. 5

Distribution of Ks values of the orthologous pairs between P. kesiya var. langbianensis and others

Distribution of Ks values of the orthologous pairs between P. kesiya var. langbianensis and others Using Picea glauca as an outgroup species, the phylogenetic tree of Pinus was constructed based on the Ks peak distance matrix (Fig. 6). The phylogenetic relationships were highly consistent with the phylogenetic tree obtained from cpDNA and mtDNA analyses in previous studies [6, 7]. In the phylogenetic tree, the average Ks value was 0.146 between Pinus and Picea; 0.078 between subgenus Strobus and subgenus Pinus; and 0.035 between section Trifoliae and section Pinus. In the previous study of Picea sitchensis and P. taeda, it was calculated that the synonymous mutation rate of the molecular clock in genus Pinus was about 0.68*10− 9 substitutions/site/year, and the divergence time could be expressed as Ks*10− 6/0.68*10− 9. Thus, the divergence time was estimated at about 214 Mya (Triassic) between Pinus and Picea; about 115 Mya (Cretaceous) between subgenus Strobus and subgenus Pinus; and about 51 Mya (Paleogene) between section Trifoliae and section Pinus.
Fig. 6

Phylogenetic tree of 12 Pinus and one Picea species. The phylogram was derived using the pairwise synonymous substitution rates of the orthologous transcripts as a distance metric and the neighbor-joining method

Phylogenetic tree of 12 Pinus and one Picea species. The phylogram was derived using the pairwise synonymous substitution rates of the orthologous transcripts as a distance metric and the neighbor-joining method

Evolutionary pattern of Pinus spp. genes

The number of pairwise synonymous (Ks) and non-synonymous (Ka) substitutions per site can reflect the evolutionary patterns of species. Ka/Ks > 1 indicates that the gene has been involved in positive selection during evolution (Table 5). A number of positive selection genes were found to be related to environmental factors (Table 6).
Table 5

Number and contained function of positive selection genes in Genus Pinus

P.kesiya P.tabuliformis P.sylvestris P.pinea P.halepensis P.pinaster P.contorta P.banksiana P.taeda P.palustris P.monticola P.lambertian
P.kesiya
P.tabuliformis 2750/AS
P.sylvestris 1700/D422
P.pinea 949505267
P.halepensis 2173720/A154697
P.pinaster 12675612275731032
P.contorta 1169519299411720511
P.banksiana 960/A453338361/A603/A4441685/A
P.taeda 1823/S64869751217988871685/S1386/AS
P.palustris 1101444/A26832375449713351110/A2839
P.monticola 271189135177209195235266/A209215
P.lambertian 398154/A2001222212051561662962042688/D

A Abscisic acid stress, D Drought stress, S Salt stress

Table 6

Information of resistance genes involved in positive selection in Genus Pinus

FuntionNr AnnotationSpecie1Gene1Specie2Gene2Ka/KsKaKs
Salt stressNP_001105589.1 P. tabuliformis isotig03867 P. kesiya CL1082.Contig1_All> 10.00130
Salt stressNP_001105589.1 P. taeda PITA_000000350-RA P. banksiana PUT-177a-Pinus_banksiana-82134.38930.08730.0199
Salt stressNP_001105589.1 P. taeda PITA_000000350-RA P. kesiya CL1082.Contig1_All4.91340.09940.0202
Salt stressNP_001105589.1 P. taeda PITA_000000350-RA P. contorta PUT-175a-Pinus_contorta-55144.38930.08730.0199
Drought stressAAD37375.1 P. sylvestris PUT-187a-Pinus_sylvestris-52,858 P. kesiya Unigene23170_All1.00470.00830.0083
Drought stressAAD37375.1 P. lambertian isotig24007 P. monticola comp65821_c3> 10.00460
ABA stressNP_001105719.1 P. palustris isotig03955 P. banksiana PUT-177a-Pinus_banksiana-10,2541.20720.00540.0045
ABA stressNP_001105719.1 P. kesiya Unigene11406_All P. banksiana PUT-177a-Pinus_banksiana-10,2531.1660.02620.0224
ABA stressNP_001105719.1 P. contorta PUT-175a-Pinus_contorta-11,400 P. banksiana PUT-177a-Pinus_banksiana-10,2531.20.01620.0135
ABA stressNP_001105719.1 P. taeda PITA_000045763-RA P. banksiana PUT-177a-Pinus_banksiana-10,254> 10.0040
ABA stressNP_001105719.1 P. pinea isotig05514 P. banksiana PUT-177a-Pinus_banksiana-10,2531.01450.050.0493
ABA stressNP_001105719.1 P. halepensis comp18337_c1 P. banksiana PUT-177a-Pinus_banksiana-10,2531.20390.06010.0499

ABA is Abscisic acid. Sequences showed in Additional file 4: File S2

Number and contained function of positive selection genes in Genus Pinus A Abscisic acid stress, D Drought stress, S Salt stress Information of resistance genes involved in positive selection in Genus Pinus ABA is Abscisic acid. Sequences showed in Additional file 4: File S2 In section Pinus, 2750 positive selection genes were found between P. tabuliformis and P. kesiya var. langbianensis, including one salt stress-related gene (Nr annotation: NP_001105589.1) via the production of the NADP-dependent protein [20, 21]. In section Trifoliae, P. taeda shared 1386 and 1685 positive selection genes with P. banksiana and P. contorta, including the same salt stress-related gene (Nr annotation: NP_001105589.1). Furthermore, 1700 positive selection genes were identified between P. kesiya var. langbianensis and P. sylvestris, including a drought stress-related gene (Nr annotation: AAD37375.1) via the production of the peroxidase protein [22, 23]. In section Quinquefoliae, 2688 positive selection genes were discovered between P. lambertiana and P. monticola, including one drought stress-related gene (Nr annotation: AAD37375.1). In genus Pinus, P. banksiana shared 361, 603, 960, 1685, 1386, 1110, and 266 positive selection genes with P. pinea, P. halepensis, P. kesiya var. langbianensis, P. contorta, P. taeda, and P. palustris, and one gene related to abscisic acid stress (Nr annotation: NP_001104929.1) was commonly found in all of the pairs. The results indicate that salt and abscisic acid stress genes were obviously involved in positive selection between coastal and inland Pinus species.

Discussion

Paleogeographic events in the divergence of the Pinus phylogeny

Pinus and Picea diverged about 214 Mya at the end of the Triassic (252–201 Mya). Compared with previous studies [8, 9], the time between Pinus and Picea was older than the previous result of 174–190 Mya. At the beginning of the Triassic, only one supercontinent Pangea was present on earth, and the climate was singly warm and dry [24, 25]. The ancient continent was rifting during the late of Triassic with the climate becoming more diverse, during which Pinus and Picea diverged. The divergence time of the subgenera Strobus and Pinus was about 115 Mya, in the middle of the Cretaceous (145–66 Mya). Compared with previous studies, the time was between the previous results of 85 Mya and 128 Mya [6, 10]. During the Cretaceous period, the distribution of the land and sea changed rapidly as a result of tectonic activity in the Atlantic Ocean, and large-scale transgression occurred in the coastal areas (a transgressive event) [26]. The species of subgenus Strobus were mainly distributed in the coastal mountains of western North America. In the middle of Cretaceous period, broad shallow seas advanced across central North America. This transgressive event might reflect the geographical speciation of subgenus Strobus and shown to be consistent with the divergence time 115 Mya in our results. The divergence time of sections Pinus and Trifoliae was about 51 Mya during the Paleogene (66–23 Mya) [27]. There was a dispute over this time in previous studies [6, 8–10]. One suggests the divergence time of section Pinus and section Trifoliae was occurred in the beginning of Paleogene, and another suggests it was in the Late Cretaceous (~ 100–66 Ma). Our result supports the former conclusion, but older than the former’s result of 44–45 Mya [8, 9]. The divergence of sections Pinus and Trifoliae may be related to the separation of Laurasia [28] into Eurasia and Laurentia during the Paleocene (66–56 Mya) which is the first epoch of Paleogene.

Salt stress gene selection between coastal and inland Pinus

Paleogeography (http://www.scotese.com) shows the southern areas of North America had been covered by shallow water during the Paleogene. P. taeda is mainly distributed in the coastal areas of southeastern US, while P. banksiana is an inland Pinus species in the North America and Canada. The regressive event altered the habitat of P. taeda from shallow sea into land, which may explain the involvement of the salt stress gene in the selective evolution between P. taeda and inland P. banksiana. It might also explain the positive selection of the salt stress gene between coastal P. kesiya var. langbianensis (southwest China) and P. tabuliformis (inland of Northern China).

Abscisic acid stress gene selection between coastal and inland Pinus

By 50 Mya, the global climate departed from the hot and humid conditions and began a cooling and drying trend towards a series of ice age [29]. Most species of sections Pinus and Trifoliae had gone through this cool stage especially the high-latitude. Production of abscisic acid is increased by cold and drought and acts to help plants withstand these conditions [30-32]. Pinus banksiana occurs in the high-latitude inland regions of North America and Canada, which is possibly related to the positive selection of the abscisic acid gene in this species in comparison to the coastally distributed species P. taeda (southeastern US), P. palustris (southeastern US), and P. kesiya var. langbianensis (southwestern China, Southeast Asia).

Conclusions

In the present study, the phylograms and divergence times were estimated by comparative transcriptomic analysis in 13 conifer species. A number of positive selection genes were found to be associated with environmental factors. The divergence times suggest that plate movement and transgression events caused the geographical speciation which might be the key drives in the divergence of Pinus phylogeny. The analysis of selection evolution suggests salt and abscisic acid-related genes were involved in positive selection between coastal and inland Pinus species. These data are useful for evaluating the different evolutionary patterns between inland and coastal Pinus species. The study shows that tectonic plate movement, and transgression and regression events resulted in changes to the land and sea, and adaptive evolution may have played an important role in the divergence of the Pinus species.

Methods

RNA extraction and sequencing

Pine needle tissue, including the cambium, of P. kesiya var. langbianensis was collected from Puer City, Yunnan Province, China. The specimen was identified by Yong-chun Xu and Juan Wang, whose deposition numbers are 0000651 and 0000652 from the herbarium of Southwest Forestry University. Total RNA was isolated using RNAiso Plus (TaKaRa, Japan). RNA quality was characterized initially on an agarose gel and NanoDrop ND1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). Illumina sequencing based on a GAII platform was performed at the Beijing Genomics Institute (Shenzhen, China; http://www.genomics.cn), following the manufacturer’s protocols. A fragmentation buffer was added to interrupt the mRNA and thereby generate fragments in the size of range 200 bp. The resulting fragments served as a template for the synthesis of the first-strand cDNA, employing random hexamer primers (N6). Second-strand cDNA was synthesized using a SuperScript Double-Stranded cDNA Synthesis kit (Invitrogen, Camarillo, CA), after which it was purified using a QiaQuick PCR extraction kit (Qiagen, Hilden, Germany) and resolved with EB buffer for end repair and poly (A) addition. The products were ligated with one another using sequencing adapters, and a suitable fragment size range was selected for PCR amplification following agarose gel electrophoresis. The resulting library was sequenced using an Illumina HiSeqTM 2000 platform.

Data filtering and de novo assembly

Image data output from the sequencing device were transformed into raw reads and stored in FASTQ format. These data were filtered to remove raw reads that included adapter sequences or those that were of low quality. Transcriptome assembly was achieved using the short-read assembly program Trinity [33]. The unigenes were divided into either clusters or singletons. BLASTX [34] alignment between each unigene sequence and those lodged in the Nr, Nt (Nucleotide database, NCBI), Swiss-Prot, GO (http://www.geneontology.org/), and COG (clusters of orthologous groups) databases were performed, and the best alignments were used to infer the directionality of the unigene. Where the outcomes from the various databases conflicted with one another, the priority order applied was: Nr, Swiss-Prot, and COG. Where no alignment was possible, the software tool ESTScan [35] was used to assign directionality.

Gene annotation and analysis

Functional annotation was assigned using the protein (Nr and Swiss-Prot), COG, and GO databases. BLASTX was employed to identify related sequences in the protein databases. The COG database attempts to classify proteins from completely sequenced genomes on the basis of the orthology concept [36]. The aim of GO is to standardize the representation of genes and their products by insisting on a controlled vocabulary and a strictly defined concept [37, 38]. The annotations acquired from Nr were processed through the Blast2GO program [39] to obtain the relevant GO terms, and these were then analyzed by WEGO software [40] to assign a GO functional classification and to illustrate the distribution of the gene functions. Unigenes containing putative SSRs and SNPs were identified by MISA and SOAPsnp [41] software. Mono- to hexa-nucleotide SSRs with a minimum repeat unit size of 12 (for mono-), 6 (for di-), 5 (for tri- and tetra-), and 4 (for penta- and hexa-) were identified based on the analysis of the assembled P. kesiya unigenes.

Identification of orthologues between the 13 conifer species

In order to discover the evolutionary patterns of orthologues in Pinus, transcriptome sequences of 11 Pinus and Picea glauca (outgroup) accessions were downloaded from the PlantGDB and NCBI databases (Table 3). Pinus banksiana, P. contorta, P. sylvestris, P. pinaster, and Picea glauca were directly derived from PlantGDB. Pinus taeda was obtained from the NCBI genome database [42, 43]. Pinus monticola, P. lambertiana, P. pinea, P. halepensis, P. tabuliformis, and P. palustris were obtained from the NCBI SRA database. The 454 RNA dataset was assembled using Newbler (http://roche-applied-science.com/) software, and the Illumina RNA dataset was assembled by Trinity software. The assembled sequences were combined and clustered with CD-HIT (version 4.0) [44, 45]. Sequences with similarity > 95% were divided into one class, and the longest sequence of each class was treated as a unigene during later processing. The transcribed sequences were clustered using UCLUST software [46]. Aligned sequences showing 90% identity were defined as pairs of putative orthologues among the 13 species. The best-hit sequence of each cluster was then used in the subsequent analyses.

Estimation at the synonymous substitution and non-synonymous substitution levels between orthologues

Since unigenes are derived from expressed sequence tag (EST) sequences, have no annotated open reading frames, and may contain frame shift sequencing errors, each member of a pair of sequences was searched using BLASTX against all of the plant protein sequences available in GenBank. The approach used was as described previously [47]. PAML (http://abacus.gene.ucl.ac.uk/software/paml.html) software was used to estimate the non-synonymous substitutions per non-synonymous site (Ka) and the synonymous substitutions per synonymous site (Ks) [48].

Phylogenetic analysis

As the phylogenetic relationships in Pinus are well understood [6, 7], the precise topology is not critical for the purposes of this study. We chose to focus our analyses on the evolutionary patterns and rate of genetic divergence. The synonymous substitutions and non-synonymous substitutions between the orthologues of the 13 conifer species were analyzed as described previously. Phylograms were derived using the pairwise Ks values of the orthologous transcripts as a distance metric based on the neighbor-joining method [49]. Picea glauca was used as an outgroup to root the trees. The distance of two branches was estimated based on the average of all of the pairwise Ks values in two branches of the phylogenetic tree. Figure S1. Distribution of unigene length and depth. (TIFF 7129 kb) Table S1. Gene Ontology (GO) distributions for P. kesiya var. langbianensis. Table S2. SSR markers identified in 13 conifer species. Table S3. Gene Ontology (GO) distributions for shared orthologues in 13 conifer species. (XLSX 16 kb) File S1. Go annotation of 130 shared unigenes (PDF 17 kb) File S2. Sequences of resistance unigenes (PDF 14 kb)
  37 in total

1.  Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.

Authors:  Weizhong Li; Adam Godzik
Journal:  Bioinformatics       Date:  2006-05-26       Impact factor: 6.937

2.  PAML 4: phylogenetic analysis by maximum likelihood.

Authors:  Ziheng Yang
Journal:  Mol Biol Evol       Date:  2007-05-04       Impact factor: 16.240

3.  Primary carbon metabolism-related changes in cucumber exposed to single and sequential treatments with salt stress and bacterial infection.

Authors:  Joanna Chojak-Koźniewska; Elżbieta Kuźniak; Anna Linkiewicz; Sławomir Sowa
Journal:  Plant Physiol Biochem       Date:  2017-12-07       Impact factor: 4.270

4.  Next-generation sequencing of the Chrysanthemum nankingense (Asteraceae) transcriptome permits large-scale unigene assembly and SSR marker discovery.

Authors:  Haibin Wang; Jiafu Jiang; Sumei Chen; Xiangyu Qi; Hui Peng; Pirui Li; Aiping Song; Zhiyong Guan; Weimin Fang; Yuan Liao; Fadi Chen
Journal:  PLoS One       Date:  2013-04-23       Impact factor: 3.240

5.  EuroPineDB: a high-coverage web database for maritime pine transcriptome.

Authors:  Noé Fernández-Pozo; Javier Canales; Darío Guerrero-Fernández; David P Villalobos; Sara M Díaz-Moreno; Rocío Bautista; Arantxa Flores-Monterroso; M Ángeles Guevara; Pedro Perdiguero; Carmen Collada; M Teresa Cervera; Alvaro Soto; Ricardo Ordás; Francisco R Cantón; Concepción Avila; Francisco M Cánovas; M Gonzalo Claros
Journal:  BMC Genomics       Date:  2011-07-15       Impact factor: 3.969

6.  Transcriptome-wide identification and characterization of miRNAs from Pinus densata.

Authors:  Li-Chuan Wan; Haiyan Zhang; Shanfa Lu; Liang Zhang; Zongbo Qiu; Yuanyuan Zhao; Qing-Yin Zeng; Jinxing Lin
Journal:  BMC Genomics       Date:  2012-04-06       Impact factor: 3.969

7.  WEGO: a web tool for plotting GO annotations.

Authors:  Jia Ye; Lin Fang; Hongkun Zheng; Yong Zhang; Jie Chen; Zengjin Zhang; Jing Wang; Shengting Li; Ruiqiang Li; Lars Bolund; Jun Wang
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

8.  Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Authors:  Manfred G Grabherr; Brian J Haas; Moran Yassour; Joshua Z Levin; Dawn A Thompson; Ido Amit; Xian Adiconis; Lin Fan; Raktima Raychowdhury; Qiandong Zeng; Zehua Chen; Evan Mauceli; Nir Hacohen; Andreas Gnirke; Nicholas Rhind; Federica di Palma; Bruce W Birren; Chad Nusbaum; Kerstin Lindblad-Toh; Nir Friedman; Aviv Regev
Journal:  Nat Biotechnol       Date:  2011-05-15       Impact factor: 54.908

9.  Sequencing and assembly of the 22-gb loblolly pine genome.

Authors:  Aleksey Zimin; Kristian A Stevens; Marc W Crepeau; Ann Holtz-Morris; Maxim Koriabine; Guillaume Marçais; Daniela Puiu; Michael Roberts; Jill L Wegrzyn; Pieter J de Jong; David B Neale; Steven L Salzberg; James A Yorke; Charles H Langley
Journal:  Genetics       Date:  2014-03       Impact factor: 4.562

10.  Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies.

Authors:  David B Neale; Jill L Wegrzyn; Kristian A Stevens; Aleksey V Zimin; Daniela Puiu; Marc W Crepeau; Charis Cardeno; Maxim Koriabine; Ann E Holtz-Morris; John D Liechty; Pedro J Martínez-García; Hans A Vasquez-Gross; Brian Y Lin; Jacob J Zieve; William M Dougherty; Sara Fuentes-Soriano; Le-Shin Wu; Don Gilbert; Guillaume Marçais; Michael Roberts; Carson Holt; Mark Yandell; John M Davis; Katherine E Smith; Jeffrey F D Dean; W Walter Lorenz; Ross W Whetten; Ronald Sederoff; Nicholas Wheeler; Patrick E McGuire; Doreen Main; Carol A Loopstra; Keithanne Mockaitis; Pieter J deJong; James A Yorke; Steven L Salzberg; Charles H Langley
Journal:  Genome Biol       Date:  2014-03-04       Impact factor: 13.583

View more
  7 in total

1.  Applications of machine learning in pine nuts classification.

Authors:  Biaosheng Huang; Jiang Liu; Junying Jiao; Jing Lu; Danjv Lv; Jiawei Mao; Youjie Zhao; Yan Zhang
Journal:  Sci Rep       Date:  2022-05-25       Impact factor: 4.996

2.  Comparative Transcriptomics and Metabolites Analysis of Two Closely Related Euphorbia Species Reveal Environmental Adaptation Mechanism and Active Ingredients Difference.

Authors:  Han Zheng; Mu-Yao Yu; Yang Han; Badalahu Tai; Sheng-Fa Ni; Rui-Feng Ji; Chun-Juan Pu; Kang Chen; Fu-Quan Li; Hua Xiao; Ye Shen; Xiu-Teng Zhou; Lu-Qi Huang
Journal:  Front Plant Sci       Date:  2022-05-31       Impact factor: 6.627

3.  The complete chloroplast genome sequence of Pinus kesiya var. langbianensis.

Authors:  Yi Wang; Xiaolong Yuan; Wei Chen; Jiang Li
Journal:  Mitochondrial DNA B Resour       Date:  2019-07-11       Impact factor: 0.658

4.  Inferring the Phylogenetic Positions of Two Fig Wasp Subfamilies of Epichrysomallinae and Sycophaginae Using Transcriptomes and Mitochondrial Data.

Authors:  Dan Zhao; Zhaozhe Xin; Hongxia Hou; Yi Zhou; Jianxia Wang; Jinhua Xiao; Dawei Huang
Journal:  Life (Basel)       Date:  2021-01-11

5.  Morphological and Comparative Transcriptome Analysis of Three Species of Five-Needle Pines: Insights Into Phenotypic Evolution and Phylogeny.

Authors:  Xiang Li; Kewei Cai; Qiushuang Zhao; Hanxi Li; Xuelai Wang; Mulualem Tigabu; Ronald Sederoff; Wenjun Ma; Xiyang Zhao
Journal:  Front Plant Sci       Date:  2022-02-10       Impact factor: 5.753

Review 6.  Pharmaceutical resource discovery from traditional medicinal plants: Pharmacophylogeny and pharmacophylogenomics.

Authors:  Da-Cheng Hao; Pei-Gen Xiao
Journal:  Chin Herb Med       Date:  2020-03-12

7.  Comprehensive collection of genes and comparative analysis of full-length transcriptome sequences from Japanese larch (Larix kaempferi) and Kuril larch (Larix gmelinii var. japonica).

Authors:  Kentaro Mishima; Hideki Hirakawa; Taiichi Iki; Yoko Fukuda; Tomonori Hirao; Akira Tamura; Makoto Takahashi
Journal:  BMC Plant Biol       Date:  2022-10-04       Impact factor: 5.260

  7 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.