Ethan A G Baker, Jill L Wegrzyn, Uzay U Sezen, Taylor Falk, Patricia E Maloney, Detlev R Vogler, Annette Delfino-Mix, Camille Jensen, Jeffry Mitton, Jessica Wright, Brian Knaus, Hardeep Rai, Richard Cronn, Daniel Gonzalez-Ibeas, Hans A Vasquez-Gross, Randi A Famula, Jun-Jun Liu, Lara M Kueppers, David B Neale.
Abstract
Conifers are the dominant plant species throughout the high-latitude boreal forests as well as some lower-latitude temperate forests of North America, Europe, and Asia. As such, they play an integral economic and ecological role across much of the world. This study focused on the characterization of needle transcriptomes from four ecologically important and understudied North American white pines within the Pinus subgenus Strobus. The populations of many Strobus species are challenged by native and introduced pathogens, native insects, and abiotic factors. RNA from the needles of western white pine (Pinus monticola), limber pine (Pinus flexilis), whitebark pine (Pinus albicaulis), and sugar pine (Pinus lambertiana) was sampled, sequenced with Illumina short reads, and assembled de novo. The assembled transcripts and their subsequent structural and functional annotations were processed through custom pipelines to address the challenges of validating transcriptomes from non-model organisms. Orthologous gene family analysis of over 58,000 translated transcripts, implemented through Tribe-MCL, estimated the shared and unique gene space among the four species. This revealed 2,025 gene families conserved across all four species, of which 408 were aligned to estimate levels of divergence and reveal patterns of selection. Specific candidate genes previously associated with drought tolerance and white pine blister rust resistance in conifers were investigated.
Keywords: Pinus albicaulis; Pinus flexilis; Pinus lambertiana; Pinus monticola; de novo transcriptome assembly; five-needle pines; white pines; wpbr
Year: 2018 PMID: 29559535 PMCID: PMC5940140 DOI: 10.1534/g3.118.200257
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
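The abstract notes that gene families were estimated with Tribe-MCL, which applies the Markov Cluster (MCL) algorithm to a protein similarity graph. As a rough illustration only, the pure-Python sketch below implements the MCL core (alternating expansion and inflation on a column-stochastic matrix) on a hypothetical toy graph. The inflation value of 2.0, the convergence tolerance, and the unweighted edges are assumptions; Tribe-MCL's own edge weighting from all-against-all sequence similarity is not reproduced.

```python
def normalize_columns(m):
    """Scale each column to sum to 1, giving a column-stochastic matrix."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def matmul(a, b):
    """Plain square-matrix product on lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Return clusters (sorted lists of node indices) found by Markov clustering."""
    n = len(similarity)
    # Self-loops keep some probability mass on each node, as is customary for MCL.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow along walks of length 2
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])  # inflation
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # At convergence, each attractor row holds nonzero values in the columns of its cluster.
    clusters, assigned = [], set()
    for i in range(n):
        members = {j for j in range(n) if m[i][j] > 1e-6}
        if members and not members & assigned:
            clusters.append(sorted(members))
            assigned |= members
    return clusters


# Hypothetical toy similarity graph: proteins 0-2 form one family, proteins 3-4 another.
sim = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(mcl(sim))  # [[0, 1, 2], [3, 4]]
```

Raising the inflation parameter generally yields finer, smaller clusters; lowering it merges more loosely connected proteins into the same family.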
Figure 1. White Pine Range Map and Plant Material Source Locations. Shading indicates typical habitat in western North America for the indicated white pine species. Points indicate sampling sites (or common garden sources) for the needle tissue used in sequencing.
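Summary of Illumina short read sequencing for the four white pines (PE, paired-end; SE, single-end)
| Species | Tissue | Platform | Read type, length | Raw reads | Reads after QC |
|---|---|---|---|---|---|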
| western white pine | Needle | Illumina GA IIx | PE, 76bp | 398,534,772 | 208,059,003 |
| whitebark pine | Needle | Illumina HiSeq | PE, 100bp | 1,257,388,110 | 839,389,034 |
| sugar pine | Needle | Illumina GA IIx | SE, 80bp | 91,223,401 | 66,894,169 |
| limber pine | Needle | Illumina HiSeq | PE, 100bp | 669,904,522 | 374,191,816 |
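Summary of Trinity de novo transcriptome assembly statistics (WWP, western white pine; WBP, whitebark pine; SP, sugar pine; LP, limber pine)
| Species | Transcripts | Genes | N50 (bp) | Median length (bp) | Mean length (bp) | Total assembled bases | N50, longest isoform (bp) | Median, longest isoform (bp) | Mean, longest isoform (bp) |
|---|---|---|---|---|---|---|---|---|---|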
| WWP | 60,458 | 49,964 | 1353 | 465 | 800.46 | 48,394,102 | 1271 | 412 | 741.58 |
| WBP | 146,063 | 145,987 | 1468 | 369 | 751.68 | 109,792,255 | 1468 | 369 | 751.57 |
| SP | 53,821 | 33,533 | 1321 | 651 | 945.83 | 50,905,778 | 1312 | 615 | 922.39 |
| LP | 51,694 | 51,684 | 1067 | 559 | 819.16 | 42,345,743 | 1067 | 559 | 819.09 |
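The assembly table summarizes the distribution of transcript lengths. The reported values are consistent with the mean being total assembled bases divided by the number of transcripts: for WWP, 48,394,102 / 60,458 ≈ 800.46 bp. The sketch below shows how N50, median, and mean can be computed from a list of transcript lengths; the toy lengths are invented, and this is an illustration rather than the Trinity utility that produced the table.

```python
from statistics import median


def n50(lengths):
    """Length L such that transcripts of length >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("no transcript lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def assembly_stats(lengths):
    """Summary statistics of an assembly, from a list of transcript lengths in bp."""
    total = sum(lengths)
    return {
        "transcripts": len(lengths),
        "n50": n50(lengths),
        "median": median(lengths),
        "mean": round(total / len(lengths), 2),
        "total_bases": total,
    }


print(assembly_stats([300, 500, 800, 1200, 2000]))
# {'transcripts': 5, 'n50': 1200, 'median': 800, 'mean': 960.0, 'total_bases': 4800}
```

Because N50 weights transcripts by their length, it sits above the median whenever a minority of long transcripts carries most of the assembled bases, as in every row of the table.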
Summary of assembled transcripts annotated as partial and full-length unigenes
| WBP | 23,932 | 2915 | 1722 | 1227 | 43.88 | 23,862 | 16,107 | 2769 | 1821 | 1323 | 43.96 |
| WWP | 10,534 | 2055 | 1473 | 1107 | 44.13 | 10,494 | 4,735 | 2040 | 1500 | 1161 | 44.63 |
| LP | 14,288 | 1941 | 1362 | 1020 | 44 | 14,238 | 5,000 | 2157 | 1530 | 1161 | 44.45 |
| SP | 10,395 | 2088 | 1464 | 1086 | 44.04 | 9,362 | 3,927 | 2223 | 1572 | 1188 | 44.4 |
Figure 3. Annotation Quality Summary. Normalized quantity of assembled transcripts classified by the EnTAP annotation pipeline for each species as uninformative (significant alignment that does not contain a descriptive function), informative (significant alignment that does contain a descriptive function), or unknown (no significant alignment observed).
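The three annotation categories in Figure 3 depend on whether a transcript has a significant hit and whether that hit's description names a function. The sketch below bins best-hit descriptions into those categories. The keyword list that marks a description as uninformative is a hypothetical stand-in, not EnTAP's own configuration, and the input (one description string or None per transcript) is an assumed format.

```python
from collections import Counter
from typing import Optional

# Hypothetical keywords marking a hit description as carrying no functional information.
UNINFORMATIVE_TERMS = ("uncharacterized", "hypothetical", "unknown", "predicted protein", "unnamed")


def classify(best_hit_description: Optional[str]) -> str:
    """Return 'unknown', 'uninformative', or 'informative' for one transcript's best hit."""
    if best_hit_description is None:
        return "unknown"  # no significant alignment
    text = best_hit_description.lower()
    if any(term in text for term in UNINFORMATIVE_TERMS):
        return "uninformative"
    return "informative"


def normalized_summary(descriptions):
    """Fraction of transcripts in each category (the normalization used for plotting)."""
    counts = Counter(classify(d) for d in descriptions)
    total = sum(counts.values())
    return {label: counts[label] / total for label in ("uninformative", "informative", "unknown")}


hits = ["arginine decarboxylase", "PREDICTED: uncharacterized protein LOC104591536", None, "protein notum homolog"]
print(normalized_summary(hits))
# {'uninformative': 0.25, 'informative': 0.5, 'unknown': 0.25}
```

Normalizing by each species' transcript count lets assemblies of very different sizes (for example, WBP versus SP) be compared on the same axis.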
Figure 2. Functional Annotation by Species Homology. A: Top five species sharing annotations with the white pine transcriptome assemblies based on the NCBI curated plant full-length protein database, as derived from USEARCH results. B: Top five species sharing annotations with the white pine transcriptome assemblies based on the NCBI RefSeq database, as derived from USEARCH results.
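Ranking the top species in Figure 2 reduces to tallying which species supplied each transcript's best hit. The sketch below assumes the USEARCH output has already been parsed down to one species name (or None for no hit) per transcript; that parsing step and the placeholder species names are hypothetical.

```python
from collections import Counter


def top_species(best_hit_species, k=5):
    """Tally best-hit species (skipping transcripts with no hit) and return the k most common."""
    return Counter(s for s in best_hit_species if s).most_common(k)


# Hypothetical toy input: one entry per transcript; None means no significant hit.
hits = ["species_a", "species_b", "species_a", None, "species_c", "species_a", "species_b"]
print(top_species(hits, k=2))  # [('species_a', 3), ('species_b', 2)]
```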
Figure 4. Transcripts Aligned to Conifer Reference Genomes. A: Percent of assembled transcripts by species mapping back to the loblolly pine reference genome at 90% identity/90% coverage and 98% identity/90% coverage. B: Percent of assembled transcripts by species mapping back to the sugar pine reference genome at 90% identity/90% coverage and 98% identity/90% coverage.
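Figure 4 counts a transcript as mapped only when an alignment passes both an identity and a coverage threshold. The sketch below applies that double filter, assuming a hypothetical per-alignment record with matched bases, aligned length, transcript bases covered, and transcript length; the aligner and its real output fields are not specified here, and identity and coverage are defined as simple ratios for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    transcript_id: str
    matches: int            # identical bases in the aligned region
    aligned_length: int     # length of the aligned region
    query_aligned: int      # transcript bases covered by the alignment
    transcript_length: int  # full transcript length


def passes(aln, min_identity, min_coverage):
    """True if the alignment meets both the identity and the coverage threshold."""
    identity = aln.matches / aln.aligned_length
    coverage = aln.query_aligned / aln.transcript_length
    return identity >= min_identity and coverage >= min_coverage


def percent_mapped(alignments, total_transcripts, min_identity, min_coverage):
    """Percent of transcripts with at least one alignment passing both thresholds."""
    mapped = {a.transcript_id for a in alignments if passes(a, min_identity, min_coverage)}
    return 100.0 * len(mapped) / total_transcripts


alns = [
    Alignment("t1", matches=990, aligned_length=1000, query_aligned=950, transcript_length=1000),
    Alignment("t2", matches=930, aligned_length=1000, query_aligned=1000, transcript_length=1000),
    Alignment("t3", matches=1000, aligned_length=1000, query_aligned=500, transcript_length=1000),
]
# Four transcripts in total; a fourth transcript (t4) has no alignment at all.
print(percent_mapped(alns, 4, 0.90, 0.90))  # 50.0
print(percent_mapped(alns, 4, 0.98, 0.90))  # 25.0
```

Raising identity from 90% to 98% while holding coverage at 90% can only shrink the mapped set, which is why each species' 98% bar cannot exceed its 90% bar.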
Figure 5. Orthologous Gene Families. Tribe-MCL evaluation of a total of 58,148 translated transcripts reveals shared and unique gene families. Integers in the Venn diagram indicate the number of gene families found in each combination of the four white pine species. A total of 2,025 gene families were conserved across all four species.
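Each gene family lands in exactly one region of a Venn diagram like Figure 5: the region for the precise set of species contributing members to it. Given a hypothetical mapping from family ID to the species present in that family, the region counts follow directly, as in this sketch (the toy memberships are invented).

```python
from collections import Counter


def venn_regions(families):
    """Count families per exact species combination (each family lands in one Venn region)."""
    return Counter(frozenset(species) for species in families.values())


ALL_FOUR = frozenset({"WWP", "WBP", "SP", "LP"})

# Hypothetical toy membership: family ID -> species with at least one member in that family.
families = {
    "fam1": {"WWP", "WBP", "SP", "LP"},
    "fam2": {"WWP", "WBP", "SP", "LP"},
    "fam3": {"WBP", "LP"},
    "fam4": {"SP"},
}
regions = venn_regions(families)
print(regions[ALL_FOUR])           # 2 families conserved in all four species
print(regions[frozenset({"SP"})])  # 1 family unique to SP
```

Because the regions partition the families, the counts across all regions sum to the total number of families clustered.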
Figure 6. Distribution of dN and dS. A: Pairwise alignments across the four species for 408 gene families. Values of dN/dS (the ratio of nonsynonymous to synonymous substitution rates) above 1, indicating positive selection, are shown above the dashed line. Candidate genes previously associated with drought tolerance/aridity and rust resistance are highlighted. B: dN/dS values averaged across all species for each of the 408 gene families, with values above 1 (positive selection) again shown above the dashed line and the same candidate genes highlighted.
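Panel B summarizes each gene family by one averaged dN/dS value and reads values above 1 as signs of positive selection. The sketch below shows only that summarizing step, assuming pairwise (dN, dS) estimates are already available; the estimation method is not specified here, averaging the per-pair ratios (rather than taking a ratio of averages) is an assumption, and pairs with dS = 0 are skipped because the ratio is undefined. The family names and values are invented.

```python
from statistics import mean


def family_omega(pairs):
    """Mean dN/dS over species pairs; pairs with dS == 0 are skipped (ratio undefined)."""
    ratios = [dn / ds for dn, ds in pairs if ds > 0]
    return mean(ratios) if ratios else None


def positively_selected(families, threshold=1.0):
    """Names of families whose averaged dN/dS lies above the threshold (the dashed line)."""
    selected = []
    for name, pairs in families.items():
        omega = family_omega(pairs)
        if omega is not None and omega > threshold:
            selected.append(name)
    return sorted(selected)


# Hypothetical toy values: family -> list of (dN, dS) estimates, one per species pair.
families = {
    "fam_a": [(0.02, 0.01), (0.03, 0.02)],
    "fam_b": [(0.01, 0.05), (0.02, 0.04)],
    "fam_c": [(0.01, 0.0)],
}
print(positively_selected(families))  # ['fam_a']
```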
Summary of conserved gene families under positive selection
| Gene family annotation | Length (bp) | Gene Ontology terms |
|---|---|---|
| formyltetrahydrofolate deformylase mitochondrial isoform x1 | 1053 | formyltetrahydrofolate deformylase activity; amino acid binding; hydroxymethyl-, formyl- and related transferase activity |
| f-box kelch-repeat protein skip6-like | 1122 | protein degradation tagging activity |
| low quality protein: nitrate reductase | 2778 | oxidoreductase activity; metal ion binding; organic cyclic compound binding; heterocyclic compound binding |
| arogenate dehydrogenase chloroplastic | 1278 | prephenate dehydrogenase activity |
| carrier protein chloroplastic | 2283 | ATP:ADP antiporter activity; ATP binding |
| flowering time control protein fpa | 3279 | |
| transcription initiation factor tfiid subunit partial | 1818 | |
| e3 ubiquitin-protein ligase keg isoform x2 | 4917 | protein degradation tagging activity |
| PREDICTED: uncharacterized protein LOC18435046 | 1524 | |
| two-component response regulator-like prr37 | 2856 | |
| isoamylase chloroplastic | 2769 | |
| probable u3 small nucleolar rna-associated protein 7 | 1623 | 18S ribosomal rna processing |
| PREDICTED: uncharacterized protein LOC103493568 | 1407 | metal ion binding; sequence-specific DNA binding transcription factor activity |
| calcium-transporting atpase plasma membrane-type-like isoform x1 | 3192 | calcium-transporting ATPase activity; calmodulin binding; ATP binding; metal ion binding |
| protein notum homolog | 1263 | |
| PREDICTED: uncharacterized protein LOC104607701 | 1380 | |
| clathrin assembly protein at5g35200 | 1644 | 1-phosphatidylinositol binding; clathrin binding |
| PREDICTED: kanadaptin | 2274 | |
| family 18 glycoside hydrolase | 1236 | chitinase activity; chitin binding |
| dead-box atp-dependent rna helicase 13 | 1176 | |
| probable inactive purple acid phosphatase 27 | 1977 | acid phosphatase activity; metal ion binding; dephosphorylation |
| arginine decarboxylase | 2283 | carboxy-lyase activity |
| PREDICTED: uncharacterized protein LOC104591536 | 1626 | |
| erythronate-4-phosphate dehydrogenase-like protein | 975 | |
| interferon-induced guanylate-binding protein 2-like | 3207 | GTPase activity; GTP binding |
| nf-x1-type zinc finger protein nfxl1 | 4290 | metal ion binding |
| transmembrane protein 87b-like | 1560 | |
| fructokinase-like chloroplastic | 1644 | kinase activity; phosphotransferase activity, alcohol group as acceptor |
| bel1-like homeodomain protein 1 | 2517 | DNA binding |
| cbs domain-containing protein cbsx6 | 1311 | |
| duf21 domain-containing protein at4g14240 | 1623 | |
| probable wrky transcription factor 14 | 1431 | |
| myeloid leukemia factor 1-like isoform x2 | 1059 | |
| mannose-1-phosphate guanylyltransferase 1 | 948 | |
| phytoene synthase chloroplastic | 1308 | geranylgeranyl-diphosphate geranylgeranyltransferase activity; phytoene synthase activity |
| unknown | 336 | |
| PREDICTED: myosin-10-like | 1803 | |
| universal stress protein a-like protein | 366 | |
| PREDICTED: uncharacterized protein LOC104602728 | 966 | |
Figure 7. Alignment of FBK-SKiP6-like proteins estimated to be under positive selection and associated with the white pine blister rust (WPBR) response in conifers. White pine FBK-SKiP6-like proteins are aligned against sequences available for Sitka spruce (Picea sitchensis) and Arabidopsis. In the alignment, red regions are 100% conserved residues, yellow regions are conserved at 70% identity or greater, and black regions represent similar residues. Secondary structure is based on the WWP protein model generated by I-TASSER. The alignment was generated with ESPript (Robert and Gouet 2014).
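The shading in Figure 7 rests on per-column identity thresholds. The sketch below applies only the two thresholds stated in the caption (100% for red, at least 70% for yellow) to invented toy sequences; it is not ESPript's scoring, it does not model the "similar residues" shading, and treating gap-dominated columns as unconserved is an assumption.

```python
from collections import Counter


def column_classes(aligned, full=1.0, partial=0.70):
    """Label each alignment column 'red' (100% identical), 'yellow' (>= 70% identical) or 'other'."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    labels = []
    for i in range(length):
        column = [seq[i] for seq in aligned]
        residue, count = Counter(column).most_common(1)[0]
        fraction = count / len(column)
        if residue == "-":
            labels.append("other")  # gap-dominated columns are not treated as conserved
        elif fraction >= full:
            labels.append("red")
        elif fraction >= partial:
            labels.append("yellow")
        else:
            labels.append("other")
    return labels


toy = ["MKVL", "MKVA", "MKIA", "MRIA"]
print(column_classes(toy))  # ['red', 'yellow', 'other', 'yellow']
```

With four sequences, a residue shared by three of them (75%) already clears the 70% threshold, so a single divergent sequence demotes a column from red to yellow but not further.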