Ishminder K Mann, Jill L Wegrzyn, Om P Rajora.
Abstract
BACKGROUND: Expressed sequence tag (EST) sequences and their annotation provide a highly valuable resource for gene discovery, genome sequence annotation, and other genomics studies that can be applied in genetics, breeding and conservation programs for non-model organisms. Conifers are long-lived plants that are ecologically and economically important globally and have large genomes. Black spruce (Picea mariana) is a transcontinental species of the North American boreal and temperate forests. However, transcriptomic and genomic resources for this species are limited. The primary objective of our study was to develop a black spruce transcriptomic resource to facilitate ongoing functional genomics projects related to growth and adaptation to climate change.
Year: 2013 PMID: 24119028 PMCID: PMC4007752 DOI: 10.1186/1471-2164-14-702
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary of EST sequencing and assembly results
| Total EST sequences | 4,594 |
| Number of 5′ sequences | 2,455 |
| Number of 3′ sequences | 2,139 |
| Number of contigs | 497 |
| Number of singletons | 2,234 |
| Average assembled EST length (bp) | 532.5 |
| Number of full-length cDNA sequences | 216 |
| Number of assembled ESTs with: | |
| Significant BLASTX annotations | 1,717 |
| Significant BLASTX annotations with known function | 1,478 |
| No BLASTX annotation information | 1,014 |
| Average number of sequences per contig | 4.75 |
| Number of contigs containing: | |
| 2 ESTs | 237 |
| 3 ESTs | 91 |
| 4 ESTs | 51 |
| 5 ESTs | 28 |
| 6 ESTs | 17 |
| 7 ESTs | 20 |
| 8 ESTs | 7 |
| 9-11 ESTs | 19 |
| 12-14 ESTs | 9 |
| 15-17 ESTs | 6 |
| 18-20 ESTs | 5 |
| 22-25 ESTs | 3 |
| >25 ESTs | 4 |
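The figures in this table can be cross-checked against one another. The sketch below is only an arithmetic check on the numbers listed above: it assumes that unique sequences are contigs plus singletons and that every EST that is not a singleton was assembled into a contig. The variable names are illustrative and do not come from the study.

```python
# Values copied from the summary table above.
total_ests = 4594
five_prime, three_prime = 2455, 2139
contigs, singletons = 497, 2234
annotated, unannotated = 1717, 1014

# Contig size classes (ESTs per contig -> number of contigs), as listed.
contig_bins = {
    "2": 237, "3": 91, "4": 51, "5": 28, "6": 17, "7": 20, "8": 7,
    "9-11": 19, "12-14": 9, "15-17": 6, "18-20": 5, "22-25": 3, ">25": 4,
}

# Assumption: unique sequences = contigs + singletons.
unique_sequences = contigs + singletons
# Assumption: every non-singleton EST belongs to exactly one contig.
ests_in_contigs = total_ests - singletons
avg_per_contig = ests_in_contigs / contigs

assert five_prime + three_prime == total_ests
assert sum(contig_bins.values()) == contigs
assert annotated + unannotated == unique_sequences

print(unique_sequences, round(avg_per_contig, 2))  # prints: 2731 4.75
```

Under these assumptions the size classes sum to the 497 contigs reported, and the annotated and unannotated counts sum to the 2,731 unique sequences.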
Figure 1. Distribution of individual 5′ and 3′ EST sequences among the clustered contigs.
Comparison of EST sequencing statistics with representative and studies
| Sequencing method | Average EST length (bp) | Total ESTs | Singletons | Contigs | Unique sequences | ESTs in contigs (%) | Average ESTs per contig | Reference |
| Sanger | 532 | 4,594 | 2,234 | 497 | 2,731 | 51% | 4.7 | Current study |
| Sanger | 615 | 49,101 | 7,224 | 9,354 | 16,578 | 85% | 4.5 | [ |
| Sanger | - | 147,146 | 26,804 | 19,941 | 46,745 | 82% | 6.0 | [ |
| Sanger | 364 | 59,797 | 12,307 | 8,070 | 20,377 | 79% | 5.9 | [ |
| 454 | 306 | 586,732 | 239,793 | 63,657 | 303,450 | 59% | 5.5 | [ |
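The derived columns in this comparison can be reproduced from the raw counts. The following is a minimal sketch, assuming the three large count columns are total ESTs, singletons and contigs (in that order) and that the percentage is the share of ESTs assembled into contigs. This reading is inferred from the arithmetic, not stated in the table. The row labels and the `derived_stats` helper are hypothetical.

```python
# (label, total ESTs, singletons, contigs) copied from the rows above.
# Labels are illustrative; the table itself does not name the other studies.
studies = [
    ("current study", 4594, 2234, 497),
    ("Sanger study A", 49101, 7224, 9354),
    ("Sanger study B", 147146, 26804, 19941),
    ("Sanger study C", 59797, 12307, 8070),
    ("454 study", 586732, 239793, 63657),
]


def derived_stats(total, singletons, contigs):
    """Recompute the derived columns from the raw counts."""
    in_contigs = total - singletons  # ESTs that were assembled
    return {
        "unique": singletons + contigs,
        "pct_in_contigs": round(100 * in_contigs / total),
        "avg_per_contig": round(in_contigs / contigs, 1),
    }


results = {label: derived_stats(t, s, c) for label, t, s, c in studies}
for label, stats in results.items():
    print(label, stats)
```

With these inputs the recomputed values match the unique-sequence, percentage and per-contig columns shown in the table, which supports the assumed column meanings.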
Figure 2. Conservation between unique sequences and peptides from five model plant species and the non-redundant protein database: BLASTX searches were performed against: , and . (A) Percentage of Picea mariana unique sequences showing similarity with peptides from six databases (score > 50). Contigs showed the highest number of hits to all five model plant species databases, followed by 5′ singletons and then 3′ singletons. (B-D) 3′ singletons, 5′ singletons and contigs were analyzed separately using low (score > 50), medium (score > 100) and high (score > 200) BLAST stringency thresholds. (B) The 3′ EST singletons had a much greater number of annotations with the model databases in the > 50 score category, ranging from 49% to 64%. (C) The 5′ EST singletons had the greatest number of hits with scores > 100. (D) The EST contigs had the largest number of high scoring hits (> 50) with representation across most of the five model plant species databases.
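The three stringency levels in this caption amount to counting, for each database, how many query sequences have a best hit above each score cut-off. The sketch below illustrates that tally on made-up data. The function name, the input layout (query ID mapped to best score) and the example scores are assumptions for illustration, not the study's pipeline.

```python
# Score cut-offs named in the caption: low, medium and high stringency.
THRESHOLDS = {"low": 50, "medium": 100, "high": 200}


def percent_above(best_scores, n_queries):
    """Percentage of queries whose best hit exceeds each cut-off.

    best_scores: dict mapping query ID -> best BLASTX score for one database.
    n_queries:   total number of queries searched, including those with no hit.
    """
    return {
        name: 100 * sum(score > cut for score in best_scores.values()) / n_queries
        for name, cut in THRESHOLDS.items()
    }


# Hypothetical example: four queries searched, three of which returned a hit.
example = {"seq_a": 60.0, "seq_b": 150.0, "seq_c": 250.0}
print(percent_above(example, n_queries=4))
# prints: {'low': 75.0, 'medium': 50.0, 'high': 25.0}
```

The categories are cumulative in this sketch: a hit scoring above 200 is also counted in the > 50 and > 100 categories.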
Figure 3. Gene ontology classification of unique sequences: BLASTX searches at an e-value cut-off of 1e-05 were performed against UniProt’s plant protein databank. The results were used in conjunction with InterProScan to identify domains linked to GO terms. Terms were standardized to hierarchy levels 4 and 5 in order to compare the results across the annotations. (A) A total of 533 unique sequences were associated with at least one molecular function term. (B) A total of 572 unique sequences were associated with at least one biological process term.
EST singletons and contigs annotated as putative transcription factors
| Sequence ID | Top BLASTX hit | Annotation |
| estPama_needle_Contig7 | gi|356520192|ref|XP_003528748.1| | Transcription factor 25-like |
| estPama_needle_Contig151 | gi|225438387|ref|XP_002275126.1| | RNA polymerase II transcriptional coactivator KIWI |
| estPama_needle_2137-700_5 | gi|297794475|ref|XP_002865122.1| | Myb family transcription factor |
| estPama_needle_2089-700_5 | gi|357125278|ref|XP_003564322.1| | Transcription elongation factor 1 homolog isoform |
| estPama_needle_BSc1-839-700_5 | gi|356500773|ref|XP_003519205.1| | Transcription initiation factor IIA subunit 2-like |
| estPama_needle_Contig232 | gi|357144617|ref|XP_003573355.1| | Transcription factor ILR3-like |
| estPama_needle_Contig326 | gi|255547594|ref|XP_002514854.1| | Transcription factor, putative |
| estPama_needle_Contig314 | gi|358346858|ref|XP_003637481.1| | Transcription factor |
| estPama_needle_Contig320 | gi|115470937|ref|NP_001059067.1| | Similar to C-Myc binding protein |
| estPama_needle_BSC1-987-800_3 | gi|225461347|ref|XP_002281902.1| | PAP-specific phosphatase HAL2-like |
| estPama_needle_1584-700_5 | gi|115470937|ref|NP_001059067.1| | Similar to C-Myc binding protein |
| estPama_needle_1817-700_5 | gi|225437251|ref|XP_002282315.1| | Ethylene-responsive transcription factor 7-like |
| estPama_needle_2311-700_5 | gi|302840754|ref|XP_002951923.1| | Transcription factor jumonji domain-containing protein |
| estPama_needle_1788-700_5 | gi|15233968|ref|NP_195574.1| | Transcription repressor MYB4 |
| estPama_needle_Contig490 | gi|255547594|ref|XP_002514854.1| | Transcription factor, putative |
| estPama_needle_BSc1-785-700_5 | gi|255585312|ref|XP_002533354.1| | WRKY transcription factor, putative |
| estPama_needle_BSC1-935-800_3 | gi|225438387|ref|XP_002275126.1| | RNA polymerase II transcriptional coactivator KIWI |
| estPama_needle_1371-700_5 | gi|255561893|ref|XP_002521955.1| | Associate of C-myc, putative |
| estPama_needle_2143-800_3 | gi|357494843|ref|XP_003617710.1| | mTERF domain-containing protein |
| estPama_needle_2512-700_5 | gi|359494595|ref|XP_002262881.2| | Nuclear transcription factor Y subunit C-9 |
| estPama_needle_1540-700_5 | gi|334183649|ref|NP_001185317.1| | Transcription elongation factor SPT6 |
| estPama_needle_2467-700_5 | gi|302840754|ref|XP_002951923.1| | Transcription factor jumonji domain-containing protein |
| estPama_needle_BSc1-839-800_3 | gi|255572854|ref|XP_002527359.1| | Transcription initiation factor iia (tfiia), gamma chain, putative |
| estPama_needle_BSc1-781-700_5 | gi|356547095|ref|XP_003541953.1| | WRKY transcription factor 6 |
Figure 4. Distribution of full-length cDNAs according to BLASTX hits in the number of species-specific repositories: Of the 216 full-length genes identified, the majority (90%) had significant annotations in all five datasets at an e-value (expected value) cut-off of 1e-05.
Estimation of gene expression: unique EST sequences with ≥ 10 ESTs
| Non-specific lipid-transfer protein | 10 | 77 |
| Bet v I allergen family protein. (Os04t0465600-01) | 2 | 40 |
| Ribulose bisphosphate carboxylase small chain 1A | 4 | 35 |
| Antimicrobial peptide 1 | 2 | 30 |
| Non-protein coding transcript. (Os07t0139600-01), partial | 9 | 28 |
| Histone H3 | 4 | 24 |
| Light-harvesting complex | 4 | 24 |
| Germin-like protein 8-14-like | 2 | 21 |
| Metallothionein-like protein 3B-like | 6 | 21 |
| Photosystem I reaction center subunit V | 2 | 21 |
| Photosystem II subunit X | 3 | 21 |
| Cell wall-associated hydrolase, partial | 8 | 20 |
| LOW QUALITY PROTEIN: photosystem II 10 kDa Polypeptide, chloroplastic-like | 1 | 18 |
| Translation machinery associated protein TMA7 | 2 | 17 |
| Photosystem I reaction center subunit N, chloroplast precursor, putative | 2 | 15 |
| Protein ralf-like 34 | 3 | 15 |
| Transmembrane protein TPARL, putative | 1 | 13 |
| Hypothetical protein BrnapMp036 (mitochondrion) | 3 | 13 |
| Hypothetical protein EAAG1_11607 | 1 | 13 |
| ATP synthase subunit beta | 3 | 11 |
| Metallothionein-like protein 2-like isoform 2 | 1 | 11 |
| Ribosomal protein S14 (chloroplast) | 3 | 11 |
| Auxin-binding protein ABP19a precursor, putative | 1 | 10 |
| Photosystem II 5 kDa protein, chloroplastic-like | 1 | 10 |
| Similar to Anth (Pollen-specific desiccation-associated LLA23 protein). (Os11t0167800-01) | 2 | 10 |
| Chlorophyll a-b binding protein M9, chloroplastic precursor | 1 | 10 |
| Photosystem I subunit O | 2 | 10 |
Figure 5. Conservation between unique sequences and cDNA sequences from other conifer species: (A) BLASTN searches were performed against NCBI’s est_others repository, and sequences with hits at an e-value cut-off of 1e-05 were organized into , and Other to examine similarity by genus. 18% of the sequences (493) had no hits within Picea and are therefore likely to be unique to Picea mariana. (B) The same BLASTN searches were divided by species and organized into categories by score: > 200, > 1,000, and > 1,500. The greatest number of hits was observed with Picea glauca and Picea sitchensis. These two species also had the greatest number of nearly identical hits (scores > 1,500).
Types and distribution of simple sequence repeats
| Pentanucleotide repeats | (CGCAG)4 | 0.07 |
| | (TCAGA)4 | 0.07 |
| | (TGGTC)4 | 0.07 |
| Tetranucleotide repeats | (ACAT)4 | 0.07 |
| | (CCTG)5 | 0.07 |
| | (TAAT)4 | 0.07 |
| Trinucleotide repeats | (AAC)4 | 0.15 |
| | (AAG)4 | 0.15 |
| | (AAT)4 | 0.07 |
| | (ACG)4 | 0.07 |
| | (AGA)4 | 0.07 |
| | (AGC)4or6 | 0.11 or 0.07 |
| | (AGG)4or5 | 0.22 or 0.07 |
| | (ATA)5 | 0.07 |
| | (ATC)4 | 0.11 |
| | (ATG)4 | 0.11 |
| | (CAC)5or6 | 0.07 or 0.07 |
| | (CAG)4or5or6 | 0.15 or 0.11 or 0.07 |
| | (CAT)4 | 0.11 |
| | (CCA)5 | 0.07 |
| | (CCG)4 | 0.07 |
| | (CGC)5 | 0.07 |
| | (CGG)4 | 0.07 |
| | (CTC)4or5 | 0.07 or 0.11 |
| | (CTG)4or5 | 0.11 or 0.11 |
| | (CTT)4or6 | 0.11 or 0.07 |
| | (GAA)4or5 | 0.22 or 0.07 |
| | (GAC)8 | 0.07 |
| | (GAG)4or8 | 0.11 or 0.07 |
| | (GAT)4 | 0.07 |
| | (GCA)4 | 0.15 |
| | (GCC)6 | 0.07 |
| | (GCT)4 | 0.11 |
| | (GGA)4or5 | 0.22 or 0.07 |
| | (TAA)4 | 0.07 |
| | (TAG)4 | 0.07 |
| | (TAT)5or6 | 0.07 or 0.07 |
| | (TCA)4 | 0.11 |
| | (TCC)4or5 | 0.15 or 0.07 |
| | (TCT)4or5 | 0.11 or 0.15 |
| | (TGA)4 | 0.07 |
| | (TGC)4 | 0.07 |
| | (TGG)6 | 0.07 |
| | (TTC)4 | 0.26 |
| | (TTG)4 | 0.11 |
| Dinucleotide repeats | (AC)4or5or6 | 0.77 or 0.18 or 0.11 |
| | (AG)4or5 | 1.39 or 0.15 |
| | (AT)4or5or6or7or8 | 2.27 or 0.70 or 0.11 or 0.15 or 0.15 |
| | (CA)4or5or6 | 0.81 or 0.07 or 0.11 |
| | (CG)4or5or6 | 0.29 or 0.11 or 0.07 |
| | (CT)4or5 | 0.84 or 0.18 |
| | (GA)4or5or6 | 0.99 or 0.07 or 0.18 |
| | (GC)4or5or6 | 0.40 or 0.15 or 0.07 |
| | (GT)4or5or6 | 0.37 or 0.07 or 0.07 |
| | (TA)4or5or6or7 | 2.05 or 0.37 or 0.11 or 0.11 |
| | (TC)4or5or6 | 1.28 or 0.22 or 0.07 |
| | (TG)4or5 | 0.73 or 0.15 |
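Every motif in this table is 2 to 5 nucleotides long and repeated at least 4 times. The sketch below finds repeats of that kind with a regular expression. It is an illustration only: the source does not say which SSR-detection tool or thresholds were used, so the pattern, the minimum of 4 repeats and the `find_ssrs` helper are assumptions inferred from the table.

```python
import re

# A motif of 2-5 bases followed by at least 3 more copies (4 or more in total).
# The lazy quantifier tries the shortest motif first, so ATATATAT is reported
# as (AT)4 rather than (ATAT)2.
SSR_PATTERN = re.compile(r"([ACGT]{2,5}?)\1{3,}")


def find_ssrs(sequence):
    """Return (start, motif, repeat_count) for each simple sequence repeat."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


print(find_ssrs("GGATATATATCC"))  # prints: [(2, 'AT', 4)]
print(find_ssrs("TTCTTCTTCTTC"))  # prints: [(0, 'TTC', 4)]
```

The reported repeat count corresponds to the number after the parenthesised motif in the table, for example (TTC)4.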