Sim Lin Lim, Haylee M D'Agui, Neal J Enright, Tianhua He.
Abstract
Banksia is a significant element in the vegetation of southwestern Australia, a biodiversity hotspot of global significance. Banksia hookeriana in particular is a species of considerable economic and ecological importance in the region. To support its conservation and management, we report an overview of the B. hookeriana transcriptome obtained by RNA-seq and de novo assembly. We generated a total of 202.7 million reads (18.91 billion nucleotides) from four leaf samples taken from four B. hookeriana plants, and assembled 59,063 unigenes (average size = 1098 bp) through de novo transcriptome assembly. Among them, 39,686 unigenes were annotated against the Swiss-Prot, Clusters of Orthologous Groups (COG), and NCBI non-redundant (NR) protein databases. We found approximately one single nucleotide polymorphism (SNP) per 5.6-7.1 kb in the transcriptome, with a ratio of transitional to transversional polymorphisms of approximately 1.82. We compared the unigenes of B. hookeriana with those of Arabidopsis thaliana and Nelumbo nucifera through sequence homology, Gene Ontology (GO) annotation, and KEGG pathway analyses. The comparative analysis revealed that the unigenes of B. hookeriana were closely related to those of N. nucifera. B. hookeriana, N. nucifera, and A. thaliana shared similar GO annotations but differed in their distributions across KEGG pathways, indicating that B. hookeriana has adapted to dry Mediterranean-type shrublands by regulating the expression of specific genes. In total, 1927 potential simple sequence repeat (SSR) markers were discovered, which could be used in genotyping and genetic diversity studies of the genus Banksia. Our results provide a valuable sequence resource for further study of Banksia.
Keywords: Banksia hookeriana; Gene annotation; RNA-seq; SSR marker; Transcriptome
Year: 2017 PMID: 28161492 PMCID: PMC5339403 DOI: 10.1016/j.gpb.2016.11.001
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Figure 1. De novo assembly pipeline of Banksia hookeriana leaf transcriptome
The initial sequencing output statistics in four B. hookeriana leaf samples
| Sample | Total No. of raw reads | Total No. of clean reads | Total No. of clean bases | Base call accuracy (%) | GC content (%) |
|---|---|---|---|---|---|
| A | 50,007,092 | 46,429,128 | 4,642,912,800 | 97.90 | 45.53 |
| B | 49,999,398 | 46,813,630 | 4,681,363,000 | 97.90 | 45.66 |
| C | 50,696,470 | 47,472,990 | 4,747,299,000 | 97.88 | 45.88 |
| D | 51,769,470 | 48,432,518 | 4,843,251,800 | 97.88 | 45.84 |
| Mean | 50,618,108 | 47,287,067 | 4,728,706,650 | 97.89 | 45.73 |
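The totals quoted in the abstract can be cross-checked against this table. The following is a minimal sketch, not the authors' pipeline: the dictionary layout and the `summarize` helper are illustrative names introduced here, and the only inputs are the per-sample values printed above. Dividing clean bases by clean reads gives 100 nt for every sample, and the clean bases sum to roughly 18.91 billion nucleotides, matching the abstract; the raw-read column sums to about 202.5 million, close to the 202.7 million quoted there.

```python
# Per-sample values copied from the sequencing-output table above.
SAMPLES = {
    "A": {"raw_reads": 50_007_092, "clean_reads": 46_429_128, "clean_bases": 4_642_912_800},
    "B": {"raw_reads": 49_999_398, "clean_reads": 46_813_630, "clean_bases": 4_681_363_000},
    "C": {"raw_reads": 50_696_470, "clean_reads": 47_472_990, "clean_bases": 4_747_299_000},
    "D": {"raw_reads": 51_769_470, "clean_reads": 48_432_518, "clean_bases": 4_843_251_800},
}


def summarize(samples):
    """Aggregate the table (hypothetical helper, not from the paper)."""
    total_raw = sum(s["raw_reads"] for s in samples.values())
    total_clean = sum(s["clean_reads"] for s in samples.values())
    total_bases = sum(s["clean_bases"] for s in samples.values())
    return {
        "total_raw_reads": total_raw,
        "total_clean_reads": total_clean,
        "total_clean_bases": total_bases,
        # Clean bases per clean read, i.e. the implied read length in nt.
        "read_length": {
            name: s["clean_bases"] / s["clean_reads"] for name, s in samples.items()
        },
        # Share of raw reads that survived cleaning.
        "fraction_retained": {
            name: s["clean_reads"] / s["raw_reads"] for name, s in samples.items()
        },
    }


summary = summarize(SAMPLES)
print(summary["total_raw_reads"], summary["total_clean_bases"])
print(summary["read_length"])
```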
Contig and unigene assembly of B. hookeriana leaf transcriptome
| Assembly | Sample | Total No. of contigs/unigenes | Total length (nt) | Mean length (nt) | N50 | Total No. of consensus sequences | Total No. of distinct clusters | Total No. of distinct singletons |
|---|---|---|---|---|---|---|---|---|
| Contigs | A | 93,281 | 37,550,807 | 400 | 880 | – | – | – |
| Contigs | B | 104,947 | 41,313,733 | 394 | 869 | – | – | – |
| Contigs | C | 98,063 | 40,097,562 | 409 | 925 | – | – | – |
| Contigs | D | 100,923 | 40,874,608 | 405 | 909 | – | – | – |
| Contigs | Mean | 99,304 | 39,959,178 | 402 | 896 | – | – | – |
| Unigenes | A | 53,873 | 40,258,014 | 747 | 1440 | 53,873 | 17,236 | 36,637 |
| Unigenes | B | 59,340 | 46,401,909 | 782 | 1528 | 59,340 | 19,229 | 40,111 |
| Unigenes | C | 55,176 | 43,281,707 | 784 | 1504 | 55,176 | 18,388 | 36,788 |
| Unigenes | D | 56,904 | 44,344,372 | 779 | 1507 | 56,904 | 18,814 | 38,090 |
| Unigenes | Overall | 59,063 | 64,827,597 | 1098 | 1813 | 59,063 | 25,912 | 33,151 |
Note: Overall values were calculated based on the entire library.
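For readers unfamiliar with the N50 column, the sketch below uses the common definition: the length L such that sequences of length L or longer contain at least half of all assembled bases. This is an assumption about how the statistic was computed, since the source does not describe its assembler's exact convention, and the function name and toy lengths are illustrative only. The last line recomputes the overall mean unigene length from the table's totals.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and their lengths are
    accumulated until the running sum reaches half of the total length;
    the length at which that happens is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy example (made-up lengths, not data from the paper):
# total = 54, half = 27, and 10 + 9 + 8 = 27, so the N50 is 8.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))

# Mean length for the overall unigene set, from the table's totals.
overall_mean = 64_827_597 / 59_063
print(round(overall_mean))
```

Because long sequences dominate the cumulative sum, N50 is typically larger than the mean length, as it is in every row of the table.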
SNP discovery from B. hookeriana leaf transcriptome
| Sample | Total No. of SNPs | No. of SNPs per 1 kb | Heterozygosity | Transition | Transversion | No. of unigenes |
|---|---|---|---|---|---|---|
| A | 39,485 | 0.609 | 28,618 | 25,069 | 14,416 | 13,818 |
| B | 36,611 | 0.565 | 16,424 | 23,287 | 13,324 | 13,122 |
| C | 44,330 | 0.684 | 31,948 | 28,224 | 16,106 | 15,255 |
| D | 46,170 | 0.712 | 33,373 | 29,363 | 14,416 | 13,818 |
Note: SNP, single nucleotide polymorphism.
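The derived quantities in this table can be recomputed from its count columns. The sketch below is illustrative and makes one labeled assumption: SNP density is taken as the SNP count divided by the overall unigene length (64,827,597 nt, from the assembly table), which the source does not state explicitly but which reproduces the printed per-kb values. All function names are introduced here for illustration. The check also shows that, for sample D as printed, transitions plus transversions do not add up to the listed SNP total, whereas they do for samples A to C; averaging the four per-sample ratios as printed gives about 1.82, the figure quoted in the abstract.

```python
# Count columns copied from the SNP table above.
SNP_TABLE = {
    "A": {"total": 39_485, "transition": 25_069, "transversion": 14_416},
    "B": {"total": 36_611, "transition": 23_287, "transversion": 13_324},
    "C": {"total": 44_330, "transition": 28_224, "transversion": 16_106},
    "D": {"total": 46_170, "transition": 29_363, "transversion": 14_416},
}

# Assumption: the density denominator is the overall unigene length (nt).
OVERALL_UNIGENE_LENGTH_NT = 64_827_597


def ts_tv_ratio(row):
    """Ratio of transitions to transversions for one sample."""
    return row["transition"] / row["transversion"]


def snps_per_kb(row, total_nt=OVERALL_UNIGENE_LENGTH_NT):
    """SNP count per 1,000 nt of reference sequence."""
    return row["total"] / (total_nt / 1000)


def counts_consistent(row):
    """True when transitions + transversions equal the SNP total."""
    return row["transition"] + row["transversion"] == row["total"]


for name, row in SNP_TABLE.items():
    print(
        name,
        round(ts_tv_ratio(row), 3),
        round(snps_per_kb(row), 3),
        counts_consistent(row),
    )

mean_ratio = sum(ts_tv_ratio(r) for r in SNP_TABLE.values()) / len(SNP_TABLE)
print(round(mean_ratio, 2))
```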
Figure 2. GO annotation analysis of B. hookeriana, N. nucifera, and A. thaliana transcriptomes
Top five KEGG pathways enriched in B. hookeriana
| Pathway name | Pathway ID | Genes contained, number (%) |
|---|---|---|
| Metabolic pathways | ko01100 | 3704 (13.86%) |
| Biosynthesis of secondary metabolites | ko01110 | 2235 (8.36%) |
| Plant–pathogen interaction | ko04626 | 1418 (5.31%) |
| Plant hormone signal transduction | ko04075 | 1039 (3.88%) |
| Spliceosome | ko03040 | 666 (2.49%) |