Santiago Montero-Mendieta1, Manfred Grabherr2, Henrik Lantz2, Ignacio De la Riva3, Jennifer A Leonard1, Matthew T Webster4, Carles Vilà1.
Abstract
Whole genome sequencing (WGS) is a very valuable resource for understanding the evolutionary history of poorly known species. However, in organisms with large genomes, such as most amphibians, WGS is still excessively challenging, and transcriptome sequencing (RNA-seq) represents a cost-effective tool to explore genome-wide variability. Non-model organisms do not usually have a reference genome, so the transcriptome must be assembled de novo. We used RNA-seq to obtain the transcriptomic profile for Oreobates cruralis, a poorly known South American direct-developing frog. In total, 550,871 transcripts were assembled, corresponding to 422,999 putative genes. Of those, we identified 23,500, 37,349, 38,120 and 45,885 genes present in the Pfam, EggNOG, KEGG and GO databases, respectively. Interestingly, our results suggested that genes related to the immune system and defense mechanisms are abundant in the transcriptome of O. cruralis. We also present a pipeline to assist with pre-processing, assembling, evaluating and functionally annotating a de novo transcriptome from RNA-seq data of non-model organisms. Our pipeline guides the inexperienced user in an intuitive way through all the necessary steps to build de novo transcriptome assemblies using readily available software and is freely available at: https://github.com/biomendi/TRANSCRIPTOME-ASSEMBLY-PIPELINE/wiki.
Keywords: Clusters of Orthologous Groups; Frog transcriptome; Gene ontology; Genomics; Kyoto encyclopedia of genes and genomes; Protein domain identification; RNA-seq; Transcriptomics; Trinity
Year: 2017 PMID: 28879061 PMCID: PMC5582611 DOI: 10.7717/peerj.3702
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Overall pipeline for the annotation of RNA-seq data.
Boxes with curved sides represent sequence datasets. Red boxes represent analyses, and the software used for those analyses is indicated outside the box. Reference databases are indicated as blue boxes.
Summary of the transcriptome data assembly for Oreobates cruralis.
| Metric | Value |
|---|---|
| Length of raw reads (bp) | 125 |
| Total number of raw reads | 522,877,358 |
| Total number of clean reads | 426,003,462 |
| Total number of normalized reads | 36,428,858 |
| Total number of all transcripts/unigenes | 550,871/422,999 |
| GC-content of all transcripts/unigenes (%) | 45.88/45.39 |
| Total length of all transcripts/unigenes (bp) | 299,133,111/188,399,293 |
| N50 length of all transcripts/unigenes (bp) | 731/467 |
| Mean length of all transcripts/unigenes (bp) | 543/445 |
| Median length of all transcripts/unigenes (bp) | 309/290 |
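The N50 and mean lengths in the table above follow from simple arithmetic on the sequence lengths. The sketch below is a minimal illustration under the common definition of N50 (the length L such that sequences of length L or longer contain at least half of the assembled bases); it is not taken from the authors' pipeline, and the function name `n50` and the list `toy_lengths` are hypothetical. The mean-length check uses only the totals reported in the table.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumes the common definition: the length L such that sequences
    of length >= L together contain at least half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy lengths, for illustration only (not data from the study).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)

# Mean length = total length / number of sequences, using the table's totals.
mean_transcripts = round(299_133_111 / 550_871)  # all transcripts
mean_unigenes = round(188_399_293 / 422_999)     # all unigenes

print(toy_n50, mean_transcripts, mean_unigenes)
```

The recomputed means round to the 543 bp and 445 bp reported in the table. The N50 itself cannot be recomputed from the summary alone, since it depends on the full length distribution.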
Figure 2. Length distribution of unigenes from Oreobates cruralis.
Figure 3. Distribution of BLASTX alignment coverage for O. cruralis unigenes against SwissProt and Xenopus databases.
A high number of orthologous proteins in the databases fully or nearly fully corresponded (>80% coverage) to unigenes in O. cruralis.
Figure 4. Top-hit species distribution for unigenes from the transcriptome of O. cruralis in the SwissProt database.
Top 10 Pfam domains identified in the transcriptome of O. cruralis.
| No | Pfam domain | Pfam ID | Count |
|---|---|---|---|
| 1 | Zinc finger, C2H2 type | PF00096.23 | 961 |
| 2 | WD domain, G-beta repeat | PF00400.29 | 840 |
| 3 | Protein kinase domain | PF00069.22 | 643 |
| 4 | Protein tyrosine kinase | PF07714.14 | 608 |
| 5 | C2H2-type zinc finger | PF13912.3 | 593 |
| 6 | C2H2-type zinc finger | PF13894.3 | 570 |
| 7 | Ankyrin repeat | PF00023.27 | 553 |
| 8 | Immunoglobulin I-set domain | PF07679.13 | 549 |
| 9 | Immunoglobulin domain | PF00047.22 | 517 |
| 10 | Leucine rich repeat | PF13855.3 | 482 |
Figure 5. Distribution of the top 10 Gene Ontology (GO) terms in the transcriptome of O. cruralis, identified by homology via the SwissProt and Pfam databases. Categories shown correspond to GO level 2.
Figure 6. Distribution of Clusters of Orthologous Groups (COG) categories in the transcriptome of O. cruralis.
J, Translation, ribosomal structure and biogenesis; A, RNA processing and modification; K, Transcription; L, Replication, recombination and repair; B, Chromatin structure and dynamics; D, Cell cycle control, cell division, chromosome partitioning; Y, Nuclear structure; V, Defense mechanisms; T, Signal transduction mechanisms; M, Cell wall/membrane/envelope biogenesis; N, Cell motility; Z, Cytoskeleton; W, Extracellular structures; U, Intracellular trafficking, secretion, and vesicular transport; O, Posttranslational modification, protein turnover, chaperones; X, Mobilome: prophages, transposons; C, Energy production and conversion; G, Carbohydrate transport and metabolism; E, Amino acid transport and metabolism; F, Nucleotide transport and metabolism; H, Coenzyme transport and metabolism; I, Lipid transport and metabolism; P, Inorganic ion transport and metabolism; Q, Secondary metabolites biosynthesis, transport and catabolism; R, General function prediction only; S, Function unknown.
Figure 7. Distribution of KEGG Orthology (KO) categories in the transcriptome of O. cruralis.