Bei Gao, Daoyuan Zhang, Xiaoshuang Li, Honglan Yang, Andrew J Wood.
Abstract
BACKGROUND: Syntrichia caninervis is a desiccation-tolerant moss and the dominant bryophyte of the biological soil crusts (BSCs) found in the Mojave and Gurbantunggut deserts. Next-generation high-throughput sequencing technologies offer an efficient and economical way to characterize the transcriptomes of non-model organisms for which little or no prior molecular information is available.
Year: 2014 PMID: 25086984 PMCID: PMC4124477 DOI: 10.1186/1756-0500-7-490
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Summary of sequence assembly after Illumina sequencing
| Category | Metric | Value |
|---|---|---|
| Sequenced reads | Total number | 58,031,432 |
| | Total read length (bp) | 4,642,514,580 |
| | Read length (bp) | 90 + 70 |
| | GC content | 55.09% |
| | Q20 percentage | 97.55% |
| Contigs | Total number | 162,865 |
| | Total length (bp) | 46,952,370 |
| | Mean length (bp) | 288 |
| | Contig N50 (bp) | 429 |
| Unigenes | Total number | 92,240 |
| | Total length (bp) | 45,480,162 |
| | Mean length (bp) | 493 |
| | Unigene N50 (bp) | 662 |
| | Minimum length (bp) | 150 |
| | Maximum length (bp) | 4,909 |
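The N50 values in the table follow the usual assembly convention: sort sequences from longest to shortest and report the length at which the running total first reaches half of all assembled bases. The sketch below is a minimal illustration of that definition, not the code of the authors' assembly pipeline; the toy length list is hypothetical. The last lines check that the reported mean lengths equal total length divided by sequence count.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-negative lengths


def mean_length(total_bp, count):
    """Mean sequence length as reported in the table: total bases / number of sequences."""
    return total_bp / count


# Hypothetical toy assembly (not data from the paper): 3,000 bp in total.
toy_lengths = [1000, 800, 600, 400, 200]
print(n50(toy_lengths))  # 800, because 1000 + 800 = 1800 >= 1500

# Consistency check against the table values.
print(round(mean_length(45_480_162, 92_240)))   # unigenes: 493
print(round(mean_length(46_952_370, 162_865)))  # contigs: 288
```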
Figure 1. Overview of the transcriptome sequencing and assembly. (A) Histogram of the lengths of unigenes that returned BLASTX hits in public protein databases and of sequences with no hit. (B) Distribution of unigene RPKM values for identified protein-coding sequences and putative non-coding sequences. (C) Histogram of the average read depth for unigenes; sequencing depth values above 100× were binned together. (D) Ortholog hit ratio analysis for S. caninervis unigene sequences.
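Panel B reports expression as RPKM (reads per kilobase of transcript per million mapped reads). Under the standard definition, RPKM = 10^9 × C / (N × L), where C is the number of reads mapped to a unigene, N the total number of mapped reads and L the unigene length in bp. The function below is a minimal sketch of that formula; the read counts in the example are hypothetical, and the paper's mapping and counting settings are not described in this record.

```python
def rpkm(reads_on_unigene: int, total_mapped_reads: int, unigene_length_bp: int) -> float:
    """Reads per kilobase of transcript per million mapped reads: 1e9 * C / (N * L)."""
    if total_mapped_reads <= 0 or unigene_length_bp <= 0:
        raise ValueError("total mapped reads and unigene length must be positive")
    return reads_on_unigene * 1e9 / (total_mapped_reads * unigene_length_bp)


# Hypothetical example: 500 reads on a 1,000 bp unigene, 10 million mapped reads overall.
print(rpkm(500, 10_000_000, 1_000))  # 50.0
```

Dividing by length (in kilobases) and by library size (in millions) puts unigenes of different lengths on a single comparable scale.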
Figure 2. Venn diagram showing the BLASTX results of the transcriptome against five protein databases. De novo reconstructed unigene sequences were queried with BLASTX against the following public databases: NCBI-NR, Swiss-Prot, COSMOSS, KEGG and COG. Each intersection of the Venn diagram shows the number of transcripts with significant hits (E-value ≤ 1e-5) against the corresponding databases.
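The counts in each Venn region amount to set operations on the unigene IDs that pass the E-value cutoff in each database. The sketch below assumes each search was saved as BLAST+ tabular output (`-outfmt 6`, whose 11th column is the E-value); the toy hit lines and database labels are hypothetical, and the authors' actual parsing is not described in this record.

```python
from itertools import combinations

EVALUE_CUTOFF = 1e-5  # significance threshold stated in the Figure 2 caption


def significant_queries(outfmt6_lines, cutoff=EVALUE_CUTOFF):
    """Return unigene IDs with at least one hit at E-value <= cutoff.

    Assumes BLAST+ -outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    queries = set()
    for line in outfmt6_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= cutoff:
            queries.add(fields[0])
    return queries


def pairwise_overlaps(hit_sets):
    """Number of unigenes shared by each pair of databases."""
    return {
        (a, b): len(hit_sets[a] & hit_sets[b])
        for a, b in combinations(sorted(hit_sets), 2)
    }


# Hypothetical hits for two of the five databases.
nr = ["u1\tp1\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-30\t150",
      "u2\tp2\t40.0\t50\t30\t1\t1\t150\t1\t50\t0.01\t30"]
swissprot = ["u1\tq9\t70.0\t90\t27\t0\t1\t270\t1\t90\t2e-20\t120",
             "u3\tq8\t60.0\t80\t32\t0\t1\t240\t1\t80\t1e-6\t90"]

hit_sets = {"NR": significant_queries(nr), "Swiss-Prot": significant_queries(swissprot)}
print(pairwise_overlaps(hit_sets))          # {('NR', 'Swiss-Prot'): 1}
print(len(set.union(*hit_sets.values())))  # 2 unigenes annotated in at least one database
```

The five-way regions of the diagram follow the same pattern, using `set.intersection` and set differences over all five databases.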
Figure 3. Species distribution of the top BLASTX hits obtained using the transcriptome. Top-scoring BLASTX hits against the NCBI-NR protein database are depicted, with the number of BLASTX hits per species shown on the x-axis. The 12 most represented species, each accounting for more than 1% of hits, are shown in this graph.
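Building this distribution takes two steps: keep one top-scoring hit per unigene, then count species and retain those above the 1% threshold mentioned in the caption. The sketch assumes the hits have already been reduced to (unigene, species, bit score) tuples and that "top-scoring" means highest bit score; both are assumptions, and the species labels in the example are placeholders.

```python
from collections import Counter


def best_species_per_unigene(hits):
    """hits: iterable of (unigene_id, species, bitscore); keep the highest-scoring hit per unigene."""
    best = {}
    for unigene, species, bitscore in hits:
        if unigene not in best or bitscore > best[unigene][1]:
            best[unigene] = (species, bitscore)
    return {unigene: species for unigene, (species, _) in best.items()}


def species_distribution(best_species, min_fraction=0.01):
    """Return (species, hit count, fraction) for species above min_fraction, most frequent first."""
    counts = Counter(best_species.values())
    total = sum(counts.values())
    return [(sp, n, n / total) for sp, n in counts.most_common() if n / total > min_fraction]


# Placeholder data, not results from the paper.
hits = [("u1", "Species A", 150.0), ("u1", "Species B", 90.0),
        ("u2", "Species B", 60.0), ("u3", "Species A", 200.0)]
best = best_species_per_unigene(hits)
print(species_distribution(best))
```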
COG functional classification of transcripts
| COG functional category | Code | n |
|---|---|---|
| Information storage and processing | | |
| RNA processing and modification | A | 125 |
| Chromatin structure and dynamics | B | 342 |
| Translation, ribosomal structure and biogenesis | J | 4,202 |
| Transcription | K | 2,927 |
| Replication, recombination and repair | L | 2,876 |
| Cellular processes and signaling | | |
| Cell cycle control, cell division, chromosome partitioning | D | 1,812 |
| Cell wall/membrane/envelope biogenesis | M | 1,774 |
| Cell motility | N | 348 |
| Posttranslational modification, protein turnover, chaperones | O | 2,825 |
| Signal transduction mechanisms | T | 2,069 |
| Intracellular trafficking, secretion, and vesicular transport | U | 1,249 |
| Defense mechanisms | V | 477 |
| Extracellular structures | W | 19 |
| Nuclear structure | Y | 9 |
| Cytoskeleton | Z | 496 |
| Metabolism | | |
| Energy production and conversion | C | 1,782 |
| Amino acid transport and metabolism | E | 1,584 |
| Nucleotide transport and metabolism | F | 447 |
| Carbohydrate transport and metabolism | G | 2,573 |
| Coenzyme transport and metabolism | H | 685 |
| Lipid transport and metabolism | I | 1,247 |
| Inorganic ion transport and metabolism | P | 1,126 |
| Secondary metabolites biosynthesis, transport and catabolism | Q | 932 |
| Poorly characterized | | |
| General function prediction only | R | 5,211 |
| Function unknown | S | 2,619 |
n = number of unigenes.
Figure 4. Gene ontology classification and comparison between the mosses S. caninervis and P. patens. Gene ontology annotations of the genes from the P. patens genome and the S. caninervis transcriptome were mapped to categories at the second level of GO terms. GO terms containing more than 1% of total genes are included in this graph. Abiotic stress-related subcategories of the term “response to stimulus” are shown in the box.
Figure 5. Protein families and transcription factors in the transcriptome. (A) The 10 most abundant protein families in the S. caninervis transcriptome. (B) Relationship between the occurrence of S. caninervis transcripts and the number of Pfam families in the S. caninervis transcriptome. (C) The 23 most abundant predicted transcription factor (TF) protein families. The number of members in each TF family is given in brackets. A total of 778 TFs were predicted and classified into 49 TF families (Additional file 1).
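Panels A and B both start from a table of Pfam domain assignments per transcript. A minimal sketch, assuming the assignments are already available as (transcript, family) pairs (for example, parsed from a Pfam domain scan; that parsing step is omitted here): group transcripts by family, count distinct transcripts per family for the most-abundant ranking, then tally how many families have each membership size. Family and transcript IDs below are hypothetical.

```python
from collections import Counter


def transcripts_per_family(domain_hits):
    """domain_hits: iterable of (transcript_id, pfam_family).
    Repeated domains in one transcript are counted once."""
    members = {}
    for transcript, family in domain_hits:
        members.setdefault(family, set()).add(transcript)
    return {family: len(ts) for family, ts in members.items()}


def most_abundant(family_sizes, k=10):
    """Top-k families by number of transcripts (compare panel A)."""
    return Counter(family_sizes).most_common(k)


def size_distribution(family_sizes):
    """Transcripts per family -> number of families of that size (compare panel B)."""
    return dict(sorted(Counter(family_sizes.values()).items()))


# Hypothetical assignments.
hits = [("u1", "famA"), ("u2", "famA"), ("u3", "famA"),
        ("u1", "famB"), ("u4", "famC"), ("u4", "famC"), ("u5", "famC")]
sizes = transcripts_per_family(hits)
print(most_abundant(sizes, k=2))  # [('famA', 3), ('famC', 2)]
print(size_distribution(sizes))   # {1: 1, 2: 1, 3: 1}
```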
Figure 6. Detection of homologous genes in mosses and A. thaliana: comparison of orthologous gene groups and protein sequence identity. (A) OrthoMCL was used to identify orthologous groups (OGs) among S. caninervis (Sc), A. thaliana (At) and P. patens (Pp). (B) Density plot of the protein identity between S. caninervis and the model plants.
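Panel B summarizes pairwise protein identities between S. caninervis and each model plant. The sketch below uses one common convention, identical residues divided by the number of alignment columns (gap columns included), and bins the values into a coarse histogram as a stand-in for a density estimate. The alignment strings are hypothetical, and the paper's exact identity definition and alignment tool are not stated in this record.

```python
from collections import Counter


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """100 * identical residues / alignment columns; '-' marks a gap."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and equal in length")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


def identity_histogram(identities, bin_width=10):
    """Count identities per bin [0,10), [10,20), ..., with 100% folded into the top bin."""
    top_bin = 100 - bin_width
    bins = Counter(min(int(v // bin_width) * bin_width, top_bin) for v in identities)
    return dict(sorted(bins.items()))


# Hypothetical alignment: 5 identical columns out of 7.
pid = percent_identity("MKT-LLV", "MKTALIV")
print(round(pid, 1))                                 # 71.4
print(identity_histogram([pid, 75.0, 100.0, 35.0]))  # {30: 1, 70: 2, 90: 1}
```

A smoothed density (for example, a kernel density estimate) would replace the fixed bins when plotting.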