Jun Chen, Severin Uebbing, Niclas Gyllenstrand, Ulf Lagercrantz, Martin Lascoux, Thomas Källman.
Abstract
BACKGROUND: Detailed knowledge of spatial and temporal gene expression is important for understanding both the function of genes and their evolution. For the vast majority of species, transcriptomes are still largely uncharacterized, and even where substantial information is available, it is often in the form of partially sequenced transcriptomes. With the development of next-generation sequencing, a single experiment can now simultaneously identify the transcribed part of a species' genome and estimate levels of gene expression.
Year: 2012 PMID: 23122049 PMCID: PMC3543189 DOI: 10.1186/1471-2164-13-589
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary of the collected sequence data
| Light extraction 1 | 28,073,980 | 5,668,652 | 3,229,442 | 14,751,144 | 3,738,405 |
| Light extraction 2 | 20,442,438 | 4,589,450 | 2,930,651 | 9,693,554 | 3,077,721 |
| Light extraction 3 | 23,554,246 | 5,183,788 | 2,929,447 | 12,575,038 | 3,217,660 |
| Dark extraction 1 | 23,885,794 | 5,004,782 | 3,054,332 | 12,080,884 | 3,359,437 |
| Dark extraction 2 | 22,157,136 | 4,929,488 | 2,929,710 | 11,349,636 | 3,084,920 |
| Dark extraction 3 | 21,643,358 | 4,981,140 | 2,933,669 | 11,307,910 | 3,060,296 |
Number of sequences collected from the six sequence libraries. The reads used for assembly and mapping are divided into paired-end (PE) and single-end (SE) reads retained after cleaning and filtering, as described in Materials and Methods.
Figure 1. Distribution of length and coverage of PUTs. Histogram of length (left) and number of reads per PUT (right). To aid visualization, both PUT length and read count values were log10 transformed. Note that all PUTs shorter than 150 bp were excluded from the histograms.
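The transformation described for Figure 1 can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the `puts` values, the `log10_histogram` helper, and the bin width of 0.25 log10 units are all assumptions made for the example; only the 150 bp length threshold and the log10 transform come from the caption.

```python
import math
from collections import Counter

MIN_PUT_LENGTH = 150  # bp; shorter PUTs are excluded, as stated in the caption


def log10_histogram(values, bin_width=0.25):
    """Count log10-transformed values in bins of `bin_width` log10 units.

    Returns a dict mapping each bin's lower edge to its count.
    Assumes every value is strictly positive.
    """
    counts = Counter()
    for v in values:
        bin_index = math.floor(math.log10(v) / bin_width)
        counts[bin_index * bin_width] += 1
    return dict(sorted(counts.items()))


# Hypothetical (length in bp, read count) pairs, for illustration only.
puts = [(120, 40), (150, 12), (480, 300), (900, 2500), (2300, 90)]

# Drop PUTs below the length threshold before building either histogram.
kept = [(length, reads) for length, reads in puts if length >= MIN_PUT_LENGTH]

length_hist = log10_histogram([length for length, _ in kept])
reads_hist = log10_histogram([reads for _, reads in kept])
```

Binning after the transform, rather than before, keeps bin widths equal on the log scale, which is what a histogram with a log10 axis displays.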
Figure 2. Length distributions of different mRNA features in the PUTs. a) 5’-UTR (untranslated region). b) ORFs (open reading frames). c) 3’-UTR. d) Comparison of putative full-length ORFs in P. abies and P. glauca.
Figure 3. Histogram showing domain predictions from ORFs in and . The protein domain names are from the PFAM-A database.
Putative single nucleotide polymorphisms at different quality criteria
| Sanger Qual >= 60 | 20,781 | 11,389 | 9,392 | 1.21 |
| Sanger Qual >= 60 and AF [0.25, 0.75] | 14,745 | 8,372 | 6,372 | 1.31 |
| Sanger Qual >= 60 and AF [0.25, 0.75] and Depth >= 20, DP4 >= 10 | 9,394 | 5,622 | 3,770 | 1.49 |
The effect of using different quality criteria for identifying SNPs. Sanger Qual = Sanger sequence quality, AF = allele frequency, Depth = sequence depth at the base, DP4 = depth of the two variants according to read direction. For details, see the samtools specifications.
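The three nested criteria in the table can be expressed as a small filter. The sketch below is an illustration under stated assumptions, not the pipeline used in the study: the `SnpCall` record and `strictest_tier` function are hypothetical names, and "DP4 >= 10" is interpreted here as at least 10 reads supporting each allele when summed over both strands, which the source does not spell out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SnpCall:
    qual: float                     # quality score of the variant call
    af: float                       # allele frequency of the alternative allele
    depth: int                      # total read depth at the site
    dp4: Tuple[int, int, int, int]  # ref-forward, ref-reverse, alt-forward, alt-reverse


def strictest_tier(snp: SnpCall) -> int:
    """Return 0 if the call fails every filter, else the strictest tier (1-3) it passes.

    Tier 1: quality >= 60
    Tier 2: tier 1 and allele frequency within [0.25, 0.75]
    Tier 3: tier 2 and depth >= 20 and DP4 >= 10
    """
    if snp.qual < 60:
        return 0
    if not (0.25 <= snp.af <= 0.75):
        return 1
    ref_depth = snp.dp4[0] + snp.dp4[1]
    alt_depth = snp.dp4[2] + snp.dp4[3]
    # Assumption: "DP4 >= 10" means at least 10 reads per allele,
    # summed over forward and reverse strands.
    if snp.depth >= 20 and ref_depth >= 10 and alt_depth >= 10:
        return 3
    return 2
```

Because each tier adds conditions to the previous one, the number of SNPs retained can only decrease from one row of the table to the next, which matches the reported counts.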
Figure 4. Phylogenetic tree showing the relationship between the five conifer species included in the study. Branch lengths represent inferred genetic distance.
Figure 5. Histograms showing the distribution of dN/dS values from pairwise comparisons of species. The left plot shows the pattern for all potential orthologous sequences in the data set, whereas the right plot shows the pattern when the data set is restricted to putative full-length ORFs.
Figure 6. Volcano plot of gene expression patterns. The y-axis represents the posterior probability of differential expression and the x-axis the log2 fold change. Dots are colored to represent estimated expression level. The red line indicates a cut-off value of 0.5 for the posterior probability of differential expression.
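The two axes of the volcano plot and the 0.5 cut-off can be illustrated with a short sketch. It is not the method used to estimate the posterior probabilities in the study; the function names, the pseudocount of 1, and the choice to treat the cut-off as a strict inequality are assumptions made for this example.

```python
import math


def log2_fold_change(expr_a, expr_b, pseudocount=1.0):
    """x-axis value: log2 ratio of expression in condition A over condition B.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no reads in one condition.
    """
    return math.log2((expr_a + pseudocount) / (expr_b + pseudocount))


def is_differentially_expressed(posterior_prob, cutoff=0.5):
    """True if the point lies above the red cut-off line on the y-axis."""
    return posterior_prob > cutoff


def volcano_point(expr_a, expr_b, posterior_prob):
    """Return (x, y, flagged) for one gene."""
    return (
        log2_fold_change(expr_a, expr_b),
        posterior_prob,
        is_differentially_expressed(posterior_prob),
    )
```

For example, `volcano_point(7, 1, 0.8)` places a gene at x = 2.0 (a fourfold difference after adding the pseudocount) and flags it, since 0.8 exceeds the cut-off.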