Xiaoli Tang, Libin Deng, Dake Zhang, Jiari Lin, Yi Wei, Qinqin Zhou, Xiang Li, Guilin Li, Shangdong Liang.
Abstract
For transcriptome analysis, it is critical to define precisely all transcripts across the whole genome. A growing number of digital gene expression (DGE) scans have revealed many novel transcripts beyond the known gene models. However, almost all of these studies still depend heavily on existing annotation. Here, we present Gene2DGE, a Perl software package for renewing gene models with DGE data. We applied Gene2DGE to the mouse blastomere transcriptome and defined 98,532 read-enriched regions (RERs) by read clustering, requiring support from more than four reads at each base pair. Using this ab initio method, we refined 2,104 exonic regions (4% of the 48,501 annotated transcribed regions) with substantial extension (>50 bp) into unannotated regions. From the 5% of uniquely mapped reads that fall within intronic regions, we identified 13,291 additional candidate exons. As a result, we renewed 4,788 gene models, accounting for 39% of the 12,277 transcribed genes. Furthermore, we identified 12,613 intergenic RERs, suggesting that novel genes may exist outside the current gene models. This study therefore provides a tool for renewing known gene models by ab initio prediction during transcriptome dissection. The Gene2DGE package is freely available at http://bighapmap.big.ac.cn/.
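The RER definition above (contiguous bases each covered by more than four reads) can be illustrated as a coverage sweep over read intervals, followed by a simple placement of each RER relative to the annotation. This is a minimal Python sketch, not Gene2DGE's Perl implementation: the function names, half-open (start, end) coordinates, single-chromosome/single-strand input, and the overlap-based exonic/intronic/intergenic labels are all assumptions for illustration.

```python
from collections import defaultdict


def call_rers(reads, min_depth=5):
    """Return RERs as half-open (start, end) intervals.

    reads: (start, end) half-open intervals of uniquely mapped reads on one
    chromosome and strand (an assumption of this sketch).
    min_depth: minimum per-base depth; "more than four reads" is taken as 5.
    """
    # Difference array: depth rises at each read start, falls at each read end.
    events = defaultdict(int)
    for start, end in reads:
        events[start] += 1
        events[end] -= 1

    rers = []
    depth = 0
    region_start = None
    for pos in sorted(events):
        depth += events[pos]  # depth now applies to bases from pos onward
        if depth >= min_depth and region_start is None:
            region_start = pos  # coverage just reached the threshold
        elif depth < min_depth and region_start is not None:
            rers.append((region_start, pos))  # coverage dropped below it
            region_start = None
    return rers


def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_rer(rer, exons, genes):
    """Hypothetical labelling of an RER against annotated exons and gene spans."""
    if any(overlaps(rer, exon) for exon in exons):
        return "exonic"
    if any(overlaps(rer, gene) for gene in genes):
        return "intronic"  # inside a gene span but touching no exon
    return "intergenic"
```

For example, five reads at (0, 10) plus five at (5, 15) merge into a single RER (0, 15), whereas three reads at (20, 30) never reach the depth threshold and yield no RER. A sweep over sorted start/end events avoids allocating a per-base coverage array, which matters at genome scale.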
Year: 2012 PMID: 22449401 PMCID: PMC5054491 DOI: 10.1016/S1672-0229(11)60033-8
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Figure 1. Summary of read-enriched regions (RERs) across the mouse genome. A. The distribution of read counts within RERs indicates possible transcription in previously unannotated regions. B. Deviation between exon ends and the corresponding RER boundaries. Negative values indicate that RERs are shorter than the known exons; positive values indicate that RERs are longer. The apparent shortening at both the first 5’ end and the last 3’ end is possibly caused by transcript degradation.
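The signed deviation in panel B can be illustrated with a small sketch. It assumes, hypothetically, plus-strand genomic coordinates, one RER already matched to each exon, and the >50 bp extension threshold quoted in the abstract; the helper names are illustrative and not part of Gene2DGE.

```python
def boundary_deviation(exon, rer):
    """Signed (left, right) deviations in bp between an exon and its RER.

    Both arguments are half-open (start, end) intervals. Positive values mean
    the RER extends beyond the exon end; negative values mean it stops short.
    """
    exon_start, exon_end = exon
    rer_start, rer_end = rer
    return exon_start - rer_start, rer_end - exon_end


def is_extended(exon, rer, min_ext=50):
    """Hypothetical check: does the RER extend more than min_ext bp past either end?"""
    left, right = boundary_deviation(exon, rer)
    return max(left, right) > min_ext
```

For instance, an exon at (100, 200) covered by an RER at (40, 190) gives deviations of (60, -10): the RER extends 60 bp upstream but stops 10 bp short downstream, so it would count as a >50 bp extension.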
Figure 2. Transcriptome features of the mouse blastomere revealed by DGE data against the available annotation. The upper panel (Top/Reverse) shows the read distribution on chr 7 (56900000-57200000) from mouse blastomere sequencing data. RER boundaries were generated with Gene2DGE from this read distribution. Improved gene model annotations and novel transcripts are also illustrated.