Céline Keime, Marie Sémon, Dominique Mouchiroud, Laurent Duret, Olivier Gandrillon.
Abstract
BACKGROUND: SAGE has been used widely to study the expression of known transcripts, but much less to annotate new transcribed regions. LongSAGE produces tags that are sufficiently long to be reliably mapped to a whole-genome sequence. Here we used this property to study the position of human LongSAGE tags obtained from all public libraries. We focused mainly on tags that do not map to known transcripts.
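Why a longer tag can be mapped reliably can be made concrete with a back-of-the-envelope estimate. The sketch below assumes the genome behaves like a uniformly random sequence (real genomes are repetitive, so this understates chance matches) and takes roughly 3.1 Gb as the human genome size; both assumptions are ours, not the paper's. It compares a 21-bp tag, the length used for mapping here, with a 14-bp tag, the length commonly cited for conventional SAGE tags including the CATG anchor. The function name is hypothetical.

```python
"""Expected number of chance occurrences of a fixed k-mer in a random genome.

Assumptions (not from the paper): the genome is an i.i.d. uniform sequence
over A/C/G/T, and both strands are searched.
"""


def expected_random_hits(k, genome_size, strands=2):
    """Each start position on each strand matches a given k-mer with probability (1/4)**k."""
    return strands * genome_size / 4 ** k


if __name__ == "__main__":
    GENOME_SIZE = 3_100_000_000  # approximate human genome size, an assumption
    for k in (14, 21):  # ~SAGE tag + anchor vs. LongSAGE tag + anchor
        print(k, round(expected_random_hits(k, GENOME_SIZE), 4))
```

Under these assumptions a given 14-bp sequence is expected to occur by chance about 23 times, whereas a 21-bp sequence is expected about 0.0014 times, which is why a full-length 21-bp match is far more informative about where a tag comes from.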
Year: 2007 PMID: 17504516 PMCID: PMC1884178 DOI: 10.1186/1471-2105-8-154
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Localization of the LongSAGE tags in the human genome. Results of the mapping of the reliable tags from all publicly available human LongSAGE libraries (148,553 tags) to the human nuclear and mitochondrial genomes. Only matches with 100% similarity over 21 bp are included in the dataset.
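The mapping criterion in the Figure 1 caption (a full-length, 100%-identity match over 21 bp) amounts to exact string matching on both strands. The following is a toy sketch of that idea, not the authors' pipeline: it assumes a genome small enough to index every 21-mer in a Python dictionary, and the function names and the "no match / unique / multiple" labels are hypothetical. A whole human genome would need a far more memory-efficient index.

```python
from collections import defaultdict

TAG_LEN = 21  # tag length used for mapping, as in the Figure 1 caption
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_index(chromosomes, k=TAG_LEN):
    """Index every forward-strand k-mer: k-mer -> list of (chromosome, 0-based start)."""
    index = defaultdict(list)
    for name, seq in chromosomes.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def map_tag(tag, index):
    """All exact (100% identity, full-length) hits of a tag on either strand."""
    tag = tag.upper()
    hits = [(name, pos, "+") for name, pos in index.get(tag, [])]
    rc = reverse_complement(tag)
    if rc != tag:  # avoid counting a palindromic tag twice
        # The reverse complement found on the + strand means the tag lies on the - strand.
        hits += [(name, pos, "-") for name, pos in index.get(rc, [])]
    return hits


def classify(hits):
    """Bin a tag by its number of genomic hits."""
    if not hits:
        return "no match"
    return "unique" if len(hits) == 1 else "multiple"
```

Separating "unique" from "multiple" hits matters because only a uniquely matching tag points to a single genomic locus.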
Figure 2. Tags that do not match the human genome sequence but are derived from human mRNAs. This figure shows different situations that lead to tags that do not match the human genome sequence even though they are derived from human mRNAs. The expected proportion of such tags among all distinct tags (calculated using known transcript sequences) is shown in brackets.
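A situation of this kind can be sketched directly. Assuming the usual LongSAGE rule that the expected tag is the 3'-most CATG (NlaIII) site plus the 17 bases that follow it, a tag whose sequence crosses an exon-exon junction exists in the mature mRNA but not contiguously in the genome. The code below is a hypothetical illustration on made-up sequences; the helper names, and returning None when fewer than 17 bases follow the anchor, are simplifications of ours.

```python
from typing import Optional

ANCHOR = "CATG"  # NlaIII site at which SAGE tags start
TAG_LEN = 21     # 4-bp anchor + 17 downstream bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extract_longsage_tag(transcript: str) -> Optional[str]:
    """Expected tag of a transcript: its 3'-most CATG plus the next 17 bases.

    Returns None if there is no CATG or fewer than 17 bases follow the last one
    (a simplification; a real tag would presumably extend into the poly(A) tail).
    """
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1 or pos + TAG_LEN > len(seq):
        return None
    return seq[pos:pos + TAG_LEN]


def tag_matches_genome(tag: str, genome_seqs) -> bool:
    """True if the tag occurs contiguously on either strand of any sequence."""
    rc = reverse_complement(tag)
    return any(tag in s.upper() or rc in s.upper() for s in genome_seqs)


# Made-up gene: the tag starts 10 bp before the end of exon 1 and crosses the junction.
exon1 = "GGGGGCATGAAACCC"
intron = "GTAAGTTTTTTTTTTTTTAG"
exon2 = "TTAGGACCTTAGGACCTTAGG"

tag = extract_longsage_tag(exon1 + exon2)
print(tag, tag_matches_genome(tag, [exon1 + intron + exon2]))
```

Here the tag is found in the spliced sequence but not in the genomic sequence, because the intron separates its two halves.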
Figure 3. Proportion of tags mapping to the murine genome. To search for probable murine contaminants among our set of tags, we mapped to the murine genome all tags that do not match the human genome sequence and for which we could not find any other human origin. This figure shows, for each public LongSAGE library, the percentage of these tags that could be localized on the murine nuclear or mitochondrial genome. The x axis gives the identification number of each LongSAGE library in the Gene Expression Omnibus repository. All embryonic stem cells were propagated on murine embryonic fibroblasts, except those used to construct the library indicated by an arrow.
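The per-library percentage plotted in Figure 3 reduces to a set intersection per library. The sketch below assumes the unexplained tags have already been grouped by library and that the set of tags found in the murine nuclear or mitochondrial genome is available (for instance from an exact-match search like the one used for the human genome). The function name, library identifiers and tag names are hypothetical placeholders, not the GEO numbers from the figure.

```python
def murine_fraction_by_library(unexplained_by_library, murine_tags):
    """Percent of each library's unexplained tags that also map to the mouse genome.

    unexplained_by_library: library ID -> set of tags with no human match and no
        other identified human origin
    murine_tags: set of tags found in the murine nuclear or mitochondrial genome
    """
    percentages = {}
    for library, tags in unexplained_by_library.items():
        if not tags:
            percentages[library] = 0.0  # nothing to explain in this library
            continue
        percentages[library] = 100.0 * len(tags & murine_tags) / len(tags)
    return percentages


# Hypothetical libraries and tags, for illustration only.
libraries = {"LIB_A": {"t1", "t2", "t3", "t4"}, "LIB_B": {"t5"}}
murine = {"t1", "t2"}
print(murine_fraction_by_library(libraries, murine))
```

A library grown on murine feeder cells would be expected to stand out with a high percentage in such a per-library comparison.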
Figure 4. Tags that uniquely match the human genome sequence. Proportion of tags found in different genic regions among the tags that match the human genome sequence uniquely.
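The proportions in Figure 4 amount to assigning each uniquely mapped tag position to an annotated region and counting. The sketch assumes a single chromosome, a hypothetical annotation given as non-overlapping half-open intervals, and region labels such as "exon" and "intron" chosen for illustration; real annotations overlap across alternative transcripts and would need a priority rule that this sketch does not model. Positions outside every interval are labeled "intergenic".

```python
from bisect import bisect_right
from collections import Counter


def region_of(pos, intervals, starts):
    """Label of the interval containing pos, or 'intergenic' if none does."""
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    if i >= 0 and pos < intervals[i][1]:
        return intervals[i][2]
    return "intergenic"


def region_percentages(positions, intervals):
    """Percent of tag positions falling in each region.

    intervals: hypothetical non-overlapping (start, end, label) tuples,
    0-based and half-open, on a single chromosome.
    """
    intervals = sorted(intervals)
    starts = [start for start, _, _ in intervals]
    counts = Counter(region_of(p, intervals, starts) for p in positions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


annotation = [(100, 200, "exon"), (200, 500, "intron"), (500, 600, "exon")]
print(region_percentages([150, 250, 300, 550, 1000], annotation))
```

Binary search over sorted interval starts keeps each lookup logarithmic in the number of annotated intervals.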
Distribution of tags that match the genome sequence once, outside annotated transcripts
| Tag set | Number | % of tags matching outside annotated transcripts |
|---|---|---|
| Transposable element | 2874 | 12 |
| EST | 18042 | 78 |
| Transfrag | 2017 | 9 |
| EST or transfrag | 18180 | 79 |
| Transposable element, EST or transfrag | 21054 | 83 |
Number of tags matching outside annotated transcripts. Each percentage gives the size of the corresponding set of tags relative to the set of all tags that match the genome once, outside an annotated transcript.
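The percentages in the table can be reproduced from sets of tag identifiers. Assuming the combined rows count tags supported by at least one of the listed sources (a union, so a tag in several categories is counted once), the sketch below shows why a combined count such as 18,180 for EST and transfrag is smaller than the sum of the individual counts (18,042 + 2,017). The tag sets, the total and the function name are toy, hypothetical values, not the paper's data.

```python
def category_table(categories, total):
    """Rows of (name, count, percent of total rounded to a whole number)."""
    return [(name, len(tags), round(100 * len(tags) / total))
            for name, tags in categories.items()]


# Toy tag IDs; a tag may belong to several categories.
te = {1, 2}
est = {3, 4, 5, 6, 7}
transfrag = {6, 7, 8}
TOTAL = 10  # toy number of tags matching once outside annotated transcripts

rows = category_table({
    "Transposable element": te,
    "EST": est,
    "Transfrag": transfrag,
    "EST or transfrag": est | transfrag,
    "Transposable element, EST or transfrag": te | est | transfrag,
}, TOTAL)
for row in rows:
    print(*row, sep=" | ")
```

In the toy data, tags 6 and 7 carry both EST and transfrag support, so the combined row holds 6 tags rather than 5 + 3 = 8.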