Fabrice Lopez, Samuel Granjeaud, Takeshi Ara, Badih Ghattas, Daniel Gautheret.
Abstract
The termination of mature eukaryotic mRNAs occurs at specific polyadenylation sites located downstream from stop codons in the 3'-untranslated region (UTR). An accurate delineation of these sites is essential for the study of 3'-UTR-based gene regulation and for the design of pertinent probes for transcriptome analysis. Although typical poly(A) sites are located between 0 and 2 kb from the stop codon, EST sequence analyses have identified sites located at unexpectedly long ranges (5-10 kb) in a number of genes. Here we perform a complete mapping of EST and full-length cDNA sequences on the mouse and human genome to observe putative poly(A) sites extending beyond annotated 3'-ends and into the intergenic regions. We introduce several quality parameters for poly(A) site prediction and train a classification tree to associate P-values to predicted sites. We observe a higher than background level of high-scoring sites up to 12-15 kb past the stop codon, both in human and mouse. This leads to an estimate of about 5000 human genes having unreported 3'-end extensions and about 3500 novel polyadenylated transcripts lying in present "intergenic" regions. These high-scoring, long-range poly(A) sites corresponding to novel transcripts and gene extensions should be incorporated into current human and mouse gene repositories.
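The abstract does not give the quality parameters or the tree itself, so the following is only a minimal sketch of the general idea of scoring predicted sites with a classification tree. It assumes a single hypothetical feature (the number of ESTs supporting a candidate site), made-up training labels, and a one-split tree (a decision stump); the score returned for a site is the fraction of known-false training sites in its leaf, which may differ from how the authors derive their P-values.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """One-split classification tree over a single numeric feature."""
    threshold: float
    left_false_rate: float   # fraction of false sites with feature <= threshold
    right_false_rate: float  # fraction of false sites with feature > threshold

    def score(self, x: float) -> float:
        # Lower score = the leaf contains fewer false sites in training data.
        return self.left_false_rate if x <= self.threshold else self.right_false_rate


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def fit_stump(values, labels):
    """Pick the split that minimises weighted Gini impurity.

    labels: 1 = true poly(A) site, 0 = false site (hypothetical annotations).
    """
    pairs = list(zip(values, labels))
    xs = sorted(set(values))
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0  # candidate threshold between adjacent values
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or impurity < best[0]:
            best = (impurity, t, left, right)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, t, left, right = best
    return Stump(t, 1 - sum(left) / len(left), 1 - sum(right) / len(right))


# Toy training set (invented for illustration, not data from the paper).
est_support = [1, 1, 2, 2, 5, 6, 8, 9]
is_true_site = [0, 0, 0, 1, 1, 1, 1, 1]

stump = fit_stump(est_support, is_true_site)
weak_site_score = stump.score(1)    # candidate supported by a single EST
strong_site_score = stump.score(7)  # candidate supported by seven ESTs
```

A real tree would split on several quality parameters at multiple levels; the stump only illustrates how a leaf's class proportions can be turned into a per-site score.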
Year: 2006 PMID: 16931874 PMCID: PMC1581981 DOI: 10.1261/rna.136206
Source DB: PubMed Journal: RNA ISSN: 1355-8382 Impact factor: 4.942