Lieven Thorrez, Leon-Charles Tranchevent, Hui Ju Chang, Yves Moreau, Frans Schuit.
Abstract
BACKGROUND: The 3' untranslated regions (UTRs) of transcripts are not well characterized for many genes and often extend beyond the annotated regions. Since Affymetrix 3' expression arrays were designed based on expressed sequence tags, many probe sets map to intergenic regions downstream of genes. We used expression information from these probe sets to predict transcript extension beyond currently known boundaries.
Year: 2010 PMID: 20346121 PMCID: PMC2858751 DOI: 10.1186/1471-2164-11-205
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Identification of unannotated extended probe sets. A. At the core of this analysis is our murine mRNA expression database comprising 22 different tissues. Unannotated probe sets binding putative 3' UTR extensions were identified as shown in the flow chart. The number of RefSeq transcripts retained after each filtering step is shown, finally resulting in 922 transcripts corresponding to 845 unique gene symbols for which 3' UTR extensions are predicted. B. Histogram of Pearson correlations. The red histogram depicts correlations between the 1849 probe set pairs selected as shown in panel A. The blue histogram depicts correlations between random pairs of probe sets from the same microarray platform. As a cut-off for statistically significant co-expression, a Pearson correlation of 0.6 was chosen, resulting in an estimated false positive rate of 2%.
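The co-expression test in panel B can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the `profiles` dictionary (probe set ID mapped to a list of expression values across arrays), the number of random pairs, and the choice of `>=` at the cut-off are all assumptions made for illustration. The sketch computes the Pearson correlation for candidate pairs and estimates a false positive rate as the fraction of random probe set pairs that reach the same cut-off.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # flat profile: treated as uncorrelated (assumption)
    return cov / math.sqrt(var_x * var_y)


def coexpressed_pairs(profiles, candidate_pairs, cutoff=0.6):
    """Keep candidate (annotated, unannotated) pairs that reach the cut-off."""
    return [
        (a, b)
        for a, b in candidate_pairs
        if pearson(profiles[a], profiles[b]) >= cutoff
    ]


def false_positive_rate(profiles, cutoff=0.6, n_pairs=2000, seed=0):
    """Fraction of random probe set pairs whose correlation reaches the cut-off."""
    rng = random.Random(seed)
    ids = list(profiles)
    hits = 0
    for _ in range(n_pairs):
        a, b = rng.sample(ids, 2)  # two distinct probe sets drawn at random
        if pearson(profiles[a], profiles[b]) >= cutoff:
            hits += 1
    return hits / n_pairs
```

With real data, `profiles` would hold one expression vector per probe set (here, 70 values, one per microarray), and the cut-off could be tuned until the rate returned by `false_positive_rate` falls to an acceptable level.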
Figure 2. Expression and sequence conservation of extended probe sets. A. An example of two probe sets with concordant expression profiles. Expression of the 2 correlating probe sets is shown across all 70 microarrays, which are 3-5 biological replicates from 22 different tissues. B. Genomic context of the probe sets shown in panel A. Note that transcriptional direction is from right to left (negative strand). The red probe set (1416008_at) targets an intergenic region, which according to our algorithm is likely an extended 3' UTR of the upstream gene Satb1. The region immediately downstream of the known 3' UTR is highly conserved and contains 2 predicted polyA signals (indicated by stars). C. Conservation score distribution of mouse 3' UTR regions (green) and intergenic regions (red). The black arrow indicates the conservation score of the extended regions.
Figure 3. Detection of transcripts containing the predicted 3' UTR extension. A. Schematic overview of the validation PCR setup. For each of the eight tested genes, two primer pairs were designed with the forward primer in common. The reverse primer bound either within the known 3' UTR region (reverse short) or in the predicted extension (reverse long). Amplified regions were termed S (short) or L (long), respectively, and the expected size of these fragments for each of the genes is displayed below. In the case of a false positive prediction, no PCR product is expected for the L fragment, since the reverse long primer then has no template to bind to. Contamination by genomic DNA was excluded because primer pairs spanned at least one intron. B. Results of PCR amplification visualized by gel electrophoresis. For each gene, four lanes represent amplification of the short and long fragments in two tissues: liver and muscle. Next to these four lanes, a size marker was included, with the corresponding fragment sizes indicated to the left of the image.