Yuri Kapustin, Elcie Chan, Rupa Sarkar, Frederick Wong, Igor Vorechovsky, Robert M Winston, Tatiana Tatusova, Nick J Dibb.
Abstract
We describe a new program called cryptic splice finder (CSF) that can reliably identify cryptic splice sites (css), thereby providing a useful tool for investigating splicing mutations in genetic disease. We report that many css are not entirely dormant: they are often already active at low levels in normal genes before their use is enhanced in genetic disease. We also report a striking correlation between the positions of css and introns, whereby css within the exons of one species frequently match the exact positions of introns in equivalent genes from another species. These results strongly indicate that many introns were inserted into css during evolution. They also imply that the splicing information lying outside some introns can be recognized independently by the splicing machinery and was in place before intron insertion. This indicates that non-intronic splicing information had a key role in shaping the split structure of eukaryotic genes.
Year: 2011 PMID: 21470962 PMCID: PMC3152350 DOI: 10.1093/nar/gkr203
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (A) CSF searches for transcript alignments that form one of four patterns (i–iv). Each pattern contains a group of major transcripts that share a common deletion and a minor transcript that shares only one of the deletion endpoints. CSF defines the common deletion endpoints as authentic splice sites and the less common deletion endpoint of the minor transcript(s) as a cryptic or alternative splice site (arrowed). (B) Schematic of the HBB gene for human β-globin, which contains two introns that are constitutively spliced from the pre-mRNA. As illustrated, the vast majority of ESTs align as shown and define the three exons of this gene. The circle marks a pattern of ESTs that CSF is designed to recognize and that is reported in Figure 2A. The numbers in brackets give the genome coordinates of the three splice sites that CSF identifies and lists (Figure 2A).
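The pattern search in Figure 1A can be illustrated with a short Python sketch. This is not the CSF implementation. It assumes each transcript alignment has already been reduced to a list of deletions, meaning gaps relative to the genome, given as hypothetical `(start, end)` coordinate pairs. A deletion shared by at least `min_major` transcripts (an arbitrary, hypothetical threshold) is treated as an authentic intron. The sketch then reports the unshared endpoint of any rarer deletion that shares exactly one endpoint with an authentic intron. It does not distinguish the four patterns (i–iv).

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a deletion is the (start, end) genome span
# skipped by one transcript alignment, i.e. a putative intron.
Deletion = Tuple[int, int]


def find_cryptic_sites(
    transcripts: List[List[Deletion]], min_major: int = 3
) -> Dict[Deletion, Set[int]]:
    """Map each common deletion to the unshared endpoints of minor deletions."""
    # Count each deletion once per transcript.
    counts = Counter(d for t in transcripts for d in set(t))
    # Deletions seen in at least `min_major` transcripts are taken as authentic introns.
    common = {d for d, n in counts.items() if n >= min_major}
    cryptic: Dict[Deletion, Set[int]] = {d: set() for d in common}
    for start, end in counts:
        if (start, end) in common:
            continue
        for c_start, c_end in common:
            # A minor deletion sharing only the start reveals a site at its end,
            # and one sharing only the end reveals a site at its start.
            if start == c_start and end != c_end:
                cryptic[(c_start, c_end)].add(end)
            elif end == c_end and start != c_start:
                cryptic[(c_start, c_end)].add(start)
    # Drop common deletions with no cryptic or alternative partner.
    return {d: sites for d, sites in cryptic.items() if sites}
```

Consider three transcripts sharing `(100, 200)`, plus single transcripts with `(100, 180)` and `(120, 200)`. For that input the sketch returns `{(100, 200): {120, 180}}`: two candidate non-authentic sites paired with one authentic intron.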
Figure 2. (A) CSF output for the human HBB gene for β-globin. (B) CSF output for WT1 (see text). Coordinates refer to the NCBI36/hg18 human genome assembly. Note that the HBB and WT1 genes align in the 3′–5′ direction with respect to their genome coordinates.
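Because HBB and WT1 run 3′–5′ relative to the assembly coordinates, their 5′ splice sites have higher genome coordinates than their 3′ splice sites. The hypothetical helpers below sketch two operations: putting coordinates into 5′→3′ transcript order, and mapping a deletion's endpoints to its 5′ and 3′ splice sites. The `'+'`/`'-'` strand labels and the function names are assumptions, not CSF output fields.

```python
from typing import List, Tuple


def _check_strand(strand: str) -> None:
    # Only the two conventional strand labels are accepted.
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")


def transcript_order(coords: List[int], strand: str) -> List[int]:
    """Sort genome coordinates into 5'->3' transcript order."""
    _check_strand(strand)
    # On the reverse strand the 5' end of the gene has the highest coordinate.
    return sorted(coords, reverse=(strand == "-"))


def splice_sites(deletion: Tuple[int, int], strand: str) -> Tuple[int, int]:
    """Return (5' splice site, 3' splice site) for a deletion's two endpoints."""
    _check_strand(strand)
    low, high = sorted(deletion)
    return (low, high) if strand == "+" else (high, low)
```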
The alignment of css predictions by CSF reveals consensus sequences typical of splice sites
This table is compiled from 169 and 179 examples of 5′ and 3′ css predictions, respectively, that are each supported by only a single EST (Supplementary Table S1). The relative proportions of the bases T, C, G and A are shown as percentages at five positions both upstream and downstream of the predicted cryptic cleavage site. The most frequently occurring bases are shaded.
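The percentages in the table are per-position base frequencies over aligned sequence windows. Below is a minimal sketch of that calculation. It assumes each css prediction has been reduced to a 10-nt window made of the five bases upstream and the five bases downstream of the predicted cleavage site. The function names and any windows passed to them are hypothetical.

```python
from collections import Counter
from typing import Dict, List

BASES = "TCGA"  # column order used in the table


def base_percentages(windows: List[str]) -> List[Dict[str, float]]:
    """Percentage of each base at each position of equal-length windows."""
    if not windows:
        raise ValueError("need at least one window")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    table = []
    for i in range(length):
        # Count the bases found at position i across all windows.
        counts = Counter(w[i].upper() for w in windows)
        table.append({b: 100.0 * counts[b] / len(windows) for b in BASES})
    return table


def consensus(table: List[Dict[str, float]]) -> str:
    """Most frequent base at each position (the shaded cells)."""
    return "".join(max(col, key=col.get) for col in table)
```

When `consensus` finds a tie, it picks the base that comes first in the T, C, G, A column order.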
Figure 3. Experimental confirmation of CSF predictions. Predicted css are shown by vertical boxes for the indicated genes. Active css would be expected to generate PCR fragments of the sizes shown in the gene diagrams. PCR products marked with asterisks were sequenced to confirm use of the predicted css (Supplementary Figure S3). Messenger RNA was prepared from the human cell lines K562 (lane 1) and HepG2 (lanes 2, 3) and from primary mesenchymal stem cells (lanes 4–6), and was used for RT–PCR with the indicated primers (see Supplementary Data).
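The expected fragment sizes follow from simple arithmetic: switching from an authentic site to a css changes the RT–PCR product by the distance between the two sites. The sketch below rests on three assumptions. Both primers lie in exons outside the shifted region. Only one splice site changes. Both sites are given in the same 5′→3′ pre-mRNA coordinate convention. `css_product_size` is a hypothetical helper.

```python
def css_product_size(
    authentic_size: int, authentic_site: int, cryptic_site: int, site_type: str
) -> int:
    """Expected RT-PCR product size when a css replaces one authentic splice site.

    Positions are in 5'->3' pre-mRNA coordinates, using the same convention
    for both sites; both primers are assumed to lie outside the shifted region.
    """
    shift = cryptic_site - authentic_site
    if site_type == "donor":
        # A downstream 5' css retains extra intronic sequence; an upstream one trims the exon.
        return authentic_size + shift
    if site_type == "acceptor":
        # A downstream 3' css trims the next exon; an upstream one adds intronic sequence.
        return authentic_size - shift
    raise ValueError("site_type must be 'donor' or 'acceptor'")
```

For example, a 5′ css lying 20 nt downstream of the authentic donor would turn a 300-bp product into a 320-bp product.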
Figure 4. Comparison of intron and css positions for a small part of the ribosomal protein gene database (23). (A) An alignment of 14 amino acids of the RPS5 gene that marks the positions of introns in 23 species. (B) The same alignment, including two css positions identified by CSF. (C) An alignment of part of RPL7A that illustrates the conservation of two css that also match an intron in Chlamydomonas reinhardtii (Cr). Asterisks (*) mark css positions that match introns; carets (^) mark conserved css.
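To compare css with intron positions, both must be placed on a common protein alignment. The sketch below shows one way to do this. It is an assumption, not the method used to build the figure:

- Record each splice junction as the number of coding nucleotides upstream of it.
- Convert that count to a codon index and a phase (0, 1 or 2).
- Map the codon to its alignment column.

A css then matches an intron when column and phase agree. The function names, the `Site` type and the toy alignment are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Site = Tuple[int, int]  # (alignment column, intron phase 0/1/2)


def residue_columns(aligned_seq: str) -> List[int]:
    """Alignment column of each ungapped residue, in order."""
    return [col for col, aa in enumerate(aligned_seq) if aa != "-"]


def junction_site(cds_offset: int, aligned_seq: str) -> Site:
    """Place a junction lying `cds_offset` coding nucleotides into the CDS."""
    codon, phase = divmod(cds_offset, 3)
    # Phase 0 junctions are assigned to the column of the following residue;
    # a junction after the last codon therefore raises IndexError.
    return residue_columns(aligned_seq)[codon], phase


def css_matching_introns(
    css: Dict[str, List[int]],
    introns: Dict[str, List[int]],
    alignment: Dict[str, str],
) -> Set[Site]:
    """Sites where a css in any species coincides with an intron in any species."""
    intron_sites = {
        junction_site(off, alignment[sp]) for sp, offs in introns.items() for off in offs
    }
    css_sites = {
        junction_site(off, alignment[sp]) for sp, offs in css.items() for off in offs
    }
    return css_sites & intron_sites
```

Take the toy alignment `{"Hs": "MAE-KLV", "Cr": "MAEQKLV"}`. A css 10 nt into the Hs CDS and a Cr intron 13 nt into its CDS both map to column 4, phase 1. They therefore match despite the gap.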