
Human branch point consensus sequence is yUnAy.

Kaiping Gao1, Akio Masuda, Tohru Matsuura, Kinji Ohno.   

Abstract

Yeast carries a strictly conserved branch point sequence (BPS) of UACUAAC, whereas the human BPS is degenerative and less well characterized. The human consensus BPS has never been extensively explored in vitro. Here, we sequenced 367 clones of lariat RT-PCR products arising from 52 introns of 20 human housekeeping genes. Among the 367 clones, a misincorporated nucleotide at the branch point was observed in 181 clones, for which we could precisely pinpoint the branch point. The branch points comprised 92.3% A, 3.3% C, 1.7% G and 2.8% U. Our analysis revealed that the human consensus BPS is simply yUnAy, where the A is the branch point at position zero and the lowercase pyrimidines ('y') are not as well conserved as the uppercase U and A. We found that the branch points are located 21-34 nucleotides upstream of the 3' end of an intron in 83% of clones. We also found that the polypyrimidine tract spans 4-24 nucleotides downstream of the branch point. Our analysis demonstrates that human BPSs are more degenerative than we had expected and that they are likely to be recognized in combination with the polypyrimidine tract and/or the other splicing cis-elements.


Year:  2008        PMID: 18285363      PMCID: PMC2367711          DOI: 10.1093/nar/gkn073

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

In higher eukaryotes, pre-mRNA splicing is mediated by degenerative splicing cis-elements: the branch point sequence (BPS), the polypyrimidine tract (PPT), the 5′ and 3′ splice sites and exonic/intronic splicing enhancers/silencers. Stepwise assembly of the spliceosome starts with recruitment of U1 snRNP, SF1, U2AF65 and U2AF35 to the 5′ splice site, the branch site, the PPT and the 3′ end of an intron, respectively (Complex E). SF1, a 75-kDa polypeptide, is a mammalian homolog of yeast BBP (branch point-binding protein). U2AF65 and U2AF35 bring U2 snRNP to the BPS in place of SF1 (1,2). The BPS establishes base pairing interactions with a stretch of ‘GUAGUA’ of U2 snRNA (3,4), which then bulges out the branch site nucleotide, usually an adenosine (Complex A) (5). Thereafter, pre-mRNAs are spliced in two sequential transesterification reactions mediated by the spliceosome. In the first step, the 2′-OH moiety of the branch site nucleotide carries out a nucleophilic attack against a phosphate at the 5′ splice site, generating a free upstream exon, as well as a lariat carrying the intron and the downstream exon. In the second step, the 3′-OH moiety of the upstream exon attacks the 3′ splice site, leading to intron excision and ligation of the upstream and downstream exons (6). The branch site is thus involved in the first step of splicing, and potentially in the second step, although the detailed molecular mechanisms of its contribution to the second step remain elusive (7). The BPS is strictly conserved in yeast and has the sequence UACUAAC, in which the sixth nucleotide (the 3′-most adenosine) is the branch point. On the other hand, the human BPSs are degenerative. No extensive in vitro identification of human BPSs has been reported. Five communications address the mammalian consensus BPSs (Table 1). Three reports are based on 11–20 in vitro identified BPSs, and two depend on in silico analysis of the human genome.
Table 1.

Previously reported consensus BPSs

| Organism | Consensus BPS | Note | References |
|---|---|---|---|
| S. cerevisiae | UACUAAC | Invariant BPS | (36,37) |
| Mammals | YNYURAY | 11 mammalian BPSs | (25) |
| Mammals | YNCURAC | 20 mammalian BPSs | (38) |
| Mammals | YNCURAY | 15 BPSs of human HBB | (39) |
| Mammals | CURAY | In silico homology search | (40) |
| Mammals | YUVAY (a), CUSAY (b) | In silico homology search | (41) |

Branch point ‘A’ is underlined. Y, U or C; R, A or G; S, G or C; V, A, C or G.

(a) Low GC% region.

(b) High GC% region.

In an effort to establish the human consensus BPS based on in vitro experiments, we analyzed 367 clones of lariat RT-PCR products arising from 52 introns of 20 human housekeeping genes. We found that the human consensus BPS is yUnAy. Our analysis demonstrates that the human BPSs are more degenerative than we had expected and that the BPS is likely recognized in combination with the PPT and/or the other splicing cis-elements.

MATERIALS AND METHODS

Lariat RT-PCR primers for human housekeeping genes

Among the 575 human housekeeping genes registered at http://www.compugen.co.il/supp_info/Housekeeping_genes.html (8), we excluded 82 genes for which we could not find entries in the EST profile viewer of the NCBI UniGene database (http://www.ncbi.nlm.nih.gov/UniGene/). Among the 4188 introns of the remaining 493 human housekeeping genes, we excluded introns shorter than 300 nucleotides or carrying multiple repeated segments, because it was difficult to design appropriate PCR primers for such introns. We next sorted the 493 genes by skin expression level according to the EST profile viewer, and selected the 20 most highly expressed genes. We thus analyzed 52 introns of the 20 human housekeeping genes (Supplementary Table 1). We placed the sense primers at least 100 nucleotides upstream of the 3′ end of an intron, and the antisense primers at least 10 nucleotides downstream of the 5′ end of an intron. The primers were designed to have melting temperatures of 64–67°C according to the nearest neighbor method. Gene symbols, intron numbers and primer sequences are indicated in Supplementary Table 1.

Lariat RT-PCR to identify the branch point

We performed nested lariat RT-PCR to amplify a fragment spanning the 2′–5′ phosphodiester bond at the branch point (9). We isolated total RNA from HEK293 cells grown to confluence in DMEM (Sigma-Aldrich) supplemented with 10% fetal bovine serum (Sigma-Aldrich) and penicillin-streptomycin (Invitrogen). First-strand cDNA was synthesized with SuperScript II reverse transcriptase (Invitrogen) using an intron-specific antisense primer C (Figure 1) located close to the 5′ end of an intron. The first round of lariat RT-PCR was performed using primers C and D with Taq HS DNA polymerase (Takara) in 25 µl. The nested lariat RT-PCR was carried out with primers A and B using 0.2 µl of the first-round lariat RT-PCR product in 50 µl. The first-round PCR program consisted of an initial denaturation step at 94°C for 3 min, followed by 30 cycles of 94°C for 30 s, 55°C for 30 s and 72°C for 1 min. For the nested PCR, we performed 35 cycles of amplification.
Figure 1.

(A) Lariat RT-PCR of PGK1 intron 6 indicates a misincorporated ‘A’ nucleotide at the branch point. We can pinpoint the branch point in this situation. The sequencing is performed with primer B. The small dots indicate the 5′ end of an intron. (B) Lariat RT-PCR of GAPDH intron 2 exhibits no misincorporated nucleotide. The branch point can be either at ‘C’ or upstream ‘T’ depending on whether skipping of the reverse transcriptase occurs or not. We cannot locate the exact branch point in this case. The sequencing is performed with primer A.

We purified the nested lariat RT-PCR products using the Wizard SV Gel and PCR Clean-up system (Promega), and cloned them into the pGEM-T-Easy vector (Promega). We sequenced 10 clones for each intron using the CEQ8000 genetic analyzer (Beckman Coulter).

Presentations of sequence motifs

Sequence motifs are presented using the Pictogram web server at http://genes.mit.edu/pictogram.html (10). To indicate the amount of information conferred by each nucleotide at each position, we employed the WebLogo program at http://weblogo.berkeley.edu/ (11,12). We also calculated the total information content at each position using the following formula:

$$R_{sequence} = 2 + \sum_{i \in \{A, C, G, U\}} P_i \log_2 P_i$$

where $P_i$ represents the probability of nucleotide $i$ at each position. $R_{sequence}$ represents the degree of conservation of a sequence motif at a specific position (13). It becomes 2.0 when a single nucleotide is exclusively observed at a specific position, whereas it becomes 0.0 when the four nucleotides are evenly observed.
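The information-content calculation can be sketched in a few lines of Python. This is a minimal illustration, not the code used by Pictogram or WebLogo: the function name `information_content` is hypothetical, the formula is taken to be R_sequence = 2 + Σ P_i log2 P_i (the form implied by the stated limits of 2.0 and 0.0), and no small-sample correction is applied.

```python
import math

def information_content(probabilities):
    """Return R_sequence in bits for one position: 2 + sum(P_i * log2(P_i)).

    `probabilities` holds the observed frequencies of A, C, G and U.
    Terms with P_i == 0 are skipped, since p * log2(p) tends to 0 as p -> 0.
    """
    return 2.0 + sum(p * math.log2(p) for p in probabilities if p > 0)

# A single nucleotide observed exclusively: maximum conservation.
print(information_content([1.0, 0.0, 0.0, 0.0]))
# Four nucleotides observed evenly: no conservation.
print(information_content([0.25, 0.25, 0.25, 0.25]))
# Frequencies of A, C, G and U at position 0, taken from Table 4.
print(round(information_content([0.923, 0.033, 0.017, 0.028]), 2))
```

With the position 0 frequencies from Table 4, this uncorrected calculation gives roughly 1.49 bits, close to the value of 1.48 quoted in the Results.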

RESULTS

Collation of previously identified mammalian BPSs

As shown in Table 1, five communications address the mammalian consensus BPSs. To understand the in vitro determined mammalian consensus BPS, we collated 29 previously reported BPSs from 25 mammalian and four viral introns (Table 2). Viral introns should be spliced in the same way as the mammalian genes. The branch points are located between positions −38 and −4 (mean and SD, −26.9 ± 7.8). Nucleotides ‘C’, ‘U’, ‘A’ and ‘Y’ at positions −3, −2, 0 and +1 are observed at 21 (72.4%), 21 (72.4%), 23 (79.3%) and 25 sites (86.2%), respectively. The deduced consensus BPS thus becomes CUnAy at positions −3 to +1 (Figure 2A and B), when we arbitrarily assume that positions with information contents above 0.45 are significant.
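Table 2.

Previously identified mammalian and viral BPSs

| Species | Gene | Intron | BPS | Position | Predicted BP (a) | BPS Score (a) | Reference |
|---|---|---|---|---|---|---|---|
| H. sapiens | CALCA | 4 | CACTCAC | −36 (b) | −36A | 3.85 | (42) |
| H. sapiens | CALCA | 3 | TACTGTC | −23 (b) | −42A | 2.60 | (21) |
| H. sapiens | CALCA | 3 | GTACTGT | −24 (b) | −42A | 2.60 | (21) |
| H. sapiens | CALCA | 3 | GGTGCAT | −32 (b) | −42A | 2.60 | (21) |
| H. sapiens | CSH1 | 1 | CCTCCAT | −23 (b) | −23A | 2.75 | (22) |
| H. sapiens | DQB1 | 3 | CACAGAC | −21 (c) | −21A | 3.25 | (17) |
| H. sapiens | GH1 | 1 | CTCTGTT | −22 (b) | na | na | (22) |
| H. sapiens | GH1 | 1 | GGCTCCC | −28 (b) | na | na | (22) |
| H. sapiens | GH1 | 1 | TGCTCTC | −36 (b) | na | na | (22) |
| H. sapiens | GH1 | 4 | GCCTCTC | −24 (b) | −37A | 2.80 | (22) |
| H. sapiens | GH1 | 4 | ACCCAAG | −37 (b) | −37A | 2.80 | (22) |
| H. sapiens | GH1 | 4 | TACCCAA | −38 (b) | −37A | 2.80 | (22) |
| H. sapiens | HBA | 1 | CCCTCAC | −19 (b) | −37A | 3.25 | (25) |
| H. sapiens | HBA | 2 | CACTGAC | −18 (b) | −18A | 3.95 | (25) |
| H. sapiens | HBB | 1 | CACTGAC | −37 (b) | −37A | 3.95 | (25) |
| H. sapiens | HBE1 | 1 | CTCTAAT | −31 (b) | −31A | 3.45 | (25) |
| H. sapiens | HBG1 | 1 | TTCTGAC | −30 (b) | −30A | 3.85 | (25) |
| H. sapiens | MYH10 | 5 | TGCTAAC | −31 (b) | na | na | (15) |
| H. sapiens | XPC | 3 | TGTTGAT | −4 (b) | na | na | (16) |
| H. sapiens | XPC | 3 | TACTGAT | −24 (b) | na | na | (16) |
| M. musculus | Hbb-b2 | 1 | CACTAAC | −36 (b) | −36A | 3.85 | (25) |
| M. musculus | Igh | 5 | AATTCAC | −22 (b) | −22A | 3.30 | (14) |
| R. norvegicus | Ins1 | 1 | CCTCAAC | −18 (b) | −18A | 3.15 | (25) |
| O. cuniculus | Hbb | 1 | TGCTGAC | −34 (b) | −34A | 3.85 | (25) |
| O. cuniculus | Hbb | 2 | TGCTAAC | −32 (b) | −32A | 3.75 | (28) |
| Adenovirus 5 | E1A | 1 | GTTTAAA | −30 (b) | −30A | 2.70 | (25) |
| Adenovirus 2 | E2a-2 | 1 | GACTGAC | −26 (b) | −26A | 3.70 | (42) |
| Adenovirus 5 | Major Late | 1 | TACTTAT | −24 (b) | −24A | 3.05 | (42) |
| SV40 | T antigen | 1 | ATTCTAA | −19 (b) | −19A | 2.00 | (42) |

(a) Predicted BPs and BPS scores are according to the Branch-Site Analyzer at http://ast.bioinfo.tau.ac.il/BranchSite.htm (14). na, not available.

(b) Identified by the primer extension method.

(c) Identified by lariat RT-PCR.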

Figure 2.

Pictogram (A, C and E) and WebLogo (B, D and F) presentations of mammalian BPSs. (A and B) Twenty-nine mammalian and viral BPSs identified by in vitro experiments (Table 2). (C and D) BPSs with a misincorporated nucleotide at the branch point in our studies. (E and F) BPSs without a misincorporated nucleotide at the branch point in our studies. We assume that ‘A’ residue one or two nucleotides downstream of the sequenced branch point is the actual branch point (see Figure 1B). Position 0 represents the branch point.


Lariat RT-PCR with or without a misincorporated nucleotide at the branch point

To further explore the human consensus BPS, we chose 52 introns of 20 human housekeeping genes. We performed nested lariat RT-PCR and cloned the amplified products. We sequenced ten clones from each intron, and 367 clones carried available inserts, which represented 117 possible branch sites (Table 3). The remaining 153 clones carried either no inserts or PCR artifacts.
Table 3.

Analyzed introns and observed branch points

Columns: Gene; Intron; Intron size (bp); Predicted BP (a); BPS Score (a); Observed BP; Number of clones; Misincorporated nucleotide; Intronic sequence from BPS position −5 to the 3′ end of an intron (b)
ACTB3441−24A3.10−30A10TCCCCAGTGTGACATGGTGTATCTCTGCCTTACAG
ALOD8984−24A3.05−26T2TGTCTTAATGTTGTTACCCTGACCCCAACAG
−25A2TGTCTTAATGTTGTTACCCTGACCCCAACAG
−5A1ACCCCAACAG
CCT341069−21A3.30−32A3TGAATAGTGTGAATTCACTAGTGATCTACC TTTTTAG
−30T1AATAGTGTGAATTCACTAGTGATCTACCTTTTTAG
−21A2TAATTCACTAGTGATCTACCTTTTTAG
CCT311845−44A2.95−24A6TGCTTCATACTGTCTGTTTGCTTCTCCAAG
−10C1GTTTGCTTCTCCAAG
EEF1A12366−24A2.80−28A2TTAGTAACCAAGTAACGACTCTTAATCCTTACAG
−19C1AGTAACGACTCTTAATCCTTACAG
EEF1A11943−33A3.25−23A10TGGTTCAAAGTTTTTTTCTTCCATTTCAG
ENO122837−33A3.25−27T6ATTGCTACTACATCTTTTTTCCTCTCATCCAG
−25C2TTGCTACTACATCTTTTTTCCTCTCATCCAG
ENO142394−21A3.45−21A10TCCCTCATTCTCCCCTCTCCCTCGTAG
ENO15737−32A3.20−27A6TACTTCATTCCACTCGGTTCTCTTCTGTTCTAG
−24C1TCATTCCACTCGGTTCTCTTCTGTTCTAG
ENO16615−36A2.85−30T2GCCCAGTGCCATGCTTCTCTGCTCTGCTCTCCCCAG
−27C3AAGTGCCATGCTTCTCTGCTCTGCTCTCCCCAG
−26A3GTGCCATGCTTCTCTGCTCTGCTCTCCCCAG
ENO17796−25A3.05−38A3TTACCTACCTGTTTTCCAAACCTGTTGTCACCATC TCTTCCCAG
−28C1TTTTCCAAACCTGTTGTCACCATCTCTTCCCAG
−27A2TTTTCCAAACCTGTTGTCACCATCTCTTCCCAG
−26A2TTTCCAAACCTGTTGTCACCATCTCTTCCCAG
ENO19547−30A2.70−5T2ATGGCTTCCAG
ENO1111457−26A2.95−48T4AGGTCTGACTTTTCTTTTTTCCTCCCCATCTCTTTACC TTTCTCCTTCCCAAG
−47G2GGTCTGACTTTTCTTTTTTCCTCCCCATCTCTTT ACCTTTCTCCTTCCCAAG
−46A3TGTCTGACTTTTCTTTTTTCCTCCCCATCTCTTTACC TTTCTCCTTCCCAAG
G22P116031−28A2.90−30A2AGGACAAACATTTTCTTCCATTTTTTTCCCCATAG
−28A4TGACAAACATTTTCTTCCATTTTTTTCCCCATAG
G22P183212−26A2.50−31A2AAGTCAAATCAAAGAAAATTTATCTCCTTTCTTCAG
−26A1TAAATCAAAGAAAATTTATCTCCTTTCTTCAG
−25A1TAATCAAAGAAAATTTATCTCCTTTCTTCAG
G22P1102978−33A3.60−33A10GACTCACCAGGCCACTCTTCTGTGTTTTGATTT TCTAG
GAPDH21633−26A3.35−6T10TTGTCTCTTAG
HSPA81734−39A3.15−23A1TTTTTAAACCAGATTTTTCTTTTTTTCAG
−19A1AAACCAGATTTTTCTTTTTTTCAG
−17A1CACCAGATTTTTCTTTTTTTCAG
HSPCB6448−18A3.15−22A6TGTACCACTTATTTTTGGTTTCTTTCAG
−23C1TGTACCACTTATTTTTGGTTTCTTTCAG
HSPCB10777−22A3.20−24T1CAATCTAAGGCTTTTGTGATCGTCCACAG
−22A2TATCTAAGGCTTTTGTGATCGTCCACAG
HSPCB11434−22A2.75−18A1TAATTAATGAGATTTTTATTTTAG
LDHB27526−22A3.35−24T4GGTTCTAATGCCTGTTTTTGCGTTTACAG
−23A1TGTTCTAATGCCTGTTTTTGCGTTTACAG
−22A6TTTCTAATGCCTGTTTTTGCGTTTACAG
PGK115461−28A3.00−29G2AAGTTGATCATGGTCTTGCATCTTTCTTTTTTAG
−28A4TAGTTGATCATGGTCTTGCATCTTTCTTTTTTAG
PGK123826−35A3.00−26T4CATTCTGTTTGTTGTCTCTCTTTGGTTGCAG
PGK143151−29A3.35−33C1GGAGCCATCACATTTTCTGTTTTTGTTTTTCTCTA TAG
−29A5TCCATCACATTTTCTGTTTTTGTTTTTCTCTATAG
PGK15635−32A3.40−29A1TTGACTAGAATCTGAATGTCTTTGATCTTTTCTAG
−22G7AATCTGAATGTCTTTGATCTTTTCTAG
−21A2TATCTGAATGTCTTTGATCTTTTCTAG
PGK164664−27A3.15−28A1TTCTTTAAGTGATGATTCTTGCTTTCTCTTGTAG
−23A8TAAGTGATGATTCTTGCTTTCTCTTGTAG
PGK181499−27A3.20−27A10TAGCTCATCTTCTCTTTCACCTCTACCCCTCAG
PGK110364−35A2.75−36A10ATAGTAATGCTGTCTATGTATGTGTGCTCTCTC AAAAACAG
PKM221443−39A3.05−31A2TAATTAATACTTGTGGCTTTAAAACTTTTCCTAATAG
−29A1TTTAATACTTGTGGCTTTAAAACTTTTCCTAATAG
−25G1CTACTTGTGGCTTTAAAACTTTTCCTAATAG
−23G1CCTTGTGGCTTTAAAACTTTTCCTAATAG
PKM236930−21A2.75−25T3ACGCTTGTCATCTTCCTTCTTTTCCCCCAG
−21A4TTTGTCATCTTCCTTCTTTTCCCCCAG
PKM24487−26A2.85−38T1TGGTGTCTCCAGTTTGGACTCTTGCTTACTCTCTTGT CCCTAG
−33A1TTCTCCAGTTTGGACTCTTGCTTACTCTCTTGTCC CTAG
−23C1GGACTCTTGCTTACTCTCTTGTCCCTAG
−16A1TTGCTTACTCTCTTGTCCCTAG
−8G1CTCTTGTCCCTAG
−5C1TTGTCCCTAG
PKM25781nana−32T1CGTGCTCTGCCTCCCCTACTTACCCTTTTTCATACAG
−31C1GTGCTCTGCCTCCCCTACTTACCCTTTTTCATACAG
−28C3CTCTGCCTCCCCTACTTACCCTTTTTCATACAG
−20A1TCCCCTACTTACCCTTTTTCATACAG
−18T1ACCTACTTACCCTTTTTCATACAG
−16A2TTACTTACCCTTTTTCATACAG
PKM261343−29A3.10−39G1CCCTCTGTTCTATATAACCTCTCTCCCCCCAACTTTG TCCATCAG
−34A6TGTTCTATATAACCTCTCTCCCCCCAACTTTG TCCATCAG
−32A2TTCTATATAACCTCTCTCCCCCCAACTTTGTCCATCAG
PKM284107nana−65T2CCTTTTGTGACAAAGCTCTGACAAAGCTCTGTCCC CCTCTCGTCCCTCTGGACGGATGTTGCTCCCCTAG
−52T1AGCTCTGACAAAGCTCTGTCCCCCTCTCGTCCCTC TGGACGGATGTTGCTCCCCTAG
−50A5TCTCTGACAAAGCTCTGTCCCCCTCTCGTCCCTCTGGA CGGATGTTGCTCCCCTAG
PKM210717−25A3.75−27T1TTTACTCACCAACCTCCCTTCTCTTCCTCCAG
−26C2TTACTCACCAACCTCCCTTCTCTTCCTCCAG
−25A6TTACTCACCAACCTCCCTTCTCTTCCTCCAG
PSMB44393−40A3.05−44A4TCTGTTATTCAGCCCAATATCCCCCCATGGTTTTCC CCCAATCTCCCTAG
−40A5TTATTCAGCCCAATATCCCCCCATGGTTTTCCCCCA ATCTCCCTAG
RPL134583−21A3.30−26A1CACCCCACTTAACTCTTCTCATTCACCAACAG
−23T1CCACTTAACTCTTCTCATTCACCAACAG
−22A4TCACTTAACTCTTCTCATTCACCAACAG
RPL135492−22A3.30−22A9GTTTAACAACCTGTCTTTCTCTTCTAG
−20A1TTTAACAACCTGTCTTTCTCTTCTAG
RPL13A12205−26A3.55−22C7GAGTCCTTTTGCCCTTGTCTCCCACAG
−8T1TTGTCTCCCACAG
RPL32770−21A3.70−21A4TGTCTGACTACTGCTTTTTTTTTGCAG
−19T3CTGACTACTGCTTTTTTTTTGCAG
RPL341150−22A3.30−24T10GGAGCTGAGCTGTGTCTACCTTCTCCTAG
RPL35522−22A2.50−29T1GGCGCTGAGGTGAAGTAATGTGTATCCATTCCAG
RPL36477−34A3.45−22T3AGCCTTACACCCTTCTTGTTCATTCAG
−21A3TGCCTTACACCCTTCTTGTTCATTCAG
RPL84806nana−23C5GTTCCCTGAGGTATCTGATCCCCTACAG
SLC25A321591−30A2.85−31A1ATATTAAAATGCATGGTGTGTCTTCTCTTACTACAG
SNRPB12929−24A3.20−24A1TGTCTCATCCCTGTCCATTTCTCCTTGCAG
SNRPB21787−34A3.85−36T2ACCTCTAACACTTTTTTTGTTCCTTCTAAAC CTCTCTTTAG
−35A5CCTCTAACACTTTTTTTGTTCCTTCTAAACC TCTCTTTAG
−34A2TCTCTAACACTTTTTTTGTTCCTTCTAAAC CTCTCTTTAG
SNRPB31808−34A3.10−30G2CACTGGGCATCAGAGCATATTTGTTTATTT TTCAG
−29G3ACTGGGCATCAGAGCATATTTGTTTATTTT TCAG
−28C1GCTGGGCATCAGAGCATATTTGTTTATTTTTCAG
−27A2TGGGCATCAGAGCATATTTGTTTATTTTTCAG
SNRPB4519−25A3.75−27T7TCTTCTAACTCTTTCTTCTTATGTCCTCTTAG
−26A1TCTTCTAACTCTTTCTTCTTATGTCCTCTTAG
−25A5TTTCTAACTCTTTCTTCTTATGTCCTCTTAG
SNRPB6696−25A3.95−27T8GGCACTGACTAAACTTCTTACTCTTACTTCAG
UBB1717−33A3.45−30T6TGAGGTGACACGCTTATGTTTTACTTTTAAA CTAG
−29G1GAGGTGACACGCTTATGTTTTACTTTTAA ACTAG
−28A2TAGGTGACACGCTTATGTTTTACTTTTAAACTAG

aPredicted BPs and BPS scores are according to the Branch-Site Analyzer at http://ast.bioinfo.tau.ac.il/BranchSite.htm (14).

bObserved branch sites are underlined.

The 367 clones were divided into two classes: 181 clones carrying misincorporated nucleotides at the branch points, and 186 clones without misincorporated nucleotides. For those carrying misincorporated nucleotides, we could pinpoint the exact branch points (Figure 1A). On the other hand, for those carrying no misincorporated nucleotides, the reverse transcriptase might have skipped one or two nucleotides at the 2′–5′ phosphodiester bond at the branch points (Figure 1B). Among the 367 clones, we observed two or more possible branch sites in 36 of 52 introns. The 36 introns carried a total of 101 possible branch sites. Among the 101 sites, 25 were followed by an immediate downstream branch site, making 25 possible branch-site pairs. Among the 25 upstream branch sites, 19 carried no misincorporated nucleotides. In addition, 13 of the 19 upstream sites were followed by an ‘A’ nucleotide. Furthermore, when we simply deduced the consensus BPS from all 367 clones, the consensus BPS became more degenerative and less informative (data not shown). These findings suggest that the observed upstream branch points are likely due to skipping of a nucleotide in lariat RT-PCR. We thus employed the 181 clones carrying misincorporated nucleotides at the branch points in the following analyses unless otherwise stated. We counted each clone as a single occurrence of a branch point in order to weight the preferred branch points. For example, in PGK1 intron 6, eight clones mapped to ‘A’ at position −23, whereas one clone pointed to ‘A’ at position −28 (Table 3). We assumed that the branch point at position −23 was eight times more frequently employed than that at position −28. This analysis method might have overweighted introns that gave rise to more clones.
An alternative analysis method would be to make the contribution of each intron equal regardless of the number of available clones. The alternative method, however, is also biased in favor of introns with fewer clones. For example, PGK1 intron 8 had a single available clone mapping to position −27, whereas EEF1A1 intron 1 had ten clones all mapping to position −23. A single clone of PGK1 might have arisen from one of many branch points, and we might have sequenced it by chance. On the other hand, it is likely that EEF1A1 intron 1 indeed had a single branch point. We analyzed our data using both methods and obtained similar results (data not shown), except that the frequency of C at position −1 was slightly lower with the alternative method (44.8% versus 36.3%). In the current communication, we employed the former method, in which each clone was counted as a single occurrence of a branch site.
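The difference between the two counting schemes can be illustrated with a short Python sketch. The observations below are hypothetical toy data, and the function names are illustrative; in particular, splitting each intron's unit weight in proportion to its clone counts is only one plausible reading of "making the contribution of each intron equal", not a description of the authors' actual procedure.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (intron, nucleotide at some motif position, clones).
TOY_OBSERVATIONS = [
    ("intron_a", "C", 8),
    ("intron_a", "U", 1),
    ("intron_b", "U", 1),
]

def per_clone_counts(observations):
    """Count each clone as one occurrence (the method used in the paper)."""
    counts = Counter()
    for _intron, nucleotide, n_clones in observations:
        counts[nucleotide] += n_clones
    return counts

def per_intron_counts(observations):
    """Give every intron a total weight of 1, split by its clone fractions."""
    by_intron = defaultdict(list)
    for intron, nucleotide, n_clones in observations:
        by_intron[intron].append((nucleotide, n_clones))
    counts = Counter()
    for sites in by_intron.values():
        total = sum(n for _nt, n in sites)
        for nucleotide, n in sites:
            counts[nucleotide] += n / total
    return counts

def to_frequencies(counts):
    """Normalise raw counts so that they sum to 1."""
    total = sum(counts.values())
    return {nt: value / total for nt, value in counts.items()}

print(to_frequencies(per_clone_counts(TOY_OBSERVATIONS)))
print(to_frequencies(per_intron_counts(TOY_OBSERVATIONS)))
```

In this toy case the frequency of C drops from 0.8 under per-clone counting to about 0.44 under per-intron counting, which shows how an intron with many clones can dominate the first scheme while a single-clone intron gains influence in the second.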

Positions and sequence motif of the branch points

Analysis of the 181 clones revealed that the positions of the branch points were from −50 to −5, where position −1 represents the 3′ end of an intron (Figure 3A). Among the 181 sites, 150 (83%) were at positions −34 to −21.
Figure 3.

(A) Positions and nucleotides of 181 branch points with misincorporated nucleotides in our studies, where position −1 represents the 3′ end of an intron. The median value of the branch points is −26, and the mean and SD is −27.7 ± 7.6. Among the 181 sites with misincorporated nucleotides at the branch points, 150 sites (83%) are at positions −34 to −21 (horizontal bar on top). Native nucleotides, not the misincorporated nucleotides, are indicated. Nucleotide preferences (B and D) and information contents (C and E) are deduced from 181 branch points. (B and C) Plots are aligned in respect to the branch point (closed arrows), which is designated as position 0. Open arrows point to peaks of information contents at positions +7 and +8. A polypyrimidine stretch starts from position +4 down to position +24 (bars). The plots are truncated at position +25, because the numbers of observations fall below 40 after position +25, and the plots become less informative and uneven. The last three nucleotides of introns are excluded from the plots. (D and E) Plots are aligned in respect to the 3′ end of each intron, which is designated as position −1. A polypyrimidine stretch spans positions −19 to −5 (bars).

We observed U at position −2 in 74.6% of branch sites, and A at position 0 in 92.3% of branch sites (Table 4). In addition, pyrimidines were observed at positions −3 and +1 in 79.0% and 75.1% of branch sites, respectively (Table 4). We can thus conclude that the human consensus BPS is yUnAy at positions −3 to +1 (Figure 2C), where the A is the branch site at position 0 and the less conserved nucleotides are indicated in lowercase letters. The information contents of 0.27 and 0.23 at positions −3 and +1, however, were not as high as those of 0.85 and 1.48 at positions −2 and 0 (Figure 2D), or 0.39 ± 0.12 (mean ± SD) of the polypyrimidine tract at positions +4 to +24 (Figure 3C). Therefore, the consensus sequence alternatively becomes UnA according to the information contents (Figure 2D).
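Table 4.

Nucleotide frequencies at the 181 branch sites

| Position | −5 | −4 | −3 | −2 | −1 | 0 | 1 | 2 | 3 |
|---|---|---|---|---|---|---|---|---|---|
| A | 0.254 | 0.232 | 0.083 | 0.066 | 0.166 | 0.923 | 0.182 | 0.302 | 0.201 |
| C | 0.210 | 0.227 | 0.470 | 0.160 | 0.448 | 0.033 | 0.331 | 0.274 | 0.391 |
| G | 0.254 | 0.193 | 0.127 | 0.028 | 0.177 | 0.017 | 0.066 | 0.112 | 0.112 |
| U | 0.282 | 0.348 | 0.320 | 0.746 | 0.210 | 0.028 | 0.420 | 0.313 | 0.296 |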
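The pyrimidine percentages quoted for positions −3 and +1 follow directly from the Table 4 frequencies, as the short Python check below shows. The dictionary layout and function names are illustrative choices, and only the four positions that make up the yUnAy consensus are transcribed.

```python
# Frequencies transcribed from Table 4 (positions -3, -2, 0 and +1 only).
TABLE4 = {
    -3: {"A": 0.083, "C": 0.470, "G": 0.127, "U": 0.320},
    -2: {"A": 0.066, "C": 0.160, "G": 0.028, "U": 0.746},
    0: {"A": 0.923, "C": 0.033, "G": 0.017, "U": 0.028},
    1: {"A": 0.182, "C": 0.331, "G": 0.066, "U": 0.420},
}

def pyrimidine_fraction(freqs):
    """Sum of the C and U frequencies at one position."""
    return freqs["C"] + freqs["U"]

def most_frequent(freqs):
    """Nucleotide with the highest frequency at one position."""
    return max(freqs, key=freqs.get)

for position in (-3, 1):
    print(position, round(pyrimidine_fraction(TABLE4[position]), 3))
for position in (-2, 0):
    print(position, most_frequent(TABLE4[position]))
```

The sums reproduce the 79.0% and 75.1% pyrimidine figures, and the most frequent nucleotides at positions −2 and 0 are U and A, matching the uppercase letters of the consensus.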
Among the 41 introns yielding the 181 clones, 14 introns carried multiple branch sites. In eight of the 14 introns (57%), the most downstream branch sites were most frequently used (Table 3). Although the ratio of 57% was not high, the downstream branch sites were four to eight times more frequently used than the upstream sites in four of the eight introns. We could not observe this magnitude of differential branch site usage in the remaining six introns, in which the downstream branch points were not overrepresented. Accordingly, when there are multiple branch points, downstream branch points are more likely to be employed than their upstream counterparts. We also predicted BPSs of our housekeeping genes with the Branch Site Analyzer (14), and found that the actual branch sites matched the predicted positions in 80 of the 181 sites (44.2%) (Table 3).

Alignment of polypyrimidine tract in respect to the branch point

We next aligned the PPTs in respect to the 181 branch points (Figure 3B and C). We observed a polypyrimidine stretch from position +4 down to position +24. The ‘U’ nucleotide was preferred over ‘C’, especially at positions +4 to +12 in the PPT. Alignment of the PPT in respect to the 3′ end of an intron also demonstrated a stretch of pyrimidines from positions −20 to −4 (Figure 3D and E). The information contents at the PPTs were similar between the two alignments. We observed peaks of information contents seven and eight nucleotides downstream of the branch point. The functional significance of these peaks, however, remains elusive.

Information obtained from lariat RT-PCR clones without misincorporated nucleotides

We next asked if we could exploit the 186 clones without misincorporated nucleotides at the branch point. If there was an ‘A’ nucleotide one or two nucleotides downstream of the sequenced branch point and the sequenced branch point was not ‘A’, we assumed that one or two nucleotides were skipped by the reverse transcriptase and that the particular downstream ‘A’ was the actual branch point. A similar assumption has also been applied to three other genes in previous reports (15–17). We aligned the branch points under this assumption, and plotted the nucleotide preferences and the information contents (Figure 2E and F). Compared to those of misincorporated nucleotides, the information contents were generally lower, but the Pictogram and WebLogo presentations resulted in similar patterns. These analyses suggest that one or two nucleotides were skipped when there were no misincorporated nucleotides, but definite experimental evidence is lacking to employ these clones to deduce the human consensus BPS.
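The downstream-‘A’ assumption described above can be written as a small rule. The Python sketch below is illustrative only: the function name, the choice of the nearest ‘A’ when both downstream positions are ‘A’, and the decision to leave the sequenced position unchanged when the rule does not apply are assumptions, since the text does not specify these cases. The example sequences are hypothetical.

```python
def infer_branch_point(sequence, sequenced_index, max_skip=2):
    """Return the index of the assumed branch point for a clone without
    a misincorporated nucleotide.

    If the sequenced position is not 'A' and an 'A' lies one or two
    nucleotides downstream, that 'A' is taken as the branch point.
    Otherwise the sequenced position is returned unchanged (assumption).
    """
    seq = sequence.upper()
    if seq[sequenced_index] == "A":
        return sequenced_index
    for skip in range(1, max_skip + 1):
        candidate = sequenced_index + skip
        # Nearest downstream 'A' wins (assumption, not stated in the text).
        if candidate < len(seq) and seq[candidate] == "A":
            return candidate
    return sequenced_index

# Hypothetical examples: index 2 is the position reported by sequencing.
print(infer_branch_point("CTGAC", 2))   # 'A' one nucleotide downstream
print(infer_branch_point("CTGTAC", 2))  # 'A' two nucleotides downstream
print(infer_branch_point("CTGTTA", 2))  # no 'A' within two nucleotides
```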

DISCUSSION

Highly degenerative human BPS

We determined splicing branch points in 52 introns of 20 human housekeeping genes by lariat RT-PCR. Our analysis disclosed the following features (Figure 4). First, 83% of the branch points are located 21–34 nucleotides upstream of the 3′ end of an intron (Figure 3A). Second, a polypyrimidine stretch spans 4–24 nucleotides downstream of the branch point (Figure 3B and C). Third, the human branch point consensus sequence is yUnAy (Figure 2C and D). The first and the second features underscore the previous in silico observations (6,14), whereas the degeneracy of the human BPS is more than we have expected.
Figure 4.

Representative composition of the branch point sequence (arrow) and the PPT deduced from our studies.

It is interesting to note that among the six consensus BPSs proposed for the mammalian branch points (Table 1), the shared nucleotides are yUnAy, which is identical to the consensus determined by our analysis. SF1 binds to the BPS using its KH domain (18). NMR analysis of SF1 bound to the BPS revealed that a hydrophobic motif of Gly-Pro-Arg-Gly within the KH domain forms hydrogen bonds with ‘UAA’ at positions −2 to 0 of the yeast BPS, ‘UACUAAC’ (19). Our analysis suggests that the binding of the KH domain to position −1 may enhance, but may be dispensable for, the recognition of the BPS. Berglund and colleagues (20) also demonstrate that, in ‘UACUAAC’ at positions −5 to +1, nucleotide substitutions only at position −2 or 0, but not at the other positions, compromise the binding of SF1.

Non-‘A’ nucleotides at position 0

We observed an ‘A’ nucleotide at 92.3% of the branch points. Non-‘A’ nucleotides at the branch point have been reported in CALCA1 (21) and GH1 (22) (Table 2). The two reports demonstrate six such examples in four introns. As these unusual branch points constitute 21% (6/29) of the previously reported in vitro determined branch points, the ratio of ‘A’ at the branch point is reduced to 79% (Table 2). Additionally, the potential observation bias posed by these unusual BPSs may account for the differences in the Pictogram and WebLogo patterns between the previously identified BPSs (Figure 2A and B) and our BPSs (Figure 2C and D).

Disease-causing mutations disrupting BPSs

According to the Human Gene Mutation Database (23), splicing mutations account for 13.7% (1768 of 12 879) of single nucleotide substitutions. Most splicing mutations, however, are at the splice donor or acceptor sites. To our knowledge, sixteen disease-causing mutations and a single polymorphism disrupt BPSs and give rise to aberrant splicing (Table 5). Nine variants are at position 0, and the other eight are at position −2. Among the nine variants affecting position 0, seven are A-to-G mutations, which supports the notion reported by Kralovicova and colleagues (24) that A-to-G transitions at position 0 are more deleterious than A-to-T or A-to-C transversions. For all the variants, aberrant splicing has been determined either in patients or in minigenes. The actual branch points, however, have been identified only in two variants by lariat RT-PCR, whereas the remaining fourteen variants have been mapped to putative BPSs. Exclusive confinement of BPS-disrupting nucleotide changes to positions −2 and 0 also underscores our observation that the BPS consensus is yUnAy.
Table 5. Sixteen mutations and a single polymorphism disrupting BPSs

| Gene and intron | Sequence | Consequence | Reference |
|---|---|---|---|
| LCAT intron 4 | | | |
| Wild-type | CCCTGAC | | |
| Mutant | CCCCGAC | Intron retention (a) | (43,44) |
| Mutant | CCCGGAC | Intron retention (b) | (45) |
| Mutant | CCCAGAC | Intron retention (b) | (45) |
| FBN2 intron 30 | | | |
| Wild-type | TACTAAG | | |
| Mutant | TACGAAG | Exon skipping (a) | (46) |
| COL5A1 intron 32 | | | |
| Wild-type | GACTGAC | | |
| Mutant | GACGGAC | Exon skipping (a) | (47) |
| ITGB4 intron 31 | | | |
| Wild-type | GGCTCAC | | |
| Mutant | GGCACAC | Intron retention (a), cryptic 3′ splice site (a) | (48) |
| TH intron 10 | | | |
| Wild-type | GGCTGAT | | |
| Mutant | GGCAGAT | Exon skipping (a), cryptic 3′ splice site (a) | (49) |
| L1CAM intron 18 | | | |
| Wild-type | ATCCAAG | | |
| Mutant | ATCCACG | Cryptic 3′ splice site (a) | (50) |
| LIPC intron 1 | | | |
| Wild-type | CCCCAAT | | |
| Mutant | CCCCAGT | Cryptic 3′ splice site (a) | (51) |
| FBN2 intron 28 | | | |
| Wild-type | TTGCAAT | | |
| Mutant | TTGCAGT | Exon skipping (a) | (52) |
| HEXB intron 10 | | | |
| Wild-type | TTGCAAT | | |
| Mutant | TTGCAGT | Cryptic 3′ splice site (a) | (53) |
| NF2 intron 5 | | | |
| Wild-type | TTCTAGC | | |
| Mutant | TTCTAAC | Intron retention (a) | (54) |
| TSC2 intron 38 | | | |
| Wild-type | GCGTGAC | | |
| Mutant | GCGTGGC | Cryptic 3′ splice site (a), intron retention (a) | (55) |
| XPC intron 3 (c) | | | |
| Wild-type | TACTGAT | | |
| Mutant | TACTGGT | Exon skipping (a) | (16) |
| NPC1 intron 6 | | | |
| Wild-type | CACTAAT | | |
| Mutant | CACTAGT | Exon skipping (a) | (56) |
| F9 intron 2 | | | |
| Wild-type | CGTTAAT | | |
| Mutant | CGTTAGT | Exon skipping (b) | (24,57) |
| DQB1 intron 3 (c,d) | | | |
| Genotype A | CACAGAC | Exon skipping (b) | (17) |
| Genotype U | CACUGAC | Exon inclusion (b) | (17) |

Mutations or a polymorphism are underlined. Aberrant splicings have been determined in patients (a) or minigenes (b).

(c) Branch points have been identified by lariat RT-PCR. Others are putative BPSs lacking in vitro evidence.

(d) Polymorphism.
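
The position tally quoted in the text can be re-derived from the sequences in Table 5. The sketch below is only an illustration, not the authors' procedure. It assumes that each 7-mer is aligned so that the branch point is its sixth nucleotide (position 0, so the 7-mer spans −5 to +1); this alignment is inferred from the yUnAy consensus and is not stated in the table. The names `TABLE5_PAIRS`, `BRANCH_INDEX` and `variant_position` are hypothetical, and the wild-type and variant rows are taken as printed.

```python
from collections import Counter

# Assumption: the branch point (position 0) is the 6th nucleotide of each 7-mer.
BRANCH_INDEX = 5

# (reference, variant) 7-mers transcribed from Table 5, rows taken as printed.
TABLE5_PAIRS = [
    ("CCCTGAC", "CCCCGAC"),  # LCAT intron 4
    ("CCCTGAC", "CCCGGAC"),  # LCAT intron 4
    ("CCCTGAC", "CCCAGAC"),  # LCAT intron 4
    ("TACTAAG", "TACGAAG"),  # FBN2 intron 30
    ("GACTGAC", "GACGGAC"),  # COL5A1 intron 32
    ("GGCTCAC", "GGCACAC"),  # ITGB4 intron 31
    ("GGCTGAT", "GGCAGAT"),  # TH intron 10
    ("ATCCAAG", "ATCCACG"),  # L1CAM intron 18
    ("CCCCAAT", "CCCCAGT"),  # LIPC intron 1
    ("TTGCAAT", "TTGCAGT"),  # FBN2 intron 28
    ("TTGCAAT", "TTGCAGT"),  # HEXB intron 10
    ("TTCTAGC", "TTCTAAC"),  # NF2 intron 5
    ("GCGTGAC", "GCGTGGC"),  # TSC2 intron 38
    ("TACTGAT", "TACTGGT"),  # XPC intron 3
    ("CACTAAT", "CACTAGT"),  # NPC1 intron 6
    ("CGTTAAT", "CGTTAGT"),  # F9 intron 2
    ("CACAGAC", "CACUGAC"),  # DQB1 intron 3 (genotype A vs genotype U)
]


def variant_position(ref, var):
    """Return (position relative to the branch point, ref base, variant base)."""
    diffs = [i for i, (r, v) in enumerate(zip(ref, var)) if r != v]
    if len(ref) != len(var) or len(diffs) != 1:
        raise ValueError(f"expected exactly one substitution: {ref} vs {var}")
    i = diffs[0]
    return i - BRANCH_INDEX, ref[i], var[i]


calls = [variant_position(ref, var) for ref, var in TABLE5_PAIRS]

# How many variants fall at each position relative to the branch point.
position_counts = Counter(pos for pos, _, _ in calls)

# How many position-0 changes are printed as A-to-G.
a_to_g_at_zero = sum(
    1 for pos, r, v in calls if pos == 0 and (r, v) == ("A", "G")
)

print(dict(position_counts))
print(a_to_g_at_zero)
```

Under this alignment every variant falls at position −2 or position 0 (eight and nine, respectively), and seven of the position-0 changes are A-to-G as printed, which agrees with the counts given in the text.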

Conversely, mutations disrupting yUnAy are not always deleterious. When the branch point ‘A’ is mutated or deleted, a neighboring cryptic ‘A’ residue is employed as a branch point (25–27), or the mutant ‘C’, ‘G’ or ‘U’ residue is used as a surrogate branch point (28). Additionally, we observed two or more branch sites in 15 of 41 introns (Table 3), which also implies that a mutation-harboring BPS can be readily replaced by another BPS.

How is the highly degenerative BPS recognized?

It is hard to believe that SF1 simply recognizes yUnAy. We expect that SF1 recognizes the BPS along with the other cis-element(s) and their interacting trans-factor(s). The SELEX screening of the yeast BBP binding motifs revealed a stem and loop structure immediately upstream of the BPS of ‘UACUAAC’ in 9 out of 48 selected motifs (29). A gel shift assay also showed preferential binding of human SF1 to ‘UACUAAC’ carrying an upstream stem and loop. Our BPSs, however, had no upstream stem and loop structures (data not shown). An upstream stem and loop may help recognize highly degenerative mammalian BPSs for a subset of introns.

In the early step of the spliceosome assembly, SF1, U2AF65 and U2AF35 bind to the BPS, the PPT and AG at the 3′ end of an intron, respectively, to form complex E (1,2). In S. pombe, SF1/BBP is tightly associated with U2AF59, a yeast homolog of mammalian U2AF65 recognizing the PPT, as well as with U2AF23, a yeast homolog of mammalian U2AF35 recognizing the 3′ AG (30). In mammals, the association between SF1 and U2AF65 is mediated by the 28 N-terminal amino acids of the KH domain of SF1 (31) and by the third RBD of U2AF65 (32). Wang and colleagues determined that Ser20 in the N-terminal region of the KH domain is essential for binding to U2AF65 and that phosphorylation of Ser20 inhibits its binding and formation of complex A (33). Berglund and colleagues also report that the SF1-U2AF65 interaction promotes cooperative binding to the BPS and the PPT (32). Our analysis also demonstrates positional association of the BPS and the PPT (Figure 3B and C).

On the other hand, Kent and colleagues demonstrate that U2AF65 and U2AF35 are dispensable for the binding of SF1 to the BPS (34). Sharma and colleagues similarly show that complex H includes SF1 in the absence of U2AF65 and U2AF35 (35). Although the exact order of the SF1, U2AF65 and U2AF35 assembly remains elusive, the BPS is possibly recognized along with the PPT and the 3′ AG. Alternatively, SF1 is bound to any yUnAy sequences in complex H, and a particular SF1 that successfully associates with the U2AF heterodimer exclusively survives to form complex E.
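
To make the degeneracy argument concrete, the following sketch scans a sequence for yUnAy matches and keeps those whose branch ‘A’ lies 21–34 nucleotides upstream of the 3′ end of the intron, the window reported in the abstract. It is a hedged illustration, not a published predictor: the function name `candidate_branch_points`, the toy sequence and the distance convention (the last intron nucleotide is counted as distance 1) are all assumptions, and the chance-match estimate assumes equal base frequencies.

```python
import re

# yUnAy on the DNA alphabet (U written as T); the lookahead finds overlapping hits.
YUNAY = re.compile(r"(?=([CT]T[ACGT]A[CT]))")

# Window from the abstract: branch points 21-34 nt upstream of the intron 3' end.
WINDOW = (21, 34)


def candidate_branch_points(intron, window=WINDOW):
    """List (index of branch 'A', distance to 3' end, motif) for yUnAy hits.

    Assumption: distance is counted so that the last nucleotide of the
    intron has distance 1.
    """
    seq = intron.upper().replace("U", "T")
    hits = []
    for match in YUNAY.finditer(seq):
        a_index = match.start() + 3  # the 'A' is the 4th base of yUnAy
        distance = len(seq) - a_index
        if window[0] <= distance <= window[1]:
            hits.append((a_index, distance, match.group(1)))
    return hits


# Chance of a yUnAy match at a given position in a random sequence with equal
# base frequencies (an idealization): y = 2/4, U = 1/4, n = 1, A = 1/4, y = 2/4.
EXPECTED_PER_POSITION = (2 / 4) * (1 / 4) * 1 * (1 / 4) * (2 / 4)

# Hypothetical toy intron fragment, used only to exercise the function.
toy_intron = "GGGG" + "CTGAC" + "G" * 25

print(candidate_branch_points(toy_intron))
print(EXPECTED_PER_POSITION)
```

Under the equal-frequency idealization, about one position in 64 matches yUnAy by chance, which is in line with the argument that the motif alone is unlikely to specify the branch site without the PPT and the 3′ AG.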

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.
  55 in total

1.  Conservation of functional domains involved in RNA binding and protein-protein interactions in human and Saccharomyces cerevisiae pre-mRNA splicing factor SF1.

Authors:  J C Rain; Z Rafi; Z Rhani; P Legrain; A Krämer
Journal:  RNA       Date:  1998-05       Impact factor: 4.942

2.  Statistical features of human exons and their flanking regions.

Authors:  M Q Zhang
Journal:  Hum Mol Genet       Date:  1998-05       Impact factor: 6.150

3.  Reported in vivo splice-site mutations in the factor IX gene: severity of splicing defects and a hypothesis for predicting deleterious splice donor mutations.

Authors:  R P Ketterling; J B Drost; W A Scaringe; D Z Liao; J Z Liu; C K Kasper; S S Sommer
Journal:  Hum Mutat       Date:  1999       Impact factor: 4.878

4.  Phosphorylation of splicing factor SF1 on Ser20 by cGMP-dependent protein kinase regulates spliceosome assembly.

Authors:  X Wang; S Bruderer; Z Rafi; J Xue; P J Milburn; A Krämer; P J Robinson
Journal:  EMBO J       Date:  1999-08-16       Impact factor: 11.598

5.  A cooperative interaction between U2AF65 and mBBP/SF1 facilitates branchpoint region recognition.

Authors:  J A Berglund; N Abovich; M Rosbash
Journal:  Genes Dev       Date:  1998-03-15       Impact factor: 11.361

6.  Splicing modulation of integrin beta4 pre-mRNA carrying a branch point mutation underlies epidermolysis bullosa with pyloric atresia undergoing spontaneous amelioration with ageing.

Authors:  S Chavanas; Y Gache; J Vailly; J Kanitakis; L Pulkkinen; J Uitto; J Ortonne; G Meneguzzi
Journal:  Hum Mol Genet       Date:  1999-10       Impact factor: 6.150

7.  Branch site haplotypes that control alternative splicing.

Authors:  Jana Královicová; Sophie Houngninou-Molango; Angela Krämer; Igor Vorechovsky
Journal:  Hum Mol Genet       Date:  2004-10-20       Impact factor: 6.150

8.  The KH domain of the branchpoint sequence binding protein determines specificity for the pre-mRNA branchpoint sequence.

Authors:  J A Berglund; M L Fleming; M Rosbash
Journal:  RNA       Date:  1998-08       Impact factor: 4.942

9.  A point mutation in an intronic branch site results in aberrant splicing of COL5A1 and in Ehlers-Danlos syndrome type II in two British families.

Authors:  N P Burrows; A C Nicholls; A J Richards; C Luccarini; J B Harrison; J R Yates; F M Pope
Journal:  Am J Hum Genet       Date:  1998-08       Impact factor: 11.025

10.  Two mutations remote from an exon/intron junction in the beta-hexosaminidase beta-subunit gene affect 3'-splice site selection and cause Sandhoff disease.

Authors:  M Fujimaru; A Tanaka; K Choeh; N Wakamatsu; H Sakuraba; G Isshiki
Journal:  Hum Genet       Date:  1998-10       Impact factor: 4.132
