Kaiping Gao, Akio Masuda, Tohru Matsuura, Kinji Ohno.
Abstract
Yeast carries a strictly conserved branch point sequence (BPS) of UACUAAC, whereas the human BPS is degenerate and less well characterized. The human consensus BPS has not previously been explored extensively in vitro. Here, we sequenced 367 clones of lariat RT-PCR products arising from 52 introns of 20 human housekeeping genes. Among the 367 clones, a misincorporated nucleotide at the branch point was observed in 181 clones, which allowed us to pinpoint the branch point precisely. The branch points comprised 92.3% A, 3.3% C, 1.7% G and 2.8% U. Our analysis revealed that the human consensus BPS is simply yUnAy, where the A is the branch point at position zero and the lowercase pyrimidines ('y') are not as well conserved as the uppercase U and A. The branch points were located 21-34 nucleotides upstream of the 3' end of an intron in 83% of clones. The polypyrimidine tract spanned 4-24 nucleotides downstream of the branch point. Our analysis demonstrates that human BPSs are more degenerate than we had expected and that they are likely to be recognized in combination with the polypyrimidine tract and/or other splicing cis-elements.
Year: 2008 PMID: 18285363 PMCID: PMC2367711 DOI: 10.1093/nar/gkn073
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
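As an illustration of the consensus reported in the abstract, the following Python sketch scans the 3′ end of an intron for yUnAy matches whose branch-point A falls 21-34 nucleotides upstream of the 3′ end. It is not the authors' method: the function name, the strict treatment of the weakly conserved 'y' positions, and the toy sequence are assumptions made for this example.

```python
import re

# Assumption: both 'y' positions are required to be pyrimidines (C or U),
# although the abstract notes they are only weakly conserved.
# The lookahead lets overlapping matches be reported.
MOTIF = re.compile(r"(?=([CU]U[ACGU]A[CU]))")


def find_candidate_branch_points(intron_3prime, window=(-34, -21)):
    """Return (position, motif) pairs for yUnAy matches whose branch-point A
    lies inside `window`; position -1 is the last nucleotide of the intron."""
    rna = intron_3prime.upper().replace("T", "U")
    n = len(rna)
    hits = []
    for match in MOTIF.finditer(rna):
        a_index = match.start(1) + 3   # 0-based index of the A in yUnAy
        position = a_index - n         # negative coordinate from the 3' end
        if window[0] <= position <= window[1]:
            hits.append((position, match.group(1)))
    return hits


# Hypothetical toy intron tail (not taken from the paper).
toy = "GGGGGGGGGG" + "CTGAC" + "TTTCTTTCTTTCTTTCTTTC" + "CAG"
print(find_candidate_branch_points(toy))  # [(-25, 'CUGAC')]
```

Because the reported consensus is so short and degenerate, a scan like this will usually return several candidates per intron, which is consistent with the abstract's conclusion that the BPS is probably recognized together with the polypyrimidine tract and other cis-elements.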
Table 1. Previously reported consensus BPSs
| Consensus BPS | Note | References |
|---|---|---|
| UACUA | Invariant BPS | ( |
| Mammals | ||
| YNYUR | 11 mammalian BPSs | ( |
| YNCUR | 20 mammalian BPSs | ( |
| YNCUR | 15 BPSs of human HBB | ( |
| CUR | | ( |
| YUV | | ( |
Branch point ‘A’ is underlined. Y, U or C; R, A or G; S, G or C; V, A, C or G.
(a) Low GC% region.
(b) High GC% region.
Figure 1. (A) Lariat RT-PCR of PGK1 intron 6 shows a misincorporated ‘A’ nucleotide at the branch point, which allows the branch point to be pinpointed. Sequencing was performed with primer B. The small dots indicate the 5′ end of the intron. (B) Lariat RT-PCR of GAPDH intron 2 shows no misincorporated nucleotide. The branch point can be either at ‘C’ or at the upstream ‘T’, depending on whether the reverse transcriptase skips a nucleotide, so the exact branch point cannot be located in this case. Sequencing was performed with primer A.
Table 2. Previously identified mammalian and viral BPSs
| Species | Gene | Intron | BPS | Position | Predicted BP | BPS Score | Reference |
|---|---|---|---|---|---|---|---|
| H. sapiens | | 4 | CACTC | −36 (b) | −36A | 3.85 | ( |
| H. sapiens | | 3 | TACTG | −23 (b) | −42A | 2.60 | ( |
| H. sapiens | | 3 | GTACT | −24 (b) | −42A | 2.60 | ( |
| H. sapiens | | 3 | GGTGC | −32 (b) | −42A | 2.60 | ( |
| H. sapiens | | 1 | CCTCC | −23 (b) | −23A | 2.75 | ( |
| H. sapiens | | 3 | CACAG | −21 (c) | −21A | 3.25 | ( |
| H. sapiens | | 1 | CTCTG | −22 (b) | na | na | ( |
| H. sapiens | | 1 | GGCTC | −28 (b) | na | na | ( |
| H. sapiens | | 1 | TGCTC | −36 (b) | na | na | ( |
| H. sapiens | | 4 | GCCTC | −24 (b) | −37A | 2.80 | ( |
| H. sapiens | | 4 | ACCCA | −37 (b) | −37A | 2.80 | ( |
| H. sapiens | | 4 | TACCC | −38 (b) | −37A | 2.80 | ( |
| H. sapiens | | 1 | CCCTC | −19 (b) | −37A | 3.25 | ( |
| H. sapiens | | 2 | CACTG | −18 (b) | −18A | 3.95 | ( |
| H. sapiens | | 1 | CACTG | −37 (b) | −37A | 3.95 | ( |
| H. sapiens | | 1 | CTCTA | −31 (b) | −31A | 3.45 | ( |
| H. sapiens | | 1 | TTCTG | −30 (b) | −30A | 3.85 | ( |
| H. sapiens | | 5 | TGCTA | −31 (b) | na | na | ( |
| H. sapiens | | 3 | TGTTG | −4 (b) | na | na | ( |
| H. sapiens | | 3 | TACTG | −24 (b) | na | na | ( |
| M. musculus | | 1 | CACTA | −36 (b) | −36A | 3.85 | ( |
| M. musculus | | 5 | AATTC | −22 (b) | −22A | 3.30 | ( |
| R. norvegicus | | 1 | CCTCA | −18 (b) | −18A | 3.15 | ( |
| O. cuniculus | | 1 | TGCTG | −34 (b) | −34A | 3.85 | ( |
| O. cuniculus | | 2 | TGCTA | −32 (b) | −32A | 3.75 | ( |
| Adenovirus 5 | | 1 | GTTTA | −30 (b) | −30A | 2.70 | ( |
| Adenovirus 2 | | 1 | GACTG | −26 (b) | −26A | 3.70 | ( |
| Adenovirus 5 | | 1 | TACTT | −24 (b) | −24A | 3.05 | ( |
| SV40 | | 1 | ATTCT | −19 (b) | −19A | 2.00 | ( |
(a) Predicted BPs and BPS scores are according to the Branch-Site Analyzer at http://ast.bioinfo.tau.ac.il/BranchSite.htm (14). na, not available.
(b) Identified by the primer extension method.
(c) Identified by lariat RT-PCR.
Figure 2. Pictogram (A, C and E) and WebLogo (B, D and F) presentations of mammalian BPSs. (A and B) Twenty-nine mammalian and viral BPSs identified by in vitro experiments (Table 2). (C and D) BPSs with a misincorporated nucleotide at the branch point in our studies. (E and F) BPSs without a misincorporated nucleotide at the branch point in our studies. We assume that an ‘A’ residue one or two nucleotides downstream of the sequenced branch point is the actual branch point (see Figure 1B). Position 0 represents the branch point.
Table 3. Analyzed introns and observed branch points
| Gene | Intron | Intron size (bp) | Predicted BP | BPS Score | Observed BP | Number of clones | Misincorporated nucleotide | Intronic sequence from BPS position −5 to the 3′ end of an intron |
|---|---|---|---|---|---|---|---|---|
| | 3 | 441 | −24A | 3.10 | −30A | 10 | – | TCCCC |
| | 8 | 984 | −24A | 3.05 | −26T | 2 | – | TGTCT |
| | | | | | −25A | 2 | T | GTCTT |
| | | | | | −5A | 1 | – | ACCCC |
| | 4 | 1069 | −21A | 3.30 | −32A | 3 | – | TGAAT |
| | | | | | −30T | 1 | – | AATAG |
| | | | | | −21A | 2 | T | AATTC |
| | 11 | 845 | −44A | 2.95 | −24A | 6 | T | GCTTC |
| | | | | | −10C | 1 | – | GTTTG |
| | 2 | 366 | −24A | 2.80 | −28A | 2 | T | TAGTA |
| | | | | | −19C | 1 | – | AGTAA |
| | 1 | 943 | −33A | 3.25 | −23A | 10 | T | GGTTC |
| | 2 | 2837 | −33A | 3.25 | −27T | 6 | – | ATTGC |
| | | | | | −25C | 2 | T | TGCTA |
| | 4 | 2394 | −21A | 3.45 | −21A | 10 | T | CCCTC |
| | 5 | 737 | −32A | 3.20 | −27A | 6 | T | ACTTC |
| | | | | | −24C | 1 | – | TCATT |
| | 6 | 615 | −36A | 2.85 | −30T | 2 | G | CCCAG |
| | | | | | −27C | 3 | A | AGTGC |
| | | | | | −26A | 3 | – | GTGCC |
| | 7 | 796 | −25A | 3.05 | −38A | 3 | T | TACCT |
| | | | | | −28C | 1 | – | TTTTC |
| | | | | | −27A | 2 | T | TTTCC |
| | | | | | −26A | 2 | T | TTCCA |
| | 9 | 547 | −30A | 2.70 | −5T | 2 | A | TGGCT |
| | 11 | 1457 | −26A | 2.95 | −48T | 4 | – | AGGTC |
| | | | | | −47G | 2 | – | GGTCT |
| | | | | | −46A | 3 | T | GTCTG |
| | 1 | 6031 | −28A | 2.90 | −30A | 2 | – | AGGAC |
| | | | | | −28A | 4 | T | GACAA |
| | 8 | 3212 | −26A | 2.50 | −31A | 2 | – | AAGTC |
| | | | | | −26A | 1 | T | AAATC |
| | | | | | −25A | 1 | T | AATCA |
| | 10 | 2978 | −33A | 3.60 | −33A | 10 | – | GACTC |
| | 2 | 1633 | −26A | 3.35 | −6T | 10 | – | TTGTC |
| | 1 | 734 | −39A | 3.15 | −23A | 1 | T | TTTTA |
| | | | | | −19A | 1 | – | AAACC |
| | | | | | −17A | 1 | C | ACCAG |
| | 6 | 448 | −18A | 3.15 | −22A | 6 | T | GTACC |
| | | | | | −23C | 1 | – | TGTAC |
| | 10 | 777 | −22A | 3.20 | −24T | 1 | – | CAATC |
| | | | | | −22A | 2 | T | ATCTA |
| | 1 | 1434 | −22A | 2.75 | −18A | 1 | T | AATTA |
| | 2 | 7526 | −22A | 3.35 | −24T | 4 | – | GGTTC |
| | | | | | −23A | 1 | T | GTTCT |
| | | | | | −22A | 6 | T | TTCTA |
| | 1 | 5461 | −28A | 3.00 | −29G | 2 | – | AAGTT |
| | | | | | −28A | 4 | T | AGTTG |
| | 2 | 3826 | −35A | 3.00 | −26T | 4 | – | CATTC |
| | 4 | 3151 | −29A | 3.35 | −33C | 1 | – | GGAGC |
| | | | | | −29A | 5 | T | CCATC |
| | 5 | 635 | −32A | 3.40 | −29A | 1 | T | TGACT |
| | | | | | −22G | 7 | – | AATCT |
| | | | | | −21A | 2 | T | ATCTG |
| | 6 | 4664 | −27A | 3.15 | −28A | 1 | T | TCTTT |
| | | | | | −23A | 8 | T | AAGTG |
| | 8 | 1499 | −27A | 3.20 | −27A | 10 | T | AGCTC |
| | 10 | 364 | −35A | 2.75 | −36A | 10 | – | ATAGT |
| | 2 | 1443 | −39A | 3.05 | −31A | 2 | T | AATTA |
| | | | | | −29A | 1 | T | TTAAT |
| | | | | | −25G | 1 | C | TACTT |
| | | | | | −23G | 1 | C | CTTGT |
| | 3 | 6930 | −21A | 2.75 | −25T | 3 | – | ACGCT |
| | | | | | −21A | 4 | T | TTGTC |
| | 4 | 487 | −26A | 2.85 | −38T | 1 | – | TGGTG |
| | | | | | −33A | 1 | T | TCTCC |
| | | | | | −23C | 1 | – | GGACT |
| | | | | | −16A | 1 | T | TGCTT |
| | | | | | −8G | 1 | – | CTCTT |
| | | | | | −5C | 1 | – | TTGTC |
| | 5 | 781 | na | na | −32T | 1 | – | CGTGC |
| | | | | | −31C | 1 | – | GTGCT |
| | | | | | −28C | 3 | – | CTCTG |
| | | | | | −20A | 1 | T | CCCCT |
| | | | | | −18T | 1 | A | CCTAC |
| | | | | | −16A | 2 | T | TACTT |
| | 6 | 1343 | −29A | 3.10 | −39G | 1 | C | CCTCT |
| | | | | | −34A | 6 | T | GTTCT |
| | | | | | −32A | 2 | T | TCTAT |
| | 8 | 4107 | na | na | −65T | 2 | – | CCTTT |
| | | | | | −52T | 1 | – | AGCTC |
| | | | | | −50A | 5 | T | CTCTG |
| | 10 | 717 | −25A | 3.75 | −27T | 1 | – | TTTAC |
| | | | | | −26C | 2 | – | TTACT |
| | | | | | −25A | 6 | T | TACTC |
| | 4 | 393 | −40A | 3.05 | −44A | 4 | T | CTGTT |
| | | | | | −40A | 5 | T | TATTC |
| | 4 | 583 | −21A | 3.30 | −26A | 1 | C | ACCCC |
| | | | | | −23T | 1 | – | CCACT |
| | | | | | −22A | 4 | T | CACTT |
| | 5 | 492 | −22A | 3.30 | −22A | 9 | – | GTTTA |
| | | | | | −20A | 1 | T | TTAAC |
| | 1 | 2205 | −26A | 3.55 | −22C | 7 | – | GAGTC |
| | | | | | −8T | 1 | – | TTGTC |
| | 2 | 770 | −21A | 3.70 | −21A | 4 | T | GTCTG |
| | | | | | −19T | 3 | – | CTGAC |
| | 4 | 1150 | −22A | 3.30 | −24T | 10 | – | GGAGC |
| | 5 | 522 | −22A | 2.50 | −29T | 1 | – | GGCGC |
| | 6 | 477 | −34A | 3.45 | −22T | 3 | – | AGCCT |
| | | | | | −21A | 3 | T | GCCTT |
| | 4 | 806 | na | na | −23C | 5 | – | GTTCC |
| | 2 | 1591 | −30A | 2.85 | −31A | 1 | – | ATATT |
| | 1 | 2929 | −24A | 3.20 | −24A | 1 | T | GTCTC |
| | 2 | 1787 | −34A | 3.85 | −36T | 2 | – | ACCTC |
| | | | | | −35A | 5 | – | CCTCT |
| | | | | | −34A | 2 | T | CTCTA |
| | 3 | 1808 | −34A | 3.10 | −30G | 2 | – | CACTG |
| | | | | | −29G | 3 | – | ACTGG |
| | | | | | −28C | 1 | G | CTGGG |
| | | | | | −27A | 2 | – | TGGGC |
| | 4 | 519 | −25A | 3.75 | −27T | 7 | – | TCTTC |
| | | | | | −26A | 1 | T | CTTCT |
| | | | | | −25A | 5 | T | TTCTA |
| | 6 | 696 | −25A | 3.95 | −27T | 8 | – | GGCAC |
| | 1 | 717 | −33A | 3.45 | −30T | 6 | – | TGAGG |
| | | | | | −29G | 1 | – | GAGGT |
| | | | | | −28A | 2 | T | AGGTG |
(a) Predicted BPs and BPS scores are according to the Branch-Site Analyzer at http://ast.bioinfo.tau.ac.il/BranchSite.htm (14).
(b) Observed branch sites are underlined.
Figure 3. (A) Positions and nucleotides of 181 branch points with misincorporated nucleotides in our studies, where position −1 represents the 3′ end of an intron. The median position of the branch points is −26, and the mean ± SD is −27.7 ± 7.6. Among the 181 sites with misincorporated nucleotides at the branch points, 150 sites (83%) are at positions −34 to −21 (horizontal bar on top). Native nucleotides, not the misincorporated nucleotides, are indicated. Nucleotide preferences (B and D) and information contents (C and E) are deduced from the 181 branch points. (B and C) Plots are aligned with respect to the branch point (closed arrows), which is designated as position 0. Open arrows point to peaks of information content at positions +7 and +8. A polypyrimidine stretch extends from position +4 to position +24 (bars). The plots are truncated at position +25 because the number of observations falls below 40 beyond that position, making the plots uneven and less informative. The last three nucleotides of introns are excluded from the plots. (D and E) Plots are aligned with respect to the 3′ end of each intron, which is designated as position −1. A polypyrimidine stretch spans positions −19 to −5 (bars).
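The two coordinate systems used in this caption (offsets from the branch point at position 0, and positions counted back from the 3′ end at −1) are related by simple addition. The sketch below is a minimal illustration, assuming a branch point at the median position of −26; the function name is hypothetical, and the percentage check only restates numbers given in the caption.

```python
def to_three_prime_coords(bp_position, offset_from_bp):
    """Convert an offset relative to the branch point (position 0) into a
    position relative to the intron 3' end (last nucleotide = -1)."""
    return bp_position + offset_from_bp


# Numbers taken from the caption.
sites_in_window = 150
total_sites = 181
percent_in_window = round(100 * sites_in_window / total_sites)  # 83

# Assumption for illustration: a branch point at the median position.
median_bp = -26
peaks = [to_three_prime_coords(median_bp, off) for off in (7, 8)]  # [-19, -18]

print(percent_in_window, peaks)
```

For a branch point at −26, the information-content peaks at +7 and +8 fall at −19 and −18, which lines up with the start of the polypyrimidine stretch at −19 in the 3′-aligned plots. Other branch point positions shift these values accordingly.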
Table 4. Nucleotide frequencies at the 181 branch sites
| Position | −5 | −4 | −3 | −2 | −1 | 0 | 1 | 2 | 3 |
|---|---|---|---|---|---|---|---|---|---|
| A | 0.254 | 0.232 | 0.083 | 0.066 | 0.166 | 0.923 | 0.182 | 0.302 | 0.201 |
| C | 0.210 | 0.227 | 0.470 | 0.160 | 0.448 | 0.033 | 0.331 | 0.274 | 0.391 |
| G | 0.254 | 0.193 | 0.127 | 0.028 | 0.177 | 0.017 | 0.066 | 0.112 | 0.112 |
| U | 0.282 | 0.348 | 0.320 | 0.746 | 0.210 | 0.028 | 0.420 | 0.313 | 0.296 |
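The frequencies in this table are enough to recompute a per-position information content similar to the one plotted in Figure 3. The sketch below uses the plain Shannon formula (2 bits minus the column entropy) and does not apply any small-sample correction, which is an assumption on our part and may differ from the tool used for the published plots. Columns are renormalized because rounding makes some of them sum to slightly more or less than 1.

```python
import math

POSITIONS = [-5, -4, -3, -2, -1, 0, 1, 2, 3]

# Frequencies copied from the table above.
FREQ = {
    "A": [0.254, 0.232, 0.083, 0.066, 0.166, 0.923, 0.182, 0.302, 0.201],
    "C": [0.210, 0.227, 0.470, 0.160, 0.448, 0.033, 0.331, 0.274, 0.391],
    "G": [0.254, 0.193, 0.127, 0.028, 0.177, 0.017, 0.066, 0.112, 0.112],
    "U": [0.282, 0.348, 0.320, 0.746, 0.210, 0.028, 0.420, 0.313, 0.296],
}


def information_content(column):
    """Shannon information content in bits for one alignment column."""
    total = sum(column)
    probs = [p / total for p in column]  # renormalize rounded frequencies
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 - entropy                 # 2 bits is the maximum for 4 bases


ic = {}
most_frequent = {}
for i, pos in enumerate(POSITIONS):
    column = [FREQ[base][i] for base in "ACGU"]
    ic[pos] = information_content(column)
    most_frequent[pos] = max("ACGU", key=lambda base: FREQ[base][i])

for pos in POSITIONS:
    print(f"{pos:+d}\t{most_frequent[pos]}\t{ic[pos]:.2f} bits")
```

Under these assumptions, position 0 carries about 1.5 bits and position −2 about 0.9 bits, while every other position stays well below 0.5 bits. This matches the description of the consensus as yUnAy with only the U and the A strongly conserved.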
Figure 4. Representative composition of the branch point sequence (arrow) and the PPT deduced from our studies.
Table 5. Sixteen mutations and a single polymorphism disrupting BPSs
| Gene and intron | Sequence | Consequence | Reference |
|---|---|---|---|
| Wild-type | CCCTGAC | ||
| Mutant | CCC | Intron retention (a) | ( |
| Mutant | CCC | Intron retention (b) | ( |
| Mutant | CCC | Intron retention (b) | ( |
| Wild-type | TACTAAG | ||
| Mutant | TAC | Exon skipping (a) | ( |
| Wild-type | GACTGAC | ||
| Mutant | GAC | Exon skipping (a) | ( |
| Wild-type | GGCTCAC | ||
| Mutant | GGC | Intron retention (a), cryptic 3′ splice site (a) | ( |
| Wild-type | GGCTGAT | ||
| Mutant | GGC | Exon skipping (a), cryptic 3′ splice site (a) | ( |
| Wild-type | ATCCAAG | ||
| Mutant | ATCCA | Cryptic 3′ splice site (a) | ( |
| Wild-type | CCCCAAT | ||
| Mutant | CCCCA | Cryptic 3′ splice site (a) | ( |
| Wild-type | TTGCAAT | ||
| Mutant | TTGCA | Exon skipping (a) | ( |
| Wild-type | TTGCAAT | ||
| Mutant | TTGCA | Cryptic 3′ splice site (a) | ( |
| Wild-type | TTCTAGC | ||
| Mutant | TTCTA | Intron retention (a) | ( |
| Wild-type | GCGTGAC | ||
| Mutant | GCGTG | Cryptic 3′ splice site (a), intron retention (a) | ( |
| Wild-type | TACTGAT | ||
| Mutant | TACTG | Exon skipping (a) | ( |
| Wild-type | CACTAAT | ||
| Mutant | CACTA | Exon skipping (a) | ( |
| Wild-type | CGTTAAT | ||
| Mutant | CGTTA | Exon skipping (b) | ( |
| Genotype A | CAC | Exon skipping (b) | ( |
| Genotype U | CAC | Exon inclusion (b) | ( |
Mutations or a polymorphism are underlined. Aberrant splicing was determined in patients (a) or minigenes (b).
(c) Branch points have been identified by lariat RT-PCR. Others are putative BPSs lacking in vitro evidence.
(d) Polymorphism.