Kirsten Faber, Karl-Heinz Glatting, Phillip J Mueller, Angela Risch, Agnes Hotz-Wagenblatt.
Abstract
BACKGROUND: Some single nucleotide polymorphisms (SNPs) are known to modify the risk of developing certain diseases or the response to drugs. With next-generation sequencing methods, the number of known human SNPs has grown. Not every SNP leads to a modified protein that may be the origin of a disease, so functional SNPs need to be recognized. Because most SNP annotation tools look for SNPs that lead to an amino acid exchange or a premature stop codon, we designed a new tool, AASsites, which searches for SNPs that modify splicing.
Year: 2011 PMID: 21992029 PMCID: PMC3194194 DOI: 10.1186/1471-2105-12-S4-S2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Overview of the dataflow of the pipeline AASsites. The different analysis steps performed with the SNP-containing input sequence are displayed.
Figure 2. Example output of pipeline AASsites. The part of the output with the final classification, the gene predictions and the ORF analysis is shown. The part with the enhancer analysis and the scoring information is omitted.
Test results of AASsites using SNPs with known changes
| Set | Number of SNPs | Correct Classification | Wrong Classification | SpliceScan Correct | SpliceScan Wrong |
|---|---|---|---|---|---|
| Positive | 56 | 37 (66%) | 19 (34%) | 32 (57%) | 24 (43%) |
| Negative | 53 | 53 (100%) | 0 (0%) | 45 (85%) | 8 (15%) |
| All | 109 | 90 (83%) | 19 (17%) | 77 (71%) | 32 (29%) |
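The percentages in the table above follow directly from the raw counts. The short sketch below recomputes them as a consistency check. It treats the first data row as the positive set, which is an assumption based on the "Negative" and "All" rows; the dictionary layout and function names are illustrative and not part of AASsites.

```python
# Counts copied from the table above as (correct, wrong) pairs.
# "Positive" is an assumed label for the first data row.
AASSITES = {"Positive": (37, 19), "Negative": (53, 0)}
SPLICESCAN = {"Positive": (32, 24), "Negative": (45, 8)}


def summarize(counts):
    """Return {set_name: (number_of_snps, percent_correct)} plus an 'All' row."""
    rows = {}
    all_correct = 0
    all_total = 0
    for name, (correct, wrong) in counts.items():
        total = correct + wrong
        rows[name] = (total, round(100 * correct / total))
        all_correct += correct
        all_total += total
    rows["All"] = (all_total, round(100 * all_correct / all_total))
    return rows


if __name__ == "__main__":
    for tool, counts in (("AASsites", AASSITES), ("SpliceScan", SPLICESCAN)):
        for name, (total, pct) in summarize(counts).items():
            print(f"{tool:10s} {name:8s} n={total:3d} correct={pct}%")
```

Both tools are evaluated on the same 109 SNPs, so the "All" rows can be compared directly.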
Classification results of selected human SNPs
| Location | Likely | Probable | Unlikely |
|---|---|---|---|
| Exon | 72 | 430 | 70444 |
| Intron | 239 | 555 | 8173 |
| All | 311 | 985 | 78617 |
SNPs with known changes in splicing identified by AASsites
| Protein | SNP | Change in splice pattern | Associated disease | Reference |
|---|---|---|---|---|
| GSTM4 | rs41283498 | Exon skipping | Lung cancer | [ |
| PCTK3 | rs55957903 | Exon skipping | - | |
| VHL | rs5030815 | Exon skipping | Renal cell carcinoma | [ |
| TSC2 | rs45517091 | Exon skipping | Tuberous sclerosis | [ |
| GCSH | rs62054483 | Exon skipping | Hyperglycinaemia | [ |
| NCAN | rs61222528 | Exon skipping | - | |
| EZH2 | rs1140478 | Exon extension | Prostate cancer | [ |
| ATP6V0A2 | rs1139788 | Exon extension | Cutis laxa | [ |
Figure 3. Distribution of all SNPs and splice modifying SNPs in the intron. The distribution of all selected SNPs according to the distance to the splice site is shown in panel A, the distribution of splice modifying SNPs in panel B.
Figure 4. Distribution of all SNPs and splice modifying SNPs in the exon. The distribution of all selected SNPs according to the distance to the splice site is shown in panel A, the distribution of splice modifying SNPs in panel B.
Pathways over-represented in genes with SNPs modifying splicing
| Pathway | Count | % | P-Value |
|---|---|---|---|
| Focal adhesion | 30 | 3.5 | 5.9E-6 |
| ECM-receptor interaction | 17 | 2.0 | 5.7E-5 |
| Metabolism of xenobiotics by cytochrome P450 | 12 | 1.4 | 2.3E-3 |
| ABC transporters - General | 9 | 1.1 | 4.9E-3 |
| Bladder cancer | 7 | 0.8 | 3.1E-2 |
| Regulation of actin cytoskeleton | 21 | 2.5 | 3.3E-2 |
| Adherens junction | 10 | 1.2 | 3.3E-2 |
| Phenylpropanoid biosynthesis | 3 | 0.4 | 6.3E-2 |
| Colorectal cancer | 10 | 1.2 | 7.0E-2 |
| Small cell lung cancer | 10 | 1.2 | 7.8E-2 |
| Cyanoamino acid metabolism | 3 | 0.4 | 8.0E-2 |
| Non-small cell lung cancer | 7 | 0.8 | 9.2E-2 |
| Pathogenic Escherichia coli infection - EPEC | 7 | 0.8 | 9.8E-2 |
The over-represented pathways are shown together with the number of input genes involved in each pathway (Count), the percentage of all input genes that this count represents (%), and the p-value from the Fisher test calculated by the DAVID tool.
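To illustrate what a Fisher test for pathway over-representation computes, the following is a minimal sketch of a plain one-sided Fisher exact test written with the Python standard library. It is not DAVID's implementation: DAVID documents its reported value as a modified Fisher exact p-value (the EASE score), which this sketch does not reproduce. The function name and all four argument names are hypothetical, and the numbers in the usage example are placeholders rather than values from the study.

```python
from math import comb


def fisher_enrichment_p(list_hits, list_total, pop_hits, pop_total):
    """One-sided Fisher exact p-value for over-representation.

    list_hits  : input genes that belong to the pathway
    list_total : all input genes
    pop_hits   : background genes that belong to the pathway
    pop_total  : all background genes

    Returns P(X >= list_hits), where X is hypergeometrically distributed.
    """
    max_hits = min(list_total, pop_hits)
    denominator = comb(pop_total, list_total)
    # Sum the hypergeometric probabilities of observing list_hits or more.
    tail = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Placeholder numbers chosen only to show the call.
    print(fisher_enrichment_p(list_hits=3, list_total=10, pop_hits=5, pop_total=50))
```

A small p-value means that the pathway contains more input genes than expected if the input genes were drawn at random from the background.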
Scoring table for combining the results of the AASsites analysis tools
| Score | 1 | 2 | 3 | 4 | 5 | 0 |
|---|---|---|---|---|---|---|
| SNP distance to splice site | <=2 nt | >2 nt and <=4 nt | >4 nt | - | - | - |
| Gene prediction | Intron/exon disappeared or appeared | Intron/exon modified | No change | No prediction available | - | - |
| ORF | Indel with frameshift | Indel without frameshift | No frameshift, no stop codon appeared | New amino acid | No genewise prediction | Stop codon appeared |
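The scoring table above can be read as three small lookups. The sketch below encodes them under stated assumptions: the distance thresholds and score values come from the table, while the function name, dictionary names and string keys are hypothetical labels introduced here for illustration and are not AASsites identifiers.

```python
def snp_distance_score(distance_nt):
    """Map the SNP distance to the splice site (in nucleotides) to a score."""
    if distance_nt < 0:
        raise ValueError("distance must be non-negative")
    if distance_nt <= 2:
        return 1
    if distance_nt <= 4:
        return 2
    return 3


# Hypothetical outcome labels mapped to the scores from the table.
GENE_PREDICTION_SCORES = {
    "intron_or_exon_disappeared_or_appeared": 1,
    "intron_or_exon_modified": 2,
    "no_change": 3,
    "no_prediction_available": 4,
}

ORF_SCORES = {
    "stop_codon_appeared": 0,
    "indel_with_frameshift": 1,
    "indel_without_frameshift": 2,
    "no_frameshift_no_stop_codon": 3,
    "new_amino_acid": 4,
    "no_genewise_prediction": 5,
}

if __name__ == "__main__":
    for distance in (1, 3, 10):
        print(distance, snp_distance_score(distance))
```

Note that a lower score does not always mean a stronger effect: the ORF score 0 (stop codon appeared) sits outside the 1 to 5 ordering, so the scores are best treated as category codes.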
Classification rules
| SNP location | Rule* | Classification |
|---|---|---|
| Exon | ORF 1 or 0 | probable |
| Exon | ORF 2 and Gene prediction 1 | likely |
| Exon | ORF 2 and (Gene prediction 2 or 3 or 4) | unlikely |
| Exon | ORF 3 or 4 | unlikely |
| Exon | ORF 5 and (Gene prediction 1 or 2) and (SNP distance 1 or 2) | likely |
| Exon | ORF 5 and ((Gene prediction 3 or 4) or (SNP distance 3 or 4)) | unlikely |
| Intron | SNP distance 1 and (Gene prediction 1 or 2) | probable |
| Intron | SNP distance 1 and (Gene prediction 3 or 4) | unlikely |
| Intron | SNP distance 2 and (Gene prediction 1 or 2) | likely |
| Intron | SNP distance 2 and (Gene prediction 3 or 4) | unlikely |
| Intron | SNP distance 3 and (Gene prediction 1 or 2) | likely |
| Intron | SNP distance 3 and (Gene prediction 3 or 4) | unlikely |
* A rule is the combination of scores for the different features from Table 5 (the scoring table).
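The classification rules above can be sketched as a single function that takes the feature scores and returns the class. This is an illustrative reading of the table, not the AASsites source code: the function name, argument names and error handling are assumptions. The intron rules do not use the ORF score, so it is optional here.

```python
def classify(location, gene_prediction, snp_distance, orf=None):
    """Return 'likely', 'probable' or 'unlikely' for a set of feature scores.

    location        : 'exon' or 'intron'
    gene_prediction : gene prediction score (1 to 4)
    snp_distance    : SNP distance score
    orf             : ORF score (0 to 5), required for exonic SNPs
    """
    if gene_prediction not in (1, 2, 3, 4):
        raise ValueError("gene_prediction score must be 1, 2, 3 or 4")
    prediction_changed = gene_prediction in (1, 2)

    if location == "intron":
        if snp_distance not in (1, 2, 3):
            raise ValueError("intron rules cover SNP distance scores 1 to 3")
        if not prediction_changed:
            return "unlikely"
        return "probable" if snp_distance == 1 else "likely"

    if location == "exon":
        if orf in (0, 1):
            return "probable"
        if orf == 2:
            return "likely" if gene_prediction == 1 else "unlikely"
        if orf in (3, 4):
            return "unlikely"
        if orf == 5:
            if prediction_changed and snp_distance in (1, 2):
                return "likely"
            return "unlikely"
        raise ValueError("exon rules need an ORF score from 0 to 5")

    raise ValueError("location must be 'exon' or 'intron'")


if __name__ == "__main__":
    print(classify("intron", gene_prediction=1, snp_distance=1))
    print(classify("exon", gene_prediction=1, snp_distance=3, orf=2))
```

Writing the rules as ordered branches makes it easy to check that every score combination listed in the table maps to exactly one class.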