Ben C Shirley, Eliseos J Mucaki, Tyson Whitehead, Paul I Costea, Pelin Akan, Peter K Rogan.
Abstract
Information theory-based methods have been shown to be sensitive and specific for predicting and quantifying the effects of non-coding mutations in Mendelian diseases. We present the Shannon pipeline software for genome-scale mutation analysis and provide evidence that the software predicts variants affecting mRNA splicing. Individual information contents (in bits) of reference and variant splice sites are compared, and significant differences are annotated and prioritized. The software has been implemented for the CLC-Bio Genomics platform. Annotation indicates the context of novel mutations as well as common and rare SNPs with splicing effects. Potential natural and cryptic mRNA splicing variants are identified, and null mutations are distinguished from leaky mutations. Mutations and rare SNPs were predicted in the genomes of three cancer cell lines (U2OS, U251 and A431), and these predictions were supported by expression analyses. After filtering, the software predicts tractable numbers of potentially deleterious variants, suitable for further laboratory investigation. In these cell lines, novel functional variants comprised 6-17 inactivating mutations, 1-5 leaky mutations and 6-13 cryptic splicing mutations. Predicted effects were validated by RNA-seq analysis of the three aforementioned cancer cell lines, and by expression microarray analysis of SNPs in HapMap cell lines.
Year: 2013 PMID: 23499923 PMCID: PMC4357664 DOI: 10.1016/j.gpb.2013.01.008
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Performance of Shannon pipeline for human mRNA splicing mutation prediction
| Source of variants | Number of variants analyzed | Running time (a) |
|---|---|---|
| U2OS cell line | 211,049 | 1 h 12 min |
| A431 cell line | 290,589 | 1 h 17 min |
| U251 cell line | 314,637 | 1 h 20 min |
| ESP 6500 Exomes | 1,872,893 | 2 h 35 min |
Note: (a) Intel i7 CPU with 16 GB RAM.
Figure 3. Predicted mutation splicing phenotype supported by RNA-seq. The predicted RBBP8 splicing mutation, chr18:20529676G>A (NM_203291.1: c.248G>A), is related to transcripts mapped to this region. A. IGV genome browser display of read distribution at the exon 4/intron 4 junction. Green boxes within the vertical hashed lines indicate the presence of the A allele. B. The natural and cryptic splice sites illustrated by sequence walkers generated on the ASSA server. The arrow tail and head draw attention to the location of the reference and variant sequences. The mutation reduces the strength of the natural donor site from 6.2 to 3.2 bits. All but 3 of the 59 reads extending into the intron contain the variant allele, as indicated by the green positions within the reads. These reads extend into the exon and terminate at the closest intronic cryptic donor site (chr18:20529700). The mutated natural and cryptic sites are of equal strength, which explains splicing at both sites.
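The change in donor-site strength reported in panel B can be worked through explicitly. The sketch below relies on an assumption that this record does not itself state, namely the relationship commonly used in information-theory analyses of splice sites: a change of ΔRi bits corresponds to at least a 2^|ΔRi|-fold change in binding affinity.

```latex
\begin{align*}
\Delta R_i &= R_{i,\text{variant}} - R_{i,\text{reference}} = 3.2 - 6.2 = -3.0\ \text{bits} \\
\text{minimum fold reduction in affinity} &= 2^{|\Delta R_i|} = 2^{3.0} = 8
\end{align*}
```

Under that assumption, the mutated natural donor site would be recognized at least about 8-fold less strongly than the reference site. This would fit the caption's observation that the natural site is weakened rather than abolished.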
Figure 2. Sample output of the Shannon pipeline software. The Shannon pipeline software generates the following types of output. A. Tabular results showing the first 12 of 134 changes in Ri values at different genomic coordinates predicted to be significant, after filtering for cryptic splicing mutations from all variants (n = 22,197) in a complete genome sequence. The first filter eliminates exonic cryptic sites, the second selects cryptic sites with increased Ri values, the third ensures that the cryptic site is stronger than the corresponding natural site of the same phase, and the final filter ensures that all remaining sites exceed the minimum Ri value of a functional splice site. B. Manhattan-like plot indicating the locations and changes in Ri of all variants that alter splice site information in a region within intron 1 of BRCA1 (chr17:41277500-41288500) from different individuals with increased breast cancer risk. C. Custom track illustrating a cryptic splicing mutation detected in an ovarian serous carcinoma that inactivates the acceptor site of exon 4 in STXBP4, resulting in the activation of a pre-existing, in-frame, alternative splice site 6 nucleotides downstream.
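The four-stage cryptic-site filter described in panel A can be sketched as a short Python function. This is a minimal illustration and not the pipeline's actual code: the `CrypticSiteChange` record, its field names, the `filter_cryptic_sites` function and the example records are all hypothetical. The 1.6-bit threshold in the usage lines is an assumed value for the minimum Ri of a functional site, since the caption does not give one.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CrypticSiteChange:
    """Hypothetical record for one predicted change at a candidate cryptic site."""
    coordinate: str       # genomic position label (placeholder values below)
    is_exonic: bool       # True if the cryptic site lies within an exon
    ri_reference: float   # Ri of the cryptic site on the reference allele (bits)
    ri_variant: float     # Ri of the cryptic site on the variant allele (bits)
    natural_ri: float     # Ri of the corresponding natural site, same phase (bits)


def filter_cryptic_sites(
    changes: Iterable[CrypticSiteChange],
    min_functional_ri: float,
) -> List[CrypticSiteChange]:
    """Apply the four filters from the caption, in order."""
    kept = []
    for c in changes:
        if c.is_exonic:                        # filter 1: drop exonic cryptic sites
            continue
        if c.ri_variant <= c.ri_reference:     # filter 2: Ri must increase
            continue
        if c.ri_variant <= c.natural_ri:       # filter 3: stronger than natural site
            continue
        if c.ri_variant < min_functional_ri:   # filter 4: below the functional minimum
            continue
        kept.append(c)
    return kept


# Made-up records for illustration only; none of these values come from the paper.
candidates = [
    CrypticSiteChange("chrN:100", True, 2.0, 5.0, 4.0),    # exonic
    CrypticSiteChange("chrN:200", False, 5.0, 4.0, 3.0),   # Ri decreased
    CrypticSiteChange("chrN:300", False, 2.0, 3.0, 6.0),   # weaker than natural site
    CrypticSiteChange("chrN:400", False, 0.5, 1.2, 1.0),   # below assumed minimum
    CrypticSiteChange("chrN:500", False, 2.0, 5.5, 3.2),   # passes all four filters
]
kept = filter_cryptic_sites(candidates, min_functional_ri=1.6)  # 1.6 bits is assumed
print([c.coordinate for c in kept])
```

Ordering the cheapest test (exonic location) first mirrors the caption's sequence. Since all four conditions must hold, the final set does not depend on the order in which they are applied.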
Enrichment for predicted splicing mutations after processing and filtering
| Cell line | Initial variants analyzed | Novel natural site | Novel cryptic site | Natural site (SNP) (a) | Cryptic site (SNP) (a) | Overall mutation fraction (%) |
|---|---|---|---|---|---|---|
| A431 | 290,589 | 16 | 13 | 13 | 3 | 0.015 |
| U251 | 314,637 | 7 | 10 | 18 | 3 | 0.012 |
| U2OS | 211,049 | 22 | 9 | 13 | 4 | 0.022 |
| Total | 816,275 | 46 | 32 | 49 | 10 | 0.017 |
Note: (a) dbSNP135; <1% heterozygosity; minor allele.