Miao-Xin Li, Hong-Sheng Gui, Johnny S H Kwan, Su-Ying Bao, Pak C Sham.
Abstract
Exome sequencing is a promising strategy for finding novel mutations underlying human monogenic disorders. However, pinpointing the causal mutation in a small number of samples remains a major challenge. Here, we propose a three-level filtration and prioritization framework to identify the causal mutation(s) in exome sequencing studies. This efficient and comprehensive framework narrowed whole-exome variants down to very small numbers of candidate variants in the proof-of-concept examples. The proposed framework, implemented in a user-friendly software package named KGGSeq (http://statgenpro.psychiatry.hku.hk/kggseq), should be useful in exome sequencing-based discovery of human Mendelian disease genes.
Year: 2012 PMID: 22241780 PMCID: PMC3326332 DOI: 10.1093/nar/gkr1257
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The three-level filtration and prioritization framework implemented in KGGSeq.
Counts of SNPs (and genes) after filtrations by functions of the three-level framework in KGGSeq
| Steps | S_exome1 | S_exome2 | S_exome3 | S_exome4 | S_exome5 | S_exome6 | S_exome7 | S_exome8 |
|---|---|---|---|---|---|---|---|---|
| Variant type | SNV | SNV | SNV | SNV | SNV | SNV | SNV | Indel |
| Initial | 16 120 | 15 971 | 19 721 | 19 518 | 16 012 | 16 048 | 16 048 | 146 |
| Inheritance pattern (a) | 10 180 (Dom) | 9929 (Dom) | 12 897 (Dom) | 12 867 (Dom) | 9133 (Dom) | 9182 (Dom) | 6867 (Rec) | 146 (Dom) |
| Non-synonymous (b) | 4705 | 4582 | 5837 | 5833 | 4171 | 4163 | 3089 | 143 |
| Rare in dbSNP + 1000 Genomes (c) | 457 | 508 | 709 | 794 | 410 | 508 | 51 | 116 (113) |
| Predicted to be disease causal | 106 (90) | 127 (117) | 149 (133) | 164 (152) | 95 (87) | 120 (107) | 2 (2) | – |
| Knowledge-related (d) | 1 (1) | 7 (7) | 4 (4) | 6 (6) | 6 (5) | 1 (1) | – (e) | 8 (8) |
| PPI | 1 (1) | 2 (2) | 1 (1) | 1 (1) | 1 (1) | 1 (1) | – | – |
| Pathway | 1 (1) | 5 (5) | 2 (2) | 4 (4) | 2 (2) | 1 (1) | – | – |
| PubMed | 1 (1) | 3 (3) | 3 (3) | 4 (4) | 4 (3) | 1 (1) | – | 8 (8) |
(a) The dominant mode considered only variants with heterozygous genotypes, and the recessive mode considered only variants with homozygous genotypes; (b) non-synonymous variants include missense, stopgain, stoploss and splicing SNVs, and insertions/deletions causing frameshift, non-frameshift, stoploss, stopgain and splicing differences; (c) rare variants are those with MAF < 0.01 in dbSNP and 1000 Genomes; (d) knowledge-related variants/genes are those whose genes have PPI(s) or share pathway(s) with the provided candidate gene(s), and those that fall into region(s) or gene(s) which co-occur in the titles or abstracts of papers in the PubMed database; (e) '–' means the corresponding analyses were not conducted, for reasons stated in the 2nd paragraph of the 'Results' section.
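The first filtering steps described in the table footnotes (inheritance pattern, functional category, rarity) can be illustrated with a minimal Python sketch. This is not KGGSeq's implementation: the `Variant` record, its field names, the `filter_variants` function and the toy gene names are all hypothetical, and treating a variant absent from the reference databases as rare is an assumption. Only the rules themselves (heterozygous for dominant, homozygous for recessive, the listed functional categories, MAF < 0.01) come from the footnotes.

```python
from dataclasses import dataclass
from typing import List, Optional

# Functional categories counted as "non-synonymous" in the table footnote.
NON_SYNONYMOUS = {
    "missense", "stopgain", "stoploss", "splicing",
    "frameshift", "non-frameshift",
}


@dataclass
class Variant:
    """Hypothetical variant record (not a KGGSeq data structure)."""
    gene: str
    genotype: str          # "het" or "hom"
    effect: str            # functional category of the variant
    maf: Optional[float]   # allele frequency in reference databases, None if absent


def matches_inheritance(variant: Variant, mode: str) -> bool:
    """Dominant mode keeps heterozygotes; recessive mode keeps homozygotes."""
    if mode == "dominant":
        return variant.genotype == "het"
    if mode == "recessive":
        return variant.genotype == "hom"
    raise ValueError(f"unknown inheritance mode: {mode}")


def filter_variants(variants: List[Variant], mode: str,
                    maf_cutoff: float = 0.01) -> List[Variant]:
    """Apply the three filters in the same order as the table rows."""
    kept = [v for v in variants if matches_inheritance(v, mode)]
    kept = [v for v in kept if v.effect in NON_SYNONYMOUS]
    # Assumption: a variant missing from the databases (maf is None) counts as rare.
    kept = [v for v in kept if v.maf is None or v.maf < maf_cutoff]
    return kept


# Toy input, invented purely for illustration.
toy_variants = [
    Variant("GENE_A", "het", "missense", None),
    Variant("GENE_B", "het", "synonymous", 0.001),
    Variant("GENE_C", "hom", "stopgain", 0.002),
    Variant("GENE_D", "het", "frameshift", 0.20),
    Variant("GENE_E", "het", "splicing", 0.005),
]
dominant_hits = filter_variants(toy_variants, "dominant")
recessive_hits = filter_variants(toy_variants, "recessive")
```

Applying the filters sequentially mirrors the way the counts in the table shrink from row to row; the later steps (pathogenicity prediction and the knowledge-based PPI, pathway and PubMed prioritization) are not covered by this sketch.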
Figure 2. Protein–protein interaction network of MYH3 with four candidate genes. The five genes involved are in the dashed circle. Each filled node denotes a gene; edges between nodes indicate PPIs between the protein products of the corresponding genes. Different edge colors represent the types of evidence for the association. This figure was produced by STRING (V9.0).
Figure 4. Receiver operating characteristic (ROC) curves of various methods. (a) The control (neutral) variants are rare (MAF < 0.01); (b) the control (neutral) variants have MAF ≥ 0.01. Logistic model: a conventional multiple logistic regression model combining the five deleteriousness scores (PhyloP, SIFT, PolyPhen2, LRT and MutationTaster); its true positive rate (sensitivity) and false positive rate (1 − specificity) were obtained by 10-fold cross-validation. The true positive and false positive rates of the five individual prediction methods were generated by varying the prediction score threshold over the entire data set.
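The threshold-sweeping procedure mentioned in the caption can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names `roc_points` and `auc` are hypothetical, the scores and labels are toy values, and the sketch assumes that a higher score means "more likely disease-causal" (a score with the opposite orientation would need to be negated first). The 10-fold cross-validation used for the logistic model is not reproduced here.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs.

    Each distinct score is used once as the decision threshold; a variant is
    called positive when its score is >= the threshold. Label 1 marks a
    disease-causal variant and label 0 a neutral one.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_curve = roc_points(toy_scores, toy_labels)
toy_auc = auc(toy_curve)
```

For the toy data, 8 of the 9 causal/neutral pairs are ranked correctly, so the area under the curve is 8/9.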
Figure 3. Pairwise correlation of the five deleteriousness scores in the (a) disease-causal and (b) neutral rare variant sets. Spearman's rank correlation was used to calculate the pairwise correlation coefficients.
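As an illustration of the statistic named in the caption, the following Python sketch computes Spearman's rank correlation as the Pearson correlation of average ranks, and applies it to every pair of score columns. The helper names (`average_ranks`, `pearson`, `spearman`, `pairwise_spearman`) are hypothetical and the numbers are toy values, not data from the paper.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_spearman(columns: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Spearman's rho for every unordered pair of named score columns."""
    return {(a, b): spearman(columns[a], columns[b])
            for a, b in combinations(columns, 2)}


# Toy scores for five variants, invented for illustration only.
toy_columns = {
    "score_1": [0.1, 0.4, 0.35, 0.8, 0.9],
    "score_2": [1.0, 3.0, 2.0, 5.0, 4.0],
    "score_3": [5.0, 4.0, 3.0, 2.0, 1.0],
}
toy_correlations = pairwise_spearman(toy_columns)
```

Because the coefficient depends only on ranks, it is insensitive to the very different scales on which the individual prediction tools report their scores.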
Summary results of multiple logistic regression of five deleteriousness scores
| Deleteriousness scores | Beta (±SD) | z value | Pr(>\|z\|) |
|---|---|---|---|
| PhyloP | 0.18 (±0.08) | 2.13 | 0.033 |
| SIFT | 1.9 (±0.12) | 15.33 | <2 |
| PolyPhen2 | 1.00 (±0.06) | 16.73 | <2 |
| LRTScore | 0.10 (±0.12) | 0.85 | 0.39 |
| MutationTaster | 2.34 (±0.06) | 39.62 | <2 |
The disease-causal variants and the neutral rare variants (MAF < 0.01) were used to fit the logistic regression model.
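Assuming the last two columns of the table are a Wald z statistic and its two-sided p-value (the usual output of a logistic regression fit, but an assumption here because the column headers are damaged), the relation between them can be checked with a short Python sketch. The function name `two_sided_p_from_z` is hypothetical; the z values are copied from the table.

```python
import math
from typing import Dict


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value of a z statistic under the standard normal.

    Uses the identity P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


# z statistics as reported in the table above.
reported_z: Dict[str, float] = {
    "PhyloP": 2.13,
    "SIFT": 15.33,
    "PolyPhen2": 16.73,
    "LRTScore": 0.85,
    "MutationTaster": 39.62,
}

# Extremely large z values can underflow to 0.0 in double precision.
p_values = {name: two_sided_p_from_z(z) for name, z in reported_z.items()}
```

Under this reading, z = 2.13 gives a p-value of about 0.033, in line with the PhyloP row, and z = 0.85 gives about 0.395, close to the value reported for LRTScore given the rounding of z.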