Hashem A Shihab, Mark F Rogers, Colin Campbell, Tom R Gaunt.
Abstract
MOTIVATION: A major cause of autosomal dominant disease is haploinsufficiency, whereby a single functional copy of a gene is not sufficient to maintain its normal function. A large proportion of existing methods for predicting haploinsufficiency incorporate biological networks, e.g. protein-protein interaction networks, which have recently been shown to introduce study bias. As a result, these methods tend to perform best on well-studied genes but underperform on less-studied genes. The advent of large genome sequencing consortia, such as the 1000 Genomes Project, the NHLBI Exome Sequencing Project and the Exome Aggregation Consortium, creates an urgent need for unbiased haploinsufficiency prediction methods.
Year: 2017 PMID: 28137713 PMCID: PMC5581952 DOI: 10.1093/bioinformatics/btx028
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Methods for integrating feature groups: (a) feature groups are combined at the data level and fed into a single classifier; (b) feature groups are encoded as base kernels and combined using multiple kernel learning (MKL); and (c) feature groups are used to construct heterogeneous base classifiers, which are then combined using a stacking approach.
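Panel (b) can be illustrated with a minimal sketch, assuming each feature group is turned into a Gaussian (RBF) kernel and the base kernels are merged as a convex combination K = Σ_m β_m K_m. In full MKL the weights β_m are learned together with the classifier; here they are fixed by hand, and the feature-group names, values and `gamma` are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rbf_kernel(rows: Sequence[Sequence[float]], gamma: float = 1.0) -> Matrix:
    """Gaussian (RBF) kernel matrix: K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(rows)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Convex combination sum_m w_m * K_m of same-sized base kernels.

    MKL would learn the weights; this sketch takes them as given.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical example: three genes described by two feature groups.
conservation = [[0.0], [1.0], [0.5]]                # one feature per gene
expression = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]   # two features per gene
combined = combine_kernels(
    [rbf_kernel(conservation), rbf_kernel(expression)], [0.5, 0.5]
)
```

A non-negative combination of positive semidefinite kernels is itself a valid kernel, so the combined matrix can be handed to any kernel method, such as a support vector machine.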
Performance of haploinsufficiency predictors on our training data
| Method | Accuracy | Sensitivity | Specificity | Precision | NPV | AUC |
|---|---|---|---|---|---|---|
| EvoTol | 0.6367 | 0.5577 | 0.7988 | 0.6905 | 0.6917 | 0.6929 |
| GHIS | 0.7069 | 0.7178 | 0.6327 | 0.6578 | 0.6951 | 0.7450 |
| RVIS | 0.8129 | 0.7895 | 0.7596 | 0.7059 | 0.8316 | 0.8329 |
| HIS | 0.6707 | 0.6683 | 0.8383 | 0.8354 | 0.6731 | 0.8412 |
| IS | 0.8478 | 0.8403 | 0.7017 | 0.6779 | 0.8547 | 0.8489 |
| HIS (Imputed) | 0.6195 | 0.5155 | 0.9257 | 0.8581 | 0.6867 | 0.8549 |
| HIPred | 0.9032 | 0.8846 | 0.8919 | 0.8519 | 0.9167 | 0.8940 |
Note: NPV, negative predictive value; AUC, area under the curve.
The reported performance of HIPred is the average performance observed across our repeated cross-validation procedure.
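Averaging over repeated cross-validation can be sketched as follows, assuming plain (unstratified) k-fold splits that are reshuffled on every repeat and a metric dictionary computed on each held-out fold. The fold and repeat counts, the `fit_predict` callback and the toy labels are hypothetical, not taken from the paper; ratios that are undefined in a fold (zero denominator) are skipped when averaging.

```python
import math
import random
from statistics import mean
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-based metrics from the tables, for binary labels (1 = haploinsufficient)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "precision": _ratio(tp, tp + fp),    # positive predictive value
        "npv": _ratio(tn, tn + fn),          # negative predictive value
    }


def repeated_kfold(n: int, k: int, repeats: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each repeat reshuffles and splits into k folds."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(k):
            test = idx[f::k]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield train, test


def cross_validate(
    y: Sequence[int],
    fit_predict: Callable[[List[int], List[int]], List[int]],
    k: int = 5,
    repeats: int = 10,
    seed: int = 0,
) -> Dict[str, float]:
    """Average each metric over all k * repeats test folds, ignoring undefined values."""
    per_fold = []
    for train, test in repeated_kfold(len(y), k, repeats, seed):
        preds = fit_predict(train, test)
        per_fold.append(confusion_metrics([y[i] for i in test], preds))
    averaged = {}
    for name in per_fold[0]:
        values = [m[name] for m in per_fold if not math.isnan(m[name])]
        averaged[name] = mean(values) if values else float("nan")
    return averaged


# Hypothetical usage: an "oracle" that returns the true labels of each test fold.
labels = [1] * 10 + [0] * 10
result = cross_validate(labels, lambda train, test: [labels[i] for i in test], k=5, repeats=3)
```

In a real run, `fit_predict` would train a model on the `train` indices only and predict the `test` indices, so that no held-out gene influences training.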
Fig. 2. Informative features used for predicting haploinsufficient genes.
Performance of methods used for predicting haploinsufficiency on known disease genes and mouse models
| Method | Accuracy | Sensitivity | Specificity | Precision | NPV | AUC |
|---|---|---|---|---|---|---|
| OMIM HI | ||||||
| EvoTol | 0.5232 | 0.5263 | 0.7358 | 0.7407 | 0.5200 | 0.6477 |
| GHIS | 0.8077 | 0.8630 | 0.3621 | 0.6300 | 0.6774 | 0.6845 |
| RVIS | 0.7593 | 0.8354 | 0.2807 | 0.6168 | 0.5517 | 0.6609 |
| HIS | 0.6604 | 0.7049 | 0.6923 | 0.7818 | 0.6000 | 0.7303 |
| IS | 0.7869 | 0.8354 | 0.5172 | 0.7021 | 0.6977 | 0.7451 |
| HIS (Imputed) | 0.4933 | 0.4722 | 0.8333 | 0.8095 | 0.5128 | 0.7156 |
| HIPred | 0.7606 | 0.7821 | 0.6026 | 0.6630 | 0.7344 | 0.7543 |
| OMIM HI | ||||||
| EvoTol | 0.5455 | 0.5455 | 0.7273 | 0.7273 | 0.5455 | 0.6959 |
| GHIS | 0.8361 | 0.8889 | 0.2973 | 0.6061 | 0.6875 | 0.7135 |
| RVIS | 0.8667 | 0.9149 | 0.2500 | 0.6143 | 0.6923 | 0.6965 |
| HIS | 0.7188 | 0.7568 | 0.6923 | 0.7778 | 0.6667 | 0.7599 |
| IS | 0.8286 | 0.8723 | 0.4857 | 0.6949 | 0.7391 | 0.7350 |
| HIS (Imputed) | 0.5455 | 0.5349 | 0.8333 | 0.8214 | 0.5556 | 0.7357 |
| HIPred | 0.8919 | 0.9130 | 0.5217 | 0.6562 | 0.8571 | 0.7902 |
| MGI lethality | ||||||
| EvoTol | 0.4928 | 0.5000 | 0.7174 | 0.7292 | 0.4853 | 0.6258 |
| GHIS | 0.7576 | 0.8235 | 0.3958 | 0.6588 | 0.6129 | 0.6725 |
| RVIS | 0.6697 | 0.7600 | 0.3636 | 0.6706 | 0.4706 | 0.6523 |
| HIS | 0.5600 | 0.5926 | 0.7742 | 0.8205 | 0.5217 | 0.7210 |
| IS | 0.6949 | 0.7568 | 0.5200 | 0.7000 | 0.5909 | 0.7065 |
| HIS (Imputed) | 0.4676 | 0.4478 | 0.8537 | 0.8333 | 0.4861 | 0.7632 |
| HIPred | 0.7872 | 0.7973 | 0.7027 | 0.7284 | 0.7761 | 0.8143 |
| MGI seizures | ||||||
| EvoTol | 0.5341 | 0.5287 | 0.7164 | 0.7077 | 0.5393 | 0.6611 |
| GHIS | 0.6748 | 0.7619 | 0.2879 | 0.5766 | 0.4872 | 0.5826 |
| RVIS | 0.7440 | 0.8222 | 0.2836 | 0.6066 | 0.5429 | 0.5748 |
| HIS | 0.4759 | 0.5000 | 0.6327 | 0.6786 | 0.4493 | 0.5428 |
| IS | 0.7000 | 0.7667 | 0.4143 | 0.6273 | 0.5800 | 0.5767 |
| HIS (Imputed) | 0.3854 | 0.3140 | 0.7231 | 0.6000 | 0.4434 | 0.5479 |
| HIPred | 0.7073 | 0.7333 | 0.5682 | 0.6346 | 0.6757 | 0.7024 |
| ASD 1 | ||||||
| EvoTol | 0.4016 | 0.2400 | 0.8478 | 0.6316 | 0.5065 | 0.4978 |
| GHIS | 0.7429 | 0.8085 | 0.3043 | 0.5429 | 0.6087 | 0.5185 |
| RVIS | 0.7468 | 0.8077 | 0.3778 | 0.6000 | 0.6296 | 0.6925 |
| HIS | 0.3563 | 0.2000 | 0.6316 | 0.3333 | 0.4615 | 0.4023 |
| IS | 0.5158 | 0.5660 | 0.4043 | 0.5172 | 0.4524 | 0.4621 |
| HIS (Imputed) | 0.3684 | 0.2174 | 0.7442 | 0.4762 | 0.4706 | 0.4426 |
| HIPred | 0.6049 | 0.6667 | 0.3542 | 0.5079 | 0.5152 | 0.4948 |
| ASD 2 | ||||||
| EvoTol | 0.4015 | 0.2931 | 0.7308 | 0.5484 | 0.4810 | 0.4428 |
| GHIS | 0.6757 | 0.7647 | 0.2245 | 0.5065 | 0.4783 | 0.5646 |
| RVIS | 0.6905 | 0.7593 | 0.3400 | 0.5541 | 0.5667 | 0.6259 |
| HIS | 0.4490 | 0.4130 | 0.7143 | 0.6552 | 0.4808 | 0.5609 |
| IS | 0.6275 | 0.6724 | 0.4630 | 0.5735 | 0.5682 | 0.5923 |
| HIS (Imputed) | 0.3750 | 0.2857 | 0.7273 | 0.5714 | 0.4444 | 0.5483 |
| HIPred | 0.6211 | 0.6667 | 0.4259 | 0.5373 | 0.5610 | 0.5640 |
Note: NPV, negative predictive value; AUC, area under the curve.
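Assuming AUC here is the area under the ROC curve, it can be computed without tracing the curve, because it equals the Mann-Whitney rank statistic: the probability that a randomly chosen positive gene scores higher than a randomly chosen negative one, with ties counted as one half. A minimal sketch with hypothetical scores:

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) identity; labels are 0/1."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranks = average_ranks(scores)
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    # U statistic for the positives, normalised by the number of pos/neg pairs.
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

Unlike the threshold-based columns, this value depends only on how the scores order the genes, not on any particular cut-off.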
Spearman’s rank correlation between the methods
| | RVIS | IS | EvoTol | HIS | HIS (Imputed) | GHIS | HIPred |
|---|---|---|---|---|---|---|---|
| RVIS | 1.0000 | ||||||
| IS | 0.3293 | 1.0000 | |||||
| EvoTol | 0.0434 | 0.0675 | 1.0000 | ||||
| HIS | 0.3248 | 0.3534 | 0.0523 | 1.0000 | |||
| HIS (Imputed) | 0.3512 | 0.3879 | 0.0609 | 0.9993 | 1.0000 | ||
| GHIS | 0.5699 | 0.3783 | 0.0387 | 0.3598 | 0.3679 | 1.0000 | |
| HIPred | 0.4994 | 0.5250 | 0.0478 | 0.5652 | 0.5739 | 0.5031 | 1.0000 |
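Spearman's rank correlation is Pearson's correlation computed on ranks rather than raw values. A minimal sketch, assuming each entry compares two methods' scores over the same list of genes (the gene set used is not described here) and giving tied scores their average rank; the scores below are hypothetical:

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sx * sy)


# Hypothetical scores from two predictors for the same five genes.
method_a = [0.10, 0.20, 0.30, 0.40, 0.50]
method_b = [0.25, 0.05, 0.60, 0.45, 0.90]
rho = spearman_rho(method_a, method_b)  # approximately 0.8
```

Because rho(x, y) = rho(y, x) and every method correlates perfectly with itself, only the lower triangle and a unit diagonal need to be reported.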