Hashem A Shihab, Julian Gough, Matthew Mort, David N Cooper, Ian N M Day, Tom R Gaunt.
Abstract
As the number of non-synonymous single nucleotide polymorphisms (nsSNPs) identified through whole-exome/whole-genome sequencing programs increases, researchers and clinicians are becoming increasingly reliant upon computational prediction algorithms designed to prioritize potential functional variants for further study. A large proportion of existing prediction algorithms are 'disease agnostic' but are nevertheless quite capable of predicting when a mutation is likely to be deleterious. However, most clinical and research applications of these algorithms relate to specific diseases and would therefore benefit from an approach that distinguishes functional variants specifically related to that disease from those that are not. In a whole-exome/whole-genome sequencing context, such an approach could substantially reduce the number of false positive candidate mutations. Here, we test this postulate by incorporating a disease-specific weighting scheme into the Functional Analysis through Hidden Markov Models (FATHMM) algorithm. When compared to traditional prediction algorithms, we observed an overall reduction in the number of false positives identified using a disease-specific approach to functional prediction across 17 distinct disease concepts/categories. Our results illustrate the potential benefits of making disease-specific predictions when prioritizing candidate variants in relation to specific diseases. A web-based implementation of our algorithm is available at http://fathmm.biocompute.org.uk.
Year: 2014 PMID: 24980617 PMCID: PMC4083756 DOI: 10.1186/1479-7364-8-11
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Performance of computational prediction algorithms when discriminating between disease-specific variants and other disease-causing/neutral variants
| Algorithm | TP | FP | TN | FN | Accuracy | Precision | Specificity | Sensitivity | NPV | MCC | AUC |
| Musculoskeletal | | | | | | | | | | | |
| SIFT | 4,730 | 37,701 | 23,323 | 944 | 0.61 | 0.57 | 0.38 | 0.83 | 0.70 | 0.24 | 0.64 |
| PolyPhen-2 | 5,278 | 44,047 | 34,859 | 714 | 0.66 | 0.61 | 0.44 | 0.88 | 0.79 | 0.36 | 0.71 |
| FATHMM | 5,902 | 51,596 | 29,202 | 201 | 0.66 | 0.60 | 0.36 | | | 0.41 | 0.73 |
| Disease-Specific | 4,120 | 3,123 | 77,675 | 1,983 | | | | 0.68 | 0.75 | | |
| Disease-Specific (20-fold) | - | - | - | - | 0.80 | 0.92 | 0.94 | 0.66 | 0.74 | 0.63 | - |
| Developmental | | | | | | | | | | | |
| SIFT | 845 | 41,586 | 23,983 | 284 | 0.56 | 0.54 | 0.37 | 0.75 | 0.59 | 0.12 | 0.56 |
| PolyPhen-2 | 920 | 48,405 | 35,337 | 236 | 0.61 | 0.58 | 0.42 | 0.80 | 0.67 | 0.23 | 0.63 |
| FATHMM | 1,006 | 52,429 | 33,278 | 188 | 0.62 | 0.58 | 0.39 | | | 0.26 | 0.59 |
| Disease-Specific | 621 | 710 | 84,997 | 573 | | | | 0.52 | 0.67 | | |
| Disease-Specific (20-fold) | - | - | - | - | 0.74 | 0.97 | 0.99 | 0.49 | 0.66 | 0.55 | - |
| Endocrine | | | | | | | | | | | |
| SIFT | 3,084 | 39,347 | 23,443 | 824 | 0.58 | 0.56 | 0.37 | 0.79 | 0.64 | 0.18 | 0.60 |
| PolyPhen-2 | 2,890 | 46,435 | 35,031 | 542 | 0.64 | 0.60 | 0.43 | 0.84 | 0.73 | 0.30 | 0.67 |
| FATHMM | 3,597 | 49,466 | 33,522 | 316 | 0.66 | 0.61 | 0.40 | | | 0.38 | 0.71 |
| Disease-Specific | 2,392 | 1,015 | 81,973 | 1,521 | | | | 0.61 | 0.72 | | |
| Disease-Specific (20-fold) | - | - | - | - | 0.79 | 0.97 | 0.98 | 0.60 | 0.71 | 0.63 | - |
| Metabolic | | | | | | | | | | | |
| SIFT | 10,731 | 31,700 | 21,913 | 2,354 | 0.61 | 0.58 | 0.41 | 0.82 | 0.69 | 0.25 | 0.64 |
| PolyPhen-2 | 11,337 | 37,988 | 33,788 | 1,785 | 0.67 | 0.62 | 0.47 | 0.86 | 0.78 | 0.36 | 0.72 |
| FATHMM | 13,068 | 39,914 | 33,271 | 648 | 0.70 | 0.64 | 0.45 | | | 0.47 | 0.80 |
| Disease-Specific | 10,767 | 3,209 | 69,976 | 2,949 | | | | 0.78 | 0.82 | | |
| Disease-Specific (20-fold) | - | - | - | - | 0.86 | 0.94 | 0.95 | 0.77 | 0.81 | 0.74 | - |
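The summary statistics in the performance table do not follow directly from the raw counts: for SIFT on the musculoskeletal set, raw accuracy would be about 0.42, not the 0.61 reported. The reported values appear consistent with metrics computed after rescaling both classes to equal size. The sketch below illustrates that reading; the class-balancing step is an assumption inferred from the numbers, and the function name `balanced_metrics` is hypothetical rather than part of FATHMM.

```python
from math import sqrt


def balanced_metrics(tp, fp, tn, fn):
    """Summary metrics after rescaling both classes to equal size.

    Assumption: the positive and negative classes are each normalized
    to unit weight before the metrics are computed. Raises
    ZeroDivisionError if a class or a predicted label is empty.
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    fpr = 1 - specificity         # false positive rate
    fnr = 1 - sensitivity         # false negative rate

    # Normalized confusion matrix: tp'=sensitivity, fn'=fnr,
    # tn'=specificity, fp'=fpr (each class sums to 1).
    accuracy = (sensitivity + specificity) / 2
    precision = sensitivity / (sensitivity + fpr)
    npv = specificity / (specificity + fnr)
    mcc = (sensitivity * specificity - fpr * fnr) / sqrt(
        (sensitivity + fpr) * (specificity + fnr)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "npv": npv,
        "mcc": mcc,
    }


# SIFT on the musculoskeletal set, counts taken from the table above.
sift = balanced_metrics(tp=4730, fp=37701, tn=23323, fn=944)
print({name: round(value, 2) for name, value in sift.items()})
```

Rounded to two decimals, the output agrees with the SIFT row for the musculoskeletal set. AUC cannot be recovered this way, since it depends on the full ranking of prediction scores rather than on a single confusion matrix.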
Figure 1. Performance of disease-specific and generic computational prediction algorithms. ROC curves for computational prediction algorithms when tasked with discriminating between disease-specific mutations and other germline variants (i.e. other disease-causing/neutral mutations).
Summary of nsSNPs used in our disease-specific mutation datasets
| Human Gene Mutation Database (HGMD) | | |
| Blood | 99 | 1,474 |
| Blood coagulation | 45 | 3,508 |
| Developmental | 188 | 1,199 |
| Digestive | 116 | 1,850 |
| Ear, nose and throat | 113 | 943 |
| Endocrine | 192 | 3,913 |
| Eye | 227 | 3,031 |
| Genitourinary | 166 | 3,031 |
| Heart | 247 | 3,743 |
| Immune | 75 | 1,293 |
| Metabolic | 485 | 13,797 |
| Musculoskeletal | 309 | 6,110 |
| Nervous system | 473 | 8,553 |
| Psychiatric | 163 | 747 |
| Reproductive | 88 | 883 |
| Respiratory | 44 | 775 |
| Skin | 164 | 3,183 |
| SwissProt/TrEMBL | | |
| Putative neutral polymorphisms | 11,601 | 37,488 |