Biao Li, Chet Seligman, Janita Thusberg, Jackson L Miller, Jim Auer, Michelle Whirl-Carrillo, Emidio Capriotti, Teri E Klein, Sean D Mooney.
Abstract
BACKGROUND: Missense pharmacogenomic (PGx) variants are amino acid substitutions that may affect the pharmacokinetic (PK) or pharmacodynamic (PD) response to drug therapies. Compared with disease-associated variants, PGx variants have not been investigated in as much depth. The ability to predict novel PGx variants computationally is desirable; however, it is not clear which data sets should be used or which features are informative for this purpose. We therefore carried out a comparative characterization of PGx variants against annotated neutral and disease variants from UniProt, testing the predictive power of sequence conservation and structural information in discriminating these three groups.
Year: 2014 PMID: 25057096 PMCID: PMC4092878 DOI: 10.1186/1471-2164-15-S4-S4
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Protein and variant distribution across protein types.
| Type | Proteins | Neutral variants | Disease variants | PGx variants |
|---|---|---|---|---|
| Enzyme | 22 | 116 | 130 | 34 |
| Transporter | 14 | 165 | 286 | 34 |
| Cytochrome P450 | 11 | 138 | 47 | 30 |
| Receptor | 10 | 39 | 13 | 17 |
| Channel | 3 | 18 | 153 | 4 |
| Others | 4 | 11 | 23 | 7 |
| Total | 64 | 487 | 652 | 126 |
Features whose distributions differ among the three groups of variants.
| Data set | Feature | Disease | PGx | Neutral | τ | P-value |
|---|---|---|---|---|---|---|
| Structure | Conservation | -0.474 | 0.000 | 0.209 | -0.22 | 6.1 × 10⁻⁹ |
| Structure | ConsNG | -0.316 | -0.152 | -0.021 | -0.21 | 7.6 × 10⁻⁸ |
| Structure | NNG | 10.0 | 9.4 | 8.3 | 0.18 | 4.8 × 10⁻⁶ |
| Structure | ACC | 30.4 | 42.0 | 52.8 | -0.17 | 1.2 × 10⁻⁵ |
| Sequence | Conservation | -0.323 | 0.145 | 0.340 | -0.23 | 7.4 × 10⁻²⁵ |
Abbreviations: Conservation, conservation at the variant position; ConsNG, average conservation of 3D neighbors; NNG, number of 3D neighbors; ACC, accessibility in number of water molecules; τ, Kendall correlation coefficient. P-values are computed from the Kendall correlation coefficient test, which was performed after sorting the feature vector according to the average feature values in each class (Disease, PGx and Neutral).
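The τ column relates each feature to the ordering of the three classes. A minimal sketch of Kendall's τ-b (the tie-corrected variant, which matters here because class labels are heavily tied) is shown below. The coding Neutral = 0, PGx = 1, Disease = 2 is an assumption, chosen because it reproduces the signs in the table (conservation averages fall from Neutral to Disease and τ is negative; NNG rises and τ is positive). The names `kendall_tau_b` and `class_feature_tau` are hypothetical, and the P-value is not computed.

```python
import math

# Hypothetical ordinal coding of the classes (an assumption, see the text above).
CLASS_CODE = {"Neutral": 0, "PGx": 1, "Disease": 2}


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (O(n^2) pair scan)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: excluded from every term
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    # Number of pairs not tied in x, and number of pairs not tied in y.
    untied_x = concordant + discordant + tied_y_only
    untied_y = concordant + discordant + tied_x_only
    if untied_x == 0 or untied_y == 0:
        return float("nan")
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


def class_feature_tau(labels, values):
    """tau-b between the ordinal class codes and a per-variant feature value."""
    codes = [CLASS_CODE[label] for label in labels]
    return kendall_tau_b(codes, values)
```

For example, `class_feature_tau(["Disease", "Disease", "PGx", "Neutral", "Neutral"], [-0.5, -0.4, 0.0, 0.2, 0.3])` gives about -0.894. For real analyses, `scipy.stats.kendalltau` computes τ-b by default and also returns a p-value.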
Figure 1. Comparison of conservation scores and accessibility among disease, neutral, and PGx variants. In each box-and-whisker plot, black lines denote medians, boxes with a gray background span the first to third quartiles, and the whiskers, connected to the boxes by dashed lines, cover approximately the 95% interval. (A) Disease variants show the strongest conservation and fall within a narrow score range; PGx variants are less conserved and their scores spread over a wider range, though narrower than that of neutral variants. (B) Accessibility is measured as the number of water molecules. Disease variants are the least accessible, and the accessibility distribution of PGx variants is more similar to that of disease variants than to that of neutral variants, which scatter over a significantly larger range.
Figure 2. Spread of the three types of variant in the neutral-disease plane. Dashed lines mark the prediction decision boundary among the three types of variant in the neutral (x)-disease (y) plane; in both panels the lines intersect at (1/3, 1/3). The decision boundary consists of three lines: y - x = 0, 1 - 2x - y = 0, and 1 - x - 2y = 0. (A) Distribution of the three groups of variants based on structural data. (B) Distribution of the three groups of variants based on sequence data.
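The three boundary lines are exactly where two class scores tie if each variant has a neutral score x, a disease score y and a PGx score 1 - x - y, and the class with the largest score wins: y = x (neutral vs disease), x = 1 - x - y, i.e. 1 - 2x - y = 0 (neutral vs PGx), and y = 1 - x - y, i.e. 1 - x - 2y = 0 (disease vs PGx). The sketch below assumes that argmax reading, which is inferred from the line equations rather than stated in the caption; `assign_class` and `boundary_values` are hypothetical names.

```python
def assign_class(x: float, y: float) -> str:
    """Assign a variant to the class with the largest score.

    x: neutral score, y: disease score; the PGx score is assumed to be 1 - x - y.
    """
    scores = {"Neutral": x, "Disease": y, "PGx": 1.0 - x - y}
    return max(scores, key=scores.get)


def boundary_values(x: float, y: float) -> tuple:
    """Left-hand sides of the three boundary lines; each is 0 where two scores tie."""
    return (y - x, 1 - 2 * x - y, 1 - x - 2 * y)
```

For instance, `assign_class(0.6, 0.2)` returns `"Neutral"`, `assign_class(0.1, 0.1)` returns `"PGx"`, and all three `boundary_values` vanish at (1/3, 1/3), the intersection point in the caption.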
Classification performance measurements from two-group comparison (%).
| Data set | Classification | Sensitivity | Specificity | Accuracy | Precision | F1 | AUC |
|---|---|---|---|---|---|---|---|
| Structure | Disease/neutral | 76.4 | 75.4 | 75.9 | 74.3 | 75.4 | 82.0 |
| Structure | PGx/disease | 67.3 | 74.1 | 70.7 | 45.1 | 54.0 | 78.1 |
| Structure | PGx/neutral | 58.2 | 66.3 | 62.2 | 33.7 | 42.7 | 64.2 |
| Sequence | Disease/neutral | 73.5 | 68.4 | 70.9 | 75.7 | 74.6 | 79.3 |
| Sequence | PGx/disease | 68.3 | 72.4 | 70.3 | 32.3 | 43.9 | 77.4 |
| Sequence | PGx/neutral | 47.6 | 63.9 | 55.7 | 25.4 | 33.1 | 56.9 |
Figure 3. Comparison of minor allele frequency and pathogenicity among disease, neutral, and PGx variants. Box-and-whisker representation of minor allele frequency (MAF) and predicted pathogenicity for the three groups of variants. (A) Not all variants have MAF data, because some are absent from the 1,092 individuals of the 1000 Genomes Project. (B) SIFT uses a threshold of 0.05, classifying variants with scores below this value as pathogenic. (C, D) PolyPhen2 and MutPred both output probability-like scores measuring the pathogenicity of variants.
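SIFT's rule in panel (B) is a plain threshold on the score, with lower scores meaning more damaging. A minimal sketch applying it to a list of scores is below; the function names are hypothetical, and the strict `<` follows the caption's "below this value". The thresholds used for PolyPhen2 and MutPred are not given here, so they are not encoded.

```python
SIFT_THRESHOLD = 0.05  # threshold stated in the Figure 3 caption


def sift_calls(scores, threshold=SIFT_THRESHOLD):
    """True for each score predicted pathogenic (strictly below the threshold)."""
    return [score < threshold for score in scores]


def fraction_pathogenic(scores, threshold=SIFT_THRESHOLD):
    """Share of variants in one group that SIFT calls pathogenic."""
    calls = sift_calls(scores, threshold)
    return sum(calls) / len(calls) if calls else float("nan")
```

With scores `[0.0, 0.03, 0.05, 0.4]`, only the first two are called pathogenic, so the fraction is 0.5; a score of exactly 0.05 is treated as tolerated.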
Classification performance measurements from three tools (%).
| Tool | Classification | Sensitivity | Specificity | Accuracy | Precision | F1 | AUC |
|---|---|---|---|---|---|---|---|
| SIFT | Disease/neutral | 71.1 | 71.1 | 71.1 | 75.1 | 73.1 | 77.3 |
| SIFT | PGx/disease | 71.1 | 60.6 | 65.8 | 90.3 | 79.6 | 70.5 |
| SIFT | PGx/neutral | n/a | n/a | n/a | n/a | n/a | 54.8 |
| PolyPhen2 | Disease/neutral | 85.3 | 57.1 | 71.2 | 72.7 | 78.5 | 76.9 |
| PolyPhen2 | PGx/disease | 85.3 | 54.0 | 69.6 | 90.6 | 87.8 | 69.9 |
| PolyPhen2 | PGx/neutral | n/a | n/a | n/a | n/a | n/a | 53.8 |
| MutPred | Disease/neutral | 98.5 | 76.6 | 87.5 | 84.9 | 91.2 | 95.1 |
| MutPred | PGx/disease | 98.5 | 61.1 | 79.8 | 92.9 | 95.6 | 85.9 |
| MutPred | PGx/neutral | n/a | n/a | n/a | n/a | n/a | 59.1 |
Because classification thresholds separating PGx from neutral variants are unknown, only the AUC was computed for each tool in the PGx/neutral comparison.
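The AUC does not depend on any single cut-off, so it can be computed from raw tool scores even when no PGx/neutral threshold is known. A minimal sketch, assuming the rank-based (Mann-Whitney) definition: the probability that a randomly chosen positive variant scores higher than a randomly chosen negative one, with ties counted as one half. `auc` and `sift_auc` are hypothetical helpers, and which group is treated as positive is an assumption left to the caller. The pairwise loop is quadratic, which is fine for groups of a few hundred variants.

```python
def auc(positive_scores, negative_scores):
    """Rank-based AUC: P(pos > neg) + 0.5 * P(pos == neg) over all pairs."""
    if not positive_scores or not negative_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def sift_auc(positive_scores, negative_scores):
    """SIFT scores are 'lower = more pathogenic', so negate them first."""
    return auc([-s for s in positive_scores], [-s for s in negative_scores])
```

A value near 0.5, like the PGx/neutral AUCs in the table above, means the scores barely separate the two groups.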
| Actual class | Predicted positive | Predicted negative |
|---|---|---|
| Positive | TP | FN |
| Negative | FP | TN |
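The measures in the two classification tables can be derived from this confusion matrix. The sketch below uses the standard definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), precision = TP/(TP+FP), and F1 as the harmonic mean of precision and sensitivity. Two points are inferred from the published numbers rather than stated in the text. First, the numeric column between Precision and AUC matches F1 (for example, 2 × 45.1 × 67.3 / (45.1 + 67.3) ≈ 54.0). Second, the reported Accuracy equals the mean of sensitivity and specificity, i.e. balanced accuracy (for example, (67.3 + 74.1)/2 = 70.7). The function names are hypothetical.

```python
def f1(precision, sensitivity):
    """Harmonic mean of precision and sensitivity; works on fractions or percentages."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


def confusion_metrics(tp, fn, fp, tn):
    """Standard binary classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall on the actual positives
    specificity = tn / (tn + fp)  # recall on the actual negatives
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "precision": precision,
        "f1": f1(precision, sensitivity),
    }
```

Plain accuracy and balanced accuracy coincide only when the two classes are equally large; with the unequal group sizes listed in the first table they generally differ, which is why the distinction matters when reading the Accuracy column.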