Li-Chung Chuang, Po-Hsiu Kuo.
Abstract
A genetic risk score could be beneficial in assisting clinical diagnosis for complex diseases with high heritability. With large-scale genome-wide association (GWA) data, the current study constructed a genetic risk model with a machine learning approach for bipolar disorder (BPD). The GWA dataset of BPD from the Genetic Association Information Network (GAIN) was used as the training data for model construction, and the Systematic Treatment Enhancement Program (STEP) GWA data were used as the validation dataset. A random forest algorithm was applied to pre-filtered markers, and variable importance indices were assessed. The random forest procedure selected 289 candidate markers with good discriminability: the area under the receiver operating characteristic curve was 0.944 (0.935-0.953) in the training set and 0.702 (0.681-0.723) in the STEP dataset. Using a score cutoff of 184, the sensitivity and specificity for BPD were 0.777 and 0.854, respectively. Pathway analyses revealed important biological pathways for the identified genes. In conclusion, the present study identified informative genetic markers to differentiate BPD from healthy controls with acceptable discriminability in the validation dataset. In the future, diagnostic classification can be further improved by assessing more comprehensive clinical risk factors and jointly analysing them with genetic data in large samples.
Year: 2017 PMID: 28045094 PMCID: PMC5206749 DOI: 10.1038/srep39943
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Discrimination performance for the union of the top-ranked markers from two random forest importance indices: the Gini index and the conditional variable importance.
| Top | No. of SNPs | No. of SNPs after excluding markers in complete LD* | AUROC | Hosmer-Lemeshow test |
|---|---|---|---|---|
| 10 | 19 | 16 | 0.615 | 0.535 |
| 20 | 36 | 29 | 0.663 | 0.729 |
| 50 | 85 | 68 | 0.763 | 0.054 |
| 100 | 168 | 135 | 0.846 | 0.732 |
| 150 | 258 | 211 | 0.908 | 0.506 |
| 200 | 348 | 289 | 0.944 | 0.945 |
Note: AUROC: the area under the receiver operating characteristic curve; 95% C.I.: 95% confidence interval.
*Markers in complete linkage disequilibrium (LD, D’ = 1) were removed before regression analysis; only one SNP was kept from each set of markers in complete LD.
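The SNP counts in the table above come from taking the union of the top-ranked markers under each importance index, so the union size lies between k and 2k (for example, 19 SNPs for the top 10). A minimal Python sketch of that union step follows; the function name and the SNP identifiers are hypothetical and serve only as illustration.

```python
def top_k_union(gini_ranked, conditional_ranked, k):
    """Return the union of the top-k markers from two importance rankings.

    Both inputs are lists ordered from most to least important.
    """
    return set(gini_ranked[:k]) | set(conditional_ranked[:k])


# Hypothetical SNP identifiers, not taken from the study.
gini_ranked = ["rs1", "rs2", "rs3", "rs4", "rs5"]
conditional_ranked = ["rs2", "rs6", "rs1", "rs7", "rs3"]

union_top3 = top_k_union(gini_ranked, conditional_ranked, 3)
print(sorted(union_top3))  # ['rs1', 'rs2', 'rs3', 'rs6']
```

Markers shared by both rankings are counted once, which is why the union is usually smaller than 2k.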
Figure 1. The receiver operating characteristic curves for (A) the 289 candidate markers and (B) the 121 optimal markers in the GAIN bipolar disorder dataset.
Sensitivity, specificity, Youden Index, predictive values, and likelihood ratios of the risk score model in the GAIN bipolar disorder dataset (training set).
| Score | Sensitivity | Specificity | Youden Index | PPV | NPV | LR+ | LR− |
|---|---|---|---|---|---|---|---|
| 180 | 0.905 | 0.685 | 0.590 | 0.735 | 0.882 | 2.871 | 0.139 |
| 181 | 0.880 | 0.723 | 0.604 | 0.755 | 0.862 | 3.182 | 0.166 |
| 182 | 0.854 | 0.776 | 0.630 | 0.787 | 0.846 | 3.807 | 0.188 |
| 183 | 0.813 | 0.817 | 0.630 | 0.812 | 0.819 | 4.449 | 0.229 |
| 184 | 0.777 | 0.854 | 0.631 | 0.837 | 0.798 | 5.322 | 0.261 |
| 185 | 0.747 | 0.879 | 0.626 | 0.857 | 0.782 | 6.181 | 0.288 |
Note: PPV: positive predictive value; NPV: negative predictive value; LR+: the likelihood ratio for a positive test result; LR−: the likelihood ratio for a negative test result.
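The Youden Index and the likelihood ratios in the table follow directly from sensitivity and specificity through their standard definitions. The sketch below recomputes them for the cutoff of 184; the function name is hypothetical, and PPV and NPV are left out because they also depend on the proportion of cases in the sample.

```python
def diagnostic_metrics(sensitivity, specificity):
    """Compute Youden Index, LR+ and LR- from sensitivity and specificity."""
    youden = sensitivity + specificity - 1.0
    lr_pos = sensitivity / (1.0 - specificity)   # likelihood ratio, positive result
    lr_neg = (1.0 - sensitivity) / specificity   # likelihood ratio, negative result
    return youden, lr_pos, lr_neg


# Values reported for the score cutoff of 184.
youden, lr_pos, lr_neg = diagnostic_metrics(0.777, 0.854)
print(f"Youden={youden:.3f}  LR+={lr_pos:.3f}  LR-={lr_neg:.3f}")
# Youden=0.631  LR+=5.322  LR-=0.261
```

Among the listed cutoffs, 184 has the largest Youden Index (0.631), which is consistent with its use as the cutoff in the abstract.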
Discrimination performance of the genetic risk score models in the training set and the validation dataset.
| Models | GAIN (model construction): AUROC | GAIN: (95% C.I.) | GAIN: Hosmer-Lemeshow test | GAIN: error rate*, controls | GAIN: error rate*, BPD | STEP (validation): AUROC | STEP: (95% C.I.) | STEP: Hosmer-Lemeshow test |
|---|---|---|---|---|---|---|---|---|
| 289 candidate markers | 0.944 | (0.935–0.953) | 0.933 | 0.208 | 0.220 | 0.702 | (0.681–0.723) | 0.681 |
| 121 optimal markers | 0.924 | (0.913–0.935) | 0.458 | 0.193 | 0.209 | 0.639 | (0.617–0.662) | 0.002 |
| The risk score variable | 0.905 | (0.893–0.918) | 0.264 | 0.179 | 0.193 | 0.506 | (0.482–0.530) | 0.954 |
Note: AUROC: the area under the receiver operating characteristic curve; 95% C.I.: 95% confidence interval.
*Error rate was examined by the leave-one-out cross-validation procedure.
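AUROC can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the scores are made up for illustration, and the source does not say which software was used for the reported values.

```python
def auroc(case_scores, control_scores):
    """AUROC as the fraction of case/control pairs ranked correctly.

    Ties contribute 0.5. This is the pairwise (Mann-Whitney) formulation.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores, not taken from the study.
cases = [186, 184, 183, 181]
controls = [185, 182, 180, 179]
print(auroc(cases, controls))  # 0.75
```

A value near 0.5, such as the 0.506 reported for the risk score variable in the STEP dataset, means the score ranks cases above controls about as often as chance would.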
The significant gene sets for bipolar disorder based on the candidate markers from the GAIN and the STEP datasets.
| Gene set name | No. of genes in gene set | No. of genes in overlap* | Dataset# | q-value |
|---|---|---|---|---|
| Acetyl-glucosaminyl transferase activity | 16 | 2 | STEP | 3.4 × 10⁻³ |
| Actin binding | 76 | 3 | GAIN | 2.8 × 10⁻³ |
| Actin cytoskeleton organization and biogenesis | 104 | 3 | GAIN | 6.7 × 10⁻³ |
| Actin filament | 18 | 2 | STEP | 4.3 × 10⁻³ |
| Actin filament based process | 114 | 3 | GAIN | 8.6 × 10⁻³ |
| Amine metabolic process | 137 | 4 | STEP | 6.8 × 10⁻³ |
| Anatomical structure morphogenesis | 374 | 6 | GAIN | 2.6 × 10⁻³ |
| Auxiliary transport protein activity | 25 | 2 | STEP | 8.1 × 10⁻³ |
| Axon guidance | 22 | 2 | GAIN | 2.9 × 10⁻³ |
| Axonogenesis | 43 | 3 | GAIN | 5.3 × 10⁻⁴ |
| Brain development | 51 | 3 | GAIN | 8.8 × 10⁻⁴ |
| Calcium channel activity | 33 | 2 | GAIN | 6.5 × 10⁻³ |
| Calcium ion transport | 27 | 2 | GAIN | 4.4 × 10⁻³ |
| Calmodulin binding | 25 | 2 | STEP | 8.1 × 10⁻³ |
| Carbohydrate binding | 72 | 3 | STEP | 7.2 × 10⁻³ |
| Cation transport | 146 | 4 | STEP | 8.4 × 10⁻³ |
| Cell migration | 93 | 3 | GAIN | 4.9 × 10⁻³ |
| Cell surface | 76 | 3 | STEP | 8.3 × 10⁻³ |
| Cellular morphogenesis during differentiation | 49 | 3 | GAIN | 7.8 × 10⁻⁴ |
| Channel regulator activity | 23 | 2 | STEP | 6.9 × 10⁻³ |
| Chr12q23 | 78 | 3 | STEP | 8.9 × 10⁻³ |
| Chr2p23 | 75 | 4 | STEP | 7.6 × 10⁻⁴ |
| Chr2q23 | 25 | 2 | STEP | 8.1 × 10⁻³ |
| Chr3p14 | 56 | 3 | STEP | 3.6 × 10⁻³ |
| Chr4q34 | 19 | 2 | STEP | 4.7 × 10⁻³ |
| Chr6q13 | 23 | 2 | STEP | 6.9 × 10⁻³ |
| Chr6q26 | 16 | 2 | STEP | 3.4 × 10⁻³ |
| Cytoplasmic membrane bound vesicle | 112 | 3 | GAIN | 8.2 × 10⁻³ |
| Cytoplasmic vesicle | 116 | 3 | GAIN | 9.0 × 10⁻³ |
| Cytoskeletal protein binding | 158 | 5 | GAIN | 3.0 × 10⁻⁴ |
| Cytoskeleton | 361 | 6 | GAIN | 2.2 × 10⁻³ |
| Di-, tri-valent inorganic cation transport | 32 | 2 | GAIN | 6.2 × 10⁻³ |
| Endocytic vesicle | 14 | 2 | GAIN | 1.2 × 10⁻³ |
| Enzyme regulator activity | 314 | 6 | STEP | 7.6 × 10⁻³ |
| Establishment of localization | 852 | 12 | STEP | 2.5 × 10⁻³ |
| G protein signaling coupled to IP3 second messenger phospholipase C activating | 41 | 2 | GAIN | 1.0 × 10⁻² |
| GABA receptor activity | 11 | 2 | STEP | 1.6 × 10⁻³ |
| Generation of neurons | 83 | 3 | GAIN | 3.6 × 10⁻³ |
| Integrin binding | 30 | 2 | GAIN | 5.4 × 10⁻³ |
| Ion transport | 184 | 5 | STEP | 3.4 × 10⁻³ |
| KEGG-Arrhythmogenic right ventricular cardiomyopathy | 76 | 5 | GAIN | 9.0 × 10⁻⁶ |
| KEGG-Calcium signaling pathway | 178 | 4 | GAIN | 4.2 × 10⁻³ |
| KEGG-Cardiac muscle contraction | 80 | 4 | GAIN | 2.2 × 10⁻⁴ |
| KEGG-Dilated cardiomyopathy | 92 | 4 | GAIN | 3.7 × 10⁻⁴ |
| KEGG-Hypertrophic cardiomyopathy HCM | 85 | 4 | GAIN | 2.7 × 10⁻⁴ |
| Membrane | 1942 | 20 | GAIN | 2.1 × 10⁻⁵ |
| Membrane bound vesicle | 114 | 3 | GAIN | 8.6 × 10⁻³ |
| Membrane organization and biogenesis | 133 | 4 | STEP | 6.1 × 10⁻³ |
| Membrane part | 1633 | 13 | GAIN | 6.6 × 10⁻³ |
| Neurite development | 53 | 3 | GAIN | 9.8 × 10⁻⁴ |
| Neurogenesis | 93 | 3 | GAIN | 4.9 × 10⁻³ |
| Neuron development | 61 | 3 | GAIN | 1.5 × 10⁻³ |
| Neuron differentiation | 76 | 3 | GAIN | 2.8 × 10⁻³ |
| Neuropeptide binding | 23 | 2 | GAIN | 3.2 × 10⁻³ |
| Neuropeptide receptor activity | 22 | 2 | GAIN | 2.9 × 10⁻³ |
| Nitrogen compound metabolic process | 150 | 4 | STEP | 9.3 × 10⁻³ |
| Plasma membrane | 1393 | 14 | GAIN | 5.4 × 10⁻⁴ |
| Plasma membrane part | 1141 | 10 | GAIN | 9.0 × 10⁻³ |
| RAS guanyl nucleotide exchange factor activity | 18 | 2 | STEP | 4.3 × 10⁻³ |
| Reactome-Depolarization of the presynaptic terminal triggers the opening of calcium channels | 12 | 2 | GAIN | 8.6 × 10⁻⁴ |
| Reactome-Neurotransmitter release cycle | 28 | 2 | GAIN | 4.7 × 10⁻³ |
| Reactome-Transmission across chemical synapses | 130 | 4 | GAIN | 1.4 × 10⁻³ |
| Receptor mediated endocytosis | 33 | 2 | GAIN | 6.5 × 10⁻³ |
| Regulation of action potential | 17 | 2 | STEP | 3.8 × 10⁻³ |
| Response to external stimulus | 306 | 6 | STEP | 6.7 × 10⁻³ |
| ST interleukin 4 pathway | 26 | 2 | STEP | 8.8 × 10⁻³ |
| System process | 558 | 8 | GAIN | 1.0 × 10⁻³ |
| Transmembrane receptor activity | 411 | 8 | STEP | 1.9 × 10⁻³ |
| Transport | 778 | 11 | STEP | 3.6 × 10⁻³ |
| Voltage-gated calcium channel activity | 18 | 2 | GAIN | 2.0 × 10⁻³ |
| Voltage-gated calcium channel complex | 15 | 2 | GAIN | 1.4 × 10⁻³ |
q-value: the false discovery rate.
*Number of genes in overlap.
#The dataset in which the gene set was significant.
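The source does not state how the q-values were obtained. As one common way to control the false discovery rate, the sketch below applies the Benjamini-Hochberg adjustment to a list of p-values; this choice of procedure, the function name, and the example p-values are all assumptions made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical gene-set p-values, not taken from the study.
p_values = [0.01, 0.04, 0.03, 0.20]
print([round(q, 4) for q in benjamini_hochberg(p_values)])
# [0.04, 0.0533, 0.0533, 0.2]
```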
Figure 2. Summary of the selection of candidate markers for the construction of the genetic risk score model.
GAIN: the Genetic Association Information Network; STEP: the Systematic Treatment Enhancement Program; MACH: the Markov Chain Haplotyping program.