| Literature DB >> 26930047 |
Abstract
Cardiovascular disease (including coronary artery disease and myocardial infarction) is one of the leading causes of death in Europe, and is influenced by both environmental and genetic factors. With the recent advances in genomic tools and technologies there is potential to predict and diagnose heart disease using molecular data from analysis of blood cells. We analyzed gene expression data from blood samples taken from normal people (n = 21), non-significant coronary artery disease (n = 93), patients with unstable angina (n = 16), stable coronary artery disease (n = 14) and myocardial infarction (MI; n = 207). We used a feature selection approach to identify a set of gene expression variables which successfully differentiate different cardiovascular diseases. The initial features were discovered by fitting a linear model for each probe set across all arrays of normal individuals and patients with myocardial infarction. Three different feature optimisation algorithms were devised which identified two discriminating sets of genes, one using MI and normal controls (total genes = 6) and another one using MI and unstable angina patients (total genes = 7). In all our classification approaches we used a non-parametric k-nearest neighbour (KNN) classification method (k = 3). The results proved the diagnostic robustness of the final feature sets in discriminating patients with myocardial infarction from healthy controls. Interestingly it also showed efficacy in discriminating myocardial infarction patients from patients with clinical symptoms of cardiac ischemia but no myocardial necrosis or stable coronary artery disease, despite the influence of batch effects and different microarray gene chips and platforms.Entities:
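The classification step described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' code: the data are synthetic, the class separation and sample sizes are made up, and only the classifier choice (KNN with k = 3 on a small gene panel) follows the paper.

```python
# Hypothetical sketch of the classification step: a k-nearest-neighbour
# classifier (k = 3) on a reduced set of gene-expression features.
# All data here are synthetic; shapes and effect sizes are illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_mi, n_ctrl, n_genes = 40, 40, 6          # e.g. the 6-gene MI-vs-normal panel
X = np.vstack([
    rng.normal(1.0, 1.0, (n_mi, n_genes)),    # simulated "MI" expression
    rng.normal(-1.0, 1.0, (n_ctrl, n_genes)), # simulated "control" expression
])
y = np.array(["MI"] * n_mi + ["control"] * n_ctrl)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
knn = KNeighborsClassifier(n_neighbors=3)   # k = 3, as in the paper
knn.fit(X_tr, y_tr)
success_rate = knn.score(X_te, y_te)        # fraction correctly classified
print(f"success rate: {success_rate:.0%}")
```

Because KNN is non-parametric, no distributional assumptions are made about the expression values; classification depends only on distances in the reduced feature space.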
Year: 2016 PMID: 26930047 PMCID: PMC4773227 DOI: 10.1371/journal.pone.0149475
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. A flow chart describing the classification process.
Fig 2. Batch effects and their ComBat adjustment when merging the Nelson and Rothman datasets.
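ComBat proper uses empirical-Bayes shrinkage of batch parameters; as a simplified sketch of the same idea, the snippet below centres and scales each gene within each batch so that merged datasets share a common location and scale. The function name and data are illustrative assumptions, not the authors' implementation.

```python
# Simplified batch adjustment: per-batch, per-gene standardisation.
# (ComBat additionally shrinks the batch estimates; this is only a sketch.)
import numpy as np

def per_batch_standardise(expr, batches):
    """expr: (samples, genes) expression matrix; batches: per-sample labels."""
    adjusted = expr.astype(float).copy()
    for b in np.unique(batches):
        idx = batches == b
        mu = adjusted[idx].mean(axis=0)           # batch-specific gene means
        sd = adjusted[idx].std(axis=0)            # batch-specific gene scales
        adjusted[idx] = (adjusted[idx] - mu) / np.where(sd == 0, 1.0, sd)
    return adjusted
```

After adjustment, every gene has mean 0 and unit scale within each batch, so samples from different chips can be pooled before classification.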
The success rate (SR) of the optimised feature lists on the NelsonB and Rothman datasets.
| Dataset | Opt1D1 | Opt1D2 | Opt2D1 | Opt2D2 | Opt3D1 | Opt3D2 |
|---|---|---|---|---|---|---|
| NelsonB | 53% | 65% | 95% | 95% | 95% | |
| Rothman | 63% | 70% | 87% | 86% | 87% | |
This table shows the classification performance of the six optimised feature lists (optimised using the NelsonA dataset) on the NelsonB and Rothman datasets. Opt1D1 denotes the reduced feature set produced by Optimisation Method 1 applied to Feature Discovery Method 1; likewise Opt2 = Optimisation Method 2, Opt3 = Optimisation Method 3, and D2 = Feature Discovery Method 2.
The classification statistics, namely success rate (SR), sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV), were recorded after the classification of each dataset.
| Dataset | SR | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|
| NelsonB1 | 59% | 0.82 | 0.17 | 0.64 | 0.33 |
| NelsonB2 | 65% | 0.73 | 0.5 | 0.73 | 0.5 |
| Rothman1 | 81% | 0.78 | 0.88 | 0.93 | 0.64 |
| Rothman2 | 66% | 0.72 | 0.5 | 0.76 | 0.44 |
| Beata-A-1 | 76% | 0.69 | 0.83 | 0.8 | 0.73 |
| Beata-A-2 | 82% | 0.8 | 0.83 | 0.83 | 0.8 |
| Beata-D-1 | 65% | 0.62 | 0.67 | 0.66 | 0.65 |
| Beata-D-2 | 78% | 0.75 | 0.79 | 0.78 | 0.77 |
| Beata-6M-1 | 59% | 0.6 | 0.56 | 0.59 | 0.58 |
| Beata-6M-2 | 66% | 0.64 | 0.67 | 0.67 | 0.65 |
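The statistics in the table above all derive from a single confusion matrix per dataset. A minimal helper for computing them is sketched below; the counts passed in are illustrative, not taken from the paper.

```python
# Reproduce SR, sensitivity, specificity, PPV and NPV from confusion-matrix
# counts (tp/fp/tn/fn). Example counts are illustrative only.
def classification_stats(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return {
        "SR": (tp + tn) / total,        # overall success rate (accuracy)
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "PPV": tp / (tp + fp),          # positive predictive value
        "NPV": tn / (tn + fn),          # negative predictive value
    }

stats = classification_stats(tp=8, fp=2, tn=5, fn=2)
print(stats)
```

Note that with the small test sets used here (some n below 20), each statistic is estimated from only a handful of counts, which is why the confidence intervals in the next table are so wide.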
Upper limit of 95% confidence interval (CI Upper) and lower limit of 95% confidence interval (CI Lower) for SR, sensitivity, specificity, PPV and NPV were recorded after the classification of each dataset.
| Dataset | CI (SR) | CI (Sensitivity) | CI (Specificity) | CI (PPV) | CI (NPV) |
|---|---|---|---|---|---|
| NelsonB1 | 35–83 | 60–100 | -16–50 | 38–90 | -32–10 |
| NelsonB2 | 41–90 | 45–100 | 6–93 | 45–100 | 6–94 |
| Rothman1 | 65–96 | 60–97 | 63–100 | 80–100 | 34–93 |
| Rothman2 | 47–84 | 51–93 | 13–87 | 55–97 | 10–80 |
| Beata-A-1 | 60–92 | 45–93 | 65–100 | 61–100 | 50–95 |
| Beata-A-2 | 67–96 | 58–100 | 63–100 | 62–100 | 59–100 |
| Beata-D-1 | 47–83 | 39–90 | 40–89 | 39–90 | 40–89 |
| Beata-D-2 | 61–93 | 50–97 | 61–100 | 58–100 | 54–98 |
| Beata-6M-1 | 40–77 | 35–87 | 32–82 | 35–85 | 32–85 |
| Beata-6M-2 | 48–83 | 37–88 | 44–94 | 40–93 | 41–90 |
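Confidence limits below 0% (as for NelsonB1) arise naturally when a normal-approximation interval is applied to a proportion estimated from very few samples. The exact CI method the authors used is not stated, so the Wald interval below is an assumption offered only to illustrate the effect.

```python
# Wald (normal-approximation) 95% CI for a proportion, in percent.
# With small n the interval can extend below 0% or above 100%, matching
# the negative lower limits in the table. Inputs here are illustrative.
import math

def wald_ci(p, n, z=1.96):
    half = z * math.sqrt(p * (1 - p) / n)
    return (p - half) * 100, (p + half) * 100   # (lower, upper) in percent

lo, hi = wald_ci(0.17, 6)   # e.g. a specificity of 0.17 on 6 negatives
print(f"{lo:.0f} to {hi:.0f}")
```

Truncating such intervals at 0% and 100%, or using an exact (Clopper-Pearson) interval, avoids the impossible negative bounds.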
Fig 3. ROC graph showing the performance of the binary classifiers.
Fig 4. An MDS plot of the Beata dataset showing the distribution of its cases and controls.
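A plot like Fig 4 can be produced by multidimensional scaling, which embeds samples in two dimensions while approximately preserving their pairwise distances. The sketch below uses scikit-learn's MDS on synthetic expression data; it is an illustrative assumption, not the authors' pipeline.

```python
# Hypothetical MDS sketch: project (samples x genes) expression data to 2-D
# for a case/control scatter plot. Data are synthetic and illustrative.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(1)
expr = rng.normal(size=(30, 6))                  # 30 samples, 6 genes
coords = MDS(n_components=2, random_state=1).fit_transform(expr)
print(coords.shape)                              # one 2-D point per sample
```

Each row of `coords` is then plotted, coloured by case/control status, so clustering (or batch structure) among samples becomes visible by eye.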