Eleanor Stanley1, Eleni Ioanna Delatola2, Esther Nkuipou-Kenfack3, William Spooner1, Walter Kolch2,4,5, Joost P Schanstra6,7, Harald Mischak8,9, Thomas Koeck3.
Abstract
BACKGROUND: When combined with a clinical outcome variable, the size, complexity and nature of mass-spectrometry proteomics data pose major statistical challenges for the discovery of potential disease-associated biomarkers. The purpose of this study was therefore to evaluate the effectiveness of different statistical methods for urinary proteomic biomarker discovery, and of different classifier-modelling methods, for the diagnosis of coronary artery disease in 197 study subjects and the prognostication of acute coronary syndromes in 368 study subjects.
Keywords: Biomarker detection; Classifier modelling; Statistical proteome analysis
Year: 2016 PMID: 27923348 PMCID: PMC5139137 DOI: 10.1186/s12859-016-1390-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Numbers of study subjects in the cohorts used for biomarker discovery and validation
| Cohort | Discovery | Validation or Validation 1 (0–5 years) | Validation 2 (5–11 years) |
|---|---|---|---|
| DICADU | 39 | 21 | |
| VASCAB | 93 | 44 | |
| AusDiab | 144 | 74 | 92 |
| CACTI | 6 | 2 | 50 |
Fig. 1 Study design for biomarker identification and validation. CADD, coronary artery disease diagnosis; AMIP, acute myocardial infarction prognostication; ACD, combined coronary artery disease diagnosis and outcome (AMI) prognostication; WT, non-parametric Wilcoxon rank sum test; BDA, binary discriminant analysis; RF, random forests; SVM, support vector machine; DDA, diagonal discriminant analysis; LDA, linear discriminant analysis
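The WT branch of the study design screens each peptide separately with the Wilcoxon rank sum test. A minimal sketch follows. It assumes a normal approximation with tie correction and a Benjamini–Hochberg adjustment across peptides. The record does not state which multiple-testing correction was used, and the function names, the `alpha` threshold and the example intensities are all hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_p(cases, controls):
    """Two-sided Wilcoxon rank sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(cases), len(controls)
    pooled = list(cases) + list(controls)
    n = n1 + n2
    w = sum(average_ranks(pooled)[:n1])  # rank sum of the case group
    mean_w = n1 * (n + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:  # all values identical: no evidence of a shift
        return 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(pvals):
    """BH-adjusted p-values (step-up false discovery rate control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def screen_peptides(data, alpha=0.05):
    """data: peptide id -> (case intensities, control intensities)."""
    names = list(data)
    adjusted = benjamini_hochberg([rank_sum_p(*data[name]) for name in names])
    return [name for name, q in zip(names, adjusted) if q < alpha]


# Hypothetical intensities: peptide "A" separates the groups, "B" does not.
data = {
    "A": ([10, 11, 12, 13, 14, 15], [1, 2, 3, 4, 5, 6]),
    "B": ([1, 3, 5, 7, 9, 11], [2, 4, 6, 8, 10, 12]),
}
print(screen_peptides(data))
```

With thousands of peptides the adjustment step matters far more than in this two-peptide toy. An exact test would be preferable for very small groups.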
Demographic and clinical data of subjects in the biomarker discovery cohort
| Parameter | DICADU control | DICADU case | VASCAB control | VASCAB case | AusDiab control | AusDiab case | CACTI control | CACTI case |
|---|---|---|---|---|---|---|---|---|
| N | 20 | 19 | 46 | 47 | 72 | 72 | 3 | 3 |
| Age (years) | 56 ± 7 | 54 ± 6 | 63 ± 8 | 63 ± 9 | 65 ± 11 | 65 ± 11 | 43 ± 4 | 44 ± 3 |
| Female (%) | 52.6 | 40.0 | 23.6 | 23.4 | 38.9 | 37.5 | 33.3 | 33.3 |
| Gensini plaque score | 0 ± 0 | 42 ± 28 | 0 ± 0 | 80 ± 31 | n.a. | n.a. | n.a. | n.a. |
| Diabetes (%) | 5.0 | 21.1 | 0 | 23.4 | 8.3 | 27.8 | 100 | 100 |
| Current smoker (%) | 20.0 | 21.1 | 4.3 | 6.4 | 4.2 | 22.2 | 33.3 | 33.3 |
| Systolic blood pressure (mm Hg) | 137 ± 18 | 133 ± 15 | 140 ± 17 | 139 ± 25 | 136 ± 19 | 147 ± 21 | 111 ± 10 | 122 ± 14 |
| Diastolic blood pressure (mm Hg) | 81 ± 10 | 78 ± 10 | 82 ± 11 | 79 ± 13 | 70 ± 11 | 76 ± 11 | 80 ± 10 | 87 ± 10 |
| Total cholesterol (mmol/l) | 5.2 ± 1.2 | 5.9 ± 0.3 | 5.6 ± 1.1 | 4.1 ± 0.9 | 6.2 ± 1.1 | 6.0 ± 1.2 | 4.8 ± 0.4 | 5.4 ± 0.7 |
| HDL cholesterol (mmol/l) | 1.2 ± 0.3 | 1.2 ± 0.3 | 1.5 ± 0.4 | 1.2 ± 0.3 | 1.5 ± 0.4 | 1.2 ± 0.4 | 1.3 ± 0.5 | 1.3 ± 0.6 |
| Triglycerides (mmol/l) | 2.0 ± 1.0 | 1.8 ± 0.8 | 1.6 ± 0.9 | 2.2 ± 1.0 | n.a. | n.a. | 1.1 ± 0.8 | 1.1 ± 0.5 |
n.a., not available. Diabetes is type 2, except in CACTI, where it is type 1.
Fig. 2 Identified biomarkers using different statistical approaches. CADD, coronary artery disease diagnosis; AMIP, acute myocardial infarction prognostication; ACD, combined coronary artery disease diagnosis and outcome (AMI) prognostication
Diagnostic performance of classifiers modelled by SVM, DDA, LDA, BDA and RF for CADD
Values are areas under the curve (AUC) from receiver operating characteristic (ROC) analyses.
CADD coronary artery disease diagnosis, SVM support vector machine, DDA diagonal discriminant analysis, LDA linear discriminant analysis, BDA binary discriminant analysis, RF random forests
*P < 0.05 for CADD WT + RF + BDA vs. CADD t-score and CADD ≥ 3
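Each AUC in these tables can be read as the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control, with ties counting one half (the Mann–Whitney form of the AUC). Below is a minimal sketch with hypothetical scores. It uses a quadratic pairwise count for clarity; a rank-based computation scales better for large cohorts.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical classifier outputs for three cases and three controls.
print(roc_auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))  # 8 of 9 pairs correct
```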
Prognostic performance of classifiers modelled by SVM, LDA, BDA and RF for AMIP
Values are areas under the curve (AUC) from receiver operating characteristic (ROC) analyses.
AMIP acute myocardial infarction prognostication, SVM support vector machine, DDA diagonal discriminant analysis, LDA linear discriminant analysis, BDA binary discriminant analysis, RF random forests, BM biomarker; ≥ 3, biomarkers present in at least 3 out of the 5 biomarker patterns resulting from the different discovery approaches; ≥ 2, biomarkers present in at least 2 out of the 5 biomarker patterns resulting from the different discovery approaches
*P < 0.05 for AMIP BDA vs. AMIP cat-score, AMIP ≥ 3 and AMIP ≥ 2
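The ≥ 3 and ≥ 2 patterns defined in the footnote are consensus sets. A biomarker is kept if at least k of the 5 discovery approaches selected it. The sketch below is minimal. Its method labels borrow the approach names that appear in the table footnotes, for illustration only, and the peptide identifiers are hypothetical.

```python
from collections import Counter


def consensus_biomarkers(patterns, min_count):
    """Return biomarkers selected by at least `min_count` of the discovery patterns."""
    counts = Counter()
    for selected in patterns.values():
        counts.update(set(selected))  # count each biomarker at most once per pattern
    return sorted(b for b, n in counts.items() if n >= min_count)


# Hypothetical biomarker patterns from five discovery approaches.
patterns = {
    "t-score": ["p1", "p2", "p3"],
    "cat-score": ["p1", "p2"],
    "WT": ["p1", "p4"],
    "RF": ["p2", "p4", "p5"],
    "BDA": ["p1", "p5"],
}
print(consensus_biomarkers(patterns, 3))  # ['p1', 'p2']
print(consensus_biomarkers(patterns, 2))  # ['p1', 'p2', 'p4', 'p5']
```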
Diagnostic/prognostic performance of classifiers modelled by SVM, DDA, LDA, BDA and RF for ACD
Values are areas under the curve (AUC) from receiver operating characteristic (ROC) analyses.
ACD combined coronary artery disease diagnosis and outcome (AMI) prognostication, SVM support vector machine, DDA diagonal discriminant analysis, LDA linear discriminant analysis, BDA binary discriminant analysis, RF random forests, BM biomarker; ≥ 3, biomarkers present in at least 3 out of the 5 biomarker patterns resulting from the different discovery approaches; ≥ 2, biomarkers present in at least 2 out of the 5 biomarker patterns resulting from the different discovery approaches
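Diagonal discriminant analysis (DDA), one of the classifiers compared in these tables, is linear discriminant analysis with the off-diagonal covariance terms set to zero. This keeps it stable when peptides outnumber subjects. Below is a minimal two-class sketch. It assumes a pooled per-feature variance (non-zero for every feature) and class priors estimated from the training data. The function names and data are hypothetical, and this is not the authors' implementation.

```python
import math


def fit_dda(cases, controls):
    """cases/controls: lists of equal-length feature vectors (e.g. peptide intensities)."""
    p = len(cases[0])

    def col_means(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(p)]

    mu_case, mu_ctrl = col_means(cases), col_means(controls)
    n = len(cases) + len(controls)
    pooled_var = []
    for j in range(p):
        # within-class sum of squares for feature j, pooled over both classes
        ss = sum((r[j] - mu_case[j]) ** 2 for r in cases)
        ss += sum((r[j] - mu_ctrl[j]) ** 2 for r in controls)
        pooled_var.append(ss / (n - 2))
    prior_case = len(cases) / n
    return mu_case, mu_ctrl, pooled_var, prior_case


def dda_score(model, x):
    """Case-minus-control discriminant; a positive score favours 'case'."""
    mu_case, mu_ctrl, var, prior_case = model
    d_case = -sum((xj - m) ** 2 / v for xj, m, v in zip(x, mu_case, var))
    d_ctrl = -sum((xj - m) ** 2 / v for xj, m, v in zip(x, mu_ctrl, var))
    return 0.5 * (d_case - d_ctrl) + math.log(prior_case / (1 - prior_case))


model = fit_dda([[2.0, 1.0], [4.0, 3.0]], [[0.0, 0.0], [-2.0, 2.0]])
print(dda_score(model, [3.0, 2.0]))  # 4.25
```

Because the score is continuous, it can be fed directly into an AUC computation to obtain values comparable to those tabulated above.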
Multi-centre cohort classifiers built from all 5616 peptides and selected features using f-score
| Cohort | Total (cases/controls) | Discovery (cases/controls) | Validation (cases/controls) | All 5616 peptides: AUC (discovery) | All 5616 peptides: accuracy (validation) | Selected peptides: number | Selected peptides: AUC (discovery) | Selected peptides: accuracy (validation) |
|---|---|---|---|---|---|---|---|---|
| CAD | 101/96 | 66/66 | 35/30 | 0.750 | 60% | 148 | 0.871 | 64.6% |
| AMI in 0–5 y | 113/113 | 75/75 | 38/38 | 0.653 | 63.2% | 154 | 0.873 | 73.7% |
| AMI in 5–11 y | 144/171 | 75/75 | 69/96 | 0.653 | 52.1% | 154 | 0.873 | 61.2% |
| CVD | 214/208 | 141/140 | 73/68 | 0.709 | 61.7% | 651 | 0.805 | 71.6% |
y, years; AUC, area under the receiver operating characteristic (ROC) curve
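The caption states that peptides were selected using the F-score. A ranking step can be sketched by assuming the common two-class definition: the squared deviations of the two class means from the overall mean, divided by the sum of the two within-class sample variances. The function names, the `top_k` cut-off and the data are hypothetical. The record does not specify how the reported peptide numbers (148, 154, 651) were chosen.

```python
from statistics import mean


def f_score(pos, neg):
    """Two-class F-score of one feature (assumed definition):
    ((mean_pos - mean_all)^2 + (mean_neg - mean_all)^2) / (var_pos + var_neg),
    with sample variances (n - 1 denominators)."""
    m_all, m_pos, m_neg = mean(list(pos) + list(neg)), mean(pos), mean(neg)
    between = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
    within = (sum((x - m_pos) ** 2 for x in pos) / (len(pos) - 1)
              + sum((x - m_neg) ** 2 for x in neg) / (len(neg) - 1))
    return between / within if within > 0 else float("inf")


def select_by_f_score(features, top_k):
    """features: peptide id -> (case values, control values); keep the top_k by F-score."""
    ranked = sorted(features, key=lambda name: f_score(*features[name]), reverse=True)
    return ranked[:top_k]


# Hypothetical peptides: "a" separates cases from controls, "b" barely does.
features = {"a": ([1, 2, 3], [4, 5, 6]), "b": ([1, 5, 3], [2, 4, 6])}
print(select_by_f_score(features, 1))  # ['a']
```

The F-score scores each peptide independently, so it ignores correlations between peptides. This is one reason a selected set can still be redundant.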
Diagnostic/prognostic performance of classifiers modelled by SVM
| Biomarker patterns | Biomarker number | CADD | AMIP (0–5 y) | AMIP (5–11 y) | ACD (0–5 y) |
|---|---|---|---|---|---|
| CADD t-score + AMIP all BM | 123 | 0.759 | 0.730 | 0.681 | 0.741 |
| CADD BDA + AMIP all BM | 93 | 0.754 | 0.779 | 0.636 | 0.766 |
| CADD WT + RF + BDA + AMIP all BM | 111 | 0.758 | 0.736 | 0.645 | 0.747 |
Values are areas under the curve (AUC) from receiver operating characteristic (ROC) analyses.
CADD coronary artery disease diagnosis, AMIP acute myocardial infarction prognostication, ACD combined coronary artery disease diagnosis and outcome (AMI) prognostication, SVM support vector machine, DDA diagonal discriminant analysis, LDA linear discriminant analysis, BDA binary discriminant analysis, RF random forests, BM biomarker; ≥ 3, biomarkers present in at least 3 out of the 5 biomarker patterns resulting from the different discovery approaches; ≥ 2, biomarkers present in at least 2 out of the 5 biomarker patterns resulting from the different discovery approaches