Tanzeela Khalid, Raphael Aggio, Paul White, Ben De Lacy Costello, Raj Persad, Huda Al-Kateb, Peter Jones, Chris S Probert, Norman Ratcliffe.
Abstract
The aim of this work was to investigate volatile organic compounds (VOCs) emanating from urine samples to determine whether they can be used to classify samples into prostate cancer and non-cancer groups. Participants were men referred for a trans-rectal ultrasound-guided prostate biopsy because of an elevated prostate specific antigen (PSA) level or abnormal findings on digital rectal examination. Urine samples were collected from patients with prostate cancer (n = 59) and cancer-free controls (n = 43) on the day of their biopsy, prior to the procedure. VOCs from the headspace of basified urine samples were extracted using solid-phase micro-extraction and analysed by gas chromatography/mass spectrometry. Classifiers were developed using Random Forest (RF) and Linear Discriminant Analysis (LDA) classification techniques. PSA alone had an accuracy of 62-64% in these samples. A model based on four VOCs (2,6-dimethyl-7-octen-2-ol, pentanal, 3-octanone and 2-octanone) was marginally more accurate, at 63-65%. When combined, PSA level and these four VOCs had mean accuracies of 74% and 65% using RF and LDA, respectively. With repeated double cross-validation, the mean accuracies fell to 71% and 65% using RF and LDA, respectively. Results from VOC profiling of urine headspace are encouraging and suggest that other metabolomic avenues are worth exploring to help improve the stratification of men at risk of prostate cancer. This study also adds to our knowledge of the profile of compounds found in basified urine from controls and cancer patients, which is useful information for future studies comparing urine from patients with other disease states.
Year: 2015 PMID: 26599280 PMCID: PMC4657998 DOI: 10.1371/journal.pone.0143283
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Demographics for study participants with and without prostate cancer.
| Group | N | Age range in years (median) | PSA range (ng/mL) (median) | No. of smokers (%) |
|---|---|---|---|---|
| Controls | 43 | 41–81 (63) | 0.8–30 (6.2) | 7 (16) |
| Prostate cancer | 59 | 50–88 (69) | 3.4–647 (10.2) | 10 (17) |
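The group sizes in the demographics table set a reference point for reading the accuracy figures that follow. The snippet below is a minimal sketch that computes the class proportions and the majority-class rate, which is the accuracy of a trivial rule that always predicts the larger group. The function name `class_proportions` is hypothetical; only the counts 59 and 43 come from the table.

```python
# Group sizes from the demographics table above.
N_CONTROLS = 43
N_CANCER = 59


def class_proportions(n_cancer: int, n_controls: int) -> dict:
    """Return the share of each group and the majority-class rate."""
    total = n_cancer + n_controls
    return {
        "total": total,
        "cancer_fraction": n_cancer / total,
        "control_fraction": n_controls / total,
        # Accuracy of a trivial rule that always predicts the larger group.
        "majority_rate": max(n_cancer, n_controls) / total,
    }


if __name__ == "__main__":
    for name, value in class_proportions(N_CANCER, N_CONTROLS).items():
        print(name, round(value, 3))
```

With 59 cancer and 43 control samples, the majority-class rate is 59/102 ≈ 0.58.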
Approaches and R packages applied for feature selection prior to statistical modelling.
| Description | R package::function | Reference |
|---|---|---|
| Wrapper approach built around random forest | Boruta::Boruta | [ |
| Linear discriminant analysis with stepwise feature selection | caret::stepLDA | [ |
| Backwards selection of predictors based on predictor importance ranking | caret::rfe | [ |
| Wrapper approach built around bagging tree | caret::treebagFuncs | [ |
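The table above lists a backwards selection method that works from a predictor importance ranking (caret::rfe). The sketch below shows only the general idea in plain Python; it is not caret's implementation. It drops the least important predictor one step at a time, scores each subset, and keeps the best one. `importance_fn` and `score_fn` are hypothetical callables that stand in for a model's importance measure and a resampled performance estimate. caret::rfe itself evaluates user-specified subset sizes under resampling, whereas this sketch removes a single predictor per step.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def backward_selection(
    features: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    score_fn: Callable[[List[str]], float],
    min_size: int = 1,
) -> Tuple[List[str], Dict[int, float]]:
    """Drop the least important predictor one at a time and keep the best subset.

    importance_fn ranks predictors within the current subset; score_fn returns a
    performance estimate for a subset. Both are hypothetical caller-supplied hooks.
    """
    current = list(features)
    best_subset: List[str] = list(current)
    best_score = float("-inf")
    scores_by_size: Dict[int, float] = {}
    while True:
        score = score_fn(current)
        scores_by_size[len(current)] = score
        # ">=" means that, on ties, the smaller (later) subset is preferred.
        if score >= best_score:
            best_subset, best_score = list(current), score
        if len(current) <= min_size:
            break
        importance = importance_fn(current)
        weakest = min(current, key=lambda f: importance[f])
        current.remove(weakest)
    return best_subset, scores_by_size


if __name__ == "__main__":
    # Toy example with hypothetical predictors f1..f5; only f1 and f2 carry signal.
    toy_importance = {"f1": 3.0, "f2": 2.0, "f3": 0.3, "f4": 0.2, "f5": 0.1}
    useful = {"f1", "f2"}

    def importance_fn(subset):
        return {f: toy_importance[f] for f in subset}

    def score_fn(subset):
        # Reward informative predictors, lightly penalise uninformative ones.
        return sum(f in useful for f in subset) - 0.1 * sum(f not in useful for f in subset)

    best, scores = backward_selection(list(toy_importance), importance_fn, score_fn)
    print(best, scores)
```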
Fig 1. The pipeline of the validation techniques known as repeated 10-fold cross-validation and repeated double cross-validation.
A Monte-Carlo variation of each technique is achieved by randomising the labels of the testing samples.
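Repeated 10-fold cross-validation and its Monte-Carlo variant from Fig 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' R code. A one-feature threshold rule (`fit_threshold`, hypothetical) stands in for RF or LDA. Each repeat reshuffles the samples into 10 folds, and each fold is held out once for testing. When `permute_test=True`, the labels of the held-out fold are shuffled before scoring, which is one reading of "randomising the labels of the testing samples". Two random streams are used so that the permuted and unpermuted runs see identical folds.

```python
import random
from typing import List, Sequence


def fit_threshold(x: Sequence[float], y: Sequence[int]) -> float:
    """Toy stand-in for RF/LDA: choose the cut-off c that maximises training
    accuracy of the rule 'predict cancer (1) if x >= c'."""
    best_cut, best_acc = 0.0, -1.0
    for cut in sorted(set(x)):
        acc = sum((xi >= cut) == bool(yi) for xi, yi in zip(x, y)) / len(y)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut


def kfold_indices(n: int, k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_kfold_accuracy(
    x: Sequence[float],
    y: Sequence[int],
    k: int = 10,
    repeats: int = 10,
    permute_test: bool = False,
    seed: int = 0,
) -> List[float]:
    """Return one accuracy value per test fold, over all repeats.

    With permute_test=True the labels of each test fold are shuffled before
    scoring, giving a Monte-Carlo (chance-level) reference distribution.
    """
    fold_rng = random.Random(seed)      # same folds with or without permutation
    perm_rng = random.Random(seed + 1)  # separate stream for label shuffling
    n = len(y)
    accuracies: List[float] = []
    for _ in range(repeats):
        for test in kfold_indices(n, k, fold_rng):
            held_out = set(test)
            train = [i for i in range(n) if i not in held_out]
            cut = fit_threshold([x[i] for i in train], [y[i] for i in train])
            y_test = [y[i] for i in test]
            if permute_test:
                perm_rng.shuffle(y_test)
            correct = sum((x[i] >= cut) == bool(t) for i, t in zip(test, y_test))
            accuracies.append(correct / len(test))
    return accuracies


if __name__ == "__main__":
    # Hypothetical, perfectly ordered data: higher x means cancer.
    x = [float(i) for i in range(40)]
    y = [0] * 20 + [1] * 20
    real = repeated_kfold_accuracy(x, y)
    null = repeated_kfold_accuracy(x, y, permute_test=True)
    print(sum(real) / len(real), sum(null) / len(null))
```

The gap between the two mean accuracies shows how far a model's performance sits above what shuffled test labels produce.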
Accuracy results of PSA testing for prostate cancer assessed with repeated 10-fold cross-validation of Random Forest and Linear Discriminant Analysis (LDA) models.

| Validation | Model | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|---|
| Repeated 10-fold cross-validation | Random Forest | 0.10 | 0.50 | 0.60 | 0.62 | 0.70 | 1.00 |
| Repeated 10-fold cross-validation | LDA | 0.27 | 0.56 | 0.64 | 0.64 | 0.70 | 1.00 |
| Monte-Carlo 10-fold cross-validation | Random Forest | 0.10 | 0.36 | 0.40 | 0.43 | 0.50 | 0.73 |
| Monte-Carlo 10-fold cross-validation | LDA | 0.33 | 0.55 | 0.60 | 0.58 | 0.60 | 0.80 |
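The columns of these 10-fold tables (Min., 1st Qu., Median, Mean, 3rd Qu., Max.) match the layout of R's `summary()` over the per-fold accuracies. The sketch below shows how one such row could be computed. It assumes the quartiles follow R's default (type 7) definition, which Python's `statistics.quantiles(..., method="inclusive")` reproduces. With about 10 test samples per fold (102 participants split into 10 folds), a single fold's accuracy can only move in steps of roughly 0.1. That granularity helps explain why the ranges run from values as low as 0.10 up to 1.00.

```python
import statistics
from typing import Dict, Sequence


def r_style_summary(values: Sequence[float]) -> Dict[str, float]:
    """Min., 1st Qu., Median, Mean, 3rd Qu., Max., laid out like R's summary().

    method="inclusive" interpolates quartiles the same way as R's default
    quantile type 7.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": med,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


if __name__ == "__main__":
    # Hypothetical per-fold accuracies, not values from the study.
    print(r_style_summary([0.5, 0.6, 0.7, 0.8, 0.9]))
```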
Accuracy results of PSA testing for prostate cancer assessed using repeated double cross-validation of Random Forest and Linear Discriminant Analysis (LDA) models.

| Validation | Model | Accuracy mean | Accuracy min | Accuracy median | Accuracy max | Sensitivity mean | Sensitivity min | Sensitivity median | Sensitivity max | Specificity mean | Specificity min | Specificity median | Specificity max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Repeated double cross-validation | Random Forest | 0.61 | 0.43 | 0.61 | 0.74 | 0.66 | 0.35 | 0.65 | 0.90 | 0.53 | 0.21 | 0.57 | 0.71 |
| Repeated double cross-validation | LDA | 0.63 | 0.53 | 0.63 | 0.74 | 0.87 | 0.58 | 0.89 | 1.00 | 0.31 | 0.07 | 0.29 | 0.71 |
| Monte-Carlo repeated double cross-validation | Random Forest | 0.50 | 0.29 | 0.50 | 0.68 | 0.58 | 0.26 | 0.57 | 0.95 | 0.41 | 0.08 | 0.43 | 0.69 |
| Monte-Carlo repeated double cross-validation | LDA | 0.51 | 0.31 | 0.51 | 0.71 | 0.81 | 0.43 | 0.81 | 1.00 | 0.22 | 0.00 | 0.21 | 0.53 |
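The double cross-validation tables report sensitivity and specificity alongside accuracy. The sketch below shows how the three quantities relate to a confusion matrix. It assumes that cancer is coded as the positive class (`1`), which is an assumption about label coding rather than something stated here. A model that labels most samples as cancer scores high sensitivity but low specificity. One of the PSA-only rows above shows this pattern, with a mean sensitivity of 0.87 and a mean specificity of 0.31.

```python
import math
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity, treating `positive` as the
    cancer class (an assumption about label coding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of cancer samples correctly flagged; NaN if there are none.
        "sensitivity": tp / (tp + fn) if tp + fn else math.nan,
        # Share of controls correctly cleared; NaN if there are none.
        "specificity": tn / (tn + fp) if tn + fp else math.nan,
    }


if __name__ == "__main__":
    # Hypothetical labels: three cancer samples, two controls.
    print(classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```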
Accuracy results of repeated 10-fold cross-validation of the Random Forest and Linear Discriminant Analysis (LDA) models built to classify urine samples from patients with prostate cancer and cancer-free controls based on the presence or absence of VOCs.

| Validation | Model | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|---|
| Repeated 10-fold cross-validation | Random Forest | 0.30 | 0.60 | 0.70 | 0.66 | 0.73 | 1.00 |
| Repeated 10-fold cross-validation | LDA | 0.27 | 0.59 | 0.67 | 0.66 | 0.73 | 1.00 |
| Monte-Carlo 10-fold cross-validation | Random Forest | 0.00 | 0.45 | 0.50 | 0.51 | 0.60 | 0.73 |
| Monte-Carlo 10-fold cross-validation | LDA | 0.10 | 0.44 | 0.50 | 0.50 | 0.60 | 0.70 |
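The VOC models classify samples on the presence or absence of compounds. Before any classifier can use that information, it has to be arranged as a binary sample-by-compound matrix. The sketch below shows one way to build that matrix from per-sample lists of detected compounds. The sample IDs and compound names are hypothetical placeholders. How peaks were called and aligned in the study is not described in this excerpt, so that step is assumed to have already happened.

```python
from typing import Dict, List, Set, Tuple


def presence_matrix(
    detected: Dict[str, Set[str]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Turn {sample: set of detected compounds} into a 0/1 matrix.

    Rows are samples and columns are compounds, both sorted alphabetically;
    1 means the compound was detected in that sample.
    """
    samples = sorted(detected)
    compounds = sorted(set().union(*detected.values()))
    matrix = [[int(c in detected[s]) for c in compounds] for s in samples]
    return samples, compounds, matrix


if __name__ == "__main__":
    # Hypothetical detections, not data from the study.
    demo = {
        "S1": {"compound_A", "compound_B"},
        "S2": {"compound_B"},
        "S3": {"compound_A", "compound_C"},
    }
    rows, cols, m = presence_matrix(demo)
    print(cols)
    for name, row in zip(rows, m):
        print(name, row)
```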
Accuracy results of repeated double cross-validation of the Random Forest and Linear Discriminant Analysis (LDA) models built to classify urine samples from patients with prostate cancer and cancer-free controls based on the presence or absence of VOCs.

| Validation | Model | Accuracy mean | Accuracy min | Accuracy median | Accuracy max | Sensitivity mean | Sensitivity min | Sensitivity median | Sensitivity max | Specificity mean | Specificity min | Specificity median | Specificity max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Repeated double cross-validation | Random Forest | 0.65 | 0.47 | 0.66 | 0.79 | 0.74 | 0.37 | 0.75 | 0.90 | 0.53 | 0.13 | 0.53 | 0.86 |
| Repeated double cross-validation | LDA | 0.63 | 0.44 | 0.64 | 0.76 | 0.75 | 0.35 | 0.77 | 1.00 | 0.47 | 0.13 | 0.50 | 0.79 |
| Monte-Carlo repeated double cross-validation | Random Forest | 0.50 | 0.30 | 0.51 | 0.64 | 0.63 | 0.25 | 0.64 | 0.93 | 0.37 | 0.07 | 0.37 | 0.72 |
| Monte-Carlo repeated double cross-validation | LDA | 0.50 | 0.26 | 0.50 | 0.67 | 0.65 | 0.25 | 0.67 | 0.92 | 0.35 | 0.05 | 0.33 | 0.76 |
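Repeated double cross-validation keeps every model-building decision away from the samples used to report performance. The sketch below is a nested (double) cross-validation loop written under several assumptions: the outer and inner fold counts, the number of repeats, and the toy threshold classifier that replaces RF or LDA are all illustrative choices. In this sketch the inner loop chooses which single feature to use. That is a simplified stand-in for model selection, and whether the original pipeline nested its feature selection this way is not stated here. The outer test rows never take part in feature choice or fitting.

```python
import random
import statistics
from typing import List, Sequence

Rows = List[List[float]]


def fit_threshold(x: Sequence[float], y: Sequence[int]) -> float:
    """Toy stand-in for RF/LDA: cut-off c maximising training accuracy of 'x >= c'."""
    best_cut, best_acc = 0.0, -1.0
    for cut in sorted(set(x)):
        acc = sum((xi >= cut) == bool(yi) for xi, yi in zip(x, y)) / len(y)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut


def split_folds(rows: List[int], k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle the given row indices and deal them into k folds."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def fit_and_score(X: Rows, y: Sequence[int], f: int, train: List[int], test: List[int]) -> float:
    """Fit the threshold on `train` rows for feature f; return accuracy on `test` rows."""
    cut = fit_threshold([X[i][f] for i in train], [y[i] for i in train])
    return sum((X[i][f] >= cut) == bool(y[i]) for i in test) / len(test)


def select_feature(X: Rows, y: Sequence[int], rows: List[int], k: int, rng: random.Random) -> int:
    """Inner loop: pick the feature with the best inner-CV accuracy, using only `rows`."""
    folds = split_folds(rows, k, rng)

    def inner_accuracy(f: int) -> float:
        scores = []
        for test in folds:
            held_out = set(test)
            train = [i for i in rows if i not in held_out]
            scores.append(fit_and_score(X, y, f, train, test))
        return statistics.fmean(scores)

    return max(range(len(X[0])), key=inner_accuracy)


def repeated_double_cv(X: Rows, y: Sequence[int], outer_k: int = 5, inner_k: int = 5,
                       repeats: int = 5, seed: int = 0) -> List[float]:
    """Outer folds estimate performance; inner folds (outer-training rows only)
    choose the feature. Returns one outer-test accuracy per outer fold."""
    rng = random.Random(seed)
    everything = list(range(len(y)))
    results: List[float] = []
    for _ in range(repeats):
        for test in split_folds(everything, outer_k, rng):
            held_out = set(test)
            train = [i for i in everything if i not in held_out]
            f = select_feature(X, y, train, inner_k, rng)
            results.append(fit_and_score(X, y, f, train, test))
    return results


if __name__ == "__main__":
    data_rng = random.Random(1)
    y = [0] * 20 + [1] * 20
    # Hypothetical data: feature 0 separates the groups, feature 1 is noise.
    X = [[label + 0.5 * data_rng.random(), data_rng.random()] for label in y]
    print(round(statistics.fmean(repeated_double_cv(X, y)), 3))
```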
Accuracy results of repeated 10-fold cross-validation of the Random Forest and Linear Discriminant Analysis (LDA) models built to classify patients with prostate cancer and cancer-free controls based on blood PSA levels and urinary VOCs.

| Validation | Model | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|---|
| Repeated 10-fold cross-validation (%) | Random Forest | 20.00 | 66.67 | 72.73 | 73.69 | 80.00 | 100.00 |
| Repeated 10-fold cross-validation (%) | LDA | 22.22 | 58.89 | 63.64 | 64.85 | 72.73 | 100.00 |
| Monte-Carlo 10-fold cross-validation (%) | Random Forest | 10.00 | 45.45 | 55.56 | 55.79 | 66.67 | 90.91 |
| Monte-Carlo 10-fold cross-validation (%) | LDA | 10.00 | 40.00 | 50.00 | 48.00 | 60.00 | 88.89 |
Accuracy results of repeated double cross-validation of the Random Forest and Linear Discriminant Analysis (LDA) models built to classify patients with prostate cancer and cancer-free controls based on blood PSA levels and urinary VOCs.

| Validation | Model | Accuracy mean | Accuracy min | Accuracy median | Accuracy max | Sensitivity mean | Sensitivity min | Sensitivity median | Sensitivity max | Specificity mean | Specificity min | Specificity median | Specificity max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Repeated double cross-validation (%) | Random Forest | 70.88 | 52.94 | 70.59 | 82.86 | 80.16 | 60.00 | 80.00 | 100.00 | 58.23 | 28.57 | 57.14 | 85.71 |
| Repeated double cross-validation (%) | LDA | 65.09 | 47.06 | 64.71 | 80.00 | 75.56 | 45.00 | 75.00 | 100.00 | 50.80 | 14.29 | 50.00 | 85.71 |
| Monte-Carlo repeated double cross-validation (%) | Random Forest | 50.52 | 26.47 | 50.00 | 73.53 | 64.09 | 35.71 | 64.29 | 94.74 | 36.72 | 5.88 | 37.50 | 64.29 |
| Monte-Carlo repeated double cross-validation (%) | LDA | 49.89 | 32.35 | 50.00 | 72.73 | 64.70 | 38.46 | 65.00 | 90.00 | 34.70 | 0.00 | 33.33 | 81.82 |
Fig 2. Receiver operating characteristic (ROC) curve for the random forest (RF) and linear discriminant analysis (LDA) models built using repeated double cross-validation to classify patients with prostate cancer and cancer-free controls based on PSA levels and VOCs in urine headspace.
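An ROC curve like the one in Fig 2 comes from sweeping a decision threshold over a model's continuous scores. The scores could be, for example, RF vote fractions or LDA posterior probabilities, although this excerpt does not say which scores were used. The sketch below is a generic construction and not the authors' code. It sorts samples by descending score, treats tied scores as one step, records (false positive rate, true positive rate) pairs, and integrates them with the trapezoidal rule to give the area under the curve (AUC). It assumes labels are coded 1 for cancer and 0 for control, and that both classes are present.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from a threshold swept from the highest score downwards.

    Labels are 1 (cancer) or 0 (control); both classes must be present.
    Samples with tied scores are added in a single step.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve via the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


if __name__ == "__main__":
    # Hypothetical scores and labels.
    pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
    print(pts, auc_trapezoid(pts))
```

For the toy input, the AUC equals the fraction of cancer/control pairs in which the cancer sample has the higher score (3 of 4 pairs).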