Benjamin A Neely, Jason A Ferrante, J Mauro Chaves, Jennifer L Soper, Jonas S Almeida, John M Arthur, Frances M D Gulland, Michael G Janech.
Abstract
Domoic acid toxicosis (DAT) in California sea lions (Zalophus californianus) is caused by exposure to the marine biotoxin domoic acid and has been linked to massive stranding events and mortality. Diagnosis is based on clinical signs in addition to the presence of domoic acid in body fluids. Chronic DAT is further characterized by recurring seizures progressing to status epilepticus. Diagnosis of chronic DAT is often slow and problematic, and minimally invasive tests for DAT have been the focus of numerous recent biomarker studies. The goal of this study was to retrospectively profile plasma proteins in a population of sea lions with chronic DAT and those without DAT using two-dimensional gel electrophoresis to discover whether individual proteins, multiple proteins, or combinations of protein and clinical data could be used to identify sea lions with DAT. Using a training set of 32 sea lion sera, 20 proteins and their isoforms were identified that differed significantly between the two groups (p<0.05). Interestingly, 11 apolipoprotein E (ApoE) charge forms were decreased in DAT samples, indicating that ApoE charge form distributions may be important in the progression of DAT. To develop a classifier of chronic DAT, an independent blinded test set of 20 sea lions, seven with chronic DAT, was used to validate models utilizing ApoE charge forms and eosinophil counts. The resulting support vector machine had high sensitivity (85.7% with 92.3% negative predictive value) and high specificity (92.3% with 85.7% positive predictive value). These results suggest that ApoE and eosinophil counts along with machine learning can perform as a robust and accurate tool to diagnose chronic DAT. Although this analysis is specifically focused on blood biomarkers and routine clinical data, the results demonstrate promise for future studies combining additional variables in multidimensional space to create robust classifiers.
Year: 2015 PMID: 25919366 PMCID: PMC4412824 DOI: 10.1371/journal.pone.0123295
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Training set characteristics table.
| | Total | DAT | Non-DAT |
|---|---|---|---|
| | 32 | 12 | 20 |
| male | 14 | 3 (25%) | 11 (55%) |
| female | 18 | 9 (75%) | 9 (45%) |
| | | | |
| | 1 | 0 (0%) | 1 (5%) |
| | 2 | 1 (8%) | 1 (5%) |
| | 7 | 2 (17%) | 5 (25%) |
| | 9 | 3 (25%) | 6 (30%) |
| | 13 | 6 (50%) | 7 (35%) |
| | 18 | 8 (67%) | 10 (50%) |
| | | | |
| | - | 8 | |
| | - | 2 | |
| | - | 3 | |
| | - | 6 | |
| | - | 1 | |
Descriptive data of sea lions in the training set.
Hematological parameters of the training set.
| Parameter | DAT, Avg ± S.D. (n) | non-DAT, Avg ± S.D. (n) | p-value |
|---|---|---|---|
| WBC (10³/mm³) | 13.4 ± 4.7 (n = 11) | 17.0 ± 7.0 (n = 17) | 0.095 |
| RBC (10⁶/mm³) | 5.0 ± 0.6 (n = 11) | 4.2 ± 0.5 (n = 19) | 0.002* |
| HGB (g/dL) | 17.8 ± 2.0 (n = 11) | 14.8 ± 1.8 (n = 19) | 0.001* |
| HCT (%) | 47.6 ± 12.5 (n = 11) | 43.0 ± 5.9 (n = 19) | 0.078 |
| PLT (10³/mm³) | 407.9 ± 163.1 (n = 11) | 473.8 ± 188.7 (n = 18) | 0.252 |
| MCV (fL) | 101.6 ± 6.1 (n = 11) | 102.2 ± 4.7 (n = 19) | 0.729 |
| MCH (pg) | 35.5 ± 1.7 (n = 11) | 35.4 ± 1.7 (n = 19) | 0.763 |
| MCHC (g/dL) | 35.3 ± 1.8 (n = 11) | 34.6 ± 1.3 (n = 19) | 0.438 |
| RDW (%) | 15.6 ± 0.8 (n = 11) | 15.6 ± 1.1 (n = 19) | 0.846 |
| MPV (fL) | 7.9 ± 1.0 (n = 11) | 8.4 ± 1.0 (n = 19) | 0.149 |
| Seg (#/mm³) | 8875 ± 4328 (n = 11) | 12309 ± 7200 (n = 17) | 0.221 |
| Band (#/mm³) | 57 ± 85 (n = 11) | 1172 ± 2207 (n = 17) | 0.063 |
| Lymph (#/mm³) | 3339 ± 1247 (n = 11) | 2756 ± 1602 (n = 17) | 0.371 |
| Mono (#/mm³) | 160 ± 175 (n = 11) | 227 ± 458 (n = 17) | 0.443 |
| Eos (#/mm³) | 936 ± 461 (n = 11) | 545 ± 566 (n = 17) | 0.024* |
Blood data is based on analysis of blood drawn the same day as samples used for analysis. Complete blood count analyses of the 32 samples used in the training set are listed. Some parameters were not measured for all individuals and are indicated by ‘n’. Hematological parameters were compared between chronic domoic acid toxicosis (DAT) and non-DAT groups using a Wilcoxon rank-sum test with normal approximation, and significance (p<0.05) is highlighted by an ‘*’. Abbreviations: WBC, white blood cell count; RBC, red blood cell count; HGB, hemoglobin; HCT, hematocrit; PLT, platelet; MCV, mean corpuscular volume; MCH, mean corpuscular hemoglobin; MCHC, mean corpuscular hemoglobin concentration; RDW, red cell distribution width; MPV, mean platelet volume; Seg, segmented neutrophils; Band, band cells; Lymph, lymphocytes; Mono, monocytes; Eos, eosinophils.
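As a rough illustration of the group comparison described in this note, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test with the normal approximation, written with the Python standard library only. The function names and the example counts are hypothetical and are not data from the study; the sketch also omits tie and continuity corrections, which the statistical software used by the authors may have applied.

```python
import math

def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Two-sided rank-sum test, normal approximation, no tie/continuity correction.

    Returns (z, p), where z > 0 means group x tends to rank higher than y.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2           # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical eosinophil counts (#/mm^3), invented for illustration only.
dat = [900, 1200, 700, 1500, 1000]
non_dat = [100, 300, 50, 600, 200, 400]
z, p = rank_sum_test(dat, non_dat)
print(f"z = {z:.3f}, p = {p:.4f}")
```

With groups as small as those in the tables, an exact test can give a noticeably different p-value than the normal approximation, so this sketch should be read as an explanation of the calculation rather than a way to reproduce the reported values.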
Fig 1. Volcano plot of all 618 spots in the training set.
Spot intensities were compared between chronic DAT and non-DAT samples using a Wilcoxon rank-sum test, and the -log10 p-value and log2 fold-change (DAT/non-DAT) values were used to generate a volcano plot. Red squares indicate spots with p<0.05 (n = 93).
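The two axes of a volcano plot can be computed directly from a fold-change ratio and a p-value. The sketch below shows that mapping and the p<0.05 cut-off expressed on the -log10 scale. The spot identifiers, ratios, and p-values are hypothetical, and how the fold change itself was summarized (for example, from group means or medians) is not stated here and is left as an input.

```python
import math

def volcano_coordinates(fold_change, p_value):
    """Map a DAT/non-DAT intensity ratio and a p-value to volcano-plot axes.

    Returns (log2 fold-change, -log10 p-value).
    """
    return math.log2(fold_change), -math.log10(p_value)

# Hypothetical spots: (spot id, DAT/non-DAT ratio, p-value). Not study data.
spots = [("A", 0.5, 0.01), ("B", 2.0, 0.20), ("C", 1.0, 0.04)]

points = {sid: volcano_coordinates(fc, p) for sid, fc, p in spots}

# p < 0.05 is equivalent to -log10(p) > -log10(0.05), roughly 1.301.
threshold = -math.log10(0.05)
significant = [sid for sid, (_, y) in points.items() if y > threshold]
print(significant)
```

Plotting log2 ratios puts a halving and a doubling of intensity at equal distances from zero, which is why the ratio is transformed before it is drawn.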
Fig 2. Two-dimensional gel electrophoresis.
Depleted serum from 32 sea lions was separated using 2-dimensional gel electrophoresis. Shown is a representative gel with 49 spots of interest that were used for downstream analysis.
Gel spot protein identifications and statistics.
| Spot # | fold Δ | p-value | AuROC | UniProt ID | Protein Name |
|---|---|---|---|---|---|
| 3533 | -3.36 | 0.045 | 0.717 | Q7M2U7 | ApoE |
| 3486 | -3.19 | 0.002 | 0.842 | Q7M2U7 | ApoE |
| 3442 | -3.18 | 0.003 | 0.817 | Q7M2U7 | ApoE |
| 2134 | -3.15 | 0.001 | 0.863 | Q7M2U7 | ApoE |
| 3436 | -2.76 | 0.003 | 0.821 | Q7M2U7 | ApoE |
| 1409 | -2.69 | 0.002 | 0.829 | F1PDJ9 | Unknown |
| 2159 | -2.66 | 0.0001 | 0.917 | Q7M2U7 | ApoE |
| 3453 | -2.64 | 0.010 | 0.779 | Q7M2U7 | ApoE |
| 3454 | -2.63 | 0.005 | 0.804 | Q7M2U7 | ApoE |
| 3452 | -2.55 | 0.008 | 0.788 | Q7M2U7 | ApoE |
| 2451 | -2.42 | 0.005 | 0.800 | E2QYU2 | Clusterin (ApoJ) |
| 3446 | -2.20 | 0.006 | 0.796 | Q7M2U7 | ApoE |
| 2721 | -2.04 | 0.001 | 0.846 | G1LGY8 | Immunoglobulin J chain |
| 2583 | -2.00 | 0.015 | 0.763 | Q7M2U7 | ApoE |
| 3465 | -1.96 | 0.009 | 0.783 | D2HC79 | ApoA-IV |
| 3536 | -1.96 | 0.013 | 0.767 | G1LGY8 | Unknown |
| 3501 | -1.89 | 0.004 | 0.813 | E2QYB2 | Unknown |
| 3535 | -1.80 | 0.028 | 0.738 | D2HC79 | ApoA-IV |
| 3252 | -1.38 | 0.017 | 0.758 | D2HC77 | ApoA-1 |
| 2452 | -1.36 | 0.015 | 0.763 | G1KZV4 | Clusterin (ApoJ) |
| 3396 | 2.69 | 0.013 | 0.767 | F1MKC4 | Actin |
| 1044 | 1.93 | 0.017 | 0.758 | G1MKE1 | Carboxypeptidase N subunit 2 |
| 3395 | 1.93 | 0.017 | 0.758 | F1PQL8 | Actin, cytoplasmic 1 |
| 1464 | 1.86 | 0.017 | 0.758 | A2VE41 | EGF-containing fibulin-like extracellular matrix protein 1 |
| 3534 | 1.77 | 0.003 | 0.821 | G1MAY6 | Antithrombin-III |
| 1515 | 1.71 | 0.005 | 0.800 | Q2Y099 | Vitronectin |
| 3019 | 1.62 | 0.019 | 0.754 | G1L6A5 | Complement C4-A |
| 1686 | 1.60 | 0.004 | 0.808 | G1MJI1 | Fibrinogen gamma chain |
| 1651 | 1.60 | 0.013 | 0.767 | G1LHM6 | Vitamin D-binding protein |
| 3150 | 1.58 | 0.021 | 0.750 | F1PH71 | Hemoglobin subunit gamma |
| 1628 | 1.55 | 0.013 | 0.767 | F1P8G0 | Fibrinogen gamma chain |
| 2789 | 1.50 | 0.006 | 0.796 | G3X8D7 | Glutathione peroxidase |
| 1528 | 1.41 | 0.017 | 0.758 | G1LEJ5 | Albumin |
| 3173 | 1.36 | 0.041 | 0.721 | D2HC77 | ApoA-1 |
Of the 49 spots of interest, 34 could be identified. Fold change (Δ) is DAT/non-DAT, with p-values calculated using a Wilcoxon rank sum test. The area under the ROC curve (AuROC) is also given. The protein ID (UniProt ID) is of the most abundant protein within each spot. If an uncharacterized protein had no apparent homology to a known protein using Blastp, it is listed as ‘unknown’. Overall, 20 different proteins and their charge forms were identified. A complete version of this table exists in S1 Table.
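For readers less familiar with the AuROC column, the area under the ROC curve for a single marker equals the fraction of (DAT, non-DAT) sample pairs in which the DAT sample has the higher value, counting ties as one half. The sketch below computes it that way. The function names and example values are hypothetical; reporting the larger of the area and its complement for markers that decrease in DAT is an assumption made here, suggested only by the fact that every listed AuROC is above 0.5.

```python
def auroc(pos, neg):
    """Area under the ROC curve via pairwise comparison.

    Fraction of (pos, neg) pairs where the positive sample is larger,
    with ties counted as 0.5.
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def direction_free_auroc(pos, neg):
    """AuROC regardless of whether the marker rises or falls in positives.

    Assumption: a marker that is lower in DAT is scored as 1 - AuROC.
    """
    a = auroc(pos, neg)
    return max(a, 1.0 - a)

# Hypothetical spot intensities, invented for illustration only.
dat_intensity = [1.0, 3.0]
non_dat_intensity = [2.0, 4.0]
print(auroc(dat_intensity, non_dat_intensity))
print(direction_free_auroc(dat_intensity, non_dat_intensity))
```

The same pairwise count underlies the rank-sum statistic, which is why spots with small p-values in the table also tend to have a high AuROC.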
Statistical performance of models to discriminate DAT cases in an independent test set.
| Input Data | Model Type | Threshold | Sensitivity Test Set | Specificity Test Set | trAuROC | trSens | trSpec |
|---|---|---|---|---|---|---|---|
| Spot 2159 | ROC curve | minMC | 100% | 38.5% | 0.917 | 91.7% | 85.0% |
| Spot 3486 | ROC curve | minMC | 85.7% | 69.2% | 0.842 | 75.0% | 90.0% |
| Spot 3486 | ROC curve | opT | 100% | 61.5% | 0.842 | 83.3% | 80.0% |
| Eosinophil Count | ROC curve | minMC | 57.1% | 92.3% | 0.759 | 72.7% | 76.5% |
| 49 spots | CANN101 | minMC | 100% | 23.1% | 1.000 | 100% | 100% |
| 2159 + 3486 | CANN101 | minMC | 100% | 61.5% | 1.000 | 100% | 100% |
| 2159 + Eos | CANN101 | minMC | 85.7% | 84.6% | 1.000 | 100% | 100% |
| 3486 + Eos | CANN101 | minMC | 85.7% | 84.6% | 1.000 | 100% | 100% |
| 2159 + 3486 + Eos | CANN101 | minMC | 85.7% | 76.9% | 1.000 | 100% | 100% |
| 2159 + 3486 + Eos | CT | NA | 85.7% | 84.6% | NA | 90.9% | 100% |
| 2159 + 3486 + Eos | k-NN | NA | 85.7% | 69.2% | NA | 72.7% | 89.5% |
| 2159 + 3486 + Eos | RF | NA | 85.7% | 69.2% | NA | 100% | 100% |
| **2159 + 3486 + Eos** | **SVM** | | **85.7%** | **92.3%** | | | |
An AuROC for the test set was not determined since this set was only used to qualify models derived from the training set. The top performing model, SVM using spots 2159, 3486 and eosinophil counts, is bolded. NA, Not Applicable. Abbreviations: ROC curve, receiver operator characteristic curve; CANN, combined artificial neural networks; CT, classification tree; k-NN, k-nearest neighbor; RF, random forest; SVM, support vector machine; minMC, minimum mis-classified; opT, optimum threshold; AuROC, area under the ROC curve; tr prefix, values are from training set.
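The sensitivity, specificity, and predictive values quoted for the SVM in the abstract can be checked with a small confusion-matrix calculation. The counts below are not stated in the text; they are inferred from the test set composition (7 DAT, 13 non-DAT) and the reported 85.7% sensitivity and 92.3% specificity, which correspond to 6 of 7 and 12 of 13. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # DAT cases correctly flagged
        "specificity": tn / (tn + fp),  # non-DAT cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged cases that are truly DAT
        "npv": tn / (tn + fn),          # cleared cases that are truly non-DAT
    }

# Inferred counts (an assumption): 6 of 7 DAT and 12 of 13 non-DAT classified correctly.
svm_test = diagnostic_metrics(tp=6, fn=1, tn=12, fp=1)
for name, value in svm_test.items():
    print(f"{name}: {100 * value:.1f}%")
```

Under these inferred counts the positive predictive value matches the sensitivity and the negative predictive value matches the specificity because the model makes exactly one false positive and one false negative.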
Test set characteristics table.
| | Total | DAT | Non-DAT |
|---|---|---|---|
| | 20 | 7 | 13 |
| male | 10 | 3 (43%) | 7 (54%) |
| female | 10 | 4 (57%) | 6 (46%) |
| | | | |
| | 0 | 0 (0%) | 0 (0%) |
| | 2 | 0 (0%) | 2 (15%) |
| | 4 | 1 (14%) | 3 (23%) |
| | 6 | 2 (29%) | 4 (31%) |
| | 8 | 4 (57%) | 4 (31%) |
| | 9 | 7 (100%) | 2 (15%) |
| | | | |
| | - | 3 | |
| | - | 3 | |
| | - | 0 | |
| | - | 4 | |
| | - | 3 | |
Descriptive data of sea lions in the test set.
Hematological parameters of the test set.
| Parameter | DAT, Avg ± S.D. (n) | non-DAT, Avg ± S.D. (n) | p-value |
|---|---|---|---|
| WBC (10³/mm³) | 13.0 ± 4.5 (n = 7) | 21.0 ± 8.9 (n = 13) | 0.014* |
| RBC (10⁶/mm³) | 4.3 ± 0.5 (n = 7) | 4.2 ± 0.8 (n = 11) | 1.000 |
| HGB (g/dL) | 15.0 ± 2.1 (n = 7) | 14.3 ± 3.2 (n = 11) | 0.412 |
| HCT (%) | 44.0 ± 4.7 (n = 7) | 41.6 ± 9.3 (n = 13) | 0.428 |
| PLT (10³/mm³) | 475.3 ± 182.8 (n = 7) | 408.0 ± 164.7 (n = 11) | 0.860 |
| MCV (fL) | 103.3 ± 2.0 (n = 7) | 99.9 ± 5.0 (n = 11) | 0.079 |
| MCH (pg) | 35.0 ± 2.1 (n = 7) | 33.7 ± 1.5 (n = 11) | 0.099 |
| MCHC (g/dL) | 33.8 ± 2.0 (n = 7) | 33.7 ± 0.8 (n = 11) | 0.218 |
| RDW (%) | 15.7 ± 1.1 (n = 7) | 15.8 ± 0.6 (n = 11) | 0.740 |
| MPV (fL) | 7.7 ± 0.7 (n = 7) | 8.0 ± 0.7 (n = 11) | 0.441 |
| Seg (#/mm³) | 9564 ± 4165 (n = 7) | 16769 ± 8229 (n = 13) | 0.032* |
| Band (#/mm³) | 192 ± 353 (n = 7) | 1118 ± 1637 (n = 13) | 0.089 |
| Lymph (#/mm³) | 2027 ± 1118 (n = 7) | 2381 ± 1075 (n = 13) | 0.219 |
| Mono (#/mm³) | 176 ± 205 (n = 7) | 548 ± 559 (n = 13) | 0.077 |
| Eos (#/mm³) | 996 ± 637 (n = 7) | 153 ± 260 (n = 13) | 0.005* |
Blood data is based on analysis of blood drawn the same day as samples used for analysis. Complete blood count analyses of the 20 samples used in the test set are listed. Some parameters were not measured for all individuals and are indicated by ‘n’. Hematological parameters were compared between chronic domoic acid toxicosis (DAT) and non-DAT groups using a Wilcoxon rank-sum test with normal approximation, and significance (p<0.05) is highlighted by an ‘*’. Abbreviations: WBC, white blood cell count; RBC, red blood cell count; HGB, hemoglobin; HCT, hematocrit; PLT, platelet; MCV, mean corpuscular volume; MCH, mean corpuscular hemoglobin; MCHC, mean corpuscular hemoglobin concentration; RDW, red cell distribution width; MPV, mean platelet volume; Seg, segmented neutrophils; Band, band cells; Lymph, lymphocytes; Mono, monocytes; Eos, eosinophils.