Carlos Fernando Odir Rodrigues Melo1, Luiz Claudio Navarro2, Diogo Noin de Oliveira1, Tatiane Melina Guerreiro1, Estela de Oliveira Lima1, Jeany Delafiori1, Mohamed Ziad Dabaja1, Marta da Silva Ribeiro1, Maico de Menezes1, Rafael Gustavo Martins Rodrigues1, Karen Noda Morishita1, Cibele Zanardi Esteves1, Aline Lopes Lucas de Amorim1, Caroline Tiemi Aoyagui1, Pierina Lorencini Parise3, Guilherme Paier Milanez3, Gabriela Mansano do Nascimento3, André Ricardo Ribas Freitas4,5, Rodrigo Angerami6, Fábio Trindade Maranhão Costa3, Clarice Weis Arns3, Mariangela Ribeiro Resende6, Eliana Amaral7, Renato Passini Junior7, Carolina C Ribeiro-do-Valle7, Helaine Milanez7, Maria Luiza Moretti6, Jose Luiz Proenca-Modena3, Sandra Avila2, Anderson Rocha2, Rodrigo Ramos Catharino1.
Abstract
Recent Zika outbreaks in South America, accompanied by unexpectedly severe clinical complications, have sparked much interest in fast and reliable screening methods for Zika virus (ZIKV) identification. Reverse-transcriptase polymerase chain reaction (RT-PCR) is currently the method of choice for detecting ZIKV in biological samples. This approach, however, demands considerable time and resources, such as kits and reagents, which in endemic areas may impose a substantial financial burden on affected individuals and health services, steering them away from RT-PCR analysis. This study presents a combination of high-resolution mass spectrometry and a machine-learning prediction model for data analysis to assess the presence of ZIKV infection across a series of patients who show similar symptoms but are not necessarily infected with the virus. By feeding mass spectrometric data into the developed decision-making algorithm, we were able to provide a set of features that works as a "fingerprint" for this specific pathophysiological condition, even after the acute phase of infection. Since mass spectrometry and machine learning are both well-established and widely used tools within their respective fields, this combination of methods emerges as a distinct alternative for clinical applications, providing diagnostic screening that is faster and more accurate, with improved cost-effectiveness compared to existing technologies.
Keywords: Zika diagnosis; Zika virus; diagnosis classifier; diseases diagnosis; feature importance; high resolution mass spectrometry; machine learning; random forest
Year: 2018 PMID: 29696139 PMCID: PMC5904215 DOI: 10.3389/fbioe.2018.00031
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
Summary of the specimens included in the study regarding demographic information, clinical conditions, and results from reverse-transcriptase polymerase chain reaction (RT-PCR) performed during the high viremia period.
| | Zika virus (ZIKV) symptomatic and currently infected | ZIKV 1 month after infection | Symptomatic, but not ZIKV | Symptomatic dengue RT-PCR+ | Healthy, asymptomatic more than 30 days |
|---|---|---|---|---|---|
| RT-PCR | + | + | − | − | − |
| Positive/Negative | Positive | Positive | Negative | Negative | Negative |
| Demographics | | | | | |
| Male | 27 | 23 | 48 | 25 | 6 |
| Female | 16 | 16 | 16 | 21 | 5 |
| Total of specimens | 43 | 39 | 64 | 46 | 11 |
| Mean age (median) | 33.23 (33) | 32.85 (32.2) | 32.53 (31) | 33.21 (33) | 32.76 (30) |
Figure 1. Number of trees given by grid search as a function of vector length. Cross marks inside the chart denote values evaluated during the grid search. Lines 1, 2, and 3 correspond to the functions given in Table 2, which are used to compute the number of trees during the evaluation of discriminant feature reduction.
Comparison of the most discriminant 10-round training and validation results using the three selected equations for the number of trees in each iteration as a function of the ranked vector length.
| Number of trees equation (ν = vector length) | Max[40,sqrt(ν)] | | 230 | | 32 + [log2(ν)/2.sqrt(ν)] | |
|---|---|---|---|---|---|---|
| | Grid chart line 1 | | Grid chart line 2 | | Grid chart line 3 | |
| | μ | σ | μ | σ | μ | σ |
| Best vector length | 42 | | 59 | | 93 | |
| Accuracy | 96.54% | 3.58% | 96.03% | 2.61% | 96.12% | 2.00% |
| Sensitivity | 97.74% | 3.66% | 97.74% | 3.66% | 96.99% | 3.71% |
| Specificity | 95.34% | 5.23% | 94.31% | 5.81% | 95.26% | 3.79% |
| Precision | 93.99% | 6.29% | 92.82% | 6.46% | 93.66% | 4.61% |
| NPV | 98.46% | 2.50% | 98.55% | 2.34% | 98.02% | 2.31% |
| F1Score | 95.74% | 4.23% | 95.03% | 3.17% | 95.18% | 2.42% |
| F1Neg | 96.82% | 3.38% | 96.26% | 2.78% | 96.55% | 1.78% |
| Green | Metric’s best value | | | | | |
| Rose | Metric’s worst value | | | | | |
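
The three number-of-trees functions in the table header can be written out as a small sketch. Several things here are assumptions rather than the authors' implementation: the function names are hypothetical, line 2 is taken as the constant 230 exactly as listed, and the grouping in the line 3 expression `32 + [log2(ν)/2.sqrt(ν)]` is ambiguous in the text, so both readings are exposed through a hypothetical `group_denominator` flag. The text does not state how values are rounded to a whole number of trees, so the sketch returns them unrounded.

```python
import math


def trees_line1(v: int) -> float:
    """Grid chart line 1: Max[40, sqrt(v)]."""
    return max(40, math.sqrt(v))


def trees_line2(v: int) -> float:
    """Grid chart line 2: the constant 230, as listed in the table."""
    return 230


def trees_line3(v: int, group_denominator: bool = False) -> float:
    """Grid chart line 3: 32 + [log2(v)/2.sqrt(v)].

    The source does not make the grouping explicit (assumption):
      group_denominator=False -> 32 + (log2(v) / 2) * sqrt(v)
      group_denominator=True  -> 32 + log2(v) / (2 * sqrt(v))
    """
    if group_denominator:
        return 32 + math.log2(v) / (2 * math.sqrt(v))
    return 32 + (math.log2(v) / 2) * math.sqrt(v)


if __name__ == "__main__":
    # Evaluate each function at the best vector lengths reported in the table.
    for v in (42, 59, 93):
        print(v, trees_line1(v), trees_line2(v),
              round(trees_line3(v), 2),
              round(trees_line3(v, group_denominator=True), 2))
```

For short vectors, line 1 stays at its floor of 40 trees, since sqrt(ν) only exceeds 40 once ν is larger than 1,600.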
Figure 2. (A) Iterative process to determine the most discriminant ranked features. (B) Visualization of vectors with spectral signature features (length 42) using the t-SNE technique. Vectors corresponding to positive Zika virus-infected patients are separated into two categories: acute phase and 1 month after infection.
Figure 3. (A) SD range of the ranked features, in log scale, for Zika virus (ZIKV) positive and control group (negative) vectors. The green highlight identifies the marker features for ZIKV, selected using the rationale of Δj > 40%. (B) Example of probability distribution and cumulative distribution charts for the main ranked feature for ZIKV, ion m/z 1,295.6 (Ganglioside); the rationale for the Δj calculation is given on the right chart.
Zika virus (ZIKV) diagnosis classifier’s test results.
| Metric | 10-round validation tests (mean) | 10-round validation tests (σ) | Blind final test |
|---|---|---|---|
| Feature vector length | 42 | | 42 |
| Real positives | | | 15 |
| Real negatives | | | 24 |
| Predicted positives | | | 15 |
| Predicted negatives | | | 24 |
| True negatives | | | 23 |
| False positives | | | 1 |
| False negatives | | | 1 |
| True positives | | | 14 |
| Accuracy | 96.54% | 3.58% | 94.49% |
| Sensitivity | 97.74% | 3.66% | 93.33% |
| Specificity | 95.34% | 5.23% | 95.65% |
| Precision | 93.99% | 6.29% | 93.33% |
| Negative predictive value | 98.46% | 2.50% | 95.65% |
| F1Score | 95.74% | 4.23% | 93.33% |
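
The metrics in this table can be recomputed from the blind-test confusion counts using the standard definitions, as in the sketch below. The function name is hypothetical, and the formulas are the textbook ones rather than a copy of the authors' own. One assumption is flagged explicitly: accuracy is computed as the mean of sensitivity and specificity, because the reported accuracy values coincide with that mean (for example, (93.33% + 95.65%) / 2 = 94.49%). The counts as listed give 23/24 ≈ 95.83% for specificity and negative predictive value, slightly different from the 95.65% shown in the table; the sketch does not try to resolve that difference.

```python
def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Assumption: accuracy taken as the mean of sensitivity and specificity,
    # which matches the accuracy figures reported in the tables.
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "balanced_accuracy": balanced_accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
        "f1": f1,
    }


if __name__ == "__main__":
    # Blind final test counts from the table above.
    metrics = confusion_metrics(tp=14, tn=23, fp=1, fn=1)
    for name, value in metrics.items():
        print(f"{name}: {value:.2%}")
```

Sensitivity, precision, and F1 all come out at 14/15 ≈ 93.33%, in agreement with the blind final test column.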
Comparison of 10-round training and validation results between classifiers using the same datasets for the full-length input vectors and for the signature features selected by the reduction method proposed in the article.
| Input vectors | Metric | SVM: sequential minimal optimization | | SVM: iterative single data algorithm | | Tree: random forest | | Tree: Gini’s diversity index | | Tree: deviance | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | | μ | σ | μ | σ | μ | σ | μ | σ | μ | σ |
| Full-length | Accuracy | 90.16% | 5.96% | 90.84% | 6.28% | 94.19% | 3.59% | 89.62% | 5.60% | 90.07% | 5.91% |
| Full-length | Sensitivity | 87.88% | 11.58% | 89.25% | 9.62% | 94.06% | 4.81% | 87.41% | 9.37% | 88.83% | 8.10% |
| Full-length | Specificity | 92.44% | 7.58% | 92.44% | 7.58% | 94.31% | 5.25% | 91.83% | 5.47% | 91.31% | 4.23% |
| Full-length | Precision | 89.74% | 10.05% | 89.58% | 10.34% | 92.45% | 6.29% | 88.32% | 7.41% | 87.42% | 6.22% |
| Full-length | NPV | 92.54% | 6.78% | 93.24% | 6.05% | 95.93% | 3.13% | 91.59% | 5.89% | 92.31% | 5.42% |
| Full-length | F1Score | 88.08% | 7.50% | 89.00% | 7.73% | 93.11% | 4.21% | 87.54% | 6.49% | 88.08% | 6.91% |
| Full-length | F1Neg | 92.17% | 4.59% | 92.63% | 5.14% | 95.03% | 3.27% | 91.59% | 4.41% | 91.79% | 4.63% |
| Signature features | Accuracy | 93.13% | 2.80% | 93.42% | 4.05% | 96.54% | 3.58% | 91.22% | 3.54% | 91.24% | 4.60% |
| Signature features | Sensitivity | 93.93% | 5.10% | 92.45% | 5.20% | 97.74% | 3.66% | 89.60% | 3.62% | 89.60% | 6.28% |
| Signature features | Specificity | 92.34% | 5.02% | 94.39% | 5.66% | 95.34% | 5.23% | 92.84% | 5.01% | 92.89% | 4.86% |
| Signature features | Precision | 89.91% | 6.34% | 92.39% | 7.47% | 93.99% | 6.29% | 89.87% | 6.56% | 89.81% | 7.14% |
| Signature features | NPV | 95.89% | 3.07% | 94.93% | 3.40% | 98.46% | 2.50% | 92.83% | 2.68% | 92.92% | 4.25% |
| Signature features | F1Score | 91.65% | 3.26% | 92.24% | 4.84% | 95.74% | 4.23% | 89.64% | 4.31% | 89.57% | 5.60% |
| Signature features | F1Neg | 93.98% | 2.62% | 94.58% | 3.66% | 96.82% | 3.38% | 92.78% | 3.32% | 92.84% | 3.74% |
| Green | Metric’s best value | | | | | | | | | | |
| Rose | Metric’s worst value | | | | | | | | | | |
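
To make the comparison in this table concrete, the sketch below computes the change in mean accuracy per classifier when moving from the full-length vectors to the signature features. It assumes that the first block of metric rows corresponds to the full-length input vectors and the second block to the signature features; this reading is inferred from the random forest values in the second block matching the length-42 results reported earlier. The variable and function names are hypothetical, and the numbers are the mean accuracy values (μ) copied from the table.

```python
# Classifier labels, in the column order of the table.
CLASSIFIERS = [
    "SVM (sequential minimal optimization)",
    "SVM (iterative single data algorithm)",
    "Random forest",
    "Tree (Gini's diversity index)",
    "Tree (deviance)",
]

# Mean accuracy (%) from the table; block assignment is an assumption.
ACCURACY_FULL_LENGTH = [90.16, 90.84, 94.19, 89.62, 90.07]
ACCURACY_SIGNATURE = [93.13, 93.42, 96.54, 91.22, 91.24]


def accuracy_gains(full, reduced):
    """Percentage-point change in accuracy after feature reduction."""
    return [round(r - f, 2) for f, r in zip(full, reduced)]


gains = dict(zip(CLASSIFIERS, accuracy_gains(ACCURACY_FULL_LENGTH,
                                             ACCURACY_SIGNATURE)))
best_classifier = max(zip(ACCURACY_SIGNATURE, CLASSIFIERS))[1]

if __name__ == "__main__":
    for name, gain in gains.items():
        print(f"{name}: {gain:+.2f} percentage points")
    print("Highest accuracy on signature features:", best_classifier)
```

Under that reading, every classifier gains accuracy with the reduced feature set, and random forest has the highest mean accuracy in both blocks.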