Susana I C J Palma, Ana P Traguedo, Ana R Porteira, Maria J Frias, Hugo Gamboa, Ana C A Roque.
Abstract
Non-invasive and fast diagnostic tools based on volatolomics hold great promise in the control of infectious diseases. However, the tools to identify microbial volatile organic compounds (VOCs) that discriminate between human pathogens are still missing. Artificial intelligence is increasingly recognised as an essential tool in health sciences. Machine learning algorithms based on support vector machines and feature selection tools were applied here to find sets of microbial VOCs with pathogen-discrimination power. Studies reporting VOCs emitted by human microbial pathogens published between 1977 and 2016 were used as source data. A set of 18 VOCs is sufficient to predict the identity of 11 microbial pathogens with high accuracy (77%) and precision (62-100%). There is one set of VOCs associated with each of the 11 pathogens which can predict the presence of that pathogen in a sample with high accuracy and precision (86-90%). The implemented pathogen classification methodology supports future database updates to include new pathogen-VOC data, which will enrich the classifiers. The sets of VOCs identified potentiate the improvement of the selectivity of non-invasive infection diagnostics using artificial olfaction devices.
Year: 2018 PMID: 29463885 PMCID: PMC5820279 DOI: 10.1038/s41598-018-21544-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Research strategy. The workflow was divided into four main tasks: data collection, input data, machine learning and output data. The selected data available in the literature were organized in a matrix of labels (pathogens) and features (VOCs), which was then used as the input for the machine learning steps. Feature selection and classification algorithms were implemented using support vector machines (SVM) to determine the set of VOCs that best separates the pathogens, and to build a model that predicts the pathogen based on information about the presence/absence of a set of VOCs in a sample.
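The workflow in Figure 1 can be illustrated with a minimal sketch. It is not the authors' implementation: to stay dependency-free it swaps the SVM for a 1-nearest-neighbour classifier on Hamming distance, and the records, pathogen labels and VOC names (`pathogen_A`, `voc_1`, and so on) are hypothetical toy data. It shows the three ingredients named in the caption: a presence/absence matrix, leave-one-out evaluation, and greedy selection of the VOCs that best separate the labels.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical toy records: (pathogen label, VOCs reported in one experiment).
EXPERIMENTS: List[Tuple[str, Set[str]]] = [
    ("pathogen_A", {"voc_1", "voc_3"}),
    ("pathogen_A", {"voc_1"}),
    ("pathogen_A", {"voc_1", "voc_4"}),
    ("pathogen_B", {"voc_2", "voc_3"}),
    ("pathogen_B", {"voc_2"}),
    ("pathogen_B", {"voc_2", "voc_4"}),
]


def build_matrix(experiments: Sequence[Tuple[str, Set[str]]],
                 vocs: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Return labels and a binary presence/absence matrix (rows = experiments)."""
    labels = [label for label, _ in experiments]
    rows = [[1 if v in emitted else 0 for v in vocs] for _, emitted in experiments]
    return labels, rows


def hamming(a: Sequence[int], b: Sequence[int], cols: Sequence[int]) -> int:
    """Number of selected columns in which two rows differ."""
    return sum(a[c] != b[c] for c in cols)


def loo_accuracy(labels: Sequence[str], rows: Sequence[Sequence[int]],
                 cols: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-NN stand-in classifier (not an SVM).

    Distance ties are broken in favour of the lowest row index.
    """
    correct = 0
    for i in range(len(rows)):
        others = [j for j in range(len(rows)) if j != i]
        nearest = min(others, key=lambda j: hamming(rows[i], rows[j], cols))
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


def forward_select(labels: Sequence[str], rows: Sequence[Sequence[int]],
                   max_features: int) -> Tuple[List[int], List[float]]:
    """Greedily add the column that gives the best leave-one-out accuracy."""
    n_features = len(rows[0])
    selected: List[int] = []
    history: List[float] = []
    while len(selected) < min(max_features, n_features):
        candidates = [c for c in range(n_features) if c not in selected]
        best = max(candidates, key=lambda c: loo_accuracy(labels, rows, selected + [c]))
        selected.append(best)
        history.append(loo_accuracy(labels, rows, selected))
    return selected, history


VOCS = sorted(set().union(*(emitted for _, emitted in EXPERIMENTS)))
LABELS, ROWS = build_matrix(EXPERIMENTS, VOCS)
SELECTED, HISTORY = forward_select(LABELS, ROWS, max_features=2)
print([VOCS[c] for c in SELECTED], HISTORY)
```

In this toy set a single VOC already separates the two labels, so the second selected feature adds nothing; on real data the accuracy history is what reveals how many VOCs are worth keeping.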
Figure 2. Classes and diversity of VOCs emitted by microbial pathogens. (a) Number of different VOCs from each chemical class. (b) Relative abundance of VOC chemical classes, given by the ratio between the number of hits of VOCs in a chemical class and the number of hits of VOCs in all classes. (c) Graphical representation of pathogen-VOC associations described in the literature (Cytoscape 3.5). Each line represents one hit for a given pathogen-VOC association; the diameters of the circles that represent each chemical class are proportional to the number of different VOCs within that class. (d) Number of hits and chemical class of the 10 most frequently reported VOCs, also highlighted in (a).
Figure 3. Graphical representation of the associations between the 11 pathogens with more than 4 experiments and the 702 VOCs identified across a total of 336 experiments. Each line represents one hit for a given pathogen-VOC association; VOC nodes inside the circle represent VOCs that were reported to be emitted by multiple pathogens, while VOC nodes outside the circle represent VOCs that were reported to be emitted by only one pathogen. The most frequently reported exclusive VOCs per pathogen are indicated outside the graph.
List of VOCs that lead to the best classification results in the identification mode of the classifier, considering the 11 pathogen – 702 VOC dataset and using “leave-one-out” cross-validation. The VOCs are listed in descending order of importance for the performance of the classifier. Classification accuracy improves gradually as the VOCs in the list are added, one at a time, to the vector of features that is used to classify the samples.
| Number of VOCs used | VOC | Added identification accuracy (%) | Cumulative identification accuracy (%) |
|---|---|---|---|
| 1 | 1-decanol | 49.7 | 49.7 |
| 2 | 3-methylbutanal | 3.3 | 53.0 |
| 3 | ethyl acetate | 3.0 | 56.0 |
| 4 | 1,3,5-trimethylbenzene | 1.7 | 57.7 |
| 5 | 3-methylbutanoic acid | 1.8 | 59.5 |
| 6 | indole | 2.4 | 61.9 |
| 7 | isopentanol | 3.0 | 64.9 |
| 8 | 1-undecene | 2.7 | 67.6 |
| 9 | 2-methylbutanal | 2.0 | 69.6 |
| 10 | ɣ-butyrolactone | 1.5 | 71.1 |
| 11 | 4-methylphenol | 0.9 | 72.0 |
| 12 | furan | 1.2 | 73.2 |
| 13 | cymol | 0.9 | 74.1 |
| 14 | methyl nicotinate | 0.9 | 75.0 |
| 15 | cyclohexanone | 0.6 | 75.6 |
| 16 | 4-methylpentanoic acid | 0.2 | 75.8 |
| 17 | n-butyl acetate | 0.4 | 76.2 |
| 18 | 1-butanethiol | 0.3 | 76.5 |
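The added and cumulative accuracies listed above are related by a running sum, which a short check makes explicit. The increments are copied from the table; the helper `vocs_needed` is a hypothetical name introduced only for this illustration, and it shows how one might read off the smallest VOC set that reaches a chosen accuracy.

```python
from itertools import accumulate
from typing import List, Optional

# Added identification accuracy (%) per VOC, in the order given in the table.
ADDED: List[float] = [49.7, 3.3, 3.0, 1.7, 1.8, 2.4, 3.0, 2.7, 2.0,
                      1.5, 0.9, 1.2, 0.9, 0.9, 0.6, 0.2, 0.4, 0.3]

# Running sum, rounded to one decimal to match the table's precision.
CUMULATIVE: List[float] = [round(total, 1) for total in accumulate(ADDED)]


def vocs_needed(target: float) -> Optional[int]:
    """Smallest number of VOCs whose cumulative accuracy reaches `target` (%)."""
    for count, value in enumerate(CUMULATIVE, start=1):
        if value >= target:
            return count
    return None  # target is never reached with the listed VOCs


print(CUMULATIVE[-1])     # cumulative accuracy with all 18 VOCs
print(vocs_needed(75.0))  # VOCs needed to reach 75 %
```

The first VOC alone accounts for about two thirds of the final accuracy, and the last four together add less than two percentage points.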
Confusion matrix illustrating the prediction results of the classifier in the identification mode for the 11 pathogen – 702 VOC dataset. Underlined bold cells represent the incorrect predictions made by the classifier and bold cells represent the correct predictions.
[Confusion matrix, 11 × 11, columns: predicted pathogen. The pathogen labels and all non-zero cell values were lost in extraction; only zero-valued cells survive.]
Prediction results of the classifier in the verification mode, for the 11 pathogen – 702 VOC dataset and using “leave-one-out” cross-validation.
| Pathogen | No. of exps. | Verification VOCs set | Base accuracy (%) | Computed accuracy (%) | Computed sensitivity (%) | Computed precision (%) |
|---|---|---|---|---|---|---|
| | 8 | (E)-3-nonen-2-one | 97.6 | 100 | 100.0 | 100.0 |
| | 6 | methyl 4-methylpentanoate | 98.2 | 100 | 100.0 | 100.0 |
| | 84 | indole | 75.0 | 90.1 | 66.7 | 94.9 |
| | 7 | ɣ-butyrolactone | 97.9 | 99.4 | 85.7 | 85.7 |
| | 33 | 2,2,4,4-tetramethyloxolane | 90.2 | 91.7 | 15.2 | 100.0 |
| | 9 | 4-methyldodecane | 97.3 | 100 | 100.0 | 100.0 |
| | 14 | (1,1-dimethylethoxy)methylbenzene | 95.8 | 95.8 | 0.0 | (b) |
| | 118 | hydrogen cyanide | 64.9 | 92.9 | 87.3 | 91.9 |
| | 42 | ethyl 2-methylbutyrate | 87.5 | 90.5 | 26.1 | 91.7 |
| | 5 | (1,1-dimethylethoxy)methylbenzene | 98.5 | 98.5 | 0.0 | (b) |
| | 10 | 3-phenylfuran | 97.0 | 98.5 | 50.0 | 100.0 |
(a) By inspection of the publication where it was reported[19], this compound is most likely a contaminant that was misidentified as a bacterial VOC. (b) Only the “not the pathogen” class examples were correctly classified, therefore the precision towards the “pathogen” class is not determinable.
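The verification-mode measures can be reproduced with a short sketch. Two points are assumptions rather than statements from the source: the base accuracy is read here as the accuracy of always answering “not the pathogen”, an interpretation that matches every row of the table when the 336 experiments from Figure 3 are used as the total; and the confusion counts in the usage lines are hypothetical values chosen to be consistent with the 14-experiment and 33-experiment rows. The function returns `None` for precision when no sample is predicted as the pathogen, which is the situation described in footnote (b).

```python
from typing import Dict, Optional

TOTAL_EXPERIMENTS = 336  # total number of experiments, from the Figure 3 caption


def base_accuracy(n_pathogen: int, total: int = TOTAL_EXPERIMENTS) -> float:
    """Accuracy (%) of always predicting 'not the pathogen' (assumed definition)."""
    return 100.0 * (total - n_pathogen) / total


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, Optional[float]]:
    """Accuracy, sensitivity and precision (%) from binary confusion counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        # Sensitivity is undefined when there are no true 'pathogen' samples.
        "sensitivity": 100.0 * tp / (tp + fn) if (tp + fn) else None,
        # Precision is undefined when nothing is predicted as 'pathogen'.
        "precision": 100.0 * tp / (tp + fp) if (tp + fp) else None,
    }


# Hypothetical counts consistent with the 14-experiment row: no positives predicted.
print(binary_metrics(tp=0, fn=14, fp=0, tn=322))
# Hypothetical counts consistent with the 33-experiment row.
print(binary_metrics(tp=5, fn=28, fp=0, tn=303))
```

When the computed accuracy equals the base accuracy, as in the two rows marked (b), the selected VOC adds no information beyond the class imbalance, which is why sensitivity and precision are reported alongside accuracy.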
Methodology refinement examples, using subgroups of the complete dataset. The classifier was run in verification and identification modes, using “leave-one-out” cross-validation. The computed accuracy, sensitivity and precision are performance measurements of the model.
| Dataset name | Pathogen / group of pathogens | Verification: VOCs set | Verification: Acc. (%) | Verification: Sens. (%) | Verification: Prec. (%) | Identification: VOCs | Identification: Av acc. (%) | Identification: Av sens. (%) | Identification: Av prec. (%) |
|---|---|---|---|---|---|---|---|---|---|
| Faeces | | methyl 4-methylpentanoate | 100 | 100 | 100 | isopentanol | 92.4 | 92.8 | 95.1 |
| Faeces | | isopentanol | 91.6 | 95.2 | 91.9 | | | | |
| Faeces | | isopentanol | 92.4 | 80.9 | 91.9 | | | | |
| | GI | 1-dodecanol | 88.4 | 100 | 77.7 | 1-dodecanol | 86.9 | 87.0 | 87.9 |
| | UTI | 1-decanol | 91.3 | 77.2 | 94.4 | | | | |
| | Others | 1-octanol (dimer) | 82.6 | 57.8 | 73.3 | | | | |
| Clinical samples and clinical isolates | | 1-decanol | 87.1 | 48.0 | 100.0 | 1-decanol | 86.1 | 79.4 | 85.7 |
| Clinical samples and clinical isolates | | ɣ-butyrolactone | 99.0 | 83.3 | 100.0 | | | | |
| Clinical samples and clinical isolates | | 1,3,5-trimethylbenzene | 100.0 | 100.0 | 100.0 | | | | |
| Clinical samples and clinical isolates | | trimethylamine | 98.0 | 81.8 | 100.0 | | | | |
| Clinical samples and clinical isolates | | 2-aminoacetophenone | 96.0 | 95.1 | 95.1 | | | | |
| Clinical samples and clinical isolates | | 3-methyl-4-(1-methylethenyl)cyclohexane | 98.0 | 60.0 | 100.0 | | | | |
| Clinical samples and clinical isolates | | 2-butenal | 98.0 | 71.4 | 100.0 | | | | |
| Breath and respiratory fluid clinical samples and clinical isolates | | ɣ-butyrolactone | 100.0 | 100.0 | 100.0 | 1,3-butadiene | 100.0 | 100.0 | 100.0 |
| Breath and respiratory fluid clinical samples and clinical isolates | | 2,3-butanedione | 100.0 | 100.0 | 100.0 | | | | |
| Breath and respiratory fluid clinical samples and clinical isolates | | 1,3-butadiene | 100.0 | 100.0 | 100.0 | | | | |
Acc: computed accuracy; Av acc: average computed accuracy; Sens: computed sensitivity; Av sens: average computed sensitivity; Prec: computed precision; Av prec: average computed precision.