Salome Horsch, Dominik Kopczynski, Elias Kuthe, Jörg Ingo Baumbach, Sven Rahmann, Jörg Rahnenführer.
Abstract
MOTIVATION: Disease classification from molecular measurements typically requires an analysis pipeline from raw noisy measurements to final classification results. Multi capillary column-ion mobility spectrometry (MCC-IMS) is a promising technology for the detection of volatile organic compounds in the air of exhaled breath. From raw measurements, the peak regions representing the compounds have to be identified, quantified, and clustered across different experiments. Currently, several steps of this analysis process require manual intervention of human experts. Our goal is to identify a fully automatic pipeline that yields competitive disease classification results compared to an established but subjective and tedious semi-manual process.
Year: 2017 PMID: 28910313 PMCID: PMC5598980 DOI: 10.1371/journal.pone.0184321
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Example of a raw measurement from the first dataset.
The rows of the heatmap represent the retention times and the columns represent 1/K0, a transformation of the drift time. The colors display the signal intensities with increasing values from white over blue and red to yellow.
Quartiles of performance for all datasets over all combinations of peak picking, clustering, statistical classification, and all replications of the cross-validation.
| 1st Qu. | Median | 3rd Qu. |
|---|---|---|
| 0.878 | 0.933 | 0.965 |
| 0.780 | 0.827 | 0.874 |
| 0.643 | 0.721 | 0.792 |
Ranks of median AUCs for each combination of peak picking, peak clustering, and classification algorithms, with the rank sum (RS) over the three datasets and the corresponding mean AUC.
| Peak | Cluster | Classif | Mean AUC | Rank | Rank | Rank | RS |
|---|---|---|---|---|---|---|---|
| SGLTR | DBSCAN | RF | 0.957 | 12 | 3 | 1 | 16 |
| SGLTR | EM | RF | 0.950 | 12 | 2 | 5 | 19 |
| SGLTR | CE | RF | 0.947 | 14 | 7 | 3 | 24 |
| VN | VN | RF | 0.936 | 6 | 19 | 2 | 27 |
| LM | DBSCAN | RF | 0.916 | 2 | 15 | 18 | 35 |
| VN | VN | RF | 0.925 | 3 | 9 | 23 | 35 |
| LM | EM | RF | 0.913 | 5 | 11 | 26 | 42 |
| VN | DBSCAN | RF | 0.919 | 8 | 31 | 4 | 43 |
| SGLTR | EM | GBM | 0.927 | 26 | 5 | 15 | 46 |
| LM | GS | RF | 0.914 | 4 | 4 | 44 | 52 |
| SGLTR | GS | RF | 0.907 | 17 | 12 | 32 | 61 |
| SGLTR | CE | GBM | 0.919 | 48 | 7 | 16 | 71 |
| VN | VN | SVMrbf | 0.897 | 27 | 27 | 20 | 74 |
| VN | EM | RF | 0.889 | 1 | 60 | 17 | 78 |
| VN | DBSCAN | GBM | 0.914 | 50 | 20 | 10 | 80 |
| OPME | DBSCAN | RF | 0.885 | 15 | 46 | 29 | 90 |
| VN | VN | GBM | 0.901 | 31 | 51 | 8 | 90 |
| LM | DBSCAN | GBM | 0.884 | 18 | 13 | 62 | 93 |
| SGLTR | DBSCAN | GBM | 0.921 | 82 | 6 | 11 | 99 |
| VN | CE | RF | 0.894 | 33 | 54 | 12 | 99 |
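
The rank sum in the table above can be checked with a few lines of arithmetic. The sketch below is a minimal illustration, not the authors' code: it assumes that the three integer columns before RS are the per-dataset ranks (an interpretation based on the caption), copies the first five rows of the table, and recomputes RS as the plain sum of those ranks. The names `rows` and `rank_sum` are hypothetical.

```python
# Rows copied from the table above:
# (peak picking, clustering, classifier, per-dataset ranks, RS as printed).
# Reading the three integers as per-dataset ranks is an assumption.
rows = [
    ("SGLTR", "DBSCAN", "RF", (12, 3, 1), 16),
    ("SGLTR", "EM", "RF", (12, 2, 5), 19),
    ("SGLTR", "CE", "RF", (14, 7, 3), 24),
    ("VN", "VN", "RF", (6, 19, 2), 27),
    ("LM", "DBSCAN", "RF", (2, 15, 18), 35),
]


def rank_sum(ranks):
    """Sum the per-dataset ranks; a lower total means a better overall pipeline."""
    return sum(ranks)


# Recompute RS for every row and collect rows whose printed RS disagrees.
computed = [rank_sum(ranks) for _, _, _, ranks, _ in rows]
mismatches = [
    (peak, cluster, classif)
    for (peak, cluster, classif, _, printed), value in zip(rows, computed)
    if value != printed
]

for (peak, cluster, classif, ranks, _), value in zip(rows, computed):
    print(f"{peak}/{cluster}/{classif}: ranks={ranks} -> RS={value}")
```

For these five rows the recomputed sums match the printed RS column, which is consistent with the table being sorted by ascending rank sum.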
Fig 2. Averaged raw measurements of the first dataset and the consensus peaks identified by the combination of SGLTR/DBSCAN (left) and the manual gold standard VN (right).
Performance for all classification algorithms over all peak picking and peak clustering algorithms and all replications for each dataset.
| SVMlin | SVMrbf | kNN | CT | GBM | RF |
|---|---|---|---|---|---|
| 0.937 | 0.935 | 0.889 | 0.863 | 0.939 | 0.977 |
| 0.808 | 0.806 | 0.789 | 0.819 | 0.874 | 0.890 |
| 0.727 | 0.735 | 0.664 | 0.685 | 0.742 | 0.808 |
Number of peaks detected by each peak picking method in the single measurements.
| | LM | PME | PDSA | SGLTR | OPME | VN | VN |
|---|---|---|---|---|---|---|---|
| Median | 13 | 14 | 15 | 42 | 31 | 30 | – |
| Min | 5 | 5 | 3 | 15 | 5 | 0 | – |
| Max | 34 | 54 | 47 | 115 | 69 | 62 | – |
| Median | 12 | 17 | 7 | 49 | 14 | 10 | – |
| Min | 5 | 6 | 0 | 15 | 3 | 0 | – |
| Max | 69 | 69 | 90 | 137 | 61 | 143 | – |
| Median | 29 | 32 | 33 | 45 | 29 | 56 | – |
| Min | 20 | 23 | 22 | 34 | 19 | 25 | – |
| Max | 38 | 53 | 46 | 84 | 41 | 93 | – |
Performance for all peak picking algorithms over all peak clustering and classification algorithms and all replications for each dataset.
| LM | PME | PDSA | SGLTR | OPME | VN | VN |
|---|---|---|---|---|---|---|
| 0.959 | 0.880 | 0.940 | 0.948 | 0.891 | 0.925 | 0.955 |
| 0.851 | 0.812 | 0.811 | 0.839 | 0.798 | 0.843 | 0.874 |
| 0.731 | 0.587 | 0.664 | 0.777 | 0.741 | 0.764 | 0.828 |
Numbers of consensus peaks detected by each combination of peak picking and peak clustering, summarized over all measurements in a dataset.
| Clustering | LM | PME | PDSA | SGLTR | OPME | VN | VN | |
|---|---|---|---|---|---|---|---|---|
| GS | 42 | 51 | 43 | 138 | 124 | 112 | – | |
| DBSCAN | 25 | 26 | 26 | 28 | 66 | 29 | – | |
| CE | 26 | 23 | 22 | 52 | 35 | 25 | – | |
| EM | 42 | 56 | 44 | 142 | 116 | 98 | – | |
| VN | – | – | – | – | – | 239 | – | |
| VN | – | – | – | – | – | – | 120 | |
| GS | 23 | 25 | 17 | 46 | 11 | 20 | – | |
| DBSCAN | 16 | 14 | 23 | 43 | 16 | 30 | – | |
| CE | 18 | 23 | 26 | 66 | 35 | 37 | – | |
| EM | 19 | 26 | 27 | 67 | 19 | 51 | – | |
| VN | – | – | – | – | – | 239 | – | |
| VN | – | – | – | – | – | – | 224 | |
| GS | 62 | 53 | 62 | 75 | 54 | 132 | – | |
| DBSCAN | 40 | 23 | 43 | 52 | 42 | 56 | – | |
| CE | 34 | 25 | 35 | 54 | 38 | 34 | – | |
| EM | 47 | 57 | 56 | 59 | 56 | 105 | – | |
| VN | – | – | – | – | – | 265 | – | |
| VN | – | – | – | – | – | – | 60 |
Median AUCs over all replications of each classification algorithm and rank sum of median AUCs for each classification algorithm.
| Picking | Clustering | AUC | RS | AUC | RS | AUC | RS |
|---|---|---|---|---|---|---|---|
| LM | GS | 0.957 | 2.67 | 0.868 | 2.50 | 0.662 | 3.67 |
| LM | DBSCAN | 0.970 | 1.50 | 0.825 | 2.50 | 0.736 | 2.50 |
| LM | CE | 0.926 | 4.00 | 0.808 | 3.33 | 0.742 | 2.17 |
| LM | EM | 0.970 | 1.83 | 0.873 | 1.67 | 0.749 | 1.67 |
| PME | GS | 0.868 | 2.67 | 0.836 | 2.00 | 0.637 | 1.50 |
| PME | DBSCAN | 0.945 | 1.00 | 0.825 | 2.83 | 0.568 | 3.00 |
| PME | CE | 0.802 | 3.50 | 0.781 | 3.33 | 0.561 | 2.83 |
| PME | EM | 0.875 | 2.83 | 0.833 | 1.83 | 0.568 | 2.67 |
| PDSA | GS | 0.928 | 2.83 | 0.802 | 3.17 | 0.586 | 3.33 |
| PDSA | DBSCAN | 0.965 | 1.83 | 0.803 | 3.00 | 0.729 | 2.17 |
| PDSA | CE | 0.879 | 4.00 | 0.817 | 2.00 | 0.644 | 2.83 |
| PDSA | EM | 0.977 | 1.33 | 0.817 | 1.83 | 0.720 | 1.67 |
| SGLTR | GS | 0.936 | 3.50 | 0.822 | 3.50 | 0.724 | 3.50 |
| SGLTR | DBSCAN | 0.963 | 2.25 | 0.827 | 2.83 | 0.857 | 1.17 |
| SGLTR | CE | 0.932 | 3.17 | 0.859 | 1.83 | 0.800 | 2.33 |
| SGLTR | EM | 0.971 | 1.08 | 0.840 | 1.83 | 0.777 | 3.00 |
| OPME | GS | 0.897 | 2.83 | 0.699 | 4.00 | 0.762 | 1.83 |
| OPME | DBSCAN | 0.943 | 1.17 | 0.812 | 2.33 | 0.716 | 3.00 |
| OPME | CE | 0.818 | 3.83 | 0.817 | 1.67 | 0.752 | 1.83 |
| OPME | EM | 0.904 | 2.17 | 0.829 | 2.00 | 0.713 | 3.33 |
| VN | GS | 0.904 | 4.50 | 0.793 | 4.50 | 0.697 | 4.33 |
| VN | DBSCAN | 0.924 | 2.83 | 0.876 | 1.83 | 0.829 | 2.00 |
| VN | CE | 0.897 | 3.67 | 0.852 | 3.00 | 0.786 | 3.17 |
| VN | EM | 0.947 | 2.67 | 0.815 | 3.83 | 0.753 | 2.83 |
| VN | VN | 0.962 | 1.33 | 0.892 | 1.83 | 0.774 | 2.67 |
| VN | VN | 0.955 | 1.00 | 0.874 | 1.00 | 0.828 | 1.00 |