Anne-Christin Hauschild¹, Till Schneider², Josch Pauling², Kathrin Rupp³, Mi Jang³, Jörg Ingo Baumbach³, Jan Baumbach⁴
Abstract
Ion mobility spectrometry combined with multi-capillary columns (MCC/IMS) is a well-known technology for detecting volatile organic compounds (VOCs). We may use MCC/IMS to scan human exhaled air, bacterial colonies or cell lines, for example, and thereby gain information about a person's health status or about infection threats. We may further study the metabolic response of living cells to external perturbations. The instrument is comparatively inexpensive, robust and easy to use in everyday practice. However, the potential of the MCC/IMS methodology depends on the successful application of computational approaches for analyzing the large volume of data these measurements produce. Here, we review the state of the art and highlight existing challenges. First, we address methods for raw data handling, data storage and visualization. Afterwards, we introduce de-noising, peak picking and other pre-processing approaches. We then discuss statistical methods for analyzing correlations between peaks and diseases or medical treatment. Finally, we survey up-to-date machine learning techniques for identifying robust biomarker molecules that allow patients to be classified into healthy and diseased groups. We conclude that MCC/IMS coupled with sophisticated computational methods has the potential to address a broad range of biomedical questions. While most of the data pre-processing steps can be solved satisfactorily, some computational challenges in statistical learning and model validation remain.
Year: 2012 PMID: 24957760 PMCID: PMC3901238 DOI: 10.3390/metabo2040733
Source DB: PubMed Journal: Metabolites ISSN: 2218-1989
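Peak picking, one of the pre-processing steps named in the abstract, locates intensity maxima in the two-dimensional MCC/IMS matrix. The following is a minimal sketch, assuming the measurement is stored as a list of rows (one row per retention time, one column per drift time) and that a peak is simply a strict local maximum above a fixed threshold. The function name `pick_peaks` and its `threshold` parameter are hypothetical, and the peak detection methods reviewed in the article are more elaborate than this.

```python
from typing import List, Tuple


def pick_peaks(matrix: List[List[float]], threshold: float) -> List[Tuple[int, int, float]]:
    """Hypothetical peak picker: return (row, col, intensity) for every cell
    that exceeds `threshold` and is strictly greater than all of its (up to 8)
    neighbors. Rows are assumed to index retention time, columns drift time."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    peaks = []
    for r in range(n_rows):
        for c in range(n_cols):
            value = matrix[r][c]
            if value <= threshold:
                continue
            # Collect the 3x3 neighborhood, clipped at the matrix borders.
            neighbors = [
                matrix[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            # A plateau of equal values is not counted as a peak.
            if all(value > v for v in neighbors):
                peaks.append((r, c, value))
    return peaks


toy = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 5.0, 0.2, 0.0],
    [0.0, 0.2, 0.1, 3.0],
    [0.0, 0.0, 0.1, 0.2],
]
print(pick_peaks(toy, threshold=1.0))  # [(1, 1, 5.0), (2, 3, 3.0)]
```

A fixed threshold is the simplest possible noise criterion; in practice it would be chosen relative to the noise level of each measurement.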
Figure 1. Working principle of an ion mobility spectrometer.
Figure 2. Workflow of the data processing, data mining and evaluation methods used in clinical breath diagnostics.
Figure 3. (a) Visualization of the ion mobility spectrometry (IMS) chromatogram; (b) a single ion mobility spectrum; (c) a single multi-capillary column (MCC) spectrum.
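The three panels of Figure 3 suggest a simple data layout: a measurement is a matrix whose rows are individual ion mobility spectra, one per retention time point, and whose columns trace intensity over retention time at a fixed drift time. A minimal sketch of that layout follows, assuming a row-major list of lists. The class `MccImsMeasurement` and its method names are hypothetical and do not correspond to any specific file format or tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MccImsMeasurement:
    """Hypothetical container for one MCC/IMS measurement.

    intensities[i][j] is the signal recorded at retention_times[i] (MCC axis)
    and drift_times[j] (IMS axis).
    """
    retention_times: List[float]
    drift_times: List[float]
    intensities: List[List[float]]

    def __post_init__(self) -> None:
        # Reject matrices whose shape does not match the two axes.
        if len(self.intensities) != len(self.retention_times):
            raise ValueError("one row of intensities is required per retention time")
        if any(len(row) != len(self.drift_times) for row in self.intensities):
            raise ValueError("every row must have one value per drift time")

    def ims_spectrum(self, i: int) -> List[float]:
        """Single ion mobility spectrum: intensity over drift time at retention_times[i]."""
        return list(self.intensities[i])

    def mcc_spectrum(self, j: int) -> List[float]:
        """Single MCC spectrum: intensity over retention time at drift_times[j]."""
        return [row[j] for row in self.intensities]


m = MccImsMeasurement([0.0, 0.1], [1.0, 2.0, 3.0], [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(m.ims_spectrum(1))  # [4.0, 5.0, 6.0]
print(m.mcc_spectrum(2))  # [3.0, 6.0]
```

Storing one IMS spectrum per row mirrors the acquisition order, since the spectrometer records complete spectra one after another while compounds elute from the MCC.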
Figure 4. (a) IMS chromatogram; (b) a selected area (green) within the MCC/IMS chromatogram, converted into a three-dimensional plot.
Figure 5. MCC/IMS chromatograms of (a) raw, (b) smoothed and (c) de-noised data, illustrating the information that remains after smoothing and de-noising. 2D side views of the (d) raw, (e) smoothed and (f) de-noised chromatograms also show the different peak baselines caused by tailing of the reactant ion peak (RIP) [41]. Reproduced with permission from Bader et al., International Journal of Ion Mobility Spectrometry, published by Springer-Verlag, 2008.
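Figure 5 contrasts raw, smoothed and de-noised chromatograms. Below is a minimal sketch of both steps on a matrix stored as a list of rows, assuming a simple box (moving-average) filter for smoothing and a hard intensity threshold for de-noising. These are generic stand-ins, not the specific filters used by Bader et al., and the function names and the `radius` and `noise_level` parameters are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def box_smooth(matrix: Matrix, radius: int = 1) -> Matrix:
    """Generic box filter (hypothetical stand-in): replace each cell by the
    mean of its (2*radius+1)^2 neighborhood, using only neighbors that lie
    inside the matrix at the borders."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            total, count = 0.0, 0
            for rr in range(max(0, r - radius), min(n_rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(n_cols, c + radius + 1)):
                    total += matrix[rr][cc]
                    count += 1
            row.append(total / count)
        out.append(row)
    return out


def hard_threshold(matrix: Matrix, noise_level: float) -> Matrix:
    """Crude de-noising (hypothetical stand-in): zero every cell at or below noise_level."""
    return [[v if v > noise_level else 0.0 for v in row] for row in matrix]


raw = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
smoothed = box_smooth(raw)
print(smoothed[1][1])  # 1.0: the sharp peak is spread over its neighborhood
print(hard_threshold(smoothed, noise_level=1.2))
```

A box filter lowers and broadens sharp peaks, so the radius trades noise suppression against peak resolution. The threshold step also does nothing about the RIP-tailing baseline visible in the side views.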
Example Markov logic network (MLN) formulas obtained from Alchemy's structure learning (90% accuracy), where $pc_i(M)$ denotes the presence of peak cluster number $i$ in sample $M$ and $bc(M)$ indicates that sample $M$ originates from a patient suffering from bronchial carcinoma ($\neg bc(M)$: healthy control) [76]. Reproduced with permission from Finthammer et al., International Journal of Ion Mobility Spectrometry, published by Springer-Verlag, 2010.
| # | Formula | Weight |
|---|---|---|
| 37 | | 4.43 |
| 39 | | 4.82 |
| 44 | | 5.05 |
| 46 | | −4.30 |
| 47 | | −8.98 |
| 53 | | −8.14 |
| 57 | | 6.38 |
| 61 | | 7.15 |
| 62 | | 7.49 |
| 66 | | −5.62 |
| 68 | | 4.01 |
| 70 | | −5.18 |
| 72 | | 2.45 |
| 75 | | −2.78 |
| 80 | | −5.55 |
| 81 | | 5.61 |
| 82 | | 8.77 |
| 89 | | −5.15 |
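In a Markov logic network, a possible world $x$ has probability proportional to $\exp\left(\sum_i w_i n_i(x)\right)$, where $n_i(x)$ counts the true groundings of formula $i$ and $w_i$ is its weight. When every formula mentions a single sample $M$ and the peak atoms $pc_i(M)$ are observed, the probability of $bc(M)$ reduces to a logistic function of the weight difference between the worlds with $bc(M)$ true and false. The following is a minimal sketch of that inference step. The two formulas are hypothetical placeholders, not formulas from the table, and the function name `prob_bc` is also hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

# A ground formula for one sample: given the observed peak-cluster atoms
# pc_i(M) (cluster index -> present?) and a truth value for bc(M),
# report whether the formula is satisfied.
Formula = Callable[[Dict[int, bool], bool], bool]


def prob_bc(evidence: Dict[int, bool], weighted: List[Tuple[Formula, float]]) -> float:
    """P(bc(M) = true | pc atoms) for an MLN whose formulas each mention only
    sample M: 1 / (1 + exp(-(S_true - S_false))), where S_v sums the weights
    of the formulas satisfied when bc(M) = v."""
    def score(bc: bool) -> float:
        return sum(w for f, w in weighted if f(evidence, bc))
    return 1.0 / (1.0 + math.exp(-(score(True) - score(False))))


# Hypothetical formulas (NOT taken from the table above):
#   pc_3(M) => bc(M)       weight  2.0
#   pc_7(M) AND bc(M)      weight -1.5
example: List[Tuple[Formula, float]] = [
    (lambda pc, bc: (not pc.get(3, False)) or bc, 2.0),
    (lambda pc, bc: pc.get(7, False) and bc, -1.5),
]

print(prob_bc({3: True, 7: False}, example))  # sigmoid(2.0), about 0.88
print(prob_bc({3: True, 7: True}, example))   # sigmoid(0.5), about 0.62
print(prob_bc({}, example))                   # 0.5
```

A negative weight, as for formulas 46 and 47 in the table, makes worlds that satisfy the formula less probable rather than impossible, which is what lets an MLN weigh conflicting peak evidence softly.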
Ranking of the achievements in the computational analysis of MCC/IMS data.
| Computational requirements | Completed |
|---|---|
| Data format | *** |
| Visualization | *** |
| Pre-processing methods | ** |
| Peak detection methods | ** |
| Centralized data repository | * |
| Statistical approaches | *** |
| Statistical learning methods | * |
| Differentiation of diseases, infections, cancer, etc. | * |
| Disease pathway identification | - |
“***” accomplished; “**” almost complete; “*” first steps have been made; “-” not solved.
Overview of the four studies discussed in Section 4 that analyze MCC/IMS data from patients with bronchial carcinoma (BC) or chronic obstructive pulmonary disease (COPD). ACC is the accuracy, i.e., the percentage of correctly classified samples; # is the number of samples in the study; AUC is the area under the receiver operating characteristic (ROC) curve; and CV indicates whether cross-validation was used.
| Study | Disease | # | ACC | AUC | CV |
|---|---|---|---|---|---|
| Finthammer | BC | 158 | 90% | - | √ |
| Baumbach | BC | 107 | 99% | 99% | - |
| Westhoff | COPD | 130 | 94% | - | - |
| Hauschild | COPD and BC | 119 | 94% | 92% | √ |
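The evaluation measures defined in the caption above can be computed directly from a classifier's outputs. Here is a minimal sketch, assuming binary labels (1 = diseased, 0 = healthy control) and real-valued classifier scores. The function names are hypothetical, and the AUC uses the pairwise (Mann-Whitney) formulation, which equals the area under the ROC curve. The individual studies' own evaluation code is not described here.

```python
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """ACC: fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive (label 1) scores higher
    than a random negative (label 0); ties count 1/2. Requires at least one
    sample of each class."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices 0..n-1 into k contiguous folds; each fold serves
    once as the test set while the remaining folds form the training set."""
    folds = [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]
    return [
        ([idx for j, fold in enumerate(folds) if j != i for idx in fold], test)
        for i, test in enumerate(folds)
    ]


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))           # 0.75
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75
print([test for _, test in k_fold_indices(10, 3)])    # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

Real studies usually shuffle samples and stratify folds by class before splitting. The contiguous folds here only illustrate the key property of cross-validation: every sample is tested exactly once, on a model trained without it.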