Emmanuel P Mwanga, Elihaika G Minja, Emmanuel Mrimi, Mario González Jiménez, Johnson K Swai, Said Abbasi, Halfan S Ngowo, Doreen J Siria, Salum Mapua, Caleb Stica, Marta F Maia, Ally Olotu, Maggy T Sikulu-Lord, Francesco Baldini, Heather M Ferguson, Klaas Wynne, Prashanth Selvaraj, Simon A Babayan, Fredros O Okumu.
Abstract
BACKGROUND: Epidemiological surveys of malaria currently rely on microscopy, polymerase chain reaction (PCR) assays or rapid diagnostic tests (RDTs) for Plasmodium infections. This study investigated whether mid-infrared (MIR) spectroscopy coupled with supervised machine learning could constitute an alternative method for rapid malaria screening, directly from dried human blood spots.
Keywords: Attenuated total reflection-Fourier Transform Infrared spectrometer; Dried blood spots; Ifakara Health Institute; Malaria diagnosis; Mid-infrared spectroscopy; PCR; Plasmodium; Supervised machine learning
Year: 2019 PMID: 31590669 PMCID: PMC6781347 DOI: 10.1186/s12936-019-2982-9
Source DB: PubMed Journal: Malar J ISSN: 1475-2875 Impact factor: 2.979
Fig. 1 Map showing study villages in Kilombero and Ulanga districts, southeastern Tanzania (courtesy of Alex J Limwagu)
Fig. 2 Schematic illustration of the specific processes of: a collection of blood specimens and preparation of DBS on filter papers, scanning on the mid-infrared spectrometer, and recording of sample spectra; b data splitting, model training, cross-validation and evaluation of the performance of the final model (supervised machine learning process)
Fig. 3 Averaged mid-infrared spectra obtained from dried blood spots confirmed by PCR as Plasmodium positive (N = 114) or Plasmodium negative (N = 146). Assignments of biochemical properties to the different wave numbers are shown in Table 1
Table 1 Biochemical properties associated with peaks in the mid-infrared spectra obtained from dried blood spots in Fig. 3
| Wave number (cm⁻¹) | Vibrational mode | Component identification |
|---|---|---|
| 3600–3000 | N–H stretching | Amides (proteins, haemoglobin), urea |
| | O–H stretching | Alcohols, carbohydrates, cellulose |
| 3332 | O–H stretching | Alcohols, cellulose |
| 3293 | N–H stretching | Amides (proteins, haemoglobin) |
| 3272 | O–H stretching | Cellulose |
| 3000–2800 | C–H stretching | Lipids, amino acids, carbohydrates |
| 2894 | C–H stretching | Cellulose, carbohydrates |
| 1700–1600 | C=O stretching | Amides (proteins, haemoglobin), urea |
| 1540 | N–H bending coupled to C–N stretching | Amides (proteins, haemoglobin) |
| 1457 | CH3 bending | Amino acids, lipids |
| 1400–1310 | C–H stretching | Lipids, amino acids, carbohydrates |
| 1307 | C–N stretching | Amides (proteins, haemoglobin) |
| 1165–1110 | C–O–C stretching | Ethers (cellulose, carbohydrates) |
| 1070–950 | C–O stretching | Alcohols (cellulose, carbohydrates, amino-acids) |
| | =C–H bending | Haem group; haemoglobin |
| 730 | C–H bending | Lipids, amino acids, carbohydrates |
Fig. 4 a Percentage prediction accuracies and precisions for different classification models, based on PCR test results as reference. Models compared included k-nearest neighbours (KNN), logistic regression (LR), support vector machines (SVM), naïve Bayes (NB), XGBoost (XGB), random forest (RF) and multilayer perceptron (MLP). LR was the best-performing model; b distribution of per-class accuracies obtained by the final LR classifiers, with standard deviation from 70 bootstrapped models, in predicting PCR test results from MIR spectral data. In both panels, accuracy refers to how high the percentage of correct predictions is for each classification method, while precision refers to the statistical variation around those predictions
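The distinction this caption draws between accuracy (the central value) and precision (the spread around it) can be illustrated with a short Python sketch. It is not the authors' pipeline: the study trained 70 bootstrapped models, whereas this simplified example only resamples a fixed set of paired labels and predictions. The function name `bootstrap_accuracy`, the seed, and the reuse of the validation-set counts from the performance table are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_accuracy(y_true, y_pred, n_rounds=70, seed=0):
    """Return (mean, standard deviation) of accuracy over bootstrap resamples.

    Simplification: predictions are held fixed and only the evaluated
    samples are resampled; no model is retrained per round.
    """
    rng = random.Random(seed)
    n = len(y_true)
    accuracies = []
    for _ in range(n_rounds):
        # Draw n sample indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        correct = sum(y_true[i] == y_pred[i] for i in idx)
        accuracies.append(correct / n)
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Labels rebuilt from the validation counts: 26 true positives,
# 2 false negatives, 2 false positives, 22 true negatives.
y_true = [1] * 26 + [1] * 2 + [0] * 2 + [0] * 22
y_pred = [1] * 26 + [0] * 2 + [1] * 2 + [0] * 22

mean_acc, sd_acc = bootstrap_accuracy(y_true, y_pred)
print(f"accuracy (mean): {100 * mean_acc:.1f}%")
print(f"precision (SD):  {100 * sd_acc:.1f}%")
```

In this reading, the mean corresponds to what the caption calls accuracy and the standard deviation to what it calls precision.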
Fig. 5 a Averaged proportions of correct predictions of PCR-confirmed Plasmodium falciparum-infected individuals achieved during training of the models; b averaged proportions of correct predictions of Plasmodium falciparum-infected individuals achieved when the final model was challenged with the previously unseen validation spectra
Performance of mid-infrared spectroscopy coupled with logistic regression, and the RDT (i.e. SD BIOLINE malaria Ag P.f/Pan 05FK60), both compared to PCR for identifying Plasmodium falciparum-infected individuals from the validation set
| Method | Test result | PCR positive | PCR negative | Total | % sensitivity | % specificity | % positive predictive value | % negative predictive value |
|---|---|---|---|---|---|---|---|---|
| Mid-infrared spectroscopy and machine learning (MIR-ML) | Positive | 26 | 2 | 28 | 92.8 | 91.7 | 92.8 | 91.7 |
| | Negative | 2 | 22 | 24 | | | | |
| | Total | 28 | 24 | 52 | | | | |
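The four percentages in the table follow directly from its 2×2 counts, with PCR as the reference. The Python sketch below recomputes them; the function name `diagnostic_metrics` is an illustrative assumption, while the counts are those reported for MIR-ML on the validation set.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard diagnostic metrics (in percent) from 2x2 counts."""
    return {
        "sensitivity": 100 * tp / (tp + fn),  # true positives among PCR positives
        "specificity": 100 * tn / (tn + fp),  # true negatives among PCR negatives
        "ppv": 100 * tp / (tp + fp),          # PCR positives among test positives
        "npv": 100 * tn / (tn + fn),          # PCR negatives among test negatives
    }


# MIR-ML versus PCR: 26 true positives, 2 false positives,
# 2 false negatives, 22 true negatives.
metrics = diagnostic_metrics(tp=26, fp=2, fn=2, tn=22)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

The computed sensitivity and positive predictive value are both 26/28 ≈ 92.86%, and the specificity and negative predictive value are both 22/24 ≈ 91.67%, which agrees with the tabulated values to within rounding of the last digit.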
Fig. 6 Plots of the 20 most dominant spectral features (wave numbers) influencing model prediction of the Plasmodium infection status of the dried blood spot specimens. Positive coefficients are those most predictive of Plasmodium-positive specimens, while negative coefficients are those most predictive of Plasmodium-negative specimens. All of the top 20 features are found in the fingerprint region (1730–883 cm⁻¹)
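One plausible way to obtain such a ranking from a fitted logistic regression model is to sort wave numbers by the absolute value of their coefficients and then split them by sign. The Python sketch below shows that idea only; the helper names and the coefficient values are hypothetical and are not taken from the study.

```python
def top_features(wavenumbers, coefficients, k=20):
    """Return the k (wavenumber, coefficient) pairs with the largest |coefficient|."""
    pairs = zip(wavenumbers, coefficients)
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)[:k]


def in_fingerprint_region(wavenumber, low=883, high=1730):
    """Check whether a wave number (cm^-1) lies in the 1730-883 cm^-1 region."""
    return low <= wavenumber <= high


# Hypothetical coefficients for illustration; NOT values from the study.
wavenumbers = [3293, 1730, 1540, 1307, 1070, 883]
coefficients = [0.05, 0.40, -1.20, 0.10, 0.90, -0.30]

top = top_features(wavenumbers, coefficients, k=3)
positive = [w for w, c in top if c > 0]  # push predictions towards Plasmodium positive
negative = [w for w, c in top if c < 0]  # push predictions towards Plasmodium negative

print("top features:", top)
print("all in fingerprint region:", all(in_fingerprint_region(w) for w, _ in top))
```

With a real model, `coefficients` would come from the fitted classifier and `k` would be 20, matching the figure.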