A Godmer, J Bigot, Q Giai Gianetto, Y Benzerara, N Veziris, A Aubry, J Guitard, C Hennequin.
Abstract
This study aimed to evaluate the contribution of a machine learning (ML) approach to the interpretation of intercalating dye-based quantitative PCR (IDqPCR) signals applied to the diagnosis of mucormycosis. The ML-based classification approach was applied to 734 IDqPCR results categorized as positive (n = 74) or negative (n = 660) for mucormycosis after combining "visual reading" of the amplification and denaturation curves with clinical, radiological and microbiological criteria. Fourteen features were calculated to characterize the curves and fed into several pipelines built on four ML algorithms. An initial subset (n = 345) was used to design the classifiers. The classifier predictions were combined by majority voting to estimate the performance of 48 meta-classifiers on an external dataset (n = 389). Visual reading alone returned 57 (7.7%) positive, 568 (77.4%) negative and 109 (14.8%) doubtful results. The Kappa coefficients of all the meta-classifiers were greater than 0.83 for the classification of IDqPCR results on the external dataset, and 6 of these meta-classifiers reached a Kappa coefficient of 1. The proposed ML-based approach allows a rigorous interpretation of IDqPCR curves, making the diagnosis of mucormycosis accessible to non-specialists in molecular diagnosis. A free online application was developed to classify IDqPCR results from the raw data of the thermal cycler output (http://gepamy-sat.asso.st/).
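The Kappa coefficient reported above measures agreement beyond chance between the reference labels and the predicted labels. The sketch below is a minimal, standard-library illustration of Cohen's kappa; the function name `cohen_kappa` and the toy labels are assumptions made for this example and do not come from the study.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa between two label sequences of equal length."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(reference)
    # Observed agreement: fraction of samples given the same label.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of the marginal label frequencies.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when both sides use one identical label;
        # returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels (hypothetical): 5 positives and 15 negatives,
# with one false negative and one false positive.
reference = ["pos"] * 5 + ["neg"] * 15
predicted = ["pos"] * 4 + ["neg"] + ["pos"] + ["neg"] * 14
kappa = cohen_kappa(reference, predicted)
```

With strongly imbalanced classes, as in this dataset (74 positive versus 660 negative results), kappa is more informative than raw accuracy because a classifier that always answers "negative" scores a kappa of 0.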
Year: 2022 PMID: 36180590 PMCID: PMC9525288 DOI: 10.1038/s41598-022-21010-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Design of the study comparing two approaches (A and B) for classifying intercalating dye qPCR (IDqPCR) results. (A) “Routine-based approach”: classification algorithm based on the “visual reading” of the curves (Tm, melting temperature; Cp, crossing point) and on the 2020 revised European Organization for Research and Treatment of Cancer and Mycoses Study Group Education and Research Consortium (EORTC/MSGERC) criteria [27]. (B) “Machine Learning-based approach”: classification based on the predictions of ML classifiers. *Eleven IDqPCR results were excluded from the study because the current “Routine-based approach” could not render a final result (lack of clinical data essential for diagnosis).
Figure 2. Boxplots of the 14 features calculated from the intercalating dye-based quantitative PCR curves, comparing the positive and negative classes. Notes: the mean is represented by a red crossbar; means were compared using the Wilcoxon rank sum test: **** (significant difference, p value < 10⁻⁴) and ns (non-significant difference, p value > 0.05). Amplification curve features: cpD1 (Cp at the maximum of the first derivative of the amplification curve); cpD2 (Cp at the maximum of the second derivative of the amplification curve); fluo (the fluorescence value at the maximum of the second derivative curve, i.e. at cpD2); init1 (the initial template fluorescence from the sigmoidal model); init2 (the initial template fluorescence from an exponential model); maximum fluorescence (the maximum fluorescence of the amplification curve); global slope (the slope of the amplification curve estimated with a linear regression model); AUC amplification (area under the amplification curve); delta fluorescence (the difference between the minimum and the maximum fluorescence); maxRatio (a method that identifies a coherent point in or very close to the exponential region of the qPCR signal). Melting curve features: Tm (melting temperature); AUC Tm (area under the melting curve); kurtosis (measure of the shape of the melting curve); skewness (measure of the asymmetry of the melting curve).
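Several of the amplification curve features listed in this caption can be approximated directly from the raw (cycle, fluorescence) readings. The sketch below is a simplified, standard-library illustration only: the caption indicates that the study derives some features from fitted sigmoidal or exponential models, whereas this example swaps in plain finite differences, and the function and key names are assumptions. Model-based features (init1, init2, maxRatio) are not reproduced.

```python
import math


def central_difference(x, y):
    """Derivative dy/dx at interior points, by central differences."""
    xs = x[1:-1]
    ds = [(y[i + 1] - y[i - 1]) / (x[i + 1] - x[i - 1])
          for i in range(1, len(x) - 1)]
    return xs, ds


def amplification_features(cycles, fluo):
    """Crude numeric stand-ins for some curve features (assumed names).

    `cycles` and `fluo` are lists; cycles must be strictly increasing.
    """
    if len(cycles) != len(fluo) or len(cycles) < 5:
        raise ValueError("need at least 5 paired (cycle, fluorescence) points")
    x1, d1 = central_difference(cycles, fluo)   # first derivative
    x2, d2 = central_difference(x1, d1)         # second derivative
    i1 = max(range(len(d1)), key=d1.__getitem__)
    i2 = max(range(len(d2)), key=d2.__getitem__)
    n = len(cycles)
    mean_x, mean_y = sum(cycles) / n, sum(fluo) / n
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, fluo))
    # Trapezoidal rule for the area under the amplification curve.
    auc = sum((cycles[i + 1] - cycles[i]) * (fluo[i + 1] + fluo[i]) / 2
              for i in range(n - 1))
    return {
        "cpD1": x1[i1],                     # cycle of max first derivative
        "cpD2": x2[i2],                     # cycle of max second derivative
        "fluo": fluo[i2 + 2],               # x2[i] corresponds to cycles[i + 2]
        "maximum_fluorescence": max(fluo),
        "delta_fluorescence": max(fluo) - min(fluo),
        "global_slope": sxy / sxx,          # ordinary least squares slope
        "auc_amplification": auc,
    }


# Synthetic logistic curve (hypothetical data, not from the study).
cycles = list(range(1, 46))
curve = [10 / (1 + math.exp(-(c - 25) / 1.5)) for c in cycles]
features = amplification_features(cycles, curve)
```

Finite differences amplify noise, so on real thermal cycler output a fitted or smoothed curve would normally be differentiated instead.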
Figure 3. Comparison of Machine Learning algorithms, resampling methods and feature selection methods, using the Kappa coefficient of the classification models estimated on the test sets. Notes: SVM, linear Support Vector Machine; RF, Random Forests; NB, Naive Bayes; nnet, single-hidden-layer neural network. Feature selection: Recursive Feature Elimination (RFE) coupled to Random Forests (RF) or logistic regression (Glmnet), or no feature selection. Resampling: Up, Down or SMOTE [8], or no resampling (raw).
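The "Up", "Down" and "raw" options named in this caption can be illustrated with a small random resampler. This is a minimal sketch under stated assumptions: the function name `resample`, its `mode` argument and the toy data are hypothetical, and SMOTE is deliberately left out because it synthesizes new minority samples by interpolation rather than by drawing existing rows.

```python
import random
from collections import Counter, defaultdict


def resample(rows, labels, mode, seed=0):
    """Balance classes by random up- or down-sampling (assumed helper).

    mode: "up"   duplicates minority rows until every class matches the largest,
          "down" drops majority rows until every class matches the smallest,
          "raw"  returns the data unchanged.
    """
    if mode not in ("up", "down", "raw"):
        raise ValueError("mode must be 'up', 'down' or 'raw'")
    if mode == "raw":
        return list(rows), list(labels)
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    sizes = [len(members) for members in by_class.values()]
    target = max(sizes) if mode == "up" else min(sizes)
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if mode == "up":
            # Sample with replacement to fill the gap to the largest class.
            extra = [rng.choice(members) for _ in range(target - len(members))]
            chosen = members + extra
        else:
            # Sample without replacement down to the smallest class.
            chosen = rng.sample(members, target)
        out_rows.extend(chosen)
        out_labels.extend([label] * len(chosen))
    return out_rows, out_labels


# Toy imbalanced set (hypothetical): 5 positives, 20 negatives.
rows = list(range(25))
labels = ["pos"] * 5 + ["neg"] * 20
_, up_labels = resample(rows, labels, "up")
_, down_labels = resample(rows, labels, "down")
```

Resampling of this kind is usually applied to the training split only, so that the test set keeps the original class balance.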
Figure 4. Pipelines for classifier development and performance estimation on the external dataset, in three steps (A, B and C). (A) Extraction of 14 features from the amplification and melting curves using R packages [25–27], then splitting of the classifier design dataset (345 IDqPCR) into train and test datasets. (B) Design of classifiers (NB, Naive Bayes; SVM, linear Support Vector Machine; RF, Random Forests; nnet, single-hidden-layer neural network) with several pipelines: (i) feature selection using Recursive Feature Elimination (RFE) coupled to Random Forests (RF) or logistic regression (Glmnet), or no feature selection; (ii) different resampling methods (Up, Down or SMOTE [30]) or no resampling. Steps (A) and (B) were repeated for 50 iterations in a randomized loop, generating 2400 classifiers (4 algorithms × 3 feature selection options × 4 resampling options × 50 iterations). (C) The performance of 48 meta-classifiers, one per combination of ML algorithm, resampling method and feature selection method, each built by hard voting, was estimated on the external dataset (389 IDqPCR).
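The counts in this caption are consistent with a full grid over the listed options, and hard voting reduces the classifiers of one grid cell to a single meta-classifier. The sketch below illustrates both points with the standard library; the option labels, the `hard_vote` helper and its `tie_label` argument are assumptions for this example, and how the study resolves tied votes is not stated in the caption.

```python
from collections import Counter
from itertools import product

# Option names are shorthand labels taken from the caption.
ALGORITHMS = ["NB", "SVM", "RF", "nnet"]
FEATURE_SELECTION = ["RFE-RF", "RFE-Glmnet", "none"]
RESAMPLING = ["up", "down", "SMOTE", "raw"]
N_ITERATIONS = 50

# One pipeline (and one meta-classifier) per combination of options.
pipelines = list(product(ALGORITHMS, FEATURE_SELECTION, RESAMPLING))
n_meta_classifiers = len(pipelines)
n_classifiers = n_meta_classifiers * N_ITERATIONS


def hard_vote(predictions, tie_label="tie"):
    """Majority label per sample across classifiers.

    predictions: one list of labels per classifier, all the same length.
    tie_label:   value returned when the top two labels are tied
                 (an assumption of this sketch).
    """
    voted = []
    for sample_votes in zip(*predictions):
        ranked = Counter(sample_votes).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            voted.append(tie_label)
        else:
            voted.append(ranked[0][0])
    return voted


# Three hypothetical classifiers voting on two samples.
votes = hard_vote([["pos", "neg"], ["pos", "neg"], ["neg", "neg"]])
```

With an even number of voters, such as the 50 classifiers per pipeline described here, ties are possible in principle, which is why the sketch makes the tie rule explicit.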