
Coupling Machine Learning and High Throughput Multiplex Digital PCR Enables Accurate Detection of Carbapenem-Resistant Genes in Clinical Isolates.

Luca Miglietta^1,2, Ahmad Moniri², Ivana Pennisi¹, Kenny Malpartida-Cardenas², Hala Abbas³, Kerri Hill-Cawthorne¹, Frances Bolt¹, Elita Jauneikaite^1,4, Frances Davies^1,3, Alison Holmes^1,3, Pantelis Georgiou², Jesus Rodriguez-Manzano¹.

Abstract

Rapid and accurate identification of patients colonised with carbapenemase-producing organisms (CPOs) is essential to adopt prompt prevention measures to reduce the risk of transmission. Recent studies have demonstrated the ability to combine machine learning (ML) algorithms with real-time digital PCR (dPCR) instruments to increase classification accuracy of multiplex PCR assays when using synthetic DNA templates. We sought to determine if this novel methodology could be applied to improve identification of the five major carbapenem-resistant genes in clinical CPO-isolates, which would represent a leap forward in the use of PCR-based data-driven diagnostics for clinical applications. We collected 253 clinical isolates (including 221 CPO-positive samples) and developed a novel 5-plex PCR assay for detection of blaIMP, blaKPC, blaNDM, blaOXA-48, and blaVIM. Combining the recently reported ML method "Amplification and Melting Curve Analysis" (AMCA) with the abovementioned multiplex assay, we assessed the performance of the AMCA methodology in detecting these genes. The improved classification accuracy of AMCA relies on the usage of real-time data from a single-fluorescent channel and benefits from the kinetic/thermodynamic information encoded in the thousands of amplification events produced by high throughput real-time dPCR. The 5-plex showed a lower limit of detection of 10 DNA copies per reaction for each primer set and no cross-reactivity with other carbapenemase genes. The AMCA classifier demonstrated excellent predictive performance with 99.6% (CI 97.8-99.9%) accuracy (only one misclassified sample out of the 253, with a total of 160,041 positive amplification events), which represents a 7.9% increase (p-value <0.05) compared to conventional melting curve analysis. 
This work demonstrates the use of the AMCA method to increase the throughput and performance of state-of-the-art molecular diagnostic platforms, without hardware modifications and additional costs, thus potentially providing substantial clinical utility on screening patients for CPO carriage.


Keywords: data driven (DD); digital PCR (dPCR); infectious disease; molecular diagnostics; real-time PCR

Year: 2021 PMID: 34888355 PMCID: PMC8650054 DOI: 10.3389/fmolb.2021.775299

Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X

1 Introduction

This paper demonstrates that machine learning (ML) approaches coupled with high throughput real-time digital PCR (dPCR) can be used to increase detection accuracy of multiplex PCR assays when screening clinical isolates for the presence of carbapenemase-producing organisms (CPOs). We used a recently reported ML method called Amplification and Melting Curve Analysis (AMCA), which leverages the target-specific information encoded in each amplification event (via real-time data) to identify the nature of nucleic acid molecules (Moniri et al., 2020a). The AMCA approach is based on training supervised machine learning algorithms to extract kinetic and thermodynamic information from PCR amplification and melting curves to enhance the classification accuracy in multiplexing. Validation of this methodology using clinical isolates has not been reported before; therefore, this work represents a step forward towards the implementation of this method in clinical microbiology laboratories. Nucleic acid amplification tests (NAATs) that incorporate the AMCA classifier for multiple target detection will greatly improve their specificity, sensitivity and turn-around time to result, reducing overall resource consumption and improving diagnostic performance.
Antimicrobial resistance (AMR) is a serious global threat and poses a challenge for modern medicine, compromising effective infectious disease management (Bush and Fisher, 2011; Tzouvelekis et al., 2012). One of the most concerning forms of AMR is the rapid spread of CPOs, bacteria that produce enzymes inactivating carbapenems, a potent class of antibiotics. Whilst overall United Kingdom incidence is low, there are centres nationally facing increasing rates and outbreaks, including Imperial College Healthcare NHS Trust (ICHNT), and CPOs are endemic in many other regions worldwide (Otter et al., 2017b; Rodriguez-Manzano et al., 2020).
CPO infections are associated with higher morbidity and mortality than infections with susceptible strains, in part because their resistance can lead to ineffective empirical therapy and suboptimal treatment (Neuner et al., 2011; Eliopoulos et al., 2014). Therapeutic options are severely restricted, and in many cases clinical management relies on “last line” antibiotics that are less effective and have more side effects (Bleumin et al., 2012). Patients infected with CPOs present significant challenges for diagnostics and infection control. There is an urgent need for accurate and timely diagnosis to improve patient outcomes and prevent the spread of AMR. Carbapenemase resistance genes are often co-localised on highly transmissible plasmids and are readily shared between bacterial species, providing the ideal conditions for multidrug resistant organisms (Johnning et al., 2018). Incorrect diagnosis delays appropriate intervention, increases financial burdens for the healthcare system, and complicates antimicrobial stewardship efforts (Charani et al., 2021). A local ICHNT economic analysis estimated the cost of a large hospital outbreak (~100 infections) of carbapenemase-producing Klebsiella pneumoniae to be £1M. Some of the increased expenditure was associated with increased screening, bed closures, medication and patient bed-days (Otter et al., 2017a); better diagnostics could reduce these costs. Diagnosis of CPOs is often complicated and time-consuming, as it is normally based upon multiple tests that employ a wide range of instruments and methods. Phenotypic methods typically target carbapenemase production and provide no information on the underlying resistance mechanism (Codjoe and Donkor, 2017). These tests represent a low-cost (£2–15 per sample) and robust methodology; however, they rely on pure culture, which increases turnaround times (12–24 h) (Moloney et al., 2019).
A variety of molecular methods, including amplification (PCR-based), microarray and sequencing assays, have been developed and are frequently used in microbiology laboratories (Matsumura and Pitout, 2016; Reta et al., 2020). Microarray and sequencing are time consuming (>12–48 h), expensive (>£50K platforms and >£80 per sample), and require bioinformatic expertise. Conversely, NAATs are commonly cheaper (£15–30 per sample) and faster (1–2 h), although instrument prices range widely, from tens to hundreds of thousands of pounds for conventional and digital PCR platforms, respectively (Huggett et al., 2015; Quan et al., 2018). Furthermore, the application of sophisticated data processing for their optimisation (as done with microarray and sequencing methods) has been largely unexplored (Collins and Moons, 2019; Beinhauerova et al., 2020). As a result of all the aforementioned limitations, implementation of microarrays, sequencing and molecular methods for CPO diagnosis in routine practice is often limited. Recently, our group has demonstrated that the large volume of data obtained from real-time digital PCR (dPCR) instruments can be exploited to perform data-driven multiplexing in a single fluorescent channel, reporting a 99.33 ± 0.13% classification accuracy when using synthetic DNA in a 9-plex format (Moniri et al., 2020a). This result represented an increase of 10% over using melting curve analysis, indicative of the potential benefits of this methodology for diagnostic and screening applications. The ML method used (AMCA) leverages kinetic and thermodynamic information encoded in the amplification and melting curves to perform target identification in multiplexed environments (Moniri et al., 2019; Rodriguez-Manzano et al., 2019; Moniri et al., 2020b).
Here we evaluate, for the first time, the analytical performance of the AMCA method compared to the Xpert Carba-R (Cepheid) and Resist-3 O.K.N assays when tested on clinical isolates for detection of the most common types of serine-beta-lactamases (blaKPC and blaOXA-48) and metallo-beta-lactamases (blaIMP, blaVIM and blaNDM) (Maurer et al., 2015; Lim et al., 2018). Results were compared against another ML-based classifier, ‘Melting Curve Analysis’ (MCA), which uses the thermodynamic information contained in PCR melting curves for identification of multiple targets in a single well reaction (Athamanolap et al., 2014; Moniri et al., 2020a). A 5-plex PCR assay was developed in-silico and validated with synthetic DNA templates. The performance of the AMCA method, using this 5-plex, was further assessed with 253 clinical isolates provided by the microbiology department at Charing Cross Hospital, ICHNT. All samples were analysed in real-time dPCR, using an intercalating dye (EvaGreen) in a single fluorescent channel. This work demonstrates that the AMCA method can be integrated with conventional clinical diagnostic workflows in combination with real-time dPCR platforms, as it does not require any hardware modification. Increasing multiplexing capabilities improves workflow efficiency while reducing per-sample cost, and it is beneficial to a number of application fields beyond clinical diagnostics, such as veterinary and environmental fields, where multiple targets need to be analysed simultaneously (e.g., SNP genotyping, forensic studies and gene deletion analysis). Figure 1 illustrates the concept of data-driven multiplexing, where tailored PCR-based amplification chemistries combined with advanced data analytics can be seamlessly integrated into existing diagnostic pipelines that utilize real-time platforms.

FIGURE 1

Integration of data-driven approaches into standard diagnostic workflows. The blue arrow indicates the conventional diagnosis pipeline from patient to result, where the patient sample is collected from different sources (e.g., eye swab, nasopharyngeal swab, throat swab, urine, or rectal swab). Subsequently, samples are cultured, and nucleic acids are extracted in a microbiology lab. Following this, the most suitable genetic test is developed in-silico, comprising specialised assays capable of multi-target detection in a single reaction (first grey arrow). The test is performed in the dPCR instrument, outputting large amounts of data, which are analysed by a machine learning supported algorithm to ensure reliable and accurate results (second grey arrow). This is where the AMCA methodology is applied.

2 Experimental Section

2.1 Synthetic DNA

Double-stranded synthetic DNA (gBlocks® Gene Fragments) containing the entire coding sequences of blaIMP, blaKPC, blaNDM, blaOXA-48 and blaVIM genes was used for quantitative real-time PCR (qPCR) experiments when determining the limit-of-detection of the 5-plex PCR assay, and in dPCR experiments for generating the digital bulk standards and training the mathematical models. The gene fragments (ranging from 900 to 1,000 bp) were purchased from Integrated DNA Technologies Ltd. (IDT) and resuspended in Tris−EDTA buffer to 10 ng/μl stock solutions (stored at −80°C until further use). The DNA stock concentration for all targets was estimated by dPCR using the Fluidigm’s Biomark HD system. The following NCBI accession numbers are used as reference for the gBlock synthesis: NG_049172 (blaIMP), NC_016846 (blaKPC), NC_023908 (blaNDM), NG_049762 (blaOXA-48) and NG_050336 (blaVIM).

2.2 Clinical Isolates—Bacterial Strains and Culture Condition

A total of 253 non-duplicated Enterobacteriaceae isolates were collected between 2012 and 2020 from clinical or screening samples routinely processed by the Microbiology Department at Charing Cross Hospital, ICHNT (Ethics protocol 06/Q0406/20). Species identification was performed using MALDI-TOF MS and carbapenemase mechanisms were determined using the Xpert Carba-R (Cepheid) or Resist-3 O.K.N assay (Corisbio). The isolates were subcultured on appropriate growth media and incubated at 37°C overnight, and the genomic DNA was extracted using the GenElute Bacterial Genomic DNA kit (Sigma-Aldrich) following the manufacturer’s instructions.

2.3 Primer Design

The genes used in this study encode 1) class A carbapenemases (blaKPC type), 2) class D oxacillinases (blaOXA-48) and 3) class B metalloenzymes (blaNDM, blaIMP and blaVIM). The sequences of these genes were downloaded from the GenBank website (http://www.ncbi.nlm.nih.gov/genbank/). Based on comprehensive analyses and alignments of each carbapenemase type using the MUSCLE algorithm, primers were specifically designed to amplify all alleles of each carbapenemase gene family described above (Edgar, 2004). Design and in-silico analysis were conducted using Geneious Prime 2020.1.2 (https://www.geneious.com). Primer characteristics were analysed through IDT OligoAnalyzer software (https://eu.idtdna.com/pages/tools/oligoanalyzer) using the J. SantaLucia thermodynamic table for melting temperature (Tm) evaluation, hairpin, self-dimer, and cross-primer formation (Multiple Primer Analyzer, www.thermofisher.com). The Tm of the amplification product of each gene was determined with the Melting Curve Predictions Software (uMELT) package (https://dna-utah.org/umelt/umelt.html). To confirm the specificity of the real-time digital PCR assays, the primers were first evaluated in a singleplex PCR environment, to ensure that they correctly amplified their respective loci and that the amplicons showed the predicted Tm, and then in multiplex format. All primers were synthesised by IDT (Coralville, IA, United States). Primer sequences and amplicon information are listed in Table 1.

TABLE 1

Primer sets developed in this study for the 5-plex PCR assay.

CPE target	Forward primer sequence (5′–3′)	Reverse primer sequence (5′–3′)	Amplicon size (bp)	Amplicon T_m (°C)
bla_IMP	CAGCAGAGYCTTTGCCAGATT	GCCACGYTCCACAAACCAA	203	86.5
bla_KPC	GGCTCAGGCGCAACTGTAA	GCCCAACTCCTTCAGCAACAA	273	95.5
bla_NDM	CGCGTGCTGKTGGTCGATA	GGCGAAAGTCAGGCTGTGTTG	240	96
bla_OXA-48	CGATTTGGGCGTGGTTAAGGAT	GTCGAGCCARAAACTGTCTAC	235	88.5
bla_VIM	CGAGGYAGAGGGGARCGAGATT	CTSTGCTTCCGGGTAGTGTT	275	94
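
Several primers in Table 1 contain letters other than A, C, G and T (Y, R, K, S). Assuming these are standard IUPAC degenerate codes, which is how a single primer can cover several alleles of a gene family, the following minimal sketch lists the concrete sequences each degenerate primer stands for. The function name `expand_primer` is illustrative and not part of the study.

```python
from itertools import product

# Standard IUPAC nucleotide codes; only the degenerate letters that occur in
# Table 1 (Y, R, K, S) are listed alongside the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT", "R": "AG", "K": "GT", "S": "CG",
}

def expand_primer(seq):
    """Return every non-degenerate sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]

# blaIMP reverse primer from Table 1: one degenerate position (Y = C or T).
for variant in expand_primer("GCCACGYTCCACAAACCAA"):
    print(variant)
```

A primer with two degenerate positions, such as the blaVIM forward primer, expands to four sequences.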


2.4 Multiplex Real-Time Digital PCR

Each amplification mix for dPCR experiments contained the following: 2 μl of SsoFast EvaGreen Supermix with Low ROX (BioRad, United Kingdom), 0.4 μl of 20X GE Sample Loading Reagent (Fluidigm PN 85000746), 0.2 μl of PCR grade water, 0.2 μl of 20X multiplex PCR primer mixture containing the five primer sets (10 μM of each primer), and 1.2 μl of different concentrations of synthetic DNA, samples or controls to bring the final volume to 4 μl. PCR cycling conditions consisted of a hot start step for 10 min at 95°C, followed by 45 cycles at 95°C for 20 s, 67°C for 45 s, and 72°C for 30 s. Melting curve analysis was performed with one cycle at 65°C for 3 s and reading from 65 to 97°C with an increment of 0.5°C. We used the integrated fluidic circuit controller to prime and load qdPCR 37K digital chips and Fluidigm’s Biomark HD system to perform the dPCR experiments, following the manufacturer’s instructions. Each digital chip contains 48 inlets, where each inlet is connected to a microfluidic panel consisting of 770 partitions or wells (0.85 nl well volume). In this study, we used a total of 7 qdPCR 37K digital chips, totalling 336 panels and 189,206 positive amplification reactions (29,165 from training and 160,041 from testing experiments).

2.5 Limit of Detection for the 5-Plex PCR Assay

Analytical sensitivity was evaluated with 10-fold dilutions of gBlocks® Gene Fragments containing the sequence for the five carbapenemase genes, ranging from 10¹ to 10⁶ DNA copies per reaction. Each experimental condition was run in triplicate. The qPCR assays were performed in a LightCycler 96 and the data were analysed using LC96 System software version SW1.1. Further details on the experimental conditions used for qPCR are provided in Supplementary Data S1.

2.6 Quantification of Clinical Isolates

Clinical isolates were quantified by real-time dPCR following the methodology proposed by Moniri et al. (2020b): Poisson statistics were used when the microfluidic panel occupancy was ≤85% (a maximum of 665 positive amplification events for a given panel), and quantification cycle (Cq) interpolation from digital bulk standards was used when panel occupancy was >85%. Digital bulk standards were generated by serial dilutions of the gBlocks® Gene Fragments containing the sequence for the five carbapenemase genes, ranging from 10¹ to 10⁵ DNA copies per panel. The Cq values were calculated by the Fluidigm Digital PCR Analysis software 2.1.1.
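
As an illustration of the Poisson branch of this scheme, the sketch below applies the standard digital PCR correction, in which the mean number of copies per well is estimated as −ln(1 − k/n) for k positive wells out of n. This is a generic textbook calculation rather than the authors' code; the function names and the 0.85 occupancy switch as implemented here are assumptions based on the description above.

```python
import math

N_WELLS = 770          # wells per panel (Section 2.4)
MAX_OCCUPANCY = 0.85   # above this, the text switches to Cq interpolation

def poisson_copies(positive, n_wells=N_WELLS):
    """Estimate the copies loaded into a panel from its count of positive wells."""
    if not 0 <= positive < n_wells:
        raise ValueError("Poisson estimate needs at least one negative well")
    copies_per_well = -math.log(1 - positive / n_wells)
    return copies_per_well * n_wells

def quantification_mode(positive, n_wells=N_WELLS):
    """Pick the quantification route described in the text for one panel."""
    occupancy = positive / n_wells
    return "poisson" if occupancy <= MAX_OCCUPANCY else "cq_standard_curve"

# Example: a panel with 519 positive wells (about 67% occupancy).
print(quantification_mode(519), round(poisson_copies(519)))
```

The correction matters most at high occupancy, where many wells receive more than one molecule and the raw count of positive wells underestimates the input.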

2.7 Machine Learning-Based Methods

The proposed method, AMCA, trains a supervised machine learning model in which the best-fit line, with its optimal intercept and coefficients, is calculated to minimize error when combining the predictions of amplification curve analysis (ACA) and MCA, as previously reported in Moniri et al. (2020a) and Moniri et al. (2020b). In this study, the ACA consists of applying a k-nearest neighbors (KNN) model (with parameter k = 10) to the entire real-time curve from each amplification event, whereas the MCA method consists of applying a logistic regression model to Tm values extracted from each melting curve (Cunningham and Delany, 2020). Both ACA and MCA output five probabilities, one for each target in the 5-plex. Therefore, as shown in the flowchart in Supplementary Figure S1, these probabilities are concatenated into 10 values, which are the input to the AMCA method. It is important to note that this classifier is tuned with its own cross-validation step to avoid overfitting. The classifier threshold for positive samples has been set at 5% of panel occupancy. Further details of the AMCA linear regression model are described in Supplementary Data S2.
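
The combination step can be sketched as follows. This is a simplified illustration, not the authors' implementation: the coefficients are hypothetical placeholders (a trained model would learn them), the highest-scoring target is taken as the event label, and the 5% threshold is interpreted here as the fraction of a panel's 770 wells assigned to a given target. Each of these choices is an assumption.

```python
from collections import Counter

TARGETS = ["blaIMP", "blaKPC", "blaNDM", "blaOXA-48", "blaVIM"]
N_WELLS = 770            # wells per panel (Section 2.4)
OCCUPANCY_CUTOFF = 0.05  # 5% of panel occupancy (assumed per-target reading)

def classify_event(aca_probs, mca_probs, coef, intercept):
    """Label one amplification event from its ACA and MCA probabilities."""
    features = list(aca_probs) + list(mca_probs)  # the 10 concatenated values
    scores = [b + sum(w * x for w, x in zip(row, features))
              for row, b in zip(coef, intercept)]
    return TARGETS[scores.index(max(scores))]

def call_panel(event_labels, n_wells=N_WELLS, cutoff=OCCUPANCY_CUTOFF):
    """Report the targets whose events reach the cutoff fraction of the panel."""
    counts = Counter(event_labels)
    return sorted(t for t, c in counts.items() if c / n_wells >= cutoff)

# Hypothetical coefficients: ACA and MCA weighted equally for each target.
coef = [[0.5 if j in (i, i + 5) else 0.0 for j in range(10)] for i in range(5)]
intercept = [0.0] * 5

# An event whose melting curve is ambiguous between blaKPC and blaNDM,
# but whose amplification curve clearly resembles blaNDM.
aca = [0.0, 0.1, 0.9, 0.0, 0.0]
mca = [0.0, 0.55, 0.45, 0.0, 0.0]
print(classify_event(aca, mca, coef, intercept))

# A panel with 300 blaNDM events and 20 stray blaKPC calls.
print(call_panel(["blaNDM"] * 300 + ["blaKPC"] * 20))
```

Under this reading, a handful of stray event-level calls does not turn a single-gene sample into a double infection unless those calls exceed the occupancy cutoff.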

2.8 Statistical Analysis

1) Sample size: The number of samples needed to provide statistically significant results was determined via the binomial proportion confidence interval method (Mercaldo et al., 2007). Under the assumption that the test has a sensitivity and specificity of 95% with a 5% margin of error, the required number of samples was determined as 72 (considerably fewer than the 221 used in this study). 2) AMCA cross-validation performance: Prior to evaluating the in-sample performance of the model, by using the 221 clinical isolates, the out-of-sample classification accuracy was estimated by 10-fold cross-validation on the training data (using stratified splits). 3) AMCA accuracy: The two-sided t-test with unknown variances was used to determine statistical significance for comparing the classification accuracy of AMCA against MCA. Prior to this test, a Lilliefors test was used to determine normality of the distributions and the Bartlett test for equal/unequal variances. A p-value of 0.05 was used as a threshold for statistical significance for all tests.
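
For orientation, the sample size in item 1 can be approximated with the common normal-approximation formula n = z²·p·(1 − p)/d², where p is the assumed sensitivity or specificity and d the margin of error. This is only a sketch of one standard calculation; the exact procedure used in the study is not spelled out here and may differ. With p = 0.95 and d = 0.05 it yields a value of about 73, close to the 72 reported.

```python
from statistics import NormalDist

def sample_size(p, margin, confidence=0.95):
    """Normal-approximation sample size for estimating a proportion p."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return z ** 2 * p * (1 - p) / margin ** 2

n = sample_size(p=0.95, margin=0.05)
print(round(n, 1))
```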

3 Results

3.1 Primer Characterisation for Optimal Multiplex PCR Assay Performance

3.1.1 In-Silico Analysis

To test the inclusivity and exclusivity of the 5-plex PCR assay, primers were subjected to a general NCBI BlastN search against more than 500 sequences per target. Inclusivity results showed over 99% identity coverage for each target (inclusivity alignments are provided in Supplementary Figures S2–S6). For exclusivity analysis, BlastN hits with an identity score lower than 80% were regarded as negative. No cross-reactivity was observed with other sequences deposited in the database.

3.1.2 Experimental Results in qPCR

The 5-plex PCR assay was validated using a conventional qPCR platform with synthetic DNA templates at concentrations ranging from 10¹ to 10⁶ DNA copies/reaction. Supplementary Figure S7 shows the real-time amplification, melting and standard curves obtained from analytical sensitivity experiments. The amplification and melting curves have a distinct shape and Tm value distribution for each target, respectively, which is beneficial for AMCA classification. Observed Tm values for blaIMP, blaKPC, blaNDM, blaOXA-48 and blaVIM are 81.4, 89.5, 90.2, 83.8 and 87.9°C, respectively. Moreover, each primer set (in a multiplex environment) shows a limit-of-detection (LOD) of 10 DNA copies/reaction. Corresponding standard curves, illustrating the Cq value as a function of the target concentration, yield an assay efficiency of 87.3, 103.5, 105.7, 98.7, and 88.1%, respectively. PCR products were absent in all the negative controls.

3.1.3 Experimental Results in Real-Time dPCR

The 5-plex PCR assay was further validated in the dPCR platform with synthetic DNA templates at concentrations ranging from 10¹ to 10⁵ DNA copies per panel, which were chosen such that we observe amplification events in both the single-molecule and bulk regions to capture kinetic information in both domains. Figure 2A shows end-point photographs (cycle 45) of panels at increasing amounts of DNA. A total of 29,165 positive amplification reactions were obtained. As shown in Figure 2B, a digital bulk standard curve for each target was built using the real-time dPCR instrument. As this microfluidic platform is capable of real-time data collection, quantification cycle values were used to generate the standard curves by plotting the Cq values against log[quantity] of a ten-fold serial dilution of each DNA target. It can be observed that there is a clear separation between the single-molecule (10¹ to 10² copies/panel) and the bulk regions (10⁴ to 10⁵ copies/panel) based on Cq value ranges, where 10³ copies/panel acts as a transition region across all the targets. In the non-saturated panels we can observe a digital pattern (number of ONs and OFFs) at the end of the reaction, and the number of input molecules can be calculated using binomial and Poisson statistics (Quan et al., 2018), whereas in the saturated panels the number of input molecules can be quantified using the digital bulk standard curve (as in qPCR). Digital bulk standard curves yield an assay efficiency of 118.1, 98.7, 86.2, 100.8, and 90.2% for the blaIMP, blaKPC, blaNDM, blaOXA-48 and blaVIM assays, respectively. Table 2 reports the standard curve parameters for each assay, digital count and panel occupancy. Figures 3A,B, respectively, show the amplification and melting curves for the five carbapenem-resistant genes and the average characteristic sigmoidal shape for each target (black solid line) in real-time dPCR.
Figure 3C represents the distribution of melting temperature, where the Tm range for each target is computed as: blaIMP (81.3, 83.2°C), blaKPC (89.0, 91.5°C), blaNDM (90.0, 92.7°C), blaOXA-48 (83.7, 86.6°C), and blaVIM (87.7, 90.8°C). After peak detection, negative reactions can be confirmed by identifying curves with no peak.
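
The efficiencies quoted above follow from the standard-curve slopes through the usual relation E = 10^(−1/slope) − 1. The sketch below applies it to the slopes in Table 2 and also shows Cq interpolation for saturated panels, assuming the "Constant" column is the intercept of Cq = slope · log10(copies) + constant. The function names are illustrative.

```python
# Slope and constant (assumed intercept) per target, taken from Table 2.
STANDARD_CURVES = {
    "blaIMP": (-2.953, 37.875),
    "blaKPC": (-3.354, 38.275),
    "blaNDM": (-3.705, 40.62),
    "blaOXA-48": (-3.304, 38.01),
    "blaVIM": (-3.582, 39.96),
}

def efficiency_percent(slope):
    """Amplification efficiency (%) implied by a standard-curve slope."""
    return (10 ** (-1 / slope) - 1) * 100

def copies_from_cq(cq, slope, constant):
    """Interpolate copies per panel from a Cq value on the standard curve."""
    return 10 ** ((cq - constant) / slope)

for target, (slope, constant) in STANDARD_CURVES.items():
    print(target, round(efficiency_percent(slope), 1))
```

A slope near −3.32 corresponds to perfect doubling per cycle (100% efficiency); shallower slopes give values above 100% and steeper slopes values below it.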

FIGURE 2

Standard curve in real-time digital PCR. (A) Digital patterns for each microfluidic panel at increasing concentrations (770 reaction chambers per panel; 0.85 nL volume per chamber). (B) Standard curves correlating the Cq values with the concentration of each target; shaded blue area indicates the single-molecule region; shaded orange shows the bulk region; and the middle area displays the theoretical transition between the single-molecule and bulk regions.

TABLE 2

Standard curve parameters in real-time digital PCR.

Target	Slope	Constant	Rsqr^a	Eff. (%)^b	10¹ cp/pnl (occ.)^c, single-molecule region	10² cp/pnl (occ.)^c, single-molecule region	10³ cp/pnl (occ.)^c, transition region	10⁴ cp/pnl (occ.)^c, bulk region	10⁵ cp/pnl (occ.)^c, bulk region
bla_IMP	−2.953	37.875	0.978	118.111	7 (0.9%)	51 (6.6%)	519 (67.4%)	770 (100.0%)	768 (99.7%)
bla_KPC	−3.354	38.275	0.993	98.661	5 (0.6%)	56 (7.3%)	398 (51.7%)	769 (99.9%)	770 (100.0%)
bla_NDM	−3.705	40.62	0.996	86.174	4 (0.5%)	21 (2.7%)	190 (24.7%)	767 (99.6%)	769 (99.9%)
bla_OXA-48	−3.304	38.01	0.998	100.77	3 (0.4%)	25 (3.2%)	321 (41.7%)	769 (99.9%)	768 (99.7%)
bla_VIM	−3.582	39.96	0.994	90.169	6 (0.8%)	59 (7.7%)	659 (85.6%)	770 (100.0%)	770 (100.0%)

^a R-squared.

^b Efficiency (%).

^c Copies/panel (% occupancy in digital PCR). The occupancy is calculated by counting the number of amplification reactions occurring in each panel and dividing it by the total number of wells (N = 770).

FIGURE 3

Real-time amplification and melting curves obtained from the dPCR instrument. (A) Raw amplification curves at different concentrations from synthetic DNA templates; the black line represents the average trend of the kinetic information based on each specific target-primer interaction. (B) Melting curves across the five different CPO targets; the black line represents the average trend of the thermodynamic information based on each specific target-primer interaction. (C) Melting peak (Tm) distribution from the dPCR instrument, showing the probability density function (PDF) for each target.

3.2 Clinical Isolates

As depicted in Table 3, the 253 pure bacterial strains were identified from MALDI-TOF MS as Acinetobacter spp. (n = 2), Citrobacter spp. (n = 16), Enterobacter spp. (n = 37), Escherichia spp. (n = 57), Klebsiella sp. (n = 133), Proteus sp. (n = 1), Pseudomonas sp. (n = 5), and Serratia sp. (n = 2). Carbapenemase genes were determined as a single enzyme in 220 strains (blaIMP = 45; blaKPC = 9; blaNDM = 74; blaOXA-48 = 84; blaVIM = 8), and as a combination in one isolate (blaNDM and blaOXA-48). Thirty-two isolates were confirmed as negative for the five carbapenemase genes. A more detailed description of each isolate, including bacterial species, date of sampling, specimen type, antibiotic resistance mechanisms and concentration (copies/µl of extracted DNA) can be found in Supplementary Table S1.

TABLE 3

Clinical Enterobacteriaceae isolates used in this study.

Species (MALDI-TOF MS)	Carbapenemase gene	Number of isolates
Citrobacter spp.	bla_IMP	1
	bla_KPC	2
	bla_NDM	1
	bla_OXA-48	10
	bla_VIM	1
Enterobacter spp.	bla_IMP	20
	bla_NDM	7
	bla_OXA-48	2
	bla_VIM	2
Escherichia spp.	bla_IMP	7
	bla_NDM	14
	bla_NDM and bla_OXA-48	1
	bla_OXA-48	26
Klebsiella pneumoniae	bla_IMP	15
	bla_KPC	6
	bla_NDM	51
	bla_OXA-48	45
	bla_VIM	3
Proteus mirabilis	bla_NDM	1
Pseudomonas aeruginosa	bla_IMP	2
Pseudomonas aeruginosa	bla_VIM	2
Serratia marcescens	bla_KPC	1
Serratia marcescens	bla_OXA-48	1
Multiple species*	negative	32

*CPO-negative species: Acinetobacter baumannii, Citrobacter freundii, Enterobacter spp., Escherichia coli, Klebsiella pneumoniae, and Pseudomonas aeruginosa.


3.3 The AMCA Model: Training and Cross-Validation

Our study aims to validate the performance of the AMCA method for detection of carbapenem-resistant genes in clinical isolates compared with the MCA approach. To train both models, a total of 99,860 amplification events were generated using synthetic DNA templates, of which 29,165 were positive: blaIMP (N = 4,941), blaKPC (N = 5,940), blaNDM (N = 5,870), blaOXA-48 (N = 4,333), and blaVIM (N = 8,081). The observed overall classification performance on the training dataset for the MCA and AMCA methods was 94.9 ± 21.99% and 99.2 ± 8.86%, respectively. Supplementary Figure S8 shows the confusion matrices comparing the true and predicted targets for both methods. It can be observed that the blaNDM and blaKPC targets are misclassified by the MCA method, whereas the AMCA considerably improves the prediction of both targets: the number of misclassified amplification events falls from 804 to 52 for blaNDM, and from 511 to 46 for blaKPC. No other target was misclassified in more than 1.26% of events by either method.

3.4 The AMCA Model: Clinical Validation

A total of 253 clinical isolates, including 221 positives, and 224,840 amplification events (of which 160,041 were positive) were used for the clinical validation. Compared with the results obtained with the Xpert Carba-R (Cepheid) and Resist-3 O.K.N assays, the overall observed accuracy was 91.7% (CI 87.59–94.79%) for MCA and 99.6% (CI 97.82–99.99%) for AMCA, which represents a 7.9% increase (p-value < 0.01) (Supplementary Figure S9). A total of 21 clinical isolates were misclassified by the MCA method and considered false positives (FP), as shown in Table 4, whereas the AMCA reduced the number of misclassified samples to 1 (Table 5). All the false positive samples were identified as double infections because of the overlapping Tm distributions, as shown in Figure 3. The performance improvement of the AMCA method is due to the addition of real-time amplification data, in contrast to the MCA approach, which only takes into account the melting curve distribution. Further details on AMCA coefficient contributions (i.e., ACA and MCA weights) are shown in Supplementary Figure S10. Moreover, 32 bacterial isolates not carrying the five carbapenemase genes were used to evaluate the assay specificity. The 5-plex PCR assay showed negative results in the absence of the specific target.
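
The overlap in melting temperatures can be made concrete with the Tm ranges reported for Figure 3C in Section 3.1.3. The short sketch below, which is illustrative and not part of the study's pipeline, lists the target pairs whose ranges intersect; events falling in such a shared interval cannot be separated on Tm alone.

```python
from itertools import combinations

# Tm ranges (°C) per target, as reported for Figure 3C.
TM_RANGES = {
    "blaIMP": (81.3, 83.2),
    "blaKPC": (89.0, 91.5),
    "blaNDM": (90.0, 92.7),
    "blaOXA-48": (83.7, 86.6),
    "blaVIM": (87.7, 90.8),
}

def overlap_width(a, b):
    """Width in °C shared by two (low, high) ranges; 0.0 if they are disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

overlapping_pairs = {
    (t1, t2): round(overlap_width(TM_RANGES[t1], TM_RANGES[t2]), 1)
    for t1, t2 in combinations(TM_RANGES, 2)
    if overlap_width(TM_RANGES[t1], TM_RANGES[t2]) > 0
}
print(overlapping_pairs)
```

The blaKPC and blaNDM ranges share a 1.5°C interval, which is consistent with the blaNDM/blaKPC confusions reported for the MCA method.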

TABLE 4

Classification of clinical isolates when using the ML-based MCA method.

Target	N	TP	TN ^a	FP	SEN (%)	SPE (%)	Accuracy (CI)
bla_IMP	45	45	32	0	100.0	100.0	100.0% (95.32–100.00%)
bla_KPC	9	8	32	1 ^b	100.0	96.97	97.56% (87.14–99.94%)
bla_NDM	74	54	32	20 ^c	100.0	61.54	81.13% (72.38–88.08%)
bla_OXA-48	84	84	32	0	100.0	100.0	100.0% (96.87–100.00%)
bla_VIM	8	8	32	0	100.0	100.0	100.0% (91.19–100.00%)
bla_OXA-48 and bla_NDM	1	1	32	0	100.0	100.0	100.0% (97.24–100.00%)
Total	221	200	32	21	100.0	60.38	91.70% (87.59–94.79%)

Abbreviations: N, number of samples; TP, true positive; TN, true negative; FP, false positive; FN, false negative; SEN, sensitivity; SPE, specificity; CI, confidence interval.

^a A total of 32 negative samples are considered across all the groups for sensitivity, specificity and accuracy calculation.

^b This isolate was misclassified as a blaNDM and blaKPC double infection.

^c These isolates were misclassified as blaNDM and blaKPC double infections.

TABLE 5

Classification of clinical isolates based on ML-based AMCA method.

Target	N	TP	TN ^a	FP	SEN (%)	SPE (%)	Accuracy (CI)
bla_IMP	45	45	32	0	100.0	100.0	100.0% (95.32–100.00%)
bla_KPC	9	9	32	0	100.0	100.0	100.0% (91.40–100.00%)
bla_NDM	74	73	32	1 ^b	100.0	96.97	99.06% (94.86–99.98%)
bla_OXA-48	84	84	32	0	100.0	100.0	100.0% (96.87–100.00%)
bla_VIM	8	8	32	0	100.0	100.0	100.0% (91.19–100.00%)
bla_OXA-48 and bla_NDM	1	1	32	0	100.0	100.0	100.0% (97.24–100.00)
Total	221	220	32	1	100.0	96.97	99.60% (97.82 to 99.99%)

Abbreviations: N, number of samples; TP, true positive; TN, true negative; FP, false positive; FN, false negative; SEN, sensitivity; SPE, specificity; CI, confidence interval.

^a A total of 32 negative samples are considered across all groups for the sensitivity, specificity and accuracy calculations.

^b This isolate was misclassified as a blaNDM and blaKPC double infection.
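
The figures in Tables 4 and 5 can be recomputed from the raw counts. The following is a minimal sketch, assuming that misclassified positive isolates are counted as false positives (as in the tables, so FN = 0) and that the reported intervals are exact (Clopper-Pearson) binomial intervals at the 95% level. The interval method is not named in this section, so that choice is an assumption, and all function names are illustrative.

```python
from math import comb

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(0, k + 1))

def solve_increasing(f, iters=60):
    """Bisection on [0, 1] for the root of a function that increases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial interval for k successes out of n (assumed method)."""
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: upper_tail(k, n, p) - alpha / 2)
    upper = 1.0 if k == n else solve_increasing(
        lambda p: alpha / 2 - lower_tail(k, n, p))
    return lower, upper

def metrics(tp, tn, fp, fn=0):
    """Sensitivity, specificity and accuracy (with interval) from raw counts."""
    n = tp + tn + fp + fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / n,
        "accuracy_ci": clopper_pearson(tp + tn, n),
    }

# "Total" rows of Table 4 (MCA) and Table 5 (AMCA)
mca_total = metrics(tp=200, tn=32, fp=21)
amca_total = metrics(tp=220, tn=32, fp=1)

for name, m in (("MCA", mca_total), ("AMCA", amca_total)):
    lo, hi = m["accuracy_ci"]
    print(f"{name}: accuracy {100 * m['accuracy']:.2f}% "
          f"(CI {100 * lo:.2f}-{100 * hi:.2f}%), "
          f"specificity {100 * m['specificity']:.2f}%")
```

Bisection over the binomial tail sums keeps the sketch free of external dependencies; a statistics library would normally be used for the interval instead.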


4 Discussion

In the last decade, novel pandemic outbreaks and the continued threat of emerging multi-drug resistant microorganisms have significantly increased the demand for molecular tests, in particular PCR-based methods (Nishizawa and Suzuki, 2014; Vasala et al., 2020). To respond to this need, the AMCA technology has been designed to increase the throughput of real-time molecular platforms. Seamlessly integrated with conventional diagnostic workflows, this machine learning-based approach can enhance the multiplexing capabilities of traditional qPCR and state-of-the-art dPCR instruments, increasing the number of nucleic acid targets that can be identified in a single fluorescent channel without hardware modifications. Individual primer sets produce amplification products at a sequence-specific amplification rate and efficiency, which generate unique amplification and melting curves for different target concentrations. Such curves can be captured as time-series data by real-time instruments, fed into machine learning models and used to identify multidimensional patterns (or signatures) specific to each primer set, thereby enabling the identification of multiple DNA targets per fluorescent channel using only real-time data (i.e., data-driven multiplexing). In this paper, we performed a clinical validation of the diagnostic accuracy of the AMCA methodology by targeting the “big five” carbapenem-resistant genes (blaVIM, blaOXA-48, blaNDM, blaIMP and blaKPC) in multiplex PCR. A 5-plex PCR assay was developed and characterised in both real-time qPCR and dPCR instruments, and the AMCA performance was investigated through the identification of 253 clinical isolates from patients’ samples. The MCA was used as a reference method to compare results. We successfully show a 99.6% accuracy for identifying the five carbapenem-resistant genes in the clinical isolates. The AMCA method was shown to enhance the classification performance by 7.9% compared to MCA.
The AMCA takes advantage of the volume of raw data extracted from amplification and melting curves, whereas the MCA only considers melting curves. It is interesting to observe that the overlapping melting curve distribution in Figure 3B (e.g., blaNDM and blaKPC) represents a misclassification of 1,303 reactions (509 blaKPC as blaNDM, and 804 blaNDM as blaKPC) and 21 clinical isolates (20 blaNDM and 1 blaKPC as coinfections) when using the MCA, but it only represents a misclassification of 99 reactions and 1 clinical isolate for the AMCA method. As described in previous publications (Moniri et al., 2020a), these results support the hypothesis that the underlying biological factors driving these methods for target identification are fundamentally different. As observed in Supplementary Figure S10, machine learning methods can be used to exploit the distinctive information contained in the amplification and melting curves by weighting the predictions from the ACA and MCA to optimally combine them and maximise the AMCA performance. Although dPCR is not likely to replace all qPCR assays in the clinical laboratory due to the associated instrument costs and greater complexity, it has several specific advantages over qPCR. The vast number of partitions reduces the likelihood of coamplification and of inhibitors in a single reaction, facilitating accurate detection of multiple analytes; and the large amount of data enables the use of advanced machine learning algorithms to detect subtle kinetic and thermodynamic differences encoded in the real-time amplification data.
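
To illustrate the weighting idea described above, the sketch below combines per-reaction class probabilities from an amplification-curve classifier (ACA) and a melting-curve classifier (MCA) using a single weight. It is an illustration under stated assumptions, not the published AMCA model: the probability values, the weight `w_aca` and the function names are hypothetical, and the actual coefficient contributions are those reported in Supplementary Figure S10.

```python
TARGETS = ["blaIMP", "blaKPC", "blaNDM", "blaOXA-48", "blaVIM"]

def combine_predictions(p_aca, p_mca, w_aca=0.5):
    """Convex combination of two probability vectors, renormalised to sum to 1.

    w_aca is a hypothetical weight on the amplification-curve prediction;
    the melting-curve prediction receives 1 - w_aca.
    """
    if not 0.0 <= w_aca <= 1.0:
        raise ValueError("w_aca must lie in [0, 1]")
    combined = [w_aca * a + (1.0 - w_aca) * m for a, m in zip(p_aca, p_mca)]
    total = sum(combined)
    return [c / total for c in combined]

def predict_target(p_aca, p_mca, w_aca=0.5):
    """Return the target with the highest combined probability."""
    probs = combine_predictions(p_aca, p_mca, w_aca)
    best = max(range(len(probs)), key=probs.__getitem__)
    return TARGETS[best]

# Hypothetical reaction: the melting curve is ambiguous between blaKPC and
# blaNDM (overlapping Tm), while the amplification curve favours blaNDM.
p_mca = [0.01, 0.50, 0.47, 0.01, 0.01]
p_aca = [0.02, 0.20, 0.74, 0.02, 0.02]

print(predict_target(p_aca, p_mca, w_aca=0.0))  # melting curve only
print(predict_target(p_aca, p_mca, w_aca=0.5))  # equal weighting
```

In this made-up example the melting curve alone favours blaKPC, whereas adding the amplification-curve prediction shifts the call to blaNDM, which mirrors the kind of ambiguity the overlapping Tm distributions create.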
In addition, real-time dPCR platforms enable the use of digital bulk standards and offer a valuable solution for absolute quantification of clinical isolates (equivalent to conventional qPCR standards) even when the panels are saturated. This expands the dynamic range of quantification of the microfluidic chips and eliminates the need to test the samples at multiple dilutions to ensure that at least one of them falls within the conventional dPCR range (i.e., panels at occupancy <85%). As shown in Figure 2, it is possible to create a standard curve in real-time dPCR by extracting Cq values as a function of the target concentration because there is a clear separation between the single-molecule and the bulk regions. We envision that coupling real-time dPCR instruments with data-driven multiplexing will expand the use of these platforms in clinical microbiology laboratories. The results presented in this study represent a step forward in the use of PCR-based data-driven diagnostics for clinical applications. However, there are several aspects that need to be further investigated. Firstly, in this paper we evaluated the performance of the AMCA method in clinical isolates using pure bacterial cultures; therefore, a follow-up study needs to be conducted to evaluate the performance of the method directly from clinical samples. Secondly, identifying the co-presence of infections is important for patient treatment, but this paper includes only one sample with a double infection; a larger study will be required to test the effectiveness of the AMCA in double pathogen identification. Depending on the sample concentration, this might not limit multiplexing capabilities in dPCR, but it could represent a challenge when qPCR instruments are used. This work suggests that the AMCA approach provides a versatile solution for the accurate detection of AMR genes, representing a cost-effective interaction as it does not require hardware modifications.
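
The standard-curve approach mentioned above (Cq as a function of target concentration, Figure 2) amounts to a linear fit of Cq against log10 concentration, which can then be inverted to quantify an unknown sample. The sketch below uses synthetic, idealised values (perfect doubling per cycle and an arbitrary intercept of 40 cycles), not data from this study; the function names are illustrative.

```python
from math import log10, log2

def fit_standard_curve(copies, cq):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    x = [log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(cq) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means doubling every cycle."""
    return 10 ** (-1.0 / slope) - 1.0

def quantify(cq, slope, intercept):
    """Invert the standard curve to estimate copies from a measured Cq."""
    return 10 ** ((cq - intercept) / slope)

# Synthetic standards: ten-fold dilution series with ideal doubling per cycle
standards = [1e2, 1e3, 1e4, 1e5, 1e6]
cq_values = [40 - log2(c) for c in standards]

slope, intercept = fit_standard_curve(standards, cq_values)
efficiency = amplification_efficiency(slope)
print(f"slope {slope:.3f}, intercept {intercept:.2f}, efficiency {efficiency:.2%}")
print(f"estimated copies at Cq 30: {quantify(30, slope, intercept):.0f}")
```

With real data the points would scatter around the fitted line, so the fit quality (e.g., R²) and the range of the standards would also need to be checked before quantifying unknowns.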
This study highlights the importance of integrating artificial intelligence into diagnostics and shows how effectively it increases the reliability of results from state-of-the-art dPCR instruments. Moreover, the AMCA methodology has the potential for further application in point-of-care devices and isothermal chemistries, as a way to improve identification accuracy and enable faster detection of multiple pathogens.

29 in total

1. Counting the cost of an outbreak of carbapenemase-producing Enterobacteriaceae: an economic evaluation from a hospital perspective.

Authors: J A Otter; P Burgess; F Davies; S Mookerjee; J Singleton; M Gilchrist; D Parsons; E T Brannigan; J Robotham; A H Holmes
Journal: Clin Microbiol Infect Date: 2016-10-13 Impact factor: 8.067

2. Amplification Curve Analysis: Data-Driven Multiplexing Using Real-Time Digital PCR.

Authors: Ahmad Moniri; Luca Miglietta; Kenny Malpartida-Cardenas; Ivana Pennisi; Miguel Cacho-Soblechero; Nicolas Moser; Alison Holmes; Pantelis Georgiou; Jesus Rodriguez-Manzano
Journal: Anal Chem Date: 2020-09-18 Impact factor: 6.986

3. Clearance of carbapenemase-producing Enterobacteriaceae (CPE) carriage: a comparative study of NDM-1 and KPC CPE.

Authors: Y J Lim; H Y Park; J Y Lee; S H Kwak; M N Kim; H Sung; S-H Kim; S H Choi
Journal: Clin Microbiol Infect Date: 2018-06-02 Impact factor: 8.067

4. Confidence intervals for predictive values with an emphasis to case-control studies.

Authors: Nathaniel D Mercaldo; Kit F Lau; Xiao H Zhou
Journal: Stat Med Date: 2007-05-10 Impact factor: 2.373

Review 5. Mechanisms of Helicobacter pylori antibiotic resistance and molecular testing.

Authors: Toshihiro Nishizawa; Hidekazu Suzuki
Journal: Front Mol Biosci Date: 2014-10-24

Review 6. dPCR: A Technology Review.

Authors: Phenix-Lan Quan; Martin Sauzade; Eric Brouzes
Journal: Sensors (Basel) Date: 2018-04-20 Impact factor: 3.576

Review 7. Carbapenem Resistance: A Review.

Authors: Francis S Codjoe; Eric S Donkor
Journal: Med Sci (Basel) Date: 2017-12-21

8. A PCR-based diagnostic testing strategy to identify carbapenemase-producing Enterobacteriaceae carriers upon admission to UK hospitals: early economic modelling to assess costs and consequences.

Authors: Eoin Moloney; Kai Wai Lee; Dawn Craig; A Joy Allen; Sara Graziadio; Michael Power; Carolyn Steeds
Journal: Diagn Progn Res Date: 2019-04-18

9. Emergence and clonal spread of colistin resistance due to multiple mutational mechanisms in carbapenemase-producing Klebsiella pneumoniae in London.

Authors: Jonathan A Otter; Michel Doumith; Frances Davies; Siddharth Mookerjee; Eleonora Dyakova; Mark Gilchrist; Eimear T Brannigan; Kathleen Bamford; Tracey Galletly; Hugo Donaldson; David M Aanensen; Matthew J Ellington; Robert Hill; Jane F Turton; Katie L Hopkins; Neil Woodford; Alison Holmes
Journal: Sci Rep Date: 2017-10-05 Impact factor: 4.379

10. Framework for DNA Quantification and Outlier Detection Using Multidimensional Standard Curves.

Authors: Ahmad Moniri; Jesus Rodriguez-Manzano; Kenny Malpartida-Cardenas; Ling-Shan Yu; Xavier Didelot; Alison Holmes; Pantelis Georgiou
Journal: Anal Chem Date: 2019-05-14 Impact factor: 6.986
