Nam K Tran, Taylor Howard, Ryan Walsh, John Pepper, Julia Loegering, Brett Phinney, Michelle R Salemi, Hooman H Rashidi.
Abstract
The 2019 novel coronavirus infectious disease (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has created an unsustainable need for molecular diagnostic testing. Molecular approaches such as reverse transcription (RT) polymerase chain reaction (PCR) offer a highly sensitive and specific means to detect SARS-CoV-2 RNA. However, although RT-PCR is the accepted "gold standard", molecular platforms often require a tradeoff between speed and throughput. Matrix-assisted laser desorption ionization (MALDI)-time of flight (TOF)-mass spectrometry (MS) has been proposed as a potential solution for COVID-19 testing that balances analytical performance, speed, and throughput without relying on impacted supply chains. Combined with machine learning (ML), this MALDI-TOF-MS approach could overcome logistical barriers encountered by current testing paradigms. We evaluated the analytical performance of an ML-enhanced MALDI-TOF-MS method for COVID-19 screening. Residual nasal swab samples from adult volunteers were tested and the results compared against RT-PCR. Two optimized ML models were identified: the first exhibited an accuracy of 98.3%, positive percent agreement (PPA) of 100%, and negative percent agreement (NPA) of 96%; the second exhibited an accuracy of 96.6%, PPA of 98.5%, and NPA of 94%. Machine learning enhanced MALDI-TOF-MS for COVID-19 testing exhibited performance comparable to existing commercial SARS-CoV-2 tests.
Year: 2021 PMID: 33859233 PMCID: PMC8050054 DOI: 10.1038/s41598-021-87463-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Comparison of common emergency use authorized diagnostic methods for evaluating COVID-19 tests.
A. Molecular assays

| Manufacturer/Platform | Method | Throughput | RNA targets | LoD (NDU/mL) | PPA (%) | NPA (%) |
|---|---|---|---|---|---|---|
| Abbott Molecular/Alinity m | RT-PCR | 300 tests/8 h [a] | N1/N2 | 2700 | > 99.0 | > 99.0 |
| Becton Dickinson/BDMax | RT-PCR | 12 tests/3 h | N1/N2 | 1800 | > 99.0 | > 99.0 |
| BioFire Defense/FilmArray | RT-PCR | 1 test/45 min | ORF1ab/ORF8 | 5400 | > 99.0 | > 99.0 |
| Bio-Rad/QX200 | ddPCR | 96 tests/6 h | N1/N2 | 600 | > 99.0 | > 99.0 |
| CDC/ABI 7500 | RT-PCR | 21 tests/5 h | N1/N2 | 18,000 | > 99.0 | > 99.0 |
| Cepheid/GeneXpert | RT-PCR | 1–16 tests/1 h [b] | N2/E | 5400 | > 99.0 | > 99.0 |
| Hologic/Panther Fusion | RT-TMA | 500 tests/8 h | ORF1ab | 600 | > 99.0 | > 99.0 |
| Thermo Fisher/Amplitude | RT-PCR | 3000 tests/24 h | ORF1ab/N/S | 180 | > 99.0 | > 99.0 |
| Roche Molecular systems/cobas 6800 | RT-PCR | 94 tests/3 h | ORF1ab/E | 1800 | > 99.0 | > 99.0 |
| Roche Molecular systems/cobas Liat | RT-PCR | 1 test/20 min | ORF1ab/N | 5400 | > 99.0 | > 99.0 |
ABI, Applied Biosystems; Ag, antigen; CDC, Centers for Disease Control and Prevention; ddPCR, digital droplet PCR; E, envelope protein gene; IMCA, immunochromatographic membrane assay; LFIA, lateral flow immunofluorescent assay; LoD, limit of detection; N, nucleoprotein gene; NDU, nucleic acid test detectable units; NPA, negative percent agreement; ORF, open reading frame; PCR, polymerase chain reaction; PPA, positive percent agreement; RNA, ribonucleic acid; RT, reverse transcription; S, spike protein gene; TCID50, median tissue culture infective dose; TMA, transcription mediated amplification.
[a] Random access capable.
[b] Instrument model dependent.
Figure 1. Conceptual drawing of study workflow. The study workflow consisted of patients providing a nasal swab specimen preserved in saline transport media. Media was tested by RT-PCR (Step 1) and swabs were plated onto the MALDI-TOF-MS platform (Step 2). Mass spectra were standardized (Step 3) and then analyzed using machine learning via the Auto-ML MILO platform (Step 4). COVID-19 status is then exported to a smart device app (Step 5).
Figure 2. Machine Intelligence Learning Optimizer (MILO). The MILO auto-machine learning (ML) infrastructure begins with two datasets: (a) a balanced dataset (Dataset A) used for training and initial validation, and (b) an unbalanced dataset (Dataset B) used for generalization/secondary testing. MILO first removes missing values and then provides several scaling options for the given dataset, which are then assessed by the software. Unsupervised ML is then used for feature selection and feature engineering. During the supervised ML stage, the generated models are trained on a subset (80%) of Dataset A (depicted as Dataset 1 in the image above) and initially tested with the remaining subset (20%) of Dataset A. Following this training/initial validation stage, each ML model is secondarily tested on Dataset B (depicted as Dataset 2 in the image above) for generalization testing. Selected models can then be deployed as joblib files. For this study, we imported the study data into MILO using COVID-19 status as the outcome measure for analysis. The following functions are then performed automatically by MILO.
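The stages named in this caption (remove missing values, scale, select features, train on 80% of Dataset A, validate on the remaining 20%, test generalization on Dataset B, export as a joblib file) can be approximated with scikit-learn. This is a minimal sketch and not MILO's own code or API: the data are synthetic, the dataset sizes and the helper names `make_synthetic_spectra` and `drop_missing` are hypothetical, and `SelectPercentile` with an ANOVA F-score is a supervised stand-in for MILO's feature selection step, whose internals the caption does not specify.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)


def make_synthetic_spectra(n_pos, n_neg, n_peaks=40):
    """Hypothetical stand-in for a table of MS peak intensities (not study data)."""
    X = rng.normal(size=(n_pos + n_neg, n_peaks))
    y = np.array([1] * n_pos + [0] * n_neg)
    X[y == 1, :5] += 2.0  # shift a few peaks so the two classes differ
    return X, y


def drop_missing(X, y):
    """Remove samples (rows) that contain any missing value."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]


# Dataset A is balanced, Dataset B is unbalanced (sizes are illustrative only).
X_a, y_a = drop_missing(*make_synthetic_spectra(50, 50))
X_b, y_b = drop_missing(*make_synthetic_spectra(20, 60))

# 80% of Dataset A for training, 20% for initial validation.
X_train, X_val, y_train, y_val = train_test_split(
    X_a, y_a, test_size=0.2, stratify=y_a, random_state=0
)

model = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectPercentile(f_classif, percentile=25)),  # keep 25% of peaks
    ("clf", GradientBoostingClassifier(random_state=0)),
])
model.fit(X_train, y_train)

val_accuracy = model.score(X_val, y_val)  # initial validation on Dataset A
gen_accuracy = model.score(X_b, y_b)      # secondary (generalization) test on Dataset B
n_selected = int(model.named_steps["select"].get_support().sum())

# Export the fitted pipeline as a joblib file and reload it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)

print(f"peaks kept: {n_selected}, validation acc: {val_accuracy:.2f}, "
      f"Dataset B acc: {gen_accuracy:.2f}")
```

Keeping scaling and feature selection inside the `Pipeline` means both are fitted on the training subset only, so neither the validation subset nor Dataset B influences the preprocessing.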
Figure 3. Study datasets. A total of 226 asymptomatic and symptomatic patients were enrolled. Twenty-seven samples were invalid due to polymer contamination, preventing MALDI-TOF-MS analysis. The remaining 199 were successfully tested by MALDI-TOF-MS and produced spectra. These data were divided into Datasets A and B, with Dataset A serving as the training/initial validation dataset. Optimized models produced from Dataset A were then secondarily tested with Dataset B for generalization to assess their true performance.
Figure 4. Example MALDI-TOF-MS spectra and PCA analysis of COVID-19 positive vs. negative samples. (A) Average MALDI-TOF-MS spectra for patients who were SARS-CoV-2 RNA PCR positive (pink) versus PCR negative (blue). Zoomed-in regions of interest are also shown. The x-axis is the mass to charge (m/z) ratio and the y-axis is relative abundance. (B, C) Unscaled and scaled PCA, respectively, for the 199 samples (red = positive, blue = negative) tested by the MALDI-TOF-MS method. (D) A pair of example (COVID-19 positive vs. negative) patients.
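Panels B and C contrast unscaled and scaled PCA. The difference matters for mass spectra because a few high-intensity peaks can dominate the variance. The sketch below illustrates this with NumPy on synthetic data; the sample count, peak count, and intensities are invented for illustration, and `pca_scores` is a hypothetical helper, not the study's analysis code.

```python
import numpy as np


def pca_scores(X, n_components=2, scale=False):
    """PCA via SVD. With scale=True each feature is standardized to unit variance."""
    Xc = X - X.mean(axis=0)
    if scale:
        Xc = Xc / X.std(axis=0, ddof=1)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components], Vt[:n_components]


rng = np.random.default_rng(0)
n_per_class = 30
y = np.array([1] * n_per_class + [0] * n_per_class)

# Six synthetic "peaks": peak 0 is intense but uninformative,
# peaks 3-5 are weak but differ between the two classes.
X = rng.normal(size=(2 * n_per_class, 6))
X[:, 0] *= 100.0
X[y == 1, 3:] += 3.0

raw_scores, raw_explained, raw_loadings = pca_scores(X, scale=False)
std_scores, std_explained, std_loadings = pca_scores(X, scale=True)

# Separation of the class means along the first scaled component.
gap_scaled = abs(std_scores[y == 1, 0].mean() - std_scores[y == 0, 0].mean())

print("unscaled PC1 variance share:", round(float(raw_explained[0]), 3))
print("scaled PC1 variance share:  ", round(float(std_explained[0]), 3))
```

In this toy setup the unscaled first component is essentially the single intense peak, while standardizing lets the weaker, class-related peaks shape the first component.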
Figure 5. Receiver operator characteristic curves of the top performing ML models. The figure illustrates the optimized deep neural network (A) and gradient boosting machine (B) ML models secondarily tested on Dataset B. The deep neural network model used 75% of the MS peaks (features) to yield a positive percent agreement of 100% (95% CI 95–100%) and a negative percent agreement of 96% (95% CI 86–99%), with an area under the receiver operator curve of 0.9985. In contrast, the gradient boosting machine model used only 25% of the MS peaks (features) to yield a positive percent agreement of 99% (95% CI 92–100%) and a negative percent agreement of 94% (95% CI 84–99%), with an area under the receiver operator curve of 0.9904.
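Positive percent agreement, negative percent agreement, and accuracy are all simple ratios of counts against the RT-PCR comparator, and each can be reported with a binomial confidence interval. The sketch below shows the arithmetic. The counts are hypothetical and are not the study's confusion matrix, and the Wilson score interval is an assumption, since the caption does not state which interval method was used.

```python
import math


def agreement_metrics(tp, fn, tn, fp):
    """PPA, NPA, and accuracy relative to the comparator method (here RT-PCR)."""
    ppa = tp / (tp + fn)  # comparator-positive samples also called positive
    npa = tn / (tn + fp)  # comparator-negative samples also called negative
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return ppa, npa, accuracy


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts for illustration only.
tp, fn, tn, fp = 48, 2, 47, 3

ppa, npa, accuracy = agreement_metrics(tp, fn, tn, fp)
ppa_ci = wilson_interval(tp, tp + fn)
npa_ci = wilson_interval(tn, tn + fp)

print(f"PPA {ppa:.1%} (95% CI {ppa_ci[0]:.1%} to {ppa_ci[1]:.1%})")
print(f"NPA {npa:.1%} (95% CI {npa_ci[0]:.1%} to {npa_ci[1]:.1%})")
print(f"Accuracy {accuracy:.1%}")
```

The interval width depends on the number of comparator-positive or comparator-negative samples, which is why a 100% point estimate can still carry a lower bound well below 100%.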
Figure 6. Conceptual model for near patient ML-enhanced MALDI-TOF-MS COVID-19 testing. The figure outlines the conceptual workflow for our ML-enhanced MALDI-TOF-MS COVID-19 testing method when performed near patient. Individuals with unknown COVID-19 status register via a smart device app, which links their identity with a unique quick-response (QR) barcode. The QR code is paired to the nasal swab specimen, which is self-collected under supervision. The sample is tested by MALDI-TOF-MS and the mass spectra are analyzed by the ML algorithm to report a COVID-19 result. COVID-19 negative individuals are allowed entry for 24 h. COVID-19 positive/indeterminate individuals will be denied entry and/or require follow-up testing by molecular methods. Data from MALDI-TOF-MS is fed routinely to the automated ML platform for both quality assurance and continual refinement of models. Total time from sample collection to result is < 1 h.
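The entry rule in this workflow can be written as a small decision function. This is a hypothetical sketch rather than the study's app logic: the names `Result`, `Decision`, and `decide` are invented here, it assumes that a negative result is what grants the 24 h entry window, and it treats every positive or indeterminate result as both denied entry and flagged for molecular follow-up, although the caption allows "and/or".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Result(Enum):
    NEGATIVE = "negative"
    POSITIVE = "positive"
    INDETERMINATE = "indeterminate"


ENTRY_WINDOW = timedelta(hours=24)  # entry validity stated in the caption


@dataclass
class Decision:
    allow_entry: bool
    valid_until: Optional[datetime]
    needs_molecular_followup: bool


def decide(result: Result, tested_at: datetime) -> Decision:
    """Map a screening result to an entry decision (assumed policy, see lead-in)."""
    if result is Result.NEGATIVE:
        return Decision(True, tested_at + ENTRY_WINDOW, False)
    # Positive and indeterminate results are handled the same way in this sketch.
    return Decision(False, None, True)


tested = datetime(2021, 1, 1, 8, 0)
for outcome in Result:
    print(outcome.value, decide(outcome, tested))
```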