Heike Sichtig, Timothy Minogue, Yi Yan, Christopher Stefan, Adrienne Hall, Luke Tallon, Lisa Sadzewicz, Suvarna Nadendla, William Klimke, Eneida Hatcher, Martin Shumway, Dayanara Lebron Aldea, Jonathan Allen, Jeffrey Koehler, Tom Slezak, Stephen Lovell, Randal Schoepp, Uwe Scherf.
Abstract
FDA proactively invests in tools to support innovation of emerging technologies, such as infectious disease next generation sequencing (ID-NGS). Here, we introduce FDA-ARGOS quality-controlled reference genomes as a public database for diagnostic purposes and demonstrate its utility with two use cases. We provide quality control metrics for the FDA-ARGOS genomic database resource and outline the need for genome quality gap filling in the public domain. In the first use case, we show more accurate microbial identification of Enterococcus avium from metagenomic samples with FDA-ARGOS reference genomes compared to non-curated GenBank genomes. In the second use case, we demonstrate the utility of FDA-ARGOS reference genomes for Ebola virus target sequence comparison as part of a composite validation strategy for ID-NGS diagnostic tests. The use of FDA-ARGOS as an in silico target sequence comparator tool combined with representative clinical testing could reduce the burden for completing ID-NGS clinical trials.
Year: 2019 PMID: 31346170 PMCID: PMC6658474 DOI: 10.1038/s41467-019-11306-6
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Proposed composite reference method (C-RM) for ID-NGS diagnostics. Panel a illustrates a walkthrough of the C-RM. Here, we show in silico target sequence comparison with FDA-ARGOS reference genomes in combination with representative clinical testing to understand the performance of ID-NGS diagnostic tests. Using raw sequence data from the ID-NGS diagnostic test device, in silico comparison of results obtained with the assay in-house database to results when using FDA-ARGOS will evaluate device bioinformatic analysis pipelines and report generation while eliminating the need for additional sample testing with a gold standard comparator (current FDA benchmarks). Overall, we anticipate the use of the C-RM based on assay-specific subsets of clinical samples and/or microbial reference materials (MRMs) for clinical validation in combination with FDA-ARGOS in silico target sequence comparison to generate scientifically valid evidence for understanding the performance of ID-NGS diagnostic tests. Panel b lists the required quality control metrics for passing the regulatory-grade reference genome criteria. At a minimum, an FDA-ARGOS regulatory-grade reference genome adheres to six metrics (a–f). Specifically, category f details the minimum data requirements that are further described in (c). In addition, panel d lists the 10 critical metadata that need to be ascribed to a genome to meet the regulatory-grade criteria.
Fig. 2 FDA-ARGOS quality-controlled reference genomes for diagnostic use. Summary statistics of the current 487 microbial genomes show primary coverage of FDA-ARGOS resides with bacterial isolates, followed by viruses and then eukaryotic parasites (a). Supplementary Data 1 provides accessions for all 487 genomes currently available publicly. A majority of FDA-ARGOS constituents (b) originate from North America and are from human clinical isolation.
Fig. 3 FDA-ARGOS reference genome assemblies quality metrics. Comparative microbial genome assembly quality metrics contrasted current FDA-ARGOS assemblies to 2013 and 2018 NCBI GenBank assemblies submitted for each species captured within the FDA-ARGOS database. Assembly quality metrics measured included: (a) median coverage, (b) median N50, (c) median L50, and (d) number of 2018 NCBI genomes that exhibited all, one or a specific quality control metric used to vet FDA-ARGOS genomes for inclusion. The NCBI assemblies were downloaded on August 6, 2018. For each box plot the center line represents the median value and is bounded by the 25th and 75th percentiles. The whiskers represent the min and max values.
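For readers unfamiliar with the N50 and L50 metrics named in this caption, the following is a minimal sketch of how they are conventionally computed from a list of contig lengths. The function name `n50_l50` and the example lengths are hypothetical and are not taken from the paper; the source does not describe the tooling used to produce the figure.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for an assembly given its contig lengths.

    N50: length of the contig at which the running total, taken over
         contigs sorted longest first, reaches half the assembly size.
    L50: number of contigs needed to reach that point.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count


# Hypothetical contig lengths (arbitrary units), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(example_lengths)
print(f"N50 = {n50}, L50 = {l50}")
```

A higher N50 together with a lower L50 indicates a more contiguous assembly, which is why the two metrics are reported side by side.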
Fig. 4 Comparison of NCBI Nt and FDA-ARGOS read classification results. Visualizing bioinformatics analysis with the MegaBLAST tool of metagenomics shotgun data of mock clinical human blood sample spiked with 105 E. avium. The heatmap showed read classification results for triplicate samples run against 200 database instances. Dark blue indicates read numbers below 10. A gradient from white to red indicates read numbers ranging from above 10 to 100,000. Here we demonstrated read classification results for all simulated species. E. avium classification results were consistent across all database instances. In addition, several other species were classified at >1000 reads with the normalized NCBI Nt database instances (Supplementary Data 3 and 4).
Bundibugyo ebolavirus performance summary
| Sample | Real-time PCR | MIPS | FDA-ARGOS | FDA-ARGOS |
|---|---|---|---|---|
| 2012-1 | | | | |
| 2012-16 | ND | 0.02% | 0.76% | 0.02% |
| 2012-91 | ND | 0.03% | 0.65% | 0.04% |
| 2012-95 | ND | 0.02% | 0.59% | 0.03% |
| 2012-99 | ND | 0.02% | 0.67% | 0.06% |
| 2012-120 | | | | |
| 2012-147 | | | | |
| 2012-153 | | | | |
| 2012-176 | | 0.01% | 0.87% | 0.01% |
| 2012-198 | ND | 0.02% | 0.71% | 0.03% |
| NTC | N/A | 0.02% | 1.59% | 1.05% |
Illustration of target sequence comparison with FDA-ARGOS reference genomes for diagnostic performance testing. This table shows the traditional benchmark comparison of the Bundibugyo MIPS assay to real-time PCR (RT-PCR) results and target sequence comparison with FDA-ARGOS using two bioinformatics tools (MegaBLAST and Kraken). Benchmark positive values were only noted for samples that yielded duplicate positive results by RT-PCR (bolded). Percent reads classified refers only to the percentage of reads assigned to Bundibugyo ebolavirus; the remaining reads are non-specific.
Zaire ebolavirus Makona performance summary
| Sample | Real-time PCR | MIPS | FDA-ARGOS | FDA-ARGOS | FDA-ARGOS |
|---|---|---|---|---|---|
| 3754-2 | | 0.05% | 0.60% | 0.06% | 0.03% |
| 3754-4 | | 0.06% | 0.68% | 0.06% | 0.03% |
| 3811-2 | | 0.07% | 0.56% | 0.08% | 0.04% |
| 3856-1P | | | | | |
| 3913-5 | | 0.00% | 0.68% | 0.00% | 0.01% |
| 3958-4 | | 0.04% | 0.65% | 0.05% | 0.05% |
| 3991-2 | | 0.00% | 0.65% | 0.00% | 0.01% |
| 4007-2 | | | | | |
| 4015-1 | | | | | |
| 4033-1 | | | | | |
| 4268-1P | | | | | |
| 4468-3 | | 0.04% | 0.59% | 0.05% | 0.03% |
| 4641-3P | | 0.03% | 0.59% | 0.04% | 0.02% |
| 4726-1 | | | | | |
| 4845-3 | | 0.00% | 0.66% | 0.01% | 0.01% |
| NTC | N/A | 0.02% | 0.86% | 0.04% | 0.01% |
Illustration of target sequence comparison with FDA-ARGOS reference genomes for diagnostic performance testing. This table shows the traditional benchmark comparison of the EBOV MIPS assay to real-time PCR (RT-PCR) results and target sequence comparison with FDA-ARGOS using three bioinformatics tools (MegaBLAST, Kraken, and LMAT). Benchmark positive values were only noted for samples that yielded duplicate positive results by RT-PCR (bolded). Percent reads classified refers only to the percentage of reads assigned to Zaire ebolavirus Makona; the remaining reads are non-specific.
Experimental design and results from EBOV mock clinical trial
| PFU/ml (LOD) | n | Avg EBOV reads | Avg %reads mapped | CoV | Positive samples | Negative samples |
|---|---|---|---|---|---|---|
| 1,000,000 (10×) | 16 | 5442.5 | 2.66% | 136.55% | 15 | 1 |
| 500,000 (5×) | 16 | 2777.5 | 2.49% | 152.33% | 13 | 3 |
| 100,000 (1×) | 16 | 351.5 | 0.58% | 247.57% | 9 | 7 |
| NTC | 100 | 4 | 0.00% | 571.69% | 1 | 99 |
Study design and demonstration of the preliminary diagnostic performance of an EBOV MIPS diagnostic assay. This table shows results from a mock clinical trial using 48 Zaire ebolavirus Makona positive samples at three different concentrations (10×, 5×, and 1×) and 100 Ebola negative samples.
EBOV mock clinical trial diagnostic performance
| n | Positive predictive value | Negative predictive value | Sensitivity | Specificity | Prevalence |
|---|---|---|---|---|---|
| 148 | 97.37% (83.95–99.62%) | 90.00% (84.26–93.80%) | 77.08% (62.69–87.97%) | 99.00% (94.55–99.97%) | 32.43% (24.98–40.61%) |
Study design and demonstration of the preliminary diagnostic performance of an EBOV MIPS diagnostic assay. This table shows the diagnostic performance of the EBOV MIPS mock clinical trial. Numbers in parentheses represent the 95% confidence interval
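The point estimates in this table can be checked against the counts in the experimental design table. The sketch below sums the positive and negative calls across the three spiked concentrations and the NTC samples, then applies the standard confusion-matrix definitions. The function name `diagnostic_performance` is hypothetical, and the sketch does not attempt the 95% confidence intervals, since the source does not state which interval method was used.

```python
def diagnostic_performance(tp, fn, fp, tn):
    """Standard 2x2 confusion-matrix metrics, returned as fractions."""
    total = tp + fn + fp + tn
    return {
        "n": total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "prevalence": (tp + fn) / total,
    }


# Counts read from the experimental design table:
# spiked samples at 10x, 5x, and 1x LOD, plus 100 NTC samples.
true_positives = 15 + 13 + 9   # spiked samples called positive
false_negatives = 1 + 3 + 7    # spiked samples called negative
false_positives = 1            # NTC samples called positive
true_negatives = 99            # NTC samples called negative

metrics = diagnostic_performance(
    true_positives, false_negatives, false_positives, true_negatives
)
for name, value in metrics.items():
    if name == "n":
        print(f"{name}: {value}")
    else:
        print(f"{name}: {100 * value:.2f}%")
```

With these counts the sketch gives n = 148, sensitivity 77.08%, specificity 99.00%, PPV 97.37%, NPV 90.00%, and prevalence 32.43%, matching the tabulated point estimates.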
EBOV mock clinical trial prior probabilities
| Prior probability of infection | Positive predictive value | Negative predictive value |
|---|---|---|
| 0 | 0 | 1 |
| 0.01 | 0.44 | 1 |
| 0.05 | 0.8 | 0.99 |
| 0.1 | 0.9 | 0.97 |
| 0.15 | 0.93 | 0.96 |
| 0.2 | 0.95 | 0.95 |
| 0.25 | 0.96 | 0.93 |
| 0.3 | 0.97 | 0.91 |
| 0.4 | 0.98 | 0.87 |
| 0.5 | 0.99 | 0.81 |
| 0.6 | 0.99 | 0.74 |
| 0.7 | 0.99 | 0.65 |
| 0.75 | 1 | 0.59 |
| 0.8 | 1 | 0.52 |
| 0.85 | 1 | 0.43 |
| 0.9 | 1 | 0.32 |
| 0.95 | 1 | 0.18 |
| 0.99 | 1 | 0.04 |
| 1 | 1 | 0 |
Study design and demonstration of the preliminary diagnostic performance of an EBOV MIPS diagnostic assay. This table shows positive and negative predictive values for prior probabilities of infection ranging from 0 to 1
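The dependence of predictive values on prior probability can be sketched with Bayes' theorem. The source does not state the formula behind this table, so the code below is an assumption: it takes the mock-trial sensitivity (37/48) and specificity (99/100) as fixed and varies only the prior. The function name `predictive_values` is hypothetical. Under that assumption, the spot-checked priors below agree with the tabulated values to two decimal places.

```python
def predictive_values(prior, sensitivity, specificity):
    """Return (PPV, NPV) at a given prior probability of infection.

    Uses Bayes' theorem with sensitivity and specificity held fixed.
    """
    tp = sensitivity * prior                # infected, test positive
    fn = (1 - sensitivity) * prior          # infected, test negative
    fp = (1 - specificity) * (1 - prior)    # uninfected, test positive
    tn = specificity * (1 - prior)          # uninfected, test negative
    ppv = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    npv = tn / (tn + fn) if (tn + fn) > 0 else 0.0
    return ppv, npv


# Assumed inputs: point estimates from the mock clinical trial.
SENSITIVITY = 37 / 48
SPECIFICITY = 99 / 100

for prior in (0.0, 0.01, 0.05, 0.5, 0.75, 1.0):
    ppv, npv = predictive_values(prior, SENSITIVITY, SPECIFICITY)
    print(f"prior={prior:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

The design choice of holding sensitivity and specificity constant is what makes PPV rise and NPV fall as the prior increases, which is the pattern visible down the table.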