Bruce Budowle, Nancy D Connell, Anna Bielecka-Oder, Rita R Colwell, Cindi R Corbett, Jacqueline Fletcher, Mats Forsman, Dana R Kadavy, Alemka Markotic, Stephen A Morse, Randall S Murch, Antti Sajantila, Sarah E Schmedes, Krista L Ternus, Stephen D Turner, Samuel Minot.
Abstract
High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results allow analysis and interpretation, significantly influencing the course and/or focus of an investigation, and can impact the response of the government to an attack having individual, political, economic or military consequences. Interpretation of the results of microbial forensic analyses relies on understanding the performance and limitations of HTS methods, including analytical processes, assays and data interpretation. The utility of HTS must be defined carefully within established operating conditions and tolerances. Validation is essential in the development and implementation of microbial forensics methods used for formulating investigative leads and attribution. HTS strategies vary, requiring guiding principles for HTS system validation. Three initial aspects of HTS, irrespective of chemistry, instrumentation or software, are: 1) sample preparation, 2) sequencing, and 3) data analysis. Criteria that should be considered for HTS validation for microbial forensics are presented here. Validation should be defined in terms of the specific application, and the criteria described here comprise a foundation for investigators to establish, validate and implement HTS as a tool in microbial forensics, enhancing public safety and national security.
Keywords: Bioinformatics; High throughput sequencing; Library preparation; Microbial forensics; Sample preparation; Validation
Year: 2014; PMID: 25101166; PMCID: PMC4123828; DOI: 10.1186/2041-2223-5-9
Source DB: PubMed; Journal: Investig Genet; ISSN: 2041-2223
Validation criteria for analytical performance metrics
| Analytical sensitivity | Likelihood that the assay will detect a target (for example, organism variant, sequence region, functional element, and so on) in a sample, if the target is present; can include target attribution when defined as strain- or isolate-level detection. Also known as the true positive rate. Calculated by dividing the number of true positives by the sum of true positives and false negatives (TP/(TP + FN)). |
| Analytical specificity | Likelihood that the assay will not detect a target, if the target is not in the sample; can include false target attribution. Also known as the true negative rate. Calculated by dividing the number of true negatives by the sum of true negatives and false positives (TN/(TN + FP)). May be impractical to calculate for methods designed to detect the known universe of organisms. |
| Precision | The degree that individual measurements of the same sample are similar with regard to the presence and absence of target. Determined by the distribution of random errors and not the true or underlying value. |
| Accuracy | Degree that the material measured is similar to its true value. Calculated by (TP + TN)/(TP + FP + FN + TN). |
| Reproducibility | The degree to which the same result(s) is obtained for a sample when the assay is repeated between/among different operators and/or detection instruments. |
| Repeatability | The degree to which the same result(s) is obtained for a sample when the assay is repeated by the same operator and/or detection instrument. |
| Limit of detection | Minimum level of input material for a target as a proportion of the total at which all replicates are consistently positive for that target. |
| Reportable range | The region(s) of genome(s) that are sequenced and from which information is drawn for comparison or attribution. |
| False positive rate | The rate at which a target is incorrectly called as present. Also known as Type I error. Calculated as 1 – specificity. |
| False negative rate | The rate at which a target organism is incorrectly called as absent. Also known as Type II error. Calculated as 1 – sensitivity. |
| Assay robustness | Stability of analytical performance under variable conditions, that is, likelihood of assay success. |
| Reference materials (a) | Materials/samples used to test the performance of the assay (for example, reference panels of the target and mock or non-probative materials) relevant to the intended application of the assay. |
| Databases (a) | Collection of data and reference genomes, genes and genomic elements to be used for interpretation of results. |
| Interpretation criteria for results (a) | Analysis (quantitative or qualitative) used and confidence level of a result (match, association, most recent common ancestor, and so on). |
(a) These last three items (reference materials, databases, and interpretation criteria) typically have not been considered validation criteria. However, they have been included here primarily because interpretation of results is an essential part of generating reliable and appropriate results, which should be described within a standard operating protocol (SOP). The data used to test a system are reliant on reference materials and, depending on the situation, databases. See [58-62].
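The formulas in the table can be made concrete with a short Python sketch. It computes analytical sensitivity, analytical specificity, accuracy, and the false positive and false negative rates from confusion-matrix counts, and estimates a limit of detection from a dilution series. The names `Counts`, `metrics`, and `limit_of_detection`, and all counts and dilution levels in the usage lines, are hypothetical and chosen only for illustration. The limit-of-detection function reads "consistently positive" as "every replicate is positive at this level and at every higher level tested", which is one possible interpretation and not a rule stated in the table. Denominators are assumed to be non-zero.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts for presence/absence calls of one target."""
    tp: int  # target present, called present
    fp: int  # target absent, called present
    fn: int  # target present, called absent
    tn: int  # target absent, called absent


def metrics(c: Counts) -> Dict[str, float]:
    """Apply the table's formulas; assumes every denominator is non-zero."""
    sensitivity = c.tp / (c.tp + c.fn)            # TP / (TP + FN)
    specificity = c.tn / (c.tn + c.fp)            # TN / (TN + FP)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "false_positive_rate": 1 - specificity,   # 1 - specificity
        "false_negative_rate": 1 - sensitivity,   # 1 - sensitivity
    }


def limit_of_detection(calls: Dict[float, List[bool]]) -> Optional[float]:
    """Lowest target proportion with all replicates positive.

    `calls` maps a target proportion to its replicate calls. Assumption:
    all higher proportions must also be fully positive; returns None if
    the highest proportion tested already has a negative replicate.
    """
    lod = None
    for level in sorted(calls, reverse=True):     # highest proportion first
        if calls[level] and all(calls[level]):
            lod = level
        else:
            break
    return lod


# Hypothetical validation run: 100 target-positive and 100 target-negative samples.
example = metrics(Counts(tp=90, fp=5, fn=10, tn=95))

# Hypothetical dilution series with three replicates per level.
series = {
    0.1: [True, True, True],
    0.01: [True, True, True],
    0.001: [True, False, True],
    0.0001: [False, False, False],
}
example_lod = limit_of_detection(series)
```

Keeping the raw counts in one small record makes it straightforward to report the same metrics per target, per operator, or per instrument when assessing repeatability and reproducibility.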
Figure 1. Basic schematic of data flow through an analysis process. The first step of base calling generally is completed by the instrument software, and each downstream step must be included in the validated analytical pipeline. Additional data processing after generating sequence reads is required, for example with contig building and/or alignment, and will depend on the application.
Figure 2. Alternate alignments of identical sequences. Reads 1 and 2 are aligned in equally optimal ways that indicate different locations for a 2 bp deletion relative to the reference. Differences in alignment can be problematic when an evidence sample’s consensus alignment is based on a different approach than that of the reference sample or entries in a database.
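The ambiguity described in the Figure 2 caption can be illustrated with a minimal Python sketch. The reference string below is hypothetical (the figure's actual sequences are not reproduced here), and the function names are invented for this example. The sketch lists every position at which a 2 bp deletion yields the same read, then applies left-alignment, one common convention for picking a single representation. The source does not prescribe a normalization method; left-alignment is used here only as an assumption to show how evidence and reference data could be brought to the same convention.

```python
from typing import List


def apply_deletion(ref: str, pos: int, length: int) -> str:
    """Return ref with `length` bases removed starting at 0-based `pos`."""
    return ref[:pos] + ref[pos + length:]


def equivalent_deletion_positions(ref: str, read: str, length: int) -> List[int]:
    """All start positions where deleting `length` bases from ref gives read."""
    return [
        p for p in range(len(ref) - length + 1)
        if apply_deletion(ref, p, length) == read
    ]


def left_align_deletion(ref: str, pos: int, length: int) -> int:
    """Shift a deletion left while the resulting sequence stays unchanged.

    Moving the deleted window one base left keeps the result identical
    exactly when the base entering the window equals the base leaving it.
    """
    while pos > 0 and ref[pos - 1] == ref[pos + length - 1]:
        pos -= 1
    return pos


# Hypothetical reference containing a short AC repeat.
reference = "GGACACACTT"
read = apply_deletion(reference, 6, 2)   # one reported placement of the deletion

# Every placement that explains the same read equally well.
placements = equivalent_deletion_positions(reference, read, 2)

# Two pipelines reporting different placements agree after normalization.
normalized = {left_align_deletion(reference, p, 2) for p in placements}
```

In this toy case several placements describe the same read, and normalization collapses them to one position, so a comparison between an evidence sample and a database entry would no longer depend on which aligner produced each call.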