Marc Noguera-Julian, Emma R Lee, Robert W Shafer, Rami Kantor, Hezhao Ji.
Abstract
External quality assessment (EQA) is a keystone element in the validation and implementation of next generation sequencing (NGS)-based HIV drug resistance testing (DRT). Software validation and evaluation are critical elements of NGS EQA programs. The development, sharing, and adoption of wet-lab protocols, together with increasing worldwide access to NGS technology, have made it easy to produce NGS data for HIV-DRT, yet bioinformatic data analysis remains a bottleneck for most diagnostic laboratories. Several computational tools, available from free or commercial sources, automate the conversion of raw NGS data into an actionable clinical report. Although different software platforms yield equivalent results for high-abundance variants when identical raw NGS datasets are analyzed, discrepancies arise when low-frequency variants are considered. Validation and performance assessment of the bioinformatics tools applied in NGS HIV-DRT are therefore critical, and the origins of the observed discrepancies should be determined. Well-characterized reference NGS datasets, so-called dry panels, with ground truth on the genotype composition at all examined loci and on the exact frequencies of any HIV variants they harbor, would be essential in such cases. The strategic design and construction of such panels are challenging but imperative tasks in support of EQA programs for NGS-based HIV-DRT and the validation of relevant bioinformatics tools. Here, we present criteria that can guide the design of such dry panels, which were discussed at the Second International Winnipeg Symposium on EQA strategies for NGS HIVDR assays.
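The defining property of a dry panel is that variant frequencies are known exactly rather than estimated. The Python sketch below illustrates that idea under deliberately simplified assumptions: reads are error-free, equal-length copies of one of two hypothetical haplotypes, with no quality scores, FASTQ output, or alignment step. The function names `make_synthetic_reads` and `observed_frequency` are hypothetical and do not come from the paper or any published tool. Fixing the number of variant reads deterministically, instead of sampling them at random, is what makes the ground-truth frequency exact.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical helpers for illustration only; not part of the paper or any
# published pipeline. Reads are idealized: error-free, equal length, no quality.


def make_synthetic_reads(reference: str, variant: str, n_reads: int,
                         variant_fraction: float) -> tuple[list[str], float]:
    """Return (reads, ground-truth variant frequency).

    The number of variant reads is fixed deterministically, so the returned
    frequency is exact (it reflects rounding to a whole number of reads).
    """
    if len(reference) != len(variant):
        raise ValueError("haplotypes must have equal length in this sketch")
    if not 0.0 <= variant_fraction <= 1.0:
        raise ValueError("variant_fraction must lie in [0, 1]")
    n_variant = round(n_reads * variant_fraction)
    reads = [variant] * n_variant + [reference] * (n_reads - n_variant)
    return reads, n_variant / n_reads


def observed_frequency(reads: list[str], position: int, base: str) -> float:
    """Naive pileup: fraction of reads carrying `base` at a 0-based position."""
    counts = Counter(read[position] for read in reads)
    return counts[base] / len(reads)
```

For a 2% minority variant, a pipeline reporting at a 1% threshold would call it while one reporting at 5% would not; differing reporting thresholds are one possible source of the low-frequency discordance described above, and a known ground truth lets such cases be traced.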
Keywords: HIV; drug resistance testing; dry panel; external quality assessment; next generation sequencing
Year: 2020 PMID: 32575676 PMCID: PMC7354622 DOI: 10.3390/v12060666
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.818
Figure 1. Types of proposed accuracy measures to be evaluated for next generation sequencing (NGS)-based HIV drug resistance testing (DRT) dry panels. (a) Using NGS-derived consensus sequences, a phylogenetic tree including both the ground-truth and the software-derived sequences can be built to assess phylogenetic identity; (b) using mutation frequencies as detected in NGS data, mutations can be classified as detected or not detected at specified thresholds, and the derived accuracy measures can be used to validate software tools. In addition, a direct correlation between ground-truth and observed mutation frequencies can be calculated for each sample or group of samples.
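The accuracy measures in Figure 1 can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: all function names are hypothetical, and panel (a) is approximated by a simple p-distance between pre-aligned consensus sequences rather than a full phylogenetic analysis. Each mutation listed in the ground truth is classified at a given threshold; truly absent mutations should be listed with frequency 0 so that they can yield true negatives or false positives.

```python
from __future__ import annotations

import math

# Hypothetical helpers; a sketch of Figure 1 measures, not a published tool.


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def confusion_at_threshold(truth: dict[str, float], observed: dict[str, float],
                           threshold: float) -> dict[str, int]:
    """Classify each ground-truth mutation as detected/not detected.

    Mutations missing from `observed` count as frequency 0; mutations reported
    by the software but absent from `truth` are ignored in this sketch.
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for mutation, true_freq in truth.items():
        expected = true_freq >= threshold
        called = observed.get(mutation, 0.0) >= threshold
        if expected and called:
            counts["TP"] += 1
        elif called:
            counts["FP"] += 1
        elif expected:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def sensitivity_specificity(counts: dict[str, int]) -> tuple[float, float]:
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP); NaN if undefined."""
    pos = counts["TP"] + counts["FN"]
    neg = counts["TN"] + counts["FP"]
    sens = counts["TP"] / pos if pos else math.nan
    spec = counts["TN"] / neg if neg else math.nan
    return sens, spec


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of ground-truth vs observed frequencies.

    Assumes equal-length, non-constant inputs.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

With invented frequencies such as truth `{"K103N": 0.25, "M184V": 0.05, "Y181C": 0.0, "K65R": 0.01}` and observed `{"K103N": 0.24, "M184V": 0.03, "Y181C": 0.02, "K65R": 0.012}`, a 5% threshold gives TP 1, FN 1, TN 2, FP 0, whereas a 1% threshold gives TP 3, FP 1. The same data therefore score very differently depending on the threshold, which is why the figure proposes evaluating accuracy at several thresholds.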
Specific dataset and software features to be included in, and validated by, next generation sequencing (NGS) HIV drug resistance testing (DRT) dry panels.
| Feature | Type of Data | Strategy for Inclusion |
|---|---|---|
| Minimum quality score = 25 (error probability = 0.3%) | Real Data | Include sample with insufficient quality |
| Minimum read length = 75 bp pair | Synthetic Data | Include short, good-quality reads with SM |
| Contamination control | Synthetic Data | Include non-viral contamination |
| APOBEC mutation check | Forged Data | Include APOBEC signature codons or hypermutated reads with SM |
| Allow use of HXB2 reference | Check for content | |
| Whole | Real Data | Include PR, RT, and IN gene data; check for content |
| Management of InDels | Synthetic Data | Include samples with in-frame, full-codon InDels |
| Codon-aware alignment | Synthetic Data | Include samples with misaligning InDels |
| Codon-level variant calling | Synthetic Data | Include within-codon nucleotide mixes to rule out nucleotide-level variant calling |
| Variant count | Synthetic Data | Include a controlled number of reads with a specific mutation |
| Variant depth of coverage | Real Data | Design a specific region with limited depth of coverage |
| Software/pipeline version | Check for content | |
| Consensus sequence export (15%) | Synthetic Data | Include mixes >20% |
| Quantitative AAVF export | Check for content format compliance | |
| Consensus sequence similarity | Real Data | Calculate phylogenetic distance vs. ground truth; establish threshold |
| Mutation correlation | Synthetic/Forged Data | |
| TP/TN/FP/FN | Synthetic/Forged Data | Calculate accuracy at different thresholds vs. ground truth |
SM, specific (target) mutations that can be used for flagging; PR, protease; RT, reverse transcriptase; IN, integrase; InDels, insertions and deletions; AAVF, amino acid variant format; TP, true positive; TN, true negative; FP, false positive; FN, false negative.
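Two rows of the table rest on simple arithmetic that a short Python sketch can make concrete. On the Phred scale, P(error) = 10^(-Q/10), so Q25 corresponds to about 0.32%, consistent with the 0.3% quoted above. The codon-level row matters because per-position nucleotide frequencies can hide linkage: the two hypothetical populations below have identical base frequencies at every codon position, yet only one carries the AAC (Asn) codon. The sketch assumes reads are already aligned in frame, contain no InDels, and have passed quality filtering; the function names are illustrative only.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical helpers for illustration; assumes in-frame, InDel-free reads
# that each fully cover the codon starting at `codon_start`.


def phred_to_error_probability(q: float) -> float:
    """Phred scale: P(error) = 10 ** (-Q / 10); Q25 -> about 0.0032."""
    return 10 ** (-q / 10)


def codon_frequencies(reads: list[str], codon_start: int) -> dict[str, float]:
    """Codon-level calling: count the whole codon observed on each read."""
    codons = Counter(read[codon_start:codon_start + 3] for read in reads)
    total = sum(codons.values())
    return {codon: n / total for codon, n in codons.items()}


def nucleotide_frequencies(reads: list[str],
                           codon_start: int) -> list[dict[str, float]]:
    """Nucleotide-level calling: independent base frequencies per position."""
    result = []
    for offset in range(3):
        bases = Counter(read[codon_start + offset] for read in reads)
        total = sum(bases.values())
        result.append({base: n / total for base, n in bases.items()})
    return result


# Two hypothetical populations with identical per-position base frequencies.
pop_a = ["AAA"] * 50 + ["AGC"] * 50  # Lys + Ser
pop_b = ["AGA"] * 50 + ["AAC"] * 50  # Arg + Asn
```

A nucleotide-level caller that only sees `nucleotide_frequencies` cannot tell `pop_a` from `pop_b`, while `codon_frequencies` reports AAC at 50% in `pop_b` only; within-codon mixes in a dry panel are meant to expose exactly this difference.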