Emma R Lee, Neil Parkin, Cheryl Jennings, Chanson J Brumme, Eric Enns, Maria Casadellà, Mark Howison, Mia Coetzer, Santiago Avila-Rios, Rupert Capina, Eric Marinier, Gary Van Domselaar, Marc Noguera-Julian, Don Kirkby, Jeff Knaggs, Richard Harrigan, Miguel Quiñones-Mateu, Roger Paredes, Rami Kantor, Paul Sandstrom, Hezhao Ji.
Abstract
Next generation sequencing (NGS) is emerging as a new standard for genotypic HIV-1 drug resistance (HIVDR) testing. Many NGS HIVDR data analysis pipelines have been developed independently, each with its own outputs and data management protocols. Standardization of these analytical methods and comparison of the available pipelines are lacking, yet differences among them may affect subsequent HIVDR interpretation and other downstream applications. Here we compared the performance of five NGS HIVDR pipelines using proficiency panel samples from the NIAID Virology Quality Assurance (VQA) program. Ten VQA panel specimens were genotyped by each of six international laboratories using their own in-house NGS assays. Raw NGS data were then processed with each of five pipelines: HyDRA, MiCall, PASeq, Hivmmer and DEEPGEN. All pipelines detected amino acid variants (AAVs) across the full range of frequencies (1-100%) and demonstrated good linearity against the reference frequency values. While sensitivity in detecting low-abundance AAVs (frequencies of 1-20%) was less of a concern for all pipelines, their specificity decreased dramatically at AAV frequencies <2%, suggesting that 2% may be a more reliable reporting threshold for ensuring specificity in AAV calling and reporting. Greater variation was observed among the pipelines for low-abundance AAVs, likely due to differences in their NGS read quality control strategies. Findings from this study highlight the need for standardized strategies for NGS HIVDR data analysis, especially for the detection of minority HIVDR variants.
Year: 2020 PMID: 32005884 PMCID: PMC6994664 DOI: 10.1038/s41598-020-58544-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Comparison of pipelines for automated NGS-based HIVDR data analysis.
| | MiCall | HyDRA | PASeq | Hivmmer | DEEPGEN |
|---|---|---|---|---|---|
| URL | N/A | ||||
| Bioinformatic IT needs | No | No | No | Yes | N/A |
| Compatible NGS Platform | Illumina, Ion Torrent | Illumina, Ion Torrent | Illumina, Ion Torrent | Illumina, Ion Torrent | Illumina, Ion Torrent |
| Web Interface | Yes | Yes | Yes | No | No |
| Designed for HIVDR | Yes | Yes | Yes | Yes | Yes |
| Ref Database | HIVdb | HIVdb | HIVdb | HIVdb | HIVdb |
| Output (aa) | csv | AAVF | csv | AAVF/csv | csv |
Figure 1. Workflow for the comparison of NGS HIVDR data analysis pipelines. Abbreviations: NIAID, National Institute of Allergy and Infectious Diseases; VQA, Virology Quality Assurance; BC-CfE, British Columbia Centre for Excellence in HIV/AIDS; NHRL, National HIV and Retrovirology Laboratories; BU, Brown University; CWRU, Case Western Reserve University; CIENI, Center for Research in Infectious Diseases; IrsiCaixa, AIDS Research Institute. *There are 57 rather than 60 FASTQ files because one lab processed only 7 samples instead of 10. **Each sample's pipeline result (AAVF/csv) from each lab was compared, and all analyses were subsequently combined.
Figure 2. Linearity in AAV frequency measurements between 1% and 100% variant frequency.
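Linearity of this kind is commonly summarized by regressing the frequencies a pipeline reports on the reference frequencies. The record does not state which fitting method the authors used, so the following is only a minimal sketch that assumes an ordinary least-squares fit; the function name `linear_fit` and the example frequencies are hypothetical.

```python
def linear_fit(reference, observed):
    """Ordinary least-squares fit of observed vs. reference AAV frequencies (%).

    Returns (slope, intercept, r_squared). A slope near 1, an intercept
    near 0 and r_squared near 1 would indicate good linearity.
    """
    n = len(reference)
    if n != len(observed) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(reference) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    syy = sum((y - mean_y) ** 2 for y in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, observed))
    if sxx == 0:
        raise ValueError("reference frequencies must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, r^2 is the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


# Hypothetical example values, not data from the study.
reference_freqs = [1.0, 5.0, 20.0, 50.0, 100.0]
observed_freqs = [1.2, 4.8, 19.5, 51.0, 99.6]
slope, intercept, r_squared = linear_fit(reference_freqs, observed_freqs)
```

Fitting each pipeline and lab separately, rather than pooling all measurements, would keep lab-specific assay effects from being mistaken for pipeline non-linearity.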
Figure 3. Distribution of sensitivity of NGS HIVDR data analysis pipelines at various AAV frequency thresholds. The scatter plot shows the median and interquartile range of sensitivity for each pipeline at the 1%, 2%, 5%, 10%, 15% and 20% thresholds; each point represents one of the six labs that genotyped the VQA specimens.
Figure 4. Distribution of specificity of NGS HIVDR data analysis pipelines at various AAV frequency thresholds. The scatter plot shows the median and interquartile range of specificity for each pipeline at the 1%, 2%, 5%, 10%, 15% and 20% thresholds; each point represents one of the six labs that genotyped the VQA specimens.
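To make the threshold-dependent sensitivity and specificity in Figures 3 and 4 concrete, the sketch below computes both for one pipeline result at one frequency threshold. It is an illustration under stated assumptions, not the authors' method: the record does not define how true negatives were counted, so the code assumes a finite set of candidate AAVs (`candidates`) from which negatives are drawn. The function name, variant labels and frequencies are all hypothetical.

```python
def sensitivity_specificity(reference, called, candidates, threshold):
    """Sensitivity and specificity of AAV calls at a frequency threshold (%).

    reference  -- dict: variant label -> reference frequency (%)
    called     -- dict: variant label -> frequency reported by a pipeline (%)
    candidates -- set of all variant labels considered (ASSUMPTION: the
                  universe used to count true negatives)
    Returns (sensitivity, specificity); a value is None when undefined.
    """
    expected = {v for v, f in reference.items() if f >= threshold}
    detected = {v for v, f in called.items() if f >= threshold}

    tp = len(expected & detected)               # expected and reported
    fn = len(expected - detected)               # expected but missed
    fp = len(detected - expected)               # reported but not expected
    tn = len(candidates - expected - detected)  # neither expected nor reported

    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical variants and frequencies, for illustration only.
reference = {"RT:K103N": 50.0, "RT:M184V": 3.0, "PR:L90M": 1.2}
called = {"RT:K103N": 49.0, "RT:M184V": 2.5, "PR:D30N": 1.5}
candidates = {"RT:K103N", "RT:M184V", "PR:L90M", "PR:D30N", "PR:V82A", "IN:N155H"}

at_2_percent = sensitivity_specificity(reference, called, candidates, 2.0)
at_1_percent = sensitivity_specificity(reference, called, candidates, 1.0)
```

In this toy input, lowering the threshold from 2% to 1% admits one spurious call and one missed variant, so both metrics fall, which mirrors the direction of the effect the abstract reports for specificity below 2%.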
Figure 5. Distribution of %CV measurements of AAV frequencies between 1% and 100%. The scatter plot shows the median and interquartile range of %CV at AAV frequencies between 1% and 100%. Thresholds for outliers are shown by the red line and are equal to twice the %CV median for each range (see Methods).
Summary of NGS HIVDR data analysis pipeline outliers. Pipeline columns give the number of outliers above the %CV threshold.*
| %AAV frequency range | %CV threshold | HyDRA | MiCall | DEEPGEN | PASeq | Hivmmer |
|---|---|---|---|---|---|---|
| 90-100 | ≤1 | 14 | 2 | 62 | 27 | 12 |
| 70-90 | ≤3 | 4 | 3 | 10 | 6 | 4 |
| 50-70 | ≤5 | 1 | 2 | 2 | 4 | 2 |
| 30-50 | ≤7 | 3 | 2 | 6 | 2 | 3 |
| 20-30 | ≤10 | 4 | 0 | 10 | 8 | 5 |
| 15-20 | ≤12 | 3 | 0 | 4 | 4 | 5 |
| 10-15 | ≤12 | 8 | 1 | 3 | 10 | 5 |
| 5-10 | ≤20 | 8 | 1 | 11 | 11 | 11 |
| 2-5 | ≤20 | 10 | 8 | 10 | 15 | 20 |
| 1-2 | ≤24 | 11 | 11 | 12 | 21 | 11 |
*Data from one of the six participating labs were removed from the outlier analysis because the results from one pipeline were missing, leaving 47 data sets rather than 57 (see Methods). The %CV threshold for outliers in each %AAV frequency range equals twice the median %CV for that range.
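The outlier rule in the footnote (a threshold of twice the median %CV within a frequency range) can be sketched as follows. This is a hedged illustration only: it assumes %CV is the sample standard deviation divided by the mean, times 100, and the helper names `percent_cv` and `flag_outliers` and the example numbers are hypothetical. The Methods section of the paper should be consulted for the exact computation.

```python
from statistics import mean, median, stdev


def percent_cv(frequencies):
    """Coefficient of variation (%) of repeated frequency measurements.

    ASSUMPTION: uses the sample standard deviation (n - 1 denominator).
    """
    if len(frequencies) < 2:
        raise ValueError("need at least two measurements")
    avg = mean(frequencies)
    if avg == 0:
        raise ValueError("mean frequency is zero; %CV is undefined")
    return 100.0 * stdev(frequencies) / avg


def flag_outliers(cv_values):
    """Return (threshold, outliers) where threshold = 2 x median %CV."""
    threshold = 2.0 * median(cv_values)
    return threshold, [cv for cv in cv_values if cv > threshold]


# Hypothetical %CV values for AAVs falling in one frequency range.
cv_in_range = [2.0, 3.0, 4.0, 10.0]
threshold, outliers = flag_outliers(cv_in_range)
```

Applying the rule separately within each %AAV frequency range matters because %CV naturally grows as the variant frequency shrinks, so a single global threshold would flag almost every low-abundance variant.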