

Benchmarking the empirical accuracy of short-read sequencing across the M. tuberculosis genome.

Maximillian Marin¹,², Roger Vargas¹,², Michael Harris³, Brendan Jeffrey³, L Elaine Epperson⁴, David Durbin⁵, Michael Strong⁴, Max Salfinger⁶, Zamin Iqbal⁷, Irada Akhundova⁸, Sergo Vashakidze⁹,¹⁰, Valeriu Crudu¹¹, Alex Rosenthal³, Maha Reda Farhat¹,¹².

Abstract

MOTIVATION: Short-read whole genome sequencing (WGS) is a vital tool for clinical applications and basic research. Genetic divergence from the reference genome, repetitive sequences, and sequencing bias reduce the performance of variant calling using short-read alignment, but the loss in recall and specificity has not been adequately characterized. To benchmark short-read variant calling, we used 36 diverse clinical Mycobacterium tuberculosis (Mtb) isolates dually sequenced with Illumina short-reads and PacBio long-reads. We systematically studied the short-read variant calling accuracy and the influence of sequence uniqueness, reference bias, and GC content.
RESULTS: Reference-based Illumina variant calling demonstrated a maximum recall of 89.0% and a minimum precision of 98.5% across the parameters evaluated. The approach that maximized variant recall while still maintaining high precision (>99%) was tuning the mapping quality (MQ) filtering threshold, i.e. the confidence of the read mapping (recall = 85.8%, precision = 99.1%, MQ ≥ 40). Additional masking of repetitive sequence content is an alternative, more conservative approach to variant calling that increases precision at a cost to recall (recall = 70.2%, precision = 99.6%, MQ ≥ 40). Of the genomic positions typically excluded for Mtb, 68% are accurately called using Illumina WGS including 52/168 PE/PPE genes (34.5%). From these results we present a refined list of low-confidence regions across the Mtb genome, which we found to frequently overlap with regions of structural variation, low sequence uniqueness, and low sequencing coverage. Our benchmarking results have broad implications for the use of WGS in the study of Mtb biology, inference of transmission in public health surveillance systems, and more generally for WGS applications in other organisms.
AVAILABILITY: All relevant code is available at https://github.com/farhat-lab/mtb-illumina-wgs-evaluation.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
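The trade-off between recall and precision under an MQ filter can be illustrated with a minimal sketch. It is not the authors' pipeline (see the repository above for that): the `Call` record, the `precision_recall` helper, and the toy variant sets are all hypothetical, and the truth set stands in for variants derived from long-read assemblies.

```python
from typing import NamedTuple, Iterable, Set, Tuple

class Call(NamedTuple):
    """Hypothetical short-read variant call (illustrative fields only)."""
    pos: int    # 1-based reference position
    alt: str    # alternate allele
    mq: float   # mapping quality of the supporting reads

def precision_recall(calls: Iterable[Call],
                     truth: Set[Tuple[int, str]],
                     min_mq: float) -> Tuple[float, float]:
    """Keep calls with MQ >= min_mq, then score them against the truth set."""
    kept = {(c.pos, c.alt) for c in calls if c.mq >= min_mq}
    tp = len(kept & truth)   # called and true
    fp = len(kept - truth)   # called but not true
    fn = len(truth - kept)   # true but missed or filtered out
    # Convention chosen here: report 1.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall

# Toy data (invented for illustration, not taken from the study).
truth = {(100, "A"), (250, "T"), (400, "G"), (900, "C")}
calls = [
    Call(100, "A", 60),  # true variant, confidently mapped
    Call(250, "T", 45),  # true variant, confidently mapped
    Call(400, "G", 20),  # true variant in a poorly mapped region
    Call(700, "C", 10),  # false call in a poorly mapped region
]

for threshold in (0, 40):
    p, r = precision_recall(calls, truth, threshold)
    print(f"MQ >= {threshold}: precision = {p:.2f}, recall = {r:.2f}")
```

In this toy case, raising the threshold removes the false call but also discards a true variant, so precision rises while recall falls, which is the same direction of trade-off the abstract reports.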


Year: 2022 PMID: 35020793 PMCID: PMC8963317 DOI: 10.1093/bioinformatics/btac023

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931

