Hein Chun, Sangwoo Kim.
Abstract
SUMMARY: Mislabeling during next-generation sequencing is a frequent problem that can cause an entire genomic analysis to fail, so regular cohort-level checks are needed to ensure that it has not occurred. We developed a new automated tool, BAMixChecker, that accurately detects sample mismatches in a given cohort of BAM files with minimal user intervention. BAMixChecker uses a flexible, data-specific set of single-nucleotide polymorphisms (SNPs) and detects orphan (unpaired) and swapped (mispaired) samples based on a genotype-concordance score and entropy-based file name analysis. BAMixChecker achieves ∼100% accuracy on real whole-exome sequencing (WES), RNA-Seq and targeted sequencing cohorts, even for small panels (<50 genes). It produces an HTML report that graphically summarizes the sample-matching status in tables and heatmaps, so users can quickly inspect any mismatch events.
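The abstract names two signals, genotype concordance between BAM files and entropy-based file name analysis, but does not spell out how they are combined. The Python sketch below is a toy illustration under our own assumptions, not BAMixChecker's code. It assumes genotypes have already been called into vectors coded 0/1/2 over a shared SNP set (None for uncovered sites). The 0.7 match cutoff and the minimum-site count are hypothetical. The name heuristic in `name_keys` (split on `.`, `_`, `-` and take the highest-entropy token position as the individual ID) is only one plausible reading of "entropy-based file name analysis". A pair is flagged when name-based and genotype-based pairing disagree, and a file with no genotype partner is reported as an orphan.

```python
import math
import re
from collections import Counter
from itertools import combinations

MATCH_THRESHOLD = 0.7  # hypothetical cutoff, not taken from BAMixChecker
MIN_SITES = 3          # hypothetical minimum number of jointly covered SNPs


def concordance(g1, g2, min_sites=MIN_SITES):
    """Fraction of jointly covered SNP sites with identical genotype calls.

    Genotypes are coded 0 (hom-ref), 1 (het), 2 (hom-alt); None means no coverage.
    """
    shared = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if len(shared) < min_sites:
        return None  # too few informative sites to score this pair
    return sum(a == b for a, b in shared) / len(shared)


def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def name_keys(filenames):
    """Map each file name to the token at its highest-entropy position.

    Assumption for this sketch: all names share one token layout, and the most
    variable token across the cohort identifies the individual (e.g. 'P01' in
    'P01_T.bam'), while low-entropy tokens are labels such as 'T'/'N' or 'bam'.
    """
    tokens = [re.split(r"[._-]", name) for name in filenames]
    width = len(tokens[0])
    if any(len(t) != width for t in tokens):
        raise ValueError("sketch assumes every file name has the same token layout")
    entropies = [shannon_entropy([t[i] for t in tokens]) for i in range(width)]
    best = max(range(width), key=lambda i: entropies[i])  # ties: first position
    return {name: t[best] for name, t in zip(filenames, tokens)}


def check_cohort(genotypes):
    """Return (mismatched pairs, orphan files) for {file name: genotype vector}."""
    keys = name_keys(list(genotypes))
    pairs = list(combinations(sorted(genotypes), 2))
    matched = set()
    for a, b in pairs:
        score = concordance(genotypes[a], genotypes[b])
        if score is not None and score >= MATCH_THRESHOLD:
            matched.add((a, b))
    # Flag pairs where name-based and genotype-based pairing disagree.
    mismatched = [(a, b) for a, b in pairs if (keys[a] == keys[b]) != ((a, b) in matched)]
    # Files with no genotype partner at all are reported as orphans.
    partnered = {f for pair in matched for f in pair}
    orphans = sorted(f for f in genotypes if f not in partnered)
    return mismatched, orphans


# Toy cohort: three individuals whose genotypes disagree at every site.
g = {k: [(i + k) % 3 for i in range(10)] for k in range(3)}
cohort = {
    "P01_T.bam": g[0], "P01_N.bam": g[0],
    "P02_T.bam": g[1], "P02_N.bam": g[2],  # normals of P02 and P03 swapped
    "P03_T.bam": g[2], "P03_N.bam": g[1],
    "P04_T.bam": [0] * 10,                 # tumor without a matching normal
}
mismatched, orphans = check_cohort(cohort)
for a, b in mismatched:
    print("mismatch:", a, b)
print("orphans:", orphans)  # orphans: ['P04_T.bam']
```

Checking disagreement in both directions catches a swap from either side: files that share a name ID but not genotypes (P02_N/P02_T), and files with matching genotypes but different IDs (P02_N/P03_T). A real tool would still need genotype calling from BAM pileups and a more careful rule when several token positions tie on entropy.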
Year: 2019 PMID: 31197312 PMCID: PMC6853765 DOI: 10.1093/bioinformatics/btz479
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. (A) Overall workflow of BAMixChecker. (B) Score distribution of BAMixChecker in five datasets. Each dot is the comparison result for one pair of samples; red dots indicate unmatched pairs and blue dots matched pairs. (C) Accuracy of the four tools in five cohorts. NGSCheckMate has two modes (BAM and FASTQ input). WES/RNA-Seq denotes a WES-RNA-Seq pair. (D) Accuracy of the four tools in downsampled cohorts. (E) Running times of the four tools. The running times of BAMixChecker and NGSCheckMate were measured in two modes (p1, single-thread; p4, multi-thread with four processors). *: default.