Detection of minor variants in Mycobacterium tuberculosis whole genome sequencing data.

Sander N Goossens¹, Tim H Heupink¹, Elise De Vos¹, Anzaan Dippenaar¹, Margaretha De Vos², Rob Warren³, Annelies Van Rie¹.

Abstract

The study of genetic minority variants is fundamental to the understanding of complex processes such as evolution, fitness, transmission, virulence, heteroresistance and drug tolerance in Mycobacterium tuberculosis (Mtb). We evaluated the performance of the variant calling tool LoFreq to detect de novo as well as drug resistance conferring minor variants in both in silico and clinical Mtb next generation sequencing (NGS) data. The in silico simulations demonstrated that LoFreq is a conservative variant caller with very high precision (≥96.7%) over the entire range of depth of coverage tested (30x to 1000x), independent of the type and frequency of the minor variant. Sensitivity increased with increasing depth of coverage and increasing frequency of the variant, and was higher for calling insertion and deletion (indel) variants than for single nucleotide polymorphisms (SNPs). The variant frequency limit of detection was 0.5% and 3% for indel and SNP minor variants, respectively. For serial isolates from a patient with DR-TB, LoFreq successfully identified all minor Mtb variants in the Rv0678 gene (allele frequency as low as 3.22% according to targeted deep sequencing) in whole genome sequencing data (median coverage of 62x). In conclusion, LoFreq can successfully detect minor variant populations in Mtb NGS data, thus limiting the need for filtering of possible false positive variants due to sequencing error. The observed performance statistics can be used to determine the limit of detection in existing whole genome sequencing Mtb data and guide the required depth of future studies that aim to investigate the presence of minor variants.

Entities: Chemical

Keywords: LoFreq; M. tuberculosis; benchmarking; low-frequency variant calling; minor variant calling; whole genome sequencing

Substances: Bacterial Proteins

Year: 2022 PMID: 34962257 PMCID: PMC8769888 DOI: 10.1093/bib/bbab541

Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622

Introduction

Tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), is one of the top 10 causes of death worldwide and the leading cause of death from a single infectious agent [1]. Mtb continuously evolves through genomic acquisition of single nucleotide polymorphisms (SNPs), insertions and deletions (indels). The acquisition of genomic variants that confer drug resistance and the acquisition of compensatory mutations to overcome the fitness effect of these drug resistance causing mutations [2] have been widely studied. Understanding the genomic drug resistance profile of a patient’s isolate will help to define the optimal individualized treatment [3, 4]. Investigation of genomic variants in Mtb associated with other characteristics such as virulence, bacterial fitness, transmissibility and drug tolerance remains limited. For decades, it was believed that Mtb infection was clonal, with a single genome being representative of an infection [5]. Advances in molecular biology have highlighted genomic diversity in Mtb and thereby the frequent occurrence of polyclonal infections [6]. Within a single patient, multiple unrelated clones can be present from different infections (mixed infection), or multiple closely related clones can be present, reflecting micro-evolution in a previously clonal Mtb population within a lesion. Variants that represent polyclonal infection can be present at different levels, showing frequencies anywhere between 0 and 100%, and can become the only (fixed) variant of the Mtb population or become lost [7]. Confidently calling the presence of minor variants in Mtb whole genome sequence (WGS) data is essential to study population dynamics. For example, the dynamics of heteroresistance (where both wild type and mutant alleles co-occur in genes associated with resistance) is of clinical interest as this has been associated with poor treatment outcomes [8].
Understanding drug tolerance, by studying the change in Mtb populations in response to drug exposure, and gaining insight into how Mtb populations can overcome drug pressure in the absence of drug resistance mutations can help with the development of novel treatment strategies [9]. Improved sensitivity to characterize Mtb population structures can greatly benefit the accuracy of TB transmission studies, especially in high burden settings where mixed infections occur frequently [6]. Until recently, detection of minor variants in WGS data was difficult due to the inability of ad hoc trimming, filtering and threshold approaches to distinguish true low-frequency variants from sequencing error [10]. Minor variants are therefore usually excluded from bioinformatic analyses even though they may be biologically relevant and fundamental to the analysis of population evolution and dynamics [11]. Recently, a bioinformatic tool called BinoSNP has been developed to detect minor drug resistant variants in Mtb next generation sequencing (NGS) data [4]. BinoSNP evaluates a user-defined list of genomic positions using a binomial test procedure to determine the presence of low-frequency variants in Mtb NGS data. The tool can accurately distinguish between true variants and sequencing error at a 1% frequency with a coverage ≥400x [4], but is restricted to the detection of SNPs in (by default) resistance conferring genes. Consequently, tools such as BinoSNP are not suitable for detection of variants that are not pre-specified, such as de novo detection of minor populations in genetic regions not associated with drug resistance.
LoFreq [12], a genome wide variant calling tool that models sequencing run-specific error rates and position-specific sequencing biases to call minor (<0.05%) variants, overcomes these limitations, as it allows detection of both low-frequency SNPs and indels in both pre-specified genetic regions, such as resistance associated loci, and previously unidentified genetic regions. The performance of LoFreq has been assessed for several pathogens (dengue virus [12], respiratory syncytial virus [10]). Unfortunately, as is the case for most variant-calling tools, performance evaluation has been restricted to the typical ‘average’ human or viral dataset, containing variants present at various frequencies and/or at a mixture of sequencing depths [10, 12–21]. In practice, however, researchers need more precise performance information that they can use to evaluate a tool for their own specific application. Due to differences in genome size, GC content, presence of highly repetitive regions and ploidy, the appropriateness of the assumptions and statistics used in variant callers may differ for microbial genomes [22]. Benchmarking of a WGS variant calling tool that is suitable to detect minor Mtb variants at different coverage depths and different variant frequencies remained to be done. We generated in silico (simulated) WGS datasets to assess the performance (sensitivity and precision) of LoFreq for the detection of SNPs and indels when present at a range of low-level frequencies and at a range of sequencing depths. We also applied the LoFreq tool to call minor variants in the Rv0678 gene in clinical Mtb WGS data and compared this to the findings obtained by targeted deep sequencing (TDS) [23].

Methods

Generation of in silico datasets

We used the ART next-generation sequencing simulator (Version 2.5.8) [24] to generate in silico sequencing reads in a way that mimics the technology-specific sequencing process. To simulate paired-end reads that would be obtained by sequencing Mtb DNA on the Illumina MiSeq v3 system, we used ART to generate reads with a length of 150 bp and a mean DNA fragment size of 350 bp with a standard deviation of 18.7 bp. ART’s default parameters were used, except that masking was turned off to include repetitive regions. Ten randomly mutated H37Rv reference genomes containing 1000 SNPs, 50 single base deletions and 50 single base insertions were generated to (1) guarantee sufficient statistical power, and (2) reflect the observation that clinical samples typically differ by 600–2600 SNPs from the H37Rv reference genome [25]. The performance of LoFreq to detect minor variants present at 0.1–20% frequency was investigated. Levels of coverage depth ranged from 30x to 1000x to explore the extremes of sequencing depth obtained when sequencing at minimalistic (30x) or deep sequencing (1000x) levels for Mtb WGS. In silico generated H37Rv (NC_000962.3) reference genome sequence reads were then merged with in silico generated random (but known) mutant sequence reads (SNPs and indels) to generate a WGS dataset of 640 Mtb genomes (10 mutated genomes for each of the 64 combinations of minor variant frequency (0.1, 0.5, 1, 3, 5, 10, 15 and 20%) and depth of coverage (30, 50, 100, 200, 300, 400, 500 and 1000x)). For example, to generate a dataset where a 3% variant is present at 400x depth, we merged 12x in silico generated mutant sequence reads with 388x in silico generated H37Rv (NC_000962.3) reference genome sequence reads.
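The mixing arithmetic described above can be written out as a short sketch. The frequencies, depths and number of mutated genomes come from the text; the function and variable names are illustrative only, and this is not the authors' simulation code.

```python
from itertools import product

# Design values taken from the Methods text.
FREQUENCIES_PCT = [0.1, 0.5, 1, 3, 5, 10, 15, 20]
DEPTHS_X = [30, 50, 100, 200, 300, 400, 500, 1000]
N_MUTATED_GENOMES = 10


def mixture_depths(total_depth_x, variant_pct):
    """Split a target depth into (mutant depth, reference depth)."""
    mutant_x = total_depth_x * variant_pct / 100
    return mutant_x, total_depth_x - mutant_x


def simulation_design():
    """List every frequency/depth combination with its read mixture."""
    rows = []
    for variant_pct, depth_x in product(FREQUENCIES_PCT, DEPTHS_X):
        mutant_x, reference_x = mixture_depths(depth_x, variant_pct)
        rows.append(
            {
                "variant_pct": variant_pct,
                "depth_x": depth_x,
                "mutant_x": mutant_x,
                "reference_x": reference_x,
            }
        )
    return rows


design = simulation_design()
print(len(design), "combinations,", len(design) * N_MUTATED_GENOMES, "datasets")
print("3% at 400x:", mixture_depths(400, 3))
```

The same calculation shows why the lowest settings are hard: a 0.1% variant at 30x corresponds to a mutant depth of only 0.03x, so most positions will carry no mutant read at all.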

Clinical WGS and TDS data

Clinical data used in this study have been published earlier by De Vos et al. [23] and are available online. WGS data from serial isolates from a previously reported case study were retrieved from the public European Nucleotide Archive (Project number: PRJEB32109) and TDS of the full Rv0678 region were retrieved from Bioproject at NCBI (project number PRJNA531707) [23].

Variant calling

Prior to LoFreq variant calling, in silico generated and clinical WGS fastq files were processed using the CompleX Bacterial Samples (XBS) pipeline to generate bam files [26]. Briefly, fastq sequence data were mapped to the reference genome (H37Rv NC_000962.3) using BWA mem, after which the XBS pipeline performed read deduplication in the bam files using GATK MarkDuplicates (Picard). Of note, unlike other Mtb pipelines, XBS does not apply base quality score recalibration (BQSR), to avoid variants in contaminant (non-Mtb) DNA being interpreted as systematic error by BQSR, which would result in reduced base qualities, including for genuine Mtb variants. The resulting bam files were indexed using the H37Rv reference genome and indel quality scores were assigned using LoFreq indelqual prior to variant calling. LoFreq (v2.1.2) was run using default parameters with the indel variant calling function turned on to evaluate the performance for both SNPs and indels [12]. Variant calling using LoFreq was performed on the whole Mtb genome, including highly repetitive regions such as PE/PPE regions. For clinical isolates, variant frequencies called by LoFreq were compared to the variant frequencies earlier reported by De Vos et al. [23], where variant calling was performed (1) on TDS data (using the ASAP pipeline) and (2) on WGS data by means of a combination of GATK and a visual approach using Tablet (after preprocessing WGS data with Novoalign).
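As a rough illustration of the steps just described, the sketch below only assembles command lines (it does not run them). It is not the XBS pipeline itself: the file names are hypothetical, the reference is assumed to be already indexed for BWA, and the `--dindel` mode of `lofreq indelqual` is an assumption, since the text does not say which indel-quality mode was used. Note also that MarkDuplicates flags duplicates by default rather than removing them.

```python
import shlex


def build_commands(sample, ref="H37Rv_NC_000962.3.fasta"):
    """Return shell command strings for one sample (hypothetical file names)."""
    q = shlex.quote
    r1, r2 = f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    indelqual_bam = f"{sample}.indelqual.bam"
    vcf = f"{sample}.lofreq.vcf"
    return [
        # 1. Map paired-end reads with BWA-MEM and coordinate-sort the output.
        f"bwa mem {q(ref)} {q(r1)} {q(r2)} | samtools sort -o {q(sorted_bam)} -",
        # 2. Flag duplicate reads (no BQSR step, as in the text).
        f"gatk MarkDuplicates -I {q(sorted_bam)} -O {q(dedup_bam)} "
        f"-M {q(sample + '.dup_metrics.txt')}",
        # 3. Assign indel qualities; --dindel is an assumed mode.
        f"lofreq indelqual --dindel -f {q(ref)} -o {q(indelqual_bam)} {q(dedup_bam)}",
        # 4. Index the BAM that will be used for calling.
        f"samtools index {q(indelqual_bam)}",
        # 5. Call SNPs and indels with otherwise default parameters.
        f"lofreq call --call-indels -f {q(ref)} -o {q(vcf)} {q(indelqual_bam)}",
    ]


for command in build_commands("sample01"):
    print(command)
```

Keeping the commands as data makes it easy to inspect or log them before handing them to a workflow manager or `subprocess`.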

Statistical analysis of LoFreq’s performance

To assess the performance of LoFreq at each of the 64 combinations of variant frequency (0.1, 0.5, 1, 3, 5, 10, 15 and 20%) and depth of coverage (30, 50, 100, 200, 300, 400, 500 and 1000x), we compared the truth, i.e. the mutations introduced in the in silico generated mutated H37Rv reference genome, to the observed, i.e. the absence or presence of each minor variant as reported in the VCF files generated by LoFreq. This allowed us to calculate the number of true positive (TP), false positive (FP), and false negative (FN) variants reported by LoFreq in each of the 640 WGS datasets. Using these results, we assessed the performance of LoFreq by calculating the sensitivity (defined as the ratio of true positives over true positives plus false negatives) and precision (defined as the ratio of true positives over true positives plus false positives) at each of the 64 combinations. All analyses of the performance of LoFreq on in silico generated Mtb WGS data were done in RStudio. Regression lines were constructed applying locally estimated scatterplot smoothing (LOESS) using the ggplot package in RStudio. For the patient samples, the detection of minor variants by LoFreq could not be compared to a known set of all true variants (as was done for in silico datasets). Instead, the detection of minor variants by LoFreq in the Rv0678 gene in the WGS data was compared to the variants identified by TDS of the Rv0678 gene of the same samples as previously described by De Vos et al. [23]. In addition, we compared the ability of LoFreq to detect these minor variants in the WGS data to what has previously been reported by De Vos et al. [23], where a combination of GATK and visual inspection was used. To compare the variant frequencies that were predicted by the different variant calling methods (TDS, GATK/Visual-method and LoFreq), two-proportion Z-tests were performed.
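The definitions of sensitivity and precision given above translate directly into set operations. The sketch below is a minimal illustration, not the authors' R code; it assumes each variant is represented as a (position, REF, ALT) tuple and that truth and calls use the same indel representation. The toy variants are invented for the example.

```python
def confusion_counts(truth, called):
    """Count TP, FP and FN by comparing introduced and called variants."""
    truth, called = set(truth), set(called)
    tp = len(truth & called)   # introduced and called
    fp = len(called - truth)   # called but never introduced
    fn = len(truth - called)   # introduced but missed
    return tp, fp, fn


def sensitivity(tp, fn):
    """TP / (TP + FN); NaN when there are no true variants."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


def precision(tp, fp):
    """TP / (TP + FP); NaN when nothing was called."""
    return tp / (tp + fp) if (tp + fp) else float("nan")


# Toy example with invented variants: two SNPs, one insertion, one deletion.
truth = {(100, "A", "G"), (250, "C", "T"), (400, "G", "GA"), (900, "TC", "T")}
called = {(100, "A", "G"), (400, "G", "GA"), (1200, "ACGTACGTACGTA", "A")}

tp, fp, fn = confusion_counts(truth, called)
print("TP, FP, FN:", tp, fp, fn)
print("sensitivity:", sensitivity(tp, fn))
print("precision:", precision(tp, fp))
```

In practice the counts would be computed per dataset and then pooled over the 10 mutated genomes of each frequency/depth combination before taking the ratios.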

Results

At very high Mtb depth of coverage of 1000x, the limit of detection of LoFreq to call minor variants was 3% for SNPs, with a sensitivity of 48.6% (95% CI 47.7–49.6%) (Figure 1, Supplementary Figure S1). At this depth, sensitivity increased to ≥98% for variants present at frequency ≥5%. For variants present at 10% frequency and higher, the sensitivity increased rapidly with increasing depth of coverage: at 50x, 200x and ≥400x depth, sensitivities were 19.6% (95% CI 18.9–20.4%), 90.7% (95% CI 90.1–91.2%) and 98.5% (95% CI 98.2–98.7%), respectively.
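The text does not state how the 95% confidence intervals were computed. As one possible approach, the sketch below uses the Wilson score interval for a binomial proportion. The denominator of 10 000 is inferred from 10 mutated genomes with 1000 SNPs each; the numerator of 4860 is a hypothetical count chosen to match the reported 48.6%.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: 4860 of 10 000 introduced SNPs detected.
low, high = wilson_interval(4860, 10_000)
print(f"sensitivity 48.6%, 95% CI {100 * low:.1f}-{100 * high:.1f}%")
```

With these assumed counts the interval is roughly 47.6% to 49.6%, close to the reported one; the small difference may reflect a different interval method or a slightly different numerator.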

Figure 1

Sensitivity of the LoFreq tool for whole genome calling of minor SNP variants (present at 0.1–20%) at different levels of coverage (30–1000x) when evaluated on in silico whole genome sequence data sets.

LoFreq’s limit of detection was lower for insertions than for SNPs. The sensitivity of LoFreq for the detection of insertions present at 0.5% frequency was 43.6% (95% CI 39.2–48.1%) at 1000x coverage. Insertions present at 1% were detected with a sensitivity of 56.6% (95% CI 52.1–61.0%) at 500x coverage and 92.2% (95% CI 89.5–94.4%) at 1000x coverage (Figure 2, Supplementary Figure S2). At higher minor variant frequencies (≥3%), sensitivity to detect insertions increased quickly with increasing coverage before plateauing around 98.8%. At low coverage (30x), insertions present at 10, 15 or 20% could be detected with 59.4% (95% CI 55.0–63.7%), 78.8% (95% CI 75.0–82.3%) and 91.8% (95% CI 89.0–94.1%) sensitivity, respectively.

Figure 2

Sensitivity of the LoFreq tool for whole genome calling of minor insertion variants (present at 0.1–20%) at different levels of coverage (30–1000x) when evaluated on in silico whole genome sequence data sets.

Similar results were obtained for the detection of deletions. When present at 0.5% frequency, the sensitivity for detection of deletions was 46.2% (95% CI 41.8–50.7%) at 1000x coverage. The sensitivity to detect deletions present at 1% was 56.8% (95% CI 52.3–61.2%) at 500x coverage and 88.8% (95% CI 85.7–91.4%) at 1000x coverage (Figure 3, Supplementary Figure S3). Sensitivity to detect deletions increased quickly with increasing coverage before plateauing around 98.2% for variants present at ≥3%. At low coverage (30x), the sensitivity to detect deletions present at 10, 15 and 20% was 59.8% (95% CI 55.4–64.1%), 80.6% (95% CI 76.9–84.0%) and 89.8% (95% CI 86.8–92.3%), respectively. Among the FP indel calls, LoFreq reported 91.8% (45/49) of the FP deletions and 5.4% (2/37) of the FP insertions as large (>10 bp) indels.

Figure 3

Sensitivity of the LoFreq tool for whole genome calling of minor deletion variants (present at 0.1–20%) at different levels of coverage (30–1000x) when evaluated on in silico whole genome sequence data sets.

With regard to precision, our analysis of in silico data showed that LoFreq had a very low rate of calling false positives, resulting in a precision of 1 for the detection of SNPs independent of frequency of the minor variant and depth of coverage (Supplementary Table S1). In the in silico datasets, a few false positive indels were reported, resulting in precision estimates of ≥99.5% for insertions and ≥96.7% for deletions (Supplementary Table S1).
In addition to simulation experiments, we assessed LoFreq’s ability to detect minor variants in clinical WGS data (at 62x average depth) using four serial Mtb samples collected from a patient with XDR-TB who failed a BDQ-containing treatment regimen [23] and compared results to when variant calling was performed by GATK followed by visual inspection using Tablet or on TDS data. LoFreq detected all variants in Rv0678 as detected by TDS, including the variants present at lower frequency (5.7% and 3.2%). In contrast, the GATK/visual detection method detected variants present at a frequency exceeding 25%, but missed the variants that occurred at a frequency of ≤5.7% (Table 1). Predicted variant frequencies also differed between TDS and LoFreq for all (four) high frequency (>65%) variants, with LoFreq systematically predicting a lower frequency of variants to be present. A similar observation was made for three out of four of these variants when comparing predicted variant frequency when variant calling was done using GATK and visual inspection using Tablet or by LoFreq (Table 1). For lower frequency variants no significant difference was observed when comparing predicted variant frequency between the different variant calling methods.

Table 1

Minority populations identified in Rv0678 when variant calling was performed on TDS data or on whole genome sequencing data, with variant calling performed either by a combination of GATK and visual inspection (using Tablet) or by the LoFreq variant caller. Variant frequencies are given as % (number of mutant/total reads).

Sample accession number (European Nucleotide Archive)	Rv0678 variant	Variant frequency, TDS	Variant frequency, WGS GATK/Visual	Variant frequency, WGS LoFreq	P-value, TDS versus GATK/Visual	P-value, TDS versus LoFreq	P-value, GATK/Visual versus LoFreq
SAMEA5562524	192 G ins	96.66% (17 551/18 158)	100% (56/56)	87.06% (74/85)	0.31	<0.0001	0.013
SAMEA5562526	138 GA ins	97.52% (13 299/13 638)	100% (75/75)	80.91% (89/110)	0.31	<0.0001	0.0002
SAMEA5562527	138 G ins	65.48% (9317/14 230)	63% (45/71)	55.21% (53/96)	0.81	0.046	0.37
SAMEA5562527	138 GA ins	28.35% (4034/14 230)	25% (18/71)	19.79% (19/96)	0.67	0.08	0.50
SAMEA5562527	192 G ins	3.22% (461/14 230)	No MV detected: data missing	4.49% (4/89)		0.71	
SAMEA5562528	138 G ins	91.68% (13 029/14 212)	96% (79/82)	81.55% (84/103)	0.18	0.0004	0.004
SAMEA5562528	138 GA ins	5.86% (832/14 212)	No MV detected: data missing	6.80% (7/103)		0.85	

ins = insertion; WGS = whole genome sequencing; TDS = targeted deep sequencing; MV = minor variant.

In terms of speed, we found LoFreq’s runtime to scale linearly with sequencing depth. Using a single core QEMU Virtual CPU version 0.12 operating at 2.5 GHz, the runtime for LoFreq was roughly 10 min for assigning indel quality scores using the LoFreq indelqual command and approximately 2 h to perform the variant calling using the LoFreq call command for a single sample where the full Mtb genome (size = 4.4 Mbp) was sequenced at 1000x coverage.
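The two-proportion Z-test used for the P-values in Table 1 can be sketched as follows. The text does not say whether a continuity correction was applied; this sketch assumes a pooled standard error with a Yates continuity correction (the default behaviour of R's `prop.test`), which gives values close to several of the tabulated P-values but is not confirmed as the authors' exact procedure.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1, n1, x2, n2, continuity=True):
    """Two-sided pooled two-proportion Z-test; returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    diff = abs(p1 - p2)
    if continuity:
        # Yates correction, never larger than the observed difference.
        diff = max(0.0, diff - 0.5 * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # both samples are all-mutant or all-wild-type
    z = diff / se
    return z, erfc(z / sqrt(2))  # two-sided normal tail probability


# Read counts for sample SAMEA5562524 (192 G ins) from Table 1.
comparisons = {
    "TDS vs GATK/Visual": (17551, 18158, 56, 56),
    "TDS vs LoFreq": (17551, 18158, 74, 85),
    "GATK/Visual vs LoFreq": (56, 56, 74, 85),
}
for label, counts in comparisons.items():
    z, p = two_proportion_z_test(*counts)
    print(f"{label}: z = {z:.2f}, p = {p:.4g}")
```

Because the WGS comparisons rest on fewer than 120 reads per site, the correction has a visible effect there, whereas it barely matters for the TDS counts.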

Discussion

The finding that LoFreq’s sensitivity to detect minor variants increases with sequencing depth and variant frequency is consistent with the general assumption that sequencing populations at higher coverages reduces the uncertainties associated with random sequencing errors [27]. This finding is in stark contrast with two previous studies where similar coverage ranges were investigated and LoFreq’s sensitivity was not found to be significantly affected by depth of coverage [10, 13]. Moreover, our results indicate that LoFreq’s performance depends on the type of variant to be detected, with greater sensitivity to detect indel mutations than SNPs. This likely reflects that random sequencing errors generated by short-read sequencers are mostly SNPs rather than indels [28], allowing LoFreq to more rapidly and confidently call an observed indel variant as true as opposed to a sequencing error. Congruent with what is reported in the literature, our in silico assessment confirms LoFreq to be a conservative variant caller with high precision, minimizing the need for subsequent filtering of false positive variants, which risks losing a significant proportion of true positive variants [10]. In contrast to the perfect precision of 1 when variant calling is performed on in silico introduced SNPs, in some cases a precision smaller than 1 (but >0.96) was observed for calling indel variants, which is in agreement with what has been previously reported and has been attributed to misalignment of the indel supporting reads [14]. We further observed that for a considerable proportion of FP indels (91.8% for deletions and 5.4% for insertions) LoFreq reported large (>10 bp) indels to be present, suggesting that additional filtering of indel variants on length might further decrease the FP rate. From a clinical perspective, previous publications have suggested that only variants occurring at a frequency of ≥19% become fixed in Mtb populations [7, 11].
This clinically relevant variant frequency threshold is further supported by the observation that the presence of low-frequency resistance mutations (<5%) does not affect treatment outcomes of patients infected with drug-susceptible TB [29]. Our in silico results indicate high specificity (>0.97) to detect minor Mtb variants at such clinically relevant frequency levels (20%) when sequenced at ongoing sequencing depth (100x), supporting LoFreq as a clinically relevant variant calling tool for Mtb WGS data. On the other hand, from a biological perspective, it can be expected that variants with biological advantages (such as drug resistance, drug tolerance or higher fitness) may be selected even when initially occurring at very low frequencies [30]. For variants occurring below or at the detection limit (3% for SNP calling and 0.5% for indel calling), for which sensitivity of LoFreq is low (<50%) even at high coverage (1000x), very high coverages currently seem indispensable, favoring a targeted sequencing approach. The study by McCrone et al. [15] found LoFreq’s sensitivity to call variants in data resembling clinical samples to be lower than what was expected from previous benchmarking, pointing to potential variation in LoFreq’s performance on clinical data. In contrast to this study (performed on viral populations with variants at very low to low (0.16–5%) frequencies), we found LoFreq able to detect all TDS-determined minor variants in WGS data from clinical isolates. In our setup, LoFreq detected two low-frequency variants (3.22% and 5.86%) that could not be identified by a combination of GATK and visual inspection. The ability of LoFreq to detect these low-frequency variants at a median coverage of only 62x indicates that the results of in silico simulations described above may underestimate sensitivity compared to when clinical data are used.
In our hands, LoFreq did not have lower sensitivity to detect variants in clinical data compared to in silico datasets. Other studies have shown that sequential sequencing of serial patient samples generates a large number of transient variants. It is, however, unclear whether such variants are genuine or the result of sequencing error [7]. The fact that for our clinical dataset LoFreq finds exactly and exclusively the same variants that are found by TDS (which is known to be precise and where called variants are unlikely to be due to sequencing error when present at >1%) suggests that variants called by LoFreq are genuine rather than sequencing error. For the transient variants described in the clinical study referred to in this paper, we would thus argue that the variants are genuine rather than sequencing error. For resistance-associated genes, variants are often categorized into micro- (<5%) and macro- (5–95%) heteroresistant variants [30]. However, our observation that absolute variant frequencies diverge between variant callers for multiple samples indicates that such categorization should be done with care. Multiple factors could result in discrepant variant frequencies: (1) stochasticity of the sequenced reads, which decreases with sequencing depth and thus favors the variant frequency as predicted by TDS (>13 000x) over that predicted by WGS (62x); and (2) some forms of bias tend to push variants to above 50% allele frequency, particularly at higher coverages and when more selective events occur. Forms of PCR bias present in both TDS and WGS library preparations could thus underlie the observed variation in variant frequency and explain why lower frequency variants increase and higher frequency variants decrease when variant calling is done using TDS compared to WGS.
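The Discussion suggests that filtering indel calls on length might reduce false positives. A minimal sketch of such a filter is shown below; it is not part of LoFreq or of the authors' workflow. It assumes standard tab-separated VCF records (REF in column 4, ALT in column 5), takes the 10 bp threshold from the text, and uses invented example records.

```python
def indel_length(ref, alt):
    """Length difference between ALT and REF alleles (0 for SNPs)."""
    return abs(len(alt) - len(ref))


def filter_long_indels(vcf_lines, max_len=10):
    """Keep header lines and records whose indel length is <= max_len."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # always keep VCF header lines
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        if all(indel_length(ref, alt) <= max_len for alt in alts):
            kept.append(line)
    return kept


# Invented records: a SNP, a 1 bp insertion and a 12 bp deletion.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "NC_000962.3\t100\t.\tA\tG\t80\tPASS\tAF=0.05",
    "NC_000962.3\t400\t.\tG\tGA\t75\tPASS\tAF=0.03",
    "NC_000962.3\t1200\t.\tACGTACGTACGTA\tA\t60\tPASS\tAF=0.02",
]
filtered = filter_long_indels(example)
print(len(filtered) - 1, "of", len(example) - 1, "records kept")
```

Such a filter would also discard genuine long indels, so the threshold would need to be validated on data containing longer true indels before use.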

Strengths and Limitations

To our knowledge, this is the first study benchmarking a WGS variant calling tool suitable to detect minor Mtb variants. This study yielded precise in silico as well as clinical information on the performance of bioinformatic tools, allowing researchers to evaluate the performance of tools for their own specific application. One limitation of the in silico validation performed in this study is that the introduced indel mutations were limited to single nucleotide insertions or deletions, while the power to sensitively detect indels has been reported to decrease with indel length [14]. This is particularly relevant as longer indel regions (>50 bp) have been shown to be present in the Mtb genome [31]. Moreover, indels have been reported to occur with increased frequency in low-complexity regions in the Mtb genome where indel calling is known to be more error-prone, while in silico generated indels were randomly distributed throughout the genome [32]. Therefore, the sensitivity of LoFreq to detect longer indels and indels present in low-complexity regions requires further investigation. Another limitation of the study is that the small clinical sample size did not allow us to statistically validate LoFreq’s in silico predicted performance metrics when applied to clinical sequencing data. For the same reason, our serial data (corresponding to a single patient) do not allow us to properly address the debated question of whether transient variants observed upon sequential sampling are in general genuine (as suggested by our data) or due to sequencing error. Similar studies with larger sample sizes would be required to generalize our findings. In addition, all variants present in the clinical data were insertions. Further validation of LoFreq’s performance on larger clinical sample sizes containing both SNP and indel mutations thus remains to be done and would be highly complementary to the in silico findings reported in this paper.
A benchmarked genome wide minor variant calling tool is currently missing for Mtb. Sensitivity of LoFreq to detect minor variants reaches up to 98.8%, improving with increasing frequency of minor variants in the Mtb genome and increasing levels of coverage depth. Sensitivity of LoFreq is higher for calling indel mutations than for single nucleotide polymorphisms. LoFreq proves to be a highly precise, conservative variant caller, limiting the need for subsequent filtering of false positive variants. LoFreq is a clinically valuable whole genome sequence (WGS) variant calling tool. Performance statistics as reported in this study are required to guide future studies aiming to investigate the presence of minor variants in Mtb WGS datasets. Such findings are necessary to improve our understanding of various currently understudied tuberculosis topics, including bacterial transmissibility, bacterial fitness, virulence and drug tolerance.

31 in total

Review 1. Mechanisms of Drug-Induced Tolerance in Mycobacterium tuberculosis.

Authors: Sander N Goossens; Samantha L Sampson; Annelies Van Rie
Journal: Clin Microbiol Rev Date: 2020-10-14 Impact factor: 26.132

2. Measurements of Intrahost Viral Diversity Are Extremely Sensitive to Systematic Errors in Variant Calling.

Authors: John T McCrone; Adam S Lauring
Journal: J Virol Date: 2016-07-11 Impact factor: 5.103

3. Genomic analyses of Mycobacterium tuberculosis from human lung resections reveal a high frequency of polyclonal infections.

Authors: Miguel Moreno-Molina; Natalia Shubladze; Iza Khurtsilava; Zaza Avaliani; Nino Bablishvili; Manuela Torres-Puente; Luis Villamayor; Andrei Gabrielian; Alex Rosenthal; Cristina Vilaplana; Sebastien Gagneux; Russell R Kempker; Sergo Vashakidze; Iñaki Comas
Journal: Nat Commun Date: 2021-05-11 Impact factor: 17.694

4. Bedaquiline Microheteroresistance after Cessation of Tuberculosis Treatment.

Authors: Margaretha de Vos; Serej D Ley; Kristin B Wiggins; Brigitta Derendinger; Anzaan Dippenaar; Melanie Grobbelaar; Anja Reuter; Tania Dolby; Scott Burns; Marco Schito; David M Engelthaler; John Metcalfe; Grant Theron; Annelies van Rie; James Posey; Rob Warren; Helen Cox
Journal: N Engl J Med Date: 2019-05-30 Impact factor: 91.245

5. Dual Deep Sequencing Improves the Accuracy of Low-Frequency Somatic Mutation Detection in Cancer Gene Panel Testing.

Authors: Hiroki Ura; Sumihito Togi; Yo Niida
Journal: Int J Mol Sci Date: 2020-05-16 Impact factor: 5.923

6. Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism-calling pipelines.

Authors: Stephen J Bush; Dona Foster; David W Eyre; Emily L Clark; Nicola De Maio; Liam P Shaw; Nicole Stoesser; Tim E A Peto; Derrick W Crook; A Sarah Walker
Journal: Gigascience Date: 2020-02-01 Impact factor: 6.524

7. LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets.

Authors: Andreas Wilm; Pauline Poh Kim Aw; Denis Bertrand; Grace Hui Ting Yeo; Swee Hoe Ong; Chang Hua Wong; Chiea Chuen Khor; Rosemary Petric; Martin Lloyd Hibberd; Niranjan Nagarajan
Journal: Nucleic Acids Res Date: 2012-10-12 Impact factor: 16.971

8. Evaluation of variant detection software for pooled next-generation sequence data.

Authors: Howard W Huang; James C Mullikin; Nancy F Hansen
Journal: BMC Bioinformatics Date: 2015-07-29 Impact factor: 3.169

9. Estimation of genetic diversity in viral populations from next generation sequencing data with extremely deep coverage.

Authors: Jean P Zukurov; Sieberth do Nascimento-Brito; Angela C Volpini; Guilherme C Oliveira; Luiz Mario R Janini; Fernando Antoneli
Journal: Algorithms Mol Biol Date: 2016-03-11 Impact factor: 1.405

10. Genome-wide somatic variant calling using localized colored de Bruijn graphs.

Authors: Giuseppe Narzisi; André Corvelo; Kanika Arora; Ewa A Bergmann; Minita Shah; Rajeeva Musunuri; Anne-Katrin Emde; Nicolas Robine; Vladimir Vacic; Michael C Zody
Journal: Commun Biol Date: 2018-03-22