Asmaa A Helal, Bishoy T Saad, Mina T Saad, Gamal S Mosaad, Khaled M Aboshanab.
Abstract
The goal of biomarker testing in personalized medicine is to guide treatment so that each patient achieves the best possible outcome. Accurate and reliable identification of each individual's genomic variants is essential for the success of clinical genomics based on third-generation sequencing. Different variant calling techniques have been used and recommended by both Oxford Nanopore Technologies (ONT) and the Nanopore community. A thorough examination of these variant callers can therefore provide critical guidance for third-generation sequencing-based clinical genomics. In this study, two reference genome sample datasets (NA12878 and NA24385) and the set of high-confidence variant calls provided by the Genome in a Bottle (GIAB) consortium were used to evaluate the performance of six variant calling tools (Human-SNP-wf, Clair3, Clair, NanoCaller, Longshot, and Medaka) as an integral step in an in-house variant detection workflow. Of the six variant callers under study, Clair3 and Human-SNP-wf, which incorporates Clair3, achieved the highest performance. Each tool's results were expressed in terms of precision, recall, and F1-score, with Hap.py used for the comparison against the GIAB calls. In conclusion, our findings give important insights into identifying accurate variants from third-generation sequencing of personal genomes using the different variant detection tools available for long-read sequencing.
Keywords: Clair; Clair3; Longshot; Medaka; NanoCaller; human-SNP-wf; nanopore; variant detection
Year: 2022 PMID: 36140751 PMCID: PMC9498802 DOI: 10.3390/genes13091583
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.141
Summary of the tools used in both SNP and indel detection.
| Tool | Version | Function |
|---|---|---|
| Guppy | v5.0.16 | A data processing toolkit that contains Oxford Nanopore's base-calling algorithms. Guppy is integrated into MinKNOW and is also available as a standalone version. |
| Minimap2 | v2.22 | A sequence alignment tool that aligns DNA or mRNA sequences against a large reference database. |
| Samtools | v1.14 | A collection of programs for manipulating alignments in the SAM, BAM, and CRAM formats. It converts between formats; sorts, merges, and indexes data; and can quickly remove PCR duplicates and calculate the mean coverage for a target region. |
| Medaka | v1.4.4 | A program that uses Nanopore sequencing data to generate consensus sequences and call variants. |
| Clair | v2.11 | A tool that uses single-molecule sequencing data to call germline small variants quickly and accurately. |
| Longshot | v0.4.1 | A tool for detecting variants in diploid genomes using long, error-prone reads. It takes an aligned BAM/CRAM file as input and outputs a phased VCF file containing variant and haplotype information. |
| NanoCaller | v2.1.2 | A computational method for detecting SNPs/indels in long-read sequencing data. It feeds long reads into a deep convolutional neural network and generates a prediction for each candidate SNP site by considering pileup information from other candidate sites that share reads. |
| Clair3 | v0.1-r11 | A long-read germline small variant caller that combines two major method categories: pileup calling, which handles most variant candidates quickly, and full-alignment calling, which tackles complex candidates to maximize precision and recall. |
| Hap.py | v0.3.15 | A tool for comparing a VCF against a gold-standard VCF dataset. |
| SnpEff | v5.1 | A toolbox for genetic variant annotation and functional effect prediction. It describes and estimates the effects of genetic variants on genes and proteins (such as amino acid changes). |
| Epi2me-labs/wf-human-SNP | v0.3.1 | A Nextflow workflow for calling diploid variants in whole-genome data. Clair3 is used in this workflow to identify small variants in long reads. |
SAM: Sequence Alignment Map, BAM: Binary Alignment Map, CRAM: Compressed Reference-oriented Alignment Map, VCF: Variant call format
Coverage of BRCA1 and BRCA2 before and after removing duplicates.
| Sample | BRCA1 (Before Removing Duplicates) | BRCA2 (Before Removing Duplicates) | BRCA1 (After Removing Duplicates) | BRCA2 (After Removing Duplicates) |
|---|---|---|---|---|
| HG001 | 32.62 X | 36.89 X | 32.55 X | 36.89 X |
| HG002 | 53.85 X | 70.06 X | 53.85 X | 70.06 X |
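The effect of duplicate removal in the coverage table can be expressed as a relative loss of coverage. The following is a minimal sketch using only the values reported above; the helper name `relative_drop` is hypothetical and not part of the study's workflow.

```python
def relative_drop(before: float, after: float) -> float:
    """Return the percentage of mean coverage lost after duplicate removal."""
    return (before - after) / before * 100.0


# (sample, gene) -> (coverage before, coverage after), copied from the table.
coverage = {
    ("HG001", "BRCA1"): (32.62, 32.55),
    ("HG001", "BRCA2"): (36.89, 36.89),
    ("HG002", "BRCA1"): (53.85, 53.85),
    ("HG002", "BRCA2"): (70.06, 70.06),
}

for (sample, gene), (before, after) in coverage.items():
    print(f"{sample} {gene}: {relative_drop(before, after):.2f}% coverage lost")
```

With these figures, only BRCA1 in HG001 changes at all, and by well under 1%, so duplicate removal has almost no effect on depth in the two target genes.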
The total number of output variants (SNPs, INDELs, and MNPs) reported by each of the six variant callers in the BRCA1 and BRCA2 genes of HG001.
| Tool Name | Total No. of BRCA1 Variants | Total No. of BRCA2 Variants | Total |
|---|---|---|---|
| Clair | 482 | 348 | 830 |
| Longshot | 124 | 108 | 232 |
| NanoCaller | 121 | 97 | 218 |
| Medaka | 221 | 221 | 442 |
| Clair3 | 225 | 172 | 397 |
| Epi2me-labs/wf-human-SNP | 370 | 285 | 655 |
The total number of output variants (SNPs, INDELs, and MNPs) reported by each of the six variant callers in the BRCA1 and BRCA2 genes of HG002.
| Tool Name | Total No. of BRCA1 Variants | Total No. of BRCA2 Variants | Total |
|---|---|---|---|
| Clair | 482 | 372 | 854 |
| Longshot | 124 | 108 | 232 |
| NanoCaller | 121 | 97 | 218 |
| Medaka | 111 | 98 | 209 |
| Clair3 | 370 | 172 | 542 |
| Epi2me-labs/wf-human-SNP | 370 | 285 | 655 |
Summary of the benchmarking output for HG001 with the six variant callers, showing recall, precision, and F1-score.
| Tool (HG001, NA12878) | Gene and Variant Type | Recall | Precision | F1 Score | Total Time Taken |
|---|---|---|---|---|---|
| 1. Human-SNP-wf | BRCA1-SNP | 98.04% | 95.24% | 96.62% | 1 h |
| | BRCA1-INDEL | 94.12% | 80.00% | 86.49% | |
| | BRCA2-SNP | 95.24% | 96.15% | 95.69% | |
| | BRCA2-INDEL | 94.74% | 75.00% | 83.72% | |
| 2. Clair3 | BRCA1-SNP | 99.02% | 96.19% | 97.58% | 1 h 22 min |
| | BRCA1-INDEL | 94.12% | 80.00% | 86.49% | |
| | BRCA2-SNP | 96.19% | 97.12% | 96.65% | |
| | BRCA2-INDEL | 94.74% | 81.82% | 87.80% | |
| 3. Medaka | BRCA1-SNP | 92.16% | 89.52% | 90.82% | 1 h 29 min |
| | BRCA1-INDEL | 58.82% | 50.00% | 54.05% | |
| | BRCA2-SNP | 94.29% | 95.19% | 94.74% | |
| | BRCA2-INDEL | 57.89% | 50.00% | 53.66% | |
| 4. Nanocaller | BRCA1-SNP | 96.08% | 93.33% | 94.69% | 42 min |
| | BRCA1-INDEL | 76.47% | 65.00% | 70.27% | |
| | BRCA2-SNP | 95.24% | 96.15% | 95.69% | |
| | BRCA2-INDEL | 80.00% | 54.55% | 64.86% | |
| 5. Longshot | BRCA1-SNP | 95.10% | 92.38% | 93.72% | 48 min |
| | BRCA1-INDEL | 70.59% | 60.00% | 64.86% | |
| | BRCA2-SNP | 93.33% | 94.23% | 93.78% | |
| | BRCA2-INDEL | 68.42% | 59.09% | 63.41% | |
| 6. Clair | BRCA1-SNP | 96.08% | 93.33% | 94.69% | 2 h |
| | BRCA1-INDEL | 64.71% | 55.00% | 59.46% | |
| | BRCA2-SNP | 93.33% | 94.23% | 93.78% | |
| | BRCA2-INDEL | 63.16% | 54.55% | 58.54% | |
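The F1 score column in the HG001 table above is consistent with the standard definition of F1 as the harmonic mean of precision and recall. As a minimal sketch (the function name `f1_score` is illustrative and is not Hap.py's API), the reported values can be recomputed from the other two columns:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale, e.g. percent)."""
    if precision + recall == 0:
        return 0.0  # Avoid division by zero when nothing was called correctly.
    return 2 * precision * recall / (precision + recall)


# (tool, category) -> (recall %, precision %), copied from the HG001 table.
rows = {
    ("Clair3", "BRCA1-SNP"): (99.02, 96.19),
    ("Medaka", "BRCA1-INDEL"): (58.82, 50.00),
}

for (tool, category), (recall, precision) in rows.items():
    print(f"{tool} {category}: F1 = {f1_score(precision, recall):.2f}%")
```

Because the table reports precision and recall already rounded to two decimals, a recomputed F1 may occasionally differ from the published value in the last digit.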
Summary of the benchmarking output for HG002 with the six variant callers, showing recall, precision, and F1-score.
| Tool (HG002, NA24385) | Gene and Variant Type | Recall | Precision | F1-Score | Total Time Taken |
|---|---|---|---|---|---|
| 1. wf-Human-SNP | BRCA1-SNP | 97.20% | 99.05% | 98.11% | 43 min |
| | BRCA1-INDEL | 93.33% | 70.00% | 80.00% | |
| | BRCA2-SNP | 97.06% | 98.02% | 97.54% | |
| | BRCA2-INDEL | 95.00% | 90.48% | 92.68% | |
| 2. Clair3 | BRCA1-SNP | 96.26% | 98.10% | 97.17% | 1 h 7 min |
| | BRCA1-INDEL | 86.67% | 65.00% | 74.29% | |
| | BRCA2-SNP | 95.10% | 96.04% | 95.57% | |
| | BRCA2-INDEL | 85.00% | 80.95% | 82.93% | |
| 3. Medaka | BRCA1-SNP | 91.59% | 93.33% | 92.45% | 39 min |
| | BRCA1-INDEL | 60.00% | 45.00% | 51.43% | |
| | BRCA2-SNP | 90.20% | 91.09% | 90.64% | |
| | BRCA2-INDEL | 60.00% | 57.14% | 58.54% | |
| 4. Nanocaller | BRCA1-SNP | 95.33% | 97.14% | 96.23% | 28 min |
| | BRCA1-INDEL | 80.00% | 60.00% | 68.57% | |
| | BRCA2-SNP | 94.12% | 95.05% | 94.58% | |
| | BRCA2-INDEL | 85.00% | 80.95% | 82.93% | |
| 5. Longshot | BRCA1-SNP | 94.39% | 96.19% | 95.28% | 38 min |
| | BRCA1-INDEL | 73.33% | 55.00% | 62.86% | |
| | BRCA2-SNP | 92.16% | 93.07% | 92.61% | |
| | BRCA2-INDEL | 75.00% | 71.43% | 73.17% | |
| 6. Clair | BRCA1-SNP | 93.46% | 95.24% | 94.34% | 1 h 11 min |
| | BRCA1-INDEL | 66.67% | 50.00% | 57.14% | |
| | BRCA2-SNP | 91.18% | 92.08% | 91.63% | |
| | BRCA2-INDEL | 65.00% | 61.90% | 63.41% | |
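One simple way to compare the callers across the four HG002 categories is to average their F1 scores. The sketch below does this with an unweighted mean, which is an assumption made here for illustration and not a ranking method stated in the study; the F1 values are copied from the HG002 table above.

```python
from statistics import mean

# Tool -> F1 (%) for BRCA1-SNP, BRCA1-INDEL, BRCA2-SNP, BRCA2-INDEL (HG002 table).
f1_hg002 = {
    "wf-Human-SNP": [98.11, 80.00, 97.54, 92.68],
    "Clair3": [97.17, 74.29, 95.57, 82.93],
    "Medaka": [92.45, 51.43, 90.64, 58.54],
    "Nanocaller": [96.23, 68.57, 94.58, 82.93],
    "Longshot": [95.28, 62.86, 92.61, 73.17],
    "Clair": [94.34, 57.14, 91.63, 63.41],
}

# Sort tools from highest to lowest unweighted mean F1 (illustrative assumption).
ranking = sorted(f1_hg002, key=lambda tool: mean(f1_hg002[tool]), reverse=True)

for tool in ranking:
    print(f"{tool}: mean F1 = {mean(f1_hg002[tool]):.2f}%")
```

Under this assumption, wf-Human-SNP and Clair3 come out on top, in line with the abstract's conclusion. An unweighted mean treats SNP and INDEL categories equally even though they contain different numbers of variants, so it should be read as a rough summary only.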