Nathan D Olson1,2, Justin Wagner1, Jennifer McDaniel1, Sarah H Stephens3, Samuel T Westreich4, Anish G Prasanna3, Elaine Johanson5, Emily Boja5, Ezekiel J Maier3, Omar Serang4, David Jáspez6, José M Lorenzo-Salazar6, Adrián Muñoz-Barrera6, Luis A Rubio-Rodríguez6, Carlos Flores6,7,8,9, Konstantinos Kyriakidis10,11, Andigoni Malousi11,12, Kishwar Shafin13, Trevor Pesout13, Miten Jain13, Benedict Paten13, Pi-Chuan Chang14, Alexey Kolesnikov14, Maria Nattestad14, Gunjan Baid14, Sidharth Goel14, Howard Yang14, Andrew Carroll14, Robert Eveleigh15, Mathieu Bourgey15, Guillaume Bourque15, Gen Li16, ChouXian Ma16, LinQi Tang16, YuanPing Du16, ShaoWei Zhang16, Jordi Morata17,18, Raúl Tonda17,18, Genís Parra17,18, Jean-Rémi Trotta17,18, Christian Brueffer19, Sinem Demirkaya-Budak20, Duygu Kabakci-Zorlu20, Deniz Turgut20, Özem Kalay20, Gungor Budak20, Kübra Narcı20, Elif Arslan20, Richard Brown20, Ivan J Johnson20, Alexey Dolgoborodov20, Vladimir Semenyuk20, Amit Jain20, H Serhat Tetikol20, Varun Jain21, Mike Ruehle21, Bryan Lajoie21, Cooper Roddey21, Severine Catreux21, Rami Mehio21, Mian Umair Ahsan22, Qian Liu22, Kai Wang22,23, Sayed Mohammad Ebrahim Sahraeian24, Li Tai Fang24, Marghoob Mohiyuddin24, Calvin Hung25, Chirag Jain26, Hanying Feng27, Zhipan Li27, Luoqi Chen27, Fritz J Sedlazeck28, Justin M Zook1.
Abstract
The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications. Challenge submissions included numerous innovative methods, with graph-based and machine learning methods scoring best for short-read and long-read datasets, respectively. With machine learning approaches, combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.
Year: 2022 PMID: 35720974 PMCID: PMC9205427 DOI: 10.1016/j.xgen.2022.100129
Source DB: PubMed Journal: Cell Genom ISSN: 2666-979X
Figure 1. Truth Challenge V2 structure
Participants were provided sequencing reads (FASTQ files) from Illumina, PacBio HiFi, and ONT for the GIAB Ashkenazi trio (HG002, HG003, and HG004). Participants uploaded VCF files for each individual before the end of the challenge, and then the new benchmarks for HG003 and HG004 were made public.
Sequencing dataset characteristics
| Technology | GIAB ID | Read length (bp) | Number of reads | Coverage |
|---|---|---|---|---|
| Illumina | HG002 | 2×151 | 415,086,209 | 35 |
| Illumina | HG003 | 2×151 | 419,192,650 | 35 |
| Illumina | HG004 | 2×151 | 420,312,085 | 35 |
| PacBio HiFi | HG002 | 12,885 | 8,449,287 | 36 |
| PacBio HiFi | HG003 | 14,763 | 7,288,357 | 35 |
| PacBio HiFi | HG004 | 15,102 | 7,089,316 | 35 |
| ONT | HG002 | 50,380 | 19,328,993 | 47 |
| ONT | HG003 | 44,617 | 23,954,632 | 85 |
| ONT | HG004 | 48,060 | 29,319,334 | 85 |
For read length, N50 was used to summarize PacBio and ONT read lengths; coverage was median coverage across autosomal chromosomes.
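The N50 statistic used above to summarize long-read lengths is the largest read length L such that reads of length ≥ L together contain at least half of all sequenced bases. A minimal sketch (read lengths are illustrative, not taken from the challenge data):

```python
def n50(read_lengths):
    """Return N50: the largest length L such that reads of length >= L
    together account for at least half of the total sequenced bases."""
    total = sum(read_lengths)
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

# Total bases = 48,000; reads of length >= 12,000 cover 27,000 >= 24,000
print(n50([2000, 4000, 5000, 10000, 12000, 15000]))  # 12000
```

Unlike the mean, N50 is weighted by bases rather than by reads, so a few very long reads raise it more than many short ones.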
Figure 2. Challenge submission breakdown and performance overview
(A) Challenge submission breakdown by technology and type of variant caller used. Deep-learning methods use either a convolutional neural network or a recurrent neural network architecture for learning the variant-calling task, while non-deep-learning methods use techniques that broadly arise from statistical techniques (e.g., Bayesian and Gaussian mixture models) or other ML techniques (e.g., random forest) to differentiate variant and non-variant loci based on expert-designed features of the sequencing data.
(B and C) Overall performance (B) and submission rank (C) varied by technology and stratification (log scale). Generally, submissions that used multiple technologies (MULTI) outperformed single-technology submissions for all three genomic context categories. (B) A histogram of F1 percentage (higher is better) for the three genomic stratifications evaluated. Submission counts across technologies are indicated by light gray bars and individual technologies by colored bars. (C) Individual submission performance. Data points represent submission performance for the three stratifications (difficult-to-map regions, all benchmark regions, MHC), and lines connect submissions. Category top performers are indicated by diamonds with Ws and labeled with team names. F1 is plotted on a phred scale with axis labels and ticks indicating F1 percentage values.
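The phred scale used for the F1 axes compresses scores near 1 into a readable range: phred(F1) = −10·log10(1 − F1), so F1 = 0.999 maps to 30 and each additional "9" adds 10. A small sketch of the transform:

```python
import math

def phred_f1(f1):
    """Phred-scale an F1 score: -10 * log10(1 - F1).
    F1 = 0.99 -> 20, F1 = 0.999 -> 30, F1 = 0.9999 -> 40."""
    return -10 * math.log10(1 - f1)

print(round(phred_f1(0.99), 1))   # 20.0
print(round(phred_f1(0.999), 1))  # 30.0
```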
Summary of challenge top performers
| Technology | Genomic region | Participant | F1 | Recall | Precision | F1 rank: All | F1 rank: Diff | F1 rank: MHC |
|---|---|---|---|---|---|---|---|---|
| MULTI | all* | Sentieon | 0.999 | 0.999 | 0.999 | 1 | 4 | 1 |
| MULTI | all* | Roche Sequencing Solutions | 0.999 | 0.999 | 0.999 | 1 | 1 | 7 |
| MULTI | all* | The Genomics Team in Google Health | 0.999 | 0.999 | 0.999 | 1 | 2 | 4 |
| MULTI | diff | Roche Sequencing Solutions | 0.994 | 0.992 | 0.996 | 1 | 1 | 7 |
| MULTI | MHC | Sentieon | 0.998 | 0.998 | 0.998 | 1 | 4 | 1 |
| ILLUMINA | all | DRAGEN | 0.997 | 0.996 | 0.998 | 1 | 1 | 5 |
| ILLUMINA | diff | DRAGEN | 0.969 | 0.961 | 0.978 | 1 | 1 | 5 |
| ILLUMINA | MHC | Seven Bridges Genomics | 0.992 | 0.989 | 0.996 | 6 | 9 | 1 |
| PACBIO | all | The Genomics Team in Google Health | 0.998 | 0.998 | 0.998 | 1 | 2 | 4 |
| PACBIO | diff | Sentieon | 0.993 | 0.991 | 0.994 | 4 | 1 | 1 |
| PACBIO | MHC | Sentieon | 0.995 | 0.993 | 0.997 | 4 | 1 | 1 |
| ONT | all | The UCSC CGL and Google Health | 0.965 | 0.947 | 0.984 | 1 | 1 | 2 |
| ONT | diff | The UCSC CGL and Google Health | 0.983 | 0.976 | 0.988 | 1 | 1 | 2 |
| ONT | MHC | Wang Genomics Lab | 0.972 | 0.964 | 0.980 | 3 | 3 | 1 |
One winner was selected for each technology/genomic-region combination, and multiple winners were awarded in the case of ties. Winners were selected based on each submission's F1 score for the semi-blinded samples, HG003 and HG004 (harmonic mean of the parents' F1 scores for combined SNVs and INDELs). Ranks are shown for all three genomic categories to indicate overall submission performance: all, all benchmark regions; diff, difficult-to-map regions.
*Tie.
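The winner-selection score described above combines the two parents' results as a harmonic mean, which penalizes a submission that does well on one parent but poorly on the other. A sketch with illustrative per-parent F1 values:

```python
def harmonic_mean(a, b):
    """Harmonic mean of two scores; lower than the arithmetic mean
    whenever the two scores differ, so imbalance is penalized."""
    return 2 * a * b / (a + b)

# Hypothetical per-parent F1 scores for one submission
f1_hg003, f1_hg004 = 0.998, 0.996
print(round(harmonic_mean(f1_hg003, f1_hg004), 4))  # 0.997
```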
Figure 3. Submission performance comparison for F1 metric between MHC, all benchmark regions, and difficult-to-map regions
F1 is plotted on a phred scale with axis labels and ticks indicating F1 percentage values. Points above the diagonal black line perform better in MHC relative to all benchmark regions or the difficult-to-map regions. Submissions with the largest difference in performance between MHC and difficult-to-map or all benchmark regions for each subplot are labeled. Seven Bridges is a graph-based short-read variant caller. ONT ensemble is an ensemble of ONT variant callers: NanoCaller, Clair, and Medaka. PEPPER-DV is the ONT PEPPER-DeepVariant haplotype-aware ML variant-calling pipeline.
Figure 4. Performance comparisons by sample, benchmark version, and challenges
Ratio of error rates using semi-blinded parents’ benchmark versus public son’s (HG002) benchmark.
(A) Submissions ranked by error-rate ratio.
(B) Comparison of error-rate ratio with the overall performance for the parents (F1 in all benchmarking regions, as defined in Equation 1). Error rate defined as 1 – F1. F1 is plotted on a phred scale with axis labels and ticks indicating F1 percentage values.
(C and D) Comparison of benchmarking performance for (C) different benchmark sets and (D) challenges. (C) The 2016 (V1) Truth Challenge top performers' F1 performance metrics for SNVs and INDELs benchmarked against the V3.2 benchmark set (used to evaluate the first challenge) and the V4.2 benchmark set (used to evaluate the second challenge). Performance metrics for the same variant calls decrease substantially against the V4.2 benchmark set because it includes more challenging regions. (D) Performance of V1 challenge top performers (using 50X Illumina sequencing) compared with V2 submissions (using only 35X Illumina sequencing) for the harmonic mean of the parents' F1 scores for combined SNVs and INDELs and the V4.2 benchmark set used to evaluate the second truth challenge. The black horizontal lines represent the performance of the overall top performer, regardless of technology used, for each stratification. For the first challenge, variant call sets for the blinded HG002 against GRCh37 were used to evaluate performance, and, for the second challenge, variant calls for the semi-blinded HG003 and HG004 against GRCh38 were used to evaluate performance. F1 is plotted on a phred scale with axis labels and ticks indicating F1 percentage values.
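The error-rate ratio in panels (A) and (B) compares each submission's error rate (1 − F1) on the semi-blinded parents' benchmark with its error rate on the public son's (HG002) benchmark; values above 1 suggest the pipeline was tuned to the public sample. A minimal sketch with illustrative F1 values:

```python
def error_rate_ratio(f1_parents, f1_son):
    """Ratio of error rates (1 - F1) between the semi-blinded parents'
    benchmark and the public son's (HG002) benchmark.
    Values > 1 indicate worse performance on the semi-blinded samples."""
    return (1 - f1_parents) / (1 - f1_son)

# Twice as many errors on the parents as on the son
print(round(error_rate_ratio(0.998, 0.999), 2))  # 2.0
```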
Figure 5. Comparison of ONT PEPPER-DeepVariant variant call set performance with Illumina DeepVariant by genomic context
F1 is plotted on a phred scale with axis labels and ticks indicating F1 percentage values. Points above the diagonal line indicate stratifications where the ONT PEPPER-DeepVariant submission's performance metric was higher than the Illumina DeepVariant submission's, and points below indicate the reverse. The points are colored by stratification category.
KEY RESOURCES TABLE
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Deposited data | ||
| PacBio HiFi/CCS Sequel II | Wagner et al. | SRA: SRX7083054 to SRX7083057, SRX8136474 to SRX8136477, SRX8137018 to SRX8137021 |
| Experimental models: Cell lines | ||
| Son of Ashkenazi Jewish ancestry (HG002) | NIST Office of Reference Materials; Coriell/NIGMS; PGP | NIST RM8391/RM8392; GM24385; RRID:CVCL_1C78 |
| Father of Ashkenazi Jewish ancestry (HG003) | NIST Office of Reference Materials; Coriell/NIGMS; PGP | NIST RM8392; GM24149; RRID:CVCL_1C54 |
| Mother of Ashkenazi Jewish ancestry (HG004) | NIST Office of Reference Materials; Coriell/NIGMS; PGP | NIST RM8392; GM24143; RRID:CVCL_1C48 |
| Software and algorithms | ||
| hap.py | | |
| seqtk | | |
| Code used to analyze challenge results and benchmarking results files | This paper | |
| Other | ||
| Sequence data, analyses, and resources related to the NIST Genome in a Bottle Consortium samples in this manuscript | This paper | |
| GIAB stratifications used for benchmarking | This paper | |
Summary table of the V2.0 GIAB genome stratifications
| Stratification Group | Description | # Strats | Example Stratifications | Useful for |
|---|---|---|---|---|
| FunctionalRegions | Coding regions | 2 | CDS, not in CDS | Evaluating performance in coding regions more likely to be functional |
| GC-content | Various ranges of GC content | 14 | GC < 25%; 30% < GC < 55% | Identifying GC bias in variant-calling performance |
| Low Complexity | Low-complexity regions | 22 | | Evaluating performance in locally repetitive, difficult-to-sequence contexts |
| Homopolymers | Identification of homopolymers by length | 4 | Homopolymers >11 bp; imperfect homopolymers >10 bp | Evaluating performance in homopolymers, where systematic sequencing errors and complex variants frequently occur |
| Simple Repeats | Di-, tri-, and quad-nucleotide repeats of different lengths | 9 | Di-nucleotide repeats 11–50 bp; di-nucleotide repeats >200 bp | Evaluating performance in exact short tandem repeats, where systematic sequencing errors and complex variants frequently occur and variant calls are challenging if the read length is insufficient to traverse the entire repeat |
| Tandem Repeats | Tandem repeats of different lengths | 5 | Tandem repeats between 51 and 200 bp; tandem repeats >10 kb | Evaluating performance in exact short tandem repeats and variable-number tandem repeats, where systematic sequencing errors and complex variants frequently occur and variant calls are challenging if the read length is insufficient to traverse the entire repeat |
| Other Difficult | Various difficult regions of the genome | 6 | MHC; VDJ | Evaluating performance in, or excluding, regions where variants are difficult to call and represent due to limitations of the reference genome (e.g., gaps or errors) or that are highly polymorphic in the population (MHC) |
| Segmental Duplications | Segmental duplications defined using multiple methods and limited to segdups >10 kb | 9 | Segdups >10 kb; selfChain | Evaluating performance in regions with multiple similar copies in the reference, which are challenging to map and assemble |
| Genome Specific | Difficult regions of the genome specific to one or more of the GIAB genomes, including but not limited to complex variants, copy-number variants, and structural variants | 65 | CNVs, complex variants | Evaluating performance in, or excluding, regions in each GIAB reference sample where small variants can be challenging to call (e.g., complex variants) or represent (e.g., CNVs and SVs) |
The updated stratification set includes the union of multiple stratifications as well as “not in” stratifications, which are useful in evaluating performance outside specific difficult genomic contexts.
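Stratified benchmarking with hap.py (the comparison tool listed in the key resources table) consumes stratifications as a two-column, tab-separated file mapping a label to a BED file path, passed via its `--stratification` option. A sketch that writes such a TSV; the labels and BED file names here are hypothetical, not the actual GIAB file names:

```python
import csv

# Hypothetical stratification label -> BED path pairs; hap.py's
# --stratification option expects a TSV with a label in the first
# column and a BED file path in the second.
strats = [
    ("GCcontent_lt25", "GRCh38_gc25.bed.gz"),
    ("lowmappability", "GRCh38_lowmappabilityall.bed.gz"),
    ("MHC", "GRCh38_MHC.bed.gz"),
]

with open("stratifications.tsv", "w", newline="") as fh:
    writer = csv.writer(fh, delimiter="\t")
    writer.writerows(strats)

print(open("stratifications.tsv").read().strip())
```

hap.py then reports precision, recall, and F1 separately for each labeled region set, which is how the per-stratification results above were produced.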