Chaoyang Li1, Xue Fan2, Xin Guo3, Yongfeng Liu1, Miao Wang1, Xiao Chao Zhao1, Ping Wu1, Qin Yan1, Lei Sun4.
Abstract
BACKGROUND: GenoLab M is a recently developed next-generation sequencing (NGS) platform from GeneMind Biosciences. To establish the performance of GenoLab M, we present the first report to benchmark its WGS and WES sequencing data against the NovaSeq 6000 and NextSeq 550 platforms across various types of analysis. For WGS, thirty-fold sequencing on the Illumina NovaSeq platform processed with the GATK pipeline is currently considered the gold standard. This dataset was therefore generated as the benchmark reference in this study.
Keywords: GenoLab M; NA12878; NextSeq 550; NovaSeq 6000; WES; WGS
Year: 2022 PMID: 35869426 PMCID: PMC9308344 DOI: 10.1186/s12864-022-08775-3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 4.547
Fig. 1 Flowchart of the combinations of three sequencers and two variant calling pipelines for germline variants. Key processes for NGS data generation and analysis are shown on the left. Squares in the flowchart represent data files, and rhombuses indicate processes. NovaSeq means NovaSeq 6000; NextSeq means NextSeq 550
Statistics of the multiple sequencing datasets in our study
| Sample | Library Type | Sequencing Platform | Reads (M) | Bases (Gb) | Duplication rate (%) | >Q20 | >Q30 | Alignment rate (%) | Mean coverage (X) | Bases above 15X (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| GL_WGS_22 | WGS | GenoLab M | 442.77 | 66.42 | 1.73% | 95.35% | 88.26% | 99.88% | 22.39 | 81.30% |
| GL_WGS_33 | WGS | GenoLab M | 662.66 | 99.40 | 1.93% | 95.22% | 87.99% | 99.88% | 33.50 | 93.90% |
| NA_WGS_22 | WGS | NovaSeq 6000 | 424.9 | 63.73 | 3.57% | 95.92% | 90.05% | 99.64% | 21.37 | 87.30% |
| NA_WGS_33 | WGS | NovaSeq 6000 | 655.83 | 98.38 | 5.32% | 95.92% | 90.05% | 99.64% | 32.99 | 97.70% |
| GL_WES_100 | WES Agilent V8 | GenoLab M | 41.87 | 6.28 | 6.00% | 93.95% | 84.71% | 99.95% | 112.42 | 98.00% |
| GL_WES_raw | WES Agilent V8 | GenoLab M | 70.36 | 10.55 | 9.71% | 93.95% | 84.71% | 99.95% | 188.90 | 99.00% |
| NA_WES_100 | WES Agilent V8 | NovaSeq 6000 | 39.35 | 5.90 | 14.85% | 98.01% | 94.05% | 99.95% | 107.72 | 99.30% |
| NA_WES_raw | WES Agilent V8 | NovaSeq 6000 | 81.16 | 12.17 | 26.78% | 98.01% | 94.05% | 99.95% | 222.19 | 99.60% |
| NT_WES_100 | WES Agilent V8 | NextSeq 550 | 37.54 | 5.56 | 5.67% | 86.62% | 79.06% | 99.83% | 101.13 | 99.30% |
| NT_WES_raw | WES Agilent V8 | NextSeq 550 | 131.76 | 19.50 | 17.54% | 86.61% | 79.06% | 99.83% | 354.92 | 99.60% |
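As a quick consistency check on the table, the read and base counts imply an average read length for each dataset. The sketch below is only illustrative arithmetic on a few rows copied from the table. The helper name `implied_read_length` is hypothetical, and the derived lengths are not values reported by the authors.

```python
def implied_read_length(reads_millions: float, bases_gb: float) -> float:
    """Average bases per read implied by the table's read and base counts.

    Hypothetical helper for illustration: reads are given in millions and
    bases in gigabases, as in the table's "(M)" and "(Gb)" columns.
    """
    return (bases_gb * 1e9) / (reads_millions * 1e6)


# (reads in millions, bases in Gb) copied from the table above
datasets = {
    "GL_WGS_22": (442.77, 66.42),
    "NA_WGS_22": (424.9, 63.73),
    "NT_WES_100": (37.54, 5.56),
}

for name, (reads_m, bases_gb) in datasets.items():
    length = implied_read_length(reads_m, bases_gb)
    print(f"{name}: about {length:.1f} bases per read")
```

For the GenoLab M and NovaSeq 6000 rows the result is close to 150 bases per read, while the NextSeq 550 rows come out slightly lower, at roughly 148. The table itself does not state read lengths, so these figures are derived estimates only.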
Fig. 2 Comparison of variant calling performance of GenoLab M and NovaSeq 6000 at 33X and 22X coverage of the NA12878 sample. A SNP and B InDel on the whole genome; C SNP and D InDel F-score on stratification regions. Precision (positive predictive value) is the fraction of relevant instances among the retrieved instances; recall (sensitivity) is the fraction of relevant instances that were retrieved. The F-score is the harmonic mean of precision and recall. chr 20 means chromosome 20, NIADR means Not in all Difficult Regions, SDR means Segmental Duplications Regions
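The precision, recall, and F-score definitions given in the Fig. 2 caption can be sketched in a few lines of Python. This is a minimal illustration, assuming that calls have already been classified as true positives, false positives, and false negatives against a truth set. The function name and the example counts are hypothetical and are not taken from the study.

```python
def precision_recall_fscore(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F-score) from variant-call counts.

    tp: called variants that match the truth set
    fp: called variants absent from the truth set
    fn: truth-set variants that were not called
    """
    # Precision: fraction of called variants that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of truth-set variants that were called.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-score: harmonic mean of precision and recall.
    denom = precision + recall
    fscore = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, fscore


# Hypothetical counts, for illustration only.
p, r, f = precision_recall_fscore(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F-score={f:.3f}")
```

Because the F-score is a harmonic mean, it is pulled toward the lower of the two values, so a caller cannot score well by maximizing precision at the expense of recall or the reverse.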
Fig. 3 Venn diagram of variant calling performance in the WGS datasets. A SNP and B InDel
Fig. 4 Comparison of variant calling performance in six WES datasets. A SNP and B InDel. Precision (positive predictive value) is the fraction of relevant instances among the retrieved instances; recall (sensitivity) is the fraction of relevant instances that were retrieved. The F-score is the harmonic mean of precision and recall
Fig. 5 UpSet diagram of variant calling results of all combinations in the WES datasets. A SNP and B InDel