Yu Xu, Zhe Lin, Chong Tang, Yujing Tang, Yue Cai, Hongbin Zhong, Xuebin Wang, Wenwei Zhang, Chongjun Xu, Jingjing Wang, Jian Wang, Huanming Yang, Linfeng Yang, Qiang Gao.
Abstract
BACKGROUND: Whole exome sequencing (WES) has been widely used in human genetics research. BGISEQ-500 is a recently established next-generation sequencing platform. However, the performance of BGISEQ-500 on WES has not been well studied. In this study, we evaluated the performance of BGISEQ-500 on WES by side-by-side comparison with Hiseq4000, using the well-characterized human sample NA12878.
Keywords: BGISEQ-500; Variation detection; Whole exome sequencing
Year: 2019 PMID: 30909888 PMCID: PMC6434795 DOI: 10.1186/s12859-019-2751-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Data production
| | Hiseq-1 | Hiseq-2 | Hiseq-3 | Hiseq-4 | BGISEQ-1 | BGISEQ-2 | BGISEQ-3 | BGISEQ-4 |
|---|---|---|---|---|---|---|---|---|
| Read length | PE150ᵃ | PE150 | PE150 | PE150 | PE100 | PE100 | PE100 | PE100 |
| Raw data (Gb) | 10.04 | 9.83 | 9.78 | 9.00 | 7.81 | 7.90 | 7.65 | 7.56 |
| Mean depth | 99.77 | 99.78 | 100.37 | 99.66 | 102.27 | 101.88 | 101.92 | 101.77 |
| Bases on targetᵇ (%) | 56.02 | 56.84 | 58.79 | 62.43 | 71.54 | 70.22 | 72.65 | 73.20 |
| Reads on target (%) | 70.85 | 71.32 | 71.70 | 70.31 | 83.36 | 82.14 | 83.18 | 84.15 |
| Duplication rate | 7.12 | 6.59 | 8.79 | 7.92 | 7.10 | 7.00 | 7.18 | 6.75 |
| Coverage (%) | 99.74 | 99.66 | 99.72 | 99.75 | 99.82 | 99.83 | 99.82 | 99.82 |
| 4x coverage (%) | 99.51 | 99.37 | 99.43 | 99.49 | 99.63 | 99.64 | 99.62 | 99.61 |
| 10x coverage (%) | 98.89 | 98.67 | 98.60 | 98.74 | 98.89 | 98.93 | 98.85 | 98.84 |
| 20x coverage (%) | 97.10 | 96.80 | 96.11 | 96.16 | 96.29 | 96.30 | 96.01 | 96.18 |
ᵃ PE, paired-end
ᵇ Bases aligned on target region / raw data amount
Fig. 1 Cumulative depth distribution. The cumulative frequency is the fraction of target regions covered at a given depth or higher
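
The cumulative frequency described in the caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `cumulative_depth_fraction` and the toy depth values are assumptions, and a real analysis would read per-base depths from an alignment file rather than a Python list.

```python
def cumulative_depth_fraction(depths, thresholds):
    """Return {t: fraction of positions whose depth is >= t}.

    `depths` is a list of per-position sequencing depths over the
    target region (hypothetical input format for this sketch).
    """
    n = len(depths)
    if n == 0:
        raise ValueError("no positions given")
    return {t: sum(1 for d in depths if d >= t) / n for t in thresholds}


# Toy per-base depths for ten target positions (illustrative only,
# not data from the study).
toy_depths = [0, 3, 5, 12, 18, 25, 40, 60, 101, 150]
fractions = cumulative_depth_fraction(toy_depths, [1, 4, 10, 20])
print(fractions)
```

The thresholds 1, 4, 10 and 20 mirror the "Coverage", "4x", "10x" and "20x" rows of the data production table, under the assumption that "Coverage" means a depth of at least 1x.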
Variation detection and annotation
| | | Hiseq-1 | Hiseq-2 | Hiseq-3 | Hiseq-4 | BGISEQ-1 | BGISEQ-2 | BGISEQ-3 | BGISEQ-4 |
|---|---|---|---|---|---|---|---|---|---|
| SNV | total numberᵃ | 41,554 | 41,506 | 41,627 | 41,540 | 41,264 | 41,294 | 41,292 | 41,172 |
| | found in dbSNP (%) | 99.75 | 99.73 | 99.70 | 99.78 | 99.74 | 99.76 | 99.76 | 99.80 |
| | homozygous | 15,741 | 15,723 | 15,723 | 15,758 | 15,671 | 15,666 | 15,692 | 15,642 |
| | heterozygous | 25,813 | 25,783 | 25,904 | 25,782 | 25,593 | 25,628 | 25,600 | 25,530 |
| | Ti/Tv | 2.56 | 2.57 | 2.56 | 2.56 | 2.56 | 2.56 | 2.56 | 2.56 |
| | het/homᵇ | 1.64 | 1.64 | 1.65 | 1.64 | 1.63 | 1.64 | 1.63 | 1.63 |
| | intronic variationsᶜ | 17,298 | 17,351 | 17,321 | 17,288 | 17,217 | 17,232 | 17,220 | 17,229 |
| | exonic variations | 21,540 | 21,497 | 21,561 | 21,533 | 21,408 | 21,407 | 21,433 | 21,322 |
| | coding variations | 19,353 | 19,332 | 19,354 | 19,364 | 19,269 | 19,273 | 19,291 | 19,210 |
| | nonsynonymous | 9446 | 9439 | 9437 | 9466 | 9393 | 9391 | 9400 | 9343 |
| | Ti/Tv on exome | 3.09 | 3.10 | 3.08 | 3.09 | 3.08 | 3.08 | 3.08 | 3.09 |
| | het/hom on exome | 1.52 | 1.52 | 1.52 | 1.52 | 1.52 | 1.52 | 1.52 | 1.52 |
| indel | total number | 3461 | 3436 | 3470 | 3445 | 3503 | 3559 | 3506 | 3538 |
| | found in dbSNP (%) | 94.42 | 95.08 | 94.55 | 94.83 | 94.78 | 94.44 | 94.69 | 94.04 |
| | homozygous | 1491 | 1491 | 1492 | 1502 | 1433 | 1444 | 1432 | 1420 |
| | heterozygous | 1970 | 1945 | 1978 | 1943 | 2070 | 2115 | 2074 | 2118 |
| | het/hom | 1.32 | 1.30 | 1.33 | 1.29 | 1.44 | 1.46 | 1.45 | 1.49 |
| | intronic variations | 2493 | 2493 | 2518 | 2496 | 2558 | 2585 | 2553 | 2581 |
| | exonic variations | 703 | 689 | 702 | 694 | 705 | 706 | 703 | 702 |
| | coding variations | 460 | 465 | 459 | 464 | 473 | 477 | 468 | 473 |
| | het/hom on exome | 1.12 | 1.11 | 1.11 | 1.11 | 1.17 | 1.21 | 1.14 | 1.17 |
ᵃ Only variants in the target region were used in these statistics
ᵇ het/hom, heterozygous to homozygous variation ratio
ᶜ Variations located at splice sites are considered nonsynonymous and are not counted as intronic
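
The two ratios reported in the table can be reproduced with a short sketch. The het/hom ratio follows the footnote definition directly. Ti/Tv is the ratio of transitions (purine to purine or pyrimidine to pyrimidine substitutions) to transversions. The function names and the toy SNV list are assumptions for illustration, and the source does not state which tool computed these values.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True if a single-base substitution stays within purines or within pyrimidines."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snvs):
    """Transition/transversion ratio for a list of (ref, alt) base pairs."""
    ti = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = len(snvs) - ti
    return ti / tv


def het_hom_ratio(n_het, n_hom):
    """Heterozygous to homozygous variation ratio."""
    return n_het / n_hom


# SNV counts for Hiseq-1 taken from the table above.
hiseq1_snv_het_hom = round(het_hom_ratio(25813, 15741), 2)
# Indel counts for BGISEQ-1 taken from the table above.
bgiseq1_indel_het_hom = round(het_hom_ratio(2070, 1433), 2)

# Toy SNVs (illustrative only): three transitions and one transversion.
toy_ti_tv = ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")])
```

With the table counts, the sketch gives 1.64 for Hiseq-1 SNVs and 1.44 for BGISEQ-1 indels, matching the reported het/hom rows.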
Fig. 2 Indel length distribution. Deletions are shown with negative length, insertions with positive length. The fractions of insertions and of deletions each sum to 1. All datasets showed a similar length distribution
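
A signed length distribution of this kind can be sketched as below, assuming each indel is given as a VCF-style `(ref, alt)` allele pair. The function name and the toy alleles are hypothetical. Insertions and deletions are normalized separately, as the caption describes.

```python
from collections import Counter


def indel_length_distribution(indels):
    """Return (deletions, insertions) as dicts of signed length -> fraction.

    Length is len(alt) - len(ref), so deletions are negative and
    insertions positive. Each dict sums to 1 on its own.
    """
    lengths = [len(alt) - len(ref) for ref, alt in indels if len(alt) != len(ref)]
    deletions = Counter(n for n in lengths if n < 0)
    insertions = Counter(n for n in lengths if n > 0)

    def normalize(counts):
        total = sum(counts.values())
        return {k: v / total for k, v in sorted(counts.items())} if total else {}

    return normalize(deletions), normalize(insertions)


# Toy alleles (illustrative only): three deletions and three insertions.
toy_indels = [("AT", "A"), ("ACG", "A"), ("A", "AT"),
              ("A", "ATT"), ("G", "GC"), ("CA", "C")]
dels, ins = indel_length_distribution(toy_indels)
print(dels, ins)
```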
Fig. 3 Concordance of variation detection. The Jaccard similarity between the variation detection results of each pair of datasets was calculated separately for SNVs (top-left triangle) and indels (bottom-right triangle). SNV detection showed excellent intra- and inter-platform concordance, while indel detection showed lower concordance. Inter-platform concordance was slightly lower than intra-platform concordance
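
The Jaccard similarity used for this comparison is the size of the intersection of two variant sets divided by the size of their union. A minimal sketch follows, assuming each variant is keyed by a `(chrom, pos, ref, alt)` tuple. That key and the toy call sets are assumptions, since the source does not say how variants were matched.

```python
def jaccard(calls_a, calls_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two variant call sets."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        raise ValueError("both call sets are empty")
    return len(a & b) / len(union)


# Toy call sets (illustrative only): two shared variants, one private to each run.
run_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "GA")}
run_b = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 75, "T", "C")}
similarity = jaccard(run_a, run_b)
print(similarity)
```

A value of 1 means the two runs called exactly the same variants, and lower values mean more calls are private to one run.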
Fig. 4 Variation detection accuracy versus sequencing depth. Raw data were down-sampled to 20x, 30x, 50x, 70x, 100x, and 150x to generate this plot. Variations in the high-confidence regions from the Genome in a Bottle (GIAB) project were used as the reference
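
The source does not describe how the down-sampling was performed. One common approach, sketched here purely as an assumption, keeps each read pair at random with probability equal to the target depth divided by the current depth. The function names and the integer stand-ins for read pairs are hypothetical.

```python
import random


def downsample_fraction(current_depth, target_depth):
    """Fraction of read pairs to keep to go from current_depth to target_depth."""
    if target_depth > current_depth:
        raise ValueError("cannot down-sample to a higher depth")
    return target_depth / current_depth


def downsample_reads(read_pairs, fraction, seed=0):
    """Keep each read pair independently with probability `fraction`.

    A fixed seed makes the selection reproducible.
    """
    rng = random.Random(seed)
    return [pair for pair in read_pairs if rng.random() < fraction]


# Example: going from 100x to 20x keeps about one read pair in five.
fraction = downsample_fraction(100.0, 20.0)
kept = downsample_reads(list(range(10000)), fraction, seed=1)
print(fraction, len(kept))
```

Because selection is random, the number of pairs kept only approximates the requested fraction, so the achieved depth should be checked after alignment.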
Variation accuracy estimation by comparison with GIAB
| | | Detected variants: total | Detected variants: in GIAB | Detected variants: not in GIAB | GIAB-specific variations | Sensitivity (%) | Precision (%)ᵃ | F-measure (%)ᵇ |
|---|---|---|---|---|---|---|---|---|
| SNV | Hiseq-1 | 35,051 | 34,851 | 200 | 359 | 98.98 | 99.43 | 99.20 |
| | Hiseq-2 | 35,026 | 34,793 | 233 | 417 | 98.82 | 99.33 | 99.07 |
| | Hiseq-3 | 35,030 | 34,821 | 209 | 389 | 98.90 | 99.40 | 99.15 |
| | Hiseq-4 | 35,037 | 34,842 | 195 | 368 | 98.95 | 99.44 | 99.20 |
| | BGISEQ-1 | 35,073 | 34,883 | 190 | 327 | 99.07 | 99.46 | 99.26 |
| | BGISEQ-2 | 35,069 | 34,876 | 193 | 334 | 99.05 | 99.45 | 99.25 |
| | BGISEQ-3 | 35,071 | 34,886 | 185 | 324 | 99.08 | 99.47 | 99.28 |
| | BGISEQ-4 | 35,048 | 34,881 | 167 | 329 | 99.07 | 99.52 | 99.29 |
| | Hiseq-1 PE100 | 35,110 | 34,905 | 205 | 305 | 99.13 | 99.42 | 99.27 |
| | Hiseq-2 PE100 | 35,056 | 34,855 | 201 | 355 | 98.99 | 99.43 | 99.21 |
| | Hiseq-3 PE100 | 35,174 | 34,880 | 294 | 330 | 99.06 | 99.16 | 99.11 |
| | Hiseq-4 PE100 | 35,143 | 34,897 | 246 | 313 | 99.11 | 99.30 | 99.21 |
| indel | Hiseq-1 | 2501 | 2453 | 48 | 197 | 92.57 | 98.08 | 95.24 |
| | Hiseq-2 | 2493 | 2454 | 39 | 196 | 92.60 | 98.44 | 95.43 |
| | Hiseq-3 | 2507 | 2457 | 50 | 193 | 92.72 | 98.01 | 95.29 |
| | Hiseq-4 | 2508 | 2453 | 55 | 197 | 92.57 | 97.81 | 95.11 |
| | BGISEQ-1 | 2542 | 2480 | 62 | 170 | 93.58 | 97.56 | 95.53 |
| | BGISEQ-2 | 2571 | 2498 | 73 | 152 | 94.26 | 97.16 | 95.69 |
| | BGISEQ-3 | 2553 | 2478 | 75 | 172 | 93.51 | 97.06 | 95.25 |
| | BGISEQ-4 | 2564 | 2488 | 76 | 162 | 93.89 | 97.04 | 95.44 |
| | Hiseq-1 PE100 | 2538 | 2498 | 40 | 152 | 94.26 | 98.42 | 96.30 |
| | Hiseq-2 PE100 | 2512 | 2471 | 41 | 179 | 93.25 | 98.37 | 95.74 |
| | Hiseq-3 PE100 | 2541 | 2474 | 67 | 176 | 93.36 | 97.36 | 95.32 |
| | Hiseq-4 PE100 | 2523 | 2484 | 39 | 166 | 93.74 | 98.45 | 96.04 |
ᵃ Precision = true positive/(true positive + false positive). Precision was used instead of specificity because true negatives dominate the region, so specificity is very close to 1
ᵇ F-measure is the harmonic mean of sensitivity and precision. It combines the two in a single measurement
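
The three accuracy columns can be recomputed from the count columns with a short sketch. The mapping used here is an inference rather than something the source spells out: "in GIAB" is treated as true positives, "not in GIAB" as false positives, and "GIAB-specific variations" as false negatives, with sensitivity taken as true positive/(true positive + false negative). Under that reading the table values are reproduced. The function name is hypothetical.

```python
def accuracy_metrics(in_giab, not_in_giab, giab_specific):
    """Return (sensitivity, precision, F-measure) as percentages rounded to 2 places.

    Assumed mapping: in_giab = true positives, not_in_giab = false
    positives, giab_specific = false negatives.
    """
    tp, fp, fn = in_giab, not_in_giab, giab_specific
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    # Harmonic mean of sensitivity and precision.
    f_measure = 2 * sensitivity * precision / (sensitivity + precision)
    return tuple(round(100 * x, 2) for x in (sensitivity, precision, f_measure))


# Counts taken from the table above.
hiseq1_snv = accuracy_metrics(34851, 200, 359)
bgiseq1_indel = accuracy_metrics(2480, 62, 170)
print(hiseq1_snv, bgiseq1_indel)
```

For the Hiseq-1 SNV row this gives 98.98, 99.43 and 99.20, and for the BGISEQ-1 indel row 93.58, 97.56 and 95.53, in agreement with the table.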