Reuben J Pengelly, Daniel Ward, David Hunt, Christopher Mattocks, Sarah Ennis.
Abstract
Next-generation sequencing has disrupted genetic testing, allowing far more scope in the tests applied. The appropriate sections of the genome to be tested can now be readily selected, from single mutations to whole-genome sequencing. One product offering within this spectrum is the focused exome, targeting ~5,000 genes known to be implicated in human disease. These kits are designed to provide a flexible platform with high diagnostic yield and a reduced sequencing requirement compared to whole-exome sequencing. Here, we sequence control DNA samples and compare two kits, the Illumina TruSight One (TSO) and the Agilent SureSelect Focused Exome (SSFE). Characteristics of the kits are comprehensively evaluated. Despite the larger design region of the Agilent kit, we find that the Illumina kit performs better in terms of gene coverage, as well as coverage of clinically relevant loci. We provide exhaustive coverage statistics for each kit to aid the assessment of their suitability, and provide read data for control DNA samples to allow bioinformatic benchmarking by users developing pipelines for these data.
Year: 2020 PMID: 32094380 PMCID: PMC7039898 DOI: 10.1038/s41598-020-60215-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Summary coverage statistics for triplicate sequenced samples.
| Metric | Read depth | SSFE mean | SSFE CVᵃ (%) | TSO mean | TSO CVᵃ (%) |
|---|---|---|---|---|---|
| Reads generated | | 86,509,698 | 8.1 | 80,712,320 | 17.7 |
| Reads mapped | | 84,816,034 | 8.0 | 74,590,089 | 18.0 |
| Duplicate reads | | 21,918,522 | 14.0 | 19,674,723 | 24.0 |
| Reads mapped to target | | 46,466,309 | 5.9 | 38,474,359 | 15.6 |
| Reads mapped to target ± 150 bp | | 61,283,075 | 6.0 | 48,581,312 | 15.9 |
| Mean coverage of target | | 211 | 5.8 | 317 | 15.2 |
| Targetᵇ size (bp) | | 17,846,036 | — | 11,946,514 | — |
| Targetᵇ covered (%) with read depth ≥ | 1 X | 100 | 0.1 | 99 | |
| | 5 X | 100 | 0.1 | 99 | 0.1 |
| | 10 X | 99 | 0.1 | 98 | 0.2 |
| | 20 X | 98 | 0.3 | 98 | 0.3 |
| | 30 X | 97 | 0.5 | 97 | 0.4 |
| | 50 X | 92 | 1.1 | 96 | 0.8 |
| | 100 X | 76 | 2.8 | 91 | 3.0 |

ᵃ Coefficient of variation.
ᵇ Target regions as defined by vendor target BED file.
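The read counts in the table above can be turned into rates that are easier to compare between kits. The following is a minimal sketch, not the authors' pipeline: the dictionary values are the triplicate means from the table, while the function names and the choice of mapped reads as the denominator for the duplicate and on-target rates are assumptions made here for illustration. The `cv_percent` helper likewise assumes the sample standard deviation, which the source does not specify.

```python
import statistics

# Triplicate mean read counts copied from the summary coverage table.
READ_COUNTS = {
    "SSFE": {"generated": 86_509_698, "mapped": 84_816_034,
             "duplicate": 21_918_522, "on_target": 46_466_309},
    "TSO": {"generated": 80_712_320, "mapped": 74_590_089,
            "duplicate": 19_674_723, "on_target": 38_474_359},
}


def derived_rates(counts):
    """Return mapping, duplicate and on-target rates as fractions.

    Assumption: duplicate and on-target rates use mapped reads as the
    denominator; the source table does not state a convention.
    """
    return {
        "mapped_rate": counts["mapped"] / counts["generated"],
        "duplicate_rate": counts["duplicate"] / counts["mapped"],
        "on_target_rate": counts["on_target"] / counts["mapped"],
    }


def cv_percent(values):
    """Coefficient of variation (%), assuming the sample standard deviation."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


if __name__ == "__main__":
    for kit, counts in READ_COUNTS.items():
        rates = derived_rates(counts)
        print(kit, {name: round(value, 3) for name, value in rates.items()})
```

Under these assumptions, roughly 55% of mapped SSFE reads and 52% of mapped TSO reads fall on target, so the higher mean target coverage reported for TSO appears to stem mainly from its smaller target size rather than from a higher on-target rate.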
Figure 1. Euler diagram showing size (Mb) of regions as defined by the vendor-provided target locations (A) and RefSeq transcripts covered to ≥ 20 X in our data without regard for vendor-defined targets (B).
Figure 2. Capture biases for the two kits. The degree of GC bias seen is greater for the SSFE data (r² = 0.56 vs. r² = 0.09 for SSFE and TSO respectively, p < 2.2 × 10⁻¹⁶ for both, Pearson’s correlation).
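The r² values quoted in the Figure 2 caption are squared Pearson correlation coefficients. As a minimal sketch of that calculation, the code below computes r and r² between GC fraction and read depth. The per-region values are hypothetical and invented purely for illustration; the unit of analysis used in the paper (for example per bait or per target) is not stated in this record and is an assumption here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def r_squared(xs, ys):
    """Squared Pearson correlation, as quoted in the figure caption."""
    return pearson_r(xs, ys) ** 2


# Hypothetical per-region values, invented for illustration only.
gc_fraction = [0.30, 0.40, 0.50, 0.60, 0.70]
mean_depth = [260, 240, 210, 170, 120]

if __name__ == "__main__":
    print("r  =", round(pearson_r(gc_fraction, mean_depth), 3))
    print("r2 =", round(r_squared(gc_fraction, mean_depth), 3))
```

A higher r² means more of the variation in depth is explained by GC content, which is why the larger value for SSFE is read as stronger GC bias.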
Region of interest coverage.
| Metric | Category | SSFE | TSO |
|---|---|---|---|
| Genes listed | | 5,576 | 4,663 |
| Genes with proportion covered | > 0.99 | 3,774 | 3,970 |
| | 0.95 < x ≤ 0.99 | 597 | 368 |
| | 0.90 < x ≤ 0.95 | 349 | 151 |
| | 0.80 < x ≤ 0.90 | 237 | 117 |
| HGMDᵃ | Disease causing (133,378 total) | 125,000 | 128,784 |

ᵃ Human Gene Mutation Database release 2015[23]
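Because the two kits list different numbers of genes, the absolute counts in the table above are easier to compare as fractions. The sketch below is illustrative only: it assumes the HGMD row reports the number of disease-causing variants covered by each kit out of the 133,378 total, and the function and key names are hypothetical.

```python
# Counts copied from the region-of-interest coverage table.
ROI_COUNTS = {
    "SSFE": {"genes_listed": 5_576, "genes_over_99": 3_774, "hgmd_covered": 125_000},
    "TSO": {"genes_listed": 4_663, "genes_over_99": 3_970, "hgmd_covered": 128_784},
}
HGMD_TOTAL = 133_378  # disease-causing variants, as stated in the table


def roi_fractions(kit):
    """Return the share of listed genes covered > 0.99 and of HGMD variants covered.

    Assumption: the HGMD row counts covered variants out of HGMD_TOTAL.
    """
    counts = ROI_COUNTS[kit]
    return {
        "genes_over_99": counts["genes_over_99"] / counts["genes_listed"],
        "hgmd_covered": counts["hgmd_covered"] / HGMD_TOTAL,
    }


if __name__ == "__main__":
    for kit in ROI_COUNTS:
        fractions = roi_fractions(kit)
        print(kit, {name: round(value, 3) for name, value in fractions.items()})
```

On these figures, about 68% of SSFE-listed genes and about 85% of TSO-listed genes are covered at a proportion above 0.99, which matches the abstract's statement that the Illumina kit performs better on gene coverage despite its smaller design.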
Figure 3. Proportion of genes covered to varying levels in downsampled datasets representing 12–48 samples being included in a single sequencing run. Downsampling reduces the number of genes covered to a high level.
Sensitivity and precision of variant detection for the two capture kits.
| Variant type | Metric | SSFE mean | SSFE CV | TSO mean | TSO CV |
|---|---|---|---|---|---|
| SNP | Count | 10,883 | — | 5,798 | — |
| | Sensitivity | 0.995 | 0.08% | 0.996 | 0.04% |
| | Precision | 0.994 | 0.05% | 0.997 | 0.01% |
| Indel | Count | 626 | — | 178 | — |
| | Sensitivity | 0.791 | 0.71% | 0.757 | 0.43% |
| | Precision | 0.869 | 0.48% | 0.766 | 2.72% |
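For readers benchmarking their own pipelines against these figures, the sketch below shows the standard definitions of sensitivity and precision in terms of true positives (TP), false positives (FP) and false negatives (FN). The F1 score is not reported in the source; it is included here only as an assumed convenience for combining the two reported means into one number, and the function names are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of true variants that were called: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos, false_pos):
    """Fraction of called variants that are true: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


def f1_score(precision_value, sensitivity_value):
    """Harmonic mean of precision and sensitivity (not reported in the source)."""
    total = precision_value + sensitivity_value
    if total == 0:
        return 0.0
    return 2 * precision_value * sensitivity_value / total


# Mean (sensitivity, precision) pairs copied from the table above.
REPORTED = {
    ("SSFE", "SNP"): (0.995, 0.994),
    ("TSO", "SNP"): (0.996, 0.997),
    ("SSFE", "Indel"): (0.791, 0.869),
    ("TSO", "Indel"): (0.757, 0.766),
}

if __name__ == "__main__":
    for (kit, variant_type), (sens, prec) in REPORTED.items():
        print(kit, variant_type, "F1 =", round(f1_score(prec, sens), 3))
```

Both kits perform almost identically on SNPs, while the indel rows show lower and more divergent values, so indel calling is the part of a pipeline most worth checking against the control data.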
Figure 4. Overview of laboratory processes for the two capture kits. For SSFE, physical fragmentation is followed by DNA fragment repair and ligation of adapter sequences; hybridisation of patient DNA with the baits and pulldown is then performed, followed by library indexing and pooling[19]. For TSO, combined fragmentation and adapter ligation is performed enzymatically, followed by sample amplification, indexing and pooling. Two iterations of bait hybridisation and pulldown are then performed, with a final pooled library amplification[18].