Stefan H Lelieveld, Malte Spielmann, Stefan Mundlos, Joris A Veltman, Christian Gilissen.
Abstract
For next-generation sequencing technologies, sufficient base-pair coverage is the foremost requirement for the reliable detection of genomic variants. We investigated whether whole-genome sequencing (WGS) platforms offer improved coverage of coding regions compared with whole-exome sequencing (WES) platforms, and compared single-base coverage for a large set of exome and genome samples. We find that WES platforms have improved considerably in recent years, but at comparable sequencing depth, WGS outperforms WES in terms of covered coding regions. At higher sequencing depth (95x-160x), WES successfully captures 95% of the coding regions with a minimum coverage of 20x, compared with 98% for WGS at 87-fold coverage. Three different assessments of sequence coverage bias showed consistent biases for WES but not for WGS. We found no clear differences between the technologies in their ability to achieve complete coverage of 2,759 clinically relevant genes. We show that WES performs comparably to WGS in terms of covered bases if sequenced at two to three times higher coverage. This does, however, come at the cost of substantially more sequencing bias in WES approaches. Our findings will guide laboratories in making an informed decision on which sequencing platform and coverage to choose.
Keywords: coverage; exome sequencing; genome sequencing
Year: 2015 PMID: 25973577 PMCID: PMC4755152 DOI: 10.1002/humu.22813
Source DB: PubMed Journal: Hum Mutat ISSN: 1059-7794 Impact factor: 4.878
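The central metric in the abstract is the share of coding bases covered by at least 20 reads. As a minimal sketch, assuming per-base read depths are already available as a list of integers (the function name `fraction_covered` and the example depths are hypothetical, not taken from the study), that share could be computed as follows:

```python
from typing import Sequence


def fraction_covered(depths: Sequence[int], min_depth: int = 20) -> float:
    """Return the fraction of positions with read depth >= min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Hypothetical per-base depths for a tiny coding region (illustrative only).
example_depths = [35, 22, 19, 20, 0, 48, 51, 18, 27, 30]

# Seven of the ten positions reach the 20x threshold.
print(f"{fraction_covered(example_depths):.0%} of bases covered at >= 20x")
```

In practice the depths would come from a per-base depth tool run over the annotated coding regions; the threshold of 20 mirrors the minimum coverage used in the abstract.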
Overview of Tested Datasets, Average Coverage, Used Sequencing Systems, and Enrichment Kits
| Sequencing platform | Enrichment/library | Average exome coverage (x) | Coverage range (x) | # Samples |
| --- | --- | --- | --- | --- |
| Illumina HiSeq | Agilent SureSelect V4 | 77.92 | 70–90 | 12 |
| Illumina HiSeq | Agilent SureSelect V4 | 159.92 | 151–170 | 12 |
| Illumina HiSeq | Agilent SureSelect V5 | 100.17 | 81–117 | 12 |
| Illumina HiSeq | NimbleGen SeqCap V3 | 94.50 | 92–97 | 12 |
| Complete Genomics | Whole genome | 44.17 | 41–48 | 12 |
| Complete Genomics | Whole genome | 87.42 | 83–95 | 12 |
| Illumina HiSeq | Whole genome | 28.09 | 26–30 | 11 |
| Illumina HiSeq | Whole genome | 56.20 (a) | 56–57 | 5 |
| Illumina X Ten | Whole genome | 39.58 | 30–47 | 12 |
Columns depict (from left to right) the sequencing platform that was used; the exome enrichment kit or library preparation that was used; the average coverage across the RefSeq exome; the range of coverage; and the number of samples used in the analysis.
(a) For comparison, the 28.09x genomes sequenced on the Illumina HiSeq system were merged to resemble five samples sequenced to 56.20x coverage.
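The footnote does not say how the genomes were merged. As a rough illustration only, the sketch below assumes that merging two samples is equivalent to summing their per-base depths, and that consecutive samples are paired with any unpaired leftover dropped (which turns 11 samples into 5). The function name `merge_pairs` and the toy depths are hypothetical.

```python
from typing import List, Sequence


def merge_pairs(samples: Sequence[Sequence[int]]) -> List[List[int]]:
    """Pair consecutive samples and sum their per-base depths.

    Assumption: an unpaired final sample is dropped, so 11 inputs
    yield 5 merged samples.
    """
    merged = []
    for i in range(0, len(samples) - 1, 2):
        first, second = samples[i], samples[i + 1]
        if len(first) != len(second):
            raise ValueError("paired samples must cover the same positions")
        merged.append([a + b for a, b in zip(first, second)])
    return merged


# Eleven toy samples, each with a depth of 28 at three positions.
toy_samples = [[28, 28, 28] for _ in range(11)]
merged_samples = merge_pairs(toy_samples)
print(len(merged_samples), merged_samples[0])
```

Summing depths roughly doubles the average coverage of each merged sample, which is consistent with the step from about 28x to about 56x in the table.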
Figure 1. Coverage of the Ensembl and RefSeq annotated protein‐coding regions and (full) coverage of 2,759 clinically relevant OMIM+ genes. A: The percentage of base pairs of Ensembl (in yellow) and RefSeq (in blue) annotated protein‐coding regions covered by at least 20 reads for the tested platforms. B: Percentage of base pairs covered by at least 20 reads for the longest OMIM+ transcripts (in green). The red bars depict the percentage of the (longest) OMIM+ transcripts that are fully covered by at least 20 reads.
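Panel B separates two quantities that are easy to conflate: the share of bases covered at 20x, and the share of transcripts in which every base reaches 20x. The sketch below illustrates the second, reading "fully covered" as "minimum depth across the transcript is at least 20"; that reading, the function name `fully_covered_fraction`, and the gene labels are assumptions made for illustration.

```python
from typing import Dict, Sequence


def fully_covered_fraction(
    transcript_depths: Dict[str, Sequence[int]], min_depth: int = 20
) -> float:
    """Return the fraction of transcripts whose every base has depth >= min_depth."""
    if not transcript_depths:
        raise ValueError("transcript_depths must not be empty")
    fully_covered = sum(
        1
        for depths in transcript_depths.values()
        if depths and min(depths) >= min_depth
    )
    return fully_covered / len(transcript_depths)


# Hypothetical transcripts with toy per-base depths.
example_transcripts = {
    "GENE_A": [25, 31, 40],  # fully covered
    "GENE_B": [22, 19, 35],  # one base below 20x
    "GENE_C": [20, 20, 20],  # fully covered, exactly at threshold
    "GENE_D": [50, 0, 45],   # one uncovered base
}

print(fully_covered_fraction(example_transcripts))
```

A single low-depth base is enough to disqualify a transcript, which is why the fully covered percentage can sit well below the per-base percentage for the same data.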
Figure 2. Overview of 56 genes and the percentage of coding bases not covered at 20x. The boxplots depict the percentage of bases not covered by at least 20 reads. For each of the 56 ACMG‐recommended genes, the coverage of the longest RefSeq transcript was analyzed. A: Performance of all tested exome capture libraries. B: Performance of the tested WGS platforms.
Figure 3. Assessments of three different sequence coverage biases. A: Evenness scores (a measure of uniform read mapping) for the different platforms based on the RefSeq annotated protein‐coding regions. B: The difference in average coverage at 20x of WGS libraries for RefSeq transcripts grouped by strand. The symbols + and – indicate the average coverage level on the plus and minus strand. C: The same comparison for WES libraries. D: Density plot of the allele ratio distribution for heterozygous SNPs. The green line depicts the ideal heterozygous allele ratio of 0.5.
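Two of the bias measures in this caption reduce to simple arithmetic on read counts. The caption does not give the formulas, so the sketch below rests on assumptions: the allele ratio is taken as alternate reads divided by total reads at a heterozygous site, and the evenness score is taken as the fraction of sequenced depth that lies at or below the mean depth (1.0 for perfectly uniform coverage). The study's exact definitions may differ, and both function names are hypothetical.

```python
from typing import Sequence


def allele_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the alternate allele (assumed definition)."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no reads")
    return alt_reads / total


def evenness(depths: Sequence[int]) -> float:
    """Assumed evenness formulation: sum(min(depth, mean)) / sum(depth).

    Equals 1.0 when every position has the same depth and decreases as
    coverage concentrates on fewer positions.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    total = sum(depths)
    if total == 0:
        return 0.0
    mean = total / len(depths)
    return sum(min(d, mean) for d in depths) / total


# Toy values: a balanced heterozygous site and a skewed one.
print(allele_ratio(10, 10), allele_ratio(30, 10))
# Toy values: uniform coverage versus coverage piled on one position.
print(evenness([30, 30, 30]), evenness([0, 60]))
```

Under these assumptions, an unbiased platform would show allele ratios clustered around 0.5 and evenness scores close to 1.0; systematic departures from either would indicate the kinds of bias shown in panels A and D.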